Introduction to Statistical Modeling
November 2 & 5, 2018
FANR 6750
Richard Chandler and Bob Cooper
Looking ahead
Linear models
Generalized linear models
Model selection and multi-model inference
Outline
1 Motivation
2 Linear models
3 Example
4 Matrix notation
Motivation
Why do we need this part of the course?
• We have been modeling all along
• Good experimental design + ANOVA is usually the most direct route to causal inference
• Often, however, it isn’t possible (or even desirable) to control some aspects of the system being investigated
• When manipulative experiments aren’t possible, observational studies and predictive models can be the next best option
What is a model?
Definition
A model is an abstraction of reality used to describe the relationship between two or more variables
Types of models
• Conceptual
• Mathematical
• Statistical
Important point
“All models are wrong but some are useful” (George Box, 1976)
Statistical models
What are they useful for?
• Formalizing hypotheses using math and probability
• Evaluating hypotheses by confronting models with data
• Predicting future outcomes
Statistical models
Two important pieces
(1) Deterministic component
Equation for the expected value of the response variable
(2) Stochastic component
Probability distribution describing the differences between the expected values and the observed values
In parametric statistics, we assume we know the distribution, but not the parameters of the distribution
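These two pieces can be made concrete with a short simulation. The sketch below is illustrative only (the coefficients, noise level, and sample size are invented, not from the lecture):

```r
# Deterministic component: E(y) = beta0 + beta1 * x
# Stochastic component:    y ~ Normal(E(y), sigma^2)
set.seed(123)
beta0 <- 20; beta1 <- 0.5; sigma <- 2
x <- runif(50, 0, 10)
Ey <- beta0 + beta1 * x                # expected values (deterministic)
y <- rnorm(50, mean = Ey, sd = sigma)  # observed values (stochastic)
# The residuals y - Ey are draws from the stochastic component
```

Fitting lm(y ~ x) to data simulated this way recovers estimates close to beta0 and beta1.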
Is this a linear model?
y = 20 + 0.5x
[Plot: y against x for x from 0 to 10; the relationship is a straight line]
Is this a linear model?
y = 20 + 0.5x − 0.3x²
[Plot: y against x for x from 0 to 10; the relationship is a curve]
Linear model
A linear model is an equation of the form:
yi = β0 + β1xi1 + β2xi2 + . . . + βpxip + εi
where the β’s are coefficients, and the x values are predictor variables (or dummy variables for categorical predictors).
This equation is often expressed in matrix notation as:
y = Xβ + ε
where X is a design matrix and β is a vector of coefficients. More on matrix notation later...
Interpreting the β’s
You must be able to interpret the β coefficients for any model that you fit to your data.
A linear model might have dozens of continuous and categorical predictor variables, with dozens of associated β coefficients.
Linear models can also include polynomial terms and interactions between continuous and categorical predictors.
Interpreting the β’s
The intercept β0 is the expected value of y when all x’s are 0.
If x is a continuous explanatory variable:
• β can usually be interpreted as a slope parameter.
• In this case, β is the change in y resulting from a 1 unit change in x (while holding the other predictors constant).
Interpreting β’s for categorical explanatory variables
Things are more complicated for categorical explanatory variables (i.e., factors) because they must be converted to dummy variables.
There are many ways of creating dummy variables.
In R, the default method for creating dummy variables from unordered factors works like this:
• One level of the factor is treated as a reference level
• The reference level is associated with the intercept
• The β coefficients for the other levels of the factor are differences from the reference level
The default method corresponds to:
options(contrasts=c("contr.treatment","contr.poly"))
Interpreting β’s for categorical explanatory variables
Another common method for creating dummy variables results in β’s that can be interpreted as the α’s from the additive models that we saw earlier in the class.
With this method:
• The β associated with each level of the factor is the difference from the intercept
• The intercept can be interpreted as the grand mean if the continuous variables have been centered
• One of the levels of the factor will not be displayed because it is redundant when the intercept is estimated
This method corresponds to:
options(contrasts=c("contr.sum","contr.poly"))
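The two coding schemes can be inspected directly with model.matrix(). The three-level habitat factor below is a hypothetical toy example, but the two contrast settings are the ones shown above:

```r
# Dummy variables for a 3-level factor under the two contrast schemes
habitat <- factor(c("Bare", "Oak", "Pine"))

# Treatment contrasts (the R default): "Bare" is the reference level,
# so its mean is the intercept and the other betas are differences from it
Xt <- model.matrix(~ habitat, contrasts.arg = list(habitat = "contr.treatment"))

# Sum-to-zero contrasts: the intercept is the grand mean and the
# betas are differences from it (the last level is implied)
Xs <- model.matrix(~ habitat, contrasts.arg = list(habitat = "contr.sum"))

Xt  # columns: (Intercept), habitatOak, habitatPine
Xs  # columns: (Intercept), habitat1, habitat2
```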
Example
The Island Scrub-Jay
Example
Santa Cruz Island
[Map: Santa Cruz Island showing elevation (m) and census locations, with landmarks (Christy Beach, Main Ranch, UC Field Station, Scorpion Anchorage, Prisoners’ Harbor); inset locates the island off the southern California coast near Los Angeles, Ventura, and Santa Barbara]
Santa Cruz Data
Habitat data for all 2787 grid cells covering the island
head(cruz2)
## x y elevation forest chaparral habitat seeds
## 1 230736.7 3774324 241 0 0 Oak Low
## 2 231036.7 3774324 323 0 0 Pine Med
## 3 231336.7 3774324 277 0 0 Pine High
## 4 230436.7 3774024 13 0 0 Oak Med
## 5 230736.7 3774024 590 0 0 Oak High
## 6 231036.7 3774024 533 0 0 Oak Low
Maps of predictor variables
[Map: Elevation (m) across the island, on Easting/Northing axes; scale 500–2000]
Maps of predictor variables
[Map: Forest cover (proportion); scale 0.0–1.0]
Questions
(1) How many jays are on the island?
(2) What environmental variables influence abundance?
(3) Can we predict consequences of environmental change?
Maps of predictor variables
[Map: Chaparral cover (proportion; scale 0.0–1.0) and survey plot locations]
The (fake) jay data
head(jayData)
## x y elevation forest chaparral habitat seeds jays
## 2345 258636.7 3764124 423 0.00 0.02 Oak Med 34
## 740 261936.7 3769224 506 0.10 0.45 Oak Med 38
## 2304 246336.7 3764124 859 0.00 0.26 Oak High 40
## 2433 239436.7 3763524 1508 0.02 0.03 Pine Med 43
## 1104 239436.7 3767724 483 0.26 0.37 Oak Med 36
## 607 236436.7 3769524 830 0.00 0.01 Oak Low 39
Simple linear regression
fm1 <- lm(jays ~ elevation, data=jayData)
summary(fm1)
##
## Call:
## lm(formula = jays ~ elevation, data = jayData)
##
## Residuals:
## Min 1Q Median 3Q Max
## -5.4874 -1.7539 0.1566 1.6159 4.6155
##
## Coefficients:
## Estimate Std. Error t value Pr(>|t|)
## (Intercept) 33.082808 0.453997 72.87 <2e-16 ***
## elevation 0.008337 0.000595 14.01 <2e-16 ***
## ---
## Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
##
## Residual standard error: 2.285 on 98 degrees of freedom
## Multiple R-squared: 0.667, Adjusted R-squared: 0.6636
## F-statistic: 196.3 on 1 and 98 DF, p-value: < 2.2e-16
Simple linear regression
[Scatterplot: number of jays (30–45) against elevation (0–1500 m)]
Multiple linear regression
fm2 <- lm(jays ~ elevation+forest, data=jayData)
summary(fm2)
##
## Call:
## lm(formula = jays ~ elevation + forest, data = jayData)
##
## Residuals:
## Min 1Q Median 3Q Max
## -5.4717 -1.7384 0.1552 1.5993 4.6319
##
## Coefficients:
## Estimate Std. Error t value Pr(>|t|)
## (Intercept) 33.065994 0.467624 70.711 <2e-16 ***
## elevation 0.008337 0.000598 13.943 <2e-16 ***
## forest 0.294350 1.793079 0.164 0.87
## ---
## Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
##
## Residual standard error: 2.296 on 97 degrees of freedom
## Multiple R-squared: 0.6671, Adjusted R-squared: 0.6603
## F-statistic: 97.21 on 2 and 97 DF, p-value: < 2.2e-16
Multiple linear regression
[Perspective plot: expected number of jays (30–55) as a function of elevation (0–2000 m) and forest cover (0.0–1.0)]
One-way ANOVA
fm3 <- lm(jays ~ habitat, data=jayData)
summary(fm3)
##
## Call:
## lm(formula = jays ~ habitat, data = jayData)
##
## Residuals:
## Min 1Q Median 3Q Max
## -7.9143 -2.3684 -0.3684 3.0857 8.6316
##
## Coefficients:
## Estimate Std. Error t value Pr(>|t|)
## (Intercept) 35.875 1.356 26.456 <2e-16 ***
## habitatOak 3.493 1.448 2.413 0.0177 *
## habitatPine 2.039 1.503 1.357 0.1780
## ---
## Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
##
## Residual standard error: 3.835 on 97 degrees of freedom
## Multiple R-squared: 0.07126, Adjusted R-squared: 0.05211
## F-statistic: 3.721 on 2 and 97 DF, p-value: 0.02773
One-way ANOVA
[Barplot: expected number of jays (0–40) by habitat: Bare, Oak, Pine]
ANCOVA
fm4 <- lm(jays ~ elevation+habitat, data=jayData)
summary(fm4)
##
## Call:
## lm(formula = jays ~ elevation + habitat, data = jayData)
##
## Residuals:
## Min 1Q Median 3Q Max
## -5.0327 -1.5356 0.0091 1.4686 4.2391
##
## Coefficients:
## Estimate Std. Error t value Pr(>|t|)
## (Intercept) 3.072e+01 8.084e-01 37.997 < 2e-16 ***
## elevation 8.289e-03 5.414e-04 15.308 < 2e-16 ***
## habitatOak 3.166e+00 7.850e-01 4.034 0.00011 ***
## habitatPine 1.695e+00 8.148e-01 2.081 0.04010 *
## ---
## Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
##
## Residual standard error: 2.078 on 96 degrees of freedom
## Multiple R-squared: 0.7301, Adjusted R-squared: 0.7217
## F-statistic: 86.56 on 3 and 96 DF, p-value: < 2.2e-16
ANCOVA
[Scatterplot: number of jays (30–45) against elevation (0–1500 m), with groups Oak, Pine, and Bare]
Continuous-categorical interaction
fm5 <- lm(jays ~ elevation*habitat, data=jayData)
summary(fm5)
##
## Call:
## lm(formula = jays ~ elevation * habitat, data = jayData)
##
## Residuals:
## Min 1Q Median 3Q Max
## -5.008 -1.581 -0.103 1.420 4.184
##
## Coefficients:
## Estimate Std. Error t value Pr(>|t|)
## (Intercept) 31.654383 1.446322 21.886 < 2e-16 ***
## elevation 0.006781 0.001999 3.393 0.00101 **
## habitatOak 2.428682 1.565227 1.552 0.12411
## habitatPine 0.399953 1.579874 0.253 0.80070
## elevation:habitatOak 0.001204 0.002153 0.559 0.57737
## elevation:habitatPine 0.002046 0.002151 0.951 0.34414
## ---
## Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
##
## Residual standard error: 2.087 on 94 degrees of freedom
## Multiple R-squared: 0.7334, Adjusted R-squared: 0.7192
## F-statistic: 51.72 on 5 and 94 DF, p-value: < 2.2e-16
ANCOVA
[Scatterplot: number of jays (30–45) against elevation (0–1500 m), with groups Oak, Pine, and Bare]
Quadratic effect of elevation
fm6 <- lm(jays ~ elevation+I(elevation^2), data=jayData)
summary(fm6)
##
## Call:
## lm(formula = jays ~ elevation + I(elevation^2), data = jayData)
##
## Residuals:
## Min 1Q Median 3Q Max
## -4.8429 -1.4608 0.1304 1.5908 4.7854
##
## Coefficients:
## Estimate Std. Error t value Pr(>|t|)
## (Intercept) 3.162e+01 7.631e-01 41.434 < 2e-16 ***
## elevation 1.368e-02 2.342e-03 5.843 6.86e-08 ***
## I(elevation^2) -3.542e-06 1.503e-06 -2.357 0.0204 *
## ---
## Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
##
## Residual standard error: 2.233 on 97 degrees of freedom
## Multiple R-squared: 0.6851, Adjusted R-squared: 0.6786
## F-statistic: 105.5 on 2 and 97 DF, p-value: < 2.2e-16
Quadratic effect of elevation
[Scatterplot: number of jays (30–45) against elevation (0–1500 m)]
Interaction and quadratic effects
fm7 <- lm(jays ~ habitat * forest + elevation +
I(elevation^2), data=jayData)
summary(fm7)
##
## Call:
## lm(formula = jays ~ habitat * forest + elevation + I(elevation^2),
## data = jayData)
##
## Residuals:
## Min 1Q Median 3Q Max
## -5.2574 -1.4400 0.0487 1.4055 3.7924
##
## Coefficients:
## Estimate Std. Error t value Pr(>|t|)
## (Intercept) 2.920e+01 1.030e+00 28.338 < 2e-16 ***
## habitatOak 3.705e+00 8.433e-01 4.394 2.98e-05 ***
## habitatPine 2.216e+00 8.757e-01 2.531 0.0131 *
## forest 4.007e+01 2.780e+01 1.441 0.1529
## elevation 1.215e-02 2.300e-03 5.285 8.41e-07 ***
## I(elevation^2) -2.554e-06 1.484e-06 -1.721 0.0886 .
## habitatOak:forest -4.292e+01 2.785e+01 -1.541 0.1267
## habitatPine:forest -3.918e+01 2.784e+01 -1.407 0.1627
## ---
## Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
##
## Residual standard error: 2.044 on 92 degrees of freedom
## Multiple R-squared: 0.7497, Adjusted R-squared: 0.7307
## F-statistic: 39.37 on 7 and 92 DF, p-value: < 2.2e-16
Predict jay abundance at each grid cell
E7 <- predict(fm7, type="response", newdata=cruz2,
interval="confidence")
E7 <- cbind(cruz2[,c("x","y")], E7)
head(E7)
## x y fit lwr upr
## 1 230736.7 3774324 35.68349 34.86313 36.50386
## 2 231036.7 3774324 35.07284 34.22917 35.91652
## 3 231336.7 3774324 34.58427 33.72668 35.44186
## 4 230436.7 3774024 33.06042 31.55907 34.56177
## 5 230736.7 3774024 39.18440 38.49766 39.87113
## 6 231036.7 3774024 38.65512 37.98859 39.32165
Map the predictions
[Maps: expected number of jays per grid cell, with lower and upper confidence limits; scale 25–55]
Future scenarios
What if pine and oak disappear?
[Map: expected number of jays per grid cell under this scenario; scale 25–55]
Future scenarios
What if sea level rises?
[Map: expected number of jays per grid cell under this scenario; scale 25–55]
Linear model
All of the fixed effects models that we have covered can be expressed this way:
y = Xβ + ε
where
ε ∼ Normal(0, σ²)
Examples include
• Completely randomized ANOVA
• Randomized complete block designs with fixed block effects
• Factorial designs
• ANCOVA
Then how do they differ?
• The design matrices are different
• And so are the numbers of parameters (coefficients) to be estimated
• It is important to understand how to construct a design matrix that includes categorical variables
Design matrix
A design matrix has N rows and K columns, where N is the total sample size and K is the number of coefficients (parameters) to be estimated.
The first column contains just 1’s. This column corresponds to the intercept (β0).
Continuous predictor variables appear unchanged in the design matrix.
Categorical predictor variables appear as dummy variables.
In R, the design matrix is created internally based on the formula that you provide.
The design matrix can be viewed using the model.matrix function.
Design matrix for linear regression
Data
dietData <- read.csv("dietData.csv")
head(dietData, n=10)
## weight diet age
## 1 23.83875 Control 11.622260
## 2 25.98799 Control 13.555397
## 3 30.29572 Control 15.357372
## 4 25.88463 Control 7.950214
## 5 18.48077 Control 5.493861
## 6 31.57542 Control 18.874970
## 7 23.79069 Control 12.811297
## 8 29.79574 Control 17.402436
## 9 21.66387 Control 7.379666
## 10 30.86618 Control 18.611817
Design matrix
X1 <- model.matrix(~age,
data=dietData)
head(X1, n=10)
## (Intercept) age
## 1 1 11.622260
## 2 1 13.555397
## 3 1 15.357372
## 4 1 7.950214
## 5 1 5.493861
## 6 1 18.874970
## 7 1 12.811297
## 8 1 17.402436
## 9 1 7.379666
## 10 1 18.611817
How do we multiply this design matrix (X) by the vector of regression coefficients (β)?
Matrix multiplication
E(y) = Xβ

[aw + bx + cy + dz]   [a b c d]   [w]
[ew + fx + gy + hz] = [e f g h] × [x]
[iw + jx + ky + lz]   [i j k l]   [y]
                                  [z]

In this example
• The first matrix corresponds to the expected values of y
• The second matrix corresponds to the design matrix X
• The third matrix (a column vector) corresponds to β
Matrix multiplication
The vector of coefficients
beta <- coef(lm(weight ~ age, dietData))
beta
## (Intercept) age
## 21.325234 0.518067
E(y) = Xβ or yi = β0 + β1xi
Ey1 <- X1 %*% beta
head(Ey1, 5)
## [,1]
## 1 27.34634
## 2 28.34784
## 3 29.28138
## 4 25.44398
## 5 24.17142
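The matrix product X %*% beta should reproduce the fitted values returned by lm(). Since dietData.csv is not included here, the sketch below checks that property on simulated data (the coefficients, noise level, and sample size are invented):

```r
# Check that X %*% beta reproduces fitted(lm(...)) on simulated data
set.seed(1)
age <- runif(100, 5, 20)
weight <- 21 + 0.5 * age + rnorm(100, sd = 2)
d <- data.frame(weight, age)

fit <- lm(weight ~ age, data = d)
X <- model.matrix(~ age, data = d)  # N x 2 design matrix
beta <- coef(fit)                   # (Intercept) and age coefficients

Ey <- X %*% beta                    # E(y) = X beta
stopifnot(isTRUE(all.equal(unname(drop(Ey)), unname(fitted(fit)))))
```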
Summary
Linear models are the foundation of modern statistical modeling techniques.
They can be used to model a wide array of biological processes, and they can be easily extended when their assumptions do not hold.
One of the most important extensions is to cases where the residuals are not normally distributed. Generalized linear models address this issue.
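As a preview of that extension, count data like the jay counts can be modeled with a Poisson GLM via glm(). The data below are simulated for illustration only (the real jayData object is not reproduced here, and the true coefficients are invented):

```r
# Minimal sketch of a Poisson GLM: log(E(y)) = beta0 + beta1 * x
set.seed(42)
elevation <- runif(100, 0, 1500)
counts <- rpois(100, lambda = exp(1 + 0.001 * elevation))
gfit <- glm(counts ~ elevation, family = poisson)
coef(gfit)  # estimates should be near the true values 1 and 0.001
```

The log link keeps the expected count positive, which an ordinary linear model cannot guarantee.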
Slides sem on pls-complete
 
Chap13 additional topics in regression analysis
Chap13 additional topics in regression analysisChap13 additional topics in regression analysis
Chap13 additional topics in regression analysis
 
Item Response Theory in Constructing Measures
Item Response Theory in Constructing MeasuresItem Response Theory in Constructing Measures
Item Response Theory in Constructing Measures
 
Item Response Theory (IRT)
Item Response Theory (IRT)Item Response Theory (IRT)
Item Response Theory (IRT)
 
Irt 1 pl, 2pl, 3pl.pdf
Irt 1 pl, 2pl, 3pl.pdfIrt 1 pl, 2pl, 3pl.pdf
Irt 1 pl, 2pl, 3pl.pdf
 

Similar to Introduction to statistical modeling in R

Regression analysis ppt
Regression analysis pptRegression analysis ppt
Regression analysis pptElkana Rorio
 
Module 3: Linear Regression
Module 3:  Linear RegressionModule 3:  Linear Regression
Module 3: Linear RegressionSara Hooker
 
Exploratory
Exploratory Exploratory
Exploratory toby2036
 
regression.pptx
regression.pptxregression.pptx
regression.pptxaneeshs28
 
Rubic_Print_FormatCourse CodeClass CodeADM-560ADM-560-O500Your Per.docx
Rubic_Print_FormatCourse CodeClass CodeADM-560ADM-560-O500Your Per.docxRubic_Print_FormatCourse CodeClass CodeADM-560ADM-560-O500Your Per.docx
Rubic_Print_FormatCourse CodeClass CodeADM-560ADM-560-O500Your Per.docxdaniely50
 
Discriminant analysis using spss
Discriminant analysis using spssDiscriminant analysis using spss
Discriminant analysis using spssDr Nisha Arora
 
CounterFactual Explanations.pdf
CounterFactual Explanations.pdfCounterFactual Explanations.pdf
CounterFactual Explanations.pdfBong-Ho Lee
 
Cannonical Correlation
Cannonical CorrelationCannonical Correlation
Cannonical Correlationdomsr
 
Cannonical correlation
Cannonical correlationCannonical correlation
Cannonical correlationdomsr
 
Applied statistics lecture_6
Applied statistics lecture_6Applied statistics lecture_6
Applied statistics lecture_6Daria Bogdanova
 
[Xin yan, xiao_gang_su]_linear_regression_analysis(book_fi.org)
[Xin yan, xiao_gang_su]_linear_regression_analysis(book_fi.org)[Xin yan, xiao_gang_su]_linear_regression_analysis(book_fi.org)
[Xin yan, xiao_gang_su]_linear_regression_analysis(book_fi.org)mohamedchaouche
 
Recommender system
Recommender systemRecommender system
Recommender systemBhumi Patel
 
HRUG - Linear regression with R
HRUG - Linear regression with RHRUG - Linear regression with R
HRUG - Linear regression with Regoodwintx
 
cannonicalpresentation-110505114327-phpapp01.pdf
cannonicalpresentation-110505114327-phpapp01.pdfcannonicalpresentation-110505114327-phpapp01.pdf
cannonicalpresentation-110505114327-phpapp01.pdfJermaeDizon2
 
Psy-524 - lecture - 23 - SEM -123456.ppt
Psy-524 - lecture - 23 - SEM -123456.pptPsy-524 - lecture - 23 - SEM -123456.ppt
Psy-524 - lecture - 23 - SEM -123456.pptyummyrecipes6688
 

Similar to Introduction to statistical modeling in R (20)

Regression analysis ppt
Regression analysis pptRegression analysis ppt
Regression analysis ppt
 
Matlab:Regression
Matlab:RegressionMatlab:Regression
Matlab:Regression
 
Matlab: Regression
Matlab: RegressionMatlab: Regression
Matlab: Regression
 
Module 3: Linear Regression
Module 3:  Linear RegressionModule 3:  Linear Regression
Module 3: Linear Regression
 
Topic 1.4
Topic 1.4Topic 1.4
Topic 1.4
 
Exploratory
Exploratory Exploratory
Exploratory
 
regression.pptx
regression.pptxregression.pptx
regression.pptx
 
Rubic_Print_FormatCourse CodeClass CodeADM-560ADM-560-O500Your Per.docx
Rubic_Print_FormatCourse CodeClass CodeADM-560ADM-560-O500Your Per.docxRubic_Print_FormatCourse CodeClass CodeADM-560ADM-560-O500Your Per.docx
Rubic_Print_FormatCourse CodeClass CodeADM-560ADM-560-O500Your Per.docx
 
Discriminant analysis using spss
Discriminant analysis using spssDiscriminant analysis using spss
Discriminant analysis using spss
 
CounterFactual Explanations.pdf
CounterFactual Explanations.pdfCounterFactual Explanations.pdf
CounterFactual Explanations.pdf
 
Cannonical Correlation
Cannonical CorrelationCannonical Correlation
Cannonical Correlation
 
Cannonical correlation
Cannonical correlationCannonical correlation
Cannonical correlation
 
Applied statistics lecture_6
Applied statistics lecture_6Applied statistics lecture_6
Applied statistics lecture_6
 
[Xin yan, xiao_gang_su]_linear_regression_analysis(book_fi.org)
[Xin yan, xiao_gang_su]_linear_regression_analysis(book_fi.org)[Xin yan, xiao_gang_su]_linear_regression_analysis(book_fi.org)
[Xin yan, xiao_gang_su]_linear_regression_analysis(book_fi.org)
 
MUMS: Bayesian, Fiducial, and Frequentist Conference - Including Factors in B...
MUMS: Bayesian, Fiducial, and Frequentist Conference - Including Factors in B...MUMS: Bayesian, Fiducial, and Frequentist Conference - Including Factors in B...
MUMS: Bayesian, Fiducial, and Frequentist Conference - Including Factors in B...
 
Recommender system
Recommender systemRecommender system
Recommender system
 
HRUG - Linear regression with R
HRUG - Linear regression with RHRUG - Linear regression with R
HRUG - Linear regression with R
 
cannonicalpresentation-110505114327-phpapp01.pdf
cannonicalpresentation-110505114327-phpapp01.pdfcannonicalpresentation-110505114327-phpapp01.pdf
cannonicalpresentation-110505114327-phpapp01.pdf
 
Psy-524 - lecture - 23 - SEM -123456.ppt
Psy-524 - lecture - 23 - SEM -123456.pptPsy-524 - lecture - 23 - SEM -123456.ppt
Psy-524 - lecture - 23 - SEM -123456.ppt
 
Gbs1
Gbs1Gbs1
Gbs1
 

More from richardchandler

Model Selection and Multi-model Inference
Model Selection and Multi-model InferenceModel Selection and Multi-model Inference
Model Selection and Multi-model Inferencerichardchandler
 
Introduction to Generalized Linear Models
Introduction to Generalized Linear ModelsIntroduction to Generalized Linear Models
Introduction to Generalized Linear Modelsrichardchandler
 
Repeated measures analysis in R
Repeated measures analysis in RRepeated measures analysis in R
Repeated measures analysis in Rrichardchandler
 
Lab on contrasts, estimation, and power
Lab on contrasts, estimation, and powerLab on contrasts, estimation, and power
Lab on contrasts, estimation, and powerrichardchandler
 
t-tests in R - Lab slides for UGA course FANR 6750
t-tests in R - Lab slides for UGA course FANR 6750t-tests in R - Lab slides for UGA course FANR 6750
t-tests in R - Lab slides for UGA course FANR 6750richardchandler
 
Introduction to R - Lab slides for UGA course FANR 6750
Introduction to R - Lab slides for UGA course FANR 6750Introduction to R - Lab slides for UGA course FANR 6750
Introduction to R - Lab slides for UGA course FANR 6750richardchandler
 
Hierarchichal species distributions model and Maxent
Hierarchichal species distributions model and MaxentHierarchichal species distributions model and Maxent
Hierarchichal species distributions model and Maxentrichardchandler
 
The role of spatial models in applied ecological research
The role of spatial models in applied ecological researchThe role of spatial models in applied ecological research
The role of spatial models in applied ecological researchrichardchandler
 

More from richardchandler (17)

Model Selection and Multi-model Inference
Model Selection and Multi-model InferenceModel Selection and Multi-model Inference
Model Selection and Multi-model Inference
 
Introduction to Generalized Linear Models
Introduction to Generalized Linear ModelsIntroduction to Generalized Linear Models
Introduction to Generalized Linear Models
 
ANCOVA in R
ANCOVA in RANCOVA in R
ANCOVA in R
 
Repeated measures analysis in R
Repeated measures analysis in RRepeated measures analysis in R
Repeated measures analysis in R
 
Split-plot Designs
Split-plot DesignsSplit-plot Designs
Split-plot Designs
 
Nested Designs
Nested DesignsNested Designs
Nested Designs
 
Factorial designs
Factorial designsFactorial designs
Factorial designs
 
Blocking lab
Blocking labBlocking lab
Blocking lab
 
Assumptions of ANOVA
Assumptions of ANOVAAssumptions of ANOVA
Assumptions of ANOVA
 
Lab on contrasts, estimation, and power
Lab on contrasts, estimation, and powerLab on contrasts, estimation, and power
Lab on contrasts, estimation, and power
 
One-way ANOVA
One-way ANOVAOne-way ANOVA
One-way ANOVA
 
t-tests in R - Lab slides for UGA course FANR 6750
t-tests in R - Lab slides for UGA course FANR 6750t-tests in R - Lab slides for UGA course FANR 6750
t-tests in R - Lab slides for UGA course FANR 6750
 
Introduction to R - Lab slides for UGA course FANR 6750
Introduction to R - Lab slides for UGA course FANR 6750Introduction to R - Lab slides for UGA course FANR 6750
Introduction to R - Lab slides for UGA course FANR 6750
 
Hierarchichal species distributions model and Maxent
Hierarchichal species distributions model and MaxentHierarchichal species distributions model and Maxent
Hierarchichal species distributions model and Maxent
 
Slides from ESA 2015
Slides from ESA 2015Slides from ESA 2015
Slides from ESA 2015
 
The role of spatial models in applied ecological research
The role of spatial models in applied ecological researchThe role of spatial models in applied ecological research
The role of spatial models in applied ecological research
 
2014 ISEC slides
2014 ISEC slides2014 ISEC slides
2014 ISEC slides
 

Recently uploaded

Behavioral Disorder: Schizophrenia & it's Case Study.pdf
Behavioral Disorder: Schizophrenia & it's Case Study.pdfBehavioral Disorder: Schizophrenia & it's Case Study.pdf
Behavioral Disorder: Schizophrenia & it's Case Study.pdfSELF-EXPLANATORY
 
Isotopic evidence of long-lived volcanism on Io
Isotopic evidence of long-lived volcanism on IoIsotopic evidence of long-lived volcanism on Io
Isotopic evidence of long-lived volcanism on IoSérgio Sacani
 
Recombination DNA Technology (Microinjection)
Recombination DNA Technology (Microinjection)Recombination DNA Technology (Microinjection)
Recombination DNA Technology (Microinjection)Jshifa
 
Call Girls in Munirka Delhi 💯Call Us 🔝8264348440🔝
Call Girls in Munirka Delhi 💯Call Us 🔝8264348440🔝Call Girls in Munirka Delhi 💯Call Us 🔝8264348440🔝
Call Girls in Munirka Delhi 💯Call Us 🔝8264348440🔝soniya singh
 
Neurodevelopmental disorders according to the dsm 5 tr
Neurodevelopmental disorders according to the dsm 5 trNeurodevelopmental disorders according to the dsm 5 tr
Neurodevelopmental disorders according to the dsm 5 trssuser06f238
 
Boyles law module in the grade 10 science
Boyles law module in the grade 10 scienceBoyles law module in the grade 10 science
Boyles law module in the grade 10 sciencefloriejanemacaya1
 
Grafana in space: Monitoring Japan's SLIM moon lander in real time
Grafana in space: Monitoring Japan's SLIM moon lander  in real timeGrafana in space: Monitoring Japan's SLIM moon lander  in real time
Grafana in space: Monitoring Japan's SLIM moon lander in real timeSatoshi NAKAHIRA
 
CALL ON ➥8923113531 🔝Call Girls Kesar Bagh Lucknow best Night Fun service 🪡
CALL ON ➥8923113531 🔝Call Girls Kesar Bagh Lucknow best Night Fun service  🪡CALL ON ➥8923113531 🔝Call Girls Kesar Bagh Lucknow best Night Fun service  🪡
CALL ON ➥8923113531 🔝Call Girls Kesar Bagh Lucknow best Night Fun service 🪡anilsa9823
 
Discovery of an Accretion Streamer and a Slow Wide-angle Outflow around FUOri...
Discovery of an Accretion Streamer and a Slow Wide-angle Outflow around FUOri...Discovery of an Accretion Streamer and a Slow Wide-angle Outflow around FUOri...
Discovery of an Accretion Streamer and a Slow Wide-angle Outflow around FUOri...Sérgio Sacani
 
Genomic DNA And Complementary DNA Libraries construction.
Genomic DNA And Complementary DNA Libraries construction.Genomic DNA And Complementary DNA Libraries construction.
Genomic DNA And Complementary DNA Libraries construction.k64182334
 
Module 4: Mendelian Genetics and Punnett Square
Module 4:  Mendelian Genetics and Punnett SquareModule 4:  Mendelian Genetics and Punnett Square
Module 4: Mendelian Genetics and Punnett SquareIsiahStephanRadaza
 
Call Girls in Mayapuri Delhi 💯Call Us 🔝9953322196🔝 💯Escort.
Call Girls in Mayapuri Delhi 💯Call Us 🔝9953322196🔝 💯Escort.Call Girls in Mayapuri Delhi 💯Call Us 🔝9953322196🔝 💯Escort.
Call Girls in Mayapuri Delhi 💯Call Us 🔝9953322196🔝 💯Escort.aasikanpl
 
Hubble Asteroid Hunter III. Physical properties of newly found asteroids
Hubble Asteroid Hunter III. Physical properties of newly found asteroidsHubble Asteroid Hunter III. Physical properties of newly found asteroids
Hubble Asteroid Hunter III. Physical properties of newly found asteroidsSérgio Sacani
 
Analytical Profile of Coleus Forskohlii | Forskolin .pdf
Analytical Profile of Coleus Forskohlii | Forskolin .pdfAnalytical Profile of Coleus Forskohlii | Forskolin .pdf
Analytical Profile of Coleus Forskohlii | Forskolin .pdfSwapnil Therkar
 
Luciferase in rDNA technology (biotechnology).pptx
Luciferase in rDNA technology (biotechnology).pptxLuciferase in rDNA technology (biotechnology).pptx
Luciferase in rDNA technology (biotechnology).pptxAleenaTreesaSaji
 
Dashanga agada a formulation of Agada tantra dealt in 3 Rd year bams agada tanta
Dashanga agada a formulation of Agada tantra dealt in 3 Rd year bams agada tantaDashanga agada a formulation of Agada tantra dealt in 3 Rd year bams agada tanta
Dashanga agada a formulation of Agada tantra dealt in 3 Rd year bams agada tantaPraksha3
 
Physiochemical properties of nanomaterials and its nanotoxicity.pptx
Physiochemical properties of nanomaterials and its nanotoxicity.pptxPhysiochemical properties of nanomaterials and its nanotoxicity.pptx
Physiochemical properties of nanomaterials and its nanotoxicity.pptxAArockiyaNisha
 
zoogeography of pakistan.pptx fauna of Pakistan
zoogeography of pakistan.pptx fauna of Pakistanzoogeography of pakistan.pptx fauna of Pakistan
zoogeography of pakistan.pptx fauna of Pakistanzohaibmir069
 

Recently uploaded (20)

Behavioral Disorder: Schizophrenia & it's Case Study.pdf
Behavioral Disorder: Schizophrenia & it's Case Study.pdfBehavioral Disorder: Schizophrenia & it's Case Study.pdf
Behavioral Disorder: Schizophrenia & it's Case Study.pdf
 
Isotopic evidence of long-lived volcanism on Io
Isotopic evidence of long-lived volcanism on IoIsotopic evidence of long-lived volcanism on Io
Isotopic evidence of long-lived volcanism on Io
 
Recombination DNA Technology (Microinjection)
Recombination DNA Technology (Microinjection)Recombination DNA Technology (Microinjection)
Recombination DNA Technology (Microinjection)
 
Call Girls in Munirka Delhi 💯Call Us 🔝8264348440🔝
Call Girls in Munirka Delhi 💯Call Us 🔝8264348440🔝Call Girls in Munirka Delhi 💯Call Us 🔝8264348440🔝
Call Girls in Munirka Delhi 💯Call Us 🔝8264348440🔝
 
Neurodevelopmental disorders according to the dsm 5 tr
Neurodevelopmental disorders according to the dsm 5 trNeurodevelopmental disorders according to the dsm 5 tr
Neurodevelopmental disorders according to the dsm 5 tr
 
Boyles law module in the grade 10 science
Boyles law module in the grade 10 scienceBoyles law module in the grade 10 science
Boyles law module in the grade 10 science
 
Grafana in space: Monitoring Japan's SLIM moon lander in real time
Grafana in space: Monitoring Japan's SLIM moon lander  in real timeGrafana in space: Monitoring Japan's SLIM moon lander  in real time
Grafana in space: Monitoring Japan's SLIM moon lander in real time
 
CALL ON ➥8923113531 🔝Call Girls Kesar Bagh Lucknow best Night Fun service 🪡
CALL ON ➥8923113531 🔝Call Girls Kesar Bagh Lucknow best Night Fun service  🪡CALL ON ➥8923113531 🔝Call Girls Kesar Bagh Lucknow best Night Fun service  🪡
CALL ON ➥8923113531 🔝Call Girls Kesar Bagh Lucknow best Night Fun service 🪡
 
Discovery of an Accretion Streamer and a Slow Wide-angle Outflow around FUOri...
Discovery of an Accretion Streamer and a Slow Wide-angle Outflow around FUOri...Discovery of an Accretion Streamer and a Slow Wide-angle Outflow around FUOri...
Discovery of an Accretion Streamer and a Slow Wide-angle Outflow around FUOri...
 
Genomic DNA And Complementary DNA Libraries construction.
Genomic DNA And Complementary DNA Libraries construction.Genomic DNA And Complementary DNA Libraries construction.
Genomic DNA And Complementary DNA Libraries construction.
 
Module 4: Mendelian Genetics and Punnett Square
Module 4:  Mendelian Genetics and Punnett SquareModule 4:  Mendelian Genetics and Punnett Square
Module 4: Mendelian Genetics and Punnett Square
 
The Philosophy of Science
The Philosophy of ScienceThe Philosophy of Science
The Philosophy of Science
 
Call Girls in Mayapuri Delhi 💯Call Us 🔝9953322196🔝 💯Escort.
Call Girls in Mayapuri Delhi 💯Call Us 🔝9953322196🔝 💯Escort.Call Girls in Mayapuri Delhi 💯Call Us 🔝9953322196🔝 💯Escort.
Call Girls in Mayapuri Delhi 💯Call Us 🔝9953322196🔝 💯Escort.
 
Hubble Asteroid Hunter III. Physical properties of newly found asteroids
Hubble Asteroid Hunter III. Physical properties of newly found asteroidsHubble Asteroid Hunter III. Physical properties of newly found asteroids
Hubble Asteroid Hunter III. Physical properties of newly found asteroids
 
Analytical Profile of Coleus Forskohlii | Forskolin .pdf
Analytical Profile of Coleus Forskohlii | Forskolin .pdfAnalytical Profile of Coleus Forskohlii | Forskolin .pdf
Analytical Profile of Coleus Forskohlii | Forskolin .pdf
 
Engler and Prantl system of classification in plant taxonomy
Engler and Prantl system of classification in plant taxonomyEngler and Prantl system of classification in plant taxonomy
Engler and Prantl system of classification in plant taxonomy
 
Luciferase in rDNA technology (biotechnology).pptx
Luciferase in rDNA technology (biotechnology).pptxLuciferase in rDNA technology (biotechnology).pptx
Luciferase in rDNA technology (biotechnology).pptx
 
Dashanga agada a formulation of Agada tantra dealt in 3 Rd year bams agada tanta
Dashanga agada a formulation of Agada tantra dealt in 3 Rd year bams agada tantaDashanga agada a formulation of Agada tantra dealt in 3 Rd year bams agada tanta
Dashanga agada a formulation of Agada tantra dealt in 3 Rd year bams agada tanta
 
Physiochemical properties of nanomaterials and its nanotoxicity.pptx
Physiochemical properties of nanomaterials and its nanotoxicity.pptxPhysiochemical properties of nanomaterials and its nanotoxicity.pptx
Physiochemical properties of nanomaterials and its nanotoxicity.pptx
 
zoogeography of pakistan.pptx fauna of Pakistan
zoogeography of pakistan.pptx fauna of Pakistanzoogeography of pakistan.pptx fauna of Pakistan
zoogeography of pakistan.pptx fauna of Pakistan
 

Introduction to statistical modeling in R

  • 1. Introduction to Statistical Modeling November 2 & 5, 2018 FANR 6750 Richard Chandler and Bob Cooper
  • 2. Looking ahead Linear models Generalized linear models Model selection and multi-model inference Motivation Linear models Example Matrix notation 2 / 51
  • 3. Outline 1 Motivation 2 Linear models 3 Example 4 Matrix notation
  • 8. Motivation Why do we need this part of the course? • We have been modeling all along • Good experimental design + ANOVA is usually the most direct route to causal inference • Often, however, it isn’t possible (or even desirable) to control some aspects of the system being investigated • When manipulative experiments aren’t possible, observational studies and predictive models can be the next best option
  • 11. What is a model? Definition A model is an abstraction of reality used to describe the relationship between two or more variables Types of models • Conceptual • Mathematical • Statistical Important point “All models are wrong but some are useful” (George Box, 1976)
  • 15. Statistical models What are they useful for? • Formalizing hypotheses using math and probability • Evaluating hypotheses by confronting models with data • Predicting future outcomes
  • 17. Statistical models Two important pieces (1) Deterministic component Equation for the expected value of the response variable (2) Stochastic component Probability distribution describing the differences between the expected values and the observed values In parametric statistics, we assume we know the distribution, but not the parameters of the distribution
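The two pieces described on this slide can be illustrated with a short simulation in R. This is a sketch with invented values (`beta0`, `beta1`, `sigma` are made up for illustration), not code from the deck:

```r
# Deterministic component: an equation for the expected value of y
set.seed(123)
n <- 100
x <- runif(n, 0, 10)
beta0 <- 20
beta1 <- 0.5
mu <- beta0 + beta1 * x            # E(y), the deterministic part

# Stochastic component: a probability distribution describing the
# differences between the expected values and the observed values
sigma <- 2
y <- rnorm(n, mean = mu, sd = sigma)

# In parametric statistics we assume the distribution (Normal here)
# but estimate its parameters from the data
fit <- lm(y ~ x)
coef(fit)
```

Refitting with `lm()` recovers estimates close to the values used in the simulation, which is a useful sanity check on any model you plan to use.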
  • 18. Outline 1 Motivation 2 Linear models 3 Example 4 Matrix notation
  • 19. Is this a linear model? y = 20 + 0.5x [plot of y against x; x axis 0–10, y axis 20–25]
  • 20. Is this a linear model? y = 20 + 0.5x − 0.3x² [plot of y against x; x axis 0–10, y axis −5–20]
  • 23. Linear model A linear model is an equation of the form: yi = β0 + β1xi1 + β2xi2 + . . . + βpxip + εi where the β’s are coefficients, and the x values are predictor variables (or dummy variables for categorical predictors). This equation is often expressed in matrix notation as: y = Xβ + ε where X is a design matrix and β is a vector of coefficients. More on matrix notation later. . .
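The matrix form y = Xβ + ε can be seen directly in R, where `model.matrix()` builds the design matrix X. A minimal sketch (the data frame `d` and coefficient values are invented for illustration):

```r
# Small data frame with one continuous and one categorical predictor
d <- data.frame(x1 = c(1.2, 3.4, 5.6, 7.8),
                habitat = factor(c("Oak", "Pine", "Oak", "Pine")))

# Design matrix X: the first column is the intercept, and
# 'habitatPine' is a dummy variable (1 if Pine, 0 if Oak)
X <- model.matrix(~ x1 + habitat, data = d)
X

# Given a coefficient vector beta, the expected values are X %*% beta
beta <- c(2, 0.5, -1)   # intercept, slope for x1, Pine effect
X %*% beta
```

Writing the model this way makes clear that ANOVA, regression, and ANCOVA are all the same linear model with different design matrices.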
  • 26. Interpreting the β’s You must be able to interpret the β coefficients for any model that you fit to your data. A linear model might have dozens of continuous and categorical predictor variables, with dozens of associated β coefficients. Linear models can also include polynomial terms and interactions between continuous and categorical predictors
  • 28. Interpreting the β’s The intercept β0 is the expected value of y, when all x’s are 0 If x is a continuous explanatory variable: • β can usually be interpreted as a slope parameter. • In this case, β is the change in y resulting from a 1 unit change in x (while holding the other predictors constant).
  • 32. Interpreting β’s for categorical explanatory variables Things are more complicated for categorical explanatory variables (i.e., factors) because they must be converted to dummy variables There are many ways of creating dummy variables In R, the default method for creating dummy variables from unordered factors works like this: • One level of the factor is treated as a reference level • The reference level is associated with the intercept • The β coefficients for the other levels of the factor are differences from the reference level. The default method corresponds to: options(contrasts=c("contr.treatment","contr.poly"))
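The default treatment-contrast coding can be inspected directly in R. A small sketch (the factor `seeds` and its levels are invented for illustration):

```r
# A factor with three levels; levels are sorted alphabetically,
# so "High" becomes the reference level under treatment contrasts
seeds <- factor(c("Low", "Med", "High"))
levels(seeds)            # reference level is listed first
contrasts(seeds)         # default: contr.treatment dummy coding

# In the model matrix, the intercept column represents the reference
# level; the other columns code differences from it
model.matrix(~ seeds)
```

To choose the reference level yourself rather than accept alphabetical ordering, `relevel()` can be used before fitting.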
  • 35. Interpreting β’s for categorical explanatory variables Another common method for creating dummy variables results in βs that can be interpreted as the α’s from the additive models that we saw earlier in the class. With this method: • The β associated with each level of the factor is the difference from the intercept • The intercept can be interpreted as the grand mean if the continuous variables have been centered • One of the levels of the factor will not be displayed because it is redundant when the intercept is estimated This method corresponds to: options(contrasts=c("contr.sum","contr.poly"))
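Switching to the sum-to-zero coding described here changes what each β means. A quick sketch (the factor `habitat` levels are invented; the contrasts option is restored afterwards so later code is unaffected):

```r
# Sum-to-zero ("effects") coding: each beta is a deviation from the
# intercept, and the omitted level's effect is minus the sum of the rest
old <- options(contrasts = c("contr.sum", "contr.poly"))
habitat <- factor(c("Oak", "Pine", "Grass"))
contrasts(habitat)

# Each contrast column sums to zero across the factor levels
colSums(contrasts(habitat))

options(old)   # restore the default treatment contrasts
```

Both codings fit the same model and give identical predictions; they differ only in how the coefficients are parameterized.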
  • 36. Outline 1 Motivation 2 Linear models 3 Example 4 Matrix notation
  • 38. Example [map of Santa Cruz Island showing Christy Beach, Main Ranch, UC Field Station, Scorpion Anchorage, Prisoners’ Harbor, census locations, and elevation in meters, with an inset locating the island relative to Los Angeles, Ventura, and Santa Barbara]
  • 39. Santa Cruz Data Habitat data for all 2787 grid cells covering the island head(cruz2) ## x y elevation forest chaparral habitat seeds ## 1 230736.7 3774324 241 0 0 Oak Low ## 2 231036.7 3774324 323 0 0 Pine Med ## 3 231336.7 3774324 277 0 0 Pine High ## 4 230436.7 3774024 13 0 0 Oak Med ## 5 230736.7 3774024 590 0 0 Oak High ## 6 231036.7 3774024 533 0 0 Oak Low
  • 40. Maps of predictor variables [map of elevation (500–2000 m) plotted over easting and northing]
  • 41. Maps of predictor variables [map of forest cover, 0.0–1.0]
  • 42. Questions (1) How many jays are on the island? (2) What environmental variables influence abundance? (3) Can we predict consequences of environmental change?
  • 43. Maps of predictor variables [map of chaparral cover (0.0–1.0) with survey plot locations]
  • 44. The (fake) jay data head(jayData) ## x y elevation forest chaparral habitat seeds jays ## 2345 258636.7 3764124 423 0.00 0.02 Oak Med 34 ## 740 261936.7 3769224 506 0.10 0.45 Oak Med 38 ## 2304 246336.7 3764124 859 0.00 0.26 Oak High 40 ## 2433 239436.7 3763524 1508 0.02 0.03 Pine Med 43 ## 1104 239436.7 3767724 483 0.26 0.37 Oak Med 36 ## 607 236436.7 3769524 830 0.00 0.01 Oak Low 39 Motivation Linear models Example Matrix notation 24 / 51
  • 45. Simple linear regression fm1 <- lm(jays ~ elevation, data=jayData) summary(fm1) ## ## Call: ## lm(formula = jays ~ elevation, data = jayData) ## ## Residuals: ## Min 1Q Median 3Q Max ## -5.4874 -1.7539 0.1566 1.6159 4.6155 ## ## Coefficients: ## Estimate Std. Error t value Pr(>|t|) ## (Intercept) 33.082808 0.453997 72.87 <2e-16 *** ## elevation 0.008337 0.000595 14.01 <2e-16 *** ## --- ## Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1 ## ## Residual standard error: 2.285 on 98 degrees of freedom ## Multiple R-squared: 0.667, Adjusted R-squared: 0.6636 ## F-statistic: 196.3 on 1 and 98 DF, p-value: < 2.2e-16 Motivation Linear models Example Matrix notation 25 / 51
  • 46. Simple linear regression [Scatterplot of jays vs. elevation (m) with fitted regression line] Motivation Linear models Example Matrix notation 26 / 51
  • 47. Multiple linear regression fm2 <- lm(jays ~ elevation+forest, data=jayData) summary(fm2) ## ## Call: ## lm(formula = jays ~ elevation + forest, data = jayData) ## ## Residuals: ## Min 1Q Median 3Q Max ## -5.4717 -1.7384 0.1552 1.5993 4.6319 ## ## Coefficients: ## Estimate Std. Error t value Pr(>|t|) ## (Intercept) 33.065994 0.467624 70.711 <2e-16 *** ## elevation 0.008337 0.000598 13.943 <2e-16 *** ## forest 0.294350 1.793079 0.164 0.87 ## --- ## Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1 ## ## Residual standard error: 2.296 on 97 degrees of freedom ## Multiple R-squared: 0.6671, Adjusted R-squared: 0.6603 ## F-statistic: 97.21 on 2 and 97 DF, p-value: < 2.2e-16 Motivation Linear models Example Matrix notation 27 / 51
  • 48. Multiple linear regression [3-D surface of the expected number of jays (30–55) as a function of elevation (0–2000 m) and forest cover (0.0–1.0)] Motivation Linear models Example Matrix notation 28 / 51
  • 49. One-way ANOVA fm3 <- lm(jays ~ habitat, data=jayData) summary(fm3) ## ## Call: ## lm(formula = jays ~ habitat, data = jayData) ## ## Residuals: ## Min 1Q Median 3Q Max ## -7.9143 -2.3684 -0.3684 3.0857 8.6316 ## ## Coefficients: ## Estimate Std. Error t value Pr(>|t|) ## (Intercept) 35.875 1.356 26.456 <2e-16 *** ## habitatOak 3.493 1.448 2.413 0.0177 * ## habitatPine 2.039 1.503 1.357 0.1780 ## --- ## Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1 ## ## Residual standard error: 3.835 on 97 degrees of freedom ## Multiple R-squared: 0.07126, Adjusted R-squared: 0.05211 ## F-statistic: 3.721 on 2 and 97 DF, p-value: 0.02773 Motivation Linear models Example Matrix notation 29 / 51
  • 50. One-way ANOVA [Barplot of the expected number of jays (0–40) by habitat: Bare, Oak, Pine] Motivation Linear models Example Matrix notation 30 / 51
  • 51. ANCOVA fm4 <- lm(jays ~ elevation+habitat, data=jayData) summary(fm4) ## ## Call: ## lm(formula = jays ~ elevation + habitat, data = jayData) ## ## Residuals: ## Min 1Q Median 3Q Max ## -5.0327 -1.5356 0.0091 1.4686 4.2391 ## ## Coefficients: ## Estimate Std. Error t value Pr(>|t|) ## (Intercept) 3.072e+01 8.084e-01 37.997 < 2e-16 *** ## elevation 8.289e-03 5.414e-04 15.308 < 2e-16 *** ## habitatOak 3.166e+00 7.850e-01 4.034 0.00011 *** ## habitatPine 1.695e+00 8.148e-01 2.081 0.04010 * ## --- ## Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1 ## ## Residual standard error: 2.078 on 96 degrees of freedom ## Multiple R-squared: 0.7301, Adjusted R-squared: 0.7217 ## F-statistic: 86.56 on 3 and 96 DF, p-value: < 2.2e-16 Motivation Linear models Example Matrix notation 31 / 51
  • 53. Continuous-categorical interaction fm5 <- lm(jays ~ elevation*habitat, data=jayData) summary(fm5) ## ## Call: ## lm(formula = jays ~ elevation * habitat, data = jayData) ## ## Residuals: ## Min 1Q Median 3Q Max ## -5.008 -1.581 -0.103 1.420 4.184 ## ## Coefficients: ## Estimate Std. Error t value Pr(>|t|) ## (Intercept) 31.654383 1.446322 21.886 < 2e-16 *** ## elevation 0.006781 0.001999 3.393 0.00101 ** ## habitatOak 2.428682 1.565227 1.552 0.12411 ## habitatPine 0.399953 1.579874 0.253 0.80070 ## elevation:habitatOak 0.001204 0.002153 0.559 0.57737 ## elevation:habitatPine 0.002046 0.002151 0.951 0.34414 ## --- ## Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1 ## ## Residual standard error: 2.087 on 94 degrees of freedom ## Multiple R-squared: 0.7334, Adjusted R-squared: 0.7192 ## F-statistic: 51.72 on 5 and 94 DF, p-value: < 2.2e-16 Motivation Linear models Example Matrix notation 33 / 51
  • 55. Quadratic effect of elevation fm6 <- lm(jays ~ elevation+I(elevation^2), data=jayData) summary(fm6) ## ## Call: ## lm(formula = jays ~ elevation + I(elevation^2), data = jayData) ## ## Residuals: ## Min 1Q Median 3Q Max ## -4.8429 -1.4608 0.1304 1.5908 4.7854 ## ## Coefficients: ## Estimate Std. Error t value Pr(>|t|) ## (Intercept) 3.162e+01 7.631e-01 41.434 < 2e-16 *** ## elevation 1.368e-02 2.342e-03 5.843 6.86e-08 *** ## I(elevation^2) -3.542e-06 1.503e-06 -2.357 0.0204 * ## --- ## Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1 ## ## Residual standard error: 2.233 on 97 degrees of freedom ## Multiple R-squared: 0.6851, Adjusted R-squared: 0.6786 ## F-statistic: 105.5 on 2 and 97 DF, p-value: < 2.2e-16 Motivation Linear models Example Matrix notation 35 / 51
  • 56. Quadratic effect of elevation [Scatterplot of jays vs. elevation (m) with fitted quadratic curve] Motivation Linear models Example Matrix notation 36 / 51
  • 57. Interaction and quadratic effects fm7 <- lm(jays ~ habitat * forest + elevation + I(elevation^2), data=jayData) summary(fm7) ## ## Call: ## lm(formula = jays ~ habitat * forest + elevation + I(elevation^2), ## data = jayData) ## ## Residuals: ## Min 1Q Median 3Q Max ## -5.2574 -1.4400 0.0487 1.4055 3.7924 ## ## Coefficients: ## Estimate Std. Error t value Pr(>|t|) ## (Intercept) 2.920e+01 1.030e+00 28.338 < 2e-16 *** ## habitatOak 3.705e+00 8.433e-01 4.394 2.98e-05 *** ## habitatPine 2.216e+00 8.757e-01 2.531 0.0131 * ## forest 4.007e+01 2.780e+01 1.441 0.1529 ## elevation 1.215e-02 2.300e-03 5.285 8.41e-07 *** ## I(elevation^2) -2.554e-06 1.484e-06 -1.721 0.0886 . ## habitatOak:forest -4.292e+01 2.785e+01 -1.541 0.1267 ## habitatPine:forest -3.918e+01 2.784e+01 -1.407 0.1627 ## --- ## Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1 ## ## Residual standard error: 2.044 on 92 degrees of freedom ## Multiple R-squared: 0.7497, Adjusted R-squared: 0.7307 ## F-statistic: 39.37 on 7 and 92 DF, p-value: < 2.2e-16 Motivation Linear models Example Matrix notation 37 / 51
  • 58. Predict jay abundance at each grid cell E7 <- predict(fm7, type="response", newdata=cruz2, interval="confidence") Motivation Linear models Example Matrix notation 38 / 51
  • 59. Predict jay abundance at each grid cell E7 <- predict(fm7, type="response", newdata=cruz2, interval="confidence") E7 <- cbind(cruz2[,c("x","y")], E7) head(E7) ## x y fit lwr upr ## 1 230736.7 3774324 35.68349 34.86313 36.50386 ## 2 231036.7 3774324 35.07284 34.22917 35.91652 ## 3 231336.7 3774324 34.58427 33.72668 35.44186 ## 4 230436.7 3774024 33.06042 31.55907 34.56177 ## 5 230736.7 3774024 39.18440 38.49766 39.87113 ## 6 231036.7 3774024 38.65512 37.98859 39.32165 Motivation Linear models Example Matrix notation 38 / 51
  • 60. Map the predictions [Map of the expected number of jays per grid cell, 25–55] Motivation Linear models Example Matrix notation 39 / 51
  • 61. Map the predictions [Map of the lower confidence limit, 25–55] Motivation Linear models Example Matrix notation 40 / 51
  • 62. Map the predictions [Map of the upper confidence limit, 25–55] Motivation Linear models Example Matrix notation 41 / 51
  • 63. Future scenarios What if pine and oak disappear? [Map of the expected number of jays per grid cell, 25–55] Motivation Linear models Example Matrix notation 42 / 51
  • 64. Future scenarios What if pine and oak disappear? [Map of expected values, 25–55] Motivation Linear models Example Matrix notation 42 / 51
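A minimal sketch of how such a scenario could be constructed, assuming the fm7 model and cruz2 data from the earlier slides are loaded (treating "Bare" as a level of habitat, as in the one-way ANOVA plot, is an assumption):

```r
# Sketch: remove pine and oak everywhere, then re-predict with fm7
cruz3 <- cruz2
cruz3$forest <- 0                               # no forest cover anywhere
cruz3$habitat <- factor("Bare",                 # assumes "Bare" is a habitat level
                        levels = levels(cruz2$habitat))
E7bare <- predict(fm7, newdata = cruz3, interval = "confidence")
```

The sea-level scenario works the same way: modify the predictor columns of the prediction grid to reflect the hypothetical landscape, then call predict on the fitted model with the altered newdata.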
  • 65. Future scenarios What if sea level rises? Motivation Linear models Example Matrix notation 43 / 51
  • 66. Future scenarios What if sea level rises? [Map of expected values, 25–55] Motivation Linear models Example Matrix notation 43 / 51
  • 67. Outline 1 Motivation 2 Linear models 3 Example 4 Matrix notation Motivation Linear models Example Matrix notation 44 / 51
  • 68. Linear model All of the fixed effects models that we have covered can be expressed this way: y = Xβ + ε where ε ∼ Normal(0, σ2 ) Motivation Linear models Example Matrix notation 45 / 51
  • 69. Linear model All of the fixed effects models that we have covered can be expressed this way: y = Xβ + ε where ε ∼ Normal(0, σ2 ) Examples include • Completely randomized ANOVA • Randomized complete block designs with fixed block effects • Factorial designs • ANCOVA Motivation Linear models Example Matrix notation 45 / 51
  • 70. Then how do they differ? • The design matrices are different • And so are the number of parameters (coefficients) to be estimated • It is important to understand how to construct a design matrix that includes categorical variables Motivation Linear models Example Matrix notation 46 / 51
  • 71. Design matrix A design matrix has N rows and K columns, where N is the total sample size and K is the number of coefficients (parameters) to be estimated. Motivation Linear models Example Matrix notation 47 / 51
  • 72. Design matrix A design matrix has N rows and K columns, where N is the total sample size and K is the number of coefficients (parameters) to be estimated. The first column contains just 1’s. This column corresponds to the intercept (β0) Motivation Linear models Example Matrix notation 47 / 51
  • 73. Design matrix A design matrix has N rows and K columns, where N is the total sample size and K is the number of coefficients (parameters) to be estimated. The first column contains just 1’s. This column corresponds to the intercept (β0) Continuous predictor variables appear unchanged in the design matrix Motivation Linear models Example Matrix notation 47 / 51
  • 74. Design matrix A design matrix has N rows and K columns, where N is the total sample size and K is the number of coefficients (parameters) to be estimated. The first column contains just 1’s. This column corresponds to the intercept (β0) Continuous predictor variables appear unchanged in the design matrix Categorical predictor variables appear as dummy variables Motivation Linear models Example Matrix notation 47 / 51
  • 75. Design matrix A design matrix has N rows and K columns, where N is the total sample size and K is the number of coefficients (parameters) to be estimated. The first column contains just 1’s. This column corresponds to the intercept (β0) Continuous predictor variables appear unchanged in the design matrix Categorical predictor variables appear as dummy variables In R, the design matrix is created internally based on the formula that you provide Motivation Linear models Example Matrix notation 47 / 51
  • 76. Design matrix A design matrix has N rows and K columns, where N is the total sample size and K is the number of coefficients (parameters) to be estimated. The first column contains just 1’s. This column corresponds to the intercept (β0) Continuous predictor variables appear unchanged in the design matrix Categorical predictor variables appear as dummy variables In R, the design matrix is created internally based on the formula that you provide The design matrix can be viewed using the model.matrix function Motivation Linear models Example Matrix notation 47 / 51
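Before the regression example, a quick sketch (assuming the jayData frame from the earlier slides is loaded) of how a categorical predictor expands into dummy variables in the design matrix:

```r
# Each non-reference habitat level gets its own 0/1 column;
# the reference level ("Bare") is absorbed into the intercept
X <- model.matrix(~ elevation + habitat, data = jayData)
head(X)
## Columns: (Intercept), elevation, habitatOak, habitatPine
```

These dummy columns are exactly why the ANCOVA summary for fm4 reports coefficients named habitatOak and habitatPine but none for the reference level.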
  • 77. Design matrix for linear regression Data dietData <- read.csv("dietData.csv") head(dietData, n=10) ## weight diet age ## 1 23.83875 Control 11.622260 ## 2 25.98799 Control 13.555397 ## 3 30.29572 Control 15.357372 ## 4 25.88463 Control 7.950214 ## 5 18.48077 Control 5.493861 ## 6 31.57542 Control 18.874970 ## 7 23.79069 Control 12.811297 ## 8 29.79574 Control 17.402436 ## 9 21.66387 Control 7.379666 ## 10 30.86618 Control 18.611817 Motivation Linear models Example Matrix notation 48 / 51
  • 78. Design matrix for linear regression Data dietData <- read.csv("dietData.csv") head(dietData, n=10) ## weight diet age ## 1 23.83875 Control 11.622260 ## 2 25.98799 Control 13.555397 ## 3 30.29572 Control 15.357372 ## 4 25.88463 Control 7.950214 ## 5 18.48077 Control 5.493861 ## 6 31.57542 Control 18.874970 ## 7 23.79069 Control 12.811297 ## 8 29.79574 Control 17.402436 ## 9 21.66387 Control 7.379666 ## 10 30.86618 Control 18.611817 Design matrix X1 <- model.matrix(~age, data=dietData) head(X1, n=10) ## (Intercept) age ## 1 1 11.622260 ## 2 1 13.555397 ## 3 1 15.357372 ## 4 1 7.950214 ## 5 1 5.493861 ## 6 1 18.874970 ## 7 1 12.811297 ## 8 1 17.402436 ## 9 1 7.379666 ## 10 1 18.611817 Motivation Linear models Example Matrix notation 48 / 51
  • 79. Design matrix for linear regression Data dietData <- read.csv("dietData.csv") head(dietData, n=10) ## weight diet age ## 1 23.83875 Control 11.622260 ## 2 25.98799 Control 13.555397 ## 3 30.29572 Control 15.357372 ## 4 25.88463 Control 7.950214 ## 5 18.48077 Control 5.493861 ## 6 31.57542 Control 18.874970 ## 7 23.79069 Control 12.811297 ## 8 29.79574 Control 17.402436 ## 9 21.66387 Control 7.379666 ## 10 30.86618 Control 18.611817 Design matrix X1 <- model.matrix(~age, data=dietData) head(X1, n=10) ## (Intercept) age ## 1 1 11.622260 ## 2 1 13.555397 ## 3 1 15.357372 ## 4 1 7.950214 ## 5 1 5.493861 ## 6 1 18.874970 ## 7 1 12.811297 ## 8 1 17.402436 ## 9 1 7.379666 ## 10 1 18.611817 How do we multiply this design matrix (X) by the vector of regression coefficients (β)? Motivation Linear models Example Matrix notation 48 / 51
  • 80. Matrix multiplication E(y) = Xβ Motivation Linear models Example Matrix notation 49 / 51
  • 81. Matrix multiplication E(y) = Xβ = [ a b c d ; e f g h ; i j k l ] × [ w ; x ; y ; z ] Motivation Linear models Example Matrix notation 49 / 51
  • 82. Matrix multiplication E(y) = Xβ: [ aw + bx + cy + dz ; ew + fx + gy + hz ; iw + jx + ky + lz ] = [ a b c d ; e f g h ; i j k l ] × [ w ; x ; y ; z ] Motivation Linear models Example Matrix notation 49 / 51
  • 83. Matrix multiplication E(y) = Xβ: [ aw + bx + cy + dz ; ew + fx + gy + hz ; iw + jx + ky + lz ] = [ a b c d ; e f g h ; i j k l ] × [ w ; x ; y ; z ] In this example • The first matrix corresponds to the expected values of y • The second matrix corresponds to the design matrix X • The third matrix (a column vector) corresponds to β Motivation Linear models Example Matrix notation 49 / 51
  • 84. Matrix multiplication The vector of coefficients beta <- coef(lm(weight ~ age, dietData)) beta ## (Intercept) age ## 21.325234 0.518067 Motivation Linear models Example Matrix notation 50 / 51
  • 85. Matrix multiplication The vector of coefficients beta <- coef(lm(weight ~ age, dietData)) beta ## (Intercept) age ## 21.325234 0.518067 E(y) = Xβ or yi = β0 + β1xi Motivation Linear models Example Matrix notation 50 / 51
  • 86. Matrix multiplication The vector of coefficients beta <- coef(lm(weight ~ age, dietData)) beta ## (Intercept) age ## 21.325234 0.518067 E(y) = Xβ or yi = β0 + β1xi Ey1 <- X1 %*% beta head(Ey1, 5) ## [,1] ## 1 27.34634 ## 2 28.34784 ## 3 29.28138 ## 4 25.44398 ## 5 24.17142 Motivation Linear models Example Matrix notation 50 / 51
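As a check on the slide above, the matrix product reproduces the fitted values from lm exactly (a sketch assuming dietData is loaded):

```r
fit <- lm(weight ~ age, data = dietData)
X1  <- model.matrix(~ age, data = dietData)   # same design matrix lm builds internally
Ey1 <- X1 %*% coef(fit)                       # E(y) = X beta
all.equal(as.numeric(Ey1), unname(fitted(fit)))  # TRUE
```

This equivalence is the point of the matrix notation: lm, predict, and fitted are all performing this one multiplication under the hood, whatever the mix of continuous and categorical predictors.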
  • 87. Summary Linear models are the foundation of modern statistical modeling techniques Motivation Linear models Example Matrix notation 51 / 51
  • 88. Summary Linear models are the foundation of modern statistical modeling techniques They can be used to model a wide array of biological processes, and they can be easily extended when their assumptions do not hold Motivation Linear models Example Matrix notation 51 / 51
  • 89. Summary Linear models are the foundation of modern statistical modeling techniques They can be used to model a wide array of biological processes, and they can be easily extended when their assumptions do not hold One of the most important extensions is to cases where the residuals are not normally distributed. Generalized linear models address this issue. Motivation Linear models Example Matrix notation 51 / 51