11. Linear Models
4. Simple linear regression uses an independent variable
to predict the outcome of a dependent variable. This
is the most basic form of regression; numerous more
complex modeling techniques can be learned by
understanding this basic concept.
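As a minimal sketch of this idea, the least-squares slope and intercept can be computed by hand and compared with R's lm function. The built-in cars data set is used here only as a stand-in; it is not part of the soil data used in these slides.

```r
# Simple linear regression by hand, using R's built-in `cars` data
# (speed = independent variable, dist = dependent variable).
x <- cars$speed
y <- cars$dist

# Least-squares estimates:
#   slope     = cov(x, y) / var(x)
#   intercept = mean(y) - slope * mean(x)
slope <- cov(x, y) / var(x)
intercept <- mean(y) - slope * mean(x)

# The same fit with lm(); the two sets of estimates should agree
fit <- lm(dist ~ speed, data = cars)
print(c(intercept = intercept, slope = slope))
print(coef(fit))
```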
5. In R, several classical statistical models can be
implemented using the function lm (linear model).
The lm function can be used for simple and multiple
linear regression.
> ?lm
Usage
lm(formula, data, subset, weights, na.action,
method = "qr", model = TRUE, x = FALSE, y = FALSE,
qr = TRUE, singular.ok = TRUE, contrasts = NULL,
offset, ...)
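A short sketch of how the most commonly used arguments fit together, assuming the built-in mtcars data as a stand-in (the column names mpg, wt and cyl belong to that data set, not to the soil table):

```r
# formula:   response ~ predictor(s)
# data:      data frame in which the formula variables are looked up
# subset:    logical expression selecting the rows to use
# na.action: what to do with rows containing missing values
fit <- lm(mpg ~ wt,
          data = mtcars,
          subset = cyl == 4,
          na.action = na.omit)

nobs(fit)   # number of rows actually used in the fit
coef(fit)
```

Only formula is required; when data is omitted, the variables are looked up in the calling environment.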
7. The first argument in the lm function (formula) is
where you specify the structure of the statistical
model.
Common structures of a statistical model formula are:
y ~ x        Simple linear regression of y on x
y ~ x + z    Multiple regression of y on x and z
https://drive.google.com/file/d/0B4-nNA2Ua3DoUmJMQ3JJTFpzTDg/view?usp=sharing
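The two formula structures can be tried out as follows. This is an illustrative sketch on the built-in mtcars data, where mpg plays the role of y, wt the role of x and hp the role of z; these names are stand-ins and not part of the soil data.

```r
# y ~ x: one predictor
simple <- lm(mpg ~ wt, data = mtcars)

# y ~ x + z: two predictors
multiple <- lm(mpg ~ wt + hp, data = mtcars)

# One coefficient per predictor, plus the intercept
coef(simple)
coef(multiple)
```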
8. To demonstrate simple linear regression in R, we will
again use the Macedonian Soil Dataset. Here we will
regress soil organic carbon (SOC, the Value column) on
elevation (the dem column).
DSM_table <- read.csv("DSM_table2.csv")
> head(DSM_table)
ID ProfID X Y UpperDepth LowerDepth Value Lambda
1 4 P0004 7485085 4653725 0 30 11.878804 0.1
2 7 P0007 7486492 4653203 0 30 3.490205 0.1
3 8 P0008 7485564 4656242 0 30 2.317673 0.1
4 9 P0009 7495075 4652933 0 30 1.936148 0.1
5 10 P0010 7494798 4651945 0 30 1.339719 0.1
6 11 P0011 7492500 4651760 0 30 2.285384 0.1
tsme slp prec dem
1 0.160096433 13 998.034 2327
2 0.002569598 35 1014.300 1986
3 0.002601836 6 779.994 1243
4 0.002841078 25 839.183 1120
5 0.002677120 30 843.919 1098
10. The summary statistics:
> summary(cbind(SOC = DSM_table$Value, Slope = DSM_table$slp,
Precipitation = DSM_table$prec, DEM = DSM_table$dem))
SOC Slope Precipitation DEM
Min. : 0.000 Min. : 0.000 Min. : 424.5 Min. : 45.0
1st Qu.: 1.005 1st Qu.: 0.000 1st Qu.: 532.3 1st Qu.: 404.2
Median : 1.493 Median : 3.000 Median : 564.3 Median : 592.0
Mean : 1.912 Mean : 7.414 Mean : 597.5 Mean : 642.3
3rd Qu.: 2.244 3rd Qu.:11.000 3rd Qu.: 641.8 3rd Qu.: 768.0
Max. :50.332 Max. :56.000 Max. :1180.3 Max. :2375.0
NA's :1
11. Our hypothesis here is that elevation is a good
predictor of SOC. To start, let's have a look at what
the data looks like:
plot(DSM_table$Value, DSM_table$dem)
13. From the plot, there does not appear to be a meaningful
relationship. To fit a linear model, we can use the lm function:
model1 <- lm(Value ~ dem, data=DSM_table, y=TRUE, x = TRUE)
> model1
Call:
lm(formula = Value ~ dem, data = DSM_table, x = TRUE, y = TRUE)
Coefficients:
(Intercept) dem
0.715117 0.001856
14. > summary(model1)
Call:
lm(formula = Value ~ dem, data = DSM_table, x = TRUE, y = TRUE)
Residuals:
Min 1Q Median 3Q Max
-3.917 -0.769 -0.224 0.389 48.895
Coefficients:
Estimate Std. Error t value Pr(>|t|)
(Intercept) 7.151e-01 6.929e-02 10.32 <2e-16 ***
dem 1.856e-03 9.367e-05 19.82 <2e-16 ***
---
Signif. codes: 0 ‘***’ 0.001 ‘**’ 0.01 ‘*’ 0.05 ‘.’ 0.1 ‘ ’ 1
Residual standard error: 1.954 on 3300 degrees of freedom
Multiple R-squared: 0.1063, Adjusted R-squared: 0.1061
F-statistic: 392.6 on 1 and 3300 DF, p-value: < 2.2e-16
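To make the coefficient table concrete, the reported estimates can be plugged into the fitted line SOC = intercept + slope * elevation. The sketch below copies the numbers from the output above; the elevation of 1000 is a hypothetical value chosen for illustration, in whatever units the dem column uses (presumably metres).

```r
# Coefficients as reported in the summary output above
b0 <- 0.715117   # (Intercept)
b1 <- 0.001856   # slope for elevation

# Predicted SOC at a hypothetical elevation of 1000
elev <- 1000
pred <- b0 + b1 * elev
pred

# Change in predicted SOC for a 100-unit increase in elevation
change_per_100 <- b1 * 100
change_per_100

# Each t value is the estimate divided by its standard error
t_intercept <- 7.151e-01 / 6.929e-02
t_intercept
```

The R-squared of about 0.11 means that elevation alone accounts for roughly a tenth of the variance in SOC, so predictions from this line carry a lot of unexplained scatter.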
15. > class(model1)
[1] "lm"
The output from the lm function is an object of class
"lm". An object of class "lm" is a list containing at least
the following components:
coefficients - a named vector of coefficients.
residuals - the residuals, that is, response minus fitted values.
fitted.values - the fitted mean values.
rank - the numeric rank of the fitted linear model.
weights - (only for weighted fits) the specified weights.
df.residual - the residual degrees of freedom.
call - the matched call.
terms - the terms object used.
contrasts - (only where relevant) the contrasts used.
xlevels - (only where relevant) a record of the levels of the factors
used in fitting.
offset - the offset used (missing if none were used).
y - if requested, the response used.
x - if requested, the model matrix used.
model - if requested (the default), the model frame used.
na.action - (where relevant) information returned by model.frame on
the special handling of NAs.
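The components listed above can be inspected directly with the $ operator. The sketch below uses a small synthetic data frame (the name toy and its values are hypothetical stand-ins for the soil table, not the real data) so that it runs on its own.

```r
set.seed(1)
# Synthetic stand-in for the soil table (hypothetical values)
toy <- data.frame(dem = runif(50, 50, 2000))
toy$Value <- 0.7 + 0.002 * toy$dem + rnorm(50)

# x = TRUE and y = TRUE ask lm to keep the model matrix and the response
fit <- lm(Value ~ dem, data = toy, x = TRUE, y = TRUE)

names(fit)          # components stored in the lm object
fit$coefficients    # same values as coef(fit)
fit$df.residual     # observations minus estimated coefficients: 50 - 2

# residuals are the response minus the fitted values
all.equal(fit$residuals, fit$y - fit$fitted.values)

dim(fit$x)          # model matrix: an intercept column plus dem
```

Accessor functions such as coef(), residuals() and fitted() return the same information and are usually preferred over reaching into the list.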
18. > head(residuals(model1))
1 2 3 4 5 6
6.8445691 -1.2674728 -0.7045624 -0.5960802 -1.1628119 -1.1990168
Here is a list of what is available from the summary
function for this model:
> names(summary(model1))
[1] "call" "terms" "residuals" "coefficients"
[5] "aliased" "sigma" "df" "r.squared"
[9] "adj.r.squared" "fstatistic" "cov.unscaled"
19. To extract some of the information from the
summary, which is a list, we can use:
> summary(model1)[[4]]
Estimate Std. Error t value Pr(>|t|)
(Intercept) 0.715116901 6.928677e-02 10.32112 1.333183e-24
dem 0.001856089 9.367314e-05 19.81452 1.188533e-82
> summary(model1)[[7]]
[1] 2 3300 2
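Positions 4 and 7 correspond to the "coefficients" and "df" entries in the names listed earlier, so the same elements can also be extracted by name, which is easier to read. A minimal sketch on the built-in cars data (a stand-in, not the soil data):

```r
fit <- lm(dist ~ speed, data = cars)
s <- summary(fit)

s$coefficients                        # same as s[[4]]
s$coefficients["speed", "Estimate"]   # a single cell of the coefficient table
s$df                                  # same as s[[7]]
s$r.squared                           # multiple R-squared
```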
24. We will regress SOC on precipitation,
slope and elevation. First, let's subset these
data, then get their summary statistics.
> model2subset <- DSM_table[, c("Value", "slp", "prec", "dem")]
> summary(model2subset)
Value slp prec dem
Min. : 0.000 Min. : 0.000 Min. : 424.5 Min. : 45.0
1st Qu.: 1.005 1st Qu.: 0.000 1st Qu.: 532.3 1st Qu.: 404.2
Median : 1.493 Median : 3.000 Median : 564.3 Median : 592.0
Mean : 1.912 Mean : 7.414 Mean : 597.5 Mean : 642.3
3rd Qu.: 2.244 3rd Qu.:11.000 3rd Qu.: 641.8 3rd Qu.: 768.0
Max. :50.332 Max. :56.000 Max. :1180.3 Max. :2375.0
NA's :1
25. A quick way to look for relationships between
variables in a data frame is with the cor function.
Note the use of the na.omit function: it drops rows
with missing values, which would otherwise make cor
return NA for the affected variables.
> cor(na.omit(model2subset))
Value slp prec dem
Value 1.0000000 0.2730310 0.3155474 0.3317814
slp 0.2730310 1.0000000 0.5765489 0.6011170
prec 0.3155474 0.5765489 1.0000000 0.8158338
dem 0.3317814 0.6011170 0.8158338 1.0000000
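The effect of missing values on cor can be seen in a small sketch. The data frame toy and its columns a, b and c are hypothetical and generated at random; they only illustrate the NA handling.

```r
set.seed(42)
toy <- data.frame(a = rnorm(20), b = rnorm(20), c = rnorm(20))
toy$b[3] <- NA   # introduce one missing value

# By default, correlations between b and the other columns come back as NA
cor(toy)

# Dropping incomplete rows first gives usable values
cor(na.omit(toy))

# cor can also drop incomplete rows itself
cor(toy, use = "complete.obs")
```

Both approaches remove the whole row, so a single NA in one column also removes that observation from the correlations of the other columns.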