1. Extra: (In)dependence
Independent observations/errors
• An assumption of most parametric and non-parametric statistical tests
• Dependent observations: the value of an observation is related to the
values of other observations
Examples of dependence in data
• Case 1: Groups in your dataset. Observations within a group are not
independent. Group structure should be taken into account
• Case 2: Repeated measures. Observations are repeated over time in
the same plot. Repetition should be taken into account
Example: Hypothetical linear regression of species richness on soil pH. Fitting the model, we find a negative effect of soil pH on species richness.
Example: However, imagine that our data contain two groups. The green observations were all taken in one area with high anthropogenic disturbance (and hence lower species richness). The blue observations were all taken in one area with low disturbance (and hence higher species richness). Observations within a group are not independent. When we look at the residuals of the linear regression across all observations, the blue observations will mostly have positive residuals and the green observations mostly negative residuals.
Example: When we include group as a covariate in the model, we are effectively fitting two parallel lines, one per group. If we now look at the residuals, they no longer show a pattern. Independence of errors is achieved, even though the observations themselves are still not independent.
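A minimal sketch of the two models above in Python (NumPy), using simulated data. The slope, the 10-species difference between areas, the noise level, the seed and the choice to sample both areas at identical pH values are all invented for illustration. The sketch fits richness ~ pH and richness ~ pH + group by least squares and compares the mean residual per group.

```python
import numpy as np

# Hypothetical data: two areas sampled at the same (assumed) soil pH values.
rng = np.random.default_rng(42)
ph = np.tile(np.linspace(4.0, 8.0, 30), 2)   # 30 plots per area
is_green = np.repeat([0.0, 1.0], 30)         # 0 = blue (low disturbance), 1 = green (high disturbance)
# Assumed true model: same negative pH slope, green area 10 species poorer.
richness = 40.0 - 3.0 * ph - 10.0 * is_green + rng.normal(0.0, 1.5, ph.size)

def ols_residuals(X, y):
    """Least-squares fit of y on the columns of X; returns the residuals."""
    coef, *_ = np.linalg.lstsq(X, y, rcond=None)
    return y - X @ coef

intercept = np.ones_like(ph)

# Model 1: richness ~ pH (group structure ignored)
res_pooled = ols_residuals(np.column_stack([intercept, ph]), richness)
# Model 2: richness ~ pH + group (two parallel lines, shared slope)
res_group = ols_residuals(np.column_stack([intercept, ph, is_green]), richness)

for name, res in [("pH only", res_pooled), ("pH + group", res_group)]:
    blue_mean = res[is_green == 0].mean()
    green_mean = res[is_green == 1].mean()
    print(f"{name:>10}: mean residual blue = {blue_mean:+.2f}, green = {green_mean:+.2f}")
```

In the pH-only model the blue residuals average clearly above zero and the green ones below. With an intercept and a group dummy in the design matrix, least squares forces the residuals to average exactly zero within each group, which is why the group pattern disappears.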
Example: We measured species richness in Meerdaal forest in 1950 and again in 2010, and we want to know whether species richness differs significantly between the two time periods. The difference between the means of the two groups would be tested with a two-sample t-test.
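The two-sample t-test can be sketched in Python with SciPy on simulated data. All numbers and the seed are hypothetical, and the two years are assumed to use different, independent plots. The sketch computes Student's equal-variance t statistic by hand from the pooled variance and checks it against `scipy.stats.ttest_ind`. The slides do not say whether Student's or Welch's version (`equal_var=False`) is meant, so the equal-variance choice is an assumption.

```python
import numpy as np
from scipy import stats

# Hypothetical species counts from two independent sets of plots (simulated).
rng = np.random.default_rng(1)
rich_1950 = rng.normal(25.0, 5.0, 20)
rich_2010 = rng.normal(22.0, 5.0, 20)

# Student's two-sample t-test (equal variances assumed), computed by hand
n1, n2 = rich_1950.size, rich_2010.size
pooled_var = ((n1 - 1) * rich_1950.var(ddof=1)
              + (n2 - 1) * rich_2010.var(ddof=1)) / (n1 + n2 - 2)
t_manual = (rich_1950.mean() - rich_2010.mean()) / np.sqrt(pooled_var * (1 / n1 + 1 / n2))
p_manual = 2 * stats.t.sf(abs(t_manual), df=n1 + n2 - 2)   # two-sided p-value

# The same test with SciPy
res = stats.ttest_ind(rich_1950, rich_2010, equal_var=True)
print(f"t = {t_manual:.3f} (manual) vs {res.statistic:.3f} (scipy), p = {res.pvalue:.4f}")
```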
Example: However, imagine that in 2010 we revisited the same plots that were measured in 1950. If we looked at the residuals of the t-test, plots with positive residuals in 1950 (high species richness, e.g. p1, p2 and p3) are likely to have positive residuals in 2010 as well. So again the residuals are not independent. In this case we should use a paired t-test, which takes the non-independence of the data into account.
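A sketch of the revisit design in Python with SciPy, again on simulated data. Each hypothetical plot gets a persistent plot effect that is shared by both years, which is the dependence described above. The effect sizes and the seed are invented. `scipy.stats.ttest_rel` performs the paired t-test, which is equivalent to a one-sample t-test of the within-plot differences against zero.

```python
import numpy as np
from scipy import stats

# Hypothetical revisit design: the same 20 plots counted in 1950 and 2010.
rng = np.random.default_rng(7)
n_plots = 20
plot_effect = rng.normal(0.0, 10.0, n_plots)   # persistent plot-to-plot differences
rich_1950 = 25.0 + plot_effect + rng.normal(0.0, 1.0, n_plots)
rich_2010 = 23.0 + plot_effect + rng.normal(0.0, 1.0, n_plots)  # assumed true decline of 2 species

unpaired = stats.ttest_ind(rich_1950, rich_2010)              # ignores the pairing
paired = stats.ttest_rel(rich_1950, rich_2010)                # uses within-plot differences
one_sample = stats.ttest_1samp(rich_1950 - rich_2010, 0.0)    # equivalent formulation

print(f"unpaired: t = {unpaired.statistic:.2f}, p = {unpaired.pvalue:.3g}")
print(f"paired:   t = {paired.statistic:.2f}, p = {paired.pvalue:.3g}")
```

The large plot effects cancel in the within-plot differences. As a result, the paired test should give a much smaller p-value here than the unpaired test, which treats that plot-to-plot variation as noise.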