2. What is ANOVA?
Analysis of Variance (ANOVA) is a statistical technique
commonly used to study differences between two or more group
means.
The ANOVA test partitions the variation in a response variable
into its different sources.
ANOVA in R primarily tests whether the group means are
equal.
This statistical method is an extension of the t-test.
It is used when the factor variable has more than two
groups.
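The link to the t-test can be checked directly. The sketch below uses simulated data (the `two.groups` data frame and its values are made up for illustration) and compares a pooled-variance t-test with a one-way ANOVA on the same two groups. With only two groups, the F statistic should equal the squared t statistic.

```r
# Simulated data (not from the source): two groups of 20 observations each.
set.seed(1)
two.groups <- data.frame(
  group = factor(rep(c("A", "B"), each = 20)),
  value = c(rnorm(20, mean = 10), rnorm(20, mean = 11))
)

# Pooled-variance t-test and one-way ANOVA on the same data.
t.result <- t.test(value ~ group, data = two.groups, var.equal = TRUE)
f.result <- summary(aov(value ~ group, data = two.groups))[[1]]

t.squared <- unname(t.result$statistic)^2
f.value <- f.result[["F value"]][1]

# With two groups, F equals t squared and the two p-values coincide.
all.equal(t.squared, f.value)
all.equal(t.result$p.value, f.result[["Pr(>F)"]][1])
```

The equivalence holds only for the equal-variance t-test (`var.equal = TRUE`); R's default Welch test would give a slightly different result.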
3. One-way ANOVA
There are many situations where you need to compare the means
of multiple groups. For instance, the marketing department
wants to know whether three teams have the same sales performance.
Team: a three-level factor (A, B, and C)
Sale: a measure of performance
The ANOVA test can tell if the three groups have similar
performances.
To check whether the data come from the same population, you can
perform a one-way analysis of variance.
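As a minimal sketch of this example, the code below simulates sales figures for the three teams and runs a one-way ANOVA. The `sales` data frame, the group sizes, and the numbers are all hypothetical, chosen only so that the code runs on its own.

```r
# Simulated sales figures (hypothetical, for illustration only).
set.seed(42)
sales <- data.frame(
  team = factor(rep(c("A", "B", "C"), each = 15)),
  sale = c(rnorm(15, mean = 100, sd = 10),
           rnorm(15, mean = 105, sd = 10),
           rnorm(15, mean = 120, sd = 10))
)

# Group means, then the one-way ANOVA of sale on team.
aggregate(sale ~ team, data = sales, FUN = mean)
sales.aov <- aov(sale ~ team, data = sales)
summary(sales.aov)
```

A small p-value in the `team` row would suggest that at least one team's mean sales differ from the others.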
4. Hypothesis in one-way ANOVA test
H0: The means of all groups are identical
H1: The mean of at least one group is different
In other words, failing to reject H0 means there is not enough
evidence that the mean of any group (factor level) differs from
another.
This test is similar to the t-test, although the ANOVA test is
recommended in situations with more than two groups.
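In symbols, the two hypotheses can be written as follows. The notation is an assumption, not taken from the slides: k is the number of groups, mu_i is the population mean of group i, and the alternative hypothesis is labelled H_1.

```latex
H_0:\ \mu_1 = \mu_2 = \cdots = \mu_k
\qquad\text{versus}\qquad
H_1:\ \mu_i \neq \mu_j \ \text{ for at least one pair } i \neq j
```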
5. Initial setup in R
install.packages(c("ggplot2", "ggpubr", "tidyverse", "broom",
"AICcmodavg"))
library(ggplot2)
library(ggpubr)
library(tidyverse)
library(broom)
library(AICcmodavg)
6. Loading Data in R
crop.data <- read.csv("path/to/your/file/crop.data.csv", header = TRUE, colClasses
= c("factor", "factor", "factor", "numeric"))
The colClasses argument specifies, within the command, whether each variable
should be read as quantitative ("numeric") or categorical ("factor").
summary(crop.data)
crop.data <- read.csv(file.choose(), colClasses = c("factor", "factor", "factor", "numeric"))
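To see why colClasses matters, the sketch below writes a tiny stand-in CSV to a temporary file and reads it back both ways. The column names (`density`, `block`, `fertilizer`, `yield`) and the values are assumptions made for illustration; they are not taken from the real crop.data.csv.

```r
# Hypothetical stand-in rows; column names and values are assumptions.
demo <- data.frame(
  density = c(1, 2, 1, 2),
  block = c(1, 2, 3, 4),
  fertilizer = c(1, 2, 3, 1),
  yield = c(177.2, 176.5, 177.1, 176.9)
)
demo.path <- tempfile(fileext = ".csv")
write.csv(demo, demo.path, row.names = FALSE)

# Without colClasses, the integer-coded columns are read as numbers.
as.read <- read.csv(demo.path, header = TRUE)
sapply(as.read, class)

# With colClasses, the first three columns become factors.
typed <- read.csv(demo.path, header = TRUE,
                  colClasses = c("factor", "factor", "factor", "numeric"))
sapply(typed, class)
```

If a grouping column were left numeric, `aov()` would treat it as a continuous predictor rather than as a set of groups, so it is worth checking the classes before fitting a model.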
8. The model summary first lists the independent variables being tested in the model (in this case
we have only one, ‘fertilizer’) and the model residuals (‘Residuals’). All of the variation that is
not explained by the independent variables is called residual variance.
The rest of the values in the output table describe the independent variable and the residuals:
The Df column displays the degrees of freedom for the independent variable (the number of
levels in the variable minus 1), and the degrees of freedom for the residuals (the total number of
observations minus one, minus the degrees of freedom of the independent variables; with a single
factor this equals the number of observations minus the number of levels).
The Sum Sq column displays the sum of squares (a.k.a. the total variation between the group
means and the overall mean).
The Mean Sq column is the mean of the sum of squares, calculated by dividing the sum of
squares by the degrees of freedom for each parameter.
The F-value column is the test statistic from the F test. This is the mean square of each
independent variable divided by the mean square of the residuals. The larger the F value, the
more likely it is that the variation caused by the independent variable is real and not due to
chance.
The Pr(>F) column is the p-value of the F-statistic. This shows how likely it is that the F-value
calculated from the test would have occurred if the null hypothesis of no difference among
group means were true.
The p-value of the fertilizer variable is low (p < 0.001), so it appears that the type of fertilizer
used has a real impact on the final crop yield.
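The columns described above can be rebuilt by hand. The sketch below fits a one-way model on a simulated stand-in for the crop data and recomputes each quantity. The data frame `sim`, its 96 rows, and its values are assumptions for illustration, so the numbers will not match the real output.

```r
# Simulated stand-in for the crop data (hypothetical values, 3 fertilizer levels).
set.seed(7)
sim <- data.frame(
  fertilizer = factor(rep(1:3, each = 32)),
  yield = rnorm(96, mean = rep(c(176.8, 176.9, 177.4), each = 32), sd = 0.6)
)

one.way <- aov(yield ~ fertilizer, data = sim)
tab <- summary(one.way)[[1]]

# Rebuild each column by hand.
n <- nrow(sim)
k <- nlevels(sim$fertilizer)
group.means <- ave(sim$yield, sim$fertilizer)   # each row's group mean
ss.between <- sum((group.means - mean(sim$yield))^2)
ss.resid <- sum((sim$yield - group.means)^2)
df.between <- k - 1
df.resid <- n - k
f.value <- (ss.between / df.between) / (ss.resid / df.resid)
p.value <- pf(f.value, df.between, df.resid, lower.tail = FALSE)

tab
c(F = f.value, p = p.value)
```

The hand-computed F and p should agree with the `F value` and `Pr(>F)` entries in the first row of `tab`.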
9. Two-way ANOVA
In the two-way ANOVA example, we are modeling crop yield as a function of
type of fertilizer and planting density.
two.way <- aov(yield ~ fertilizer + density, data = crop.data)
summary(two.way)
Adding planting density to the model seems to have improved it: the
residual variance fell (the residual sum of squares went from 35.89 to
30.765), and both planting density and fertilizer are statistically
significant (p-values < 0.001).
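Because adding a term can never increase the residual sum of squares, a formal comparison helps show whether the reduction is meaningful. The sketch below fits both models on simulated stand-in data (the data frame `sim` and its effect sizes are hypothetical) and compares them with base R's `anova()`.

```r
# Simulated stand-in data: 3 fertilizer levels x 2 densities x 16 replicates.
set.seed(11)
sim <- expand.grid(rep = 1:16, fertilizer = factor(1:3), density = factor(1:2))
sim$yield <- 176.5 + 0.3 * as.integer(sim$fertilizer) +
  0.5 * as.integer(sim$density) + rnorm(nrow(sim), sd = 0.6)

one.way <- aov(yield ~ fertilizer, data = sim)
two.way <- aov(yield ~ fertilizer + density, data = sim)

# Residual sum of squares of each model; the larger model never has more.
c(one.way = deviance(one.way), two.way = deviance(two.way))

# F-test for whether adding density improves the fit.
anova(one.way, two.way)
```

A small p-value in the `anova()` table would support keeping density in the model.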
10. Adding interactions between variables
Sometimes you have reason to think that two of your independent
variables have an interaction effect rather than an additive effect.
For example, in our crop yield experiment, it is possible that
planting density affects the plants’ ability to take up fertilizer. This
might influence the effect of fertilizer type in a way that isn’t
accounted for in the two-way model.
To test whether two variables have an interaction effect in ANOVA,
use an asterisk instead of a plus sign in the model formula.
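A minimal sketch of the interaction model follows, again on simulated stand-in data. The data frame `sim` and the name `interaction.model` are hypothetical. In an R formula, `fertilizer * density` expands to both main effects plus the `fertilizer:density` interaction term.

```r
# Simulated stand-in data: 3 fertilizer levels x 2 densities x 16 replicates.
set.seed(3)
sim <- expand.grid(rep = 1:16, fertilizer = factor(1:3), density = factor(1:2))
sim$yield <- 176.5 + 0.3 * as.integer(sim$fertilizer) +
  0.5 * as.integer(sim$density) + rnorm(nrow(sim), sd = 0.6)

# fertilizer * density expands to both main effects plus their interaction.
interaction.model <- aov(yield ~ fertilizer * density, data = sim)
attr(terms(interaction.model), "term.labels")

summary(interaction.model)
```

The `fertilizer:density` row of the summary tests whether the effect of fertilizer depends on planting density.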