Analysis of variance (ANOVA) is an analysis tool used in statistics that splits an observed aggregate variability found inside a data set into two parts: systematic factors and random factors. The systematic factors have a statistical influence on the given data set, while the random factors do not.
Analysts use the ANOVA test to determine the influence that independent variables have on the dependent variable in a regression study.
The t- and z-test methods developed in the 20th century were used for statistical analysis until 1918, when Ronald Fisher created the analysis of variance method.
ANOVA is also called the Fisher analysis of variance, and it is the extension of the t- and z-tests. The term became well-known in 1925, after appearing in Fisher's book, "Statistical Methods for Research Workers."
It was employed in experimental psychology and later expanded to subjects that were more complex.ANOVA (Analysis Of Variance) is a collection of statistical models used to assess the differences between the means of two independent groups by separating the variability into systematic and random factors. It helps to determine the effect of the independent variable on the dependent variable. Here are the three important ANOVA assumptions:
1. Normally distributed population derives different group samples.
2. The sample or distribution has a homogenous variance
3. Analysts draw all the data in a sample independently.
ANOVA test has other secondary assumptions as well, they are:
1. The observations must be independent of each other and randomly sampled.
2. There are additive effects for the factors.
3. The sample size must always be greater than 10.
4. The sample population must be uni-modal as well as symmetrical.
TYPES OF ANOVA
1. One way ANOVA analysis of variance is commonly called a one-factor test in relation to the dependent subject and independent variable. Statisticians utilize it while comparing the means of groups independent of each other using the Analysis of Variance coefficient formula. A single independent variable with at least two levels. The one way Analysis of Variance is quite similar to the t-test.
2 TWO WAY ANOVA
The pre-requisite for conducting a two-way anova test is the presence of two independent variables; one can perform it in two ways โ
Two way ANOVA with replication or repeated measures analysis of variance โ is done when the two independent groups with dependent variables do different tasks.
Two way ANOVA sans replication โ is done when one has a single group that they have to double test like one tests a player before and after a football game
1. 1
GOVERNMENT NAGARJUNA P.G.
COLLEGE OF SCIENCE RAIPUR (C.G.)
M.Sc. MICROBIOLOGY
SEMESTER - 2
2022-23
WRITE UP ON
ANALYSIS OF VARIENCE {ANOVA}
SUBMITTED BY :- SUBMITTED
TO:-
SNEHA AGRAWAL DEPT.OF MICROBIOLOGY
3. 3
ANOVA
INTRODUCTION
Test of significance for โtโ test is used to compare the mean values of
two samples. The difference in the means of more than two samples
or population cannot compared. The statistical technique used to
compare the means of variation of more than two samples for
population is called analysis of variance (ANOVA). ANOVA is based
on two types of variation:
a) Variations existing within the sample
b) Variations existing between the samples.
HISTORY
The concept of ANOVA was introduced by R.A.Fisher.
DEFINITIONS
ANOVA - analysis of variance is the synthesis of several ideas. It is
used for multiple purpose. As a consequence it is difficult to define
ANOVA concisely or. Precisely. ANOVA is a statistical test used to
analyse whether or not the means of several groups are equal.
4. 4
F-value / F โ ratio โthe ratio between two variations is an indication
of sample difference. Therefore, ANOVA helps in estimating whether
more variations exist among the group means or within the groups.
The ratio of these two variation is denoted by โFโ . The F- value or F-
ratio is the indication of difference between the samples.
HYPOTHESIS
It is a definite statement about the population parameters.
1. NULL HYPOTHESIS :- It states that no association exists
between the two cross- tabulated variables in the population,
and therefore the variables are statistically independent.
For Instance, if we compare 2 methods method A and B for its
superiority and if the assumptions is that both methods are
equally good, the this assumptions are said to be null
hypothesis.
โข Hโ=ฮผโ=,ฮผโ=,ฮผโ=โฆ.=ฮผฤธ
โข Where ฮผ=group mean ,K=no. of group
2. ALTERNATIVE HYPOTHESIS :- It says that the 2 variables
are related in the populations. If we assume that from 2
methods A is superior than B then assumptions is called as
alternative hypothesis.
5. 5
Hโ:ฮผโโ ฮผโโ ฮผโโ โฆโ ฮผฮบ
SUM OF SQUARES (SS)
Sum of square is computed as the sum of squares of deviation of
the value for mean of the sample. It means
SS = ๐ฎ(๐ โ ๐โฒ)๐
Or
SS= ๐ฎ๐๐
โ
[๐ฎ๐]๐
๐
Here, SS= represent sum of squares
x = represents mean of sample
x = represents value of variable x
n = represents total number of observation
MEAN SQUARE (MS)
Mean square (MS) is the mean of all the sum of squares. It is
obtained by dividing SS with appropriate degree of freedom(df)
MS = ๐บ๐๐ ๐๐ ๐๐๐๐๐๐/ ๐ ๐๐๐๐๐ ๐๐ ๐๐๐๐๐ ๐๐ =
๐บ๐บ
๐ ๐
DEGREE OF FREEDOM
The degree of freedom is equal to the sum of the individual degrees of
freedom for each sample since each sample has degree of freedom
6. 6
equal to one less than sample size and there are k- sample, the total
degrees of freedom is k less than the total sample size
df = N โ k
F -TEST STATISTIC
It is the ratio , after scaling by the degree of freedom.
๐น =
๐๐๐ต
๐๐๐
ASSUMPTIONS
1. All ANOVA require random sampling
2. The items for analysis of variance should be independent to each
other or mutually exclusive.
3. Equality or homogeneity of variances in a group of sample is
important.
4. The residual components should be normally distributed.
5. The variable of interest for each population or group has a normal
probability distribution.
7. 7
PRINCIPLE OF ANOVA
The basic principle of ANOVA is to test for difference among the
means of the population by examining the amount of variation within
each of these samples, relative to amount of variation between the
samples. In terms of variation within the given population, it is
assumed that the values of (F) differ from the mean of this
population only because of random effects i.e. there are influencers
on (F) which are unexplainable, whereas in examining differences
between the mean of F population and grand means is attributable to
what is called a โspecific factorโ or what is technically described as
treatment effect.
Thus, while using ANOVA, we assume that each of the sample is
drawn from a normal population and that each of this population has
the same variance. We also assume that all factors other than the one
or more being tested are effectively controlled.This, in other words
means that we assume the absence of many factors that might affect
our conclusions concerning that factor to be studied.
8. 8
TYPES OF ANOVA
Observation for analysis of variance are classified according to one
factor analysis or two factor criteria into two groups:-
1.ONE WAY ANALYSIS OF VARIANCE
Simplest type of analysis of variance is known as one way analysis of
variance. In one way ANOVA only one source of variation (or factor)
is investigated. But investigations are carried out in three or more
samples simultaneously. ANOVA is used to test the null hypothesis
that three or more treatments are equally effective.
Examples:- to compare the effectiveness of two or more drugs in
controlling or curing a particular disease.
COMPUTATION OF ONE-WAY ANALYSIS OF
VARIANCE (ANOVA)
For the computation of ANOVA the variance is estimated at two
different levels.
A) POPULATION VARIANCE BETWEEN THE GROUPS.
The variation due to the interaction between the samples, denoted
SSB for sum of squares between groups. If the sample means are
close to each other (and therefore the grand means) this will be
9. 9
small. There are k samples involved with one data volume for each
sample (the sample mean) so there are k โ 1 degrees of freedom.
The variance due to the interaction between the samples, denoted
MSB for mean square between groups. This is the between group
variation divided by its degree of freedom.
B)POPULATION VARIANCE WITHIN THE GROUP
The variation due to differences within the individual samples,
denoted SSW for sum of squares within groups. Each sample is
considered independently, snow interaction between samples is
involved. The degree of freedom is equal to the sum of individual
degrees of freedom for each sample. Since each sample has degree
of freedom equal to one less than their sample size, and there are k
samples, the total degrees of freedom is k less than the total sample
size
df= N โ k
The variance due to the difference within individual samples,
denoted MSW for mean square within groups. This is the within
group variation divided by its degree of freedom.
10. 10
PROCEDURE FOR CALCULATION OF ONE WAY ANOVA
i. Obtain the mean of each sample.
ii. Take out the average mean of the sample means.
iii. Calculate sum of squares for variance between the sample ( or
SS between)
iv. Obtain variance for mean square (MS) between samples.
v. Calculate sum of squares for variance within samples (or SS
within).
vi. Obtain the various or means square (MS) within samples.
vii. Find sum of squares or deviation for total variance
viii. Finally, find F- rrati.
All the calculation for determining variance between
and within the samples or group can be represented in a tabular
form. This table is called ANOVA table.
11. 11
TABLE :- ANOVA TABLE: FOR ONE WAY CALCULATION.
S.No. Source of
variation
Sum of
squares
(SS)
Degree of
freedom
(df)
Mean of
squares
(MS)
Test
statistic
1 Between SSB(SBยฒ) K โ 1
๐ด๐บ๐ฉ
=
๐บ๐บ๐ฉ
๐ โ ๐ ๐ญ =
๐ด๐บ๐พ
๐ด๐บ๐พ
2 Within SSW(SWยฒ) N- k
๐ด๐บ๐พ
=
๐บ๐บ๐พ
๐ต โ ๐
Examples :1
Q) The yield of 3 varieties of wheat A,B,C in 4 separated fields is
show in following table. Test of significance of different in the yields
of 3 varieties of wheat.
12. 12
S.No.
Wheat
variety
Fields
Field 1 Field 2 Field 3 Field 4
1 A 12 18 14 16
2 B 19 17 15 13
3 C 14 16 18 20
STEP 1:- CALCULATION OF SAMPLE MEANOF EACH OF THE
SAMPLE
S.No. Fields Wheat variety
A B C
1 1 12 19 14
2 2 18 17 16
3 3 14 15 18
4 4 16 13 20
ฮฃx1= 60 ฮฃx2 = 64 ฮฃx3 = 68
STEP2 :- ๐ฅ1 = 60/4 = 15
x2 = 64/4 = 16
x3 = 68/4 = 17
STEP :- 3 GRAND SIMPLE MEAN
13. 13
X = ๐ฅ1 + x2 + X3 /3
= 15+16+17/3
=16
STEP 4 :- CALCULATE VARIANCE OF SUM OF SQUARE
BETWEEN THE GROUP.(SSB)
= ฮฃni (X1 โ x)2
= n1(X1 โ x)2
+ n2 (X2 โ x)2
+ n3 (X3 โ x)2
=
4(15-16) 2
+ 4(16-16) 2
+ 4(17-16) 2
= 4+ 0+ 4
= 8
Here, n = total number of observation.
STEP 5:- DEGREE OF FREEDOM
Degree of freedom = k โ 1
= 4 โ 1 = 3
= 3-1 = 2
MSB = SSB / (k โ 1)
= 8 / 2 = 4
14. 14
STEP 6:- CALCULATE THE VARIANCE OF SUM OF SQUARE
WITHIN THE GROUP (SSW).
SSW = (x1โxโ)ยฒ
TABLE :- CALCULATION OF SUM OF SQUARE WITHIN
SAMPLES.
Wheat varietysample 1 Wheat variety
sample 2
Wheat variety
sample 3
X1 (x1 โ x1)ยฒ x2 (x2 โ x2)ยฒ x3 (x3 โ x3)ยฒ
12 (12-15)ยฒ= 9 19 (19-16)ยฒ=9 14 (14-17)ยฒ=9
18 (18 -15)ยฒ=9 17 (17-16)ยฒ=1 16 (16-17)ยฒ=1
14 (14-15)ยฒ=1 15 (15-16)ยฒ=1 18 (18-17)ยฒ=1
16 (16-15)ยฒ=1 1 (13-16)ยฒ=9 20 (20-17)ยฒ=9
Total 20 20 20
SSW = ฮฃ(x1-x1)ยฒ + ฮฃ(x2-x2)ยฒ + ฮฃ(x3-x3)ยฒ
= 20 + 20 + 20
= 60
d.f. = N- k
15. 15
= 12-3
= 9
MEAN SQUARE = SSW + (N-k)
= 60/9
= 6.7
STEP :- 7 ANOVA TABLE: ONE WAY ANOVA TABLE
S.No. Source
of
variation
Sum of
square
Degree
of
freedom
Mean
square
Test
statistic
1 Between
sample
8 2 4 F = 6.7/4
= 1.57
2 Within
sample
60 9 6.7
TOTAL 68 (SST) N-1=11
16. 16
CONCLUSION
Here, F(calculated) is < F(tabulated) at 0.05 (2.l9)
i.e. F(calculated) = 1.67 < 4.26
The value of F is less than 4.26 at 5% level with degree of freedom
being V1 = 2 and V2 = 9 and hence could have arisen due to chance.
There is no significant difference in the yield of three wheat
varieties.
Hence, this analysis supports the the null hypothesis of no difference
in sample mean.
2.TWO WAY ANALYSIS OF VARIANCE
An extension to the one-way analysis of variance. There are two
independent variables. There are three sets of hypothesis with the two
way ANOVA. The first null hypothesis is that there is no interaction
between the two factors. The second null hypothesis is that the
population means of the first factor equal. The third null hypothesis
is that the population means of the second factor are equal.
17. 17
Example:- productivity of several nearby fields influenced by seed
varieties and types of manure used.
COMPUTATION OF TWO WAY ANALYSIS OF VARIANCE
( ANOVA).
The two-way analysis of variance is an extension to the the one-way
analysis of variance. There are two independent variables (hence the
name two way). Two way ANOVA is used to study the impact of two
different factors on a specific variable of the population such as
effect of different varieties of seeds and different types of fertilizers
on crop yield in several nearby field.
ASSUMPTIONS OF TWO WAY ANOVA
i. The two factors under study are independent i.e. they do not
interact.
ii. Variable under study has normal distribution.
iii. Selection of variable and population is at random.
PROCEDURE OF CALCULATION OF TWO WAY ANOVA
18. 18
The total variance for sum of squares of variance in two way
ANOVA includes three parts:-
a. Sum of squares between the column = SSC
b. Sum of square between the rows = SSR
c. Sum of square for the residuals = SSE
Therefore, sum of square of total variance= SST
i. SST = SSC + SSR + SSE , and
T = ฮฃx1+ ฮฃx2 + โฆ + ฮฃxk
ii. Correlation factor (CF) =
๐ป๐
๐ต
iii. SSC =
๐ฎ(๐ฎ๐ฒ๐)๐
๐๐
โ
๐ป
๐ต
Here, nc= number of units in each column
iv. SSR =
๐ฎ(๐ฎ๐ฒ๐)๐
๐๐
โ
๐ป๐
๐ต
v. SST = ฮฃxยฒ1+ ฮฃxยฒ2 + โฆ + ฮฃxยฒkโ
๐ป๐
๐ต
Here, nr= number of units in each row
vi. SSE = SST- (SSC + SSR)
vii. Total number of degree of freedom
(N- 1) = Cr โ 1
d.f between column = C-1
d.f between row. = r- 1
d.f between residuals = (C-1)(r-1)
19. 19
TABLE :- ANOVA TABLE: FOR TWO WAY ANOVA
CALCULATION.
S.No.
Source of
variation
Sum of
squares
SS
Degree
of
freedom
d.f
Mean square
MS
Total
Statistic
1
Between
column
SSC C-1
MSS=SSC+C-
1
๐น =
๐๐๐ถ
๐๐๐ธ
2
Between
Rows
SSR r-1
MSR=SSR+
r-1
๐น =
๐๐๐
๐๐๐ธ
3 Residual SSE
(C-1)(r-
1)
MSE=SSE
Total SST N-1
Example 1
Q. 1) In the the following table the crop yield of three varieties of
crop from 4 different fields is shown, test whether (i) the mean yield
of these varieties are equal and so also, (ii) the quality of field mean.
20. 20
Variety of crops Field
I II III IV
A 9 5 9 7
B 7 9 5 5
C 7 8 4 9
CALCULATION
STEP 1:- Substract 5 from each value and arrange the
data into columns and rows showings fields or blocks
and varieties that represent factors.
Variety of
crop
Fields
I II III IV Total of
rows
A 4 0 4 2 R1= 10
B 2 4 0 0 R2= 6
C 2 3 -1 4 R3= 8
Total of
columns
C1 =8 C2= 7 C3=3 C4=6 R=24
21. 21
Here, T = 24
N = 12
Correlation factor (F) =
๐2
๐
=
(24)2
12
=
576
12
= 48
STEP -2 SSC = SUM IF SQUARE BETWEEN COLUMNS (
VARIETY OF CROP)
=
(๐ด๐ถ1)2
๐๐
+
(๐ด๐ถ2)2
๐๐
+
(๐ด๐ถ3)2
๐๐
+
(๐ด๐ถ4)2
๐๐
โ
(๐)2
๐
=
(8)2
3
+
(7)2
3
+
(3)2
3
+
(6)2
3
โ
(24)2
12
= 21.3 + 16.3 + 3 + 12 โ 48
= 52.6 โ 48
= 4.6
d.f. = (C-1) = 4-1
= 3
STEP :- 3 SSR = SUM OF SQUARE BETWEEN ROWS
23. 23
= (4-1) (3-1)
= 3 ร 2
= 6
CONCLUSION
F(tabulated) is compared with F(calculated) at 5% for (C-1),
(r-1)(C-1) i.e. F(3,6) and for (r-1), (r-1)(C-1) i.e. F(2,6).
F(calculated) for F(3,6) = 0.29
SOURCE OF
VARIATION
SUM OF
SQUARES
DEGREE
OF
FREEDOM
MEAN
SQUARE
(MS)
VARIANC
E RATIO
BETWEEN
THE
COLUMN
(VARIETIES
)
2 2 2/2 = 1.00 F=
5.23/1.00
= 5.23
BETWEEN
THE ROW
(FIELDS)
4.6 3 4.6/3 = 1.53 F=5.23/
1.53 = 3.42
RESIDUAL 31.4 6 31.4/6 = 5.23
TOTAL 38.0 11
24. 24
F(calculated) for F (2,6) = 0.19
And, F(tabulated) for F(3,6)= 4.76
F(tabulated) for F(2,6)= 5.14
F (calculated) < F( tabulated)
Hence, F(calculated) is less than F(tabulated), so the null hypothesis
is accepted.
REFERENCE
๏ท BIOSTATISTICS BOOK BY VEER BALA RASTOGI
๏ท BIOSTATISTICS BOOK BY P.N. ARORA & P.K. MALHAN