3. Introduction
Named after Andrey Kolmogorov and Nikolai Smirnov
Used when an observed sample distribution has to be compared with a
theoretical distribution
A nonparametric test of the equality of continuous, one-dimensional
probability distributions
Can also be used to determine whether two samples differ significantly
from each other
Type of data: continuous, univariate (involving only one variable)
Null hypothesis: the sample is drawn from the theoretical distribution
(one-sample test), or the two samples are drawn from the same distribution
(two-sample test)
4. One Sample Test
Formula:
D = Maximum|FO(X)−FT(X)|
Where:
FO(X) = Observed cumulative frequency distribution of a random sample
of n observations.
FO(X) = k/n = (No. of observations ≤ X)/(Total no. of observations)
FT(X) = The theoretical cumulative frequency distribution.
Acceptance Criteria: If the calculated value of D is less than the critical
(table) value, do not reject the null hypothesis.
Rejection Criteria: If the calculated value of D is greater than the critical
(table) value, reject the null hypothesis.
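The statistic D can be sketched in a few lines of Python. This is a minimal illustration, not a reference implementation: the function names and the uniform example are assumptions made here, and for a continuous theoretical distribution the gap is checked on both sides of each step of the observed cumulative distribution, which is how the maximum is normally located.

```python
def ks_one_sample_statistic(sample, theoretical_cdf):
    """Return D = max |F_O(x) - F_T(x)| for a sample and a theoretical CDF."""
    xs = sorted(sample)
    n = len(xs)
    d = 0.0
    for i, x in enumerate(xs, start=1):
        f_t = theoretical_cdf(x)
        # The observed CDF jumps from (i - 1)/n to i/n at x,
        # so the gap is checked just before and just after the jump.
        d = max(d, i / n - f_t, f_t - (i - 1) / n)
    return d


def uniform_cdf(x):
    """Hypothetical theoretical distribution: uniform on [0, 1]."""
    return min(max(x, 0.0), 1.0)


sample = [0.1, 0.4, 0.7]  # illustrative data, not from the slides
d = ks_one_sample_statistic(sample, uniform_cdf)
print(d)
```

The returned value is then compared with the critical (table) value for the chosen significance level.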
5. Example:
No. of students interested in joining, by stream (O = observed, T = theoretical):

Stream    Observed (O)   Theoretical (T)   FO(X)    FT(X)    |FO(X)−FT(X)|
B.Sc.     5              12                5/60     12/60    7/60
B.A.      9              12                14/60    24/60    10/60
B.COM.    11             12                25/60    36/60    11/60
M.A.      16             12                41/60    48/60    7/60
M.COM.    19             12                60/60    60/60    0/60
Total     n=60
Source: Tutorialspoint[4]
6. Example contd...
Test statistic D is calculated as:
D = Maximum|FO(X)−FT(X)| = 11/60 = 0.183
The table value of D at the 5% significance level is given by
D0 at 0.05 = 1.36/√n = 1.36/√60 = 0.176
The calculated value (0.183) is greater than the critical value (0.176)
We reject the null hypothesis and conclude that there is a difference
among students of different streams in their intention of joining the Club.
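The arithmetic of this example can be checked with a short Python sketch. The counts are the ones from the example; the variable names are illustrative, and 1.36/√n is the 5% critical value used above.

```python
from fractions import Fraction
from itertools import accumulate
from math import sqrt

# Counts per stream: B.Sc., B.A., B.COM., M.A., M.COM.
observed = [5, 9, 11, 16, 19]
theoretical = [12, 12, 12, 12, 12]
n = sum(observed)  # 60

# Cumulative relative frequencies FO(X) and FT(X)
f_o = [Fraction(c, n) for c in accumulate(observed)]
f_t = [Fraction(c, n) for c in accumulate(theoretical)]

# Absolute differences per stream and their maximum, D
diffs = [abs(o - t) for o, t in zip(f_o, f_t)]
D = max(diffs)

critical = 1.36 / sqrt(n)      # 5% significance level
reject = float(D) > critical   # True means the null hypothesis is rejected

print(D, round(float(D), 3), round(critical, 3), reject)
```

Exact fractions are used so that the largest difference, 11/60 in the B.COM. row, is found without rounding error.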
7. Two Sample Test
May also be used to test whether two underlying one-dimensional
probability distributions differ.
The Kolmogorov–Smirnov statistic is:
Dn,m = Maximum|F1,n(X)−F2,m(X)|
where,
F1,n(X) = the empirical distribution function of the first sample
F2,m(X) = the empirical distribution function of the second sample
The null hypothesis is rejected at level α if
Dn,m > c(α) √((n+m)/(n·m))
where,
n and m are the sizes of the first and second samples respectively
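A minimal Python sketch of the two-sample statistic and the large-sample rejection rule c(α)·√((n+m)/(n·m)) follows. The function names and sample data are assumptions made for illustration, and c(α) is passed in by the caller rather than computed here.

```python
from bisect import bisect_right
from math import sqrt


def ks_two_sample_statistic(first, second):
    """Return the largest gap between the two empirical distribution functions."""
    xs, ys = sorted(first), sorted(second)
    n, m = len(xs), len(ys)
    d = 0.0
    # Both empirical CDFs are step functions, so the maximum gap
    # occurs at one of the observed values.
    for v in xs + ys:
        f1 = bisect_right(xs, v) / n  # fraction of first sample <= v
        f2 = bisect_right(ys, v) / m  # fraction of second sample <= v
        d = max(d, abs(f1 - f2))
    return d


def rejects_null(d, n, m, c_alpha):
    """Apply the rule D > c(alpha) * sqrt((n + m) / (n * m))."""
    return d > c_alpha * sqrt((n + m) / (n * m))


first = [1.0, 2.0, 3.0]    # illustrative data, not from the slides
second = [4.0, 5.0, 6.0]
d = ks_two_sample_statistic(first, second)
print(d, rejects_null(d, len(first), len(second), c_alpha=1.36))
```

With samples this small the threshold is large, so even completely separated samples do not lead to rejection; the rule is an approximation intended for larger n and m.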
The value of c(α) is given by :