# Kolmogorov-Smirnov goodness-of-fit test

A description of the Kolmogorov-Smirnov (K-S) one-sample and two-sample tests.

Kolmogorov-Smirnov test
Dr. S. A. Rizwan, M.D., Public Health Specialist, Saudi Board of Community Medicine, Riyadh, Kingdom of Saudi Arabia. Nov 2019
Outline
• Introduction
• Uses
• Advantages and disadvantages
• Example
• Take home messages
Introduction
• The K-S test belongs to a group of tests called goodness-of-fit tests.
• Other members are the chi-square test and the Anderson-Darling test.
• Goodness-of-fit tests check whether there is a significant difference between an observed frequency distribution and a given theoretical (expected) frequency distribution.
• The K-S test is similar to the chi-square test, but has several advantages:
  – It is generally more powerful, particularly for small samples of continuous data.
  – It is easier to compute and use, as no grouping of data is required.
  – Under H0, the sampling distribution of the test statistic does not depend on the expected (theoretical) distribution, provided that distribution is continuous and fully specified.
  – It depends only on the sample size n.
Andrey Kolmogorov, Nikolai Smirnov
Introduction
• The hypotheses:
  – H0: The observed frequency distribution is consistent with the theoretical frequency distribution (good fit).
  – H1: The observed frequency distribution is not consistent with the theoretical frequency distribution (bad fit).
  – α = level of significance of the test.
• The test uses the cumulative distribution functions (CDFs) of the observed and theoretical frequencies.
Introduction
• The test summarizes the distance between two cumulative fraction functions (observed versus theoretical, or one sample versus another) as a single number, D, which is then compared with the critical value of D for the given sample size and level of significance.
K-S test statistic
• $D_n = \max |F_e - F_o|$
• Here,
  – Fe = the expected relative cumulative frequencies (CDF).
  – Fo = the observed relative cumulative frequencies (CDF).
• If the gap between Fe and Fo is large, then H0 should be rejected.
• The K-S test is one-tailed in the sense that H0 is rejected only for large values of Dn.
• The critical values of Dn have been tabulated and can be found in the K-S table for the corresponding level of significance and sample size n.
• The calculated value of Dn is compared with the critical value of Dn.
• If the calculated value > critical value, then reject H0.
K-S tables
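For larger samples, printed K-S tables usually replace exact entries with a closed-form approximation of the kind sketched below. This is an illustrative sketch, not the table from the slide: the function name `ks_critical_value` is hypothetical, and the formula is the standard large-sample approximation derived from the leading term of the asymptotic Kolmogorov distribution. For small n, exact tabulated values should be preferred.

```python
import math


def ks_critical_value(n: int, alpha: float = 0.05) -> float:
    """Large-sample approximation to the one-sample K-S critical value.

    Uses D_crit ~ sqrt(-ln(alpha / 2) / 2) / sqrt(n).
    Assumption: n is large enough for the asymptotic form to be reasonable.
    """
    if n <= 0:
        raise ValueError("n must be positive")
    if not 0.0 < alpha < 1.0:
        raise ValueError("alpha must lie strictly between 0 and 1")
    coefficient = math.sqrt(-math.log(alpha / 2.0) / 2.0)
    return coefficient / math.sqrt(n)


# Critical values for a hypothetical sample of n = 100 observations.
for alpha in (0.10, 0.05, 0.01):
    print(f"alpha = {alpha:.2f}: D_crit ~ {ks_critical_value(100, alpha):.4f}")
```

The coefficients this produces (about 1.22, 1.36 and 1.63 for α = 0.10, 0.05 and 0.01) are the ones commonly printed in the last row of K-S tables.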
Uses of K-S test
• The test is most often used to check the normality assumption.
• It tests goodness of fit or shape and can be used instead of the chi-square test (one-sample K-S test).
• It needs only the maximum difference between two cumulative distribution curves.
• It can either compare a sample with a reference probability distribution (one-sample K-S test) or directly compare two sample datasets (two-sample K-S test).
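The two-sample use can be sketched as follows. This is a minimal illustration rather than a reference implementation: `ks_two_sample_d` is a hypothetical helper, and the two groups are made-up numbers chosen only to show the mechanics of walking along both empirical CDFs at once.

```python
def ks_two_sample_d(sample_a, sample_b):
    """Maximum absolute gap between the empirical CDFs of two samples."""
    a, b = sorted(sample_a), sorted(sample_b)
    n, m = len(a), len(b)
    if n == 0 or m == 0:
        raise ValueError("both samples must be non-empty")
    i = j = 0
    d = 0.0
    while i < n and j < m:
        x = min(a[i], b[j])
        # Step past every observation equal to x in both samples (handles ties).
        while i < n and a[i] == x:
            i += 1
        while j < m and b[j] == x:
            j += 1
        d = max(d, abs(i / n - j / m))
    # Once one sample is exhausted its CDF equals 1 and the gap can only shrink.
    return d


# Hypothetical measurements from two groups.
group_a = [1.2, 2.4, 3.1, 4.8, 5.0]
group_b = [3.5, 4.9, 5.6, 6.3, 7.7, 8.1]
print(f"D = {ks_two_sample_d(group_a, group_b):.4f}")
```

The resulting D would then be compared with a two-sample critical value, which depends on both sample sizes.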
The K-S test
Steps in K-S test
• Sort the data from smallest to largest.
• Convert the data distribution to a cumulative distribution S(x).
• Plot the cumulative distribution along with the comparison distribution.
• Find the maximum absolute difference between the two (the D value).
• If D is greater than the critical D, conclude that the distributions differ; otherwise, there is not enough evidence to conclude that they differ.
• A P-value can also be calculated from the D value and the sample size (both sample sizes in the two-sample test).
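The steps above can be sketched in code for the one-sample case. The function names and the data are hypothetical, the plotting step is omitted, and the P-value uses the asymptotic Kolmogorov series, which is an assumption that is only reasonable for larger samples; statistical packages use more accurate small-sample methods.

```python
import math
from statistics import NormalDist


def ks_one_sample_d(data, cdf):
    """D value: largest gap between the empirical CDF S(x) and cdf(x)."""
    xs = sorted(data)                      # step 1: sort the data
    n = len(xs)
    if n == 0:
        raise ValueError("data must not be empty")
    d = 0.0
    for i, x in enumerate(xs, start=1):
        fe = cdf(x)
        # S(x) jumps from (i - 1)/n to i/n at x, so check both sides of the step.
        d = max(d, i / n - fe, fe - (i - 1) / n)
    return d


def kolmogorov_p_value(d, n):
    """Asymptotic P-value: 2 * sum_{k>=1} (-1)^(k-1) * exp(-2 k^2 n d^2)."""
    lam = math.sqrt(n) * d
    if lam < 0.2:
        # The series converges slowly here and the P-value is essentially 1.
        return 1.0
    series = sum((-1) ** (k - 1) * math.exp(-2.0 * (k * lam) ** 2)
                 for k in range(1, 101))
    return min(1.0, max(0.0, 2.0 * series))


# Hypothetical data tested against a standard normal distribution.
sample = [-1.9, -0.8, -0.3, 0.1, 0.2, 0.6, 0.9, 1.4, 2.2, 2.5]
d_value = ks_one_sample_d(sample, NormalDist().cdf)
print(f"D = {d_value:.4f}, approximate P = {kolmogorov_p_value(d_value, len(sample)):.4f}")
```

Checking both sides of each step matters because the empirical CDF is a staircase: the largest gap can occur just before a jump as well as just after it.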
Strengths of the K-S test
• It is nonparametric.
• The D value does not change if the X values are transformed to logs, reciprocals, or any other monotonic transformation, provided the same transformation is applied to both distributions being compared.
• No restriction on sample size.
• The D value is easy to compute and the graph is easy to understand.
• The one-sample K-S test can serve as a goodness-of-fit test and can link data and theory.
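The invariance of D under transformation can be checked numerically. The sketch below uses made-up positive values and a hypothetical helper `ecdf_gap`; it assumes the same monotonic transformation (log or reciprocal) is applied to both samples.

```python
import math
from bisect import bisect_right


def ecdf_gap(sample_a, sample_b):
    """Two-sample D: largest gap between the two empirical CDFs."""
    a, b = sorted(sample_a), sorted(sample_b)
    # Evaluate both empirical CDFs at every observed value and keep the largest gap.
    return max(
        abs(bisect_right(a, x) / len(a) - bisect_right(b, x) / len(b))
        for x in a + b
    )


# Hypothetical positive measurements, so that logs and reciprocals are defined.
group_1 = [1.5, 2.0, 3.7, 4.1, 9.8]
group_2 = [0.9, 1.1, 2.8, 3.0, 5.2, 6.4]

d_raw = ecdf_gap(group_1, group_2)
d_log = ecdf_gap([math.log(x) for x in group_1], [math.log(x) for x in group_2])
d_reciprocal = ecdf_gap([1 / x for x in group_1], [1 / x for x in group_2])
print(d_raw, d_log, d_reciprocal)
```

All three values agree up to floating-point rounding, because D depends only on the ordering of the observations, not on their spacing.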
Drawbacks
• The K-S test is less sensitive when the differences between the curves are greatest at the beginning or the end of the distributions.
• It works best when the empirical distribution functions (EDFs) deviate the most near the center of the distribution.
• The situation in which normality tests are most needed (small sample sizes) is also the situation in which they perform poorly.
Example 1
• H0: The distribution is normal with μ = 6.80, σ = 1.24.
• H1: The distribution is not normal with these parameters.
• We obtain the Fe values from the normal table, using z = (X - μ)/σ.
• The calculated value of Dn is the maximum value in the |Fe - Fo| column, which is 0.0373.
• For the predecided level of significance, Dcritical = 0.0416.
• Dn < Dcritical, so we do not reject H0 and conclude that the fit is good.
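The normal-table lookup and the final decision in Example 1 can be reproduced as a short sketch. The values of μ, σ, Dn and Dcritical are taken from the example; the X values are hypothetical, since the original data table is not reproduced here, and `expected_cdf` is an illustrative name.

```python
from statistics import NormalDist

MU, SIGMA = 6.80, 1.24      # parameters stated in H0
D_CALCULATED = 0.0373       # maximum |Fe - Fo| reported in the example
D_CRITICAL = 0.0416         # critical value reported in the example


def expected_cdf(x: float) -> float:
    """Fe at x: standardize to z, then look up the standard normal CDF."""
    z = (x - MU) / SIGMA
    return NormalDist().cdf(z)


# Hypothetical X values (not from the original table), one sigma either side of the mean.
for x in (5.56, 6.80, 8.04):
    print(f"X = {x:.2f}  Fe = {expected_cdf(x):.4f}")

decision = "reject H0" if D_CALCULATED > D_CRITICAL else "do not reject H0"
print(decision)
```

`NormalDist().cdf(z)` plays the role of the printed normal table, so the Fe column can be filled in for any X without manual lookup.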
Example 2
Example 3
Take home messages
• The K-S test belongs to a group of tests called goodness-of-fit tests.
• It is used to check the normality assumption.
• The test expresses the distance between two cumulative fraction functions as a single number, "D".
• It is generally a more powerful test than the chi-square test.
• It is less sensitive when the differences between the curves are greatest at the beginning or the end of the distributions.
• SPSS tutorial: https://www.spss-tutorials.com/spss-kolmogorov-smirnov-test-for-normality/
THANK YOU
Kindly email your queries to sarizwan1986@outlook.com