# Kolmogorov-Smirnov goodness-of-fit test

Kolmogorov-Smirnov goodness-of-fit test: a description of the K-S one-sample and two-sample tests.


1. Kolmogorov-Smirnov test
   - Demystifying statistics series: Meta-analysis course
   - Dr. S. A. Rizwan, M.D., Public Health Specialist, Saudi Board of Community Medicine, Riyadh, Kingdom of Saudi Arabia
   - Saudi Board of Preventive Medicine, Riyadh; Ministry of Health, KSA
   - Nov 2019
2. Outline
   - Introduction
   - Uses
   - Advantages and disadvantages
   - Example
   - Take home messages
3. Introduction
   - The K-S test belongs to a group of tests called goodness-of-fit tests.
   - Other members are the chi-square test and the Anderson-Darling test.
   - Goodness-of-fit tests check whether there is a significant difference between an observed frequency distribution and a given theoretical (expected) frequency distribution.
   - This is similar to what the chi-square test does, but the K-S test has several advantages:
     - It is generally more powerful for continuous data, particularly with small samples.
     - It is easier to compute and use, as no grouping of data is required.
     - When the theoretical distribution is continuous and fully specified, the distribution of the test statistic under H0 does not depend on which theoretical distribution is being tested; it depends only on the sample size n.
4. Andrey Kolmogorov, Nikolai Smirnov
5. Introduction
   - The hypotheses:
     - H0: The observed frequency distribution is consistent with the theoretical frequency distribution (good fit).
     - H1: The observed frequency distribution is not consistent with the theoretical frequency distribution (bad fit).
     - α = level of significance of the test.
   - The test compares the cumulative distribution functions (CDFs) of the observed and theoretical frequencies.
6. Introduction
   - The basis of the test is that it summarizes the distance between two cumulative distribution functions (observed versus theoretical, or one sample versus another) as a single number, D, which is then compared with the critical value of D for the chosen significance level and sample size.
7. K-S test statistic
   - Dn = max | Fe - Fo |, where
     - Fe = the expected relative cumulative frequencies (CDF).
     - Fo = the observed relative cumulative frequencies (CDF).
   - If the gap between Fe and Fo is large, then H0 should be rejected.
   - H0 is rejected only for large values of Dn, so the rejection region lies in the upper tail of its distribution; in this sense the K-S test is one-tailed.
   - The critical values of Dn have been tabulated and can be found in the K-S table for the corresponding level of significance and sample size n.
   - The calculated value of Dn is compared with the critical value of Dn.
   - If the calculated value > critical value, then reject H0.
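In symbols, and consistent with Example 1, where Dn is taken as the maximum of the | Fe - Fo | column, the statistic can be written as below. The supremum over x is the usual textbook form and is an assumption here, since the formula image on this slide did not survive in the transcript.

```latex
D_n = \sup_{x} \left| F_o(x) - F_e(x) \right|
```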
8. K-S tables
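The table itself is not reproduced in this transcript. For large samples, tabulated one-sample critical values are commonly approximated by c(α)/√n with c(α) = sqrt(-ln(α/2)/2), which gives about 1.36/√n at α = 0.05. The sketch below assumes that large-sample approximation; the function name `ks_critical_value` is illustrative and not from the slides, and for small n an exact table should be used instead.

```python
import math


def ks_critical_value(n: int, alpha: float = 0.05) -> float:
    """Approximate large-sample critical value of D_n (one-sample test).

    Assumption: uses c(alpha) = sqrt(-ln(alpha / 2) / 2) and
    D_crit ~ c(alpha) / sqrt(n). This is asymptotic, not an exact table value.
    """
    if n <= 0:
        raise ValueError("n must be positive")
    if not 0.0 < alpha < 1.0:
        raise ValueError("alpha must lie strictly between 0 and 1")
    c_alpha = math.sqrt(-0.5 * math.log(alpha / 2.0))
    return c_alpha / math.sqrt(n)


# Illustrative sample sizes only.
for n in (50, 100, 1000):
    print(n, round(ks_critical_value(n, alpha=0.05), 4))
```

The critical value shrinks as n grows, so with large samples even small gaps between the two curves lead to rejection of H0.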
9. Uses of the K-S test
   - It is commonly used to check the normality assumption.
   - The one-sample K-S test assesses goodness of fit (the shape of a distribution) and can be used instead of the chi-square test.
   - It needs only the maximum difference between two cumulative distribution curves.
   - It can either compare a sample with a reference probability distribution (one-sample K-S test) or directly compare two sample datasets (two-sample K-S test).
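For the two-sample case, the maximum difference between the two cumulative curves can be computed directly from the data. This is a minimal sketch using only the Python standard library; the function name `ks_two_sample_statistic` is hypothetical and not from the slides.

```python
from bisect import bisect_right
from typing import Sequence


def ks_two_sample_statistic(a: Sequence[float], b: Sequence[float]) -> float:
    """Maximum absolute difference between the two empirical CDFs."""
    if len(a) == 0 or len(b) == 0:
        raise ValueError("both samples must be non-empty")
    sa, sb = sorted(a), sorted(b)
    d = 0.0
    # Both empirical CDFs change only at observed values,
    # so it is enough to evaluate the gap at those points.
    for x in sa + sb:
        fa = bisect_right(sa, x) / len(sa)  # fraction of a that is <= x
        fb = bisect_right(sb, x) / len(sb)  # fraction of b that is <= x
        d = max(d, abs(fa - fb))
    return d
```

Identical samples give D = 0, and samples that do not overlap at all give D = 1.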
10. The KS test
11. Steps in the K-S test
    - Sort the data from smallest to largest.
    - Convert the data distribution to a cumulative distribution S(x).
    - Plot the cumulative distribution along with the comparison distribution.
    - Find the maximum absolute difference (the D value).
    - If D is greater than the critical D, conclude that the distributions differ; otherwise there is not enough evidence of a difference between them.
    - A p-value can also be calculated from the D value and the sample size (both sample sizes in the two-sample case).
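The steps above can be sketched for the one-sample case as follows. This sketch makes several assumptions: the reference distribution is continuous and fully specified, the p-value uses the large-sample Kolmogorov series (approximate for small n), and the function names and sample values are hypothetical rather than taken from the slides. Because S(x) is a step function, the gap is checked on both sides of each jump.

```python
import math
from statistics import NormalDist
from typing import Callable, Sequence


def ks_one_sample_statistic(data: Sequence[float],
                            cdf: Callable[[float], float]) -> float:
    """D_n for a sample against a fully specified continuous CDF."""
    xs = sorted(data)  # step 1: sort from smallest to largest
    n = len(xs)
    if n == 0:
        raise ValueError("data must be non-empty")
    d = 0.0
    for i, x in enumerate(xs, start=1):
        fe = cdf(x)  # expected cumulative probability at x
        # S(x) jumps from (i - 1) / n to i / n at x, so compare both sides.
        d = max(d, i / n - fe, fe - (i - 1) / n)
    return d


def ks_asymptotic_p_value(d: float, n: int, terms: int = 100) -> float:
    """Approximate p-value from the large-sample Kolmogorov series."""
    lam = math.sqrt(n) * d
    if lam < 0.2:
        # The series converges slowly here and the p-value is essentially 1.
        return 1.0
    total = sum((-1) ** (k - 1) * math.exp(-2.0 * k * k * lam * lam)
                for k in range(1, terms + 1))
    return min(1.0, max(0.0, 2.0 * total))


# Hypothetical data, tested against a normal distribution with
# illustrative parameters.
sample = [5.1, 6.3, 6.8, 7.0, 7.4, 8.2, 6.1, 6.9]
d_value = ks_one_sample_statistic(sample, NormalDist(mu=6.80, sigma=1.24).cdf)
print(d_value, ks_asymptotic_p_value(d_value, len(sample)))
```

With only eight observations the asymptotic p-value is a rough guide at best; an exact table or an exact-distribution routine is preferable for samples this small.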
12. Strengths of the K-S test
    - It is nonparametric.
    - The D value does not change if the X values are transformed to logs, reciprocals, or any other strictly monotonic transformation (in the one-sample case, the reference distribution must be transformed in the same way).
    - There is no restriction on sample size.
    - The D value is easy to compute and the graph is easy to understand.
    - The one-sample K-S test can serve as a goodness-of-fit test and can link data and theory.
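The invariance of D under transformation can be checked numerically for the two-sample case. The data below are hypothetical positive values, and the helper `two_sample_d` is illustrative and not from the slides. The result holds because logs and reciprocals are strictly monotonic on positive values, so they preserve (or exactly reverse) the ordering of the observations.

```python
import math
from bisect import bisect_right


def two_sample_d(a, b):
    """Maximum absolute gap between the empirical CDFs of a and b."""
    sa, sb = sorted(a), sorted(b)
    return max(
        abs(bisect_right(sa, x) / len(sa) - bisect_right(sb, x) / len(sb))
        for x in sa + sb
    )


# Hypothetical positive-valued samples, for illustration only.
group_a = [1.2, 2.5, 3.1, 4.8, 7.9]
group_b = [2.0, 3.6, 5.5, 9.4, 12.1]

d_raw = two_sample_d(group_a, group_b)
d_log = two_sample_d([math.log(x) for x in group_a],
                     [math.log(x) for x in group_b])
d_inv = two_sample_d([1 / x for x in group_a],
                     [1 / x for x in group_b])

# The three values agree up to floating-point rounding.
print(d_raw, d_log, d_inv)
```

A transformation that is not monotonic, such as squaring data that contain both negative and positive values, would not preserve D.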
13. Drawbacks
    - The K-S test is less sensitive when the difference between the curves is greatest at the beginning or the end of the distributions (the tails).
    - It works best when the empirical distribution functions (EDFs) deviate most near the center of the distribution.
    - The situation in which normality tests are most needed, small sample sizes, is also the situation in which they perform poorly.
14. Example 1
    - H0: The distribution is normal with μ = 6.80, σ = 1.24.
    - H1: The distribution is not normal with these parameters.
    - Fe values are obtained from the normal table using z = (X - μ) / σ.
    - The calculated value of Dn is the maximum value in the | Fe - Fo | column, here 0.0373.
    - For the pre-decided level of significance, Dcritical = 0.0416.
    - Dn < Dcritical, so we fail to reject H0 and conclude that the fit is good.
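The data table for this example is not reproduced in the transcript, so the sketch below only illustrates the two mechanical parts of the example: looking up Fe from z, and comparing Dn with Dcritical. The values μ, σ, Dn, and Dcritical are those quoted in the example; the value x = 8.04 is hypothetical and chosen only because it lies one standard deviation above the mean.

```python
from statistics import NormalDist

# Parameters and results quoted in Example 1.
reference = NormalDist(mu=6.80, sigma=1.24)
d_calculated = 0.0373
d_critical = 0.0416


def expected_cdf(x: float) -> float:
    """Fe at x: the normal-table lookup for z = (x - mu) / sigma."""
    z = (x - reference.mean) / reference.stdev
    return NormalDist().cdf(z)  # standard normal CDF


# Hypothetical x value (not from the slide), one sigma above the mean.
fe_example = expected_cdf(8.04)

# Decision rule from the slides: reject H0 only if Dn exceeds Dcritical.
reject_h0 = d_calculated > d_critical
print(f"Fe(8.04) = {fe_example:.4f}, reject H0: {reject_h0}")
```

Because 0.0373 is below 0.0416, the rule does not reject H0, which matches the conclusion stated in the example.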
15. Example 2
16. Example 3
17. Take home messages
    - The K-S test belongs to a group of tests called goodness-of-fit tests.
    - It is used to check the normality assumption.
    - The basis of the test is that it expresses the distance between two cumulative distribution functions as a single number, "D".
    - It is generally a more powerful test than the chi-square test.
    - It is less sensitive when the difference between the curves is greatest at the beginning or the end of the distributions.
    - SPSS tutorial: https://www.spss-tutorials.com/spss-kolmogorov-smirnov-test-for-normality/
18. Thank you
    - Kindly email your queries to sarizwan1986@outlook.com