### Tests of significance

1. Speaker: Shubhanshu Gupta. Teacher I/C: Dr. S.K. Verma. Date: 16/09/2014
2. Contents: Historical aspect. Basis of statistical inference. Hypothesis and its testing. Characteristics of a hypothesis. Null hypothesis. Alternative hypothesis. Interpreting the result of a hypothesis test. Type I error and Type II error. One-tailed test, two-tailed test. Effect of sample size. Test of significance. Parametric vs non-parametric tests. Parametric tests. Non-parametric tests. References.
3. Historical aspect: The term "statistical significance" was coined by Ronald Fisher (1890-1962). Student's t-test was developed by William Sealy Gosset, who published under the pseudonym "Student".
4. Statistical inference is the branch of statistics concerned with using probability concepts to deal with uncertainty in decision making. It refers to the process of selecting and using a sample to draw inferences about the population from which the sample is drawn.
5. Statistical inference has two branches:
    - Estimation of a population value
        - Point estimation (mean, proportion estimation)
        - Range estimation (confidence interval estimation)
    - Testing of hypothesis
6. Hypothesis and its testing:
    - During an investigation there are assumptions and presumptions which must subsequently be proved or disproved in the study.
    - A hypothesis is a supposition made from observation. On the basis of the hypothesis we collect data.
    - A hypothesis is a tentative justification, the validity of which remains to be tested.

    Two hypotheses are made to draw an inference from the sample value: (A) the Null Hypothesis, or hypothesis of no difference, and (B) the Alternative Hypothesis, or hypothesis of significant difference.
7. The Null Hypothesis is symbolized as H0 and the Alternative Hypothesis as H1 or HA. In hypothesis testing we proceed on the basis of the Null Hypothesis, while always keeping the Alternative Hypothesis in mind. Both hypotheses are chosen before the sample is drawn.
8. Characteristics of a hypothesis:
    1. It should be clear and precise.
    2. It should be capable of being tested.
    3. It should state the relationship between variables.
    4. It must be specific.
    5. It should be stated as simply as possible.
    6. It should be amenable to testing within a reasonable time.
    7. It should be consistent with known facts.
9. A Null Hypothesis, or hypothesis of no difference (H0), between a sample statistic and a population parameter, or between the statistics of two samples, nullifies the claim that the experimental result is different from or better than the one already observed. In other words, the Null Hypothesis states that the observed difference is entirely due to sampling error, that is, it has occurred purely by chance.
10. Examples of null hypotheses:
    - There is no difference between the operative procedures of open prostatectomy and TURP.
    - There is no difference between the open operation and the transsphenoidal approach.
    - There is no difference in the incidence of measles between vaccinated and non-vaccinated children.
    - The drug chloramphenicol is as good as the drug cotrimoxazole in treating enteric fever.
11. The Alternative Hypothesis of significant difference states that the sample result is different from, that is, greater or smaller than, the hypothetical value of the population. A test of significance such as the Z-test, t-test or chi-square test is performed to decide whether to accept the Null Hypothesis, or to reject it and accept the Alternative Hypothesis.
12. Interpreting the result:
    - The hypothesis H0 is true: our test accepts it because the result falls within the zone of acceptance at the 5% level of significance.
    - The hypothesis H0 is false: the test rejects it because the estimate falls in the area of rejection.
13. Zone of acceptance: if the result of a sample falls in the plain (unshaded) area of the sampling distribution, i.e. within mean ± 1.96 standard errors, the Null Hypothesis is accepted. This area is called the zone of acceptance. Zone of rejection: if the result of a sample falls outside the plain area, i.e. beyond mean ± 1.96 standard errors, it is significantly different from the population value, so the Null Hypothesis is rejected and the Alternative Hypothesis is accepted. This area is called the zone of rejection.
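The zone of acceptance described above can be sketched in a few lines of Python. This is only an illustration: the population mean of 100 and standard error of 2 are hypothetical values, not taken from the slides, and the function names are made up for this example.

```python
from statistics import NormalDist

def acceptance_zone(pop_mean, std_error, alpha=0.05):
    """Return the (lower, upper) limits of the zone of acceptance."""
    z_crit = NormalDist().inv_cdf(1 - alpha / 2)  # about 1.96 when alpha = 0.05
    return pop_mean - z_crit * std_error, pop_mean + z_crit * std_error

def is_accepted(sample_mean, pop_mean, std_error, alpha=0.05):
    """True when the sample mean falls inside the zone of acceptance."""
    lower, upper = acceptance_zone(pop_mean, std_error, alpha)
    return lower <= sample_mean <= upper

# Hypothetical values for illustration only
lower, upper = acceptance_zone(pop_mean=100.0, std_error=2.0)
print(round(lower, 2), round(upper, 2))
print(is_accepted(103.0, 100.0, 2.0))   # inside the zone
print(is_accepted(104.5, 100.0, 2.0))   # outside the zone
```

A sample mean of 103 lies inside 100 ± 1.96 × 2, so H0 is retained; 104.5 lies outside it, so H0 is rejected.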
14. [Figure: Setting a criterion. Sampling distribution under the Null Hypothesis, with "Accept H0" between −Zcrit and +Zcrit and "Reject H0" in each tail beyond Zcrit.]
15. When a Null Hypothesis is tested, there are four possible outcomes:
    1. The Null Hypothesis is true but our test rejects it.
    2. The Null Hypothesis is false but our test accepts it.
    3. The Null Hypothesis is true and our test accepts it.
    4. The Null Hypothesis is false and our test rejects it.

    Type 1 error: rejecting the Null Hypothesis when it is true. It is called the "α error". Type 2 error: accepting the Null Hypothesis when it is false. It is called the "β error".
16. Decision table:

    | | Accept H0 | Reject H0 |
    |---|---|---|
    | H0 true | Correct decision | Type 1 error |
    | H0 false | Type 2 error | Correct decision |
17. Example for a new regime:

    | Truth \ Decision | Same effect | More effective |
    |---|---|---|
    | New regime is not better | Correct decision | Error |
    | New regime is better | Error | Correct decision |
18. Probabilities of each outcome:

    | Decision | Null Hypothesis is true | Null Hypothesis is false |
    |---|---|---|
    | Accept if p ≥ 0.05 (non-significant; conclusion negative) | 1 − α (confidence level) | β (type 2 error) |
    | Reject if p < 0.05 (significant; conclusion positive) | α (type 1 error) | 1 − β (power of the test) |
19. The probability of committing a Type 1 error that we are willing to tolerate is the level of significance, α. The p-value is the probability, when the Null Hypothesis is true, of obtaining a difference at least as large as the one observed; it is thus the chance that a difference of this size appears when actually there is none. When the p-value is between 0.05 and 0.01, the result is usually called significant. When the p-value is less than 0.01, the result is often called highly significant. When the p-value is less than 0.005 or 0.001, the result is taken as very highly significant.
20. The statistical power of a test is the probability that a study or a trial will be able to detect a specified difference. It is calculated as 1 − probability of a type II error, i.e. the probability of correctly concluding that a difference exists when it is indeed present. Thus, power = 1 − β.
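As a rough illustration of power = 1 − β, the sketch below computes the power of a right-tailed Z-test with known SD. The choice of a Z-test, the function name, and all numbers (a true difference of 5, SD of 10) are assumptions made for this example, not values from the slides.

```python
from math import sqrt
from statistics import NormalDist

def power_one_sided_z(delta, sd, n, alpha=0.05):
    """Power of a right-tailed Z-test to detect a true mean shift `delta`."""
    std_error = sd / sqrt(n)
    z_crit = NormalDist().inv_cdf(1 - alpha)
    # beta: probability that the test statistic stays below the critical
    # value even though the true mean is shifted by `delta`
    beta = NormalDist().cdf(z_crit - delta / std_error)
    return 1 - beta

# Hypothetical values for illustration only
power_small = power_one_sided_z(delta=5.0, sd=10.0, n=25)
power_large = power_one_sided_z(delta=5.0, sd=10.0, n=100)
print(round(power_small, 3), round(power_large, 3))
```

With these numbers, quadrupling the sample size raises the power from roughly 0.8 to almost 1. When `delta` is zero the function returns α, since any rejection is then a type I error.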
21. Confidence interval: the interval within which a parameter value is expected to lie with a certain confidence level, as could be revealed by repeated samples. Confidence level: the degree of assurance that an interval contains the value of a parameter (1 − α).
22. ONE-TAILED TEST: If HA states that μ is < some value, the critical region occupies the left tail. If HA states that μ is > some value, the critical region occupies the right tail.
23. RIGHT-TAILED TEST: H0: μ = 100, H1: μ > 100. [Figure: distribution centred at 100; "Fail to reject H0" to the left of Zcrit; "Reject H0" in the right tail of area alpha, containing values that differ "significantly" from 100.]
24. LEFT-TAILED TEST: H0: μ = 100, H1: μ < 100. [Figure: distribution centred at 100; "Fail to reject H0" to the right of Zcrit; "Reject H0" in the left tail of area alpha, containing values that differ "significantly" from 100.]
25. TWO-TAILED HYPOTHESIS TESTING: HA is that μ is either greater or less than μH0, i.e. HA: μ ≠ μH0. α is divided equally between the two tails of the critical region.
26. TWO-TAILED HYPOTHESIS TESTING: H0: μ = 100, H1: μ ≠ 100. [Figure: distribution centred at 100; "Fail to reject H0" between the two Zcrit values; "Reject H0" in both tails (total area alpha), containing means less than or greater than 100 that differ significantly from it.]
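The critical values that bound the rejection regions in the one-tailed and two-tailed cases can be looked up from the standard normal distribution. The helper below is a minimal sketch; the function name and the `tail` argument are invented for this example.

```python
from statistics import NormalDist

def z_critical(alpha=0.05, tail="two"):
    """Return the critical Z value(s) for a left, right, or two-tailed test."""
    std_normal = NormalDist()
    if tail == "left":
        return (std_normal.inv_cdf(alpha),)
    if tail == "right":
        return (std_normal.inv_cdf(1 - alpha),)
    if tail == "two":
        # alpha is split equally between the two tails
        z = std_normal.inv_cdf(1 - alpha / 2)
        return (-z, z)
    raise ValueError("tail must be 'left', 'right', or 'two'")

for tail in ("left", "right", "two"):
    print(tail, [round(z, 3) for z in z_critical(0.05, tail)])
```

At α = 0.05 the one-tailed cut-off is about 1.645 in magnitude, while the two-tailed cut-offs are about ±1.96, because each tail then holds only α/2.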
27. Effect of sample size:
    - With large n (say, n > 30), the assumption of a normal population distribution is not important.
    - For a given observed sample mean and standard deviation, the larger the sample size n, the larger the test statistic (because the denominator is smaller) and the smaller the P-value.
    - We are more likely to reject a false H0 when we have a larger sample size (the test then has more "power").
    - With large n, "statistical significance" is not the same as "practical significance".
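The effect of sample size on the test statistic can be seen numerically. The sketch below holds a hypothetical observed difference (1.0) and SD (5.0) fixed and varies only n, using the large-sample normal approximation; none of these numbers come from the slides.

```python
from math import sqrt
from statistics import NormalDist

def z_and_p(diff, sd, n):
    """Test statistic and two-tailed P-value for a fixed observed difference."""
    z = diff / (sd / sqrt(n))          # the denominator shrinks as n grows
    p = 2 * (1 - NormalDist().cdf(abs(z)))
    return z, p

# Hypothetical values: same difference and SD, increasing sample size
results = []
for n in (36, 144, 576):
    z, p = z_and_p(diff=1.0, sd=5.0, n=n)
    results.append((n, z, p))
    print(n, round(z, 2), round(p, 4))
```

The same difference of 1 unit is non-significant at n = 36 but significant at n = 144, which is why statistical significance in large samples need not imply practical importance.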
28. A test of significance is a formal procedure for comparing observed data with a claim (also called a hypothesis) whose truth we want to assess. It is used to test a claim about an unknown population parameter. A significance test uses data to evaluate a hypothesis by comparing sample point estimates of parameters to the values predicted by the hypothesis. We answer a question such as, "If the hypothesis were true, would it be unlikely to get data such as we obtained?"
29. Parametric vs non-parametric tests:
    - Parametric tests: based on a specific distribution such as the Gaussian.
    - Non-parametric tests: not based on any particular parameter such as the mean; do not require that the means follow a particular distribution such as the Gaussian; used when the underlying distribution is far from Gaussian (applicable to almost all levels of distribution) and when the sample size is small.
30. Parametric tests: Student's t-test (one sample, two sample, and paired); Z-test; ANOVA F-test; Pearson's correlation (r); ANCOVA. Non-parametric tests: Sign test (for paired data); Wilcoxon signed-rank test (for matched pairs); Wilcoxon rank sum test (for unpaired data); Chi-square test; Spearman's rank correlation (ρ); Kruskal-Wallis test.
31. Choice of test by purpose:

    | Purpose of application | Parametric test | Non-parametric test |
    |---|---|---|
    | Comparison of two independent groups | t-test for independent samples | Wilcoxon rank sum test |
    | Test the difference between paired observations | t-test for paired observations | Wilcoxon signed-rank test |
    | Comparison of several groups | ANOVA | Kruskal-Wallis test |
    | Quantify linear relationship between two variables | Pearson's correlation | Spearman's rank correlation |
    | Test the association between two qualitative variables | _ | Chi-square test |
32. Student's t-test: a statistical criterion to test the hypothesis that a mean is a specified value, or that a specified difference, or no difference, exists between two means. It requires a Gaussian distribution of the values, but is used when the SD is not known. Proportion test (Z-test): a statistical test of hypothesis based on the Gaussian distribution, generally used to compare two means or two proportions in large samples, particularly when the SD is known. ANOVA F-test: used when three or more groups are compared and the objective is to compare the means of a quantitative variable.
33. Types of t-test. One sample: only one group is studied and an externally determined claim is examined. Two sample: there are two groups to compare. Paired: used when two sets of measurements are available, but they are paired.
34. Steps of the one-sample t-test:
    1. Find the difference between the actually observed mean and the claimed mean.
    2. Estimate the standard error (SE) of the mean by $s/\sqrt{n}$, where s is the standard deviation and n is the number of subjects in the actually studied sample. The SE measures the inter-sample variability.
    3. Check whether the difference obtained in step 1 is sufficiently large relative to the SE. For this, calculate Student's t (the difference divided by the SE). This is called the test criterion.
    4. Rejection or non-rejection of the null depends on the value of this t. Reject the null hypothesis if the magnitude of the calculated t is more than the critical value corresponding to the pre-fixed alpha level of significance and the appropriate df.
35. Example: There are 10 patients with arthritis. Suppose the reduction in pain after using newspirin, on a 10-point visual analog scale, is as follows: 0, 3, 6, 1, 1, 4, 0, 2, 1, 5. The claim under test (the Null Hypothesis) is that the mean reduction is 3 points. Mean reduction $\bar{x} = 2.3$ points and SD $s = 2.11$. By the formula:

    $$t = \frac{2.3 - 3.0}{2.11/\sqrt{10}} = -1.049$$

    With n = 10, df = 10 − 1 = 9. For one-tailed α = 0.05 and df = 9, the critical value of t is 1.833. Since the magnitude of the calculated t (1.049) is less than the critical value 1.833, the Null Hypothesis that the mean reduction in pain is 3 points cannot be rejected.
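The arithmetic of this example can be reproduced with the Python standard library. This is a sketch of the calculation only; the critical value 1.833 is copied from the slide rather than computed, and the variable names are chosen for this illustration.

```python
from math import sqrt
from statistics import mean, stdev

# Pain reduction scores from the slide (10-point visual analog scale)
reductions = [0, 3, 6, 1, 1, 4, 0, 2, 1, 5]
claimed_mean = 3.0
t_critical = 1.833  # one-tailed alpha = 0.05, df = 9, as quoted on the slide

n = len(reductions)
x_bar = mean(reductions)
s = stdev(reductions)              # sample SD (divides by n - 1)
std_error = s / sqrt(n)
t_stat = (x_bar - claimed_mean) / std_error

# The slide compares the magnitude of t with the critical value
reject_null = abs(t_stat) > t_critical
print(round(x_bar, 2), round(s, 2), round(t_stat, 3), reject_null)
```

Running this gives the same mean (2.3), SD (2.11) and t (about −1.049) as the slide, and the null hypothesis is not rejected.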
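36. Two-sample t-test:

    $$t = \frac{\bar{y}_1 - \bar{y}_2}{s_p\sqrt{\dfrac{1}{n_1} + \dfrac{1}{n_2}}}, \qquad s_p = \sqrt{\frac{(n_1 - 1)s_1^2 + (n_2 - 1)s_2^2}{n_1 + n_2 - 2}}$$

    where $s_p$ is the pooled SD and df = $n_1 + n_2 - 2$.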
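37. Example: A study on 24-hour creatinine excretion in healthy male and female adults, to examine whether a difference exists. For our illustration, the values obtained for 15 subjects in each group are given in the table:

    | Group | 1 | 2 | 3 | 4 | 5 | 6 | 7 | 8 | 9 | 10 | 11 | 12 | 13 | 14 | 15 |
    |---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
    | Men | 16.6 | 19.8 | 17.1 | 15.6 | 20.3 | 24.7 | 18.5 | 17.6 | 22.0 | 24.9 | 18.4 | 16.9 | 21.1 | 17.0 | 23.3 |
    | Women | 23.2 | 22.0 | 21.9 | 14.2 | 23.2 | 24.8 | 25.5 | 28.1 | 21.8 | 20.9 | 18.0 | 19.5 | 20.6 | 16.7 | 17.3 |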
38. df = $n_1 + n_2 - 2$ = 15 + 15 − 2 = 28. In men, $\bar{y}_1 = 19.59$ and $s_1 = 3.03$; in women, $\bar{y}_2 = 21.18$ and $s_2 = 3.65$.

    $$s_p = \sqrt{\frac{(15 - 1)(3.03)^2 + (15 - 1)(3.65)^2}{15 + 15 - 2}} = 3.35$$

    $$t = \frac{19.59 - 21.18}{3.35\sqrt{\dfrac{1}{15} + \dfrac{1}{15}}} = \frac{-1.59}{1.2232} = -1.30$$

    The critical value of t (two-tailed α = 0.05, df = 28) is 2.048; the magnitude of the calculated value is less than the critical value. Thus the Null Hypothesis of equality cannot be rejected.
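The pooled-SD calculation above can be checked with a short script. The data are the 15 values per group as read from the slide's table; the critical value 2.048 is taken from the slide rather than computed, and the variable names are chosen for this sketch.

```python
from math import sqrt
from statistics import mean, stdev

# 24-hour creatinine excretion values, as read from the slide's table
men = [16.6, 19.8, 17.1, 15.6, 20.3, 24.7, 18.5, 17.6,
       22.0, 24.9, 18.4, 16.9, 21.1, 17.0, 23.3]
women = [23.2, 22.0, 21.9, 14.2, 23.2, 24.8, 25.5, 28.1,
         21.8, 20.9, 18.0, 19.5, 20.6, 16.7, 17.3]

n1, n2 = len(men), len(women)
s1, s2 = stdev(men), stdev(women)
df = n1 + n2 - 2

# Pooled SD: weighted combination of the two sample variances
pooled_sd = sqrt(((n1 - 1) * s1**2 + (n2 - 1) * s2**2) / df)
t_stat = (mean(men) - mean(women)) / (pooled_sd * sqrt(1 / n1 + 1 / n2))

t_critical = 2.048  # two-tailed alpha = 0.05, df = 28, as quoted on the slide
reject_null = abs(t_stat) > t_critical
print(df, round(pooled_sd, 2), round(t_stat, 2), reject_null)
```

Working from the raw values instead of the rounded summary statistics gives essentially the same pooled SD and t as the slide.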
39. Paired t-test: obtain the difference for each pair and test the null hypothesis that the mean of these differences is zero (this null hypothesis is the same as saying that the means before and after are equal). For paired samples:

    $$t = \frac{\bar{d}}{s_d/\sqrt{n}}$$

    where $\bar{d}$ is the sample mean of the differences, $s_d$ is the standard deviation of the differences, and n is the number of pairs.
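40. Example: Consider the serum albumin level of 8 randomly chosen patients with dengue haemorrhagic fever before and after treatment. The values are tabulated below:

    | Patient | 1 | 2 | 3 | 4 | 5 | 6 | 7 | 8 |
    |---|---|---|---|---|---|---|---|---|
    | Before treatment | 5.1 | 3.8 | 4.0 | 4.7 | 4.5 | 4.8 | 4.1 | 3.6 |
    | After treatment | 4.8 | 3.7 | 3.8 | 4.7 | 4.6 | 5.0 | 4.0 | 3.4 |
    | Difference (d) | 0.3 | 0.1 | 0.2 | 0 | -0.1 | -0.2 | 0.1 | 0.2 |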
41. Mean difference $\bar{d}$ = 0.6/8 = 0.075 g/dl, and SD of the differences $s_d$ = 0.17.

    $$t = \frac{0.075}{0.17/\sqrt{8}} = 1.25$$

    df = 8 − 1 = 7. The critical value of t (two-tailed α = 0.05) is 2.365. Since the calculated value is less, the null hypothesis of no difference cannot be rejected.
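A minimal sketch of the same paired calculation in Python follows. It uses the before and after albumin values from the slide and the slide's critical value of 2.365. Because it keeps the SD unrounded (about 0.167 rather than 0.17), it gives t of roughly 1.27 instead of the slide's 1.25; the conclusion is unchanged.

```python
from math import sqrt
from statistics import mean, stdev

# Serum albumin (g/dl) before and after treatment, from the slide
before = [5.1, 3.8, 4.0, 4.7, 4.5, 4.8, 4.1, 3.6]
after = [4.8, 3.7, 3.8, 4.7, 4.6, 5.0, 4.0, 3.4]

differences = [b - a for b, a in zip(before, after)]
n = len(differences)
d_bar = mean(differences)
s_d = stdev(differences)
t_stat = d_bar / (s_d / sqrt(n))

t_critical = 2.365  # two-tailed alpha = 0.05, df = 7, as quoted on the slide
reject_null = abs(t_stat) > t_critical
print(round(d_bar, 3), round(s_d, 2), round(t_stat, 2), reject_null)
```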
42. Z-test: used for large quantitative data (i.e. n > 30). Application: to find the standard error of the difference between two sample means, i.e. $SE(\bar{X}_1 - \bar{X}_2)$, e.g. to find out whether there is a significant difference between two groups, such as the efficacy of two drugs.
43. State the Null Hypothesis (H0) and its Alternative Hypothesis (H1). Find the value of the test statistic Z as follows:

    $$Z = \frac{\bar{X}_1 - \bar{X}_2}{SE(\bar{X}_1 - \bar{X}_2)}, \qquad SE(\bar{X}_1 - \bar{X}_2) = \sqrt{\frac{SD_1^2}{n_1} + \frac{SD_2^2}{n_2}}$$
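The Z formula above translates directly into code. In the sketch below the function name and all the numbers (means of 52 and 49, SDs of 8 and 6, sample sizes of 64 and 36) are hypothetical, chosen only to illustrate the calculation.

```python
from math import sqrt
from statistics import NormalDist

def z_two_means(mean1, sd1, n1, mean2, sd2, n2):
    """Z statistic and two-tailed P-value for the difference of two means."""
    se_diff = sqrt(sd1**2 / n1 + sd2**2 / n2)   # SE of (mean1 - mean2)
    z = (mean1 - mean2) / se_diff
    p = 2 * (1 - NormalDist().cdf(abs(z)))
    return z, p

# Hypothetical summary statistics for two large groups
z, p = z_two_means(mean1=52.0, sd1=8.0, n1=64, mean2=49.0, sd2=6.0, n2=36)
print(round(z, 3), round(p, 4))
```

Here the magnitude of Z exceeds 1.96, so the difference would be declared significant at the 5% level in a two-tailed test.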
44. Situations where it is used:
    1. in the two-sample situation;
    2. in the paired set-up;
    3. in repeated measures, when the same subject is measured at different time points, such as after 5 minutes, 15 minutes, 30 minutes, 60 minutes, etc.;
    4. in removing the effect of a covariate;
    5. in regression.
45. Correlation is the relationship between two or more paired factors or two or more sets of scores. The degree of relationship is usually measured and represented by a correlation coefficient. A correlation coefficient is a numerical measure of the linear relationship between two factors or sets of scores. The coefficient is denoted by the letter r, the Greek letter rho (ρ), or another symbol, depending on the manner in which it has been computed.
46. The obtained correlation coefficient can range from −1.00 to +1.00. The sign of the coefficient indicates the direction of the relationship and the numerical value its strength.

    | Correlation coefficient | Degree of relationship |
    |---|---|
    | .00 - .20 | Negligible |
    | .21 - .40 | Low |
    | .41 - .60 | Moderate |
    | .61 - .80 | Substantial |
    | .81 - 1.00 | High to very high |
47. Types of correlation. Type 1: positive, negative, no correlation, perfect.
48. Type 2: linear, non-linear. Type 3: simple, multiple, partial.
49. Pearson's correlation coefficient:

    $$r = \frac{N\sum XY - (\sum X)(\sum Y)}{\sqrt{\left[N\sum X^2 - (\sum X)^2\right]\left[N\sum Y^2 - (\sum Y)^2\right]}}$$

    where N = number of paired observations; ΣXY = sum of the cross products of X and Y; ΣX = sum of the scores under variable X; ΣY = sum of the scores under variable Y; (ΣX)² = square of the sum of the X scores; (ΣY)² = square of the sum of the Y scores; ΣX² = sum of the squared X scores; ΣY² = sum of the squared Y scores.
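The raw-score formula for r can be written as a small function. This is a sketch: the function name and the five paired scores are hypothetical, and no guard is included for the case where one variable is constant (which would make the denominator zero).

```python
from math import sqrt

def pearson_r(x, y):
    """Pearson's r from the raw-score (sums of products) formula."""
    n = len(x)
    sum_x, sum_y = sum(x), sum(y)
    sum_xy = sum(a * b for a, b in zip(x, y))
    sum_x2 = sum(a * a for a in x)
    sum_y2 = sum(b * b for b in y)
    numerator = n * sum_xy - sum_x * sum_y
    denominator = sqrt((n * sum_x2 - sum_x**2) * (n * sum_y2 - sum_y**2))
    return numerator / denominator

# Hypothetical paired scores for illustration only
x_scores = [1, 2, 3, 4, 5]
y_scores = [2, 4, 5, 4, 5]
r = pearson_r(x_scores, y_scores)
print(round(r, 4))
```

For these scores the numerator is 30 and the denominator is √1500, giving r of about 0.77, a "substantial" positive relationship.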
50. Chi-square test: an alternative to the test of significance of the difference between two proportions.

    $$\chi^2 = \sum \frac{(O - E)^2}{E}$$

    where O = observed frequencies and E = expected frequencies.
51. Example: Is the prevalence of cataract higher in males or in females? Consider a study on the prevalence of cataract in males and females aged 50 years and above. The results are as follows. Number of males examined (n1) = 60, of whom 37 were found with cataract. Number of females examined (n2) = 40, of whom 30 were found with cataract. In table format:

    | Gender | Cataract: Yes | Cataract: No | Total |
    |---|---|---|---|
    | Male | 37 | 23 | 60 |
    | Female | 30 | 10 | 40 |
    | Total | 67 | 33 | 100 |
52. Expected frequency = (corresponding row total × corresponding column total) / grand total. For example, for males without cataract, E = 60 × 33/100 = 19.8. Applying the formula:

    $$\chi^2 = \frac{(37 - 40.2)^2}{40.2} + \frac{(23 - 19.8)^2}{19.8} + \frac{(30 - 26.8)^2}{26.8} + \frac{(10 - 13.2)^2}{13.2} = 0.2547 + 0.5172 + 0.3821 + 0.7758 = 1.93$$

    The critical value of chi-square (df = 1) is 3.84 at the 5% level of significance. Since the calculated value is less than the critical value, the Null Hypothesis cannot be rejected.
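The expected frequencies and the chi-square statistic for the cataract table can be computed as follows. This is a sketch using the counts from the slide; the critical value 3.84 is copied from the slide rather than computed, and no continuity correction is applied, matching the slide's calculation.

```python
# Observed counts from the slide: rows = male, female; columns = yes, no
observed = [[37, 23],
            [30, 10]]

row_totals = [sum(row) for row in observed]
col_totals = [sum(col) for col in zip(*observed)]
grand_total = sum(row_totals)

# Expected frequency = row total x column total / grand total
expected = [[r * c / grand_total for c in col_totals] for r in row_totals]

chi_square = sum(
    (o - e) ** 2 / e
    for obs_row, exp_row in zip(observed, expected)
    for o, e in zip(obs_row, exp_row)
)

chi_critical = 3.84  # df = 1, 5% level, as quoted on the slide
reject_null = chi_square > chi_critical
print(expected, round(chi_square, 2), reject_null)
```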
53. Sign test: for paired data. It is a non-parametric test based on the signs (positive and negative) of the differences in the levels seen before and after therapy.
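One common way to carry out a sign test is an exact two-sided binomial test on the counts of positive and negative differences. The sketch below makes two assumptions that the slide does not state: zero differences are dropped, and the P-value is two-sided. The function name is invented, and the data are the albumin differences listed on slide 40.

```python
from math import comb

def sign_test(differences):
    """Exact two-sided sign test; zero differences are dropped (assumption)."""
    positives = sum(d > 0 for d in differences)
    negatives = sum(d < 0 for d in differences)
    n = positives + negatives
    k = min(positives, negatives)
    # Under H0 each non-zero difference is + or - with probability 1/2
    one_tail = sum(comb(n, i) for i in range(k + 1)) / 2**n
    p_value = min(1.0, 2 * one_tail)
    return positives, negatives, p_value

# Before-minus-after albumin differences from slide 40
albumin_differences = [0.3, 0.1, 0.2, 0, -0.1, -0.2, 0.1, 0.2]
positives, negatives, p_value = sign_test(albumin_differences)
print(positives, negatives, round(p_value, 3))
```

With 5 positive and 2 negative differences the P-value is well above 0.05, in line with the paired t-test on the same data.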
54. Wilcoxon signed-rank test: for matched pairs. It is a better test than the sign test, because it assigns ranks to the differences of the n pairs after ignoring the + or − signs. The lowest difference gets rank 1 and the highest gets rank n. The sum of only those ranks that are associated with positive differences is then obtained (the Wilcoxon signed-rank criterion). It is similar in approach to the Mann-Whitney test.
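The ranking step described above can be sketched as follows. Two conventions used here are assumptions not stated on the slide: zero differences are dropped before ranking, and tied absolute differences share the average of their ranks. The function name is invented, and the data are the albumin differences listed on slide 40.

```python
def signed_rank_sums(differences):
    """Return (sum of ranks of positive differences, sum for negative ones)."""
    nonzero = [d for d in differences if d != 0]   # zeros dropped (assumption)
    ordered = sorted(nonzero, key=abs)

    # Assign average ranks to tied absolute differences
    rank_of = {}
    i = 0
    while i < len(ordered):
        j = i
        while j < len(ordered) and abs(ordered[j]) == abs(ordered[i]):
            j += 1
        rank_of[abs(ordered[i])] = (i + 1 + j) / 2  # mean of ranks i+1 .. j
        i = j

    w_plus = sum(rank_of[abs(d)] for d in nonzero if d > 0)
    w_minus = sum(rank_of[abs(d)] for d in nonzero if d < 0)
    return w_plus, w_minus

# Before-minus-after albumin differences from slide 40
albumin_differences = [0.3, 0.1, 0.2, 0, -0.1, -0.2, 0.1, 0.2]
w_plus, w_minus = signed_rank_sums(albumin_differences)
print(w_plus, w_minus)
```

The two sums always add up to n(n + 1)/2 for the n non-zero differences, which is a quick check on the ranking. The positive-rank sum is then compared with a table of critical values, which is not reproduced here.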
55. Wilcoxon rank sum test: for the unpaired two-sample situation. If there are n1 subjects in the first sample and n2 in the second sample, these (n1 + n2) values are jointly ranked from 1 to (n1 + n2). The sum of these ranks is then obtained only for the subjects who are in the smaller group.
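The joint ranking can be illustrated with a short function. The data below are hypothetical, the function name is invented, and averaging the ranks of tied values is an assumed convention that the slide does not mention.

```python
def rank_sum_smaller_group(sample1, sample2):
    """Sum of joint ranks for the smaller of two unpaired samples."""
    # sorted() is stable, so sample1 is used when both sizes are equal
    smaller, larger = sorted([sample1, sample2], key=len)
    pooled = sorted(smaller + larger)

    def average_rank(value):
        first_position = pooled.index(value) + 1    # ranks start at 1
        ties = pooled.count(value)
        return first_position + (ties - 1) / 2      # mean rank of tied values

    return sum(average_rank(v) for v in smaller)

# Hypothetical measurements for two unpaired groups
group_a = [12, 15, 11]
group_b = [14, 18, 20, 17]
rank_sum = rank_sum_smaller_group(group_a, group_b)
print(rank_sum)
```

Here the smaller group holds joint ranks 1, 2 and 4, so its rank sum is 7. That sum is then compared with tabulated critical values for the given n1 and n2.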
56. Spearman's correlation is designed to measure the relationship between variables measured on an ordinal scale of measurement. It is similar to Pearson's correlation, but it uses ranks as opposed to actual values.
57. Steps for Spearman's rank correlation:
    1. Convert the observed values to ranks (accounting for ties).
    2. Find the differences between the ranks, square them and sum the squared differences.
    3. Set up the hypothesis, carry out the test and conclude based on the findings.
    4. If the Null Hypothesis is rejected, calculate the Spearman correlation coefficient to measure the strength of the relationship between the variables.
58. Spearman's rank correlation coefficient:

    $$\rho = 1 - \frac{6\sum_{i=1}^{n} d_i^2}{n(n^2 - 1)}$$

    where $d_i$ is the difference between the paired ranks and n is the number of pairs. The Spearman rank correlation coefficient may lie between −1 and +1. Values close to ±1 indicate a high correlation; values close to zero indicate a lack of relationship.
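The rank-difference formula can be sketched in code as below. The function name and the five paired values are hypothetical, and the simple ranking used here assumes there are no tied values, which is also the case in which this formula is exact.

```python
def spearman_rho(x, y):
    """Spearman's rho via the rank-difference formula (assumes no ties)."""
    def ranks(values):
        ordered = sorted(values)
        return [ordered.index(v) + 1 for v in values]  # rank 1 = smallest

    n = len(x)
    d_squared = sum((rx - ry) ** 2 for rx, ry in zip(ranks(x), ranks(y)))
    return 1 - 6 * d_squared / (n * (n**2 - 1))

# Hypothetical paired observations for illustration only
x_values = [10, 20, 30, 40, 50]
y_values = [1, 3, 2, 5, 4]
rho = spearman_rho(x_values, y_values)
print(round(rho, 2))
```

For these values the rank differences are 0, −1, 1, −1, 1, so Σd² = 4 and ρ = 1 − 24/120 = 0.8.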
59. References:
    - A Indrayan and L Satyanarayana, Biostatistics, 2006 ed., Prentice-Hall of India.
    - MSN Rao and NS Murthy, Applied Statistics in Health Sciences, 2nd ed., 2010, Jaypee.
    - B Antonisamy, Solomon Christopher and P Prasanna Samuel, Biostatistics: Principles and Practice.
    - www.wikipedia.org