Principles of Statistical Inference - Gary Glonek


Published in: Health & Medicine, Technology

  1. Principles of Statistical Inference
     Gary Glonek
     School of Mathematical Sciences, University of Adelaide
     December 2013
  2. Two-sample t-test homework question
     In a study of the effects of diabetes on problems associated with the wearing of contact lenses, 16 diabetics and a control group of 16 non-diabetics wore contact lenses for a prescribed length of time. The swelling of their eyes was measured (as a percentage) immediately after removal of the lenses. The following data were obtained.
     Diabetic Subjects    Control Subjects
      6.1   9.9            8.5   7.9
      7.8  10.2           10.9   9.0
      8.4   6.8           10.4  10.1
      9.0   6.7            8.9   8.9
      7.6   5.3           10.6  11.5
      9.1  11.3            8.6  12.3
      6.8   9.1           12.0  13.4
     10.0   9.2            9.1   7.1
  3. Solution
     “We use the two-independent samples t-test...”
     $$t = \frac{\bar{x}_1 - \bar{x}_2}{s_p \sqrt{\frac{1}{n_1} + \frac{1}{n_2}}}$$
                 Diabetic Subjects   Controls
     $n$         16                  16
     $\bar{x}$   8.33                9.95
     $s$         1.68                1.74
     $s_p = 1.712$, $t = -2.67$, df = 30, p-value = 0.012
     Conclude swelling significantly lower for diabetic subjects.
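The summary statistics in this solution can be checked with a short script. This is a minimal sketch, not the author's own calculation: it assumes the 32 flattened values on the data slide split into groups as listed below (a split chosen because it reproduces the reported means and standard deviations), and the variable names are illustrative.

```python
import math
import statistics

# Assumed grouping of the 32 observations (swelling, %).
diabetic = [6.1, 9.9, 7.8, 10.2, 8.4, 6.8, 9.0, 6.7,
            7.6, 5.3, 9.1, 11.3, 6.8, 9.1, 10.0, 9.2]
control = [8.5, 7.9, 10.9, 9.0, 10.4, 10.1, 8.9, 8.9,
           10.6, 11.5, 8.6, 12.3, 12.0, 13.4, 9.1, 7.1]

n1, n2 = len(diabetic), len(control)
mean1, mean2 = statistics.mean(diabetic), statistics.mean(control)
s1, s2 = statistics.stdev(diabetic), statistics.stdev(control)

# Pooled standard deviation: weighted average of the two sample variances.
sp = math.sqrt(((n1 - 1) * s1 ** 2 + (n2 - 1) * s2 ** 2) / (n1 + n2 - 2))

# Standard error of the difference in means, then the t-statistic.
se = sp * math.sqrt(1 / n1 + 1 / n2)
t = (mean1 - mean2) / se
df = n1 + n2 - 2

print(f"means: {mean1:.2f}, {mean2:.2f}")
print(f"sds: {s1:.2f}, {s2:.2f}; pooled sd: {sp:.3f}")
print(f"t = {t:.2f} on {df} degrees of freedom")
```

Under that assumed grouping the script agrees with the figures on the slide to the precision shown there.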
  4. What are we actually concluding?
     Underlying the analysis is a model for the data.
     • The 16 diabetic subjects are sampled from a larger population of diabetics.
     • The 16 controls are sampled from a population of non-diabetics.
     [Figure: density curves of swelling (%) for the diabetic population and the control population]
  5. What are we actually concluding?
     • Let $\mu_1$ be the mean swelling percentage for the diabetic population.
     • Let $\mu_2$ be the mean swelling percentage for the control population.
     • When we conclude that “swelling was significantly lower for the diabetic subjects” we are saying that the data give us reason to believe that $\mu_1 < \mu_2$.
  6. More about the model
     Important:
     • The data in each group are a random sample from that group's parent population.
     • The two samples are independent.
     Specific to the t-test we used:
     • The two populations are normal and have the same standard deviation, $\sigma_1 = \sigma_2$.
     [Figure: diabetic population $N(\mu_1, \sigma_1)$ and control population $N(\mu_2, \sigma_2)$]
  7. More about the conclusion
     • The sample means for our data are $\bar{x}_1 = 8.33\%$ for the diabetic patients and $\bar{x}_2 = 9.95\%$ for the controls.
     • Eventually, we conclude that we have reason to believe that the population means satisfy $\mu_1 < \mu_2$.
     • This is not just because the sample means satisfy $\bar{x}_1 < \bar{x}_2$.
     • An alternative explanation is that there is no difference between diabetics and controls at the population level and the observed difference occurred by chance.
     • The variability in the data suggests this could happen.
     [Figure: the data plotted against swelling (%)]
     • The purpose of the t-test is to decide whether such an explanation is really plausible.
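One way to get a feel for the "chance explanation" is to shuffle the group labels many times and see how often a difference in means as large as the observed one turns up. The sketch below is a permutation test, which is a different technique from the t-test used in these slides and is shown only as an illustration. The grouping of the data, the seed and the number of shuffles are all assumptions.

```python
import random

# Assumed grouping of the 32 observations (swelling, %).
diabetic = [6.1, 9.9, 7.8, 10.2, 8.4, 6.8, 9.0, 6.7,
            7.6, 5.3, 9.1, 11.3, 6.8, 9.1, 10.0, 9.2]
control = [8.5, 7.9, 10.9, 9.0, 10.4, 10.1, 8.9, 8.9,
           10.6, 11.5, 8.6, 12.3, 12.0, 13.4, 9.1, 7.1]


def mean(values):
    return sum(values) / len(values)


observed = mean(diabetic) - mean(control)
pooled = diabetic + control
n1 = len(diabetic)

random.seed(2013)   # arbitrary seed so the run is repeatable
n_perm = 10_000     # number of random relabellings (an arbitrary choice)
extreme = 0
for _ in range(n_perm):
    random.shuffle(pooled)
    diff = mean(pooled[:n1]) - mean(pooled[n1:])
    # Count relabellings at least as far from zero as the observed difference.
    if abs(diff) >= abs(observed) - 1e-12:
        extreme += 1

perm_p = extreme / n_perm
print(f"observed difference: {observed:.2f}")
print(f"proportion of shuffles at least as extreme: {perm_p:.4f}")
```

If the group labels carried no information, only a small proportion of shuffles would produce a difference this large, which is the same kind of evidence the t-test summarises.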
  8. Some notation
     • The population means, $\mu_1$ and $\mu_2$, are unknown parameters.
     • The purpose of the experiment is to draw conclusions about $\mu_1 - \mu_2$ based on the data.
     • The statement $H_0 : \mu_1 = \mu_2$ is the Null Hypothesis.
     • This is the “chance explanation” for the data.
     • The statement $H_1 : \mu_1 \neq \mu_2$ is the Alternative Hypothesis.
     • This is what we will conclude if we can eliminate the null hypothesis, $H_0$.
     • The hypothesis test is a procedure that uses data to reach one of the conclusions: Accept $H_0$ or Reject $H_0$.
  9. Anatomy of the t-statistic
     $$t = \frac{\bar{x}_1 - \bar{x}_2}{s_p \sqrt{\frac{1}{n_1} + \frac{1}{n_2}}}$$
     • Think of $\bar{x}_1$ as an estimate of $\mu_1$ and $\bar{x}_2$ as an estimate of $\mu_2$.
     • Therefore $\bar{x}_1 - \bar{x}_2$ is an estimate of $\mu_1 - \mu_2$.
     • The quantity $s_p \sqrt{\frac{1}{n_1} + \frac{1}{n_2}}$ is called the standard error.
     • It measures the accuracy of the estimate.
     • As a rule of thumb, estimates are accurate to ±2 standard errors.
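The ±2 standard error rule of thumb can be applied directly to the rounded summary values reported in the solution. The sketch below is illustrative only. The variable names are assumptions, and the interval is the rough rule-of-thumb range, not an exact confidence interval.

```python
import math

# Rounded summary values as reported in the solution slide.
mean_diabetic, mean_control = 8.33, 9.95
sp = 1.712
n1 = n2 = 16

estimate = mean_diabetic - mean_control          # estimate of mu1 - mu2
std_error = sp * math.sqrt(1 / n1 + 1 / n2)      # standard error of the estimate

# Rule of thumb: the estimate is accurate to about +/- 2 standard errors.
lower = estimate - 2 * std_error
upper = estimate + 2 * std_error

print(f"estimate of mu1 - mu2: {estimate:.2f}")
print(f"standard error: {std_error:.3f}")
print(f"rough range: ({lower:.2f}, {upper:.2f})")
```

The rough range lies entirely below zero, which is consistent with the conclusion that the population means differ.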
  10. Statistical Significance
     • If $H_0$ is true (i.e. $\mu_1 - \mu_2 = 0$) then $\bar{x}_1 - \bar{x}_2$ will be estimating zero and should be small relative to the standard error.
     • If $H_0$ is not true then $\bar{x}_1 - \bar{x}_2$ will be estimating a non-zero quantity and may be large relative to the standard error.
     • When we reject $H_0$, it is because $\bar{x}_1 - \bar{x}_2$ is too large relative to the standard error for us to believe it is just estimating zero.
     • The evidence for our decision is summarised by the P-value.
     The P-value is the probability that we would obtain a t-statistic as different from zero as that actually observed if the null hypothesis were true.
  11. The P-value
     For the diabetes data, the t-statistic is −2.67 and the P-value is 1.2%.
     [Figure: distribution of the t-statistic, horizontal axis from −3 to 3]
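The quoted P-value can be approximated without a statistics package by integrating the Student t density numerically. This is a minimal sketch: the helper names `t_pdf` and `two_sided_p` are hypothetical, and Simpson's rule with 2000 steps is an arbitrary but adequate choice here. In practice a statistics library would normally be used instead.

```python
import math


def t_pdf(x, df):
    """Density of Student's t distribution with df degrees of freedom."""
    const = math.gamma((df + 1) / 2) / (math.sqrt(df * math.pi) * math.gamma(df / 2))
    return const * (1 + x * x / df) ** (-(df + 1) / 2)


def two_sided_p(t_stat, df, steps=2000):
    """Two-sided P-value: probability of |T| >= |t_stat| under the null.

    Integrates the density over [-|t_stat|, |t_stat|] with Simpson's rule
    (steps must be even) and subtracts the result from 1.
    """
    a = abs(t_stat)
    h = 2 * a / steps
    total = t_pdf(-a, df) + t_pdf(a, df)
    for i in range(1, steps):
        weight = 4 if i % 2 else 2
        total += weight * t_pdf(-a + i * h, df)
    central_area = total * h / 3
    return 1 - central_area


p_value = two_sided_p(-2.67, 30)
print(f"two-sided P-value for t = -2.67, df = 30: {p_value:.4f}")
```

The result is the total area in both tails beyond ±2.67, matching the two-sided definition used in these slides.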
  12. Rejecting $H_0$
     • In the diabetes example the P-value is 1.2%.
     • If there really were no difference between the diabetes and control populations, then the chance that we would have obtained data that led to a t-statistic ≤ −2.67 or ≥ 2.67 is only 1.2%.
     • This is a low probability.
     • Therefore the data are not what we would have expected if $H_0$ were true.
     • Consequently we reject $H_0$ and conclude the two populations are not the same.
     • In general, it is conventional to reject $H_0$ for P-values ≤ 0.05 and accept $H_0$ otherwise.
  13. Recap
     • Hypothesis tests are used to decide whether to accept or reject the null hypothesis on the basis of observed data.
     • The null hypothesis is formulated in terms of the (unknown) population parameters.
     • The test statistic measures the discrepancy between what was observed in the data and what we would expect if the null hypothesis were true.
     • The evidence in the test statistic can be expressed as a P-value.
     • $H_0$ is rejected for small P-values and accepted otherwise.
     • In conventional applications, the threshold is P-value ≤ 0.05.
  14. Error rates
     Hypothesis tests can be cast as a decision problem.
                    $H_0$ True           $H_0$ False
     Accept $H_0$   Correct Conclusion   Type II Error
     Reject $H_0$   Type I Error         Correct Conclusion
     • A Type I error occurs when $H_0$ is actually true but the test leads us to reject.
     • This corresponds to a false positive finding.
     • The Type I error rate is the probability of falsely rejecting when $H_0$ is true.
     • A Type II error occurs when $H_0$ is not true but the test leads us to accept.
     • This corresponds to a false negative.
  15. Controlling the error rates
     • The Type I error rate equals the P-value threshold for rejection.
     • If we reject for P-value ≤ 0.05 the Type I error rate is 5%.
     • That is, for every 100 true null hypotheses we test, we would expect to falsely reject 5 by chance.
     • Therefore we can adjust the Type I error rate by changing the threshold of rejection.
     • For example, if we reject for P-value ≤ 0.01 the Type I error rate becomes 1%.
     • But this will increase the chance of Type II errors.
     • Type II errors are not controlled by a separate parameter. Once the Type I error rate is fixed, the Type II errors can only be reduced by selecting a suitable sample size and appropriate experimental design.
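The link between the rejection threshold and the Type I error rate can be illustrated by simulation: generate many experiments in which the null hypothesis is true and count how often the test rejects. The sketch below makes several assumptions: both groups are drawn from the same normal population with mean 9.0 and standard deviation 1.7 (values chosen only to resemble the example), the two-sided 5% critical value for 30 degrees of freedom is taken as 2.042, and the seed and number of simulated experiments are arbitrary.

```python
import math
import random
import statistics


def pooled_t(x, y):
    """Two-sample t-statistic with a pooled standard deviation."""
    n1, n2 = len(x), len(y)
    v1, v2 = statistics.variance(x), statistics.variance(y)
    sp = math.sqrt(((n1 - 1) * v1 + (n2 - 1) * v2) / (n1 + n2 - 2))
    return (statistics.mean(x) - statistics.mean(y)) / (sp * math.sqrt(1 / n1 + 1 / n2))


T_CRIT = 2.042   # |t| >= 2.042 corresponds to P-value <= 0.05 when df = 30
N_SIM = 2000     # number of simulated experiments
N_PER_GROUP = 16

random.seed(1)
false_rejections = 0
for _ in range(N_SIM):
    # Both groups come from the same population, so H0 is true by construction.
    group1 = [random.gauss(9.0, 1.7) for _ in range(N_PER_GROUP)]
    group2 = [random.gauss(9.0, 1.7) for _ in range(N_PER_GROUP)]
    if abs(pooled_t(group1, group2)) >= T_CRIT:
        false_rejections += 1

type1_rate = false_rejections / N_SIM
print(f"proportion of false rejections: {type1_rate:.3f}")
```

The proportion should land close to 0.05, and using the critical value for a stricter threshold would lower it accordingly.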
  16. Bioinformatics
     • An important application of hypothesis testing is differential expression (DE).
     • Testing differential expression of a single gene under two different conditions is like a two-sample t-test.
     • For microarray data, the moderated t-statistic produced by LIMMA is literally a variation on the t-statistic described here.
     • For RNA-Seq data, a different type of test statistic is used but the notions of P-value and error rate still apply.
     • In all cases, a major factor is the large number of tests being conducted in parallel.
     • For example, the Affymetrix HG-U133 set comprises 45,000 probe sets derived from 33,000 established human genes.
     • If we screen for DE between two groups, this means performing a test for each gene.
  17. Multiple comparisons
     • Suppose now we conduct a sequence of tests to screen for DE in 20,000 genes.
     • If we were to just use the standard 5% level of significance, we would be swamped by false positives.
     True non-DE genes     20,000   19,500   19,000
     True DE genes              0      500    1,000
     Expected false +ves    1,000      975      950
     • Conventional adjustments for multiple testing were introduced in the context of a relatively small number of tests.
     • For example, 5 or 10 or 20 tests.
     • Applying methods such as the Bonferroni adjustment to large-scale multiple testing problems leads to inefficient procedures.
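The expected number of false positives is simply the significance level multiplied by the number of genes that are truly non-DE. The sketch below reproduces that arithmetic for the three scenarios and, for comparison, shows the per-test threshold a Bonferroni adjustment would require for 20,000 tests. The variable names are illustrative.

```python
ALPHA = 0.05          # per-test significance level
TOTAL_GENES = 20_000  # number of genes screened

expected_false_positives = []
for n_de in (0, 500, 1_000):
    n_non_de = TOTAL_GENES - n_de
    # Each truly non-DE gene is falsely rejected with probability ALPHA.
    expected = round(ALPHA * n_non_de)
    expected_false_positives.append(expected)
    print(f"non-DE genes: {n_non_de:>6}  expected false positives: {expected}")

# Bonferroni: divide the overall level by the number of tests.
bonferroni_threshold = ALPHA / TOTAL_GENES
print(f"Bonferroni per-test threshold: {bonferroni_threshold:.1e}")
```

A per-test threshold this small makes it very hard to detect true DE genes, which is the inefficiency referred to above.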
  18. False Discovery Rates
     • In large-scale multiple testing problems, it is more useful to consider quantities such as the false discovery rate (FDR).
     • Roughly speaking, the false discovery rate is
       $$\mathrm{FDR} = E\left[\frac{\text{False Positives}}{\text{False Positives} + \text{True Positives}}\right].$$
     • As a hypothetical illustration, suppose out of 20,000 genes 500 are actually DE and the remaining 19,500 are non-DE.
     • Suppose we reject $H_0$ for all 500 of the true DE genes.
     • Suppose we also make 975 false rejections from the non-DE genes.
     • In this case the rate of false discoveries is $\frac{975}{975 + 500} = 66.1\%$.
     • Although it is not possible to discern a false positive from a true positive in any single test, there are methods for estimating and controlling the rate of false positives.
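The hypothetical illustration can be reproduced in a couple of lines, and one widely used method for controlling the FDR, the Benjamini-Hochberg step-up procedure, is short enough to sketch as well. The slides do not name a specific method, so Benjamini-Hochberg is shown here only as a common example. The function names and the ten P-values are made up for illustration.

```python
def false_discovery_proportion(false_positives, true_positives):
    """Share of all rejections that are false (0.0 if nothing is rejected)."""
    total = false_positives + true_positives
    return false_positives / total if total else 0.0


def benjamini_hochberg(p_values, q=0.05):
    """Indices of hypotheses rejected by the Benjamini-Hochberg procedure.

    Sort the P-values, find the largest rank k with p_(k) <= k * q / m,
    and reject the k hypotheses with the smallest P-values.
    """
    m = len(p_values)
    order = sorted(range(m), key=lambda i: p_values[i])
    k = 0
    for rank, idx in enumerate(order, start=1):
        if p_values[idx] <= rank * q / m:
            k = rank
    return sorted(order[:k])


# The illustration from the slide: 975 false and 500 true rejections.
fdp = false_discovery_proportion(975, 500)
print(f"false discovery proportion: {fdp:.1%}")

# Made-up P-values, purely to show the procedure running.
example_p = [0.001, 0.008, 0.039, 0.041, 0.042, 0.060, 0.074, 0.205, 0.212, 0.216]
rejected = benjamini_hochberg(example_p, q=0.05)
naive = [i for i, p in enumerate(example_p) if p <= 0.05]
print(f"rejected by Benjamini-Hochberg: {rejected}")
print(f"rejected by plain P-value <= 0.05: {naive}")
```

On these made-up values the procedure rejects fewer hypotheses than the unadjusted 5% rule, trading some discoveries for a bound on the expected share of false ones.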
  19. Final remarks
     • Statistical inference is concerned with drawing conclusions about a model assumed to have generated the observed data.
     • The framework we considered was that of sample and population.
     • In practice, such a simple framework is rarely realistic.
     • For example, with the diabetes data we don’t really have a random sample from the population and observations made on subjects are not guaranteed to be reproducible.
     • If the framework is well understood and appropriately modelled, the statistical conclusions can be taken at face value.
  20. Final remarks
     • In bioinformatics applications, the framework can be immensely complicated.
     • Biological material is often not randomly sampled.
     • There may be several levels of technical variability and biological variability.
     • There may also be non-random components of error that need to be estimated and allowed for.
     • Perhaps the worst errors occur when the analysis does not account for the true complexity of the framework and the numbers are taken at face value.
     • Nevertheless, you can’t do good bioinformatics without good statistics!