
- 1. Principles of Statistical Inference. Gary Glonek, School of Mathematical Sciences, University of Adelaide. December 2013. [1-1]
- 2. Two-sample t-test homework question. In a study of the effects of diabetes on problems associated with the wearing of contact lenses, 16 diabetics and a control group of 16 non-diabetics wore contact lenses for a prescribed length of time. The swelling of their eyes was measured (as a percentage) immediately after removal of the lenses. The following data were obtained.
  Diabetic subjects: 6.1 9.9 7.8 10.2 8.4 6.8 9.0 6.7 7.6 5.3 9.1 11.3 6.8 9.1 10.0 9.2
  Control subjects: 8.5 7.9 10.9 9.0 10.4 10.1 8.9 8.9 10.6 11.5 8.6 12.3 12.0 13.4 9.1 7.1
  [1-2]
- 3. Solution. "We use the two-independent-samples t-test..."
  t = (x̄1 − x̄2) / (sp √(1/n1 + 1/n2))
        Diabetic subjects   Controls
  n     16                  16
  x̄     8.33                9.95
  s     1.68                1.74
  sp = 1.712; t = −2.67, df = 30, p-value = 0.012.
  Conclude swelling significantly lower for diabetic subjects. [1-3]
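The pooled two-sample t-test on slide 3 can be reproduced directly in SciPy. This is a sketch, not part of the original deck; the two groups are read off slide 2 (the flattened listing separated into its two columns, which reproduces the group means and standard deviations quoted on slide 3).

```python
# Reproduce the two-sample t-test from slide 3 using SciPy.
from scipy import stats

diabetic = [6.1, 9.9, 7.8, 10.2, 8.4, 6.8, 9.0, 6.7,
            7.6, 5.3, 9.1, 11.3, 6.8, 9.1, 10.0, 9.2]
control = [8.5, 7.9, 10.9, 9.0, 10.4, 10.1, 8.9, 8.9,
           10.6, 11.5, 8.6, 12.3, 12.0, 13.4, 9.1, 7.1]

# equal_var=True gives the pooled (equal-variance) t-test assumed on the slide.
t, p = stats.ttest_ind(diabetic, control, equal_var=True)
print(round(t, 2), round(p, 3))  # -2.67 0.012
```

The negative t-statistic reflects that the diabetic group's mean swelling is below the controls', matching the slide's conclusion.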
- 4. What are we actually concluding? Underlying the analysis is a model for the data.
  • The 16 diabetic subjects are sampled from a larger population of diabetics
  • The 16 controls are sampled from a population of non-diabetics
  (Figure: density curves of swelling % for the diabetic population and the control population.)
  [1-4]
- 5. What are we actually concluding?
  • Let µ1 be the mean swelling percentage for the diabetic population;
  • Let µ2 be the mean swelling percentage for the control population;
  • When we conclude that "swelling was significantly lower for the diabetic subjects", we are saying that the data give us reason to believe that µ1 < µ2.
  [1-5]
- 6. More about the model. Important:
  • The data in each group are a random sample from its parent population
  • The two samples are independent
  Specific to the t-test we used: the two populations are normal, N(µ1, σ1) and N(µ2, σ2), and have the same standard deviation, σ1 = σ2.
  [1-6]
- 7. More about the conclusion.
  • The sample means for our data are x̄1 = 8.33% for the diabetic patients and x̄2 = 9.95% for the controls.
  • Eventually, we conclude that we have reason to believe that the population means satisfy µ1 < µ2.
  • This is not just because the sample means satisfy x̄1 < x̄2.
  • An alternative explanation is that there is no difference between diabetics and controls at the population level and the observed difference occurred by chance.
  • The variability in the data suggests this could happen.
  • The purpose of the t-test is to decide whether such an explanation is really plausible.
  (Figure: dot plot of the swelling % data for the two groups.)
  [1-7]
- 8. Some notation.
  • The population means, µ1 and µ2, are unknown parameters.
  • The purpose of the experiment is to make conclusions about µ1 − µ2 based on the data.
  • The statement H0: µ1 = µ2 is the Null Hypothesis.
  • This is the "chance explanation" for the data.
  • The statement H1: µ1 ≠ µ2 is the Alternative Hypothesis.
  • This is what we will conclude if we can eliminate the null hypothesis, H0.
  • The hypothesis test is a procedure that uses data to reach one of the conclusions: Accept H0 or Reject H0.
  [1-8]
- 9. Anatomy of the t-statistic.
  t = (x̄1 − x̄2) / (sp √(1/n1 + 1/n2))
  • Think of x̄1 as an estimate of µ1 and x̄2 as an estimate of µ2.
  • Therefore x̄1 − x̄2 is an estimate of µ1 − µ2.
  • The quantity sp √(1/n1 + 1/n2) is called the standard error.
  • It measures the accuracy of the estimate.
  • As a rule of thumb, estimates are accurate to ±2 standard errors.
  [1-9]
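The ingredients of the t-statistic can be computed by hand from the summary statistics on slide 3. A minimal sketch (note that using the rounded means and SDs gives t ≈ −2.68, a whisker off the slide's −2.67, which was computed from the unrounded data):

```python
# Build the t-statistic piece by piece: estimate, pooled SD, standard error.
import math

n1, n2 = 16, 16
xbar1, xbar2 = 8.33, 9.95  # sample means from slide 3
s1, s2 = 1.68, 1.74        # sample standard deviations from slide 3

# Pooled standard deviation (valid under the equal-SD assumption of slide 6).
sp = math.sqrt(((n1 - 1) * s1**2 + (n2 - 1) * s2**2) / (n1 + n2 - 2))

estimate = xbar1 - xbar2              # estimates mu1 - mu2
se = sp * math.sqrt(1/n1 + 1/n2)      # standard error of the estimate
t = estimate / se

print(round(sp, 2), round(t, 2))  # 1.71 -2.68
```

The rule of thumb on the slide checks out here: |estimate| = 1.62 exceeds 2 standard errors (about 1.21), so the estimate is too far from zero to be comfortably attributed to chance.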
- 10. Statistical Significance.
  • If H0 is true (i.e. µ1 − µ2 = 0) then x̄1 − x̄2 will be estimating zero and should be small relative to the standard error.
  • If H0 is not true then x̄1 − x̄2 will be estimating a non-zero quantity and may be large relative to the standard error.
  • When we reject H0, it is because x̄1 − x̄2 is too large relative to the standard error for us to believe it is just estimating zero.
  • The evidence for our decision is summarised by the P-value. The P-value is the probability that we would obtain a t-statistic as different from zero as that actually observed if the null hypothesis were true.
  [1-10]
- 11. The P-value. For the diabetes data, the t-statistic is −2.67 and the P-value is 1.2%.
  (Figure: density of the t-statistic over −3 to 3, with the observed value marked.)
  [1-11]
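The 1.2% figure is the probability in both tails of the t distribution with 30 degrees of freedom beyond ±2.67. A quick check with SciPy:

```python
# Two-sided P-value for the observed t-statistic, df = n1 + n2 - 2 = 30.
from scipy import stats

t_obs, df = -2.67, 30
p = 2 * stats.t.sf(abs(t_obs), df)  # sf = upper-tail probability; double it
print(round(p, 3))  # 0.012
```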
- 12. Rejecting H0.
  • In the diabetes example the P-value is 1.2%.
  • If there really was no difference between the diabetes and control populations, then the chance that we would have obtained data that led to a t-statistic ≤ −2.67 or ≥ 2.67 is only 1.2%.
  • This is a low probability.
  • Therefore the data are not what we would have expected if H0 were true.
  • Consequently we reject H0 and conclude the two populations are not the same.
  • In general, it is conventional to reject H0 for P-values ≤ 0.05 and accept H0 otherwise.
  [1-12]
- 13. Recap.
  • Hypothesis tests are used to decide whether to accept or reject the null hypothesis on the basis of observed data.
  • The null hypothesis is formulated in terms of the (unknown) population parameters.
  • The test statistic calculates the discrepancy between what was observed in the data and what we would expect if the null hypothesis were true.
  • The evidence in the test statistic can be expressed as a P-value.
  • H0 is rejected for small P-values and accepted otherwise.
  • In conventional applications, the threshold is P-value ≤ 0.05.
  [1-13]
- 14. Error rates. Hypothesis tests can be cast as a decision problem.
                H0 True              H0 False
  Accept H0     Correct conclusion   Type II error
  Reject H0     Type I error         Correct conclusion
  • A Type I error occurs when H0 is actually true but the test leads us to reject.
  • This corresponds to a false positive finding.
  • The Type I error rate is the probability of falsely rejecting when H0 is true.
  • A Type II error occurs when H0 is not true but the test leads us to accept.
  • This corresponds to a false negative.
  [1-14]
- 15. Controlling the error rates.
  • It can be seen that the Type I error rate is the P-value threshold for rejection.
  • If we reject for P-value ≤ 0.05, the Type I error rate is 5%.
  • That is, for every 100 true null hypotheses we test, we would expect to falsely reject 5 by chance.
  • Therefore we can adjust the Type I error rate by changing the threshold of rejection.
  • For example, if we reject for P-value ≤ 0.01, the Type I error rate becomes 1%.
  • But this will increase the chance of Type II errors.
  • Type II errors are not controlled by a separate parameter. Once the Type I error rate is fixed, the Type II errors can only be reduced by selecting a suitable sample size and appropriate experimental design.
  [1-15]
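The claim that the rejection threshold equals the Type I error rate is easy to check by simulation: generate many datasets with H0 true (both groups drawn from the same population) and count how often the test rejects at 0.05. A minimal sketch, with population parameters chosen arbitrarily for illustration:

```python
# Simulate the Type I error rate of the two-sample t-test at threshold 0.05.
import numpy as np
from scipy import stats

rng = np.random.default_rng(0)
n_datasets, n = 10_000, 16
rejections = 0
for _ in range(n_datasets):
    # Both groups come from the same normal population, so H0 is true.
    a = rng.normal(9.0, 1.7, n)
    b = rng.normal(9.0, 1.7, n)
    _, p = stats.ttest_ind(a, b)
    rejections += p <= 0.05

print(rejections / n_datasets)  # close to 0.05, i.e. about 5% false rejections
```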
- 16. Bioinformatics.
  • An important application of hypothesis testing is differential expression (DE).
  • To test differential expression of a single gene under two different conditions is like a two-sample t-test.
  • For microarray data, the moderated t-statistic produced by LIMMA is literally a variation on the t-statistic described here.
  • For RNA-Seq data, a different type of test statistic is used, but the notions of P-value and error rate still apply.
  • In all cases, a major factor is the large number of tests being conducted in parallel.
  • For example, the Affymetrix HG-U133 set comprises 45,000 probe sets derived from 33,000 established human genes.
  • If we screen for DE between two groups, this means performing a test for each gene.
  [1-16]
- 17. Multiple comparisons.
  • Suppose now we conduct a sequence of tests to screen for DE in 20,000 genes.
  • If we were to just use the standard 5% level of significance, we would be swamped by false positives.
  True non-DE genes       20,000   19,500   19,000
  True DE genes           0        500      1,000
  Expected false +ves     1,000    975      950
  • Conventional adjustments for multiple testing were introduced in the context of a relatively small number of tests, for example 5, 10 or 20.
  • Applying methods such as the Bonferroni adjustment to large-scale multiple testing problems leads to inefficient procedures.
  [1-17]
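The table's arithmetic is just 5% of the true non-DE genes, and the Bonferroni remedy mentioned above divides the threshold by the number of tests. A small sketch of both calculations:

```python
# Expected false positives at a fixed 5% threshold: 0.05 x (true non-DE genes).
alpha = 0.05

for true_non_de in (20_000, 19_500, 19_000):
    print(true_non_de, round(alpha * true_non_de))
# 20000 -> 1000, 19500 -> 975, 19000 -> 950 expected false positives

# The Bonferroni adjustment rejects only for P-value <= alpha / m, which for
# m = 20,000 tests is an extremely stringent per-test threshold (2.5e-06),
# illustrating why it is inefficient at this scale.
print(alpha / 20_000)
```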
- 18. False Discovery Rates.
  • In large-scale multiple testing problems, it is more useful to consider quantities such as the false discovery rate (FDR).
  • Roughly speaking, the false discovery rate is
  FDR = E[ False Positives / (False Positives + True Positives) ].
  • As a hypothetical illustration, suppose out of 20,000 genes 500 are actually DE and the remaining 19,500 are non-DE.
  • Suppose we reject H0 for all 500 of the true DE genes.
  • Suppose we also make 975 false rejections from the non-DE genes.
  • In this case the rate of false discoveries is 975/(975 + 500) = 66.1%.
  • Although it is not possible to discern a false positive from a true positive in any single test, there are methods for estimating and controlling the rate of false positives.
  [1-18]
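The slide's arithmetic, plus a sketch of one standard method for controlling the FDR. The Benjamini-Hochberg step-up procedure is not described in the deck; it is named here as one well-known instance of the "methods for estimating and controlling" that the final bullet alludes to.

```python
# Slide's hypothetical: 975 false and 500 true rejections.
fdp = 975 / (975 + 500)
print(round(100 * fdp, 1))  # 66.1 (% of discoveries that are false)

# Minimal sketch of the Benjamini-Hochberg step-up procedure at FDR level q:
# sort the P-values, find the largest rank k with p_(k) <= q*k/m, and reject
# the k smallest.
def benjamini_hochberg(pvalues, q=0.05):
    """Return indices of hypotheses rejected at FDR level q."""
    m = len(pvalues)
    order = sorted(range(m), key=lambda i: pvalues[i])
    k = 0  # largest rank whose P-value clears its BH threshold
    for rank, i in enumerate(order, start=1):
        if pvalues[i] <= q * rank / m:
            k = rank
    return sorted(order[:k])

print(benjamini_hochberg([0.001, 0.008, 0.039, 0.041, 0.60]))  # [0, 1]
```

Note that 0.039 is below the single-test threshold 0.05 yet is not rejected: its BH threshold at rank 3 of 5 is 0.05 × 3/5 = 0.03.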
- 19. Final remarks.
  • Statistical inference is concerned with making conclusions about a model assumed to have generated the observed data.
  • The framework we considered was that of sample and population.
  • In reality such a simple framework is not usually realistic.
  • For example, with the diabetes data we don't really have a random sample from the population, and observations made on subjects are not guaranteed to be reproducible.
  • If the framework is well understood and appropriately modelled, the statistical conclusions can be taken at face value.
  [1-19]
- 20. Final remarks.
  • In bioinformatics applications, the framework can be immensely complicated.
  • Biological material is often not randomly sampled.
  • There may be several levels of technical variability and biological variability.
  • There may also be non-random components of error that need to be estimated and allowed for.
  • Perhaps the worst errors occur when the analysis does not account for the true complexity of the framework and the numbers are taken at face value.
  • Nevertheless, you can't do good bioinformatics without good statistics!
  [1-20]
