- 1. Hypothesis Testing with One Sample
- 6. Hypothesis Testing: drawing inferences about a population based on a sample; testing a claim about a property of a population
- 7. Statistical Inference ∗ Inferences about a population are made on the basis of results obtained from a sample drawn from that population ∗ Want to talk about the larger population from which the subjects are drawn, not the particular subjects!
- 8. What Do We Test? ∗ The effect or difference we are interested in ∗ Difference in means or proportions ∗ Odds ratio (OR) ∗ Relative risk (RR) ∗ Correlation coefficient ∗ Clinically important difference: the smallest difference considered biologically or clinically relevant
- 12. Hypothesis Testing Goal: Make statement(s) regarding unknown population parameter values based on sample data ∗Elements of a hypothesis test: ∗ Null hypothesis - Statement regarding the value(s) of unknown parameter(s). Typically will imply no association between explanatory and response variables in our applications (will always contain an equality) ∗ Alternative hypothesis - Statement contradictory to the null hypothesis (will always contain an inequality)
- 13. Null Hypothesis ∗ Usually that there is no effect ∗ Mean = 0 ∗ OR = 1 ∗ RR = 1 ∗ Correlation Coefficient = 0
- 14. Alternative Hypothesis ∗ Contradicts the null ∗ There is an effect ∗ What you want to prove
- 15. The null hypothesis expresses no difference. Example: H0: µ = 0 (often read "H naught"), or any other value. Later: H0: µ1 = µ2
- 16. Alternative Hypothesis H0: µ = 0 (null hypothesis); HA: µ ≠ 0 (alternative hypothesis). The researcher's predictions should be a priori, i.e., made before looking at the data
- 17. Estimation: From the Sample ∗ Point estimation ∗ Mean ∗ Median ∗ Change in mean/median ∗ Interval estimation ∗ 95% Confidence interval ∗ Variation
- 18. Parameters and Reference Distributions Continuous outcome data ∗ Normal distribution: N(μ, σ²) ∗ t distribution: tω (ω = degrees of freedom) ∗ Mean = X̄ (sample mean) ∗ Variance = s² (sample variance) Binary outcome data ∗ Binomial distribution: B(n, p)
- 20. t-Distribution
- 22. Hypothesis Testing Goal: Make statement(s) regarding unknown population parameter values based on sample data Elements of a hypothesis test: ∗ Test statistic - Quantity based on sample data and the null hypothesis, used to decide between the null and alternative hypotheses. ∗ The test statistic is found by converting the sample statistic (proportion, mean, or standard deviation) to a score (z, t, or χ²)
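As a minimal sketch of this conversion (the sample values and the helper name `t_statistic` are hypothetical, not from the slides), a sample mean can be turned into a t score as follows:

```python
import math
import statistics

def t_statistic(sample, mu0):
    """One-sample t statistic: t = (x_bar - mu0) / (s / sqrt(n))."""
    n = len(sample)
    x_bar = statistics.mean(sample)
    s = statistics.stdev(sample)          # sample standard deviation
    return (x_bar - mu0) / (s / math.sqrt(n))

# Hypothetical sample, testing H0: mu = 0
data = [1.2, 0.8, -0.3, 2.1, 1.5, 0.9]
print(round(t_statistic(data, 0.0), 3))   # 3.148
```

The same idea applies to a z score when σ is known: replace the sample standard deviation with σ.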
- 23. Critical Region, Significance Level, Critical Value and P-value ∗ Critical region (rejection region): values of the test statistic for which we reject the null in favor of the alternative hypothesis
- 24. Critical Region, Significance Level, Critical Value and P-value ∗ Significance level (α): the probability that the test statistic will fall in the critical region when the null hypothesis is actually true
- 25. Critical Region, Significance Level, Critical Value and P-value ∗ Critical value: any value that separates the critical region from the values of the test statistic that do not lead to rejection of the null hypothesis
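For the standard normal reference distribution, the critical value can be computed with the inverse CDF; this sketch (the helper name `z_critical` is mine, not from the slides) assumes Python 3.8+ for `statistics.NormalDist`:

```python
from statistics import NormalDist

def z_critical(alpha, two_tailed=True):
    """Critical value separating the rejection region(s) from the rest."""
    tail = alpha / 2 if two_tailed else alpha
    return NormalDist().inv_cdf(1 - tail)

print(round(z_critical(0.05), 2))         # two-tailed: 1.96
print(round(z_critical(0.05, False), 2))  # one-tailed: 1.64 (often quoted as 1.645)
```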
- 26. Two-Tailed, Left-Tailed, Right-Tailed ∗ Two-tailed: the critical region is in the two extreme regions (tails) under the curve
- 27. Two-Tailed, Left-Tailed, Right-Tailed ∗ Left-tailed: the critical region is in the extreme left region (tail) under the curve
- 28. Two-Tailed, Left-Tailed, Right-Tailed ∗ Right-tailed: the critical region is in the extreme right region (tail) under the curve
- 29. Critical Region, Significance Level, Critical Value and P-value ∗ P-value (probability value): the probability of getting a value of the test statistic that is at least as extreme as the one representing the sample data, assuming the null hypothesis is true. ∗ The null hypothesis is rejected if the p-value is very small, such as 0.05 or less
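For a z statistic, this "at least as extreme" probability is a tail area of the standard normal curve. A minimal sketch (the helper name `p_value` is mine, not from the slides):

```python
from statistics import NormalDist

def p_value(z, tail="two"):
    """Probability of a test statistic at least as extreme as z under H0."""
    sf = 1 - NormalDist().cdf(abs(z))    # upper-tail area beyond |z|
    return 2 * sf if tail == "two" else sf

print(round(p_value(1.96), 4))   # 0.05
print(round(p_value(2.5), 4))    # 0.0124
```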
- 30. Wording the conclusion: "Reject the null hypothesis" or "Fail to reject the null hypothesis" are statistically correct. "Prove the null hypothesis to be true," "Accept the null hypothesis," or "Support the null hypothesis" are sometimes used but misleading, because failing to reject H0 does not prove it true
- 31. Decision Criterion ∗ Traditional method: reject the null hypothesis if the test statistic falls within the critical region; fail to reject the null hypothesis if it does not ∗ P-value method: reject H0 if p-value < α (where α is the significance level, such as 0.05)
- 32. Decision Criterion ∗ Another option: instead of using a significance level such as α = 0.05, simply report the P value and leave the decision to the reader ∗ Confidence intervals: because a confidence interval estimate of a population parameter contains the likely values of that parameter, reject a claim that the population parameter has a value that is not included in the confidence interval
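The confidence-interval criterion can be sketched as follows (the sample values and the helper name `confidence_interval` are hypothetical; a normal quantile is used for simplicity, though for small n a t quantile would be more accurate):

```python
import math
from statistics import NormalDist, mean, stdev

def confidence_interval(sample, level=0.95):
    """Normal-approximation confidence interval for the population mean."""
    z = NormalDist().inv_cdf(0.5 + level / 2)     # 1.96 for 95%
    m = mean(sample)
    se = stdev(sample) / math.sqrt(len(sample))   # standard error of the mean
    return (m - z * se, m + z * se)

data = [2.1, 1.8, 2.5, 1.9, 2.2, 2.0, 2.4, 1.7]  # hypothetical sample
lo, hi = confidence_interval(data)
# The claimed value mu = 0 lies outside the interval, so reject that claim
print(not (lo <= 0 <= hi))   # True
```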
- 33. Statistical Error Sometimes H0 will be rejected (based on a large test statistic and a small p-value) even though H0 is really true, i.e., if you had been able to measure the entire population, not a sample, you would have found no difference between µ and some value, but based on X̄ you see a difference. The mistake of rejecting a true H0 will happen with frequency α. So, if H0 is true, it will be rejected ~5% of the time, since α is frequently set to 0.05
- 34. Type I Error. Population ("true"): mean = 0, so H0: mean = 0 is true. Sample (what you see): mean = 20. You conclude from the sample that the population mean ≠ 0, but it really is 0, so you have falsely rejected a true H0
- 35. Statistical Error Sometimes H0 will be accepted (based on a small test statistic and a large p-value) even though H0 is really false, i.e., if you had been able to measure the entire population, not a sample, you would have found a difference between µ and some value, but based on X̄ you do not see a difference. The mistake of accepting a false H0 will happen with frequency β
- 36. Type II Error. Population ("true"): mean = 20, so H0: mean = 0 is false. Sample (what you see): mean = 0. You conclude from the sample that the population mean = 0, but it really is not, so you have falsely failed to reject a false H0
- 37. 1. The treatments do not differ, and we correctly conclude that they do not differ. 2. The treatments do not differ, but we conclude that they do differ. 3. The treatments differ, but we conclude that they do not differ. 4. The treatments do differ, and we correctly conclude that they do differ. Four Possibilities in Testing Whether the Treatments Differ
- 39. Type I error ∗ Concluded that there is difference while in reality there is no difference ∗ α probability Type II error ∗Concluded that there is no difference while in reality there is a difference ∗β probability
- 42. Controlling Type I & Type II Errors: α, β, power (1 – β), and sample size
- 43. Power of the Test ∗ The power of the hypothesis test is the probability (1 – β) of rejecting a false null hypothesis, which is computed using: ∗ A particular significance level α ∗ Sample size n ∗ A particular assumed value of the population parameter in the null hypothesis ∗ A particular assumed value of the population parameter that is alternative to the value in the null hypothesis
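Under the normal-approximation assumptions for a two-sided one-sample z-test, those four ingredients determine the power. A sketch (the helper name `power_z_test` and the example numbers are mine, not from the slides):

```python
import math
from statistics import NormalDist

def power_z_test(mu0, mu1, sigma, n, alpha=0.05):
    """Approximate power of a two-sided one-sample z-test (known sigma)."""
    nd = NormalDist()
    z_crit = nd.inv_cdf(1 - alpha / 2)                # e.g. 1.96 for alpha = 0.05
    shift = abs(mu1 - mu0) * math.sqrt(n) / sigma     # true effect in SE units
    # Probability the statistic lands in either rejection region when mu = mu1
    return nd.cdf(-z_crit + shift) + nd.cdf(-z_crit - shift)

# Effect of half a standard deviation, n = 32, alpha = 0.05
print(power_z_test(0, 0.5, 1.0, 32))   # ~0.81
```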
- 45. Term Definitions α = Probability of making a type I error = Probability of concluding the treatments differ when in reality they do not differ β = Probability of making a type II error = Probability of concluding that the treatments do not differ when in reality they do differ Power = 1 - Probability of making a type II error = 1 - β = Probability of correctly concluding that the treatments differ = Probability of detecting a difference between the treatments if the treatments do in fact differ
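The claim that a true H0 is rejected with frequency α can be checked by simulation; this sketch (entirely illustrative, not from the source) repeatedly draws samples from a population where H0: µ = 0 is true and counts false rejections:

```python
import math
import random
from statistics import mean

random.seed(42)                  # fixed seed for reproducibility
z_crit = 1.96                    # two-sided test at alpha = 0.05
n, trials = 30, 4000
rejections = 0
for _ in range(trials):
    sample = [random.gauss(0, 1) for _ in range(n)]   # H0 true: mu = 0, sigma = 1
    z = mean(sample) * math.sqrt(n)                   # z statistic with known sigma = 1
    if abs(z) > z_crit:
        rejections += 1          # a Type I error: rejecting a true H0
print(rejections / trials)       # close to alpha = 0.05
```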

- Randomized trials can be used for many purposes. They can be used for evaluating new drugs and other treatments of disease, including tests of new health and medical care technology. Such trials can be used to assess new programs for screening and early detection, or new ways of organizing and delivering health services.
- A planned trial was described by the Scottish surgeon James Lind in 1747. Lind became interested in scurvy, which killed thousands of British seamen each year. He was intrigued by the story of a sailor who had developed scurvy and had been put ashore on an isolated island where he subsisted on a diet of grasses and then recovered from the scurvy. Lind conducted an experiment which he described as follows: I took 12 patients in the scurvy on board the Salisbury at sea. The cases were as similar as I could have them … they lay together in one place and had one diet common to them all. Two of these were ordered a quart of cider per day.… Two others took 25 gutts of elixir vitriol.… Two others took two spoonfuls of vinegar.… Two were put under a course of sea water.… Two others had two oranges and one lemon given them each day.… Two others took the bigness of nutmeg. The most sudden and visible good effects were perceived from the use of oranges and lemons, one of those who had taken them being at the end of 6 days fit for duty.… The other … was appointed nurse to the rest of the sick. Interestingly, the idea of a dietary cause of scurvy proved unacceptable in Lind's day. Only 47 years later did the British Admiralty permit him to repeat his experiment-this time on an entire fleet of ships. The results were so dramatic that, in 1795, the Admiralty made lemon juice a required part of the standard diet of British seamen and later changed this to lime juice. Scurvy essentially disappeared from British sailors, who, even today, are referred to as "limeys."
- When we carry out a study we are only looking at the sample of subjects in our study, such as a sample of patients with a certain illness who are being treated with treatment A or with treatment B. From the study results, we want to draw a conclusion that goes beyond the study population: is treatment A more effective than treatment B in the total universe of all patients with this disease who might be treated with treatment A or treatment B?
- Under the assumption that the normal distribution approximates the binomial distribution, the probability of at least 52 successes in 100 attempts is 0.3821 (a high probability, consistent with chance alone)
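This figure can be reproduced from the normal approximation with continuity correction, and checked against the exact binomial tail (the code layout is mine; the numbers match the 0.3821 quoted above):

```python
import math
from statistics import NormalDist

n, p = 100, 0.5
mu = n * p                            # 50
sigma = math.sqrt(n * p * (1 - p))    # 5
# Normal approximation with continuity correction: P(X >= 52) ~ P(Z >= 51.5)
approx = 1 - NormalDist(mu, sigma).cdf(51.5)
print(round(approx, 4))   # 0.3821

# Exact binomial upper tail for comparison
exact = sum(math.comb(n, k) * p**k * (1 - p)**(n - k) for k in range(52, n + 1))
print(round(exact, 4))    # 0.3822
```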
- In probability and statistics, Student’s t-distribution (or simply the t-distribution) is a continuous probability distribution that arises when estimating the mean of a normally distributed population in situations where the sample size is small and population standard deviation is unknown. It plays a role in a number of widely-used statistical analyses, including the Student’s t-test for assessing the statistical significance of the difference between two sample means, the construction of confidence intervals for the difference between two population means, and in linear regression analysis. The Student’s t-distribution also arises in the Bayesian analysis of data from a normal family. The t-distribution is symmetric and bell-shaped, like the normal distribution, but has heavier tails, meaning that it is more prone to producing values that fall far from its mean. This makes it useful for understanding the statistical behavior of certain types of ratios of random quantities, in which variation in the denominator is amplified and may produce outlying values when the denominator of the ratio falls close to zero. The Student’s t-distribution is a special case of the generalized hyperbolic distribution.
- Binomial Experiment A binomial experiment (a sequence of Bernoulli trials) is a statistical experiment that has the following properties: ■ The experiment consists of n repeated trials. ■ Each trial can result in just two possible outcomes. We call one of these outcomes a success and the other, a failure. ■ The probability of success, denoted by P, is the same on every trial. ■ The trials are independent; that is, the outcome on one trial does not affect the outcome on other trials. Consider the following statistical experiment. You flip a coin 2 times and count the number of times the coin lands on heads. This is a binomial experiment because: ■ The experiment consists of repeated trials. We flip a coin 2 times. ■ Each trial can result in just two possible outcomes - heads or tails. ■ The probability of success is constant - 0.5 on every trial. ■ The trials are independent; that is, getting heads on one trial does not affect whether we get heads on other trials. Notation The following notation is helpful when we talk about binomial probability. ■ x: The number of successes that result from the binomial experiment. ■ n: The number of trials in the binomial experiment. ■ P: The probability of success on an individual trial. ■ Q: The probability of failure on an individual trial. (This is equal to 1 - P.) ■ b(x; n, P): Binomial probability - the probability that an n-trial binomial experiment results in exactly x successes, when the probability of success on an individual trial is P. ■ nCr: The number of combinations of n things, taken r at a time. Binomial Distribution A binomial random variable is the number of successes x in n repeated trials of a binomial experiment. The probability distribution of a binomial random variable is called a binomial distribution (the special case with n = 1 is known as the Bernoulli distribution). Suppose we flip a coin two times and count the number of heads (successes).
The binomial random variable is the number of heads, which can take on values of 0, 1, or 2. The binomial distribution is: P(X = 0) = 0.25, P(X = 1) = 0.50, P(X = 2) = 0.25.
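The formula b(x; n, P) from the notation above yields these values directly (the helper name `binomial_pmf` is mine):

```python
import math

def binomial_pmf(x, n, p):
    """b(x; n, P) = nCx * P^x * (1 - P)^(n - x)"""
    return math.comb(n, x) * p**x * (1 - p)**(n - x)

# Number of heads in 2 flips of a fair coin
for x in range(3):
    print(x, binomial_pmf(x, 2, 0.5))
# 0 0.25
# 1 0.5
# 2 0.25
```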
- For the standard normal distribution, a large z value is one above 2 or below -2
- Two-tailed test: Z = ±1.96 corresponds to an alpha level of 0.05, with alpha divided between the two tails (0.025 in each)
- One-tailed test: all of alpha (0.05) lies in one tail
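The split of alpha between the tails can be verified numerically: the area above +1.96 and the area below -1.96 are each about 0.025, summing to 0.05 (the code is an illustrative check, not from the slides):

```python
from statistics import NormalDist

nd = NormalDist()                # standard normal, mean 0, sd 1
upper = 1 - nd.cdf(1.96)         # area in the right tail above +1.96
lower = nd.cdf(-1.96)            # area in the left tail below -1.96
print(round(upper, 3))           # 0.025
print(round(lower, 3))           # 0.025
print(round(upper + lower, 2))   # 0.05
```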
- Given this background, let us now consider a trial in which groups receiving one of two therapies, therapy A and therapy B, are being compared. (Keep in mind the sampling of beads just discussed.) Before beginning our study, we can list the four possible study outcomes: (1) It is possible that in reality there is no difference in efficacy between therapy A and therapy B (i.e., therapy A is no better and no worse than therapy B), and when we do our study we correctly conclude on the basis of our samples that the two groups do not differ. (2) It is possible that in reality there is no difference in efficacy between therapy A and therapy B, but in our study we found a difference between the groups and therefore concluded, on the basis of our samples, that there is a difference between the therapies. This conclusion, based on our samples, is in error. (3) It is possible that in reality there is a difference between therapy A and therapy B, but when we examine the groups in our study we find no difference between them. We therefore conclude, on the basis of our samples, that there is no difference between therapy A and therapy B. This conclusion is in error. (4) It is possible that in reality there is a difference between therapy A and therapy B, and when we examine the groups in our study we find that they differ. On the basis of these samples, we correctly conclude that therapy A differs from therapy B.
- These four possibilities constitute the universe of outcomes after we complete our study. Let us look at these four possibilities as presented in a 2 × 2 table: Two columns represent reality: either therapy A differs from therapy B or therapy A does not differ from therapy B. The two rows represent our decision: We conclude either that they differ or that they do not differ. In this figure, the four possibilities that were just listed are represented as four cells in the 2 × 2 table. If there is no difference, and on the basis of the samples included in our study we conclude there is no difference, this is a correct decision (cell a). If there is a difference, and on the basis of our study we conclude that there is a difference (cell d), this too is a correct decision. In the best of all worlds, all of the possibilities would fall into one of these two cells. Unfortunately, this is rarely, if ever, the case. There are times when there is no difference between the therapies, but on the basis of the samples of subjects included in our study, we erroneously conclude that they differ (cell c). This is called a type I error. It is also possible that there really is a difference between the therapies, but on the basis of the samples included in our study we erroneously conclude that there is no difference (cell b); this is called a type II error. (In this situation, the therapies differ, but we fail to detect the difference in our study samples.) The probability that we will make a type I error is designated α, and the probability that we will make a type II error is designated β.
- α is the so-called P value, which is seen in many published papers and has been sanctified by many years of use. When you see " P < .05," the reference is to α. What does P < .05 mean? It tells us that we have concluded that therapy A differs from therapy B on the basis of the sample of subjects included in our study, which we found to differ. The probability that such a difference could have arisen by chance alone, and that this difference between our groups does not reflect any true difference between therapies A and B, is only .05 (or 1 in 20).
- How do these concepts help us to arrive at an estimate of the sample size that we need? If we ask the question, "How many people do we have to study in a clinical trial?" we must be able to specify a number of items
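A standard sample-size sketch for detecting a difference δ in a mean, under the formula n = ((z_{α/2} + z_β) · σ / δ)², ties together the items just listed: α, the desired power 1 − β, the variability σ, and the clinically important difference δ. (The helper name `sample_size` and the example numbers are mine, not from the source.)

```python
import math
from statistics import NormalDist

def sample_size(delta, sigma, alpha=0.05, power=0.80):
    """n = ((z_{alpha/2} + z_beta) * sigma / delta)^2, rounded up."""
    nd = NormalDist()
    z_a = nd.inv_cdf(1 - alpha / 2)   # 1.96 for alpha = 0.05, two-sided
    z_b = nd.inv_cdf(power)           # 0.84 for 80% power
    return math.ceil(((z_a + z_b) * sigma / delta) ** 2)

# To detect a difference of half a standard deviation with 80% power:
print(sample_size(delta=0.5, sigma=1.0))   # 32
```

Note how the required n grows as the clinically important difference δ shrinks or as the desired power rises.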
- Use the same method as described in Figure 7-6. Use the standard normal distribution (Table A-2).