Hypothesis testing
 

Hypothesis testing by Dr. Badr Aljaser as part of the 5th Research Summer School - Jeddah at KAIMRC - WR

Usage Rights

© All Rights Reserved

  • Randomized trials can be used for many purposes. They can be used for evaluating new drugs and other treatments of disease, including tests of new health and medical care technology. Such trials can be used to assess new programs for screening and early detection, or new ways of organizing and delivering health services.
  • A planned trial was described by the Scottish surgeon James Lind in 1747. Lind became interested in scurvy, which killed thousands of British seamen each year. He was intrigued by the story of a sailor who had developed scurvy and had been put ashore on an isolated island, where he subsisted on a diet of grasses and then recovered from the scurvy. Lind conducted an experiment, which he described as follows: "I took 12 patients in the scurvy on board the Salisbury at sea. The cases were as similar as I could have them … they lay together in one place and had one diet common to them all. Two of these were ordered a quart of cider per day.… Two others took 25 gutts of elixir vitriol.… Two others took two spoonfuls of vinegar.… Two were put under a course of sea water.… Two others had two oranges and one lemon given them each day.… Two others took the bigness of nutmeg. The most sudden and visible good effects were perceived from the use of oranges and lemons, one of those who had taken them being at the end of 6 days fit for duty.… The other … was appointed nurse to the rest of the sick." Interestingly, the idea of a dietary cause of scurvy proved unacceptable in Lind's day. Only 47 years later did the British Admiralty permit him to repeat his experiment, this time on an entire fleet of ships. The results were so dramatic that, in 1795, the Admiralty made lemon juice a required part of the standard diet of British seamen and later changed this to lime juice. Scurvy essentially disappeared from British sailors, who, even today, are referred to as "limeys."
  • When we carry out a study we are only looking at the sample of subjects in our study, such as a sample of patients with a certain illness who are being treated with treatment A or with treatment B. From the study results, we want to draw a conclusion that goes beyond the study population: is treatment A more effective than treatment B in the total universe of all patients with this disease who might be treated with treatment A or treatment B?
  • Under the assumption that the normal distribution approximates the binomial distribution, the probability of at least 52 successes in 100 attempts is 0.3821, so such a result has a high probability of being due to chance.
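The figure of 0.3821 can be reproduced with a short sketch. The note does not state the chance probability of success, so p = 0.5 is an assumption here, as is the use of a continuity correction (52 or more is treated as 51.5 or more on the normal scale). The exact binomial tail is computed alongside for comparison.

```python
import math
from statistics import NormalDist

# Assumed setup: 100 attempts, success probability 0.5 under chance alone.
n, p, k = 100, 0.5, 52

# Exact binomial tail probability P(X >= k).
exact = sum(math.comb(n, x) * p**x * (1 - p) ** (n - x) for x in range(k, n + 1))

# Normal approximation with continuity correction: P(X >= 52) ~ P(Y >= 51.5),
# where Y is normal with the same mean and standard deviation as X.
mu = n * p
sigma = math.sqrt(n * p * (1 - p))
approx = 1 - NormalDist(mu, sigma).cdf(k - 0.5)

print(f"exact binomial tail  = {exact:.4f}")
print(f"normal approximation = {approx:.4f}")
```

Under these assumptions the approximation rounds to 0.3821, and the exact tail is very close to it.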
  • In probability and statistics, Student’s t-distribution (or simply the t-distribution) is a continuous probability distribution that arises when estimating the mean of a normally distributed population in situations where the sample size is small and population standard deviation is unknown. It plays a role in a number of widely-used statistical analyses, including the Student’s t-test for assessing the statistical significance of the difference between two sample means, the construction of confidence intervals for the difference between two population means, and in linear regression analysis. The Student’s t-distribution also arises in the Bayesian analysis of data from a normal family. The t-distribution is symmetric and bell-shaped, like the normal distribution, but has heavier tails, meaning that it is more prone to producing values that fall far from its mean. This makes it useful for understanding the statistical behavior of certain types of ratios of random quantities, in which variation in the denominator is amplified and may produce outlying values when the denominator of the ratio falls close to zero. The Student’s t-distribution is a special case of the generalized hyperbolic distribution.
  • Binomial Experiment. A binomial experiment is a statistical experiment made up of repeated Bernoulli trials. It has the following properties:
    ■ The experiment consists of n repeated trials.
    ■ Each trial can result in just two possible outcomes. We call one of these outcomes a success and the other, a failure.
    ■ The probability of success, denoted by P, is the same on every trial.
    ■ The trials are independent; that is, the outcome on one trial does not affect the outcome on other trials.
    Consider the following statistical experiment. You flip a coin 2 times and count the number of times the coin lands on heads. This is a binomial experiment because:
    ■ The experiment consists of repeated trials. We flip a coin 2 times.
    ■ Each trial can result in just two possible outcomes: heads or tails.
    ■ The probability of success is constant: 0.5 on every trial.
    ■ The trials are independent; that is, getting heads on one trial does not affect whether we get heads on other trials.
    Notation. The following notation is helpful when we talk about binomial probability.
    ■ x: The number of successes that result from the binomial experiment.
    ■ n: The number of trials in the binomial experiment.
    ■ P: The probability of success on an individual trial.
    ■ Q: The probability of failure on an individual trial. (This is equal to 1 - P.)
    ■ b(x; n, P): Binomial probability, the probability that an n-trial binomial experiment results in exactly x successes when the probability of success on an individual trial is P.
    ■ nCr: The number of combinations of n things, taken r at a time.
    Binomial Distribution. A binomial random variable is the number of successes x in n repeated trials of a binomial experiment. The probability distribution of a binomial random variable is called a binomial distribution (with n = 1 it reduces to the Bernoulli distribution). Suppose we flip a coin two times and count the number of heads (successes). The binomial random variable is the number of heads, which can take on values of 0, 1, or 2. The binomial distribution is: P(0 heads) = 0.25, P(1 head) = 0.50, P(2 heads) = 0.25.
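As a minimal sketch of the notation above, b(x; n, P) can be written as nCx · P^x · Q^(n-x). The function name `binomial_pmf` is an illustrative choice, not something taken from the presentation.

```python
import math

def binomial_pmf(x: int, n: int, p: float) -> float:
    """b(x; n, P): probability of exactly x successes in n independent trials."""
    q = 1 - p  # probability of failure on a single trial
    return math.comb(n, x) * p**x * q ** (n - x)

# Two coin flips: distribution of the number of heads (0, 1 or 2).
coin_distribution = {x: binomial_pmf(x, 2, 0.5) for x in range(3)}
print(coin_distribution)  # {0: 0.25, 1: 0.5, 2: 0.25}
```

The three probabilities sum to 1, as they must for a probability distribution.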
  • A large z value would be one above 2 or below -2.
  • z = ±1.96 corresponds to an alpha level of .05 (two-tailed).
  • Two-tailed: alpha of 0.05 is divided between the two tails, .025 in each.
  • One-tailed: the whole alpha of 0.05 is in one tail.
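The link between alpha and the critical z value can be checked with the standard library. This is a sketch only; the helper name `critical_z` and its `tail` argument are illustrative assumptions.

```python
from statistics import NormalDist

def critical_z(alpha: float, tail: str = "two") -> float:
    """Critical value of the standard normal for a given alpha and tail type."""
    std_normal = NormalDist()  # mean 0, standard deviation 1
    if tail == "two":
        # alpha is split between both tails, alpha/2 in each
        return std_normal.inv_cdf(1 - alpha / 2)
    if tail == "right":
        # all of alpha sits in the upper tail
        return std_normal.inv_cdf(1 - alpha)
    if tail == "left":
        # all of alpha sits in the lower tail
        return std_normal.inv_cdf(alpha)
    raise ValueError("tail must be 'two', 'right' or 'left'")

print(round(critical_z(0.05, "two"), 2))    # 1.96
print(round(critical_z(0.05, "right"), 3))  # 1.645
print(round(critical_z(0.05, "left"), 3))   # -1.645
```

With the same alpha, a one-tailed test has a critical value closer to zero than a two-tailed test, because none of alpha is spent on the opposite tail.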
  • Given this background, let us now consider a trial in which groups receiving one of two therapies, therapy A and therapy B, are being compared. (Keep in mind the sampling of beads just discussed.) Before beginning our study, we can list the four possible study outcomes:
    1. It is possible that in reality there is no difference in efficacy between therapy A and therapy B (i.e., therapy A is no better and no worse than therapy B), and when we do our study we correctly conclude on the basis of our samples that the two groups do not differ.
    2. It is possible that in reality there is no difference in efficacy between therapy A and therapy B (i.e., therapy A is no better and no worse than therapy B), but in our study we found a difference between the groups and therefore concluded, on the basis of our samples, that there is a difference between the therapies. This conclusion, based on our samples, is in error.
    3. It is possible that in reality there is a difference between therapy A and therapy B, but when we examine the groups in our study we find no difference between them. We therefore conclude, on the basis of our samples, that there is no difference between therapy A and therapy B. This conclusion is in error.
    4. It is possible that in reality there is a difference between therapy A and therapy B, and when we examine the groups in our study we find that they differ. On the basis of these samples, we correctly conclude that therapy A differs from therapy B.
  • These four possibilities constitute the universe of outcomes after we complete our study. Let us look at these four possibilities as presented in a 2 × 2 table: Two columns represent reality: either therapy A differs from therapy B or therapy A does not differ from therapy B. The two rows represent our decision: We conclude either that they differ or that they do not differ. In this figure, the four possibilities that were just listed are represented as four cells in the 2 × 2 table. If there is no difference, and on the basis of the samples included in our study we conclude there is no difference, this is a correct decision (cell a). If there is a difference, and on the basis of our study we conclude that there is a difference (cell d), this too is a correct decision. In the best of all worlds, all of the possibilities would fall into one of these two cells. Unfortunately, this is rarely, if ever, the case. There are times when there is no difference between the therapies, but on the basis of the samples of subjects included in our study, we erroneously conclude that they differ (cell c). This is called a type I error. It is also possible that there really is a difference between the therapies, but on the basis of the samples included in our study we erroneously conclude that there is no difference (cell b); this is called a type II error. (In this situation, the therapies differ, but we fail to detect the difference in our study samples.) The probability that we will make a type I error is designated α, and the probability that we will make a type II error is designated β.
  • α is the threshold against which the P value is compared. The P value is seen in many published papers and has been sanctified by many years of use. When you see "P < .05," the P value computed from the study fell below α = .05. What does P < .05 mean? It tells us that we have concluded that therapy A differs from therapy B on the basis of the sample of subjects included in our study, which we found to differ. If there were in reality no true difference between therapies A and B, the probability that a difference at least as large as the one we observed could have arisen by chance alone is less than .05 (or 1 in 20).
  • How do these concepts help us to arrive at an estimate of the sample size that we need? If we ask the question, "How many people do we have to study in a clinical trial?" we must be able to specify a number of items.
  • Use the same method as described in Figure 7-6. Use the standard normal distribution (Table A-2).

Hypothesis testing: Presentation Transcript

  • Hypothesis Testing with One Sample
  • James Lind’s experiment
  • Hypothesis testing
    Draw inferences about a population based on a sample.
    Testing a claim about a property of a population.
  • Statistical Inference
    ∗ Inferences about a population are made on the basis of results obtained from a sample drawn from that population
    ∗ Want to talk about the larger population from which the subjects are drawn, not the particular subjects!
  • What Do We Test?
    ∗ Effect or Difference we are interested in
    ∗ Difference in Means or Proportions
    ∗ Odds Ratio (OR)
    ∗ Relative Risk (RR)
    ∗ Correlation Coefficient
    ∗ Clinically important difference
    ∗ Smallest difference considered biologically or clinically relevant
  • Example: Gender Selection
  • Hypothesis Testing
    Goal: Make statement(s) regarding unknown population parameter values based on sample data.
    Elements of a hypothesis test:
    ∗ Null hypothesis - Statement regarding the value(s) of unknown parameter(s). Typically will imply no association between explanatory and response variables in our applications (will always contain an equality)
    ∗ Alternative hypothesis - Statement contradictory to the null hypothesis (will always contain an inequality)
  • Null Hypothesis
    ∗ Usually that there is no effect
    ∗ Mean = 0
    ∗ OR = 1
    ∗ RR = 1
    ∗ Correlation Coefficient = 0
  • Alternative Hypothesis
    ∗ Contradicts the null
    ∗ There is an effect
    ∗ What you want to prove?
  • Null Hypothesis expresses no difference. Example: H0: µ = 0 (often said “H naught”), or any number. Later: H0: µ1 = µ2
  • Alternative Hypothesis
    H0: µ = 0; Null Hypothesis
    HA: µ ≠ 0; Alternative Hypothesis
    Researcher’s predictions should be a priori, i.e. before looking at the data
  • Estimation: From the Sample
    ∗ Point estimation
    ∗ Mean
    ∗ Median
    ∗ Change in mean/median
    ∗ Interval estimation
    ∗ 95% Confidence interval
    ∗ Variation
  • Parameters and Reference Distributions
    Continuous outcome data
    ∗ Normal distribution: N(μ, σ²)
    ∗ t distribution: tω (ω = degrees of freedom)
    ∗ Mean = X̄ (sample mean)
    ∗ Variance = s² (sample variance)
    Binary outcome data
    ∗ Binomial distribution: B(n, p)
  • Normal Distribution
  • t – Distribution
  • Binomial Distribution
  • Hypothesis Testing
    Goal: Make statement(s) regarding unknown population parameter values based on sample data.
    Elements of a hypothesis test:
    ∗ Test statistic - Quantity based on sample data and null hypothesis used to test between null and alternative hypotheses.
    ∗ The test statistic is found by converting the sample statistic (proportion, mean or standard deviation) to a score (z, t or χ²)
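The conversion of a sample statistic to a score can be sketched as follows. The slide does not give the formulas, so the standard textbook forms are assumed here: z = (p̂ - p0) / sqrt(p0·q0/n) for a proportion and t = (x̄ - µ0) / (s/√n) for a mean. The function names and the sample data are hypothetical.

```python
import math
from statistics import mean, stdev

def z_for_proportion(successes: int, n: int, p0: float) -> float:
    """z score for a sample proportion under H0: p = p0."""
    p_hat = successes / n
    standard_error = math.sqrt(p0 * (1 - p0) / n)
    return (p_hat - p0) / standard_error

def t_for_mean(sample: list[float], mu0: float) -> float:
    """t score for a sample mean under H0: mu = mu0 (df = n - 1)."""
    n = len(sample)
    standard_error = stdev(sample) / math.sqrt(n)  # stdev uses the n - 1 divisor
    return (mean(sample) - mu0) / standard_error

# 52 successes in 100 trials, tested against p0 = 0.5.
print(round(z_for_proportion(52, 100, 0.5), 3))

# Hypothetical measurements tested against mu0 = 2.
print(round(t_for_mean([1.0, 2.0, 3.0, 4.0, 5.0], 2.0), 3))
```

The score is then compared with the reference distribution (standard normal for z, t with n - 1 degrees of freedom for t) to decide whether it falls in the critical region.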
  • Critical Region, Significance Level, Critical Value and p-value
    ∗ Critical region (rejection region): values of the test statistic for which we reject the null in favor of the alternative hypothesis
  • Critical Region, Significance Level, Critical Value and p-value
    ∗ Significance level (α): the probability that the test statistic will fall in the critical region when the null hypothesis is actually true.
  • Critical Region, Significance Level, Critical Value and p-value
    ∗ Critical value: any value that separates the critical region from the values of the test statistic that do not lead to rejection of the null hypothesis.
  • Two-Tailed, Left-Tailed, Right-Tailed
    ∗ Two-tailed: the critical region is in the two extreme regions (tails) under the curve
  • Two-Tailed, Left-Tailed, Right-Tailed
    ∗ Left-tailed: the critical region is in the extreme left region (tail) under the curve
  • Two-Tailed, Left-Tailed, Right-Tailed
    ∗ Right-tailed: the critical region is in the extreme right region (tail) under the curve
  • Critical Region, Significance Level, Critical Value and p-value
    ∗ P-value (p-value or probability value): the probability of getting a value of the test statistic that is at least as extreme as the one representing the sample data, assuming the null hypothesis is true.
    ∗ The null hypothesis is rejected if the p-value is very small, such as 0.05 or less.
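The definition of the p-value as the probability of a test statistic "at least as extreme" as the observed one can be made concrete for a z statistic. This is a sketch assuming a standard normal reference distribution; the helper name `p_value_from_z` is illustrative.

```python
from statistics import NormalDist

def p_value_from_z(z: float, tail: str = "two") -> float:
    """p-value of an observed z statistic for a two-, right- or left-tailed test."""
    cdf = NormalDist().cdf  # standard normal cumulative distribution
    if tail == "two":
        # extreme in either direction: both tails beyond |z|
        return 2 * (1 - cdf(abs(z)))
    if tail == "right":
        # extreme means at least as large as z
        return 1 - cdf(z)
    if tail == "left":
        # extreme means at least as small as z
        return cdf(z)
    raise ValueError("tail must be 'two', 'right' or 'left'")

print(round(p_value_from_z(1.96, "two"), 3))   # 0.05
print(round(p_value_from_z(0.4, "right"), 4))
print(round(p_value_from_z(-2.0, "left"), 4))
```

The same observed z gives a two-tailed p-value twice as large as the matching one-tailed p-value, which is why the tail type should be fixed before looking at the data.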
  • Wording of conclusions shown on the slide: Reject the null hypothesis (or other); Fail to reject the null hypothesis; Prove the null hypothesis to be true; Accept the null hypothesis; Support the null hypothesis. Labels shown on the slide: "Statistically correct" and "OK but misleading".
  • Decision Criterion
    ∗ Traditional method: reject the null hypothesis if the test statistic falls within the critical region. Fail to reject the null hypothesis if the test statistic does not fall within the critical region.
    ∗ P-value method: reject H0 if p-value < α (where α is the significance level, such as 0.05)
  • Decision Criterion
    ∗ Another option: Instead of using a significance level such as α = 0.05, simply identify the P value and leave the decision to the reader
    ∗ Confidence intervals: Because a confidence interval estimate of the population parameter contains the likely values of that parameter, reject a claim that the population parameter has a value that is not included in the confidence interval
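The three decision criteria can be compared side by side in a small sketch. It assumes a two-sided, one-sample z test for a mean with known population standard deviation, a case in which the critical-region, p-value and confidence-interval criteria are mathematically equivalent. The function name and the example numbers are hypothetical.

```python
import math
from statistics import NormalDist

def one_sample_z_decisions(xbar: float, mu0: float, sigma: float, n: int,
                           alpha: float = 0.05) -> dict:
    """Apply three decision criteria to H0: mu = mu0 (two-sided, sigma known)."""
    std_normal = NormalDist()
    se = sigma / math.sqrt(n)            # standard error of the sample mean
    z = (xbar - mu0) / se                # test statistic
    z_crit = std_normal.inv_cdf(1 - alpha / 2)

    # Traditional method: is the test statistic in the critical region?
    reject_by_region = abs(z) > z_crit

    # P-value method: is the two-tailed p-value below alpha?
    p_value = 2 * (1 - std_normal.cdf(abs(z)))
    reject_by_p = p_value < alpha

    # Confidence interval method: is mu0 outside the (1 - alpha) interval?
    lower, upper = xbar - z_crit * se, xbar + z_crit * se
    reject_by_ci = not (lower <= mu0 <= upper)

    return {"z": z, "p_value": p_value, "ci": (lower, upper),
            "reject_by_region": reject_by_region,
            "reject_by_p": reject_by_p,
            "reject_by_ci": reject_by_ci}

# Hypothetical data: sample mean 103 from n = 100, sigma = 15, H0: mu = 100.
result = one_sample_z_decisions(103, 100, 15, 100)
print(result)
```

For other tests, such as a test of a proportion where the interval and the test use different standard errors, the confidence-interval criterion can occasionally disagree with the other two.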
  • Statistical Error
    Sometimes H0 will be rejected (based on a large test statistic & small p-value) even though H0 is really true, i.e., if you had been able to measure the entire population, not a sample, you would have found no difference between µ and some value, but based on X̄ you see a difference.
    The mistake of rejecting a true H0 will happen with frequency α. So, if H0 is true, it will be rejected ~5% of the time, as α frequently = 0.05.
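The claim that a true H0 is rejected about 5% of the time when α = 0.05 can be illustrated by simulation. This sketch assumes samples of size 30 drawn from a standard normal population (so H0: µ = 0 is true) and a two-sided z test with known σ = 1; the trial count and seed are arbitrary choices.

```python
import math
import random
from statistics import NormalDist

def type_one_error_rate(trials: int = 20000, n: int = 30,
                        alpha: float = 0.05, seed: int = 0) -> float:
    """Fraction of simulated studies that reject H0: mu = 0 when it is true."""
    rng = random.Random(seed)                      # fixed seed for repeatability
    z_crit = NormalDist().inv_cdf(1 - alpha / 2)   # two-sided critical value
    rejections = 0
    for _ in range(trials):
        # Draw a sample from a population whose true mean really is 0.
        sample_mean = sum(rng.gauss(0.0, 1.0) for _ in range(n)) / n
        z = sample_mean / (1.0 / math.sqrt(n))     # sigma = 1 is known
        if abs(z) > z_crit:
            rejections += 1                        # a type I error
    return rejections / trials

rate = type_one_error_rate()
print(f"observed type I error rate: {rate:.3f}")
```

The observed rate should land close to α, with small deviations due to simulation noise.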
  • Type I Error
    H0: mean = 0. [Figure: population ("true") versus sample (what you see), with values 0 and 20 marked.] Population mean = 0, sample mean = 20.
    Conclude based on the sample mean that the population mean ≠ 0, but it really does equal 0 (H0 true); therefore you have falsely rejected H0.
  • Statistical Error
    Sometimes H0 will not be rejected (based on a small test statistic & large p-value) even though H0 is really false, i.e., if you had been able to measure the entire population, not a sample, you would have found a difference between µ and some value, but based on X̄ you do not see a difference.
    The mistake of failing to reject a false H0 will happen with frequency β.
  • Type II Error
    H0: mean = 0. [Figure: population ("true") versus sample (what you see), with values 0 and 20 marked; labels "Sample mean = 0" and "Sample mean = 20".]
    Conclude based on the sample mean that the population mean = 0, but it really does not (H0 really false); therefore you have falsely failed to reject H0.
  • Four Possibilities in Testing Whether the Treatments Differ
    1. The treatments do not differ, and we correctly conclude that they do not differ.
    2. The treatments do not differ, but we conclude that they do differ.
    3. The treatments differ, but we conclude that they do not differ.
    4. The treatments do differ, and we correctly conclude that they do differ.
  • Type I error
    ∗ Concluded that there is a difference while in reality there is no difference
    ∗ α probability
    Type II error
    ∗ Concluded that there is no difference while in reality there is a difference
    ∗ β probability
  • Controlling Type I & Type II Errors: α, β, Power (1 – β), Sample Size
  • Power of the test
    ∗ The power of the hypothesis test is the probability (1 - β) of rejecting a false null hypothesis, which is computed by using:
    ∗ A particular significance level α
    ∗ Sample size n
    ∗ A particular assumed value of the population parameter in the null hypothesis
    ∗ A particular assumed value of the population parameter that is an alternative to the value in the null hypothesis
  • Term Definitions
    α = Probability of making a type I error = Probability of concluding the treatments differ when in reality they do not differ
    β = Probability of making a type II error = Probability of concluding that the treatments do not differ when in reality they do differ
    Power = 1 - Probability of making a type II error = 1 - β = Probability of correctly concluding that the treatments differ = Probability of detecting a difference between the treatments if the treatments do in fact differ
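How α, sample size and the assumed true difference combine into power can be sketched for one simple case. The presentation does not specify a test, so a two-sided, one-sample z test with known σ is assumed; `delta` (the assumed true difference from the null value) and the function name are illustrative.

```python
import math
from statistics import NormalDist

def power_two_sided_z(delta: float, sigma: float, n: int,
                      alpha: float = 0.05) -> float:
    """Power (1 - beta) of a two-sided one-sample z test with known sigma."""
    std_normal = NormalDist()
    z_crit = std_normal.inv_cdf(1 - alpha / 2)
    # Under the alternative, the z statistic is centred at delta * sqrt(n) / sigma.
    shift = delta * math.sqrt(n) / sigma
    # Probability of landing in either tail of the critical region.
    return std_normal.cdf(-z_crit + shift) + std_normal.cdf(-z_crit - shift)

# Power grows with sample size for a fixed assumed difference of 0.5 sigma.
for n in (10, 20, 32, 50):
    print(n, round(power_two_sided_z(0.5, 1.0, n), 3))

# With no true difference, the rejection probability is just alpha.
print(round(power_two_sided_z(0.0, 1.0, 32), 3))  # 0.05
```

Under these assumptions, roughly 32 subjects give about 80% power to detect a difference of half a standard deviation; β is then about 0.20.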