3. Testing a hypothesis
• Define the null and alternative hypotheses
• Assume the null is true
• Set alpha (5%)
• Collect data and calculate the test statistic
• Get the p-value
• Compare the p-value to alpha
• If the p-value is bigger than alpha, fail to reject the null hypothesis
• If the p-value is smaller than alpha, reject the null hypothesis (a short code sketch of these steps follows below)
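As a minimal sketch of these steps, assuming two made-up samples and a two-sample t-test from scipy (the data, variable names, and choice of test are illustrative only, not taken from the slides):

```python
from scipy import stats

# Hypothetical outcome measurements for two groups (illustrative values only)
group_a = [142, 138, 150, 145, 139, 148, 141, 137]
group_b = [133, 129, 136, 131, 128, 135, 130, 134]

alpha = 0.05                                          # set alpha (5%)
t_stat, p_value = stats.ttest_ind(group_a, group_b)   # calculate the test statistic
print(f"t = {t_stat:.2f}, p-value = {p_value:.4f}")   # get the p-value

# Compare the p-value to alpha
if p_value < alpha:
    print("Reject the null hypothesis")
else:
    print("Fail to reject the null hypothesis")
```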
4. More on testing hypothesis
• Here we have to address 3 things:
1. When is a probability considered big and when is it considered small?
2. How do we define the null hypothesis?
3. How can I get the probability?
5. 1-When is the probability small and when is it big?
• There is a conventional agreement to set the cut-off point at 0.05, i.e. 5% (called alpha)
• When the probability (p-value) < 0.05, we have evidence that the assumption made is not acceptable (like getting a 6 eight times)
• When the p-value ≥ 0.05, the event is not considered rare, and we cannot reject the assumption (null hypothesis) (like getting a 6 only one time; see the dice sketch below)
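The dice example can be made concrete with a one-sided binomial test. In this sketch the number of rolls is assumed to be 10 (the slides do not state it), and the null hypothesis is that the die is fair, i.e. the probability of a 6 is 1/6:

```python
from scipy.stats import binomtest

n_rolls = 10     # assumed number of rolls (not stated on the slide)
fair_p = 1 / 6   # probability of rolling a 6 under the null (fair die)

# Eight sixes in ten rolls: a very rare event if the die is fair
rare = binomtest(k=8, n=n_rolls, p=fair_p, alternative="greater")
print(f"8 sixes: p-value = {rare.pvalue:.6f}")    # far below 0.05 -> reject "fair die"

# One six in ten rolls: perfectly consistent with a fair die
common = binomtest(k=1, n=n_rolls, p=fair_p, alternative="greater")
print(f"1 six:   p-value = {common.pvalue:.4f}")  # well above 0.05 -> fail to reject
```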
6. Definition of the p-value
• The p-value is the probability of observing the results obtained from the data collected (or more extreme results) if the null hypothesis is true
7. 2-How do we define the null hypothesis
• Presumption of innocence (innocent until proven guilty)
• Assumption of no difference between the groups.
• Assumption of no effect (of the drug, for example).
• Assumption of no change.
• This is why, in the dice example, the assumption was that the die is fair, i.e. innocent.
8. Example
• What is the null hypothesis for each of the following examples:
– Comparing the long-term cost of a surgical procedure to medical treatment using drugs.
– Ho: there is no difference in cost between the surgical procedure and the drugs.
– If the data observed show that those receiving surgical treatment ended up paying more over the 12 months, and the p-value is 0.002, then we can reject the null and conclude that surgical treatment is costlier.
– But if the p-value was 0.0871, then we cannot reject the null, and thus we cannot conclude that one method costs more than the other (see the decision-rule sketch below).
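A tiny sketch of the decision rule applied to the two p-values quoted on this slide (0.002 and 0.0871), with alpha fixed at the conventional 0.05; the helper function is purely illustrative:

```python
ALPHA = 0.05

def decide(p_value: float) -> str:
    """Apply the usual decision rule: reject Ho when the p-value is below alpha."""
    return "reject Ho" if p_value < ALPHA else "fail to reject Ho"

print(decide(0.002))   # reject Ho -> conclude surgical treatment is costlier
print(decide(0.0871))  # fail to reject Ho -> no conclusion about which costs more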
9. Example
• What is the null hypothesis for each of the following examples:
– Comparing the risk/benefit ratio of different doses of “lipitor” in treating patients.
– Ho: there is no difference in risk/benefit ratio among the groups taking different doses of “lipitor”.
– If the data observed show that those receiving a higher dose have more benefit than risk, and the p-value is 0.01, then we can reject the Ho and conclude that the risk/benefit ratio for higher doses of lipitor is better than for lower doses.
– But if the p-value was 0.245, then we cannot reject the null.
10. More on the Ho
• We cannot prove that something is true, but we CAN prove that something is false (refutation).
• You only need one piece of evidence against a hypothesis to show it is wrong, but not finding that evidence doesn’t automatically mean the hypothesis is true; it may be that you just haven’t found it yet …
12. Part 1: P-value
• What is the definition of p-value?
• It is a probability
• Used to make a statistical conclusion (reject or
fail to reject the Null Hypothesis)
• The complete definition: Probability of getting
the results observed (or more extreme results)
IF the null hypothesis was TRUE
13. Application of definition
• A researcher is testing the LOS (length of stay) of patients treated with Meropenem vs those treated with Levofloxacin.
• Ho: the LOS is the same (there is no difference)
• Results show a difference of 1.4 days, p-value = 0.012
• Interpret the p-value?
• The probability that a difference of 1.4 days or longer would be observed in our sample, IF IN FACT there was no difference in the population, is 0.012, i.e. about 12 out of every 1000 such studies (a simulation sketch follows below).
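To make the "about 12 out of 1000 studies" reading concrete, here is a small Monte Carlo sketch run under the null hypothesis. The standard deviation of LOS and the group size are NOT given on the slides; the values below (SD = 3 days, 58 patients per arm) are assumptions chosen so the simulated proportion lands near the quoted p-value of 0.012, and different assumptions would give different numbers:

```python
import numpy as np

rng = np.random.default_rng(0)

# Assumed study characteristics (illustrative only, not from the slides)
n_per_group = 58   # patients per treatment arm
sd_los = 3.0       # standard deviation of length of stay, in days
mean_los = 7.0     # common mean LOS under the null (no true difference)
n_studies = 100_000

# Simulate many studies in which the null hypothesis is true
meropenem = rng.normal(mean_los, sd_los, size=(n_studies, n_per_group))
levofloxacin = rng.normal(mean_los, sd_los, size=(n_studies, n_per_group))
diffs = np.abs(meropenem.mean(axis=1) - levofloxacin.mean(axis=1))

# Proportion of null studies showing a difference of 1.4 days or more
print((diffs >= 1.4).mean())   # roughly 0.012 under these assumptions
```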
15. When running a test of hypothesis
• 4 scenarios could occur:
                    Ho is True                   Ho is False
Fail to reject Ho   Correct decision             Wrong decision
                    1 - alpha = 95%              Type II error = Beta = 20%
Reject Ho           Wrong decision               Correct decision
                    Type I error = alpha = 5%    1 - Beta = power = 80%
(Rows: result of the test statistic. Columns: whether the null hypothesis is true or false.)
17. II-A Why Power?
• Power is a very important concept in research
• Suppose a drug company is making a drug that lowers BP
• To test the drug, the company is going to run a research study
• In this study, Ho: the BP doesn’t change
18. Power
• The researchers at the company believe that the drug works. Thus they believe the Ha is true, so they really want to reject Ho; they don’t want to miss the chance to provide this evidence.
• What is described above is the idea behind power.
19. Application on Power
• Example of the Meropenem vs Levofloxacin study.
• What does setting the study to have 80% power mean?
• That IF IN FACT there is a difference in the LOS between those patients (you would have to define the minimum difference expected), then there is an 80% chance of detecting this difference (i.e. rejecting the Ho).
• Out of 1000 such studies, about 800 would reject the Ho and 200 would fail to reject the Ho (see the power-calculation sketch below).
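A minimal power-calculation sketch using statsmodels (not tied to actual Meropenem vs Levofloxacin data, which are not given on the slides); the standardized effect size of 0.5 and group size of 64 are illustrative assumptions:

```python
from statsmodels.stats.power import TTestIndPower

analysis = TTestIndPower()

# Power of a two-sample t-test, given alpha, effect size, and sample size
power = analysis.solve_power(effect_size=0.5, nobs1=64, alpha=0.05, power=None)
print(f"Power = {power:.2f}")   # about 0.80 with these assumed inputs
```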
20. II-B Application for Type I error
• Suppose in reality there is no effect of coated stents as compared to non-coated stents with regard to growth of vegetation, yet a study found a higher rate among the non-coated vs the coated, with a p-value = 0.042, and thus ended up rejecting the Ho.
• Here the Ho was true (no difference)
• The study made an error in rejecting it, thus committing a Type I error
21. II-C Application of Type II error
• Suppose in reality a mixed insulin regimen leads to better FBS control than a basal insulin regimen, yet a study failed to show statistical significance (p-value = 0.0712).
• The researchers failed to reject a false Ho
• Thus committing a Type II error
22. Part III - 4 parameters
• The following 4 parameters are interrelated:
– Alpha
– Beta or power
– Effect size
– Sample size
23. 4 parameters
• Researchers use the relation between the 4 parameters: they set 3 of them and estimate the last one.
• Most of the time, researchers are after the sample size required (see the sketch below).
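A minimal sketch of this "set 3, solve for the 4th" idea, again using statsmodels for a two-sample t-test; alpha, power, and an assumed effect size of 0.5 are fixed, and the required sample size per group is the unknown:

```python
from statsmodels.stats.power import TTestIndPower

analysis = TTestIndPower()

# Set 3 parameters (alpha, power, effect size) and solve for the 4th (sample size).
# The effect size of 0.5 is an illustrative assumption.
n_per_group = analysis.solve_power(effect_size=0.5, alpha=0.05, power=0.80, nobs1=None)
print(f"Required sample size per group: {n_per_group:.0f}")   # about 64
```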
24. Effect Size
• Effect size is simply the standardized difference
expected to be seen between the groups.
• For example, with 2 groups that start with equal hypertension levels, one treated and the other not, the drop in BP I would expect to see between them is the effect size (this definition is simplified because the difference needs to be standardized, e.g. divided by the standard deviation; see the Cohen's d sketch below).
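One common way to standardize the difference is Cohen's d, the difference in means divided by the pooled standard deviation; the slides do not name a specific formula, so this is just one possibility, and the BP values are made up:

```python
import numpy as np

def cohens_d(a, b):
    """Cohen's d: difference in means divided by the pooled standard deviation."""
    a, b = np.asarray(a, dtype=float), np.asarray(b, dtype=float)
    n1, n2 = len(a), len(b)
    pooled_var = ((n1 - 1) * a.var(ddof=1) + (n2 - 1) * b.var(ddof=1)) / (n1 + n2 - 2)
    return (a.mean() - b.mean()) / np.sqrt(pooled_var)

# Hypothetical systolic BP values (mmHg): untreated group vs treated group
untreated = [152, 148, 155, 149, 151, 150]
treated = [141, 138, 144, 140, 143, 139]
print(f"Effect size (Cohen's d) = {cohens_d(untreated, treated):.2f}")
```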
25. How power is affected
Everything else kept constant:
A bigger sample size yields bigger power
A bigger effect size yields bigger power
(both statements are illustrated numerically in the sketch below)
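A quick numerical illustration of both statements, reusing the same statsmodels power calculation as in the earlier sketches (the specific sample sizes and effect sizes are arbitrary illustrative values):

```python
from statsmodels.stats.power import TTestIndPower

analysis = TTestIndPower()

# Bigger sample size -> bigger power (effect size held at 0.5, alpha at 0.05)
for n in (20, 50, 100):
    p = analysis.solve_power(effect_size=0.5, nobs1=n, alpha=0.05, power=None)
    print(f"n per group = {n:3d}  ->  power = {p:.2f}")

# Bigger effect size -> bigger power (sample size held at 50 per group)
for es in (0.2, 0.5, 0.8):
    p = analysis.solve_power(effect_size=es, nobs1=50, alpha=0.05, power=None)
    print(f"effect size = {es}  ->  power = {p:.2f}")
```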