The ABC of statistics
     Jonas Ranstam
A scientific report
The idea is to try and give all the information to help
others to judge the value of your contributions, not
just the information that leads to judgment in one
particular direction or another.

                                 Richard P. Feynman
Uncertainty is ubiquitous
1. Random variation (precision)
   a) measurement errors
   b) sampling variation

2. Systematic deviation (validity)
   a) selection bias
   b) information bias
   c) confounding bias
Random variation
 Measurement errors

 How uncertain is a body weight measurement?
Error distribution
Variation in observed body weight during 133 consecutive daily
measurements (residuals, detrended using lowess)

  [Figure: distribution of the residuals; x-axis: deviation from mean value (kg)]
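A minimal sketch of this kind of check in Python, using simulated data rather than the author's 133 measurements: lowess (here via statsmodels) removes the slow trend in a daily weight series, and the standard deviation of the residuals estimates the measurement uncertainty. The drift, the 1.5 kg error level and the smoothing fraction are assumptions chosen for illustration.

    # Minimal sketch with simulated data (not the author's measurements).
    import numpy as np
    from statsmodels.nonparametric.smoothers_lowess import lowess

    rng = np.random.default_rng(1)
    days = np.arange(133)                               # 133 consecutive daily measurements
    trend = 77 + 0.01 * days                            # assumed slow drift in true weight
    weight = trend + rng.normal(scale=1.5, size=133)    # assumed measurement error, SD 1.5 kg

    smoothed = lowess(weight, days, frac=0.3, return_sorted=False)  # lowess trend estimate
    residuals = weight - smoothed                       # detrended deviations

    sd = residuals.std(ddof=1)
    print(f"residual SD ≈ {sd:.1f} kg")
    print(f"68% / 95% / 99.7% limits: ±{sd:.1f} / ±{2 * sd:.1f} / ±{3 * sd:.1f} kg")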
How uncertain is an observed weight of 77kg?


       Degree of uncertainty

       68.0%       ±1.5kg
       95.0%       ±3.0kg
       99.7%       ±4.5kg
How uncertain is a weight change, between two
consecutive measurements?



       Degree of uncertainty

       68.0%       ±2.1kg
       95.0%       ±4.2kg
       99.7%       ±6.4kg
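The wider limits follow from adding two independent measurement errors: the variance of a difference is the sum of the two variances, so the SD of a change between two measurements is √2 times the single-measurement SD (1.5 kg × √2 ≈ 2.1 kg). A quick check of the slide's figures:

    # var(X2 - X1) = var(X1) + var(X2) for independent measurement errors.
    import math

    sd_single = 1.5                        # single-measurement SD (kg), from the previous slide
    sd_change = math.sqrt(2) * sd_single   # SD of the change between two measurements

    for k, level in zip((1, 2, 3), ("68.0%", "95.0%", "99.7%")):
        print(f"{level}  ±{k * sd_change:.1f}kg")   # ±2.1, ±4.2 and ±6.4 kg, as above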
Random variation
 Sampling variation

 How uncertain is a mean value,

 or, what does “statistically significant” mean?
Statistics
  Numerical descriptions

  Observed sample
Significance-related statements

  “There was no difference in...”
  “No difference in … could be observed”
  “There was a difference...”
  Etc.

Statistics
  Numerical descriptions

  Observed sample
Statistics

  In singular: the scientific method of assessing the uncertainty of generalizations
  In plural:   numerical descriptions

  [Diagram: observed sample drawn from an unobserved population]
Problem

The sample is usually easy to identify, but what
“population” are we talking about?
To what population does experiment A belong?

  Experiment A
To what population does experiment A belong?

  The mother of all possible realizations of Experiment A

  [Diagram: repeated realizations of Experiment A]
To what population does experiment A belong?

  The mother of all possible repetitions of Experiment A

  [Diagram: repeated realizations of Experiment A and the distribution of their results around μ (sampling variability)]
What is the sampling variability of these experiments?

  The mother of all possible repetitions of Experiment A

  [Diagram: repeated realizations of Experiment A and the sampling variability around μ, observed after thousands of experiments]
Can we say anything about sampling uncertainty if only one experiment is performed?

  [Diagram: a single Experiment A with standard deviation SD and sample size n, and the unobserved population mean μ; sampling uncertainty?]
Can we say anything about sampling uncertainty if only one experiment is performed?

  From the single Experiment A, with standard deviation SD and sample size n:

      SEM = SD/√n

  [Diagram: sampling distribution of the mean, spanning −1.96 SEM to +1.96 SEM; sampling uncertainty]
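A minimal sketch of this calculation with one simulated experiment (the sample size, mean and SD are assumptions): the SEM is estimated from the single sample and turned into an approximate 95% interval for the unobserved μ.

    # One hypothetical experiment; SEM = SD/sqrt(n) quantifies the sampling uncertainty.
    import numpy as np

    rng = np.random.default_rng(2)
    sample = rng.normal(loc=10.0, scale=4.0, size=50)   # assumed data, n = 50

    n = sample.size
    mean = sample.mean()
    sd = sample.std(ddof=1)          # dispersion of the observations
    sem = sd / np.sqrt(n)            # uncertainty of the mean

    print(f"mean = {mean:.2f}, SD = {sd:.2f}, SEM = {sem:.2f}")
    print(f"~95% interval for μ: {mean - 1.96 * sem:.2f} to {mean + 1.96 * sem:.2f}")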
Is sampling uncertainty a problem only in experimental studies?

  [Diagram: observed sample drawn from an unobserved population]

  The properties of the population, like effect, risk, quality, etc., cannot be
  directly observed.

  These can only be indirectly observed in the sample, contaminated by measurement
  error, sampling variation and bias (case-mix). This information needs to be
  statistically analyzed.

  A number of the items in the sample may be interesting for the subjects in the
  sample, but cannot be interpreted in terms of independent effect, risk, quality, etc.
Do different ranks in league tables represent differences in “hospital quality”?

  [Diagram: Hospitals A to E ranked in a league table; sampling variability?]
Or do the differences just reflect sampling variation?

  The mother of all possible repetitions of Hospital A

  [Diagram: repeated realizations of Hospital A and the distribution of their results around μ (sampling variability)]
It depends on the degree of uncertainty!

  [Diagram: Hospitals A to E with their sampling variability; ICC ≈ 1.0]
It depends on the degree of uncertainty!

  [Diagram: Hospitals A to E with their sampling variability; ICC = 0]
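A minimal sketch of what the ICC measures here, with simulated hospitals (all numbers are assumptions): a one-way ANOVA estimate of the intraclass correlation, i.e. the share of the total variation that reflects real between-hospital differences rather than sampling variation.

    # ICC(1) from balanced groups via one-way ANOVA mean squares.
    import numpy as np

    def icc_oneway(groups):
        k = len(groups[0])                                 # observations per group (balanced)
        means = np.array([np.mean(g) for g in groups])
        grand = np.mean(np.concatenate(groups))
        msb = k * np.sum((means - grand) ** 2) / (len(groups) - 1)            # between-group MS
        msw = sum(np.sum((np.asarray(g) - m) ** 2) for g, m in zip(groups, means)) \
              / (len(groups) * (k - 1))                                       # within-group MS
        return (msb - msw) / (msb + (k - 1) * msw)

    rng = np.random.default_rng(3)
    hospital_effects = rng.normal(scale=2.0, size=5)       # assumed real quality differences
    hospitals = [70 + e + rng.normal(scale=5.0, size=30) for e in hospital_effects]
    print(f"ICC ≈ {icc_oneway(hospitals):.2f}")            # share of variation due to real hospital differences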
Evaluating uncertainty

Alt. 1. Hypothesis testing

        H0: μ = 0
        HA: μ ≠ 0

        P(observed difference, or one more extreme | H0)

        P < 0.05  →  H0 is rejected: statistically significant
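A minimal sketch of Alt. 1 with simulated data (the effect size and sample size are assumptions): a two-sided one-sample t-test of H0: μ = 0, with H0 rejected when p < 0.05.

    # Two-sided one-sample t-test of H0: mu = 0.
    import numpy as np
    from scipy import stats

    rng = np.random.default_rng(4)
    diffs = rng.normal(loc=1.0, scale=3.0, size=40)   # assumed within-subject changes

    t, p = stats.ttest_1samp(diffs, popmean=0.0)      # two-sided by default
    print(f"t = {t:.2f}, p = {p:.4f}")
    print("H0 rejected: statistically significant" if p < 0.05 else "H0 not rejected")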
Clinical vs. statistical significance
A clinically significant difference is important for all
subjects, whether it is statistically significant or not.

Example

An increase in systolic blood pressure by 30mmHg.
Clinical vs. statistical significance
A statistically significant finding is not necessarily
clinically significant.

Example

A decrease (p = 0.023) in body weight of 0.1kg.
Probability
You know, the most amazing thing happened to me
tonight. I was coming here, on the way to the lecture,
and I came in through the parking lot. And you won't
believe what happened. I saw a car with the
registration plate ARW 357. Can you imagine? Of all
the millions of license plates in the state, what was
the chance that I would see this particular one
tonight? Amazing...

                                Richard P. Feynman
Fundamentals of hypothesis testing
The p-value represents the probability of a false positive
outcome of a pre-defined hypothesis test


Results from testing observed differences (in “fishing
expeditions”) are unreliable.
Fundamentals of hypothesis testing
The probability of a false positive outcome increases with the
number of tests performed.


Results from performing multiple tests without recognizing the
implicit multiplicity issues are unreliable.
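A minimal sketch of how fast that risk grows when all null hypotheses are true and the m tests are independent (the independence assumption is only for illustration), together with the Bonferroni threshold α/m that would restore a 5% family-wise rate:

    # Family-wise false positive risk with m independent tests of true null hypotheses.
    alpha = 0.05
    for m in (1, 5, 10, 20, 50):
        fwer = 1 - (1 - alpha) ** m                 # P(at least one p < alpha | all H0 true)
        print(f"{m:3d} tests: P(any false positive) = {fwer:.2f}, "
              f"Bonferroni threshold = {alpha / m:.4f}")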
Fundamentals of hypothesis testing
Presenting only the statistical significance of findings is common
but should be avoided.


Describing findings as “existing” or “not existing” (depending on their
p-values)

- misleads the reader about what the investigator actually observed

- says nothing about the clinical relevance of the finding

- is misleading with respect to false positive and false negative
  findings.
Fundamentals of hypothesis testing

H0: μ = 0
HA: μ ≠ 0          Two-sided test

H0: μ ≥ 0
HA: μ < 0          One-sided test

Example: The effect of an anti-hypertensive drug.
Fundamentals of hypothesis testing

H0: μ = 0
HA: μ ≠ 0          Two-sided test

H0: μ ≥ 0
HA: μ < 0          One-sided test

Example: The effect of an anti-hypertensive drug.

Is the null hypothesis clinically meaningful?
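A minimal sketch of the difference between the two formulations, with simulated blood-pressure changes (the data are assumptions; the one-sided option needs SciPy 1.6 or later):

    # The same data tested two-sided (HA: mu != 0) and one-sided (HA: mu < 0, a reduction).
    import numpy as np
    from scipy import stats

    rng = np.random.default_rng(5)
    change = rng.normal(loc=-4.0, scale=12.0, size=30)   # assumed change in systolic BP (mmHg)

    _, p_two = stats.ttest_1samp(change, 0.0, alternative='two-sided')
    _, p_one = stats.ttest_1samp(change, 0.0, alternative='less')    # SciPy >= 1.6
    print(f"two-sided p = {p_two:.3f}, one-sided p = {p_one:.3f}")
    # The one-sided p-value is about half the two-sided one when the observed
    # effect lies in the hypothesized direction.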
Fundamentals of hypothesis testing

The statistical power to detect a hypothetical difference is used
when calculating the required sample size.

The post-hoc power of an observed difference is often calculated
but is not meaningful. It does not reveal any other information than
the p-value.

The statistical precision of an estimated parameter is best
described by its confidence interval.
Fundamentals of hypothesis testing
Statistical precision depends on the variability of subjects
(independent observations) in the population and on the
number of observations in the sample.


Testing body parts, e.g. hips, knees, feet, fingers, etc. (intraclass-
correlated observations) usually leads to an under-estimation of
between-subject variability and to an overestimation of the number
of observations.

The consequence is usually too optimistic p-values.
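A minimal sketch of that consequence, using the usual design-effect formula for m intraclass-correlated observations per subject (the sample sizes and correlations are assumptions):

    # Effective sample size with m correlated observations per subject:
    # design effect = 1 + (m - 1) * rho.
    def effective_n(n_obs: int, m: int, rho: float) -> float:
        return n_obs / (1 + (m - 1) * rho)

    n_obs, m = 200, 2                      # e.g. 200 knees from 100 patients
    for rho in (0.0, 0.5, 0.9):
        print(f"rho = {rho:.1f}: effective n ≈ {effective_n(n_obs, m, rho):.0f}")
    # Treating all 200 knees as independent when rho > 0 overstates n,
    # which is what makes the resulting p-values too optimistic.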
Fundamentals of hypothesis testing
Tests of imbalance are usually meaningless:

   1. baseline imbalance in randomized trials

   2. imbalance of matched sets in a matched case-control or
      cohort study

   3. imbalance of exposure related risk factors for use in
      confounding adjustments.


This imbalance is a property of the sample, and the tests are about
properties of the population.
Evaluating uncertainty

Alt. 2. Interval estimation

    A 95% confidence interval,

       x̄ − 2 SEM < μ < x̄ + 2 SEM,

    includes the estimated parameter with 95% confidence.
Note!

 ± 3SEM   99.7% confidence interval

 ± 2SEM   95% confidence interval

 ± 1SEM   68% confidence interval
Note!

 ± 3SEM   99.7% confidence interval

 ± 2SEM   95% confidence interval

 ± 1SEM   68% confidence interval

 ± 1SD    is a measure of observed dispersion
Note!

  ± 3SEM    99.7% confidence interval

  ± 2SEM    95% confidence interval

  ± 1SEM    68% confidence interval

  ± 1SD     is a measure of observed dispersion

± SD ≈ 95% CI for the mean when n = 6
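A quick check of the last line: using the t-distribution, the 95% CI half-width t(0.975, n−1)·SD/√n is almost exactly one SD when n = 6.

    # 95% CI half-width for the mean, expressed in units of the sample SD.
    import math
    from scipy import stats

    for n in (4, 6, 10, 30):
        half_width = stats.t.ppf(0.975, df=n - 1) / math.sqrt(n)
        print(f"n = {n:2d}: 95% CI half-width = {half_width:.2f} × SD")
    # n = 6 gives ≈ 1.05 × SD, i.e. mean ± SD is roughly a 95% CI for the mean.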
P-value and confidence interval

  Information in p-values [2 possibilities]    Information in confidence intervals [2 possibilities]

  p < 0.05                                     Statistically significant effect
  n.s.                                         Inconclusive

  [Figure: confidence intervals plotted on the effect scale, with 0 marked]
P-value and confidence interval

  Information in p-values [2 possibilities]    Information in confidence intervals [6 possibilities]

  p < 0.05    Statistically but not clinically significant effect
  p < 0.05    Statistically and clinically significant effect
  p < 0.05    Statistically, but not necessarily clinically, significant effect
  n.s.        Inconclusive
  n.s.        Neither statistically nor clinically significant effect
  p < 0.05    Statistically significant reversed effect

  [Figure: confidence intervals plotted on the effect scale, with 0 and the range of clinically significant effects marked]
Superiority vs. non-inferiority

  [Figure: five confidence intervals for the new agent vs. control, plotted on a scale
   from “control better” to “new agent better”, with 0 and the margin of non-inferiority
   or equivalence marked. From top to bottom the intervals illustrate:
   superiority shown;
   superiority shown less strongly;
   non-inferiority not shown, superiority not shown;
   non-inferiority shown, superiority not shown;
   equivalence shown, superiority not shown]
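A minimal sketch of how such a figure is read, assuming the difference is expressed as new agent minus control with positive values favouring the new agent, and assuming a pre-specified margin (the margin and the intervals below are hypothetical):

    # Reading a 95% CI for the difference against zero and a non-inferiority margin.
    def interpret(ci_low: float, ci_high: float, margin: float) -> str:
        if ci_low > 0:
            return "superiority shown"
        if ci_low > -margin and ci_high < margin:
            return "equivalence shown (superiority not shown)"
        if ci_low > -margin:
            return "non-inferiority shown (superiority not shown)"
        return "non-inferiority not shown"

    margin = 2.0                                   # hypothetical margin of non-inferiority
    for ci in [(0.5, 4.0), (-1.0, 3.0), (-1.5, 1.5), (-3.0, 1.0)]:
        print(ci, "->", interpret(*ci, margin))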
Experimental vs. observational studies

Experiments: Bias is eliminated by design:
“Block what you can, randomize what you cannot”.
Statistical analysis: Protect the type-1 error rate


Observation: Blocking and randomization are impossible.
The results must be adjusted in the statistical analysis.
Statistical analysis: Prioritize validity
Observational studies
Validity

●   Selection bias     (systematic differences between
                       comparison groups caused by
                       non-random allocation of subjects)

●   Information bias   (misclassification, measurement
                       errors, etc.)

●   Confounding bias   (inadequate analysis, flawed
                       interpretation of results)
Testing for confounding
Univariate screening for statistically significant effects, or
stepwise regression, is often used to select covariates for
inclusion in a regression model.


Confounding bias is a property of the sample, not of the
population. What relevance do hypothesis tests have?
Thank you for your attention!
