Interpretation& Causality
By: Mehdi Ehtesham
MPH/ Epidemiologist,
Avicenna Research Consulting
London, UK
ASSOCIATION AND CAUSATION
 1) INTRODUCTION – Descriptive studies help indicate a hypothesis by
studying time, place person, agent, host and environment. Suggest possible
etiological hypothesis. Analytical and experimental studies test it i.e. confirm/
refute association
 2) DEFINITION
a) ASSOCIATION – When events occur together frequently not by chance.
b) CO-RELATION -1.0 to +1.0 , however co-relation does not imply causation
 3) TYPES OF ASSOCIATION
a) SPURIOUS
b) INDIRECT
c) DIRECT
i) One to one causal
ii) Multifactorial
Associations may be due to
Chance (random error)
Bias (Systematic error)
Confounding
Causation
Associations may be due to
Chance (random error)
Bias (Systematic error)
Confounding
Causation
Dealing with chance error
 During design of study
 Sample size
 Power
 During analysis (Statistical measures of chance)
 Test of statistical significance (P value)
 Confidence intervals
Statistical measures of chance I
(Test of statistical significance)
Observed association
Association in Reality
Yes No
Yes
No
Type I
error
Type II
error
P-value
the probability the observed results
occurred by chance
the probability that an effect at least
as extreme as that observed could
have occurred by chance alone,
given there is truly no relationship
between exposure and disease (Ho)
statistically non-significant results are
not necessarily attributable to
chance due to small sample size
Statistical Power
 Power = 1 – type II error
 Power = 1 - ß
P value
Clinical Importance
VS
Statistical Significance
Statistical measures of chance II
(Confidence intervals)
Question?
 20 out of 100 participants: 20%
 200 out of 1000 participants: 20%
 2000 out of 10000 participants: 20%
 What is the difference?
Answer: Confidence Interval
 Definition: A range of values for a variable of interest
constructed so that this range has a specified probability of
including the true value of the variable for the population
 Characteristics:
 a measure of the precision (stability) of an observed effect
 the range within which the true magnitude of effect lies with a
particular degree of certainty
 95% C.I. means that true estimate of effect (mean, risk, rate) lies
within 2 standard errors of the population mean 95 times out of
100
 Confidence intervals get smaller (i.e. more precise or more
certain) if the underlying data have less variation/scatter
 Confidence intervals get smaller if there are more people in your
sample
95% Confidence Interval (95%
CI)
 20 out of 100 participants: 20%
95% CI: 12 to 28
 80 out of 400 participants: 20%
95% CI: 16 to 24
 2000 out of 10000 participants: 20%
95% CI: 19.2 to 20.8
Associations may be due to
Chance (random error)
Bias (Systematic error)
Confounding
Causation
Bias
 Any systematic error that results in an incorrect
estimate of the association between risk factors and
outcome
Types of Bias
Selection bias – identification of
individual subjects for inclusion in
study on the basis of either exposure
or disease status depends in some
way on the other axis of interest
Observation (information) bias –
results from systematic differences in
the way data on exposure or
outcome are obtained from the
various study groups
Selection Bias
 Self-selection bias
 Healthy (or diseased) people may seek out participation in
the study
 Referral bias
 Sicker patients are referred to major health centers
 Diagnostic bias
 Diagnosis of outcome associated with exposure
 Non-response bias
 Response, or lack of it, depends on exposure
 Differential loss to follow-up
 Exposed (or unexposed) group followed with different
intensity
 Berkson’s bias
 Hospitalization rates differ for by disease and
presence/absence of the exposure of interest
Information Bias
 Data collection bias
 Bias that results from abstracting charts, interviews
or surrogate interviews
 Recall bias
 Disease occurrence enhances recall about
potential exposures
 Surveillance or detection bias
 The exposure promoted more careful evaluation for
the outcome of interest
 Reporting bias
 Exposure may be under-reported because of
attitudes, perceptions, or beliefs
Bias results from systematic
flaws
 study design,
 data collection,
 analysis
 interpretation of results
Control of Bias
 Can only be prevented and controlled during the
design and conduct of a study:
 Careful planning of measurements
 Formal assessments of validity
 Regular calibration of instruments
 Training of data collection personnel
Confounding
 Confounding results when the effect of an exposure
on the disease (or outcome) is distorted because of
the association of exposure with other factor(s) that
influence the outcome under study.
Confounding
Coffee MI
Smoking,
Stress
Observed association, presumed causation
Observed association
True association
Properties of a confounder
A confounding factor must be a risk
factor for the disease.
 The confounding factor must be
associated with the exposure under
study in the source population.
The confounder cannot be an
intermediate step in the causal path
between the exposure and the
disease.
Control of Confounding
 During design of study
 Restriction
 Matching
 Randomization
 During analysis
 Stratified analysis
 Multivariate analysis
Confounding
Coffee MI
Smoking,
Stress
Observed association, presumed causation
Observed association
True association
Coffee drinking
Myocardial Infarction
Yes No
Yes
36 24
No
24 36
OD= ?
OD = 36×36/24×24
OD= 2.25
Is there any confounding effect?
Stratification
Coffee
Drinking
Myocardial
Infarction
Yes No
Yes 2 6
No 18 34
Coffee
Drinking
Myocardial
Infarction
Yes No
Yes 34 18
No 6 2
Coffee
Drinking
Myocardial
Infarction
Yes No
Yes 36 24
No 24 36
Smokers
Non
Smokers
Coffee
Drinking
Myocardial
Infarction
Yes No
Yes 2 6
No 18 34
Coffee
Drinking
Myocardial
Infarction
Yes No
Yes 34 18
No 6 2
Smokers
Non Smokers
OD= ?
OD = 34×2/18×6
OD= 0.63
OD= ?
OD = 2×34/6×18
OD= 0.63
Stratification
Crude OR = 2.25
Common OR = 0.63
So
There is confounding effect.
Crude and Common OR
Chance (random error)
Bias (Systematic error)
Confounding
Causation
Associations may be due to
The numerical value ascribed to these relationships is the correlation coefficient, its
value ranging between +1 (a perfect positive correlation) and −1 (a perfect negative
correlation).
Pearson’s and Spearman’s are the two best-known correlation coefficients. Of the
two, Pearson’s product–moment correlation coefficient is the most widely used, to
the extent that it has now become the default. It must nevertheless be used with
care as it is sensitive to deviations from normality, as might be the case if there were
a number of outliers.
With the correlation coefficient, the +/− sign indicates the direction of the
association, a positive correlation meaning that high values of one variable are
associated with high values of the other. A positive correlation exists between height
and weight, for example.
Conversely, when high values of one variable usually mean low values of the other
(as is the case in the relationship between obesity and physical activity), the
correlation will be negative.
Correlation coefficients
In addition to the problems related to normally distributed data, correlation
coefficients can be misused in other ways. Detailed below are some of the most
common ways in which such misuse tends to occur:
• When variables are repeatedly measured over a period of time, time trends can
lead to spurious relationships between the variables
• Violating the assumption of random samples can lead to the calculation of invalid
correlation coefficients
• Correlating the difference between two measurements to one of them leads to
error. This typically occurs in the context of attempting to correlate pre-test values to
the difference between pre-test and post-test scores. Since pre-test values figure in
both variables, a correlation is only to be expected
• Correlation coefficients are often reported as a measure of agreement between
methods of measurement whereas they are in reality only a measure of association.
For example, if one variable was always approximately twice the value of the other,
correlation would be high but agreement would be low.
 The general QUESTION:Is there a cause and effect
relationship between the presence of factor X and the
development of disease Y?
 One way of determining causation is personal
experience by directly observing a sequence of
events.
DETERMINATION OF CAUSATION
 Long latency period between exposure and disease
 Common exposure to the risk factor
 Small risk from common or uncommon exposure
 Rare disease
 Common disease
 Multiple causes of disease
Personal Experience / Insufficient
The answer is made by
inference and relies on a
summary of all valid
evidence.
Scientific Evidences
1. Replication of Findings –
 consistent in populations
2. Strength of Association –
 significant high risk
3. Temporal Sequence –
 exposure precede disease
4. Dose-Response –
 higher dose exposure, higher risk
5. Biologic Credibility –
 exposure linked to pathogenesis
6. Consideration of alternative explanations –
 the extent to which other explanations have been considered.
7. Cessation of exposure (Dynamics) –
 removal of exposure – reduces risk
8. Specificity
 specific exposure is associated with only one disease
9. Experimental evidence
Nature of Evidence:
criteria for assessing causality:
 Consistency  Strength
Specificity Temporality
Plausibility Coherence
Dose Response Analogy
Experimental evidence
Bradford Hill Criteria (1965)
 Hill stated
 None of my criteria can bring indisputable evidence for
or against the cause-and-effect hypothesis
 None can be required as sufficient alone
Bradford Hill Criteria
 Consistency
 association has been replicated in other studies
 Strength
 H. pylori is found in at least 90% of patients with
duodenal ulcer
 Temporal relationship
 11% of chronic gastritis patients go on the
develop duodenal ulcers over a 10-year period.
 Dose response
 density of H.pylori is higher in patients with
duodenal ulcer than in patients without
H. pylori
 Biologic plausibility
 originally – no biologic plausibility
 then H. pylori binding sites were found
 know H. pylori induces inflammation
 Cessation
 Eradication of H Pylori heals duodenal ulcers
H. pylori
1. Strength of Association:
 The relative risks for the association of smoking and lung
cancer are very high
2. Biologic Credibility:
 The burning of tobacco produces carcinogenic compounds
which are inhaled and come into contact with pulmonary
tissue.
SMOKING AND LUNG CANCER
3. Replication of findings:
 The association of cigarette smoke and lung cancer is found
in both sexes in all races, in all socioeconomic classes, etc.
4. Temporal Sequence:
 Cohort studies clearly demonstrate that smoking precedes
lung cancer and that lung cancer does not cause an
individual to become a cigarette smoker.
SMOKING AND LUNG CANCER
5. Dose-Response:
 The more cigarette smoke an individual inhales, over a
life-time, the greater the risk of developing lung cancer.
6. Dynamics (cessation of exposure):
 Reduction in cigarette smoking reduces the risk of
developing lung cancer.
SMOKING AND LUNG CANCER
 Smoking is cited as a cause of lung cancer, however. . .
 . . . smoking is not necessary (is not a prerequisite) to get lung cancer.
Some people get lung cancer who have never smoked.
 . . . smoking alone does not cause lung cancer. Some smokers never get
lung cancer.
 Smoking is a member of a set of factors (i.e., web of causation)
which cause lung cancer.
 The identity of all the other factors in the set are unknown. (One factor in
the web of causation is probably genetic susceptibility.)
D
E A
B
C F
B
AH
G
F
C
AJ
I
“A” is necessary – since it appears in each sufficient
causal complex
“A” is not sufficient –
Disease Not Present Disease Present Disease Present
necessary / sufficient
 necessary and sufficient
 the factor always causes disease and disease is never present
without the factor
 most infectious diseases
 necessary but not sufficient
 multiple factors are required
 cancer
 sufficient but not necessary
 many factors may cause same disease
 leukemia
 neither sufficient nor necessary
 multiple cause
necessary / sufficient
 Few causes are necessary and sufficient
 High cholesterol is neither necessary nor sufficient for CVD
because many individuals who develop CVD do not
have high serum cholesterol levels
necessary / sufficient

10-Interpretation& Causality by Mehdi Ehtesham

  • 1.
    Interpretation& Causality By: MehdiEhtesham MPH/ Epidemiologist, Avicenna Research Consulting London, UK
  • 2.
    ASSOCIATION AND CAUSATION 1) INTRODUCTION – Descriptive studies help indicate a hypothesis by studying time, place person, agent, host and environment. Suggest possible etiological hypothesis. Analytical and experimental studies test it i.e. confirm/ refute association  2) DEFINITION a) ASSOCIATION – When events occur together frequently not by chance. b) CO-RELATION -1.0 to +1.0 , however co-relation does not imply causation  3) TYPES OF ASSOCIATION a) SPURIOUS b) INDIRECT c) DIRECT i) One to one causal ii) Multifactorial
  • 3.
    Associations may bedue to Chance (random error) Bias (Systematic error) Confounding Causation
  • 4.
    Associations may bedue to Chance (random error) Bias (Systematic error) Confounding Causation
  • 6.
    Dealing with chanceerror  During design of study  Sample size  Power  During analysis (Statistical measures of chance)  Test of statistical significance (P value)  Confidence intervals
  • 7.
    Statistical measures ofchance I (Test of statistical significance) Observed association Association in Reality Yes No Yes No Type I error Type II error
  • 8.
    P-value the probability theobserved results occurred by chance the probability that an effect at least as extreme as that observed could have occurred by chance alone, given there is truly no relationship between exposure and disease (Ho) statistically non-significant results are not necessarily attributable to chance due to small sample size
  • 9.
    Statistical Power  Power= 1 – type II error  Power = 1 - ß
  • 10.
  • 11.
    Statistical measures ofchance II (Confidence intervals)
  • 12.
    Question?  20 outof 100 participants: 20%  200 out of 1000 participants: 20%  2000 out of 10000 participants: 20%  What is the difference?
  • 13.
    Answer: Confidence Interval Definition: A range of values for a variable of interest constructed so that this range has a specified probability of including the true value of the variable for the population  Characteristics:  a measure of the precision (stability) of an observed effect  the range within which the true magnitude of effect lies with a particular degree of certainty  95% C.I. means that true estimate of effect (mean, risk, rate) lies within 2 standard errors of the population mean 95 times out of 100  Confidence intervals get smaller (i.e. more precise or more certain) if the underlying data have less variation/scatter  Confidence intervals get smaller if there are more people in your sample
  • 14.
    95% Confidence Interval(95% CI)  20 out of 100 participants: 20% 95% CI: 12 to 28  80 out of 400 participants: 20% 95% CI: 16 to 24  2000 out of 10000 participants: 20% 95% CI: 19.2 to 20.8
  • 15.
    Associations may bedue to Chance (random error) Bias (Systematic error) Confounding Causation
  • 16.
    Bias  Any systematicerror that results in an incorrect estimate of the association between risk factors and outcome
  • 17.
    Types of Bias Selectionbias – identification of individual subjects for inclusion in study on the basis of either exposure or disease status depends in some way on the other axis of interest Observation (information) bias – results from systematic differences in the way data on exposure or outcome are obtained from the various study groups
  • 18.
    Selection Bias  Self-selectionbias  Healthy (or diseased) people may seek out participation in the study  Referral bias  Sicker patients are referred to major health centers  Diagnostic bias  Diagnosis of outcome associated with exposure  Non-response bias  Response, or lack of it, depends on exposure  Differential loss to follow-up  Exposed (or unexposed) group followed with different intensity  Berkson’s bias  Hospitalization rates differ for by disease and presence/absence of the exposure of interest
  • 19.
    Information Bias  Datacollection bias  Bias that results from abstracting charts, interviews or surrogate interviews  Recall bias  Disease occurrence enhances recall about potential exposures  Surveillance or detection bias  The exposure promoted more careful evaluation for the outcome of interest  Reporting bias  Exposure may be under-reported because of attitudes, perceptions, or beliefs
  • 20.
    Bias results fromsystematic flaws  study design,  data collection,  analysis  interpretation of results
  • 21.
    Control of Bias Can only be prevented and controlled during the design and conduct of a study:  Careful planning of measurements  Formal assessments of validity  Regular calibration of instruments  Training of data collection personnel
  • 22.
    Confounding  Confounding resultswhen the effect of an exposure on the disease (or outcome) is distorted because of the association of exposure with other factor(s) that influence the outcome under study.
  • 23.
    Confounding Coffee MI Smoking, Stress Observed association,presumed causation Observed association True association
  • 24.
    Properties of aconfounder A confounding factor must be a risk factor for the disease.  The confounding factor must be associated with the exposure under study in the source population. The confounder cannot be an intermediate step in the causal path between the exposure and the disease.
  • 25.
    Control of Confounding During design of study  Restriction  Matching  Randomization  During analysis  Stratified analysis  Multivariate analysis
  • 26.
    Confounding Coffee MI Smoking, Stress Observed association,presumed causation Observed association True association
  • 27.
    Coffee drinking Myocardial Infarction YesNo Yes 36 24 No 24 36 OD= ? OD = 36×36/24×24 OD= 2.25 Is there any confounding effect?
  • 28.
    Stratification Coffee Drinking Myocardial Infarction Yes No Yes 26 No 18 34 Coffee Drinking Myocardial Infarction Yes No Yes 34 18 No 6 2 Coffee Drinking Myocardial Infarction Yes No Yes 36 24 No 24 36 Smokers Non Smokers
  • 29.
    Coffee Drinking Myocardial Infarction Yes No Yes 26 No 18 34 Coffee Drinking Myocardial Infarction Yes No Yes 34 18 No 6 2 Smokers Non Smokers OD= ? OD = 34×2/18×6 OD= 0.63 OD= ? OD = 2×34/6×18 OD= 0.63 Stratification
  • 30.
    Crude OR =2.25 Common OR = 0.63 So There is confounding effect. Crude and Common OR
  • 31.
    Chance (random error) Bias(Systematic error) Confounding Causation Associations may be due to
  • 34.
    The numerical valueascribed to these relationships is the correlation coefficient, its value ranging between +1 (a perfect positive correlation) and −1 (a perfect negative correlation). Pearson’s and Spearman’s are the two best-known correlation coefficients. Of the two, Pearson’s product–moment correlation coefficient is the most widely used, to the extent that it has now become the default. It must nevertheless be used with care as it is sensitive to deviations from normality, as might be the case if there were a number of outliers. With the correlation coefficient, the +/− sign indicates the direction of the association, a positive correlation meaning that high values of one variable are associated with high values of the other. A positive correlation exists between height and weight, for example. Conversely, when high values of one variable usually mean low values of the other (as is the case in the relationship between obesity and physical activity), the correlation will be negative. Correlation coefficients
  • 35.
    In addition tothe problems related to normally distributed data, correlation coefficients can be misused in other ways. Detailed below are some of the most common ways in which such misuse tends to occur: • When variables are repeatedly measured over a period of time, time trends can lead to spurious relationships between the variables • Violating the assumption of random samples can lead to the calculation of invalid correlation coefficients • Correlating the difference between two measurements to one of them leads to error. This typically occurs in the context of attempting to correlate pre-test values to the difference between pre-test and post-test scores. Since pre-test values figure in both variables, a correlation is only to be expected • Correlation coefficients are often reported as a measure of agreement between methods of measurement whereas they are in reality only a measure of association. For example, if one variable was always approximately twice the value of the other, correlation would be high but agreement would be low.
  • 36.
     The generalQUESTION:Is there a cause and effect relationship between the presence of factor X and the development of disease Y?  One way of determining causation is personal experience by directly observing a sequence of events. DETERMINATION OF CAUSATION
  • 37.
     Long latencyperiod between exposure and disease  Common exposure to the risk factor  Small risk from common or uncommon exposure  Rare disease  Common disease  Multiple causes of disease Personal Experience / Insufficient
  • 38.
    The answer ismade by inference and relies on a summary of all valid evidence. Scientific Evidences
  • 39.
    1. Replication ofFindings –  consistent in populations 2. Strength of Association –  significant high risk 3. Temporal Sequence –  exposure precede disease 4. Dose-Response –  higher dose exposure, higher risk 5. Biologic Credibility –  exposure linked to pathogenesis 6. Consideration of alternative explanations –  the extent to which other explanations have been considered. 7. Cessation of exposure (Dynamics) –  removal of exposure – reduces risk 8. Specificity  specific exposure is associated with only one disease 9. Experimental evidence Nature of Evidence:
  • 40.
    criteria for assessingcausality:  Consistency  Strength Specificity Temporality Plausibility Coherence Dose Response Analogy Experimental evidence Bradford Hill Criteria (1965)
  • 41.
     Hill stated None of my criteria can bring indisputable evidence for or against the cause-and-effect hypothesis  None can be required as sufficient alone Bradford Hill Criteria
  • 42.
     Consistency  associationhas been replicated in other studies  Strength  H. pylori is found in at least 90% of patients with duodenal ulcer  Temporal relationship  11% of chronic gastritis patients go on the develop duodenal ulcers over a 10-year period.  Dose response  density of H.pylori is higher in patients with duodenal ulcer than in patients without H. pylori
  • 43.
     Biologic plausibility originally – no biologic plausibility  then H. pylori binding sites were found  know H. pylori induces inflammation  Cessation  Eradication of H Pylori heals duodenal ulcers H. pylori
  • 44.
    1. Strength ofAssociation:  The relative risks for the association of smoking and lung cancer are very high 2. Biologic Credibility:  The burning of tobacco produces carcinogenic compounds which are inhaled and come into contact with pulmonary tissue. SMOKING AND LUNG CANCER
  • 45.
    3. Replication offindings:  The association of cigarette smoke and lung cancer is found in both sexes in all races, in all socioeconomic classes, etc. 4. Temporal Sequence:  Cohort studies clearly demonstrate that smoking precedes lung cancer and that lung cancer does not cause an individual to become a cigarette smoker. SMOKING AND LUNG CANCER
  • 46.
    5. Dose-Response:  Themore cigarette smoke an individual inhales, over a life-time, the greater the risk of developing lung cancer. 6. Dynamics (cessation of exposure):  Reduction in cigarette smoking reduces the risk of developing lung cancer. SMOKING AND LUNG CANCER
  • 47.
     Smoking iscited as a cause of lung cancer, however. . .  . . . smoking is not necessary (is not a prerequisite) to get lung cancer. Some people get lung cancer who have never smoked.  . . . smoking alone does not cause lung cancer. Some smokers never get lung cancer.  Smoking is a member of a set of factors (i.e., web of causation) which cause lung cancer.  The identity of all the other factors in the set are unknown. (One factor in the web of causation is probably genetic susceptibility.)
  • 48.
    D E A B C F B AH G F C AJ I “A”is necessary – since it appears in each sufficient causal complex “A” is not sufficient – Disease Not Present Disease Present Disease Present necessary / sufficient
  • 49.
     necessary andsufficient  the factor always causes disease and disease is never present without the factor  most infectious diseases  necessary but not sufficient  multiple factors are required  cancer  sufficient but not necessary  many factors may cause same disease  leukemia  neither sufficient nor necessary  multiple cause necessary / sufficient
  • 50.
     Few causesare necessary and sufficient  High cholesterol is neither necessary nor sufficient for CVD because many individuals who develop CVD do not have high serum cholesterol levels necessary / sufficient