Biostatistics 2002, ÅS
Lecture notes on epidemiology
The aims of epidemiology:
Epidemiology is the study of the distribution and determinants of health-related states
or events in specified populations, and the application of this study to the control of
health problems (Mould).
Typically one tries to find associations between risk factors (exposures) and diseases
by studying the occurrence of diseases in populations. The final aim is to draw
conclusions about causal relations.
Examples of epidemiological findings (from Rothman):
• Smoking and lung cancer
• Smoking and other health-related events
• Passive smoking
• Low-level ionising radiation and leukaemia
• Saccharin and bladder cancer
• Swine flu and Guillain-Barré syndrome
• The effect of diethylstilbestrol on offspring
• Tampons and toxic-shock syndrome
• Coffee drinking and pancreatic cancer
Epidemiological questions are generally not amenable to investigation by
randomised trials. Observational studies are therefore the more practical way to study
factors and exposures that cannot be controlled by the investigators (Mould).
Hill’s criteria for distinguishing between causal and non-causal associations:
1. Strength (the association should be sufficiently strong)
2. Consistency (it should be seen in different populations under different
circumstances)
3. Specificity (a cause should be specific to its effect, e.g. smoking leads to a
specific form of lung cancer)
4. Temporality (cause precedes effect)
5. Biological gradient (greater dose should give larger effect)
6. Plausibility (a plausible biological mechanism, e.g. one seen in animal models)
7. Coherence (not in conflict with known biology)
8. Experimental evidence (if possible, a randomised experiment should be carried
out)
9. Analogy (the cause makes sense by analogy with established causal relations)
Types of studies:
Studies may be classified as prospective or retrospective. In a prospective study,
measurements of exposures and covariates are made before illness occurs. In a
retrospective study, these measurements are made after the cases have already
occurred.
• Cross-sectional study
A study that includes all persons in the population at the time of
ascertainment, or a representative sample of such persons. Disease and
exposure are observed simultaneously.
• Cohort study
Two (or more) groups of people who are free from the disease and
who differ in the extent of their exposure to a potential cause of
the disease are followed during a time interval.
• Case-Control study
A case-control study includes people with a disease and a suitable control
group of people unaffected by the disease. The occurrence of the possible
cause (exposure) is compared between cases and controls. A useful way to
think of a case-control study is as a cohort study in which
exposure is investigated for all cases and for a sample of the non-cases.
Measures of disease frequency:
Prevalence is the number of cases of a disease in a defined population. The
prevalence can be measured by a prevalence rate, i.e., the proportion of the
population that has the disease at a specific time. This can be measured in a
cross-sectional study.
Incidence is the number of new cases of disease in a population. The incidence can
only be measured by following a population during a time interval. It can be
measured by the incidence rate, which is defined as
(the number of new cases in the population during the time considered)/(the total
number of years that people are at risk of developing the disease).
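Cumulative incidence is the number of people in a cohort who get the disease
during the time of study. Expressed as a proportion of the cohort, it estimates the
risk of getting the disease during that time.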
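The two incidence measures can be sketched in a few lines of Python. The function names and the cohort figures below are hypothetical and chosen only for illustration, and cumulative incidence is expressed here as a proportion of the cohort, which is an assumption about how the count is normalised.

```python
def incidence_rate(new_cases, person_years):
    """New cases divided by the total person-years at risk."""
    if person_years <= 0:
        raise ValueError("person_years must be positive")
    return new_cases / person_years


def cumulative_incidence(new_cases, cohort_size):
    """Proportion of an initially disease-free cohort that falls ill."""
    if cohort_size <= 0:
        raise ValueError("cohort_size must be positive")
    return new_cases / cohort_size


# Hypothetical cohort: 1000 people contribute 4800 person-years at risk,
# during which 24 new cases occur.
rate = incidence_rate(24, 4800)        # cases per person-year
risk = cumulative_incidence(24, 1000)  # proportion of the cohort

print(f"incidence rate: {rate * 1000:.1f} per 1000 person-years")
print(f"cumulative incidence: {risk:.3f}")
```

The incidence rate has time in its denominator, whereas the cumulative incidence is a dimensionless proportion tied to the length of follow-up.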
The relation between prevalence and incidence is somewhat complicated. It depends
on the duration of the disease state. An approximate relation is
P = I · D,
where P is the prevalence rate, I is the incidence rate and D is the mean duration
for which diseased people carry the disease.
If D increases (e.g. because mortality is lowered by new medicines, or because the
disease is diagnosed at an earlier stage), the prevalence rate increases even if the
incidence rate stays the same.
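As a small numerical illustration of the approximate relation P = I · D, the sketch below holds the incidence rate fixed and doubles the mean duration. The numbers are hypothetical, and the approximation is usually regarded as reasonable only when the disease is fairly rare and the population is roughly in a steady state.

```python
def approx_prevalence(incidence_rate, mean_duration):
    """Approximate prevalence rate from P = I * D."""
    return incidence_rate * mean_duration


incidence = 0.002  # hypothetical: 2 new cases per 1000 person-years

for duration_years in (5, 10):
    p = approx_prevalence(incidence, duration_years)
    print(f"D = {duration_years:2d} years -> P = {p:.3f}")
```

Doubling D doubles the approximate prevalence although the incidence rate is unchanged.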
Epidemiological studies are most often concerned with incidence. The main
aim is to compare the incidence among exposed and non-exposed people.
Measures of effect and association:
Incidence rates in exposed and non-exposed people can be compared in several ways.
Obvious alternatives are differences and ratios. Let I1 be the incidence rate for
exposed people and I0 be the incidence rate for non-exposed people. The two
alternatives are
I1 - I0
and
I1/I0.
If p1 is the probability that an exposed person gets ill during a time interval and p0 is
the corresponding probability for an unexposed person, then we can measure
the effect either as a risk difference
p1 - p0,
or as a risk ratio
p1/p0.
Still another alternative is the odds ratio
{p1/(1-p1)} /{p0/(1-p0)}.
These measures give different results in all cases other than p1 = p0. In that case the
difference is 0 and the risk ratio and the odds ratio are both equal to 1.
It should be observed that these comparisons say little of the absolute values of the
risk.
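The three measures can be compared side by side with a short Python sketch. The function names and the example risks are hypothetical and serve only to illustrate the definitions above.

```python
def risk_difference(p1, p0):
    """p1 - p0."""
    return p1 - p0


def risk_ratio(p1, p0):
    """p1 / p0."""
    return p1 / p0


def odds_ratio(p1, p0):
    """{p1/(1-p1)} / {p0/(1-p0)}."""
    return (p1 / (1 - p1)) / (p0 / (1 - p0))


# Hypothetical (p1, p0) pairs: a rare outcome, a common outcome, no effect.
for p1, p0 in [(0.02, 0.01), (0.60, 0.30), (0.10, 0.10)]:
    print(
        f"p1={p1:.2f} p0={p0:.2f}  "
        f"RD={risk_difference(p1, p0):.3f}  "
        f"RR={risk_ratio(p1, p0):.2f}  "
        f"OR={odds_ratio(p1, p0):.2f}"
    )
```

The first two pairs share the same risk ratio but have very different risk differences, which is one way to see that a ratio alone says little about absolute risk.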
Other measures, which relate to a specific population, are the attributable risk and
the etiological fraction.
Statistical analysis:
One stratum:
In the simplest situation the outcome of a cohort study can be summarized in a
2x2-table:
              Exposed   Not exposed   Total
Ill           n11       n12           n1.
Not ill       n21       n22           n2.
Total         n.1       n.2           n..
An interesting hypothesis is that the risk of getting ill is the same for exposed and
non-exposed people. This can be tested by a χ²-test of homogeneity (or by Fisher’s
exact test). If the hypothesis is rejected, it is of interest to estimate the effect of the
exposure. One effect measure is the odds ratio, which is estimated by the
cross-product ratio:
\[ \frac{n_{11}\, n_{22}}{n_{12}\, n_{21}} \]
Most commonly, however, one estimates the logarithm of the odds ratio. The ML
estimate is
\[ \ln(n_{11}) - \ln(n_{12}) - \ln(n_{21}) + \ln(n_{22}) \]
If the counts in the cells are large, it can be shown that this estimate is
asymptotically normally distributed, with the true value of the logarithm of the odds
ratio as mean and an asymptotic variance that can be estimated by
\[ \frac{1}{n_{11}} + \frac{1}{n_{12}} + \frac{1}{n_{21}} + \frac{1}{n_{22}} \]
This can be used to calculate an approximate confidence interval for the logarithm of
the odds ratio by the following formula
\[ \ln(n_{11}) - \ln(n_{12}) - \ln(n_{21}) + \ln(n_{22}) \pm z_{\alpha} \sqrt{\frac{1}{n_{11}} + \frac{1}{n_{12}} + \frac{1}{n_{21}} + \frac{1}{n_{22}}} \]
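From this an approximate confidence interval for the odds ratio itself can be derived
by exponentiating:
\[ \frac{n_{11}\, n_{22}}{n_{12}\, n_{21}} \exp\left( \pm z_{\alpha} \sqrt{\frac{1}{n_{11}} + \frac{1}{n_{12}} + \frac{1}{n_{21}} + \frac{1}{n_{22}}} \right) \]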
The reason for calculating estimates and confidence intervals for the log odds ratio
instead of the odds ratio directly is that the normal approximation often works better
for the log of the cross-product ratio than for the cross-product ratio itself.
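The estimate and its approximate confidence interval can be computed as in the following sketch. The function name, the default z = 1.96 (roughly 95% coverage) and the cell counts are assumptions made for illustration; the cell labels follow the 2x2-table above.

```python
import math


def odds_ratio_ci(n11, n12, n21, n22, z=1.96):
    """Cross-product odds ratio with an approximate confidence interval.

    The interval is computed on the log scale and then exponentiated.
    All four cell counts must be positive.
    """
    if min(n11, n12, n21, n22) <= 0:
        raise ValueError("all cell counts must be positive")
    log_or = math.log(n11) - math.log(n12) - math.log(n21) + math.log(n22)
    se = math.sqrt(1 / n11 + 1 / n12 + 1 / n21 + 1 / n22)
    return (
        math.exp(log_or),
        math.exp(log_or - z * se),
        math.exp(log_or + z * se),
    )


# Hypothetical counts: 30 exposed ill, 70 unexposed ill,
# 10 exposed not ill, 90 unexposed not ill.
or_hat, lower, upper = odds_ratio_ci(30, 70, 10, 90)
print(f"OR = {or_hat:.2f}, approximate 95% CI = ({lower:.2f}, {upper:.2f})")
```

Because the interval is built on the log scale, it is symmetric around the estimate in ratio terms rather than in absolute terms.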
In a case-control study the observations can also be summarized in a 2x2-table, but
with a different interpretation:
              Exposed   Not exposed   Total
Cases         n11       n12           n1.
Controls      n21       n22           n2.
Total         n.1       n.2           n..
The same analysis can be applied. Observe that in a case-control study you can
estimate the odds ratio but not the risk ratio. However, if the risk is small these
measures are almost identical.
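A short worked step shows why the two measures nearly coincide for rare diseases. Writing p1 and p0 for the risks among exposed and unexposed people in the underlying population, as in the definitions of the risk ratio and odds ratio, the odds ratio factorises as

\[
\frac{p_1/(1-p_1)}{p_0/(1-p_0)} = \frac{p_1}{p_0} \cdot \frac{1-p_0}{1-p_1} .
\]

When both risks are small the second factor is close to 1. As an illustrative example with assumed values p1 = 0.02 and p0 = 0.01, the risk ratio is 2 and the correction factor is 0.99/0.98 ≈ 1.01, so the odds ratio is about 2.02.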
Several strata:
Sometimes the analysis has to be stratified into different groups of people. In that
case we summarize the observations in several 2x2-tables (one for each
stratum). The reason for this is that there may be some other factor, related both to
the exposure and to the disease, that we want to control for. This is achieved by
making a separate analysis within strata in which such factors are constant. Typical
stratification variables are sex, age, smoking, drinking and other lifestyle factors.
Such factors may act either as effect modifiers or as confounders.
Confounding is a systematic bias in the estimate of the effect that arises when
the cases and controls differ with regard to a risk factor not considered in the study.
The analysis has to take this possibility into account, as the following example shows.
Stratum 1
              Exposed   Not exposed   Total
Cases         50        50            100
Controls      20        80            100
Total         70        130           200
This gives an estimated odds ratio of 4.0.
Stratum 2
              Exposed   Not exposed   Total
Cases         8         92            100
Controls      2         98            100
Total         10        190           200
This gives an estimated odds ratio of 4.3.
Summing these two tables gives
Strata 1 + 2
              Exposed   Not exposed   Total
Cases         58        142           200
Controls      22        178           200
Total         80        320           400
which gives an estimated odds ratio of 3.3.
The fact that a fair result cannot be obtained by simply summing (collapsing) over
strata is referred to as non-collapsibility. It is easy to construct other examples that
give even more deviating estimates.
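The three odds ratios in this example can be reproduced with a few lines of Python. The helper name is hypothetical; each table is given as (exposed cases, unexposed cases, exposed controls, unexposed controls).

```python
def cross_product_ratio(a, b, c, d):
    """Odds ratio for a 2x2-table with rows (cases, controls)
    and columns (exposed, not exposed)."""
    return (a * d) / (b * c)


stratum_1 = (50, 50, 20, 80)
stratum_2 = (8, 92, 2, 98)
# Collapse the strata by adding the tables cell by cell.
pooled = tuple(x + y for x, y in zip(stratum_1, stratum_2))

for name, table in [
    ("Stratum 1", stratum_1),
    ("Stratum 2", stratum_2),
    ("Strata 1 + 2", pooled),
]:
    print(f"{name}: OR = {cross_product_ratio(*table):.2f}")
```

The pooled estimate lies below both stratum-specific estimates, so it cannot be read as any kind of average of them.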
If the effect differs between strata, this effect modification has to be described
and analysed. This can be done by formulating a model for how the effect is modified
by various covariates. If the effect is the same in all strata, there is a need for a
statistical method for testing and estimating simultaneously across several 2x2-tables.
The null hypothesis of no effect in any of the strata can be tested by the so-called
Mantel-Haenszel test.
Stratum 1:
              Exposed   Not exposed
Cases         a1        b1
Controls      c1        d1
...
Stratum i:
              Exposed   Not exposed
Cases         ai        bi
Controls      ci        di
...
Stratum n:
              Exposed   Not exposed
Cases         an        bn
Controls      cn        dn
The observed number of exposed cases is
\[ O = \sum_i a_i \]
The expected number of exposed cases is (according to the hypergeometric distribution
in each stratum)
\[ E = \sum_i \frac{(a_i + b_i)(a_i + c_i)}{a_i + b_i + c_i + d_i} \]
The variance of O is (according to the hypergeometric distribution)
\[ V = \sum_i \frac{(a_i + b_i)(c_i + d_i)(a_i + c_i)(b_i + d_i)}{t_i^2 \,(t_i - 1)} \]
where
\[ t_i = a_i + b_i + c_i + d_i \]
The null hypothesis of no effect in any stratum is rejected if
\[ \frac{(O - E)^2}{V} \]
is large compared to the appropriate percentile of the χ²(1) distribution.
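A minimal Python sketch of the test statistic follows, applied to the two strata of the confounding example. The function name is hypothetical, and the cut-off 3.841 is the 95th percentile of the χ²(1) distribution, chosen here as an illustrative significance level.

```python
def mantel_haenszel_test(strata):
    """Return (O, E, V, statistic) for a list of (a, b, c, d) tables.

    a, b are exposed and unexposed cases; c, d are exposed and
    unexposed controls.
    """
    observed = expected = variance = 0.0
    for a, b, c, d in strata:
        t = a + b + c + d
        observed += a
        expected += (a + b) * (a + c) / t
        variance += (a + b) * (c + d) * (a + c) * (b + d) / (t ** 2 * (t - 1))
    statistic = (observed - expected) ** 2 / variance
    return observed, expected, variance, statistic


strata = [(50, 50, 20, 80), (8, 92, 2, 98)]
O, E, V, chi2_stat = mantel_haenszel_test(strata)
print(f"O = {O:.0f}, E = {E:.0f}, V = {V:.3f}, statistic = {chi2_stat:.2f}")

CHI2_1_95 = 3.841  # 95th percentile of the chi-square(1) distribution
print("reject H0 at the 5% level:", chi2_stat > CHI2_1_95)
```

For these data O = 58 and E = 40, and the statistic is far above the cut-off, so the hypothesis of no effect in any stratum would be rejected.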
Matched case-control studies:
In order to minimise the possibility of confounding, epidemiological
studies are sometimes matched. This means that controls are selected specifically for
each case (or group of cases) in a way that makes them as similar to the cases as
possible with regard to covariates that are not under study. A typical example of this
is given in Rice, section 13.5, which describes a 1-1 matching. It is also possible to
have more controls per case, so-called 1-m matching. Data from matched studies should
be analysed by methods for the simultaneous analysis of several 2x2-tables. Two remarks:
• In matched studies the precision gained by increasing the number of controls
per case decreases rather fast.
• A disadvantage of matched studies is that matching prevents any study of
the effect of the variables that have been used to define the matching.
Analysis of matched case-control studies
The methods for analysing matched case-control studies are the same as those for
analysing several 2x2-tables where the same effect is assumed in all strata. In a
matched study each case, together with its matched control(s), defines a stratum. An
estimate of the common odds ratio is obtained through the Mantel-Haenszel estimator:
\[ \frac{\sum_i a_i d_i / t_i}{\sum_i b_i c_i / t_i} \]
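The estimator can be sketched as follows. The function name and the matched-pair counts are hypothetical; the first call reuses the two strata from the confounding example, and the second treats every matched pair as its own stratum with t = 2.

```python
def mantel_haenszel_odds_ratio(strata):
    """Common odds ratio estimate; each stratum is a tuple (a, b, c, d)."""
    numerator = denominator = 0.0
    for a, b, c, d in strata:
        t = a + b + c + d
        numerator += a * d / t
        denominator += b * c / t
    return numerator / denominator


# The two strata from the confounding example.
or_strata = mantel_haenszel_odds_ratio([(50, 50, 20, 80), (8, 92, 2, 98)])

# Hypothetical 1-1 matched study with 50 pairs.
pairs = (
    [(1, 0, 0, 1)] * 30    # case exposed, control not exposed
    + [(0, 1, 1, 0)] * 10  # case not exposed, control exposed
    + [(1, 0, 1, 0)] * 5   # both exposed
    + [(0, 1, 0, 1)] * 5   # neither exposed
)
or_pairs = mantel_haenszel_odds_ratio(pairs)

print(f"stratified example: OR_MH = {or_strata:.2f}")
print(f"matched pairs:      OR_MH = {or_pairs:.2f}")
```

In the 1-1 matched case only the discordant pairs contribute, so the estimate reduces to the ratio of the two discordant pair counts (here 30/10).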
Logistic regression
In many situations covariates can be attached to each individual under study. This
means that we observe not only the Bernoulli variable Y, which takes the value 1 if
the individual is diseased and 0 otherwise, but also other
measurements (x1, …, xk) which may influence the risk of getting ill. In such cases
there is a need for a model that describes the influence of the covariates (independent
variables) on the distribution of Y. One such model, which is similar to normal linear
regression, assumes that
\[ \Pr(Y = 1) = \frac{\exp(\beta_0 + \beta_1 x_1 + \dots + \beta_k x_k)}{1 + \exp(\beta_0 + \beta_1 x_1 + \dots + \beta_k x_k)} \]
This model is called a logistic regression model. Observe that the log odds is a linear
combination of the covariates, i.e.,
\[ \ln\left( \frac{\Pr(Y = 1)}{\Pr(Y = 0)} \right) = \beta_0 + \beta_1 x_1 + \dots + \beta_k x_k . \]
One of the covariates may be exposure, defined either as a 0/1-variable (exposed or
not) or as a dose (measuring the amount of exposure). In that case the
corresponding regression coefficient measures the effect of the exposure. In fact, the
regression coefficient is the logarithm of the odds ratio between two individuals whose
exposures differ by one unit on the scale on which the exposure is measured, all other
covariates being equal.
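The interpretation of the coefficient as a log odds ratio can be checked numerically. The sketch below assumes a model with a single 0/1 exposure covariate and uses hypothetical coefficient values; no model is fitted here.

```python
import math


def logistic_probability(x, beta0, beta1):
    """Pr(Y = 1) under a logistic model with one covariate x."""
    eta = beta0 + beta1 * x
    return math.exp(eta) / (1 + math.exp(eta))


def odds(p):
    """Odds corresponding to a probability p."""
    return p / (1 - p)


beta0, beta1 = -3.0, 0.7  # hypothetical coefficients

p_unexposed = logistic_probability(0, beta0, beta1)
p_exposed = logistic_probability(1, beta0, beta1)
odds_ratio = odds(p_exposed) / odds(p_unexposed)

print(f"Pr(ill | unexposed) = {p_unexposed:.4f}")
print(f"Pr(ill | exposed)   = {p_exposed:.4f}")
print(f"odds ratio = {odds_ratio:.4f}, exp(beta1) = {math.exp(beta1):.4f}")
```

The odds ratio computed from the two probabilities agrees with exp(beta1) whatever value the intercept takes.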
Conditional logistic regression
In matched case-control studies we have a case with its covariates and
one or more controls with possibly different covariates. We can then use a logistic
regression model to calculate a conditional probability: given a group of people
(the case and its controls) with a certain set of covariate values, of whom exactly one
has the disease under study, how probable is it that the diseased person is the actual case?
Consider a 1-1 matching situation and assume that the case has covariate x and
the control covariate z. (For simplicity we assume a one-dimensional covariate.)
Then we can calculate the conditional probability that the person with covariate x
has the disease, given that exactly one of the two persons has the disease.
Elementary (but not simple) calculations give the result
\[ \frac{\exp(\beta_1 x)}{\exp(\beta_1 x) + \exp(\beta_1 z)} . \]
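The calculation behind this result can be sketched as follows, under the assumptions that both persons in the pair follow the same logistic model with a common intercept, and that their disease statuses are independent. Write

\[
p(u) = \frac{\exp(\beta_0 + \beta_1 u)}{1 + \exp(\beta_0 + \beta_1 u)}, \qquad
1 - p(u) = \frac{1}{1 + \exp(\beta_0 + \beta_1 u)} .
\]

The probability that the person with covariate x is diseased and the person with covariate z is not equals p(x)(1 - p(z)), and conditioning on exactly one of them being diseased gives

\[
\frac{p(x)\,(1 - p(z))}{p(x)\,(1 - p(z)) + p(z)\,(1 - p(x))}
= \frac{\exp(\beta_0 + \beta_1 x)}{\exp(\beta_0 + \beta_1 x) + \exp(\beta_0 + \beta_1 z)}
= \frac{\exp(\beta_1 x)}{\exp(\beta_1 x) + \exp(\beta_1 z)} .
\]

The first equality holds because both terms share the denominator (1 + exp(β0 + β1 x))(1 + exp(β0 + β1 z)), which cancels. The second holds because exp(β0) cancels, so the intercept does not need to be estimated.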
The matched data can be analysed by using these probabilities
to build up a likelihood. ML estimates can then be found and likelihood ratio tests derived.
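A minimal sketch of this likelihood for 1-1 matched pairs with a single covariate is given below. The function names, the bisection search and the pair counts are assumptions made for illustration and are not a prescribed method; each pair is written as (x, z), with x the covariate of the case and z that of the control.

```python
import math


def conditional_log_likelihood(beta, pairs):
    """Sum over pairs of log[exp(beta*x) / (exp(beta*x) + exp(beta*z))]."""
    # The ratio equals 1 / (1 + exp(-beta * (x - z))).
    return sum(-math.log1p(math.exp(-beta * (x - z))) for x, z in pairs)


def score(beta, pairs):
    """Derivative of the conditional log-likelihood with respect to beta."""
    return sum((x - z) / (1 + math.exp(beta * (x - z))) for x, z in pairs)


def fit_beta(pairs, lo=-10.0, hi=10.0, tol=1e-10):
    """Solve score(beta) = 0 by bisection; the score is decreasing in beta."""
    if score(lo, pairs) <= 0 or score(hi, pairs) >= 0:
        raise ValueError("no interior maximum on the search interval")
    while hi - lo > tol:
        mid = (lo + hi) / 2
        if score(mid, pairs) > 0:
            lo = mid
        else:
            hi = mid
    return (lo + hi) / 2


# Hypothetical matched pairs with a 0/1 exposure.
pairs = [(1, 0)] * 30 + [(0, 1)] * 10 + [(1, 1)] * 5 + [(0, 0)] * 5

beta_hat = fit_beta(pairs)
# Likelihood ratio statistic for H0: beta = 0, to be compared with chi-square(1).
lr_stat = 2 * (
    conditional_log_likelihood(beta_hat, pairs)
    - conditional_log_likelihood(0.0, pairs)
)
print(f"beta_hat = {beta_hat:.4f}, odds ratio = {math.exp(beta_hat):.2f}")
print(f"likelihood ratio statistic = {lr_stat:.2f}")
```

With a 0/1 exposure the concordant pairs contribute nothing to the score, and the estimate equals the logarithm of the ratio of the discordant pair counts, here ln(30/10).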
Possible sources of bias in epidemiological studies:
• Confounding
• Selection bias (e.g. the healthy worker effect)
• Recall bias (cases may recall and report past exposures differently from controls,
often more completely)
• Publication bias (significant results tend to be published more easily than studies
in which no significance is found)

More Related Content

What's hot

Ecological study design multiple group study and statistical analysis
Ecological study design multiple group study and statistical analysisEcological study design multiple group study and statistical analysis
Ecological study design multiple group study and statistical analysissirjana Tiwari
 
Fundamentals and Study Design of Epidemiology
Fundamentals and Study Design of EpidemiologyFundamentals and Study Design of Epidemiology
Fundamentals and Study Design of EpidemiologyFarazaJaved
 
Case control studies
Case control studiesCase control studies
Case control studiesBruno Mmassy
 
Case control & cohort study
Case control & cohort studyCase control & cohort study
Case control & cohort studyBhumika Bhatt
 
Analytical epidemiology
Analytical  epidemiologyAnalytical  epidemiology
Analytical epidemiologyb_bhushan
 
Case control study
Case control study   Case control study
Case control study swati shikha
 
Descriptive and Analytical Epidemiology
Descriptive and Analytical Epidemiology Descriptive and Analytical Epidemiology
Descriptive and Analytical Epidemiology coolboy101pk
 
5.principles and methods of epidemiology
5.principles and methods of epidemiology5.principles and methods of epidemiology
5.principles and methods of epidemiologyRajeev Kumar
 
3. descriptive study
3. descriptive study3. descriptive study
3. descriptive studyNaveen Phuyal
 
Case Control Studies
Case Control StudiesCase Control Studies
Case Control StudiesRachel Walden
 
A selection of slides with some definitions from A dictionary of epidemiology
A selection of slides with some definitions from A dictionary of epidemiologyA selection of slides with some definitions from A dictionary of epidemiology
A selection of slides with some definitions from A dictionary of epidemiologyMiquelPorta2
 
Descriptive epidemiology
Descriptive epidemiologyDescriptive epidemiology
Descriptive epidemiologyDalia El-Shafei
 

What's hot (20)

Ecological study design multiple group study and statistical analysis
Ecological study design multiple group study and statistical analysisEcological study design multiple group study and statistical analysis
Ecological study design multiple group study and statistical analysis
 
Fundamentals and Study Design of Epidemiology
Fundamentals and Study Design of EpidemiologyFundamentals and Study Design of Epidemiology
Fundamentals and Study Design of Epidemiology
 
Case control studies
Case control studiesCase control studies
Case control studies
 
Case control & cohort study
Case control & cohort studyCase control & cohort study
Case control & cohort study
 
Prevalence and incidence
Prevalence and incidencePrevalence and incidence
Prevalence and incidence
 
08 case control studies
08 case control studies08 case control studies
08 case control studies
 
Analytical epidemiology
Analytical  epidemiologyAnalytical  epidemiology
Analytical epidemiology
 
ANALYTICAL EPIDEMIOLOGY
 ANALYTICAL EPIDEMIOLOGY  ANALYTICAL EPIDEMIOLOGY
ANALYTICAL EPIDEMIOLOGY
 
Case control study
Case control study   Case control study
Case control study
 
Experimental epidemology
Experimental epidemologyExperimental epidemology
Experimental epidemology
 
10 information bias
10 information bias10 information bias
10 information bias
 
Descriptive and Analytical Epidemiology
Descriptive and Analytical Epidemiology Descriptive and Analytical Epidemiology
Descriptive and Analytical Epidemiology
 
Epidemiology
EpidemiologyEpidemiology
Epidemiology
 
Epidemiological statistics I
Epidemiological statistics IEpidemiological statistics I
Epidemiological statistics I
 
5.principles and methods of epidemiology
5.principles and methods of epidemiology5.principles and methods of epidemiology
5.principles and methods of epidemiology
 
3. descriptive study
3. descriptive study3. descriptive study
3. descriptive study
 
EPIDEMIOLOGICAL RESEARCH DESIGN
EPIDEMIOLOGICAL RESEARCH DESIGNEPIDEMIOLOGICAL RESEARCH DESIGN
EPIDEMIOLOGICAL RESEARCH DESIGN
 
Case Control Studies
Case Control StudiesCase Control Studies
Case Control Studies
 
A selection of slides with some definitions from A dictionary of epidemiology
A selection of slides with some definitions from A dictionary of epidemiologyA selection of slides with some definitions from A dictionary of epidemiology
A selection of slides with some definitions from A dictionary of epidemiology
 
Descriptive epidemiology
Descriptive epidemiologyDescriptive epidemiology
Descriptive epidemiology
 

Similar to epid2-1

Case-control study un.uob.pptx
Case-control study un.uob.pptxCase-control study un.uob.pptx
Case-control study un.uob.pptxKifluKumera
 
Frequency Measures Used in EpidemiologyIntroductionIn e.docx
 Frequency Measures Used in EpidemiologyIntroductionIn e.docx Frequency Measures Used in EpidemiologyIntroductionIn e.docx
Frequency Measures Used in EpidemiologyIntroductionIn e.docxMARRY7
 
analyticalstudydesignscasecontrolstudy-160305174642.pdf
analyticalstudydesignscasecontrolstudy-160305174642.pdfanalyticalstudydesignscasecontrolstudy-160305174642.pdf
analyticalstudydesignscasecontrolstudy-160305174642.pdfEhsan Larik
 
epidemiology with part 2 (complete) 2.ppt
epidemiology with part 2 (complete)   2.pptepidemiology with part 2 (complete)   2.ppt
epidemiology with part 2 (complete) 2.pptAmosWafula3
 
Epidemiology Concepts
Epidemiology ConceptsEpidemiology Concepts
Epidemiology ConceptsMANULALVS
 
Lecture of epidemiology
Lecture of epidemiologyLecture of epidemiology
Lecture of epidemiologyAmany El-seoud
 
Quantitative Methods.pptx
Quantitative Methods.pptxQuantitative Methods.pptx
Quantitative Methods.pptxKhem21
 
Case control study
Case control studyCase control study
Case control studyswati shikha
 
observational analytical study
observational analytical studyobservational analytical study
observational analytical studyDr. Partha Sarkar
 
Role of epidemiology & statistics in
Role of epidemiology & statistics inRole of epidemiology & statistics in
Role of epidemiology & statistics inmanishashrivastava9
 
unmatched case control studies
unmatched case control studiesunmatched case control studies
unmatched case control studiesMrinmoy Bharadwaz
 
epidemiological approach.pptx
epidemiological approach.pptxepidemiological approach.pptx
epidemiological approach.pptxSuraj Pande
 
Study design used in pharmacoepidemiology
Study design used in pharmacoepidemiology Study design used in pharmacoepidemiology
Study design used in pharmacoepidemiology kamolwantnok
 
What is epidemiology?.pdf
What is epidemiology?.pdfWhat is epidemiology?.pdf
What is epidemiology?.pdfMdNayem35
 
Statistics and biostatistics
Statistics and biostatisticsStatistics and biostatistics
Statistics and biostatisticsMostafa Farghaly
 

Similar to epid2-1 (20)

Case-control study un.uob.pptx
Case-control study un.uob.pptxCase-control study un.uob.pptx
Case-control study un.uob.pptx
 
Epidemiology Depuk sir_ 1,2,3 chapter,OK
Epidemiology Depuk sir_ 1,2,3 chapter,OKEpidemiology Depuk sir_ 1,2,3 chapter,OK
Epidemiology Depuk sir_ 1,2,3 chapter,OK
 
Case control study
Case control studyCase control study
Case control study
 
Frequency Measures Used in EpidemiologyIntroductionIn e.docx
 Frequency Measures Used in EpidemiologyIntroductionIn e.docx Frequency Measures Used in EpidemiologyIntroductionIn e.docx
Frequency Measures Used in EpidemiologyIntroductionIn e.docx
 
analyticalstudydesignscasecontrolstudy-160305174642.pdf
analyticalstudydesignscasecontrolstudy-160305174642.pdfanalyticalstudydesignscasecontrolstudy-160305174642.pdf
analyticalstudydesignscasecontrolstudy-160305174642.pdf
 
epidemiology with part 2 (complete) 2.ppt
epidemiology with part 2 (complete)   2.pptepidemiology with part 2 (complete)   2.ppt
epidemiology with part 2 (complete) 2.ppt
 
Epidemiology Concepts
Epidemiology ConceptsEpidemiology Concepts
Epidemiology Concepts
 
Lecture of epidemiology
Lecture of epidemiologyLecture of epidemiology
Lecture of epidemiology
 
Quantitative Methods.pptx
Quantitative Methods.pptxQuantitative Methods.pptx
Quantitative Methods.pptx
 
Case control study
Case control studyCase control study
Case control study
 
observational analytical study
observational analytical studyobservational analytical study
observational analytical study
 
Role of epidemiology & statistics in
Role of epidemiology & statistics inRole of epidemiology & statistics in
Role of epidemiology & statistics in
 
unmatched case control studies
unmatched case control studiesunmatched case control studies
unmatched case control studies
 
epidemiological approach.pptx
epidemiological approach.pptxepidemiological approach.pptx
epidemiological approach.pptx
 
Epidemiology of periodontal disease
Epidemiology of periodontal diseaseEpidemiology of periodontal disease
Epidemiology of periodontal disease
 
Study design used in pharmacoepidemiology
Study design used in pharmacoepidemiology Study design used in pharmacoepidemiology
Study design used in pharmacoepidemiology
 
What is epidemiology?.pdf
What is epidemiology?.pdfWhat is epidemiology?.pdf
What is epidemiology?.pdf
 
Epidemiological studies
Epidemiological studiesEpidemiological studies
Epidemiological studies
 
Statistics and biostatistics
Statistics and biostatisticsStatistics and biostatistics
Statistics and biostatistics
 
Case control study
Case control studyCase control study
Case control study
 

More from Abebaw Miskir

Lectures_all_cesme2011
Lectures_all_cesme2011Lectures_all_cesme2011
Lectures_all_cesme2011Abebaw Miskir
 
FundamentalsOfEpidemiology
FundamentalsOfEpidemiologyFundamentalsOfEpidemiology
FundamentalsOfEpidemiologyAbebaw Miskir
 
[Neil_Pearce_Centre_for_Public_Health_Research__Ma(BookFi)
[Neil_Pearce_Centre_for_Public_Health_Research__Ma(BookFi)[Neil_Pearce_Centre_for_Public_Health_Research__Ma(BookFi)
[Neil_Pearce_Centre_for_Public_Health_Research__Ma(BookFi)Abebaw Miskir
 

More from Abebaw Miskir (7)

Lectures_all_cesme2011
Lectures_all_cesme2011Lectures_all_cesme2011
Lectures_all_cesme2011
 
lecture9
lecture9lecture9
lecture9
 
FundamentalsOfEpidemiology
FundamentalsOfEpidemiologyFundamentalsOfEpidemiology
FundamentalsOfEpidemiology
 
Epidemiology
EpidemiologyEpidemiology
Epidemiology
 
[Neil_Pearce_Centre_for_Public_Health_Research__Ma(BookFi)
[Neil_Pearce_Centre_for_Public_Health_Research__Ma(BookFi)[Neil_Pearce_Centre_for_Public_Health_Research__Ma(BookFi)
[Neil_Pearce_Centre_for_Public_Health_Research__Ma(BookFi)
 
GHA-7-22443
GHA-7-22443GHA-7-22443
GHA-7-22443
 
GHA-7-22443
GHA-7-22443GHA-7-22443
GHA-7-22443
 

epid2-1

  • 1. Biostatistics 2002, ÅS 1 Lecture notes on epidemiology The aims of epidemiology: Epidemiology is the study of the distribution and determinants of health related states or events in specified populations and the application of this study to control health problems (Mould). Typically one tries to find associations between risk factors (exposures) and diseases by studying the occurrence of diseases in populations. The final aim is to conclude casual relations. Examples of epidemiological findings (from Rothman): • Smoking and lung cancer • Smoking and other health related events • Passive smoking • Low-level ionising radiation and leukaemia • Saccharin and bladder cancer • Swine flu and Guillain-Barré syndrome • The effect of diethylstilbestrol on offspring • Tampons and toxic-shock syndrome • Coffee drinking and pancreatic cancer Epidemiological studies are not generally amenable to being investigated by randomised trials and observational studies are therefore the more practical to study factors and exposure which can not be controlled by the investigators (Mould).
  • 2. Biostatistics 2002, ÅS 2 Hill’s criteria for distinguishing between causal and non-causal associations: 1. Strength (the association should be sufficiently strong) 2. Consistency (it should be seen in different populations under different circumstances) 3. Specificity (a cause should be specific to its effect, e.g. smoking leads to a specific form of lung cancers) 4. Temporality (cause precedes effect) 5. Biological gradient (greater dose should give larger effect) 6. Plausibility (seen in animal models) 7. Coherence (not in conflict to known biology) 8. Experimental evidence (if possible a randomised experiment should be carried through) 9. Analogy (the cause make sense) Type of studies: Studies may be classified as prospective or retrospective. In a prospective study measures of exposures and covariates are made before illness occur. In a retrospective study these measurements are made after the case have already occurred. • Cross-sectional study A study that includes all persons, in the population, at the time of ascertainment, or a representative sample of such persons. Disease and exposure are observed simultaneously.
  • 3. Biostatistics 2002, ÅS 3 • Cohort study Two groups (or more) groups of people that are free from the disease and that differ according to the extent of their exposure to a potential cause of the diseases are studied during a time interval. • Case-Control study A case control study includes people with a disease and a suitable control group of people unaffected by the disease. The occurrence of the possible cause (exposure) is compared between cases and controls. A relevant situation is to think of a case-control study as a cohort study where exposure is investigated for all cases and for a sample of the non-cases. Measures of disease frequency: Prevalence is the number of cases of a disease in a defined population. The prevalence can be measured by a prevalence rate, i.e., the proportion of the population that has the disease at a specific time. This can be measured by in a cross- sectional study. Incidence is the number of new cases of disease in a population. The incidence can only be measured by considering a population during a time interval. It can be measured by the incidence rate. The incidence rate is defined as (The number of new cases in the population during the time considered)/(the total number of years that people are under risk do develop the disease). Cumulative incidence is the number of people in a cohort that gets the disease during the time of study. The relation between prevalence and incidence is somewhat complicated. It depends on the duration of the disease state. An approximate relation is P = I D Where D is the mean duration which diseased people carry the disease.
  • 4. Biostatistics 2002, ÅS 4 If D is increased (e.g. the mortality is lowered by new medicines or if the disease is diagnosed at an earlier state) the prevalence rate increases even if the incidence rate is the same. Epidemiological are most often concerned with the study of incidences. The main aim is to compare incidences for exposed and non-exposed people. Measures of effect and association: Incidence rates in exposed and non-exposed people can be compared in several way. Obvious alternatives are differences and ratios. Let I1 be the incidence rate for exposed people and I0 be the incidence rate of non-exposed people. The the two alternatives are I1 - I0 and I1/I0. If p1 is the probability that an exposed person gets ill during a time interval and p0 is the corresponding probability that an unexposed person gets ill then we can measure the difference either as a risk difference p1 - p0, Or a risk ratio p1/p0. Still another alternative is the odds ratio {p1/(1-p1)} /{p0/(1-p0)}. These measures gives different results in all cases other then p1 = p0. In that case the difference is 0 and the risk ratio and the odds ratio are both equal to 1. It should be observed that these comparisons say little of the absolute values of the risk. Other measures (which relates to a certain population) is attributable risk or etiological fraction.
  • 5. Biostatistics 2002, ÅS 5 Statistical analysis: One strata: In the simplest situation the outcome of a cohort study can be summarized in a 2x2- table: exposed Not exposed ill n11 n12 n1. Not ill n21 n22 n2. n-1 n-2 n.. An interesting hypothesis is that the risks of getting ill are the same for exposed and non-exposed people. This corresponds to a χ2 -test of homogeneity (or Fisher’s exact test). If this hypothesis is rejected it is of interest to estimate the effect of the exposure. One effect measure is the odds ratio. The odds-ratio is estimated by the cross-product ratio: 2112 2211 nn nn Most common is however to estimate the logarithm of the odds ratio. The ML- estimate is )ln(n)ln(n)ln(n)ln(n 22211211 +−− If the observations in the cells are large it is possible to prove that this estimate is asymptotically normal distributed with the true value of the logarithm of the odds ratio as mean and an asymptotic variance that can be estimated by 22211211 1/n1/n1/n1/n +++ This can be used to calculate an approximate confidence interval for the logarithm of the odds ratio by the following formula 2221121122211211 1/n1/n1/n1/nz)ln(n)ln(n)ln(n)ln(n +++±+−− α From this an approximate confidence region for the log odds ratio can be derived as
  • 6. Biostatistics 2002, ÅS 6 ( )22211211 2112 2211 1/n1/n1/n1/nzexp nn nn +++± α The reason for calculate estimates and confidence intervals for log odds instead of the odds directly is that the normal approximation often works better for the log of the cross-product ratio than for the cross-product ratio itself. In a case-control study the observations can also be summarized in a 2x2 table, but with a different interpretation: exposed Not exposed cases n11 n12 n1. controls n21 n22 n2. n-1 n-2 n.. The same analysis can be applied. Observe that a in a case-control study you can estimate the odds-ratio but not the risk ratio. However if the risk is small these measure are almost identical. Several strata: Sometimes the analysis has to be stratified into different groups of people. In that case we have to summarize the observations into several 2x2-tables (one for each strata). The reason for this is that there may be some other factor that relates to the exposure and to the disease that we want to control for. Making a separate analysis for strata in which such factors are constant does this. Typical stratification variables are sex, age, smoking, drinking and other life style factors. Such factor may either influence the effect as being either effect modifiers or confounders. Confounding is a systematic bias in the estimate of the effect that follows from that the cases and controls differs with regard to a risk factor not considered in the study. The analysis have to consider this possibility as is seen by the following example Stratum 1 Exposed Not exposed Cases 50 50 100 Controls 20 80 100 70 130 200
This gives the estimated odds ratio 4.0.

Stratum 2    Exposed   Not exposed   Total
Cases        8         92            100
Controls     2         98            100
Total        10        190           200

This gives the estimated odds ratio 4.3.

Summing these two tables gives

Strata 1 + 2 Exposed   Not exposed   Total
Cases        58        142           200
Controls     22        178           200
Total        80        320           400

which gives the estimated odds ratio 3.3, lower than the estimate in either stratum.

The fact that a fair result cannot be obtained by simply summing (collapsing) over strata is referred to as non-collapsibility. It is easy to construct other examples that give even more deviating estimates.

If the effect is different in different strata, this effect modification has to be described and analysed. This may be done by formulating a model for how the effect is modified by various covariates.

If the effect is the same in all strata there is a need for a statistical method that tests and estimates simultaneously across several 2x2 tables. The null hypothesis of no effect in any of the strata can be tested by the so-called Mantel-Haenszel test.
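The odds ratios in the two-strata example, together with the approximate confidence interval from the one-stratum analysis, can be checked numerically. The following is a minimal sketch, not part of the original notes: the function name `odds_ratio_ci` is hypothetical, and the default `z=1.96` assumes a two-sided 95% interval.

```python
import math


def odds_ratio_ci(n11, n12, n21, n22, z=1.96):
    """Cross-product odds ratio with an approximate confidence interval.

    The interval is computed on the log scale and then exponentiated.
    z=1.96 is an assumption (two-sided 95% interval), not from the notes.
    """
    log_or = math.log(n11) - math.log(n12) - math.log(n21) + math.log(n22)
    se = math.sqrt(1 / n11 + 1 / n12 + 1 / n21 + 1 / n22)
    return (math.exp(log_or),
            math.exp(log_or - z * se),
            math.exp(log_or + z * se))


# Cell counts (n11, n12, n21, n22) taken from the example tables.
stratum1 = (50, 50, 20, 80)
stratum2 = (8, 92, 2, 98)
# Collapsing over strata means adding the tables cell by cell.
collapsed = tuple(a + b for a, b in zip(stratum1, stratum2))

for name, table in [("Stratum 1", stratum1),
                    ("Stratum 2", stratum2),
                    ("Strata 1 + 2", collapsed)]:
    or_hat, lower, upper = odds_ratio_ci(*table)
    print(f"{name}: OR = {or_hat:.2f}, approx. 95% CI = ({lower:.2f}, {upper:.2f})")
```

The point estimates reproduce the values 4.0, 4.3 and 3.3 quoted in the example, so the collapsed estimate falls below both stratum-specific estimates.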
Stratum 1    Exposed   Not exposed
Cases        a1        b1
Controls     c1        d1
...
Stratum i    Exposed   Not exposed
Cases        ai        bi
Controls     ci        di
...
Stratum n    Exposed   Not exposed
Cases        an        bn
Controls     cn        dn

The observed number of exposed cases is

$$O = \sum_i a_i$$

The expected number of exposed cases under the null hypothesis is (according to the hypergeometric distribution in each stratum)

$$E = \sum_i \frac{(a_i + b_i)(a_i + c_i)}{a_i + b_i + c_i + d_i}$$

The variance of O is (according to the hypergeometric distribution)

$$V = \sum_i \frac{(a_i + b_i)(c_i + d_i)(a_i + c_i)(b_i + d_i)}{t_i^2 (t_i - 1)}$$
where $t_i = a_i + b_i + c_i + d_i$.

The null hypothesis of no effect in all strata is rejected if

$$\frac{(O - E)^2}{V}$$

is large compared to a χ²(1)-percentile.

Matched case-control studies:
In order to minimise the possibility of introducing confounders, epidemiological studies are sometimes matched. This means that controls are selected specifically for each case (or group of cases) in a way that makes them as close to the cases as possible with regard to covariates that are not under study. A typical example of this is given in Rice, section 13.5, which describes 1-1 matching. It is also possible to have more controls per case, so-called 1-m matching. Data from matched studies should be analysed by methods for simultaneous analysis of several 2x2 tables.

Two remarks:
• In matched studies the precision gained by increasing the number of controls for each case decreases rather fast.
• A disadvantage of matched studies is that matching prevents any study of the effect of the variables that have been used to define the matching.

Analysis of matched case-control studies:
The methods for analysing matched case-control studies are the same as for analysing several 2x2 tables where the same effect is assumed in all strata. In a matched study each case defines a stratum. An estimate of the common effect is obtained through the Mantel-Haenszel estimator:

$$\frac{\sum_i a_i d_i / t_i}{\sum_i b_i c_i / t_i}$$
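The Mantel-Haenszel quantities O, E, V and the common odds ratio estimator can be computed directly from a list of stratum tables. The sketch below is illustrative only: the function name `mantel_haenszel` and the tuple layout `(a, b, c, d)` are assumptions chosen to mirror the notation above, and it is applied to the two strata from the earlier confounding example.

```python
def mantel_haenszel(tables):
    """Mantel-Haenszel test statistic and common odds ratio estimate.

    tables: list of (a, b, c, d) per stratum, where a = exposed cases,
    b = unexposed cases, c = exposed controls, d = unexposed controls.
    Each stratum needs t = a + b + c + d > 1 for the variance term.
    """
    observed = expected = variance = 0.0
    numerator = denominator = 0.0
    for a, b, c, d in tables:
        t = a + b + c + d
        observed += a
        expected += (a + b) * (a + c) / t
        variance += (a + b) * (c + d) * (a + c) * (b + d) / (t ** 2 * (t - 1))
        numerator += a * d / t
        denominator += b * c / t
    chi2 = (observed - expected) ** 2 / variance
    return chi2, numerator / denominator


# The two strata from the confounding example.
strata = [(50, 50, 20, 80), (8, 92, 2, 98)]
chi2, or_mh = mantel_haenszel(strata)
print(f"Mantel-Haenszel chi-square = {chi2:.2f}")
print(f"Mantel-Haenszel odds ratio = {or_mh:.2f}")
```

For these data the statistic is well above 3.84, the 95th percentile of the χ²(1) distribution, and the common odds ratio estimate is about 4.0, close to the stratum-specific estimates rather than to the collapsed value 3.3.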
Logistic regression
In many situations covariates can be attached to each individual under study. This means that we not only observe the Bernoulli variable Y, which takes the values 0 and 1 according to whether the individual is diseased or not, but also have other measurements (x1, …, xk) which may influence the risk of becoming ill. In such cases there is a need for a model that describes the influence of the covariates (independent variables) on the distribution of Y. Such a model, which is similar to normal linear regression, assumes that

$$\Pr(Y = 1) = \frac{\exp(\beta_0 + \beta_1 x_1 + \dots + \beta_k x_k)}{1 + \exp(\beta_0 + \beta_1 x_1 + \dots + \beta_k x_k)}$$

This model is called a logistic regression model. Observe that the log odds is a linear combination of the covariates, i.e.,

$$\ln\left( \frac{\Pr(Y = 1)}{\Pr(Y = 0)} \right) = \beta_0 + \beta_1 x_1 + \dots + \beta_k x_k.$$

One of the covariates may be the exposure, defined either as a 0/1 variable (exposed or not) or as a dose (which measures the amount of exposure). In such a case the corresponding regression coefficient measures the effect of the exposure. In fact the estimate of the regression coefficient is the estimated logarithm of the odds ratio between two exposures that differ by one unit on the scale on which the exposure is measured.

Conditional logistic regression
In matched case-control studies we have a case with corresponding covariates and one or more controls with possibly different covariates. We can then use a logistic regression model to calculate a conditional probability that says how probable it is that, among a number of people (the case and the controls) with a certain set of covariate values, it is precisely the case that has the disease under study.

Let us consider a 1-1 matching situation and assume that the case has covariate x and the control covariate z. (For simplicity we assume a one-dimensional covariate.) Then we can calculate the conditional probability that the person with covariate x has the disease given that exactly one of the two persons has the disease.
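The steps of this calculation can be sketched as follows. This is an illustrative outline rather than the author's own derivation: it assumes that the disease states of the two persons are independent given their covariates, and the shorthand p(x) for the logistic probability Pr(Y = 1) at covariate value x is notation introduced here only for convenience.

```latex
% Assumed shorthand: p(x) = exp(b0 + b1 x) / (1 + exp(b0 + b1 x))
\begin{align*}
\Pr(\text{person with } x \text{ diseased} \mid \text{exactly one diseased})
  &= \frac{p(x)\,\bigl(1 - p(z)\bigr)}
          {p(x)\,\bigl(1 - p(z)\bigr) + \bigl(1 - p(x)\bigr)\,p(z)} \\[4pt]
  % divide numerator and denominator by (1 - p(x))(1 - p(z))
  &= \frac{\dfrac{p(x)}{1 - p(x)}}
          {\dfrac{p(x)}{1 - p(x)} + \dfrac{p(z)}{1 - p(z)}} \\[4pt]
  % the odds under the logistic model are exp(b0 + b1 x)
  &= \frac{\exp(\beta_0 + \beta_1 x)}
          {\exp(\beta_0 + \beta_1 x) + \exp(\beta_0 + \beta_1 z)}
\end{align*}
```

The common factor exp(β0) cancels from the numerator and the denominator, so the intercept does not appear in the conditional probability.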
Elementary (but not simple) calculations give the result
$$\frac{\exp(\beta_1 x)}{\exp(\beta_1 x) + \exp(\beta_1 z)}.$$

The matched data can be analysed by using these probabilities to build up a likelihood. ML estimates can then be found and likelihood ratio tests derived.

Possible sources of bias in epidemiological studies:
• Confounding
• Selection bias (e.g. the healthy worker effect)
• Recall bias (cases may report exposures more accurately than controls)
• Publication bias (significant results tend to be published more easily than studies in which no significance is found)