2. Big Picture
¤ Epidemiology (description, prediction, etiology/
causation) relies on precise measurement of
outcomes
¤ To precisely measure outcome we must
¤ Define the outcome
¤ Specify whether & when outcome occurred
¤ Induction period, latent period
¤ Specify amount of time at risk for disease
¤ Determine among what population (at risk,
candidate, closed/fixed, open, steady state)
3. 3 Categories of Measures
¤ Proportion
¤ Numerator is included in the denominator:
a/(a+b)
¤ Ratio
¤ Numerator is distinct from the denominator:
c/d
¤ Rate
¤ Change in disease status per unit time: #
disease events/period of time
4. Types of Populations/Cohorts
¤ Closed or Fixed
¤ Can only lose members due to death or disease of interest
¤ A closed population/cohort with constant incidence rate
(ID) declines in size exponentially
¤ Open
¤ Members can be added (birth, migrate in) and removed
(death, migrate out)
¤ Steady state
¤ Can occur only in an open population/cohort
¤ Number of people entering is balanced by the number of
people exiting within levels of age, sex, etc.
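As a rough illustration of the exponential decline in a closed cohort, the sketch below assumes the expected number still at risk follows N(t) = N0 * exp(-ID * t). The cohort size and rate are hypothetical values chosen only for the example.

```python
import math

def cohort_size(n0, incidence_rate, t):
    """Expected number still at risk at time t under a constant rate:
    N(t) = N0 * exp(-ID * t)."""
    return n0 * math.exp(-incidence_rate * t)

n0 = 1000     # hypothetical starting cohort size
rate = 0.10   # hypothetical constant incidence rate, per person-year

# Expected cohort size at the start of each of the first six years
sizes = [cohort_size(n0, rate, t) for t in range(6)]
for year, n in enumerate(sizes):
    print(f"year {year}: {n:.1f} still at risk")
```

Each year the cohort shrinks by the same fraction, not by the same number of people, which is what makes the decline exponential.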
5. Measures of disease
¤ Prevalence = proportion of current cases
of disease in a population at a single
point (or period) in time
¤ Incidence = frequency of development
of new cases of disease in a population
over a defined period of time
6. Prevalence (a proportion)
¤ Prevalence = (# existing cases of disease) / (# of persons in the population of interest)
¤ Two types
¤ Point Prevalence: cases at a given point in time
¤ Period Prevalence: cases during a given time
period
¤ Useful for resource planning if
prevalence is constant
7. Risk vs. Rate
 | RISK | RATE
Definition | Probability that an individual will develop the outcome over a given time period, conditional on that person not dying from other causes during the time period | The frequency of occurrence of new cases of disease during person-time of observation in a population at risk of developing disease
Measure | Cumulative incidence | Hazard (instantaneous); incidence density (average)
Range | 0 to 1 | 0 to infinity
Time | Refers to a time period (but time is not used in calculations) | Units of 1/time (time^-1); cases/person-time (time is used in calculations)
Refers to | Individual (but population is used to estimate) | Population (no individual-level interpretation)
8. Incidence density
¤ ID = (# new cases) / (total person-time at risk)
¤ Used for populations, not individuals
¤ Interpretation of ID = 35/1000 person-years
¤ The rate of outcome in the population is
35/1000 person-years
9. Calculating person-time
¤ When you have individual-level (detailed) data:
¤ 1. If exact time contribution of each individual is known
¤ $PT = \sum_{i=1}^{N'} \Delta t_i$
¤ Sum all contributions of disease-free time
¤ 2. If exact time contributions are unknown but you have interval-specific data
¤ $PT_j = (N'_{0j} - W_j/2)\,\Delta t_j$
¤ $N'_{0j}$ = number of individuals at the beginning of each interval
¤ $\Delta t_j$ = duration of follow-up in each interval
¤ $W_j$ = # of withdrawals in each interval
¤ (Note: Variants of this formula also subtract $I_j/2$ in each interval)
10. Calculating person-time
¤ When you have group level data
¤ 3. If population is in steady state →
¤ $PT = N'\,\Delta t$
¤ $N'$ = stable, disease-free population size
¤ $\Delta t$ = duration of follow-up
¤ 4. If population is not in steady state →
¤ $PT = N'_{1/2}\,\Delta t$
¤ $N'_{1/2}$ = mid-interval disease-free population size
¤ $\Delta t$ = duration of follow-up
11. In class exercise: estimate ID
¤ Study population observed monthly for 5 months
¤ What is the person-time contributed by this population?
¤ What is the incidence density?
[Figure: follow-up timelines for persons 1-4 over months 0-5; two persons are marked C and one is marked D. C = censored, D = died/developed disease]
12. In class exercise solution
Use the following formula because exact time contributions are unknown and we have interval-specific data:
$PT_j = (N'_{0j} - W_j/2)\,\Delta t_j$
j | N'_0j | W_j | Δt_j | PT_j
1 | 4 | 1 | 1 | (4-(1/2))*1 = 3.5
2 | 3 | 1 | 1 | (3-(1/2))*1 = 2.5
3 | 1 | 0 | 1 | (1-(0/2))*1 = 1
4 | 1 | 0 | 1 | (1-(0/2))*1 = 1
5 | 1 | 0 | 1 | (1-(0/2))*1 = 1
Total PT = 9 person-months
ID = 1/9 = 0.111 cases per person-month (follow-up time was measured in months)
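The interval calculation in this solution can be checked with a short script. This is only a sketch: the function and variable names are made up for the example, and the inputs are the N', W, and Δt values from the solution table, with one new case. Because follow-up is in months, the resulting rate is in cases per person-month.

```python
def interval_person_time(n_start, withdrawals, dt):
    """Person-time for one interval: (N'_0j - W_j / 2) * dt_j."""
    return (n_start - withdrawals / 2) * dt

# (N'_0j, W_j, dt_j) for each monthly interval, from the solution table
intervals = [(4, 1, 1), (3, 1, 1), (1, 0, 1), (1, 0, 1), (1, 0, 1)]

pt_by_interval = [interval_person_time(n, w, dt) for n, w, dt in intervals]
total_pt = sum(pt_by_interval)
new_cases = 1  # one person died/developed disease (D)
incidence_density = new_cases / total_pt

print(pt_by_interval)               # [3.5, 2.5, 1.0, 1.0, 1.0]
print(total_pt)                     # 9.0
print(round(incidence_density, 3))  # 0.111
```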
13. Cumulative Incidence
¤ CI = (# new cases of disease) / (# of persons at risk at beginning of time period)
¤ Always defined over a time period
¤ Can be used to predict individual’s risk
¤ Interpretation of CI = 0.35 in a 5-year period
¤ The risk of developing the outcome in the
population at risk at baseline over a 5-year period
was 0.35.
14. Cumulative Incidence
¤ 4 Ways to Calculate it:
¤ Simple Cumulative
¤ Actuarial
¤ Kaplan-Meier
¤ Density Method
¤ Know the assumptions needed to use each
one & how to apply them
¤ Why different approaches?
15. 1. Simple cumulative
¤ $R(t_0, t_j) = CI(t_0, t_j) = I / N'_{0j}$ = (# new cases) / (# disease-free subjects at risk at $t_0$)
¤ Assumptions:
¤ Closed population
¤ No withdrawals/loss to follow up
¤ No competing risks
¤ Best for: short time frames (e.g., outbreaks)
¤ Can’t be used when duration of follow-up varies
16. In class exercise: Simple CI
¤ Study population observed monthly for 5 months
¤ What is the simple CI?
[Figure: follow-up timelines for persons 1-4 over months 0-5; one person is marked D. C = censored, D = died/developed disease]
17. 2. Actuarial (life table)
¤ $R(t_{j-1}, t_j) = CI(t_{j-1}, t_j) = I_j / [N'_{0j} - (W_j/2)]$ =
(# new cases during interval j) / (# of disease-free subjects at risk at the beginning of interval j, adjusted for withdrawals during that interval)
¤ tj-1, tj is a shorter time interval; calculate risks over shorter time
intervals & then accumulate them
¤ Cumulative risk = R (t0, tj) = 1 - ∏ [S (tj-1, tj)] = 1 – ∏ [1-R( tj-1, tj)]
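A minimal sketch of the actuarial calculation follows: interval risks are computed with the withdrawal adjustment and then accumulated through the product of interval survival probabilities. The three-interval data set is hypothetical and is not taken from the lecture.

```python
def actuarial_cumulative_risk(n_start, withdrawals, cases):
    """Return (interval risks, cumulative risk) by the actuarial method.

    Interval risk:   R_j = I_j / (N'_0j - W_j / 2)
    Cumulative risk: 1 - product over j of (1 - R_j)
    """
    survival = 1.0
    interval_risks = []
    for n, w, i in zip(n_start, withdrawals, cases):
        risk = i / (n - w / 2)
        interval_risks.append(risk)
        survival *= 1 - risk
    return interval_risks, 1 - survival

# Hypothetical cohort: each interval starts with the previous N'
# minus that interval's withdrawals and new cases.
n_start = [100, 90, 78]
withdrawals = [4, 6, 8]
cases = [6, 6, 5]

interval_risks, cumulative_risk = actuarial_cumulative_risk(n_start, withdrawals, cases)
print([round(r, 4) for r in interval_risks])
print(round(cumulative_risk, 4))
```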
18. Assumptions/benefits of actuarial method
¤ Assumptions
¤ Withdrawals occur halfway through observation
period on average
¤ Independence of censoring and survival
¤ Lack of secular trends during the study period
¤ Benefits
¤ Allows for censoring
21. 3. Kaplan Meier
¤ $R_j = CI_j = I_j / N_j$ = (# new cases) / (# of individuals still at risk at time j)
¤ Calculate risk at time each disease event occurs
¤ Accumulate interval-specific risks (similar to actuarial
method)
¤ Need very detailed data
22. Assumptions/benefits of Kaplan-Meier Method
¤ Assumptions:
¤ Independence of censoring and survival
¤ Lack of secular trends during the study period
¤ Benefits:
¤ Calculate event probability at the time it occurs
23. 3. Kaplan Meier
Time_j | N_j | I_j | Interval risk (R_j) | Interval survival (1-R_j) | Cumul. risk | Cumul. survival
1 | 4 | 1 | 1/4 = 0.25 | 0.75 | 0.25 | 0.75
5 | 2 | 1 | 1/2 = 0.5 | 0.5 | 0.625 | 0.375
[Figure: follow-up timelines for persons 1-4 over months 1-6; person 1 is marked D, person 2 is marked C, person 4 is marked D]
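The Kaplan-Meier table on this slide can be reproduced with a few lines of code. This sketch takes the event times, numbers at risk, and numbers of events from the table; the function name and tuple layout are choices made for the example.

```python
def kaplan_meier(event_times, at_risk, events):
    """Return one row per event time:
    (time, interval risk, interval survival, cumulative risk, cumulative survival)."""
    rows = []
    survival = 1.0
    for t, n, i in zip(event_times, at_risk, events):
        risk = i / n              # R_j = I_j / N_j
        survival *= 1 - risk      # accumulate interval survival
        rows.append((t, risk, 1 - risk, 1 - survival, survival))
    return rows

# Values from the slide: events at months 1 and 5
rows = kaplan_meier(event_times=[1, 5], at_risk=[4, 2], events=[1, 1])
for row in rows:
    print(row)
# (1, 0.25, 0.75, 0.25, 0.75)
# (5, 0.5, 0.5, 0.625, 0.375)
```

Risk is evaluated only at the times when an event occurs, which is why the method needs exact event and censoring times.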
24. Relationship between risk and rate
¤ Risk (CI) and rate (ID) are mathematically related
¤ $CI \approx 1 - e^{-ID \cdot \Delta t}$
¤ Rare Disease Approximation:
¤ When $ID \cdot \Delta t$ is very small (< 0.10), $CI \approx ID \cdot \Delta t$
¤ 3 assumptions
¤ Closed population
¤ No competing risk
¤ Each age-specific rate ($ID_j$) is constant over that interval
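To see how quickly the rare disease approximation degrades, the sketch below compares the exponential formula with ID × Δt for a few values. The rates are hypothetical and the function names are made up for the example.

```python
import math

def risk_from_rate(rate, dt):
    """Exponential formula: CI = 1 - exp(-ID * dt)."""
    return 1 - math.exp(-rate * dt)

def rare_disease_approx(rate, dt):
    """Rare disease approximation: CI is roughly ID * dt."""
    return rate * dt

# Hypothetical rates per person-year, each followed for 1 year
for rate in (0.01, 0.05, 0.10, 0.50):
    exact = risk_from_rate(rate, 1.0)
    approx = rare_disease_approx(rate, 1.0)
    print(f"ID*dt = {rate:.2f}: exponential = {exact:.4f}, "
          f"approximation = {approx:.4f}, difference = {approx - exact:.4f}")
```

The approximation always overstates the risk slightly, and the gap widens as ID × Δt grows.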
25. 4. Density method
¤ Estimate risk (CI) using observed incidence rates (ID) in each time interval
¤ Interval risk = $R(t_{j-1}, t_j) = CI(t_{j-1}, t_j) = 1 - e^{-ID_j \Delta t_j}$
¤ Cumulative risk = $R(t_0, t_j) = 1 - e^{-\sum_j (ID_j \Delta t_j)}$
26. Example: Estimate CI with density method (1)
¤ Estimate 5-year risk assuming constant ID:
1-exp[-0.192*5]=0.618
¤ Be careful! The rate differs greatly from interval to interval
Source: Kleinbaum, Table 6.2
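The warning about rates that differ between intervals can be illustrated with a sketch. The interval-specific rates below are hypothetical (the values in Kleinbaum's table are not reproduced in this text); only the pooled rate of 0.192 per year comes from the slide.

```python
import math

def density_cumulative_risk(rates, durations):
    """Density method: CI = 1 - exp(-sum_j(ID_j * dt_j))."""
    total = sum(rate * dt for rate, dt in zip(rates, durations))
    return 1 - math.exp(-total)

pooled_rate = 0.192                            # constant rate used on the slide
interval_rates = [0.05, 0.10, 0.20, 0.30, 0.40]  # hypothetical yearly rates
one_year = [1, 1, 1, 1, 1]

# 2-year risk: constant pooled rate vs. the first two interval-specific rates
two_year_constant = density_cumulative_risk([pooled_rate] * 2, one_year[:2])
two_year_interval = density_cumulative_risk(interval_rates[:2], one_year[:2])

# 5-year risk under each approach
five_year_constant = density_cumulative_risk([pooled_rate] * 5, one_year)
five_year_interval = density_cumulative_risk(interval_rates, one_year)

print(f"2-year risk: constant = {two_year_constant:.3f}, interval-specific = {two_year_interval:.3f}")
print(f"5-year risk: constant = {five_year_constant:.3f}, interval-specific = {five_year_interval:.3f}")
```

With rates that rise over time, a single pooled rate overstates the early risk in this example.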
28. Assumptions/benefits of density method
¤ Assumptions
¤ Within each interval, rate is constant
¤ Closed cohort
¤ No competing risks
¤ Independence of censoring and survival
¤ Lack of secular trends during the study period
¤ Benefits
¤ Can be used to extrapolate CI to intervals beyond the
follow-up time
29. Relationship between prevalence
and incidence
¤ In a steady-state population:
¤ P/(1-P) = Incidence (ID) x duration (D)
a. If prevalence is low/disease rare:
¤ P ≈ ID x D
b. If prevalence is not low/disease not rare:
¤ P = (ID x D)/(ID x D+1)
¤ So, if disease not rare, then prevalence
odds may be preferred
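A small sketch of the two formulas above, using hypothetical incidence rates and durations, shows when P ≈ ID × D is adequate and when it is not.

```python
def prevalence_from_incidence(incidence_density, duration):
    """Steady-state prevalence: P = (ID * D) / (ID * D + 1)."""
    odds = incidence_density * duration  # prevalence odds, P / (1 - P)
    return odds / (odds + 1)

# Hypothetical (ID per person-year, mean duration in years) pairs
scenarios = [(0.001, 2.0), (0.05, 10.0)]
for rate, duration in scenarios:
    exact = prevalence_from_incidence(rate, duration)
    approx = rate * duration  # rare disease shortcut, P ~ ID * D
    print(f"ID = {rate}, D = {duration}: P = {exact:.4f}, ID x D = {approx:.4f}")
```

In the first scenario the two values nearly coincide; in the second, ID × D equals the prevalence odds (0.5), which is well above the prevalence itself (about 0.33).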
30. Standardization of rates
¤ Types of rates
¤ Crude
¤ Specific (e.g., age-specific)
¤ Standardized/adjusted (e.g., age-adjusted)
¤ Why adjust?
¤ Comparing two or more crude rates between
populations can be misleading because populations
may also differ with respect to characteristics (e.g.,
age) that affect the rate of disease
¤ Example: Crude mortality higher in Florida than other
states
31. Direct Age Adjustment
Age-adjusted rate = (Total expected outcomes) / (Total standard population)
Age-specific rates come from your study population
Age-specific population sizes (i.e., the weights) come from the standard population
32. Indirect Age Adjustment
Age-specific rates come from the standard population
Age-specific population sizes (also called weights) come from your study population
SMR = [observed outcomes (O) / expected outcomes (E)] x 100%
Standardized mortality ratio (SMR) = ratio of the observed number of outcomes in the study population to the expected number of outcomes if the study population had the same age-specific rates as the standard population.
33. Standardization and the
counterfactual
¤ What does the standard population represent?
¤ Direct: a counterfactual estimate of the age
structure
¤ Answers question: what would the rate of disease look
like in my study population if, counter to fact, it had the
same age structure as the standard population
¤ Indirect: a counterfactual estimate of the age-
specific rates
¤ Answers question: what would the rate of disease look
like in my study population if, counter to fact, it had the
same age-specific rates as the standard population
34. Standardization and the
counterfactual
¤ What does this imply about your choice of
standard population??
¤ Your age-adjusted rate or SMR depends on your
choice of standard population
¤ Changing the standard population will change your answer!
¤ Example
¤ SMR comparing rate of accidents among U.S. construction
workers to the U.S. standard population = 167%
¤ SMR comparing rate of accidents among U.S. construction
workers to construction workers in China = 76%
35. Key points about indirect
standardization
¤ Why use indirect?
¤ When we do not have stratum-specific rates, or if
stratum specific rates are based on cells with small
numbers.
¤ SMR can be >100% or <100% depending on
standard population (illustrated in previous slide)
¤ Caution: do not compare two SMRs
36. Don’t compare 2 SMRs!
¤ Both study groups have identical age-specific rates
¤ How do they compare to the standard?
Age (yrs) | Community A: N | Deaths | Rate | Community B: N | Deaths | Rate | Standard pop. rate
<40 | 100 | 10 | 10% | 500 | 50 | 10% | 12%
40+ | 500 | 100 | 20% | 100 | 20 | 20% | 50%
Total | 600 | 110 | 18.3% | 600 | 70 | 11.7% |
37. Don’t compare 2 SMRs
Expected # of deaths obtained by applying the
reference rates to Communities A & B
Age (yrs) | Community A | Community B
<40 | 0.12 * 100 = 12 | 0.12 * 500 = 60
40+ | 0.5 * 500 = 250 | 0.5 * 100 = 50
Total # expected | 262 | 110
Total # observed | 110 | 70
SMR (observed/expected) | 0.42 | 0.64
The SMRs are different even though both populations
had the same rates of disease in each age stratum!
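The expected counts and SMRs for Communities A and B can be recomputed as a check. This sketch uses the population sizes, observed deaths, and standard rates from the slides; the dictionary layout and function name are choices made for the example.

```python
# Age-specific rates in the standard population
standard_rates = {"<40": 0.12, "40+": 0.50}

# Population sizes by age and total observed deaths in each community
communities = {
    "A": {"n": {"<40": 100, "40+": 500}, "observed": 110},
    "B": {"n": {"<40": 500, "40+": 100}, "observed": 70},
}

def smr(n_by_age, observed, rates):
    """Indirect standardization: apply standard rates to the study
    population's age structure, then return (expected, observed / expected)."""
    expected = sum(rates[age] * n for age, n in n_by_age.items())
    return expected, observed / expected

results = {name: smr(c["n"], c["observed"], standard_rates)
           for name, c in communities.items()}

for name, (expected, ratio) in results.items():
    print(f"Community {name}: expected = {expected:.0f}, SMR = {ratio:.2f}")
```

Each SMR is weighted by its own community's age structure, so the two ratios differ even though the age-specific rates are identical.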
38. Don’t compare 2 SMRs
Another way to think about it:
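¤ $\mathrm{SMR}_A = \mathrm{obs}_A / \mathrm{exp}_A$
¤ $\mathrm{SMR}_B = \mathrm{obs}_B / \mathrm{exp}_B$
Different denominator: can’t compare
¤ Directly adjusted rate$_A$ = $\mathrm{exp}_A / \mathrm{standard}$
¤ Directly adjusted rate$_B$ = $\mathrm{exp}_B / \mathrm{standard}$
Same denominator: can compare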
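By contrast, direct adjustment gives both communities the same answer, because both sets of age-specific rates are weighted by one shared standard population. In this sketch the age-specific rates are those of Communities A and B; the standard population sizes are hypothetical, since the slides give only standard rates.

```python
# Age-specific death rates from the two communities (identical by design)
age_specific_rates = {
    "A": {"<40": 0.10, "40+": 0.20},
    "B": {"<40": 0.10, "40+": 0.20},
}

# Hypothetical standard population sizes (the weights)
standard_population = {"<40": 700, "40+": 300}

def directly_adjusted_rate(rates, standard):
    """Direct standardization: expected outcomes in the standard
    population divided by the total standard population."""
    expected = sum(rates[age] * standard[age] for age in standard)
    return expected / sum(standard.values())

adjusted = {name: directly_adjusted_rate(rates, standard_population)
            for name, rates in age_specific_rates.items()}

for name, rate in adjusted.items():
    print(f"Community {name}: directly adjusted rate = {rate:.3f}")
```

A different standard population would change the adjusted value, but it would change it equally for both communities.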
39. Age effect
¤ Definition:
¤ Variation in health status arising from social or
biological consequences of aging
¤ What to look for:
¤ Rate (of disease) changes with age
¤ Irrespective of birth cohort and calendar time
Source: Szklo
40. Period effect
¤ Definition:
¤ Variation in health status arising from changes in environment
during time period
¤ What to look for:
¤ Change in rate (of disease) affecting an entire population at
some point in time
¤ Irrespective of age and birth cohort
Source: Szklo
41. Cohort effect
¤ Definition:
¤ Variation in health status arising from
exposures that vary by cohort
¤ What to look for:
¤ Change in the rate (of disease) according to
year of birth
¤ Irrespective of age and calendar time
Source: Szklo
43. Specific measures of disease
¤ Proportionate mortality
¤ Proportionate mortality ratio
¤ Case fatality rate
¤ Death to case ratio
¤ Infant mortality rate
¤ Neonatal mortality rate
¤ Postneonatal mortality rate
¤ Maternal mortality ratio (also called maternal mortality rate)
¤ Crude birth rate
¤ General fertility rate
¤ Years of potential life lost (YPLL)
45. People moving through time
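[Figure: timeline of disease progression with points labeled A-E. Events shown: infection with HPV; changes to cervical cells; cervical dysplasia detectable by pap smear (“early diagnosis”); symptoms, e.g., bleeding (“usual diagnosis”); death. “Disease detected” is marked on the timeline. Labeled intervals: induction period, latent period, lead time.]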
47. Context – why do this study?
¤ Hepatitis E is an “emerging” pathogen
¤ Endemic in South Asia
¤ Poor understanding of patterns of infection in
subpopulations
¤ Low incidence in children under 15 years
¤ High mortality rate among pregnant women
¤ Poor understanding of how the body develops an
immune response to Hep E
¤ Few longitudinal studies of Hep E
48. Study objective and design
¤ Objective: Calculate age-specific population incidence
rates of HEV infection and disease under endemic, non-
outbreak conditions
¤ Design: Follow randomly selected cohort of rural
Bangladeshis for 18 months
51. Data collection
¤ Random selection from the Matlab cohort
¤ Entire cohort was enumerated in 2003 census
¤ 1,300 households randomly selected
¤ Baseline survey 2003-2004
¤ Test for antibodies to HEV (n=1,134)
¤ 12-month & 18-month follow-up
¤ Blood sample
¤ Test for antibodies
¤ Seronegative: Titers < 20 WR-U/mL
¤ Seropositive (“seroconverted”): Titers ≥20 WR-U/mL
¤ Questionnaire – exposures, morbidity
52. Key Findings
¤ Baseline seroprevalence of HEV: 22.5%
¤ Overall incidence density: 64 per 1,000 person-years
¤ A lower incidence had been expected, since HEV has typically been reported to be sporadic in Bangladesh