Presentation at the Epidemiology Congress of the Americas 2016.
https://epiresearch.org/2016-meeting/submitted-abstract-sessions/pharmacoepidemiology-estimation-of-treatment/
Paper: http://journals.lww.com/epidem/Abstract/publishahead/Matching_weights_to_simultaneously_compare_three.98901.aspx (email me at kazukiyoshida@mail.harvard.edu)
Simulation code: https://github.com/kaz-yos/mw
Tutorial: http://rpubs.com/kaz_yos/matching-weights
Matching Weights to Simultaneously Compare Three Treatment Groups: a Simulation Study
1. Matching Weights to Simultaneously Compare Three Treatment Groups: a Simulation Study
Kazuki Yoshida, MD, MPH, MS*,
Sonia Hernández-Díaz, MD, DrPH, Daniel H. Solomon, MD, MPH,
John W. Jackson, ScD, Joshua J. Gagne, PharmD, ScD,
Robert Glynn, PhD, Jessica M. Franklin, PhD
*Joint Doctor of Science Student
Departments of Epidemiology & Biostatistics
Harvard T.H. Chan School of Public Health
677 Huntington Ave, Boston, MA 02115, USA
Last updated on June 22, 2016
2. Motivation
Propensity score matching (Rosenbaum & Rubin 1983) is a
well-established method and is widely used in the two-group setting.
In clinical practice, however, there are often 3+ comparator drugs to
be compared, e.g., antirheumatic drugs for rheumatoid arthritis.
For non-binary treatments, the generalized propensity score (Imbens
2000) has been proposed, but its use has been limited.
Recently developed software (Rassen et al 2013) allows 3-way
simultaneous matching on generalized PS, but further generalization
is complicated.
Question: Is there an alternative that is similar to PS matching, but
more easily generalizes to 3+ groups?
Hypothesis: Matching weights (Li & Greene 2013) may be a viable
candidate.
3. Matching weights definition
Li & Greene. A weighting analogue to pair matching in propensity score
analysis. Int J Biostat 2013;9:215-234.
\[
MW_i = \frac{\min(e_i,\, 1 - e_i)}{Z_i e_i + (1 - Z_i)(1 - e_i)}
\]
where $e_i$ is the propensity score and $Z_i$ is the binary treatment indicator
Advantages
Asymptotic equivalence of estimand to 1:1 matching
Efficiency gain
No tuning parameters (no algorithm, caliper scale or width)
Weights lie in (0, 1], unlike non-stabilized IPTW weights, which lie in (1, ∞)
Disadvantages
Potential for common support violation
5. Comparing weighting methods
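\[
IPTW_i = \frac{1}{Z_i e_i + (1 - Z_i)(1 - e_i)} =
\begin{cases}
\dfrac{1}{e_i} & \text{for } Z_i = 1 \\[1ex]
\dfrac{1}{1 - e_i} & \text{for } Z_i = 0
\end{cases}
\]
\[
ATTW_i = \frac{e_i}{Z_i e_i + (1 - Z_i)(1 - e_i)} =
\begin{cases}
1 & \text{for } Z_i = 1 \\[1ex]
\dfrac{e_i}{1 - e_i} & \text{for } Z_i = 0
\end{cases}
\]
\[
ATUW_i = \frac{1 - e_i}{Z_i e_i + (1 - Z_i)(1 - e_i)} =
\begin{cases}
\dfrac{1 - e_i}{e_i} & \text{for } Z_i = 1 \\[1ex]
1 & \text{for } Z_i = 0
\end{cases}
\]
\[
MW_i = \frac{\min(e_i,\, 1 - e_i)}{Z_i e_i + (1 - Z_i)(1 - e_i)} =
\begin{cases}
ATTW_i & \text{for } e_i \le 0.5 \\
ATUW_i & \text{for } e_i > 0.5
\end{cases}
\]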
The denominator (IPTW) balances covariates, and the numerator
reshapes to the target population.
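The shared-denominator structure of the four weights can be illustrated with a minimal Python sketch. The function name `binary_weights` and the example propensity score of 0.2 are illustrative assumptions, not part of the study code.

```python
def binary_weights(e, z):
    """Return IPTW, ATT, ATU, and matching weights for one subject.

    e: propensity score P(Z = 1 | X), strictly between 0 and 1.
    z: binary treatment indicator (1 = treated, 0 = untreated).
    """
    if not 0.0 < e < 1.0:
        raise ValueError("propensity score must lie strictly in (0, 1)")
    # Shared denominator: probability of the treatment actually received.
    denom = e if z == 1 else 1.0 - e
    return {
        "IPTW": 1.0 / denom,              # numerator 1: whole population
        "ATTW": e / denom,                # numerator e: treated population
        "ATUW": (1.0 - e) / denom,        # numerator 1 - e: untreated population
        "MW": min(e, 1.0 - e) / denom,    # numerator min(e, 1 - e): matched-like population
    }


# Hypothetical subject with propensity score 0.2, once as treated and once as untreated.
w_treated = binary_weights(0.2, 1)
w_untreated = binary_weights(0.2, 0)
```

Because the example propensity score is at most 0.5, the matching weight coincides with the ATT weight in both calls, as the piecewise form of MW states.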
6. Extension of MW to K groups
Define a propensity score (eki ) for each treatment category
(k ∈ {1, 2}) and redefine the treatment variable as Zi ∈ {1, 2}.
\[
MW_i = \frac{\min(e_{1i},\, e_{2i})}{\sum_{k=1}^{2} I(Z_i = k)\, e_{ki}}
= \frac{\text{Smallest PS}}{\text{PS of assigned treatment}}
\]
Use multinomial logistic regression for PS model
Each subject has K propensity scores {e1i , e2i , ..., eKi }
K propensity scores sum to 1
Generalize the weights as
\[
MW_i = \frac{\min(e_{1i}, \ldots, e_{Ki})}{\sum_{k=1}^{K} I(Z_i = k)\, e_{ki}}
\]
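As a minimal sketch of the generalized weight, the following Python function takes one subject's K propensity scores and assigned group and returns the smallest PS divided by the PS of the assigned treatment. The function name `matching_weight` and the example scores are illustrative assumptions; in practice the scores would come from a fitted multinomial logistic regression.

```python
def matching_weight(ps, z):
    """Matching weight for one subject in the K-group setting.

    ps: list of K propensity scores (e_1i, ..., e_Ki), each in (0, 1), summing to 1.
    z:  assigned treatment group, coded 1..K as on the slide.
    """
    if any(p <= 0.0 or p >= 1.0 for p in ps):
        raise ValueError("each propensity score must lie strictly in (0, 1)")
    if abs(sum(ps) - 1.0) > 1e-8:
        raise ValueError("propensity scores must sum to 1")
    if not 1 <= z <= len(ps):
        raise ValueError("z must be an integer between 1 and K")
    # Numerator: smallest PS. Denominator: PS of the assigned treatment.
    return min(ps) / ps[z - 1]


# Hypothetical subject with three propensity scores.
example_ps = [0.2, 0.3, 0.5]
example_weights = [matching_weight(example_ps, z) for z in (1, 2, 3)]
```

A subject assigned to the treatment with the smallest propensity score receives a weight of 1, and any other assignment yields a weight below 1.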
7. Simulation study
[Diagram: covariates Xi, treatment Ti, and outcome Yi, annotated with the model parameters below.]
Treatment model (Xi to Ti)
α10, α20 (intercepts) for treatment prevalence
α1X, α2X (covariate association) for covariate overlap level
Outcome model (Ti to Yi)
βT1, βT2 (main effects) for treatment effects
βXT1, βXT2 (interactions) for additional treatment effects in subset
Outcome model (Xi to Yi)
β0 (intercept) for baseline risk of disease
βX (covariate association) for strength of risk factors
Exposure distribution: {(33 : 33 : 33), (10 : 45 : 45), (10 : 10 : 80)}
Levels of covariate overlap: small, substantial
Baseline risk of disease: {0.05, 0.20}
Presence of treatment effect: absent, present
Presence of treatment effect heterogeneity: absent, present
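The factors above can be enumerated as a scenario grid. This Python sketch assumes the five factors are fully crossed, which gives 3 × 2 × 2 × 2 × 2 = 48 scenarios; the variable names are illustrative and not taken from the simulation code.

```python
from itertools import product

# Factor levels as listed on the slide.
exposure_distributions = [(33, 33, 33), (10, 45, 45), (10, 10, 80)]
covariate_overlap = ["small", "substantial"]
baseline_risk = [0.05, 0.20]
treatment_effect = ["absent", "present"]
effect_heterogeneity = ["absent", "present"]

# Assumption: every combination of factor levels is one scenario.
scenarios = [
    {
        "exposure": expo,
        "overlap": overlap,
        "baseline_risk": risk,
        "treatment_effect": effect,
        "heterogeneity": hetero,
    }
    for expo, overlap, risk, effect, hetero in product(
        exposure_distributions,
        covariate_overlap,
        baseline_risk,
        treatment_effect,
        effect_heterogeneity,
    )
]
```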
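8. Sample Sizes
[Figure: matched or weighted sample sizes (y-axis, 0 to 6000) by method (U = unmatched, M = matched, Mw = matching weights, Ip = IPTW), in two panels (good overlap, poor overlap), with separate series for exposure distributions (pExpo) 33:33:33, 10:45:45, and 10:10:80.]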
9. Average Standardized Mean Differences
[Figure: average standardized mean differences (y-axis, 0.0 to 0.5) by method (U, M, Mw, Ip) for covariates X1, X4, and X7 (columns) under good and poor overlap (rows), with separate series for pExpo 33:33:33, 10:45:45, and 10:10:80.]
10. Bias (Estimated Risk Ratio / True Risk Ratio)
[Figure: bias (y-axis, 0.75 to 3.00) by method (U, M, Mw, Ip). Columns: modification (−) and modification (+), each for contrasts 1v0, 2v0, and 2v1. Rows: null and non-null main effects, each under good and poor overlap. Series: pExpo 33:33:33, 10:45:45, 10:10:80; pDis 0.05, 0.2.]
12. Simulation: Summary results
Comparing matching weights to three-way matching and IPTW, we
found:
Similar sample sizes for MW and matching, but not IPTW
Best covariate balance with MW
Similarly small bias compared to matching
Smaller MSE compared to matching in all scenarios
More robust to rare events, unequally sized groups, and poor
covariate overlap
13. Conclusion
MW has been suggested as a more efficient alternative to 1:1
pairwise matching with a similar estimand (Li & Greene 2013).
In the three treatment group setting, MW demonstrated similar bias,
but smaller MSE compared to 1:1:1 three-way matching in a
simulation study.
Efficiency gain compared to 1:1:1 three-way matching was more
noticeable in scenarios in which the outcome events were rare,
treatment groups were unequally sized, or covariate overlap was
poor.
Compared to IPTW, MW was more stable in the poor covariate
overlap setting.
14. Acknowledgment
KY currently receives tuition support jointly from Japan Student
Services Organization (JASSO) and Harvard T. H. Chan School of
Public Health (partially supported by training grants from Pfizer,
Takeda, Bayer and PhRMA).
17. Simulation: Covariate generating model
Based on Franklin et al. Metrics for covariate balance in cohort
studies of causal effects. Stat Med. 2014;33:1685.
Variable   Generation process
X1i        Normal(0, 1^2)
X2i        Log-Normal(0, 0.5^2)
X3i        Normal(0, 10^2)
X4i        Bernoulli(pi = e^(2 X1i) / (1 + e^(2 X1i))) where E[pi] = 0.5
X5i        Bernoulli(p = 0.2)
X6i        Multinomial(p = (0.5, 0.3, 0.1, 0.05, 0.05)^T)
X7i        sin(X1i)
X8i        X2i^2
X9i        X3i × X4i
X10i       X4i × X5i
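The generation process in the table can be sketched in Python with the standard library. Several details are assumptions: the second argument of each Normal and Log-Normal is read as a variance (so the standard deviations are 1, 0.5, and 10), X6 is coded as a single category label from 1 to 5, and the function name `generate_covariates` is illustrative.

```python
import math
import random


def generate_covariates(rng):
    """Draw one subject's ten covariates following the table above."""
    x1 = rng.gauss(0.0, 1.0)             # Normal(0, 1^2)
    x2 = rng.lognormvariate(0.0, 0.5)    # Log-Normal(0, 0.5^2) on the log scale
    x3 = rng.gauss(0.0, 10.0)            # Normal(0, 10^2)
    p4 = math.exp(2.0 * x1) / (1.0 + math.exp(2.0 * x1))
    x4 = 1 if rng.random() < p4 else 0   # Bernoulli(p_i), depends on X1
    x5 = 1 if rng.random() < 0.2 else 0  # Bernoulli(0.2)
    # Assumption: X6 is stored as one category label in 1..5.
    x6 = rng.choices([1, 2, 3, 4, 5], weights=[0.5, 0.3, 0.1, 0.05, 0.05])[0]
    x7 = math.sin(x1)                    # nonlinear function of X1
    x8 = x2 ** 2                         # square of X2
    x9 = x3 * x4                         # product of X3 and X4
    x10 = x4 * x5                        # product of X4 and X5
    return [x1, x2, x3, x4, x5, x6, x7, x8, x9, x10]


rng = random.Random(2016)  # arbitrary seed for reproducibility
cohort = [generate_covariates(rng) for _ in range(1000)]
```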
18. Simulation: Treatment generating model
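\[
\eta_{T1i} = \log \frac{P(T_i = 1 \mid X_i = x_i)}{P(T_i = 0 \mid X_i = x_i)} = \alpha_{10} + \alpha_{1X}^{T} x_i
\]
\[
\eta_{T2i} = \log \frac{P(T_i = 2 \mid X_i = x_i)}{P(T_i = 0 \mid X_i = x_i)} = \alpha_{20} + \alpha_{2X}^{T} x_i
\]
where
α10, α20 determine treatment prevalence
α1X, α2X determine covariate-treatment association
\[
e_{0i} = P(T_i = 0 \mid X_i = x_i) = \frac{1}{q_i}, \qquad
e_{1i} = P(T_i = 1 \mid X_i = x_i) = \frac{\exp(\eta_{T1i})}{q_i}, \qquad
e_{2i} = P(T_i = 2 \mid X_i = x_i) = \frac{\exp(\eta_{T2i})}{q_i}
\]
where $q_i = 1 + \exp(\eta_{T1i}) + \exp(\eta_{T2i})$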
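The mapping from linear predictors to the three true propensity scores, followed by a treatment draw, can be sketched as below. The function names and the coefficient values in the example are hypothetical and chosen only so the result is easy to check by hand.

```python
import math
import random


def propensity_scores(x, alpha10, alpha1x, alpha20, alpha2x):
    """Return [e0, e1, e2] for covariate vector x under the multinomial logit model."""
    eta1 = alpha10 + sum(a * xi for a, xi in zip(alpha1x, x))
    eta2 = alpha20 + sum(a * xi for a, xi in zip(alpha2x, x))
    q = 1.0 + math.exp(eta1) + math.exp(eta2)  # normalizing constant q_i
    return [1.0 / q, math.exp(eta1) / q, math.exp(eta2) / q]


def draw_treatment(ps, rng):
    """Draw T in {0, 1, 2} with probabilities given by the propensity scores."""
    return rng.choices([0, 1, 2], weights=ps)[0]


# Hypothetical coefficients: no covariate association, treatment 1 twice as likely as 0 or 2.
x_example = [0.5, -1.0]
ps_example = propensity_scores(x_example, math.log(2.0), [0.0, 0.0], 0.0, [0.0, 0.0])
t_example = draw_treatment(ps_example, random.Random(1))
```

With these coefficients q is 4, so the scores are 0.25, 0.5, and 0.25; nonzero slopes would make them vary with the covariates and so reduce covariate overlap.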
19. Simulation: Outcome generating model
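\[
\begin{aligned}
\eta_{Yi} &= \log P(Y_i = 1 \mid T_i = t_i, X_i = x_i) \\
&= \beta_0 + \beta_X^{T} x_i + \beta_{T1} I(t_i = 1) + \beta_{T2} I(t_i = 2) \\
&\quad + \beta_{XT1}\, x_{4i}\, I(t_i = 1) + \beta_{XT2}\, x_{4i}\, I(t_i = 2)
\end{aligned}
\]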
where
β0 = Intercept determining baseline disease risk
βX = Effects of ten covariates (risk factors) on disease risk
βT1 = Main effect of Treatment 1 compared to Treatment 0
βT2 = Main effect of Treatment 2 compared to Treatment 0
βXT1 = Additional effect for Treatment 1 vs 0 among X4i = 1
βXT2 = Additional effect for Treatment 2 vs 0 among X4i = 1
Counterfactual disease probabilities
P(Yi = 1|Ti = 0, Xi = xi )
P(Yi = 1|Ti = 1, Xi = xi )
P(Yi = 1|Ti = 2, Xi = xi )
Yi ∼ Bernoulli (pYi = P(Yi = 1|Ti = ti , Xi = xi ))
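A minimal Python sketch of the log-link outcome model follows. The coefficient values are hypothetical, and capping the probability at 1 is an added safeguard (a log link does not by itself keep probabilities at or below 1), not something stated on the slide.

```python
import math
import random


def outcome_probability(x, t, beta0, beta_x, beta_t1, beta_t2, beta_xt1, beta_xt2):
    """P(Y = 1 | T = t, X = x) under the log-linear model; x[3] holds X4."""
    x4 = x[3]
    eta = beta0 + sum(b * xi for b, xi in zip(beta_x, x))
    if t == 1:
        eta += beta_t1 + beta_xt1 * x4   # main effect plus extra effect when X4 = 1
    elif t == 2:
        eta += beta_t2 + beta_xt2 * x4
    # Assumption: cap at 1 so the Bernoulli draw is always valid.
    return min(1.0, math.exp(eta))


# Hypothetical parameters: baseline risk 0.05, no covariate effects,
# risk ratio 0.5 for treatment 1, doubled among X4 = 1.
beta_x = [0.0] * 10
params = dict(beta0=math.log(0.05), beta_x=beta_x,
              beta_t1=math.log(0.5), beta_t2=0.0,
              beta_xt1=math.log(2.0), beta_xt2=0.0)

x_no_x4 = [0.0] * 10
x_with_x4 = [0.0, 0.0, 0.0, 1.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0]

# Counterfactual disease probabilities for one subject under each treatment.
counterfactuals = [outcome_probability(x_no_x4, t, **params) for t in (0, 1, 2)]
p_modified = outcome_probability(x_with_x4, 1, **params)

# Observed outcome drawn from the probability under the assigned treatment (here t = 1).
rng = random.Random(7)
y = 1 if rng.random() < counterfactuals[1] else 0
```

Ratios of the counterfactual probabilities give the subject-level risk ratios, for example `counterfactuals[1] / counterfactuals[0]` for treatment 1 versus 0.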
20. Simulation: Analyses
All computation except 3-way matching was performed in R
Multinomial logistic regression with all covariates as the propensity
score model.
Matched analyses:
Three-way nearest neighbor algorithm implemented in the
pharmacoepi toolbox by Rassen et al generated matched “trios”.
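Caliper for the matched trio triangle perimeter was defined as
\[
0.6 \sqrt{\frac{\tau_1^2 + \tau_2^2 + \tau_3^2}{3}}
\quad \text{where} \quad
\tau_j^2 = \frac{\mathrm{Var}(e_1 \mid T = j) + \mathrm{Var}(e_2 \mid T = j)}{2}.
\]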
OLS linear regression was conducted in the matched dataset.
Weighted analysis:
MW and stabilized IPTW
survey package was used to appropriately account for weighting in
the outcome log linear model.
21. Simulation: Assessment metrics
Matched/weighted sample size
Covariate standardized mean difference (SMD) averaged across
three contrasts
Bias in risk ratio
Simulation and estimated variance of estimators
Mean squared error of estimators
False positive rates in null scenarios
Coverage probability of estimated confidence intervals
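Three of the metrics above can be sketched in Python. The definitions used here are assumptions rather than the study's exact formulas: the weighted sample size is taken as the sum of weights in each group, each pairwise SMD divides the absolute difference in weighted means by the square root of the average of the two weighted variances, and the bias is the ratio of estimated to true risk ratio. All function names are illustrative.

```python
import math
from itertools import combinations


def weighted_mean_var(values, weights):
    """Weighted mean and weighted (population-style) variance."""
    total = sum(weights)
    mean = sum(w * v for w, v in zip(weights, values)) / total
    var = sum(w * (v - mean) ** 2 for w, v in zip(weights, values)) / total
    return mean, var


def weighted_sample_sizes(groups, weights):
    """Sum of weights within each treatment group."""
    sizes = {}
    for g, w in zip(groups, weights):
        sizes[g] = sizes.get(g, 0.0) + w
    return sizes


def average_smd(values, groups, weights):
    """Absolute SMD for one covariate, averaged over all pairwise contrasts."""
    stats = {}
    for g in sorted(set(groups)):
        idx = [i for i, gi in enumerate(groups) if gi == g]
        stats[g] = weighted_mean_var([values[i] for i in idx],
                                     [weights[i] for i in idx])
    smds = []
    for a, b in combinations(sorted(stats), 2):
        (mean_a, var_a), (mean_b, var_b) = stats[a], stats[b]
        pooled_sd = math.sqrt((var_a + var_b) / 2.0)  # assumes nonzero variance
        smds.append(abs(mean_a - mean_b) / pooled_sd)
    return sum(smds) / len(smds)


def bias_ratio(estimated_rr, true_rr):
    """Bias on the ratio scale: estimated risk ratio over true risk ratio."""
    return estimated_rr / true_rr


# Toy data: three groups of two subjects each, unit weights.
toy_values = [0.0, 2.0, 1.0, 3.0, 2.0, 4.0]
toy_groups = [0, 0, 1, 1, 2, 2]
toy_weights = [1.0] * 6
toy_smd = average_smd(toy_values, toy_groups, toy_weights)
toy_sizes = weighted_sample_sizes(toy_groups, toy_weights)
```

Passing matching weights or IPTW weights as `weights` gives the weighted versions; unit weights reproduce the unweighted comparison.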
22. True Risk Ratios (Estimands)
[Figure: true risk ratios (y-axis, 0.40 to 1.00) by method (U, M, Mw, Ip). Columns: modification (−) and modification (+), each for contrasts 1v0, 2v0, and 2v1. Rows: null and non-null main effects, each under good and poor overlap. Series: pExpo 33:33:33, 10:45:45, 10:10:80.]
28. Empirical example: Methods
Solomon et al. Arch Intern Med 2010;170:1968.
Medicare Beneficiary dataset from PA and NJ (1999-2005)
Groups: Opioids (12,601) vs COX2 inhibitors (6,172) vs nsNSAIDs
(4,874) new users
Outcomes: Death (794), fractures (706), GI bleed (230), and
cardiovascular events (1,204)
Confounders: 35 pre-treatment variables including 5 continuous
PS model: Quadratic terms for continuous variables; no interaction
Analyzed using MW and three-way matching to see agreement
MW sample size 4,618.7-4,635.71 per group; matched sample size
4,611 per group; stabilized IPTW sample size 4,926.6-12,585.0
Best balance was achieved by MW in 24 covariates, by matching in
6, and by IPTW in 5.
29. Empirical example: Covariate balance
[Figure: absolute standardized mean difference (x-axis, 0.00 to 0.20) for each of the 35 covariates (y-axis), by method: Unmatched, Matched, MW, IPTW.]
Covariates shown: Hepatic disease, Gender, ARB use, Alzheimer disease, Osteoporosis, Hypertension, Parkinson's disease, Thiazide use, Bone mineral density test, ACE inhibitor use, Antiepileptic use, Diabetes, Upper gastrointestinal disease, Hyperlipidemia, Gout, Benzodiazepine use, Beta blocker use, Back pain, SSRI use, Angina, H2 blocker use, Corticosteroid use, Falls, PPI use, Stroke, Myocardial infarction, No. physician visits, Age, Loop diuretic use, Fracture, White race, No. days in hospital, No. prescription drugs, Antithrombotic use, Charlson score.
31. Outline of proof that estimands are equivalent
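A complete common support and exact propensity score matching are
assumed. $S_k$ is the set of matched individuals in treatment group $k$.
$W_i$ is the matching weight
\[
W_i = \frac{\min(e_{1i}, \ldots, e_{Ki})}{\sum_{k=1}^{K} I(Z_i = k)\, e_{ki}}.
\]
The estimators for the group mean have the same estimand.
Matching
\[
\begin{aligned}
\frac{\frac{1}{n} \sum_{i=1}^{n} Y_i\, I(i \in S_k)}{\frac{1}{n} \sum_{i=1}^{n} I(i \in S_k)}
&= \frac{\frac{1}{n} \sum_{i=1}^{n} Y_{ki}\, I(i \in S_k)}{\frac{1}{n} \sum_{i=1}^{n} I(i \in S_k)} \\
&\to \frac{E[Y_{ki}\, I(i \in S_k)]}{E[I(i \in S_k)]} \\
&\;\;\vdots \\
&= \frac{E\left[E[Y_{ki} \mid X_i]\, \min(e_{1i}, \ldots, e_{Ki})\right]}{E\left[\min(e_{1i}, \ldots, e_{Ki})\right]}
\end{aligned}
\]
Weighting
\[
\begin{aligned}
\frac{\frac{1}{n} \sum_{i=1}^{n} Y_i\, I(Z_i = k)\, W_i}{\frac{1}{n} \sum_{i=1}^{n} I(Z_i = k)\, W_i}
&= \frac{\frac{1}{n} \sum_{i=1}^{n} Y_{ki}\, I(Z_i = k)\, W_i}{\frac{1}{n} \sum_{i=1}^{n} I(Z_i = k)\, W_i} \\
&\to \frac{E\left[\sum_{i=1}^{n} Y_{ki}\, I(Z_i = k)\, W_i\right]}{E\left[\sum_{i=1}^{n} I(Z_i = k)\, W_i\right]} \\
&\;\;\vdots \\
&= \frac{E\left[E[Y_{ki} \mid X_i]\, \min(e_{1i}, \ldots, e_{Ki})\right]}{E\left[\min(e_{1i}, \ldots, e_{Ki})\right]}
\end{aligned}
\]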