Matching Weights
to Simultaneously Compare Three Treatment Groups:
a Simulation Study
Kazuki Yoshida, MD, MPH, MS*,
Sonia Hernández-Díaz, MD, DrPH, Daniel H. Solomon, MD, MPH,
John W. Jackson, ScD, Joshua J. Gagne, PharmD, ScD,
Robert Glynn, PhD, Jessica M. Franklin, PhD
*Joint Doctor of Science Student
Departments of Epidemiology & Biostatistics
Harvard T.H. Chan School of Public Health
677 Huntington Ave, Boston, MA 02115, USA
Last updated on June 22, 2016
Motivation
Propensity score (PS) matching (Rosenbaum & Rubin 1983) is a
well-established method that is widely used in the two-group setting.
In clinical practice, however, there are often 3+ comparator drugs to
be compared, e.g., antirheumatic drugs for rheumatoid arthritis.
For non-binary treatment, generalized propensity score (Imbens
2000) has been proposed, but its use has been limited.
Recently developed software (Rassen et al 2013) allows 3-way
simultaneous matching on generalized PS, but further generalization
is complicated.
Question: Is there an alternative that is similar to PS matching, but
more easily generalizes to 3+ groups?
Hypothesis: Matching weights (Li & Greene 2013) may be a viable
candidate.
Matching weights definition
Li & Greene. A weighting analogue to pair matching in propensity score
analysis. Int J Biostat 2013;9:215-234.
\[
MW_i = \frac{\min(e_i,\, 1 - e_i)}{Z_i e_i + (1 - Z_i)(1 - e_i)}
\]
where $e_i$ is the propensity score and $Z_i$ is the binary treatment indicator
Advantages
Asymptotic equivalence of estimand to 1:1 matching
Efficiency gain
No tuning parameters (no algorithm, caliper scale or width)
Range (0,1) unlike non-stabilized IPTW (1,∞)
Disadvantages
Potential for common support violation
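As a minimal sketch of the definition above, the two-group matching weight can be computed per subject as follows. The function name `matching_weight` and the example propensity scores are illustrative assumptions, not taken from Li & Greene.

```python
def matching_weight(e: float, z: int) -> float:
    """Two-group matching weight for one subject.

    e: propensity score P(Z = 1 | X), strictly between 0 and 1
    z: binary treatment indicator (1 = treated, 0 = untreated)
    """
    if not 0.0 < e < 1.0:
        raise ValueError("propensity score must lie strictly between 0 and 1")
    if z not in (0, 1):
        raise ValueError("z must be 0 or 1")
    numerator = min(e, 1.0 - e)
    # The denominator is the PS of the treatment actually received.
    denominator = z * e + (1 - z) * (1.0 - e)
    return numerator / denominator


# Hypothetical (e, z) pairs for illustration only.
examples = [(0.2, 1), (0.2, 0), (0.8, 1), (0.8, 0)]
weights = [matching_weight(e, z) for e, z in examples]
```

In this sketch the subject whose received treatment is the less likely one gets weight 1, and the other gets a weight below 1, so no weight exceeds 1.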
PS methods visualized (common treatment)
[Figure: PS frequency distributions by treatment group (Treated, Untreated) in six panels: Original, IPTW, Matching, ATTW, ATUW, MW. x-axis: PS (0.0 to 1.0); y-axis: Frequency.]
Comparing weighting methods
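\[
IPTW_i = \frac{1}{Z_i e_i + (1 - Z_i)(1 - e_i)}
= \begin{cases}
\dfrac{1}{e_i} & \text{for } Z_i = 1 \\[1ex]
\dfrac{1}{1 - e_i} & \text{for } Z_i = 0
\end{cases}
\]
\[
ATTW_i = \frac{e_i}{Z_i e_i + (1 - Z_i)(1 - e_i)}
= \begin{cases}
1 & \text{for } Z_i = 1 \\[1ex]
\dfrac{e_i}{1 - e_i} & \text{for } Z_i = 0
\end{cases}
\]
\[
ATUW_i = \frac{1 - e_i}{Z_i e_i + (1 - Z_i)(1 - e_i)}
= \begin{cases}
\dfrac{1 - e_i}{e_i} & \text{for } Z_i = 1 \\[1ex]
1 & \text{for } Z_i = 0
\end{cases}
\]
\[
MW_i = \frac{\min(e_i,\, 1 - e_i)}{Z_i e_i + (1 - Z_i)(1 - e_i)}
= \begin{cases}
ATTW_i & \text{for } e_i \le 0.5 \\
ATUW_i & \text{for } e_i > 0.5
\end{cases}
\]
The denominator (IPTW) balances covariates, and the numerator
reshapes to the target population.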
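The relationship among the four weights can be checked numerically. The sketch below computes all four for a single subject; `ps_weights` and the example values are hypothetical names and inputs chosen for illustration.

```python
def ps_weights(e: float, z: int) -> dict:
    """Return IPTW, ATTW, ATUW and MW for one subject.

    e: propensity score P(Z = 1 | X); z: 1 = treated, 0 = untreated.
    All four weights share the IPTW denominator and differ only in the numerator.
    """
    denom = z * e + (1 - z) * (1.0 - e)
    return {
        "IPTW": 1.0 / denom,
        "ATTW": e / denom,
        "ATUW": (1.0 - e) / denom,
        "MW": min(e, 1.0 - e) / denom,
    }


low = ps_weights(0.3, 0)   # e <= 0.5: MW coincides with ATTW
high = ps_weights(0.7, 1)  # e > 0.5: MW coincides with ATUW
```

Writing the weights with a shared denominator makes the slide's point concrete: only the numerator changes the target population.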
Extension of MW to K groups
Define a propensity score ($e_{ki}$) for each treatment category
($k \in \{1, 2\}$) and redefine the treatment variable as $Z_i \in \{1, 2\}$.
\[
MW_i = \frac{\min(e_{1i},\, e_{2i})}{\sum_{k=1}^{2} I(Z_i = k)\, e_{ki}}
= \frac{\text{Smallest PS}}{\text{PS of assigned treatment}}
\]
Use multinomial logistic regression for PS model
Each subject has K propensity scores $\{e_{1i}, e_{2i}, \ldots, e_{Ki}\}$
K propensity scores sum to 1
Generalize the weights as
\[
MW_i = \frac{\min(e_{1i}, \ldots, e_{Ki})}{\sum_{k=1}^{K} I(Z_i = k)\, e_{ki}}
\]
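The K-group weight is the smallest PS divided by the PS of the assigned treatment, which can be sketched as below. The function name `matching_weight_k` and the example PS vector are assumptions for illustration; in practice the scores would come from a fitted multinomial logistic regression.

```python
from typing import Sequence


def matching_weight_k(ps: Sequence[float], z: int) -> float:
    """K-group matching weight for one subject.

    ps: propensity scores (e_1, ..., e_K), all positive and summing to 1
    z:  assigned treatment, coded 1..K
    """
    if any(p <= 0.0 for p in ps) or abs(sum(ps) - 1.0) > 1e-8:
        raise ValueError("propensity scores must be positive and sum to 1")
    if not 1 <= z <= len(ps):
        raise ValueError("z must be an integer between 1 and K")
    # Smallest PS divided by the PS of the treatment actually received.
    return min(ps) / ps[z - 1]


# Hypothetical subject with three propensity scores.
ps = [0.2, 0.3, 0.5]
weights = [matching_weight_k(ps, z) for z in (1, 2, 3)]
```

With K = 2 and scores (1 - e, e), the same function reproduces the two-group matching weight.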
Simulation study
[Diagram relating covariates Xi, treatment Ti, and outcome Yi, annotated with the model parameters below]
Outcome model
βT1, βT2 (main effects): treatment effects
βXT1, βXT2 (interactions): additional treatment effects in subset
Treatment model
α10, α20 (intercepts): treatment prevalence
α1X, α2X (covariate association): covariate overlap level
Outcome model
β0 (intercept): baseline risk of disease
βX (covariate association): strength of risk factors
Exposure distribution: {(33 : 33 : 33), (10 : 45 : 45), (10 : 10 : 80)}
Levels of covariate overlap: small, substantial
Baseline risk of disease: {0.05, 0.20}
Presence of treatment effect: absent, present
Presence of treatment effect heterogeneity: absent, present
[Figure: Sample Sizes. Panels: Good overlap, Poor overlap. x-axis: method (U, M, Mw, Ip); y-axis: 0 to 6000; legend: pExpo (33:33:33, 10:45:45, 10:10:80).]
[Figure: Average Standardized Mean Differences. Columns: X1, X4, X7; rows: Good overlap, Poor overlap. x-axis: method (U, M, Mw, Ip); y-axis: 0.0 to 0.5; legend: pExpo (33:33:33, 10:45:45, 10:10:80).]
[Figure: Bias (Estimated Risk Ratio / True Risk Ratio). Columns: Modification (−) and Modification (+), each for contrasts 1v0, 2v0, 2v1; rows: Null or Non-null main effects by Good or Poor overlap. x-axis: method (U, M, Mw, Ip); y-axis: 0.75 to 3.00; legends: pExpo (33:33:33, 10:45:45, 10:10:80), pDis (0.05, 0.2).]
[Figure: Mean Squared Error. Columns: Modification (−) and Modification (+), each for contrasts 1v0, 2v0, 2v1; rows: Null or Non-null main effects by Good or Poor overlap. x-axis: method (U, M, Mw, Ip); y-axis: 0.0 to 1.5; legends: pExpo (33:33:33, 10:45:45, 10:10:80), pDis (0.05, 0.2).]
Simulation: Summary results
Comparing matching weights to three-way matching and IPTW, we
found:
Similar sample sizes for MW and matching, but not IPTW
Best covariate balance
Similarly small bias compared to matching
Smaller MSE compared to matching in all scenarios
More robust to rare events, unequally sized groups, and poor
covariate overlap
Conclusion
MW has been suggested as a more efficient alternative to 1:1
pairwise matching with a similar estimand (Li & Greene 2013).
In the three treatment group setting, MW demonstrated similar bias,
but smaller MSE compared to 1:1:1 three-way matching in a
simulation study.
Efficiency gain compared to 1:1:1 three-way matching was more
noticeable in scenarios in which the outcome events were rare,
treatment groups were unequally sized, or covariate overlap was
poor.
Compared to IPTW, MW was more stable in the poor covariate
overlap setting.
Acknowledgment
KY currently receives tuition support jointly from Japan Student
Services Organization (JASSO) and Harvard T. H. Chan School of
Public Health (partially supported by training grants from Pfizer,
Takeda, Bayer and PhRMA).
Additional slides with details follow
PS methods visualized (rare treatment)
[Figure: PS frequency distributions by treatment group (Treated, Untreated) in six panels: Original, IPTW, Matching, ATTW, ATUW, MW. x-axis: PS (0.0 to 1.0); y-axis: Frequency.]
Simulation: Covariate generating model
Based on Franklin et al. Metrics for covariate balance in cohort
studies of causal effects. Stat Med. 2014;33:1685.
Variable   Generation Process
X1i        Normal(0, 1^2)
X2i        Log-Normal(0, 0.5^2)
X3i        Normal(0, 10^2)
X4i        Bernoulli(p_i = e^{2 X1i} / (1 + e^{2 X1i})), where E[p_i] = 0.5
X5i        Bernoulli(p = 0.2)
X6i        Multinomial(p = (0.5, 0.3, 0.1, 0.05, 0.05)^T)
X7i        sin(X1i)
X8i        X2i^2
X9i        X3i × X4i
X10i       X4i × X5i
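The generation process in the table can be sketched with the Python standard library as below. This is not the authors' R code. The function name `draw_covariates`, the 1 to 5 coding of X6, and the reading of the second distribution parameter as a variance (so standard deviations 1, 0.5 and 10) are assumptions.

```python
import math
import random


def draw_covariates(rng: random.Random) -> dict:
    """Draw one subject's covariates X1..X10 following the table above."""
    x1 = rng.gauss(0.0, 1.0)              # Normal(0, 1^2)
    x2 = rng.lognormvariate(0.0, 0.5)     # Log-Normal(0, 0.5^2)
    x3 = rng.gauss(0.0, 10.0)             # Normal(0, 10^2)
    p4 = 1.0 / (1.0 + math.exp(-2.0 * x1))  # e^{2 X1} / (1 + e^{2 X1})
    x4 = 1 if rng.random() < p4 else 0
    x5 = 1 if rng.random() < 0.2 else 0
    # Five-category variable; the 1..5 coding is an assumption.
    x6 = rng.choices([1, 2, 3, 4, 5], weights=[0.5, 0.3, 0.1, 0.05, 0.05])[0]
    return {
        "X1": x1, "X2": x2, "X3": x3, "X4": x4, "X5": x5, "X6": x6,
        "X7": math.sin(x1),   # derived from X1
        "X8": x2 ** 2,        # derived from X2
        "X9": x3 * x4,        # X3 by X4 product
        "X10": x4 * x5,       # X4 by X5 product
    }


rng = random.Random(2016)  # arbitrary seed for reproducibility
rows = [draw_covariates(rng) for _ in range(200)]
```

X7 to X10 are deterministic functions of X1 to X5, so only six random draws are needed per subject.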
Simulation: Treatment generating model
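\[
\eta_{T1i} = \log\frac{P(T_i = 1 \mid X_i = x_i)}{P(T_i = 0 \mid X_i = x_i)} = \alpha_{10} + \alpha_{1X}^{T} x_i
\]
\[
\eta_{T2i} = \log\frac{P(T_i = 2 \mid X_i = x_i)}{P(T_i = 0 \mid X_i = x_i)} = \alpha_{20} + \alpha_{2X}^{T} x_i
\]
where
$\alpha_{10}, \alpha_{20}$ determine treatment prevalence
$\alpha_{1X}, \alpha_{2X}$ determine covariate-treatment association
\[
e_{0i} = P(T_i = 0 \mid X_i = x_i) = \frac{1}{q_i}, \quad
e_{1i} = P(T_i = 1 \mid X_i = x_i) = \frac{\exp(\eta_{T1i})}{q_i}, \quad
e_{2i} = P(T_i = 2 \mid X_i = x_i) = \frac{\exp(\eta_{T2i})}{q_i}
\]
where $q_i = 1 + \exp(\eta_{T1i}) + \exp(\eta_{T2i})$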
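The mapping from the two linear predictors to the three treatment probabilities can be sketched as follows. The function names and all coefficient values are hypothetical, and a two-covariate vector is used only to keep the example short.

```python
import math
from typing import Sequence, Tuple


def linear_predictor(alpha0: float, alpha_x: Sequence[float], x: Sequence[float]) -> float:
    """alpha_0 + alpha_X^T x for one subject."""
    return alpha0 + sum(a * xi for a, xi in zip(alpha_x, x))


def treatment_probabilities(eta1: float, eta2: float) -> Tuple[float, float, float]:
    """Return (e0, e1, e2) with treatment 0 as the reference category."""
    q = 1.0 + math.exp(eta1) + math.exp(eta2)
    return 1.0 / q, math.exp(eta1) / q, math.exp(eta2) / q


# Hypothetical covariates and coefficients for illustration only.
x = [0.5, -1.0]
eta1 = linear_predictor(-0.2, [0.4, 0.1], x)
eta2 = linear_predictor(0.3, [-0.3, 0.2], x)
e0, e1, e2 = treatment_probabilities(eta1, eta2)
```

The three probabilities sum to 1 by construction, and the log ratio of e1 to e0 recovers the first linear predictor.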
Simulation: Outcome generating model
\[
\begin{aligned}
\eta_{Yi} &= \log\big(P(Y_i = 1 \mid T_i = t_i, X_i = x_i)\big) \\
&= \beta_0 + \beta_X^{T} x_i + \beta_{T1} I(t_i = 1) + \beta_{T2} I(t_i = 2)
 + \beta_{XT1}\, x_{4i}\, I(t_i = 1) + \beta_{XT2}\, x_{4i}\, I(t_i = 2)
\end{aligned}
\]
where
$\beta_0$ = Intercept determining baseline disease risk
$\beta_X$ = Effects of ten covariates (risk factors) on disease risk
$\beta_{T1}$ = Main effect of Treatment 1 compared to Treatment 0
$\beta_{T2}$ = Main effect of Treatment 2 compared to Treatment 0
$\beta_{XT1}$ = Additional effect for Treatment 1 vs 0 among $X_{4i} = 1$
$\beta_{XT2}$ = Additional effect for Treatment 2 vs 0 among $X_{4i} = 1$
Counterfactual disease probabilities
$P(Y_i = 1 \mid T_i = 0, X_i = x_i)$
$P(Y_i = 1 \mid T_i = 1, X_i = x_i)$
$P(Y_i = 1 \mid T_i = 2, X_i = x_i)$
$Y_i \sim \text{Bernoulli}(p_{Yi} = P(Y_i = 1 \mid T_i = t_i, X_i = x_i))$
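A minimal sketch of the log-linear outcome model follows, computing the three counterfactual disease probabilities for one subject and drawing the observed outcome. All coefficient values are hypothetical. Setting the covariate effects to zero is a simplification so that the baseline risk equals exp(β0) exactly.

```python
import math
import random
from typing import Sequence


def outcome_probability(x: Sequence[float], t: int, beta0: float,
                        beta_x: Sequence[float], beta_t1: float, beta_t2: float,
                        beta_xt1: float, beta_xt2: float) -> float:
    """P(Y = 1 | T = t, X = x) under a log link; x[3] holds X4."""
    x4 = x[3]
    eta = (beta0
           + sum(b * xi for b, xi in zip(beta_x, x))
           + beta_t1 * (t == 1) + beta_t2 * (t == 2)
           + beta_xt1 * x4 * (t == 1) + beta_xt2 * x4 * (t == 2))
    # With a log link the result is not bounded by 1 for arbitrary coefficients.
    return math.exp(eta)


# Hypothetical subject with X4 = 1 and hypothetical coefficients.
x = [0.0, 1.0, 0.0, 1.0, 0.0, 1.0, 0.0, 1.0, 0.0, 0.0]
params = dict(beta0=math.log(0.05), beta_x=[0.0] * 10,
              beta_t1=math.log(0.8), beta_t2=math.log(0.6),
              beta_xt1=math.log(0.9), beta_xt2=math.log(0.9))
p = [outcome_probability(x, t, **params) for t in (0, 1, 2)]

rng = random.Random(1)
t_observed = 2  # assigned treatment for this hypothetical subject
y = 1 if rng.random() < p[t_observed] else 0
```

Because the link is logarithmic, the subject-level risk ratio for Treatment 1 vs 0 in this sketch is exp(βT1 + βXT1 × X4).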
Simulation: Analyses
All computation except 3-way matching was performed in R.
Multinomial logistic regression with all covariates was used as the
propensity score model.
Matched analyses:
Three-way nearest neighbor algorithm implemented in the
pharmacoepi toolbox by Rassen et al generated matched “trios”.
Caliper for the matched trio triangle perimeter was defined as
\[
0.6 \cdot \frac{\tau_1^2 + \tau_2^2 + \tau_3^2}{3}
\quad \text{where} \quad
\tau_j^2 = \frac{Var(e_1 \mid T = j) + Var(e_2 \mid T = j)}{2}.
\]
OLS linear regression was conducted in the matched dataset.
Weighted analyses:
MW and stabilized IPTW
The survey package was used to appropriately account for weighting in
the outcome log-linear model.
Simulation: Assessment metrics
Matched/weighted sample size
Covariate standardized mean difference (SMD) averaged across
three contrasts
Bias in risk ratio
Simulation and estimated variance of estimators
Mean squared error of estimators
False positive rates in null scenarios
Coverage probability of estimated confidence intervals
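Two of the metrics above, the weighted sample size and the SMD averaged across the three pairwise contrasts, can be sketched as follows. The slides do not state the exact formulas. Taking the weighted sample size as the sum of weights per group, and the SMD denominator as the square root of the average of the two weighted group variances, are assumed conventions. Function names and data are hypothetical.

```python
import math
from itertools import combinations
from typing import Sequence


def weighted_sample_size(group: Sequence[int], w: Sequence[float]) -> dict:
    """Sum of weights within each treatment group."""
    sizes = {}
    for g, wi in zip(group, w):
        sizes[g] = sizes.get(g, 0.0) + wi
    return sizes


def weighted_mean_var(x: Sequence[float], w: Sequence[float]):
    """Weighted mean and (population-style) weighted variance."""
    total = sum(w)
    mean = sum(wi * xi for wi, xi in zip(w, x)) / total
    var = sum(wi * (xi - mean) ** 2 for wi, xi in zip(w, x)) / total
    return mean, var


def average_smd(x: Sequence[float], group: Sequence[int], w: Sequence[float]) -> float:
    """Absolute SMD of one covariate, averaged over all pairwise group contrasts."""
    stats = {}
    for g in sorted(set(group)):
        xs = [xi for xi, gi in zip(x, group) if gi == g]
        ws = [wi for wi, gi in zip(w, group) if gi == g]
        stats[g] = weighted_mean_var(xs, ws)
    smds = []
    for a, b in combinations(sorted(stats), 2):
        pooled_sd = math.sqrt((stats[a][1] + stats[b][1]) / 2.0)
        smds.append(abs(stats[a][0] - stats[b][0]) / pooled_sd)
    return sum(smds) / len(smds)


# Hypothetical data: one covariate, three groups, unit weights.
x = [0.0, 2.0, 1.0, 3.0, 2.0, 4.0]
group = [0, 0, 1, 1, 2, 2]
w = [1.0] * 6
sizes = weighted_sample_size(group, w)
smd = average_smd(x, group, w)
```

Passing matching weights or IPTW in place of the unit weights gives the corresponding weighted versions of both metrics.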
[Figure: True Risk Ratios (Estimands). Columns: Modification (−) and Modification (+), each for contrasts 1v0, 2v0, 2v1; rows: Null or Non-null main effects by Good or Poor overlap. x-axis: method (U, M, Mw, Ip); y-axis: 0.40 to 1.00; legend: pExpo (33:33:33, 10:45:45, 10:10:80).]
[Figure: True Variance. Same panel layout and x-axis; y-axis: 0.00 to 1.00; legends: pExpo (33:33:33, 10:45:45, 10:10:80), pDis (0.05, 0.2).]
[Figure: Mean Estimated Variance. Same panel layout and x-axis; y-axis: 0.00 to 1.00; legends: pExpo (33:33:33, 10:45:45, 10:10:80), pDis (0.05, 0.2).]
[Figure: Variance Comparison. Same panel layout; x-axis: Est., True, Boot.; y-axis: 0.00 to 0.25; legends: pExpo (33:33:33, 10:45:45, 10:10:80), pDis (0.05, 0.2).]
[Figure: Observed Type I Error Rates (Null Scenarios). Columns: contrasts 1v0, 2v0, 2v1; rows: Good overlap, Poor overlap. x-axis: method (U, M, Mw, Ip); y-axis: 0.0 to 1.0; legends: pExpo (33:33:33, 10:45:45, 10:10:80), pDis (0.05, 0.2).]
[Figure: Coverage of 95% Confidence Intervals. Columns: Modification (−) and Modification (+), each for contrasts 1v0, 2v0, 2v1; rows: Null or Non-null main effects by Good or Poor overlap. x-axis: method (U, M, Mw, Ip); y-axis: 0.00 to 1.00; legends: pExpo (33:33:33, 10:45:45, 10:10:80), pDis (0.05, 0.2).]
Empirical example: Methods
Solomon et al. Arch Intern Med 2010;170:1968.
Medicare Beneficiary dataset from PA and NJ (1999-2005)
Groups: Opioids (12,601) vs COX2 inhibitors (6,172) vs nsNSAIDs
(4,874) new users
Outcomes: Death (794), fractures (706), GI bleed (230), and
cardiovascular events (1,204)
Confounders: 35 pre-treatment variables including 5 continuous
PS model: Quadratic terms for continuous variables; no interaction
Analyzed using MW and three-way matching to see agreement
MW sample size 4,618.7-4,635.71 per group; matched sample size
4,611 per group; stabilized IPTW sample size 4,926.6-12,585.0
Best balance was achieved by MW in 24 covariates, by matching in
6, and by IPTW in 5.
Empirical example: Covariate balance
[Figure: Absolute Standardized Mean Difference (x-axis, 0.00 to 0.20) for each of the 35 covariates (y-axis), by method: Unmatched, Matched, MW, IPTW.]
Covariates as listed on the y-axis: Hepatic disease, Gender, ARB use, Alzheimer disease, Osteoporosis, Hypertension, Parkinson's disease, Thiazide use, Bone mineral density test, ACE inhibitor use, Antiepileptic use, Diabetes, Upper gastrointestinal disease, Hyperlipidemia, Gout, Benzodiazepine use, Beta blocker use, Back pain, SSRI use, Angina, H2 blocker use, Corticosteroid use, Falls, PPI use, Stroke, Myocardial infarction, No. physician visits, Age, Loop diuretic use, Fracture, White race, No. days in hospital, No. prescription drugs, Antithrombotic use, Charlson score.
Empirical example: Outcome regression
Yoshida K et al. Matching Weights for Three-category Exposure 2/9/2016
Table 1. Comparison of hazard ratios for coxibs and opioids (nonselective NSAIDs as the reference)
by different methods and outcomes.
                  Coxibs vs nsNSAIDs                 Opioids vs nsNSAIDs
                  HR [95% CI]             p          HR [95% CI]             p
Death
  Unmatched       1.702 [1.293, 2.240]    <0.001     2.821 [2.185, 3.642]    <0.001
  Matched         1.415 [1.060, 1.889]    0.018      1.997 [1.492, 2.671]    <0.001
  MW              1.393 [1.056, 1.837]    0.019      1.973 [1.517, 2.566]    <0.001
  IPTW            1.385 [1.024, 1.873]    0.035      1.962 [1.480, 2.601]    <0.001
Fracture
  Unmatched       1.181 [0.799, 1.746]    0.405      5.825 [4.195, 8.089]    <0.001
  Matched         0.947 [0.618, 1.453]    0.804      4.708 [3.308, 6.702]    <0.001
  MW              1.013 [0.684, 1.502]    0.948      4.733 [3.396, 6.595]    <0.001
  IPTW            0.887 [0.576, 1.365]    0.585      4.068 [2.814, 5.882]    <0.001
GI bleed
  Unmatched       0.933 [0.605, 1.439]    0.753      1.529 [1.034, 2.262]    0.033
  Matched         0.932 [0.587, 1.480]    0.766      1.005 [0.615, 1.643]    0.984
  MW              0.857 [0.551, 1.335]    0.496      1.108 [0.737, 1.668]    0.622
  IPTW            0.916 [0.575, 1.459]    0.713      1.196 [0.793, 1.804]    0.394
Cardiovascular
  Unmatched       1.603 [1.298, 1.979]    <0.001     2.294 [1.882, 2.797]    <0.001
  Matched         1.419 [1.135, 1.775]    0.002      1.585 [1.255, 2.003]    <0.001
  MW              1.355 [1.096, 1.675]    0.005      1.626 [1.326, 1.995]    <0.001
  IPTW            1.268 [0.979, 1.642]    0.072      1.445 [1.125, 1.856]    0.004
Outline of proof that estimands are equivalent
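A complete common support and exact propensity score matching are
assumed. $S_k$ is the set of matched individuals in treatment group $k$. $W_i$
is the matching weight
\[
W_i = \frac{\min(e_{1i}, \ldots, e_{Ki})}{\sum_{k=1}^{K} I(Z_i = k)\, e_{ki}}.
\]
The estimators for the group mean have the same estimand.
Matching
\[
\frac{\frac{1}{n}\sum_{i=1}^{n} Y_i\, I(i \in S_k)}{\frac{1}{n}\sum_{i=1}^{n} I(i \in S_k)}
= \frac{\frac{1}{n}\sum_{i=1}^{n} Y_{ki}\, I(i \in S_k)}{\frac{1}{n}\sum_{i=1}^{n} I(i \in S_k)}
\to \frac{E[Y_{ki}\, I(i \in S_k)]}{E[I(i \in S_k)]}
= \cdots
= \frac{E\big[E[Y_{ki} \mid X_i]\, \min(e_{1i}, \ldots, e_{Ki})\big]}{E\big[\min(e_{1i}, \ldots, e_{Ki})\big]}
\]
Weighting
\[
\frac{\frac{1}{n}\sum_{i=1}^{n} Y_i\, I(Z_i = k)\, W_i}{\frac{1}{n}\sum_{i=1}^{n} I(Z_i = k)\, W_i}
= \frac{\frac{1}{n}\sum_{i=1}^{n} Y_{ki}\, I(Z_i = k)\, W_i}{\frac{1}{n}\sum_{i=1}^{n} I(Z_i = k)\, W_i}
\to \frac{E\big[\sum_{i=1}^{n} Y_{ki}\, I(Z_i = k)\, W_i\big]}{E\big[\sum_{i=1}^{n} I(Z_i = k)\, W_i\big]}
= \cdots
= \frac{E\big[E[Y_{ki} \mid X_i]\, \min(e_{1i}, \ldots, e_{Ki})\big]}{E\big[\min(e_{1i}, \ldots, e_{Ki})\big]}
\]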
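One possible way to fill in the omitted steps on the weighting side is sketched below; it is not the authors' derivation. It assumes no unmeasured confounding given $X_i$ (so $Y_{ki}$ is independent of $Z_i$ given $X_i$) and a correctly specified propensity score $e_{ki} = P(Z_i = k \mid X_i)$. Neither assumption is stated on the slide. The sketch is written per subject, which leaves the ratio unchanged for identically distributed subjects.

```latex
\begin{align*}
E\big[Y_{ki}\, I(Z_i = k)\, W_i\big]
 &= E\Big[ E\big[Y_{ki}\, I(Z_i = k) \mid X_i\big]\,
      \frac{\min(e_{1i}, \ldots, e_{Ki})}{e_{ki}} \Big]
    && \text{($W_i = \min(\cdot)/e_{ki}$ on $\{Z_i = k\}$)} \\
 &= E\Big[ E[Y_{ki} \mid X_i]\; P(Z_i = k \mid X_i)\,
      \frac{\min(e_{1i}, \ldots, e_{Ki})}{e_{ki}} \Big]
    && \text{(conditional independence)} \\
 &= E\big[ E[Y_{ki} \mid X_i]\, \min(e_{1i}, \ldots, e_{Ki}) \big]
    && \text{($P(Z_i = k \mid X_i) = e_{ki}$)} \\[1ex]
E\big[I(Z_i = k)\, W_i\big]
 &= E\Big[ P(Z_i = k \mid X_i)\,
      \frac{\min(e_{1i}, \ldots, e_{Ki})}{e_{ki}} \Big]
  = E\big[\min(e_{1i}, \ldots, e_{Ki})\big]
\end{align*}
```

Taking the ratio of the two expectations gives the final expression stated for the weighting estimator.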
