5. Multiple Decision Treatment Regimes: Framework and
Fundamentals
5.1 Multiple Decision Treatment Regimes
5.2 Statistical Framework
5.3 The g-Computation Algorithm
5.4 Estimation of the Value of a Fixed Regime
5.5 Key References
216
Recall: Acute Leukemia
• At baseline: Information x1, history h1 = x1 ∈ H1
• Decision 1: Set of options A1 = {C1,C2}; rule 1: d1(h1): H1 → A1
• Between Decisions 1 and 2: Collect additional information x2, including
responder status
• Accrued information/history h2 = (x1, therapy at decision 1, x2) ∈ H2
• Decision 2: Set of options A2 = {M1,M2,S1,S2}; rule 2:
d2(h2): H2 → {M1,M2} (responder), d2(h2): H2 → {S1,S2} (nonresponder)
• Treatment regime: d = {d1(h1), d2(h2)} = (d1, d2)
217
K-decision treatment regime
In general: K decision points/stages
• Baseline information x1 ∈ X1, intermediate information xk ∈ Xk
between Decisions k − 1 and k, k = 2, . . . , K
• Treatment options Ak at Decision k, elements ak ∈ Ak ,
k = 1, . . . , K
• Accrued information or history
h1 = x1 ∈ H1
hk = (x1, a1, . . . , xk−1, ak−1, xk ) ∈ Hk , k = 2, . . . , K,
• Decision rules d1(h1), d2(h2), . . . , dK (hK ), dk : Hk → Ak
• Treatment regime
d = {d1(h1), . . . , dK (hK )} = (d1, d2, . . . , dK )
218
Notation
Overbar notation: With
xk ∈ Xk, ak ∈ Ak, k = 1, . . . , K
it is convenient to define, for k = 1, . . . , K,
x̄k = (x1, . . . , xk), āk = (a1, . . . , ak)
X̄k = X1 × · · · × Xk, Āk = A1 × · · · × Ak
• Conventions: ā = āK, x̄ = x̄K, a0 is null
• h1 = x1, hk = (x̄k, āk−1), k = 2, . . . , K
• H1 = X1, Hk = X̄k × Āk−1, k = 2, . . . , K
• d̄k = (d1, . . . , dk), k = 1, . . . , K, d = d̄K = (d1, . . . , dK)
219
Decision points
Decision points depend on the context: For example
• Acute leukemia: Decisions are at milestones in the disease
progression (diagnosis, evaluation of response)
• HIV infection: Decisions are according to a schedule (at monthly
clinic visits)
• Cardiovascular disease (CVD): Decisions are made upon
occurrence of an event (myocardial infarction)
Perspective:
• Timing of decisions can be fixed (HIV) or random (leukemia)
• If any individual would reach all K decision points, the distinction
is not important; we assume this going forward
• If different individuals can experience different numbers of
decisions (CVD), the number of decisions reached is random,
and a more specialized framework is required
• The latter also is the case if the outcome is a time to an event
220
Example: K = 2, acute leukemia
Decision 1: A1 = {C1,C2}, x1 = h1 includes age (years), baseline
white blood cell count WBC1 (×103/µl)
• Example rule:
d1(h1) = C2 I(C) + C1 I(Cᶜ), C = {age < 50, WBC1 < 10}
Decision 2: A2 = {M1, M2, S1, S2}
• x2, and thus h2, includes WBC2 at Decision 2; ECOG performance status; EVENT (indicator of a grade ≥ 3 adverse event from induction therapy); and RESP (responder status)
• Example rule:
d2(h2) = I(RESP = 1){M1 I(M) + M2 I(Mᶜ)} + I(RESP = 0){S1 I(S) + S2 I(Sᶜ)}   (5.1)
M = {WBC1 < 11.2, WBC2 < 10.5, EVENT = 0, ECOG ≤ 2}
S = {age > 60, WBC2 < 11.0, ECOG ≥ 2}
221
Recursive representation of rules
Fact: If the K rules in d ∈ D are followed by an individual, the options
selected at each decision depend only on the evolving x1, . . . , xK
• Decision 1: Option selected is d1(h1) = d1(x1)
• Between Decisions 1 and 2: x2
• Decision 2: Rule d2(h2) = d2(x̄2, a1); the option selected depends on the option selected at Decision 1:
d2{x̄2, d1(x1)}
• Between Decisions 2 and 3: x3
• Decision 3: Rule d3(h3) = d3(x̄3, ā2) = d3(x̄3, a1, a2); the option selected depends on those at Decisions 1 and 2:
d3[x̄3, d1(x1), d2{x̄2, d1(x1)}]
• And so on. . .
222
Recursive representation of rules
Concise representation: For k = 2, . . . , K
d̄2(x̄2) = [d1(x1), d2{x̄2, d1(x1)}]
d̄3(x̄3) = [d1(x1), d2{x̄2, d1(x1)}, d3{x̄3, d̄2(x̄2)}]
...   (5.2)
d̄K(x̄K) = [d1(x1), d2{x̄2, d1(x1)}, . . . , dK{x̄K, d̄K−1(x̄K−1)}]
• d̄k(x̄k) comprises the options selected through Decision k
• d̄(x̄) = d̄K(x̄K)
• This representation will be useful later
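The recursion above can be sketched in code. This is a minimal illustration (not from the source): given rules dk(hk) that each see the covariate history and the earlier selected options, it builds the option sequence d̄K(x̄K) for one covariate path. The leukemia-style rules and covariate fields (`age`, `resp`) are hypothetical.

```python
# Minimal sketch of the recursive representation (5.2): feed each rule the
# covariates observed so far plus the options already selected under d.

def regime_options(rules, xs):
    """rules[k](x_history, a_history) -> option at Decision k+1."""
    a_hist = []
    for k, rule in enumerate(rules):
        a_hist.append(rule(xs[:k + 1], tuple(a_hist)))
    return tuple(a_hist)

# Hypothetical two-decision leukemia-style regime: induction by age,
# then maintenance/salvage by responder status.
d1 = lambda xbar, abar: "C2" if xbar[0]["age"] < 50 else "C1"
d2 = lambda xbar, abar: "M1" if xbar[1]["resp"] == 1 else "S1"

path = [{"age": 45}, {"resp": 1}]
print(regime_options([d1, d2], path))   # ('C2', 'M1')
```

Note that each rule receives the earlier selected options `abar`, mirroring dk(x̄k, āk−1), even though under the regime those options are determined by the covariate path.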
223
Redundancy
From the perspective of an individual following the K rules in d:
• The definition of rules dk(hk) = dk(x̄k, āk−1) is redundant
• a1 is determined by x1, a2 is determined by x̄2 = (x1, x2), . . .
• But the definition of dk as a function of hk = (x̄k, āk−1) is useful for characterizing and estimating an optimal regime later
Illustration of redundancy: K = 2, A1 = {0, 1}, A2 = {0, 1}, X1 = {0, 1}, X2 = {0, 1} (x1 and x2 are binary)
• d = (d1, d2) ∈ D, d1 : H1 = X1 → A1, d2 : H2 = X̄2 × A1 → A2
• For each value of h1 = x1, d1 must return a value in A1, e.g.,
d1(x1 = 0) = 0, d1(x1 = 1) = 1
224
Redundancy
Illustration of redundancy, continued:
• For each of the 2³ = 8 possible values of h2 = (x1, x2, a1), d2
must return a value in A2, e.g.,
d2(x1 = 0, x2 = 0, a1 = 0) = 0
d2(x1 = 0, x2 = 0, a1 = 1) = 1∗
d2(x1 = 0, x2 = 1, a1 = 0) = 1
d2(x1 = 0, x2 = 1, a1 = 1) = 1∗
d2(x1 = 1, x2 = 0, a1 = 0) = 0∗
d2(x1 = 1, x2 = 0, a1 = 1) = 1
d2(x1 = 1, x2 = 1, a1 = 0) = 0∗
d2(x1 = 1, x2 = 1, a1 = 1) = 0
• Configurations with ∗ could never occur if an individual followed
regime d
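The starred configurations can be enumerated directly. A small sketch (assuming the binary setup on this slide, with d1(x1) = x1 as in the example): any history whose a1 disagrees with d1(x1) is unreachable under d.

```python
# Mark which of the 8 histories h2 = (x1, x2, a1) can never occur
# when an individual follows d with d1(0)=0, d1(1)=1.
from itertools import product

d1 = lambda x1: x1
impossible = [(x1, x2, a1) for x1, x2, a1 in product((0, 1), repeat=3)
              if a1 != d1(x1)]
print(impossible)   # the four starred configurations
```

The four tuples printed correspond exactly to the rows flagged with * above, so d2 is unconstrained (redundantly defined) on half of its domain.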
225
5. Multiple Decision Treatment Regimes: Framework and
Fundamentals
5.1 Multiple Decision Treatment Regimes
5.2 Statistical Framework
5.3 The g-Computation Algorithm
5.4 Estimation of the Value of a Fixed Regime
5.5 Key References
226
Outcome of interest
Determination of outcome:
• Can be ascertained after Decision K, e.g., for HIV infected
patients, outcome = viral load (viral RNA copies/mL) measured
at a final clinic visit after Decision K
• Can be defined using intervening information, e.g., for HIV
infected patients with CD4 count (cells/mm3) measured at each
clinic visit (Decisions k = 2, . . . , K) and at a final clinic visit after
Decision K
outcome = total # CD4 counts > 200 cells/mm3
Convention: As for single decision case, assume larger outcomes
are preferred
• E.g., because smaller viral load is better, take
outcome = −viral load
227
Potential outcomes for K decisions
Intuitively: For a randomly chosen individual with history X1
• If he/she were to receive a1 ∈ A1 at Decision 1, the evolution of his/her disease/disorder process after Decision 1 would be influenced by a1
• Suggests: X*2(a1) = intervening information that would arise between Decisions 1 and 2 after receiving a1 ∈ A1 at Decision 1
• If he/she then were to receive a2 ∈ A2 at Decision 2, the evolution of his/her disease/disorder process after Decision 2 would be influenced by a1 followed by a2
• Suggests: X*3(ā2) = intervening information that would arise between Decisions 2 and 3 after receiving a1 ∈ A1 at Decision 1 and a2 ∈ A2 at Decision 2
• And so on. . .
228
Potential outcomes for K decisions
Ultimately: If he/she were to receive options ā = āK = (a1, . . . , aK) at Decisions 1, . . . , K
Y*(āK) = Y*(ā) = outcome that would be achieved
Summarizing: Potential information at each decision and potential outcome if an individual were to receive ā = (a1, . . . , aK)
{X1, X*2(a1), X*3(ā2), . . . , X*K(āK−1), Y*(ā)}
• X1 may or may not be included
• Set of all possible potential outcomes for all ā ∈ Ā
W* = {X*2(a1), X*3(ā2), . . . , X*K(āK−1), Y*(ā),
for a1 ∈ A1, ā2 ∈ Ā2, . . . , āK−1 ∈ ĀK−1, ā ∈ Ā}   (5.3)
229
Feasible sets and classes of treatment regimes
Example: Acute leukemia, K = 2
A1 = {C1,C2}, A2 = {M1, M2, S1, S2}
• At Decision 1, both options are feasible for all patients
• At Decision 2, only maintenance options are feasible for patients
who respond, and only salvage options are feasible for patients
who do not respond
• This is almost always the case in multiple decision problems at decision points other than Decision 1
• Of course, it is also possible at Decision 1
230
Feasible sets and classes of treatment regimes
Formal specification: For any history hk = (x̄k, āk−1) ∈ Hk at Decision k, k = 1, . . . , K
• The set of feasible treatment options at Decision k is
Ψk(hk) = Ψk(x̄k, āk−1) ⊆ Ak, k = 1, . . . , K; Ψ1(h1) = Ψ1(x1) ⊆ A1 (a0 null)   (5.4)
• Ψk(hk) is a function mapping Hk to the set of all possible subsets of Ak
• Ψk(hk) can be a strict subset of Ak or all of Ak, depending on hk
• Collectively: Ψ = (Ψ1, . . . , ΨK )
231
Feasible sets and classes of treatment regimes
Example, revisited: Acute leukemia, K = 2
A1 = {C1,C2}, A2 = {M1, M2, S1, S2}
• Suppose C1 and C2 are feasible for all individuals regardless of
h1
Ψ1(h1) = A1 for all h1
• If h2 indicates response
Ψ2(h2) = {M1, M2} ⊂ A2 for all such h2
• If h2 indicates nonresponse
Ψ2(h2) = {S1, S2} ⊂ A2 for all such h2
• More complex specifications of feasible sets that take account of
additional information are of course possible
232
Feasible sets and classes of treatment regimes
Fancier example: Acute leukemia, K = 2
A1 = {C1,C2}, A2 = {M1, M2, S1, S2}
• If C1 is contraindicated for patients with renal impairment
Ψ1(h1) = {C2} if h1 indicates renal impairment
= {C1,C2} = A1 if h1 does not
• If S1 increases risk of adverse events in nonresponders with low WBC2
Ψ2(h2) = {S2} if h2 indicates nonresponse, WBC2 ≤ w
= {S1, S2} if h2 indicates nonresponse, WBC2 > w
for some threshold w (×10³/µl)
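These feasible sets are straightforward to encode. A sketch under stated assumptions: the threshold value and the history fields (`renal_impairment`, `resp`, `WBC2`) are hypothetical stand-ins for components of h1 and h2.

```python
# Illustrative feasible-set functions Psi_1, Psi_2 for the "fancier" example.
W_THRESH = 10.0   # hypothetical threshold w, in 10^3/µl

def psi1(h1):
    # C1 contraindicated under renal impairment
    return {"C2"} if h1["renal_impairment"] else {"C1", "C2"}

def psi2(h2):
    if h2["resp"]:                       # responders: maintenance only
        return {"M1", "M2"}
    # nonresponders: S1 excluded when WBC2 is low
    return {"S2"} if h2["WBC2"] <= W_THRESH else {"S1", "S2"}

print(psi1({"renal_impairment": True}))          # {'C2'}
print(psi2({"resp": 0, "WBC2": 12.3}))           # {'S1', 'S2'}
```

A Ψ-specific rule d2 would then be required to return an element of `psi2(h2)` for every history it can encounter.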
233
Feasible sets and classes of treatment regimes
Ideally:
• Specification of the feasible sets is dictated by scientific
considerations
• Disease/disorder context, available treatment options, patient
population, etc
• Specification of Ψk (hk ), k = 1, . . . , K, should incorporate only
information in hk that is critical to treatment selection
Regimes and feasible sets: Given specified feasible sets Ψ
• A regime d = (d1, . . . , dK ) whose rules select treatment options
for history hk from those in Ψk (hk ) is defined in terms of Ψ
• Thus, regimes are Ψ-specific, and the relevant class of all
possible regimes D depends on Ψ (suppressed in the notation)
234
Feasible sets and classes of treatment regimes
In practice: At Decision k, there is a small number ℓk of subsets Ak,l ⊆ Ak, l = 1, . . . , ℓk, that are feasible sets for all hk
• E.g., for acute leukemia, ℓ2 = 2
A2,1 = {M1, M2}, A2,2 = {S1, S2}
• If r2 is the component of h2 indicating response
Ψ2(h2) = A2,1 for h2 with r2 = 1
= A2,2 for h2 with r2 = 0
Decision rules: Different rule for each subset
• E.g., for acute leukemia, as in (5.1)
d2(h2) = I(r2 = 1) d2,1(h2) + I(r2 = 0) d2,2(h2)
d2,1(h2) is a rule selecting maintenance therapy for responders
d2,2(h2) is a rule selecting salvage therapy for nonresponders
235
Feasible sets and classes of treatment regimes
In general: With ℓk distinct subsets Ak,l ⊆ Ak, l = 1, . . . , ℓk, as feasible sets
• Define sk(hk) ∈ {1, . . . , ℓk} according to which of these subsets Ψk(hk) corresponds for given hk
• dk,l(hk) is the rule corresponding to the lth subset Ak,l
• The decision rule at Decision k then has the form
dk(hk) = Σ_{l=1}^{ℓk} I{sk(hk) = l} dk,l(hk)   (5.5)
• Henceforth, it is understood that dk (hk ) may be expressed as in
(5.5) where appropriate
236
Feasible sets and classes of treatment regimes
Note: For any Ψ-specific regime d = (d1, . . . , dK )
• At Decision k, dk (hk ) = dk (xk , ak−1) returns only options in
Ψk (hk ) = Ψk (xk , ak−1), i.e.,
dk (hk ) = dk (xk , ak−1) ∈ Ψk (hk ) ⊆ Ak
• Thus, dk need map only a subset of Hk = Xk × Ak−1 to Ak
• We discuss this more shortly
237
Potential outcomes for a fixed regime d ∈ D
Intuitively: If a randomly chosen individual with history X1 were to
receive treatment options by following the rules in d
• Decision 1: Treatment determined by d1
• X*
2(d1) = intervening information that would arise between
Decisions 1 and 2
• Decision 2: Treatment determined by d2
• X*
3(d2) = intervening information that would arise between
Decisions 2 and 3 ...
• X*
k (dk−1) = intervening information that would arise between
Decisions k − 1 and k, k = 2, . . . , K
• Y*(d) = Y*(dK ) = outcome that would be achieved if all rules in
d were followed
Potential outcomes under regime d:
{X1, X*
2(d1), X*
3(d2), . . . , X*
K (dK−1), Y*
(d)} (5.6)
238
Potential outcomes for a fixed regime d ∈ D
Formally: These potential outcomes are functions of W* in (5.3)
• Define
X̄*k(āk−1) = {X1, X*2(a1), X*3(ā2), . . . , X*k(āk−1)}, k = 2, . . . , K
• Then
X*2(d1) = Σ_{a1∈A1} X*2(a1) I{d1(X1) = a1}
X*k(d̄k−1) = Σ_{āk−1∈Āk−1} X*k(āk−1) Π_{j=1}^{k−1} I[dj{X̄*j(āj−1), āj−1} = aj], k = 3, . . . , K   (5.7)
Y*(d) = Σ_{ā∈Ā} Y*(ā) Π_{j=1}^{K} I[dj{X̄*j(āj−1), āj−1} = aj]
• Also define
X̄*k(d̄k−1) = {X1, X*2(d1), X*3(d̄2), . . . , X*k(d̄k−1)}, k = 2, . . . , K
239
Value of a K-decision regime
With these definitions: The value of d ∈ D is
V(d) = E{Y*(d)}
• And an optimal regime dopt ∈ D satisfies
E{Y*(dopt)} ≥ E{Y*(d)} for all d ∈ D   (5.8)
equivalently
dopt = arg max_{d∈D} E{Y*(d)} = arg max_{d∈D} V(d)
240
Optimal treatment options vs. optimal decisions
For a randomly chosen individual with history H1:
• The optimal sequence of treatment options for this individual is
arg max_{ā∈Ā} Y*(ā)
which of course is not knowable in practice
• All that is known at baseline is H1, and the rules dopt1, . . . , doptK select the (feasible) options corresponding to the largest expected outcome
• From the definition of Y*(d) in (5.7),
Y*(d) ≤ max_{ā∈Ā} Y*(ā) for all d ∈ D =⇒ Y*(dopt) ≤ max_{ā∈Ā} Y*(ā)
so an optimal regime might not select optimal options for this individual at each decision
• Rather, dopt leads to the optimal sequence of decisions that can
be made based on the available information on this individual
241
Data
Ideally: For i = 1, . . . , n, i.i.d.
(X1i, A1i, X2i, A2i, . . . , XKi, AKi, Yi) = (X̄Ki, ĀKi, Yi) = (X̄i, Āi, Yi)   (5.9)
• X1 = baseline information at Decision 1, taking values in X1
• Ak = treatment option actually received at Decision k, k = 1, . . . , K, taking values in Ak
• Xk = intervening information between Decisions k − 1 and k, k = 2, . . . , K, taking values in Xk
• X̄k = (X1, . . . , Xk), X̄ = X̄K = (X1, . . . , XK), and Āk = (A1, . . . , Ak), Ā = ĀK = (A1, . . . , AK)
• History H1 = X1, Hk = (X1, A1, . . . , Xk−1, Ak−1, Xk) = (X̄k, Āk−1), k = 2, . . . , K
• Y = observed outcome (after Decision K or function of HK )
242
Data
Data sources:
• Longitudinal observational study: Retrospective from an existing
database, completed conventional clinical trial with followup,
prospective cohort study
• Randomized study: Prospective clinical trial conducted
specifically for this purpose (SMART)
Longitudinal observational study: Challenges
• Time-dependent confounding: A subject’s history at each
decision point is both determined by past treatments and used to
select future treatments
• Characteristics associated with both treatment selection and
future characteristics/ultimate outcome may not be captured in
the data
243
Data
SMART schematic: randomization occurs at the points marked • (figure not reproduced)
244
Data
Do we really need data like these? Can’t we just “piece together” an optimal regime using single decision methods on data from separate studies (with different subjects in each)?
• E.g., acute leukemia: Estimate dopt1 from a study comparing {C1, C2} and dopt2 from separate studies comparing {M1, M2} and {S1, S2}
• Delayed effects: The induction therapy with the highest
proportion of responders might have other effects that render
subsequent treatments less effective in regard to survival
• Require data from a study involving the same subjects over the
entire sequence of decisions
245
Statistical problem
Ultimate goal: Based on the data (5.9), estimate dopt ∈ D satisfying (5.8), i.e.,
E{Y*(dopt)} ≥ E{Y*(d)} for all d ∈ D
Challenge: dopt is defined in terms of the potential outcomes (5.6)
• Must be able to express this definition in terms of the observed data (5.9)
• In particular, for any d ∈ D, must be able to identify the distribution of
{X1, X*2(d1), X*3(d̄2), . . . , X*K(d̄K−1), Y*(d)}
which depends on that of (X1, W*), from the distribution of
(X1, A1, X2, A2, . . . , XK, AK, Y)
• Possible under the following assumptions generalizing those in
(3.4), (3.5), and (3.6)
246
Identifiability assumptions
SUTVA (consistency):
Xk = X*
k (Ak−1) =
ak−1∈Ak−1
X*
k (ak−1)I(Ak−1 = ak−1), k = 2, . . . , K
(5.10)
Y = Y*
(A) =
a∈A
Y*
(a)I(A = a)
• Observed intervening information and the final observed
outcome are those that would potentially be seen under the
treatments actually received at each decision point
• Are the same (consistent) regardless of how the treatments are
administered at each decision point
247
Identifiability assumptions
Sequential randomization assumption (SRA): Robins (1986)
W* ⊥⊥ Ak | X̄k, Āk−1, k = 1, . . . , K, where A0 is null   (5.11)
equivalently
W* ⊥⊥ Ak | Hk, k = 1, . . . , K
• (5.11) is the strongest version; weaker versions require only that treatment selection be independent of future potential outcomes
• Unverifiable from the observed data
• Unlikely to hold for data from a longitudinal observational study
not carried out with estimation of treatment regimes in mind
• But (5.11) holds by design in a SMART
248
Identifiability assumptions
Positivity assumption: With feasible sets of treatment options at
each decision point, more complicated
• Intuitively: To identify the distribution of the potential outcomes
(5.6) from that of the observed data (5.9), all treatment options in
the feasible sets Ψk (hk ) = Ψk (xk , ak−1) must be represented in
the observed data, k = 1, . . . , K
• That is, there must be individuals in the data who received each
of the options in Ψk (hk ), k = 1, . . . , K
• E.g., acute leukemia, Decision 2: there must be responders who
received each of M1 and M2 and nonresponders who received
each of S1 and S2
Built up recursively. . .
249
Identifiability assumptions
Decision 1: Set of all possible baseline information h1 = x1
Γ1 = {x1 ∈ X1 satisfying P(X1 = x1) > 0} ⊆ X1 = H1
Set of all possible histories and associated options in Ψ1(h1)
Λ1 = {(x1, a1) such that x1 = h1 ∈ Γ1, a1 ∈ Ψ1(h1)}
All options in Λ1 must be represented in the data:
P(A1 = a1 | H1 = h1) > 0 for all (h1, a1) ∈ Λ1   (5.12)
This is the first component of the positivity assumption
250
Identifiability assumptions
Decision 2: All possible histories h2 consistent with following a Ψ-specific regime at Decision 1
Γ2 = {(x̄2, a1) ∈ X̄2 × A1 satisfying (x1, a1) ∈ Λ1 and P{X*2(a1) = x2 | X1 = x1} > 0} ⊆ H2
Under SUTVA (5.10) and SRA (5.11), equivalently
Γ2 = {(x̄2, a1) ∈ X̄2 × A1 satisfying (x1, a1) ∈ Λ1 and P(X2 = x2 | X1 = x1, A1 = a1) > 0}
Set of all possible histories h2 and associated options in Ψ2(h2)
Λ2 = {(x̄2, ā2) such that (x̄2, a1) = h2 ∈ Γ2, a2 ∈ Ψ2(h2)}
251
Identifiability assumptions
Decision 2, continued: All options in Λ2 must be represented in the data:
P(A2 = a2 | H2 = h2) > 0 for all (h2, a2) ∈ Λ2
This is the second component of the positivity assumption
...
Decision k: All histories hk consistent with a Ψ-specific regime through Decision k − 1
Γk = {(x̄k, āk−1) ∈ X̄k × Āk−1 satisfying (x̄k−1, āk−1) ∈ Λk−1 and
P{X*k(āk−1) = xk | X̄*k−1(āk−2) = x̄k−1} > 0} ⊆ Hk   (5.13)
= {(x̄k, āk−1) ∈ X̄k × Āk−1 satisfying (x̄k−1, āk−1) ∈ Λk−1 and
P(Xk = xk | X̄k−1 = x̄k−1, Āk−1 = āk−1) > 0}   (5.14)
252
Identifiability assumptions
Decision k, continued: Set of all possible histories hk and associated options in Ψk(hk)
Λk = {(x̄k, āk) such that (x̄k, āk−1) = hk ∈ Γk, ak ∈ Ψk(hk)}
All options in Ψk(hk) must be represented in the data:
P(Ak = ak | Hk = hk) > 0 for all (hk, ak) ∈ Λk
This is the kth component of the positivity assumption
...
253
Identifiability assumptions
Positivity assumption: Summarizing,
P(Ak = ak | Hk = hk) = P(Ak = ak | X̄k = x̄k, Āk−1 = āk−1) > 0
for hk = (x̄k, āk−1) ∈ Γk and ak ∈ Ψk(hk) = Ψk(x̄k, āk−1), k = 1, . . . , K   (5.15)
• The positivity assumption (5.15) holds in a SMART by design if
there are subjects with history hk randomized to all options in
Ψk (hk ) at each Decision k = 1, . . . , K
• No guarantee that for a given Ψ (5.15) holds for data from a
longitudinal observational study (more shortly)
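A crude empirical check of (5.15) in observational data is to tabulate, within each history stratum, which feasible options actually appear. A sketch under stated assumptions: the records and the responder-status stratification are hypothetical, mimicking Decision 2 of the leukemia example.

```python
# Empirical positivity check: within each Decision-2 stratum (responder vs.
# nonresponder), are all feasible options represented among the observed A2?
from collections import defaultdict

records = [  # hypothetical (resp, a2) pairs from observed data
    (1, "M1"), (1, "M2"), (0, "S1"), (0, "S1"), (1, "M1"),
]
seen = defaultdict(set)
for resp, a2 in records:
    seen[resp].add(a2)

feasible = {1: {"M1", "M2"}, 0: {"S1", "S2"}}   # Psi_2 by responder status
violations = {r: feasible[r] - seen[r]
              for r in feasible if feasible[r] - seen[r]}
print(violations)   # {0: {'S2'}}: no nonresponder received S2
```

Here positivity fails empirically for nonresponders: no one in the data received S2, so regimes selecting S2 for nonresponders could not be evaluated from these data.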
254
Identifiability assumptions
Equivalence of (5.13) and (5.14): Assuming SUTVA and SRA, need to show for any hk = (x̄k, āk−1) ∈ Γk in (5.13)
P(Xk = xk | X̄k−1 = x̄k−1, Āk−1 = āk−1) = P{X*k(āk−1) = xk | X̄*k−1(āk−2) = x̄k−1}   (5.16)
• Proof is by induction (k = 1, 2 are immediate)
• Repeated use of the following lemma
Lemma. Let A and H be random variables, assume W* ⊥⊥ A | H, and consider two functions f1(W*) and f2(W*) of W*. If the event {f2(W*) = f2, H = h, A = a} has positive probability, then
P{f1(W*) = f1 | f2(W*) = f2, H = h, A = a} = P{f1(W*) = f1 | f2(W*) = f2, H = h}
255
Identifiability assumptions
Sketch of induction proof:
• We need to show for any hk = (x̄k, āk−1) ∈ Γk in (5.13), k = 1, . . . , K,
P(Xk = xk | X̄k−1 = x̄k−1, Āk−1 = āk−1) = P{X*k(āk−1) = xk | X̄*k−1(āk−2) = x̄k−1}   (5.16)
• (5.16) is trivial for k = 1
• k = 2: (5.12) implies P(X1 = x1, A1 = a1) > 0 for (x1, a1) ∈ Λ1, so P(X2 = x2 | X1 = x1, A1 = a1) is well defined. Then by SUTVA and SRA, (5.16) holds:
P(X2 = x2 | X1 = x1, A1 = a1) = P{X*2(a1) = x2 | X1 = x1, A1 = a1}
= P{X*2(a1) = x2 | X1 = x1}
• For general k: Need to show P(X̄k−1 = x̄k−1, Āk−1 = āk−1) > 0, so that P(Xk = xk | X̄k−1 = x̄k−1, Āk−1 = āk−1) is well defined, and then show (5.16)
256
Identifiability assumptions
• Assume this holds for k − 1 and then show it holds for k. Thus, for hk−1 = (x̄k−1, āk−2) ∈ Γk−1, P(X̄k−2 = x̄k−2, Āk−2 = āk−2) > 0 and
P(Xk−1 = xk−1 | X̄k−2 = x̄k−2, Āk−2 = āk−2)
= P{X*k−1(āk−2) = xk−1 | X̄*k−2(āk−3) = x̄k−2} > 0
• It follows that P(X̄k−1 = x̄k−1, Āk−2 = āk−2) > 0
• Because hk−1 = (x̄k−1, āk−2) ∈ Γk−1 and ak−1 ∈ Ψk−1(x̄k−1, āk−2),
P(Ak−1 = ak−1 | X̄k−1 = x̄k−1, Āk−2 = āk−2) > 0
• It then follows that P(X̄k−1 = x̄k−1, Āk−1 = āk−1) > 0 as required
• Now show (5.16) using SUTVA, SRA, and the Lemma
257
Identifiability assumptions
• By repeated use of SUTVA and SRA,
P(Xk = xk | X̄k−1 = x̄k−1, Āk−1 = āk−1)
= P{X*k(āk−1) = xk | X̄k−2 = x̄k−2, X*k−1(āk−2) = xk−1,
Āk−3 = āk−3, Ak−2 = ak−2}   (5.17)
• By the Lemma with f1(W*) = X*k(āk−1), A = Ak−2, H = (X̄k−2, Āk−3), and f2(W*) = X*k−1(āk−2), (5.17) =
P{X*k(āk−1) = xk | X̄k−2 = x̄k−2, X*k−1(āk−2) = xk−1, Āk−3 = āk−3}
= P{X*k(āk−1) = xk | X̄k−3 = x̄k−3, X*k−2(āk−3) = xk−2,
X*k−1(āk−2) = xk−1, Āk−4 = āk−4, Ak−3 = ak−3}   (5.18)
• By the Lemma with f1(W*) = X*k(āk−1), A = Ak−3, H = (X̄k−3, Āk−4), (5.18) =
P{X*k(āk−1) = xk | X̄k−3 = x̄k−3, X*k−2(āk−3) = xk−2,
X*k−1(āk−2) = xk−1, Āk−4 = āk−4}
258
Identifiability assumptions
• Continuing to apply the Lemma and SUTVA leads to (5.16),
P(Xk = xk | X̄k−1 = x̄k−1, Āk−1 = āk−1) = P{X*k(āk−1) = xk | X̄*k−1(āk−2) = x̄k−1}
• Because hk = (x̄k, āk−1) ∈ Γk, the RHS is > 0
• Thus, we have shown that (5.16) holds for k and
P(Xk = xk | X̄k−1 = x̄k−1, Āk−1 = āk−1) > 0 for hk = (x̄k, āk−1) ∈ Γk
• Because this holds for k = 1, 2, the result follows by induction
259
Identifiability assumptions
More precise definition of a regime: A Ψ-specific regime
d = (d1, . . . , dK ) satisfies
• Each rule dk , k = 1, . . . , K, is a mapping from Γk ⊆ Hk into Ak
for which dk (hk ) ∈ Ψk (hk ) for every hk ∈ Γk
• The class D of Ψ-specific regimes is the set of all such d
260
Identifiability assumptions
Observational data: For given Ψ, no guarantee that all options in Ψk(hk), k = 1, . . . , K, are represented in the data
• Define for k = 1, . . . , K
Γmaxk = {(x̄k, āk−1) ∈ X̄k × Āk−1 satisfying (x̄k−1, āk−1) ∈ Λmaxk−1
and P(Xk = xk | X̄k−1 = x̄k−1, Āk−1 = āk−1) > 0}
Ψmaxk(hk) = {ak ∈ Ak satisfying P(Ak = ak | Hk = hk) > 0}, hk = (x̄k, āk−1) ∈ Γmaxk
Λmaxk = {(x̄k, āk) such that (x̄k, āk−1) = hk ∈ Γmaxk, ak ∈ Ψmaxk(hk)}
• The class of regimes based on Ψmax = (Ψmax1, . . . , ΨmaxK) is the largest that can be considered
• So must have Ψk(hk) ⊆ Ψmaxk(hk), k = 1, . . . , K, for all hk ∈ Γk ⊆ Γmaxk
261
5. Multiple Decision Treatment Regimes: Framework and
Fundamentals
5.1 Multiple Decision Treatment Regimes
5.2 Statistical Framework
5.3 The g-Computation Algorithm
5.4 Estimation of the Value of a Fixed Regime
5.5 Key References
262
Identifiability result
Goal, again: For any Ψ-specific regime d ∈ D, demonstrate that we can identify the distribution of
{X1, X*2(d1), X*3(d̄2), . . . , X*K(d̄K−1), Y*(d)}
which depends on that of (X1, W*), from the distribution of
(X1, A1, X2, A2, . . . , XK, AK, Y)
Recall: Recursive representation d̄k(x̄k) in (5.2) of the treatment options selected by d through Decision k if an individual follows d, k = 2, . . . , K
d̄k(x̄k) = [d1(x1), d2{x̄2, d1(x1)}, . . . , dk{x̄k, d̄k−1(x̄k−1)}]
263
g-Computation algorithm
Main result: Under SUTVA (5.10), SRA (5.11), and the positivity assumption (5.15), the joint density of the potential outcomes {X1, X*2(d1), X*3(d̄2), . . . , X*K(d̄K−1), Y*(d)} can be obtained as
pX1,X*2(d1),X*3(d̄2),...,X*K(d̄K−1),Y*(d)(x1, . . . , xK, y)   (5.19)
= pY|X̄,Ā{y | x̄, d̄(x̄)}
× pXK|X̄K−1,ĀK−1{xK | x̄K−1, d̄K−1(x̄K−1)}
...   (5.20)
× pX2|X1,A1{x2 | x1, d1(x1)}
× pX1(x1)
for any realization (x1, x2, . . . , xK, y) for which (5.19) is positive
• Due to Robins (1986, 1987, 2004)
264
g-Computation algorithm
Additional definition: Relevant realizations (x1, x2, . . . , xK, y) are determined by the feasible sets
• Assume Y*(ā) and Y take values y ∈ Y
• Define
ΓK+1 = {(x̄, ā, y) ∈ X̄ × Ā × Y satisfying (x̄, ā) ∈ ΛK and
P{Y*(ā) = y | X̄*K(āK−1) = x̄K} > 0}
= {(x̄, ā, y) ∈ X̄ × Ā × Y satisfying (x̄, ā) ∈ ΛK and
P(Y = y | X̄ = x̄, Ā = ā) > 0}
• This equality can be shown similarly to that of (5.13) and (5.14)
265
g-Computation algorithm
Simplification: Take all random variables discrete, so that (5.19) and (5.20) become
P{X1 = x1, X*2(d1) = x2, . . . , X*K(d̄K−1) = xK, Y*(d) = y}   (5.21)
= P{Y = y | X̄K = x̄K, ĀK = d̄K(x̄K)}
× P{XK = xK | X̄K−1 = x̄K−1, ĀK−1 = d̄K−1(x̄K−1)}
...   (5.22)
× P{X2 = x2 | X1 = x1, A1 = d1(x1)}
× P(X1 = x1)
for any realization (x1, x2, . . . , xK, y) such that
P{X1 = x1, X*2(d1) = x2, . . . , X*K(d̄K−1) = xK, Y*(d) = y} > 0
• Need to show (5.21) = (5.22)
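The discrete g-formula implied by (5.22) is easy to compute numerically once the conditional laws are specified. A minimal sketch, assuming K = 2 with binary x1, x2, and entirely hypothetical probabilities and outcome model:

```python
# Value of a fixed K = 2 regime d via the discrete g-formula (5.22):
# sum over (x1, x2) of E(Y | history, regime options) times the path probability.
pX1 = {0: 0.4, 1: 0.6}                      # P(X1 = x1)

def pX2(x2, x1, a1):                        # P(X2 = x2 | X1 = x1, A1 = a1)
    p1 = 0.3 + 0.2 * a1 + 0.1 * x1          # hypothetical chance that X2 = 1
    return p1 if x2 == 1 else 1 - p1

def EY(x1, a1, x2, a2):                     # hypothetical E(Y | X1, A1, X2, A2)
    return 1.0 + x1 + 0.5 * x2 + a1 + 2.0 * a2 * x2

d1 = lambda x1: 1 - x1                      # toy rules
d2 = lambda x1, x2, a1: x2                  # depends on h2 = (x1, x2, a1)

value = sum(
    pX1[x1] * pX2(x2, x1, d1(x1))
    * EY(x1, d1(x1), x2, d2(x1, x2, d1(x1)))
    for x1 in (0, 1) for x2 in (0, 1)
)
print(round(value, 4))   # 3.1
```

Note how each factor of (5.22) appears: the baseline law, the intervening-covariate law evaluated at the option d selects, and the outcome mean evaluated along the regime's path.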
266
g-Computation algorithm
Demonstration: Factorize (5.21) as
P{X1 = x1, X*2(d1) = x2, . . . , X*K(d̄K−1) = xK, Y*(d) = y}
= P{Y*(d) = y | X̄*K(d̄K−1) = x̄K}
× P{X*K(d̄K−1) = xK | X̄*K−1(d̄K−2) = x̄K−1}
...
× P{X*2(d1) = x2 | X1 = x1}
× P(X1 = x1)
• All components on the RHS are positive because (5.21) > 0
267
g-Computation algorithm
• From (5.22), it suffices to show
P{Y*(d) = y | X̄*K(d̄K−1) = x̄K} = P{Y = y | X̄K = x̄K, ĀK = d̄K(x̄K)}
where P{Y = y | X̄K = x̄K, ĀK = d̄K(x̄K)} > 0
and for k = 2, . . . , K
P{X*k(d̄k−1) = xk | X̄*k−1(d̄k−2) = x̄k−1} = P{Xk = xk | X̄k−1 = x̄k−1, Āk−1 = d̄k−1(x̄k−1)}
where P{Xk = xk | X̄k−1 = x̄k−1, Āk−1 = d̄k−1(x̄k−1)} > 0
• These follow immediately if, defining X*K+1(d̄) = Y*(d),
{x̄k, d̄k−1(x̄k−1)} ∈ Γk, k = 2, . . . , K + 1   (5.23)
which can be shown by induction (first show for k = 2)
268
g-Computation algorithm
Sketch of induction proof: First take k = 2
• Because P(X1 = x1) > 0, x1 ∈ Γ1, and d is a Ψ-specific regime, d1(x1) ∈ Ψ1(h1), so that {x1, d1(x1)} ∈ Λ1
• Because (5.21) > 0,
P{X*2(d1) = x2 | X1 = x1} > 0 =⇒ {x̄2, d1(x1)} ∈ Γ2
which is (5.23) for k = 2
• Now assume {x̄k−1, d̄k−2(x̄k−2)} ∈ Γk−1 is true
• Because d is a Ψ-specific regime, dk−1(hk−1) ∈ Ψk−1(hk−1), and thus {x̄k−1, d̄k−1(x̄k−1)} ∈ Λk−1
• Because (5.21) > 0,
P{X*k(d̄k−1) = xk | X̄*k−1(d̄k−2) = x̄k−1} > 0 =⇒ {x̄k, d̄k−1(x̄k−1)} ∈ Γk
completing the induction proof
269
g-Computation algorithm
General result: Compactly stated,
pX1,X*2(d1),X*3(d̄2),...,X*K(d̄K−1),Y*(d)(x1, . . . , xK, y)   (5.24)
= pY|X̄,Ā{y | x̄, d̄K(x̄)} Π_{k=2}^{K} pXk|X̄k−1,Āk−1{xk | x̄k−1, d̄k−1(x̄k−1)} pX1(x1)
which implies, for example,
pY*(d)(y) = ∫_X̄ pY|X̄,Ā{y | x̄, d̄(x̄)}   (5.25)
× Π_{k=2}^{K} pXk|X̄k−1,Āk−1{xk | x̄k−1, d̄k−1(x̄k−1)} pX1(x1) dνK(xK) · · · dν1(x1)
• dνK(xK) · · · dν1(x1) denotes the dominating measure
270
g-Computation algorithm
Or the value
E{Y*(d)} = ∫_X̄ E{Y | X̄ = x̄, Ā = d̄(x̄)}   (5.26)
× Π_{k=2}^{K} pXk|X̄k−1,Āk−1{xk | x̄k−1, d̄k−1(x̄k−1)} pX1(x1) dνK(xK) · · · dν1(x1)
• Thus, the value V(d) = E{Y*(d)} of a regime d ∈ D can be expressed in terms of the observed data
• So it should be possible to estimate V(d) from these data
• As well as to estimate V(dopt) (later). . .
271
5. Multiple Decision Treatment Regimes: Framework and
Fundamentals
5.1 Multiple Decision Treatment Regimes
5.2 Statistical Framework
5.3 The g-Computation Algorithm
5.4 Estimation of the Value of a Fixed Regime
5.5 Key References
272
Fixed regime d ∈ D
Of interest: Estimation of the value V(d) = E{Y*(d)} of a given, or
fixed, Ψ-specific regime d ∈ D
• In its own right
• As a stepping stone to estimation of an optimal regime dopt
• We consider several methods
• Throughout, take SUTVA (5.10), SRA (5.11), and positivity
assumption (5.15) to hold
273
Estimation via g-computation
In principle: From (5.25) and (5.26), can estimate pY*(d)(y) or E{Y*(d)} for fixed d ∈ D
• Posit parametric models, e.g., for (5.25),
pY|X̄,Ā(y | x̄, ā; ζK+1)
pXk|X̄k−1,Āk−1(xk | x̄k−1, āk−1; ζk), k = 2, . . . , K
pX1(x1; ζ1)
depending on ζ = (ζ1ᵀ, . . . , ζK+1ᵀ)ᵀ (or, for (5.26), a model for E(Y | X̄ = x̄, Ā = ā) instead)
• Estimate ζ by maximizing the partial likelihood
Π_{i=1}^{n} pY|X̄,Ā(Yi | X̄i, Āi; ζK+1) Π_{k=2}^{K} pXk|X̄k−1,Āk−1(Xki | X̄k−1,i, Āk−1,i; ζk) pX1(X1i; ζ1)
in ζ to obtain ζ̂ = (ζ̂1ᵀ, . . . , ζ̂K+1ᵀ)ᵀ
274
Estimation via g-computation
In principle:
• Substitute the fitted models in (5.25) or (5.26)
Major obstacle: (5.25) and (5.26) involve integration over the sample
space X = X1 × · · · × XK
• Except in the simplest situations, e.g., all x1, . . . , xK discrete and
low-dimensional, the required integration is almost certainly
analytically intractable and computationally insurmountable
275
Estimation via g-computation
Monte Carlo integration: Robins (1986) proposed approximating the distribution of Y*(d) for d ∈ D; for r = 1, . . . , M, simulate a realization from the fitted distribution of Y*(d) as follows
1. Generate random x1r from pX1(x1; ζ̂1)
2. Generate random x2r from pX2|X1,A1{x2 | x1r, d1(x1r); ζ̂2}
3. Continue in this fashion, generating random xkr from pXk|X̄k−1,Āk−1{xk | x̄k−1,r, d̄k−1(x̄k−1,r); ζ̂k}, k = 3, . . . , K
4. Generate random yr from pY|X̄,Ā{y | x̄r, d̄K(x̄r); ζ̂K+1}
y1, . . . , yM are then a sample from the fitted distribution of Y*(d)
Estimator for V(d) = E{Y*(d)}: V̂GC(d) = M⁻¹ Σ_{r=1}^{M} yr
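Steps 1-4 above can be sketched directly. This is a minimal illustration under stated assumptions: K = 2, binary covariates, and simple "fitted" models standing in for models actually fit to data; none of it is from the source.

```python
# Monte Carlo g-computation for K = 2 (steps 1-4 above; step 3 is vacuous).
import random

random.seed(1)
pX1_1 = 0.5                                           # fitted P(X1 = 1)
pX2_1 = lambda x1, a1: 0.2 + 0.3 * a1 + 0.2 * x1      # fitted P(X2 = 1 | x1, a1)
mean_Y = lambda x1, a1, x2, a2: 1 + x1 + x2 + a1 + a2 # fitted E(Y | history)

d1 = lambda x1: 1                                     # toy regime d
d2 = lambda x1, x2, a1: x2

def simulate_one():
    x1 = int(random.random() < pX1_1)                 # step 1
    a1 = d1(x1)
    x2 = int(random.random() < pX2_1(x1, a1))         # step 2
    a2 = d2(x1, x2, a1)
    return mean_Y(x1, a1, x2, a2) + random.gauss(0, 1)  # step 4

M = 20000
V_GC = sum(simulate_one() for _ in range(M)) / M      # Monte Carlo average
print(round(V_GC, 2))   # approximates V(d) = E{Y*(d)}
```

With these toy models the true value is 3.7, and the Monte Carlo average converges to it as M grows; in practice the same loop would use the fitted densities, which is where the modeling burden noted on the next slide arises.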
276
Estimation via g-computation
Practical challenges:
• Development of models can be daunting due to the high dimension/complexity of x1, . . . , xK
• Although specifying pY|X̄,Ā(y | x̄, ā; ζK+1) may be feasible for univariate Y, the models
pXk|X̄k−1,Āk−1(xk | x̄k−1, āk−1; ζk), k = 2, . . . , K, and pX1(x1; ζ1)
for multivariate Xk are more challenging to specify
• E.g., for Xk = (Xk1ᵀ, Xk2ᵀ)ᵀ continuous/discrete, one can factor as
pXk1|Xk2,X̄k−1,Āk−1(xk1 | xk2, x̄k−1, āk−1; ζk1) × pXk2|X̄k−1,Āk−1(xk2 | x̄k−1, āk−1; ζk2)
or pXk2|Xk1,X̄k−1,Āk−1(xk2 | xk1, x̄k−1, āk−1; ζk2) × pXk1|X̄k−1,Āk−1(xk1 | x̄k−1, āk−1; ζk1)
277
Estimation via g-computation
Practical challenges, continued:
• Moreover, simulation from such models can be demanding
• Analytical derivation of approximate standard errors is not straightforward (frankly, daunting); use of a nonparametric bootstrap has been advocated, which is highly computationally intensive
Bottom line: Estimation of V(d) via the g-computation algorithm is not commonplace in practice
• The main usefulness of g-computation is as a demonstration that it is possible to identify and estimate V(d) from the observed data under SUTVA, SRA, and positivity
• In principle, a possible approach to estimating dopt ∈ D is to maximize V̂GC(d) over all d ∈ D; clearly, this would be a formidable computational challenge (and is never done in practice)
278
Inverse probability weighted estimator
Motivation: An alternative representation of E{Y*(d)} in terms of the observed data
(X1, A1, X2, A2, . . . , XK, AK, Y)
depending on
pA1|H1(a1 | h1) = P(A1 = a1 | H1 = h1) = pA1|X1(a1 | x1)
pAk|Hk(ak | hk) = P(Ak = ak | Hk = hk) = pAk|X̄k,Āk−1(ak | x̄k, āk−1), k = 2, . . . , K
Define: Evaluating at d1(X1) and dk{X̄k, d̄k−1(X̄k−1)},
πd,1(X1) = pA1|X1{d1(X1) | X1}
πd,k(X̄k) = pAk|X̄k,Āk−1[dk{X̄k, d̄k−1(X̄k−1)} | X̄k, d̄k−1(X̄k−1)], k = 2, . . . , K
279
Inverse probability weighted estimator
Define: Indicator that the options actually received are consistent with those selected by d at all K decisions:
Cd = Cd̄K = I{A1 = d1(X1), . . . , AK = dK(X̄K)} = I{Ā = d̄(X̄)}
Inverse probability weighted estimator for V(d) = E{Y*(d)}:
V̂IPW(d) = n⁻¹ Σ_{i=1}^{n} Cd,i Yi / {πd,1(X1i) Π_{k=2}^{K} πd,k(X̄ki)}   (5.27)
• (5.27) is an unbiased estimator for V(d) because
E[Cd Y / {πd,1(X1) Π_{k=2}^{K} πd,k(X̄k)}] = E{Y*(d)}   (5.28)
under SUTVA (5.10), SRA (5.11), and the positivity assumption (5.15)
280
Inverse probability weighted estimator
Sketch of proof of (5.28): We show the more general result
E[ Cd f(X, Y) / { πd,1(X1) Π_{k=2}^K πd,k(Xk) } ] = E[f{X*K(dK−1), Y*(d)}]   (5.29)
so that (5.28) follows by taking f(x, y) = y
• Using SUTVA and (5.7),
E[ Cd f(X, Y) / { πd,1(X1) Π_{k=2}^K πd,k(Xk) } ]
  = E[ Cd f{X*K(dK−1), Y*(d)} / { πd,1(X1) Π_{k=2}^K πd,k{X*k(dk−1)} } ]
  = E( E[ I{A = d(X)} f{X*K(dK−1), Y*(d)} / { πd,1(X1) Π_{k=2}^K πd,k{X*k(dk−1)} } | X1, W* ] )
  = E[ P{A = d(X) | X1, W*} f{X*K(dK−1), Y*(d)} / { πd,1(X1) Π_{k=2}^K πd,k{X*k(dk−1)} } ]   (5.30)
281
Inverse probability weighted estimator
From (5.30): Must show
πd,1(X1) Π_{k=2}^K πd,k{X*k(dk−1)} = P{A = d(X) | X1, W*} > 0   (5.31)
• For (x1, w) such that P(X1 = x1, W* = w) > 0, show
P{A = d(X) | X1 = x1, W* = w} > 0
• There exist x2, . . . , xK such that P{X*k(dk−1) = xk} > 0, k = 2, . . . , K,
because X*K(dK−1) is a function of W*
• Using SUTVA,
P{A = d(X) | X1 = x1, W* = w}
  = P{A1 = d1(x1) | X1 = x1, W* = w}
  × Π_{k=2}^K P[Ak = dk{xk, dk−1(xk−1)} | Ak−1 = dk−1(xk−1), X1 = x1, W* = w]
282
Inverse probability weighted estimator
• Must show each term in this factorization is well defined; true if
P{Ak = dk(xk), X1 = x1, W* = w} > 0, k = 1, . . . , K
• Proceed by induction; when k = 1,
P{A1 = d1(x1), X1 = x1, W* = w} = P{A1 = d1(x1) | X1 = x1, W* = w} P(X1 = x1, W* = w)
is positive if P{A1 = d1(x1) | X1 = x1, W* = w} > 0
• This holds because P(X1 = x1) > 0, so x1 ∈ Γ1 and d1(x1) ∈ Ψ1(x1), and thus
by the positivity assumption
P{A1 = d1(x1) | X1 = x1, W* = w} = P{A1 = d1(x1) | X1 = x1} > 0
• Now assume P{Ak = dk(xk), X1 = x1, W* = w} > 0 and show
P{Ak+1 = dk+1(xk+1), X1 = x1, W* = w} > 0
283
Inverse probability weighted estimator
• X*k(dk−1) includes X1, so (X1 = x1, W* = w) and {X*k+1(dk) = xk+1, W* = w}
are equivalent, and thus
P{Ak+1 = dk+1(xk+1), X1 = x1, W* = w}
  = P[Ak+1 = dk+1{xk+1, dk(xk)} | Ak = dk(xk), X*k+1(dk) = xk+1, W* = w]
  × P{Ak = dk(xk), X1 = x1, W* = w}
• Must show the first term on the right-hand side is > 0; by SUTVA and SRA, this term
  = P[Ak+1 = dk+1{xk+1, dk(xk)} | Xk+1 = xk+1, Ak = dk(xk), W* = w]
  = P[Ak+1 = dk+1{xk+1, dk(xk)} | Xk+1 = xk+1, Ak = dk(xk)]
• Because {x1, d1(x1)} ∈ Λ1 and P{X*k(dk−1) = xk} > 0, the argument leading to
(5.23) yields {xk+1, dk(xk)} ∈ Γk+1 and dk+1{xk+1, dk(xk)} ∈ Ψk+1(hk+1), so
this term is > 0
284
Inverse probability weighted estimator
• Applying these results yields
P{A = d(X) | X1, W*}
  = pA1|X1{d1(X1) | X1} × Π_{k=2}^K pAk|Xk,Ak−1[dk{Xk, dk−1(Xk−1)} | Xk, dk−1(Xk−1)]
  = πd,1(X1) Π_{k=2}^K πd,k{X*k(dk−1)}
which is the desired equality
285
Inverse probability weighted estimator
Result: (5.28) holds:
E[ Cd Y / { πd,1(X1) Π_{k=2}^K πd,k(Xk) } ] = E{Y*(d)}
• VIPW(d) is an unbiased estimator for V(d)
• Alternative representation of E{Y*(d)} in terms of observed data
• Taking instead, for fixed y,
f{X*K(dK−1), Y*(d)} = I{Y*(d) = y}
yields an alternative representation of the marginal density of Y*(d)
286
Inverse probability weighted estimator
More generally: For fixed (x1, x2, . . . , xK, y), treating all variables as
discrete, taking
f{X*K(dK−1), Y*(d)} = I{X1 = x1, X*2(d1) = x2, . . . , X*K(dK−1) = xK, Y*(d) = y}
yields an alternative representation of the joint density
P{X1 = x1, X*2(d1) = x2, . . . , X*K(dK−1) = xK, Y*(d) = y}
  = pX1,X*2(d1),X*3(d2),...,X*K(dK−1),Y*(d)(x1, . . . , xK, y)
  = E[ Cd I(X1 = x1, X2 = x2, . . . , XK = xK, Y = y) / { πd,1(X1) Π_{k=2}^K πd,k(Xk) } ]
287
Inverse probability weighted estimator
Denominator:
πd,1(X1) Π_{k=2}^K πd,k(Xk)
• Can be interpreted as the propensity for receiving treatment consistent with
regime d through all K decisions, given observed history
• Depends on the propensities of treatment given observed history,
pAk|Hk(ak|hk) = P(Ak = ak | Hk = hk), k = 1, . . . , K
• In practice: Posit and fit models for the propensities depending on
parameters γk, obtaining estimators γk, k = 1, . . . , K;
considerations for this momentarily
• These models induce models πd,1(X1; γ1) and πd,k(Xk; γk), k = 2, . . . , K
288
Inverse probability weighted estimator
In practice: IPW estimator
VIPW(d) = n−1 Σ_{i=1}^n [ Cd,i Yi / { πd,1(X1i; γ1) Π_{k=2}^K πd,k(Xki; γk) } ]   (5.32)
• Consistent estimator for V(d) as long as the propensity models are correctly
specified
289
Inverse probability weighted estimator
Alternative estimator:
VIPW*(d) = [ Σ_{i=1}^n Cd,i / { πd,1(X1i; γ1) Π_{k=2}^K πd,k(Xki; γk) } ]−1
  × Σ_{i=1}^n Cd,i Yi / { πd,1(X1i; γ1) Π_{k=2}^K πd,k(Xki; γk) }   (5.33)
• A weighted average of the Yi, and also a consistent estimator
• Can be considerably more precise than VIPW(d)
290
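As a concrete illustration of (5.32) and (5.33), both estimators are easy to assemble once the consistency indicators and fitted propensities are in hand. The following numpy sketch is not from the source; the array names (`C`, `Y`, `pi`) and the toy data are hypothetical, and the propensity fitting is assumed done elsewhere.

```python
import numpy as np

def value_ipw(C, Y, pi, normalized=False):
    """IPW estimate of V(d): (5.32), or the weighted-average version (5.33).

    C  : (n,) 0/1 indicators C_{d,i} that treatments received agree with d
    Y  : (n,) observed outcomes
    pi : (n, K) fitted propensities pi_{d,k} at each of the K decisions
    """
    w = C / np.prod(pi, axis=1)           # inverse probability weights
    if normalized:
        return np.sum(w * Y) / np.sum(w)  # (5.33): weights renormalized to sum to 1
    return np.mean(w * Y)                 # (5.32): simple IPW average

# Toy illustration: K = 2 with known 1:1 randomization at each decision,
# so a subject is consistent with a fixed regime d with probability 0.25
rng = np.random.default_rng(0)
n = 5000
pi = np.full((n, 2), 0.5)
C = rng.binomial(1, 0.25, size=n)
Y = 2.0 + rng.normal(size=n)              # toy data with V(d) = 2
v_ipw = value_ipw(C, Y, pi)
v_ipw_star = value_ipw(C, Y, pi, normalized=True)
```

Because C is independent of Y in this toy, both estimates should land near 2; in real data, the normalized version (5.33) often shows noticeably smaller sampling variation.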
Considerations for propensity modeling
Simplest case: Two options at each decision point, feasible for all
individuals, Ak = {0, 1}, k = 1, . . . , K
• Can work with propensity scores
πk(hk) = P(Ak = 1 | Hk = hk) = πk(xk, ak−1), k = 1, . . . , K
pA1|X1(a1|x1) = pA1|H1(a1|h1) = π1(h1)^a1 {1 − π1(h1)}^(1−a1)
pAk|Xk,Ak−1(ak | xk, ak−1) = pAk|Hk(ak|hk) = πk(hk)^ak {1 − πk(hk)}^(1−ak), k = 2, . . . , K
• Using (5.2),
πd,1(X1) = π1(X1)^d1(X1) {1 − π1(X1)}^[1−d1(X1)]
πd,k(Xk) = πk{Xk, dk−1(Xk−1)}^dk{Xk,dk−1(Xk−1)}
  × [1 − πk{Xk, dk−1(Xk−1)}]^[1−dk{Xk,dk−1(Xk−1)}], k = 2, . . . , K
291
Considerations for propensity modeling
• Can posit a parametric model πk(hk; γk) for each k; e.g., logistic
regression models as in (3.12),
πk(hk; γk) = exp(γk1 + γk2^T hk) / {1 + exp(γk1 + γk2^T hk)}, γk = (γk1, γk2^T)^T, k = 1, . . . , K
with hk = (1, hk^T)^T, k = 1, . . . , K, and fit via maximum likelihood to
obtain γk, k = 1, . . . , K
292
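As a sketch of the maximum likelihood step, here is a Newton-Raphson fit of a single logistic propensity model in plain numpy; this is not code from the source, and the design matrix `H`, treatment vector `A`, and toy data are all hypothetical.

```python
import numpy as np

def fit_logistic(H, A, iters=25):
    """ML fit of a logistic propensity model pi_k(h_k; gamma_k), as in (3.12),
    via Newton-Raphson.  H: (n, p) history design matrix (include a column of
    ones for the intercept); A: (n,) 0/1 treatments.  Returns the ML estimate."""
    g = np.zeros(H.shape[1])
    for _ in range(iters):
        p = 1.0 / (1.0 + np.exp(-H @ g))    # current fitted propensities
        W = p * (1.0 - p)
        grad = H.T @ (A - p)                # score vector
        hess = H.T @ (H * W[:, None])       # observed information
        g = g + np.linalg.solve(hess, grad)
    return g

# toy data: a single decision with scalar history
rng = np.random.default_rng(1)
n = 4000
x = rng.normal(size=n)
H = np.column_stack([np.ones(n), x])        # h_k augmented with intercept
true_g = np.array([-0.5, 1.0])
A = rng.binomial(1, 1.0 / (1.0 + np.exp(-H @ true_g)))
g_hat = fit_logistic(H, A)
```

Fitting one such model per decision k and evaluating it at the option dictated by the rule, dk{Xk, dk−1(Xk−1)}, gives the induced models πd,k(Xk; γk).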
Considerations for propensity modeling
More than 2 options: Feasible for all individuals, Ak = {1, . . . , mk}
• As on Slide 170, with
ωk(hk, ak) = P(Ak = ak | Hk = hk), k = 1, . . . , K
or ωk(xk, ak−1, ak) = P(Ak = ak | Xk = xk, Ak−1 = ak−1)
where
ωk(hk, mk) = 1 − Σ_{ak=1}^{mk−1} ωk(hk, ak)
• Then
πd,1(X1) = Π_{a1=1}^{m1} ω1(X1, a1)^I{d1(X1)=a1} = Σ_{a1=1}^{m1} I{d1(X1) = a1} ω1(X1, a1)
πd,k(Xk) = Π_{ak=1}^{mk} ωk{Xk, dk−1(Xk−1), ak}^I[dk{Xk,dk−1(Xk−1)}=ak]
  = Σ_{ak=1}^{mk} I[dk{Xk, dk−1(Xk−1)} = ak] ωk{Xk, dk−1(Xk−1), ak}, k = 2, . . . , K
293
Considerations for propensity modeling
• Can posit parametric models ωk(hk, ak; γk) for each k, e.g., multinomial
(polytomous) logistic regression models
ωk(hk, ak; γk) = exp(hk^T γk,ak) / {1 + Σ_{j=1}^{mk−1} exp(hk^T γkj)}, ak = 1, . . . , mk − 1
with hk = (1, hk^T)^T and γk = (γk1^T, . . . , γk,mk−1^T)^T, and fit via maximum
likelihood to obtain γk, k = 1, . . . , K
294
Considerations for propensity modeling
Feasible sets: ℓk distinct subsets, each with ≥ 2 options in Ak,
Ak,l = {1, . . . , mkl}, l = 1, . . . , ℓk, k = 1, . . . , K
• For k = 1, . . . , K and ak ∈ {1, . . . , mkl} = Ak,l,
ωk,l(hk, ak) = P(Ak = ak | Hk = hk)
ωk,l(hk, mkl) = 1 − Σ_{ak=1}^{mkl−1} ωk,l(hk, ak)
• For k = 1, . . . , K, can posit ℓk separate logistic or multinomial
(polytomous) logistic regression models
ωk,l(hk, ak; γkl), l = 1, . . . , ℓk
295
Considerations for propensity modeling
• For each k = 1, . . . , K, this implies an overall model
ωk(hk, ak; γk) = Σ_{l=1}^{ℓk} I{sk(hk) = l} ωk,l(hk, ak; γkl), γk = (γk1^T, . . . , γk,ℓk^T)^T
with the understanding that ak takes values in the relevant distinct subset
• For subsets with a single option, P(Ak = ak | Hk = hk) = 1, and
no model is needed
• Induced models
πd,1(X1; γ1) = Σ_{l=1}^{ℓ1} I{s1(h1) = l} Π_{a1=1}^{m1l} ω1,l(X1, a1; γ1l)^I{d1(X1)=a1}
πd,k(Xk; γk) = Σ_{l=1}^{ℓk} I{sk(hk) = l} Π_{ak=1}^{mkl} ωk,l{Xk, dk−1(Xk−1), ak; γkl}^I[dk{Xk,dk−1(Xk−1)}=ak]
296
Considerations for propensity modeling
In a SMART: Propensities are known
• As on Slide 90, it is still preferable to estimate the propensities
• Estimate P(Ak = ak | Hk = hk) by sample proportions corresponding to
each distinct subset Ak,l
• Example: Acute leukemia, ℓ2 = 2, with r2 the component of h2 indicating
response
Ψ2(h2) = {M1, M2} = A2,1, r2 = 1
Ψ2(h2) = {S1, S2} = A2,2, r2 = 0
• Estimate P(A2 = a2 | H2 = h2) for a2 ∈ A2,1 by
{ Σ_{i=1}^n I(R2i = 1) }−1 Σ_{i=1}^n I(R2i = 1) I(A2i = a2)
• Estimate P(A2 = a2 | H2 = h2) for a2 ∈ A2,2 by
{ Σ_{i=1}^n I(R2i = 0) }−1 Σ_{i=1}^n I(R2i = 0) I(A2i = a2)
297
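The sample-proportion calculation above amounts to computing proportions within each feasible subset. A minimal sketch with illustrative (hypothetical, not real trial) arrays:

```python
import numpy as np

# Sample-proportion estimates of P(A2 = a2 | H2 = h2) in the acute leukemia
# SMART: separate proportions within responders (A_{2,1} = {M1, M2}) and
# nonresponders (A_{2,2} = {S1, S2}).
R2 = np.array([1, 1, 1, 0, 0, 0, 1, 0])             # responder indicators R_{2i}
A2 = np.array(["M1", "M2", "M1", "S2", "S1", "S2", "M2", "S1"])

def prop(a2, responder):
    """Proportion receiving option a2 among the relevant response subset."""
    mask = R2 == (1 if responder else 0)
    return np.mean(A2[mask] == a2)

print(prop("M1", responder=True))    # estimate of P(A2 = M1 | responder) → 0.5
print(prop("S2", responder=False))   # estimate of P(A2 = S2 | nonresponder) → 0.5
```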
Inverse probability weighted estimator
Equivalent representation: For both VIPW(d) and VIPW*(d)
• When Cd = 1, A1 = d1(X1) and Ak = dk{Xk, dk−1(Xk−1)}, k = 2, . . . , K,
so it is straightforward that the denominator
πd,1(X1i; γ1) Π_{k=2}^K πd,k(Xki; γk)
can be replaced by the fitted model for
Π_{k=1}^K pAk|Hk(Aki | Hki)
without altering the value of either estimator
298
Inverse probability weighted estimator
• E.g., with 2 feasible options at each decision point, replace the
denominator by
Π_{k=1}^K πk(Hki; γk)^Aki {1 − πk(Hki; γk)}^(1−Aki)
• With more than 2 options, by
Π_{k=1}^K ωk(Hki, Aki; γk)
• In some literature accounts, the estimators are defined with
these quantities in the denominators
• This will be important later
299
Inverse probability weighted estimator
Remarks:
• VIPW (d) and VIPW∗(d) are consistent estimators as long as the
propensity models are correctly specified; can be inconsistent
otherwise
• Except for estimation of the propensities, these estimators use outcome
data only from subjects for whom A = d(X)
• For larger K, there are likely very few such subjects
• These estimators can be unstable in finite samples, because the estimated
propensity of receiving treatment consistent with d in the denominator can
be small, and can exhibit large sampling variation
• As in the single decision case, even if the propensities are
known, it is preferable to estimate them via maximum likelihood
• Large sample approximations to the sampling distributions of
VIPW (d) and VIPW∗(d) follow from M-estimation theory
300
Augmented inverse probability weighted estimator
Dropout analogy: For individuals for whom A ≠ d(X)
• Analogy to a monotone coarsening problem; i.e., “drop out”
• Under SUTVA, an individual with
A1 = d1(X1), . . . , Ak−1 = dk−1{Xk−1, dk−2(Xk−2)}
but for whom
Ak ≠ dk{Xk, dk−1(Xk−1)}
has X2 = X*2(d1), . . . , Xk = X*k(dk−1); i.e., the accrued covariate history
through Decision k coincides with its potential version under d
• But his Xk+1, . . . , XK and Y do not reflect the potential outcomes
he would have if he had continued to follow the rules dk, . . . , dK
• Effectively, in terms of receiving treatment options consistent
with d, the individual has “dropped out” at Decision k
301
Augmented inverse probability weighted estimator
Improvement: From the perspective of information on {X1, X*K(dK−1), Y*(d)},
and especially Y*(d):
• X*k(dk−1) is observed, but X*k+1(dk), . . . , X*K(dK−1), Y*(d) are “missing”
• Can we exploit the partial information from such individuals to
gain efficiency?
Dropout analogy: Under SUTVA, SRA, and positivity, this “drop out”
is according to a missing (coarsening) at random mechanism (shown
by Zhang et al., 2013), allowing semiparametric theory for monotone
coarsening at random to be used (Robins et al., 1994; Tsiatis, 2006)
302
Augmented inverse probability weighted estimator
Define:
• Indicator of treatment options consistent with d through Decision k:
Cdk = I{Ak = dk(Xk)}, k = 1, . . . , K, with Cd0 ≡ 1
• For brevity, write π̄d,1(X1) = πd,1(X1) and
π̄d,k(Xk) = { Π_{j=2}^k πd,j(Xj) } πd,1(X1), k = 2, . . . , K,
with π̄d,0 ≡ 1
• Substitution of propensity models leads to models
π̄d,k(Xk; γ̄k), k = 1, . . . , K, where γ̄k = (γ1^T, . . . , γk^T)^T, k = 1, . . . , K
303
Augmented inverse probability weighted estimator
Analogous to the AIPW estimator for a single decision: Under these
conditions, if the models for the propensities
pAk|Hk(ak|hk), k = 1, . . . , K
are correctly specified, then from semiparametric theory, all consistent and
asymptotically normal estimators for V(d) for fixed d ∈ D are
asymptotically equivalent to an estimator of the form
VAIPW(d) = n−1 Σ_{i=1}^n [ Cd,i Yi / { πd,1(X1i; γ1) Π_{k=2}^K πd,k(Xki; γk) }
  + Σ_{k=1}^K { Cdk−1,i / π̄d,k−1(Xk−1,i; γ̄k−1) − Cdk,i / π̄d,k(Xki; γ̄k) } Lk(Xki) ]   (5.34)
• Lk(xk) are arbitrary functions of xk, k = 1, . . . , K
• γ̄k = (γ1^T, . . . , γk^T)^T, k = 1, . . . , K
304
Augmented inverse probability weighted estimator
Features of VAIPW(d):
• Taking Lk(xk) ≡ 0, k = 1, . . . , K, yields VIPW(d) in (5.32)
• When K = 1, because Cd0 ≡ 1, π̄d,0 ≡ 1, and X1 = H1, (5.34)
reduces to the AIPW estimator (3.16) for the single decision case
• If all K propensity models are correctly specified, so that there are
true values γk,0 of γk, k = 1, . . . , K, the “augmentation term” in
(5.34) evaluated at γk,0, k = 1, . . . , K, converges in probability to
zero for arbitrary Lk(xk), k = 1, . . . , K
• Thus, (5.34) is a consistent estimator for V(d), with asymptotic
variance depending on the choice of Lk(xk)
305
Augmented inverse probability weighted estimator
Efficient estimator: From semiparametric theory, the estimator with
smallest asymptotic variance among those in the class (5.34) takes
Lk(xk) = E{Y*(d) | X*k(dk−1) = xk}, k = 1, . . . , K   (5.35)
• The conditional expectations in (5.35) are functionals of the
distribution of the potential outcomes
{X1, X*2(d1), . . . , X*K(dK−1), Y*(d)} and are unknown in practice
• As in the single decision case, posit and fit models
Qd,k(xk; βk), k = 1, . . . , K,   (5.36)
for E{Y*(d) | X*k(dk−1) = xk}, k = 1, . . . , K, and substitute in (5.34)
• We present approaches to developing and fitting the models (5.36)
when we discuss optimal regimes later
306
Augmented inverse probability weighted estimator
Result: Given estimators βk, k = 1, . . . , K, obtained as discussed
later, the AIPW estimator is
VAIPW(d) = n−1 Σ_{i=1}^n [ Cd,i Yi / { πd,1(X1i; γ1) Π_{k=2}^K πd,k(Xki; γk) }
  + Σ_{k=1}^K { Cdk−1,i / π̄d,k−1(Xk−1,i; γ̄k−1) − Cdk,i / π̄d,k(Xki; γ̄k) } Qd,k(Xki; βk) ]   (5.37)
• VAIPW(d) in (5.37) is doubly robust; i.e., it is consistent if either (1)
the models for the propensities, and thus for πd,1(x1) and
πd,k(xk), k = 2, . . . , K, or (2) the models (5.36) for
E{Y*(d) | X*k(dk−1) = xk} are correctly specified
• If the observed data are from a SMART, the propensities are
known, and VAIPW(d) is guaranteed to be consistent regardless
of the models (5.36)
307
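Once the propensity fits, cumulative consistency indicators, and Q-model fits are available, (5.37) is a direct sum. The numpy sketch below is not from the source; the array names and the sanity-check data are hypothetical, and all model fitting is assumed done elsewhere.

```python
import numpy as np

def value_aipw(Ck, Y, pi, Q):
    """Assemble the AIPW estimate (5.37) from already-fitted ingredients.

    Ck : (n, K), Ck[:, k-1] = indicator of consistency with d through Decision k
    Y  : (n,) observed outcomes
    pi : (n, K) fitted per-decision propensities pi_{d,k}; cumulative products
         give the pibar_{d,k}
    Q  : (n, K) fitted values Q_{d,k}(Xbar_ki; beta_k)
    """
    n, K = pi.shape
    pibar = np.cumprod(pi, axis=1)                       # pibar_{d,k}, k = 1..K
    ipw = Ck[:, -1] * Y / pibar[:, -1]                   # leading IPW term of (5.37)
    aug = np.zeros(n)
    for k in range(K):                                   # augmentation sum over k = 1..K
        Cprev = Ck[:, k - 1] if k > 0 else np.ones(n)    # C_{d,0} = 1
        pprev = pibar[:, k - 1] if k > 0 else np.ones(n) # pibar_{d,0} = 1
        aug += (Cprev / pprev - Ck[:, k] / pibar[:, k]) * Q[:, k]
    return np.mean(ipw + aug)

# Sanity check: with Q identically 0, (5.37) collapses to the IPW estimator (5.32)
rng = np.random.default_rng(2)
n, K = 2000, 2
pi = np.full((n, K), 0.5)
A1, A2 = rng.binomial(1, 0.5, n), rng.binomial(1, 0.5, n)
Ck = np.column_stack([A1, A1 * A2])                      # regime: option 1 at both decisions
Y = 1.0 + rng.normal(size=n)
v_aipw = value_aipw(Ck, Y, pi, np.zeros((n, K)))
v_ipw = np.mean(Ck[:, -1] * Y / 0.25)
```

The sanity check mirrors the first feature on Slide 305: setting the augmentation functions to zero recovers (5.32) exactly.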
Augmented inverse probability weighted estimator
Efficient estimator: If all models are correctly specified, VAIPW(d) is
efficient, achieving the smallest asymptotic variance among estimators in
the class (5.34)
Remarks:
• VAIPW (d) can exhibit considerably less sampling variation than
the simple IPW estimators (5.32) and (5.33)
• As for VIPW (d) and VIPW∗(d), large sample approximate
sampling distribution can be obtained via M-estimation theory
(although pretty involved)
• Zhang et al. (2013) propose (5.37) in a different but equivalent
form following directly from that in Tsiatis (2006)
308
Estimation via marginal structural models
Alternative approach: When scientific interest focuses on regimes
with simple rules that can be represented in terms of a
low-dimensional parameter η
• Formally, restrict attention to a subset Dη ⊂ D with elements
dη = {d1(h1; η1), . . . , dK(hK; ηK)}, η = (η1^T, . . . , ηK^T)^T
• Goal: Estimate the value of a fixed regime dη* ∈ Dη
corresponding to a particular η*, V(dη*) = E{Y*(dη*)}
• As in the example coming next, Dη may be even simpler, with
η = η1 = · · · = ηK
309
Estimation via marginal structural models
Example: Treatment of HIV-infected patients
• K monthly clinic visits; at each, decide whether or not to administer
antiretroviral (ARV) therapy for the next month based on CD4
T-cell count (cells/mm3) (larger is better)
• Two feasible options: 1 = ARV therapy for the next month or 0 =
no ARV therapy for the next month
• Focus on rules involving a common CD4 threshold η, below
which ARV therapy is administered and above which it is not:
dk(hk; η) = I(CD4k ≤ η), k = 1, . . . , K
where CD4k = CD4 T-cell count (cells/mm3) immediately prior to
Decision k
• Dη comprises regimes dη with rules of this form
• Final outcome: Y*(dη) = negative viral load (−viral RNA
copies/mL) measured 1 month after Decision K if ARV therapy were
administered according to the rules in dη
310
Estimation via marginal structural models
Focus: Estimation of V(dη*) for a particular threshold η*
• Ideal: Data from a study in which HIV patients were given ARV
therapy according to dη* and viral load was ascertained after
Decision K
• More likely: Available data are observational, with information
Xk, k = 1, . . . , K, including CD4k; the options Ak actually received;
and Y = −(final viral load) recorded
• In these data, there are likely very few if any individuals who
received ARV therapy according to the rules in dη*
• An approach to using these data to estimate V(dη*) is suggested
by the work of Orellana et al. (2010a,b)
311
Estimation via marginal structural models
Marginal structural model: A model for V(dη) as a function of η
• I.e., with V(dη) = µ(η) for some function µ( · ), posit a parametric
model
V(dη) = E{Y*(dη)} = µ(η; α)   (5.38)
referred to as a marginal structural model (MSM)
• For example, a quadratic model
µ(η; α) = α1 + α2η + α3η², α = (α1, α2, α3)^T
• Estimate α based on the data by an appropriate method to obtain
α, and estimate V(dη*) by
VMSM(dη*) = µ(η*; α)
• η plays the role of a “covariate,” and µ(η*; α) is the “predicted
value” at the particular η* of interest
312
Estimation via marginal structural models
Hope: The MSM is a correct specification of the true relationship
µ(η) across a plausible range of thresholds of interest
Estimation of α: Ideally, data from a prospective randomized study
• Each subject i = 1, . . . , n is randomized to one of m
predetermined thresholds η(j), j = 1, . . . , m, in the range of
interest and receives ARV therapy according to dη(j)
• A natural estimator α solves in α the estimating equation
Σ_{i=1}^n Σ_{j=1}^m Cdη(j),i {∂µ(η(j); α)/∂α} w(η(j)) {Yi − µ(η(j); α)} = 0   (5.39)
where w(η) is a weight function and Cdη = I{A = dη(X)}
• w(η) ≡ 1 yields OLS estimation
• Each subject i has treatment experience consistent with exactly one
of the η(j), j = 1, . . . , m
• (5.39) is an unbiased estimating equation if µ(η; α) is correctly specified
313
Estimation via marginal structural models
Observational data: The approach is motivated by (5.39)
• Some individuals have treatment experience consistent with ≥ 1
values of η; e.g., with K = 2:
Experience                                      Consistent with
CD41 = 300, A1 = 1, CD42 = 400, A2 = 0          300 ≤ η < 400
CD41 = 300, A1 = 1, CD42 = 400, A2 = 1          η ≥ 400
CD41 = 300, A1 = 0, CD42 = 400, A2 = 0          η < 300
• Others have treatment experience consistent with no value of η:
Experience                                      Consistent with
CD41 = 300, A1 = 0, CD42 = 400, A2 = 1          no η
314
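The bookkeeping in the table can be automated: each (CD4k, Ak) pair either lower-bounds η (treated requires CD4k ≤ η) or upper-bounds it (untreated requires CD4k > η), and the consistent set is the intersection. A small sketch with a hypothetical helper (not from the source):

```python
# Thresholds eta consistent with an observed (CD4, A) history under the rule
# d_k(h_k; eta) = I(CD4_k <= eta); returns the half-open interval [lo, hi),
# or None if no eta is consistent.
def consistent_range(history):
    lo, hi = -float("inf"), float("inf")
    for cd4, a in history:
        if a == 1:
            lo = max(lo, cd4)              # treated: requires eta >= CD4_k
        else:
            hi = min(hi, cd4)              # untreated: requires eta < CD4_k
    return (lo, hi) if lo < hi else None

print(consistent_range([(300, 1), (400, 0)]))  # → (300, 400)
print(consistent_range([(300, 0), (400, 1)]))  # → None
```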
Estimation via marginal structural models
Observational data: The estimator α is the solution to
Σ_{i=1}^n ∫_{Dη} [ Cdη,i / { πdη,1(X1i; γ1) Π_{k=2}^K πdη,k(Xki; γk) }
  × {∂µ(η; α)/∂α} w(η) {Yi − µ(η; α)} ] dν(dη) = 0
• dν(dη) is an appropriate dominating measure on dη ∈ Dη
• w(η) is a weight function
• A given individual i can contribute information on multiple
thresholds or on none at all
• As in the IPW estimators, each of her contributions is weighted
by the reciprocal of an estimator for the propensity of receiving
treatment consistent with dη given observed history
315
Estimation via marginal structural models
For example: With interest in η ∈ [100, 500], for j = 1, . . . , m, partition
via η(j) = 100 + 400(j − 1)/(m − 1), and let dν(dη) place point mass on each η(j):
Σ_{i=1}^n Σ_{j=1}^m [ Cdη(j),i / { πη(j),1(X1i; γ1) Π_{k=2}^K πη(j),k(Xki; γk) }
  × {∂µ(η(j); α)/∂α} w(η(j)) {Yi − µ(η(j); α)} ] = 0
• Under SUTVA, SRA, and positivity, if µ(η; α) and the propensity
models are correctly specified, these estimating equations can
be shown to be unbiased, and α is an M-estimator
• If there is a sufficient number of individuals who received
treatment consistent with at least one η(j), j = 1, . . . , m, α should
be a reasonable estimator in practice
• An augmented version of the estimating equation is also possible
316
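Because the quadratic µ(η; α) is linear in α, with w(η) ≡ 1 the point-mass estimating equation above is exactly a weighted least-squares normal equation pooled over subject-threshold pairs. A sketch of that solve, not from the source; the array names are hypothetical, and the consistency indicators `C` and inverse-propensity weights `W` are assumed computed elsewhere:

```python
import numpy as np

def fit_msm(Y, C, W, etas):
    """Solve the point-mass MSM estimating equation with w(eta) = 1 and
    quadratic mu(eta; alpha) = a1 + a2*eta + a3*eta^2.

    Y    : (n,) outcomes
    C    : (n, m), C[i, j] = 1 if subject i is consistent with d_{eta(j)}
    W    : (n, m) inverse-propensity weights at each threshold eta(j)
    etas : (m,) grid eta(1), ..., eta(m)
    """
    n, m = C.shape
    X = np.column_stack([np.ones(m), etas, etas**2])   # mu's "design" in eta
    XtWX = np.zeros((3, 3))
    XtWy = np.zeros(3)
    for i in range(n):
        wi = C[i] * W[i]                               # subject i's weight at each eta(j)
        XtWX += X.T @ (X * wi[:, None])
        XtWy += X.T @ (wi * Y[i])
    alpha = np.linalg.solve(XtWX, XtWy)                # normal equations
    return alpha                                       # then V_MSM(d_eta*) = mu(eta*; alpha)
```

With all indicators and weights equal to one, the fit reduces to regressing a constant outcome profile on (1, η, η²), so α recovers (ȳ, 0, 0); that makes a convenient sanity check.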
5. Multiple Decision Treatment Regimes: Framework and
Fundamentals
5.1 Multiple Decision Treatment Regimes
5.2 Statistical Framework
5.3 The g-Computation Algorithm
5.4 Estimation of the Value of a Fixed Regime
5.5 Key References
317
References
Orellana, L., Rotnitzky, A., and Robins, J. M. (2010a). Dynamic
regime marginal structural mean models for estimation of optimal
dynamic treatment regimes, part I: Main content. The International
Journal of Biostatistics, 6.
Orellana, L., Rotnitzky, A., and Robins, J. M. (2010b). Dynamic
regime marginal structural mean models for estimation of optimal
dynamic treatment regimes, part II: Proofs and additional results. The
International Journal of Biostatistics, 6.
Robins, J. (1986). A new approach to causal inference in mortality
studies with sustained exposure periods - application to control of the
healthy worker survivor effect. Mathematical Modelling, 7,
1393–1512.
Robins, J. (1987). Addendum to: A new approach to causal inference
in mortality studies with sustained exposure periods - application to
control of the healthy worker survivor effect. Computers and
Mathematics with Applications, 14, 923–945.
Robins, J. M., Rotnitzky, A., and Zhao, L. P. (1994). Estimation of
regression coefficients when some regressors are not always observed.
Journal of the American Statistical Association, 89, 846–866.
318
References
Robins, J. M. (2004). Optimal structural nested models for optimal
sequential decisions. In Lin, D. Y. and Heagerty, P., editors,
Proceedings of the Second Seattle Symposium on Biostatistics,
pages 189–326, New York: Springer.
Tsiatis, A. A. (2006). Semiparametric Theory and Missing Data. New
York: Springer.
Zhang, B., Tsiatis, A. A., Laber, E. B., and Davidian, M. (2013).
Robust estimation of optimal dynamic treatment regimes for
sequential treatment decisions. Biometrika, 100, 681–694.
Zhao, Y., Zeng, D., Rush, A. J., and Kosorok, M. R. (2012). Estimating
individual treatment rules using outcome weighted learning. Journal
of the American Statistical Association, 107, 1106–1118.
319
2019 PMED Spring Course - Multiple Decision Treatment Regimes: Fundamentals - Marie Davidian, February 13, 2019
2019 Fall Series: Special Guest Lecture - 0-1 Phase Transitions in High Dimen...
 
Causal Inference Opening Workshop - Causal Discovery in Neuroimaging Data - F...
Causal Inference Opening Workshop - Causal Discovery in Neuroimaging Data - F...Causal Inference Opening Workshop - Causal Discovery in Neuroimaging Data - F...
Causal Inference Opening Workshop - Causal Discovery in Neuroimaging Data - F...
 
Causal Inference Opening Workshop - Smooth Extensions to BART for Heterogeneo...
Causal Inference Opening Workshop - Smooth Extensions to BART for Heterogeneo...Causal Inference Opening Workshop - Smooth Extensions to BART for Heterogeneo...
Causal Inference Opening Workshop - Smooth Extensions to BART for Heterogeneo...
 
Causal Inference Opening Workshop - A Bracketing Relationship between Differe...
Causal Inference Opening Workshop - A Bracketing Relationship between Differe...Causal Inference Opening Workshop - A Bracketing Relationship between Differe...
Causal Inference Opening Workshop - A Bracketing Relationship between Differe...
 
Causal Inference Opening Workshop - Testing Weak Nulls in Matched Observation...
Causal Inference Opening Workshop - Testing Weak Nulls in Matched Observation...Causal Inference Opening Workshop - Testing Weak Nulls in Matched Observation...
Causal Inference Opening Workshop - Testing Weak Nulls in Matched Observation...
 
Causal Inference Opening Workshop - Difference-in-differences: more than meet...
Causal Inference Opening Workshop - Difference-in-differences: more than meet...Causal Inference Opening Workshop - Difference-in-differences: more than meet...
Causal Inference Opening Workshop - Difference-in-differences: more than meet...
 
Causal Inference Opening Workshop - New Statistical Learning Methods for Esti...
Causal Inference Opening Workshop - New Statistical Learning Methods for Esti...Causal Inference Opening Workshop - New Statistical Learning Methods for Esti...
Causal Inference Opening Workshop - New Statistical Learning Methods for Esti...
 
Causal Inference Opening Workshop - Bipartite Causal Inference with Interfere...
Causal Inference Opening Workshop - Bipartite Causal Inference with Interfere...Causal Inference Opening Workshop - Bipartite Causal Inference with Interfere...
Causal Inference Opening Workshop - Bipartite Causal Inference with Interfere...
 
Causal Inference Opening Workshop - Bridging the Gap Between Causal Literatur...
Causal Inference Opening Workshop - Bridging the Gap Between Causal Literatur...Causal Inference Opening Workshop - Bridging the Gap Between Causal Literatur...
Causal Inference Opening Workshop - Bridging the Gap Between Causal Literatur...
 
Causal Inference Opening Workshop - Some Applications of Reinforcement Learni...
Causal Inference Opening Workshop - Some Applications of Reinforcement Learni...Causal Inference Opening Workshop - Some Applications of Reinforcement Learni...
Causal Inference Opening Workshop - Some Applications of Reinforcement Learni...
 
Causal Inference Opening Workshop - Bracketing Bounds for Differences-in-Diff...
Causal Inference Opening Workshop - Bracketing Bounds for Differences-in-Diff...Causal Inference Opening Workshop - Bracketing Bounds for Differences-in-Diff...
Causal Inference Opening Workshop - Bracketing Bounds for Differences-in-Diff...
 
Causal Inference Opening Workshop - Assisting the Impact of State Polcies: Br...
Causal Inference Opening Workshop - Assisting the Impact of State Polcies: Br...Causal Inference Opening Workshop - Assisting the Impact of State Polcies: Br...
Causal Inference Opening Workshop - Assisting the Impact of State Polcies: Br...
 
Causal Inference Opening Workshop - Experimenting in Equilibrium - Stefan Wag...
Causal Inference Opening Workshop - Experimenting in Equilibrium - Stefan Wag...Causal Inference Opening Workshop - Experimenting in Equilibrium - Stefan Wag...
Causal Inference Opening Workshop - Experimenting in Equilibrium - Stefan Wag...
 
Causal Inference Opening Workshop - Targeted Learning for Causal Inference Ba...
Causal Inference Opening Workshop - Targeted Learning for Causal Inference Ba...Causal Inference Opening Workshop - Targeted Learning for Causal Inference Ba...
Causal Inference Opening Workshop - Targeted Learning for Causal Inference Ba...
 
Causal Inference Opening Workshop - Bayesian Nonparametric Models for Treatme...
Causal Inference Opening Workshop - Bayesian Nonparametric Models for Treatme...Causal Inference Opening Workshop - Bayesian Nonparametric Models for Treatme...
Causal Inference Opening Workshop - Bayesian Nonparametric Models for Treatme...
 
2019 Fall Series: Special Guest Lecture - Adversarial Risk Analysis of the Ge...
2019 Fall Series: Special Guest Lecture - Adversarial Risk Analysis of the Ge...2019 Fall Series: Special Guest Lecture - Adversarial Risk Analysis of the Ge...
2019 Fall Series: Special Guest Lecture - Adversarial Risk Analysis of the Ge...
 
2019 Fall Series: Professional Development, Writing Academic Papers…What Work...
2019 Fall Series: Professional Development, Writing Academic Papers…What Work...2019 Fall Series: Professional Development, Writing Academic Papers…What Work...
2019 Fall Series: Professional Development, Writing Academic Papers…What Work...
 
2019 GDRR: Blockchain Data Analytics - Machine Learning in/for Blockchain: Fu...
2019 GDRR: Blockchain Data Analytics - Machine Learning in/for Blockchain: Fu...2019 GDRR: Blockchain Data Analytics - Machine Learning in/for Blockchain: Fu...
2019 GDRR: Blockchain Data Analytics - Machine Learning in/for Blockchain: Fu...
 
2019 GDRR: Blockchain Data Analytics - QuTrack: Model Life Cycle Management f...
2019 GDRR: Blockchain Data Analytics - QuTrack: Model Life Cycle Management f...2019 GDRR: Blockchain Data Analytics - QuTrack: Model Life Cycle Management f...
2019 GDRR: Blockchain Data Analytics - QuTrack: Model Life Cycle Management f...
 

Recently uploaded

URLs and Routing in the Odoo 17 Website App
URLs and Routing in the Odoo 17 Website AppURLs and Routing in the Odoo 17 Website App
URLs and Routing in the Odoo 17 Website AppCeline George
 
Nutritional Needs Presentation - HLTH 104
Nutritional Needs Presentation - HLTH 104Nutritional Needs Presentation - HLTH 104
Nutritional Needs Presentation - HLTH 104misteraugie
 
18-04-UA_REPORT_MEDIALITERAСY_INDEX-DM_23-1-final-eng.pdf
18-04-UA_REPORT_MEDIALITERAСY_INDEX-DM_23-1-final-eng.pdf18-04-UA_REPORT_MEDIALITERAСY_INDEX-DM_23-1-final-eng.pdf
18-04-UA_REPORT_MEDIALITERAСY_INDEX-DM_23-1-final-eng.pdfssuser54595a
 
Privatization and Disinvestment - Meaning, Objectives, Advantages and Disadva...
Privatization and Disinvestment - Meaning, Objectives, Advantages and Disadva...Privatization and Disinvestment - Meaning, Objectives, Advantages and Disadva...
Privatization and Disinvestment - Meaning, Objectives, Advantages and Disadva...RKavithamani
 
1029 - Danh muc Sach Giao Khoa 10 . pdf
1029 -  Danh muc Sach Giao Khoa 10 . pdf1029 -  Danh muc Sach Giao Khoa 10 . pdf
1029 - Danh muc Sach Giao Khoa 10 . pdfQucHHunhnh
 
mini mental status format.docx
mini    mental       status     format.docxmini    mental       status     format.docx
mini mental status format.docxPoojaSen20
 
Advanced Views - Calendar View in Odoo 17
Advanced Views - Calendar View in Odoo 17Advanced Views - Calendar View in Odoo 17
Advanced Views - Calendar View in Odoo 17Celine George
 
Web & Social Media Analytics Previous Year Question Paper.pdf
Web & Social Media Analytics Previous Year Question Paper.pdfWeb & Social Media Analytics Previous Year Question Paper.pdf
Web & Social Media Analytics Previous Year Question Paper.pdfJayanti Pande
 
Employee wellbeing at the workplace.pptx
Employee wellbeing at the workplace.pptxEmployee wellbeing at the workplace.pptx
Employee wellbeing at the workplace.pptxNirmalaLoungPoorunde1
 
CARE OF CHILD IN INCUBATOR..........pptx
CARE OF CHILD IN INCUBATOR..........pptxCARE OF CHILD IN INCUBATOR..........pptx
CARE OF CHILD IN INCUBATOR..........pptxGaneshChakor2
 
A Critique of the Proposed National Education Policy Reform
A Critique of the Proposed National Education Policy ReformA Critique of the Proposed National Education Policy Reform
A Critique of the Proposed National Education Policy ReformChameera Dedduwage
 
Hybridoma Technology ( Production , Purification , and Application )
Hybridoma Technology  ( Production , Purification , and Application  ) Hybridoma Technology  ( Production , Purification , and Application  )
Hybridoma Technology ( Production , Purification , and Application ) Sakshi Ghasle
 
Accessible design: Minimum effort, maximum impact
Accessible design: Minimum effort, maximum impactAccessible design: Minimum effort, maximum impact
Accessible design: Minimum effort, maximum impactdawncurless
 
Arihant handbook biology for class 11 .pdf
Arihant handbook biology for class 11 .pdfArihant handbook biology for class 11 .pdf
Arihant handbook biology for class 11 .pdfchloefrazer622
 
Mastering the Unannounced Regulatory Inspection
Mastering the Unannounced Regulatory InspectionMastering the Unannounced Regulatory Inspection
Mastering the Unannounced Regulatory InspectionSafetyChain Software
 
Introduction to AI in Higher Education_draft.pptx
Introduction to AI in Higher Education_draft.pptxIntroduction to AI in Higher Education_draft.pptx
Introduction to AI in Higher Education_draft.pptxpboyjonauth
 
BASLIQ CURRENT LOOKBOOK LOOKBOOK(1) (1).pdf
BASLIQ CURRENT LOOKBOOK  LOOKBOOK(1) (1).pdfBASLIQ CURRENT LOOKBOOK  LOOKBOOK(1) (1).pdf
BASLIQ CURRENT LOOKBOOK LOOKBOOK(1) (1).pdfSoniaTolstoy
 
Kisan Call Centre - To harness potential of ICT in Agriculture by answer farm...
Kisan Call Centre - To harness potential of ICT in Agriculture by answer farm...Kisan Call Centre - To harness potential of ICT in Agriculture by answer farm...
Kisan Call Centre - To harness potential of ICT in Agriculture by answer farm...Krashi Coaching
 

Recently uploaded (20)

URLs and Routing in the Odoo 17 Website App
URLs and Routing in the Odoo 17 Website AppURLs and Routing in the Odoo 17 Website App
URLs and Routing in the Odoo 17 Website App
 
Nutritional Needs Presentation - HLTH 104
Nutritional Needs Presentation - HLTH 104Nutritional Needs Presentation - HLTH 104
Nutritional Needs Presentation - HLTH 104
 
18-04-UA_REPORT_MEDIALITERAСY_INDEX-DM_23-1-final-eng.pdf
18-04-UA_REPORT_MEDIALITERAСY_INDEX-DM_23-1-final-eng.pdf18-04-UA_REPORT_MEDIALITERAСY_INDEX-DM_23-1-final-eng.pdf
18-04-UA_REPORT_MEDIALITERAСY_INDEX-DM_23-1-final-eng.pdf
 
Privatization and Disinvestment - Meaning, Objectives, Advantages and Disadva...
Privatization and Disinvestment - Meaning, Objectives, Advantages and Disadva...Privatization and Disinvestment - Meaning, Objectives, Advantages and Disadva...
Privatization and Disinvestment - Meaning, Objectives, Advantages and Disadva...
 
1029 - Danh muc Sach Giao Khoa 10 . pdf
1029 -  Danh muc Sach Giao Khoa 10 . pdf1029 -  Danh muc Sach Giao Khoa 10 . pdf
1029 - Danh muc Sach Giao Khoa 10 . pdf
 
mini mental status format.docx
mini    mental       status     format.docxmini    mental       status     format.docx
mini mental status format.docx
 
Advanced Views - Calendar View in Odoo 17
Advanced Views - Calendar View in Odoo 17Advanced Views - Calendar View in Odoo 17
Advanced Views - Calendar View in Odoo 17
 
Web & Social Media Analytics Previous Year Question Paper.pdf
Web & Social Media Analytics Previous Year Question Paper.pdfWeb & Social Media Analytics Previous Year Question Paper.pdf
Web & Social Media Analytics Previous Year Question Paper.pdf
 
Employee wellbeing at the workplace.pptx
Employee wellbeing at the workplace.pptxEmployee wellbeing at the workplace.pptx
Employee wellbeing at the workplace.pptx
 
CARE OF CHILD IN INCUBATOR..........pptx
CARE OF CHILD IN INCUBATOR..........pptxCARE OF CHILD IN INCUBATOR..........pptx
CARE OF CHILD IN INCUBATOR..........pptx
 
A Critique of the Proposed National Education Policy Reform
A Critique of the Proposed National Education Policy ReformA Critique of the Proposed National Education Policy Reform
A Critique of the Proposed National Education Policy Reform
 
Hybridoma Technology ( Production , Purification , and Application )
Hybridoma Technology  ( Production , Purification , and Application  ) Hybridoma Technology  ( Production , Purification , and Application  )
Hybridoma Technology ( Production , Purification , and Application )
 
Accessible design: Minimum effort, maximum impact
Accessible design: Minimum effort, maximum impactAccessible design: Minimum effort, maximum impact
Accessible design: Minimum effort, maximum impact
 
Arihant handbook biology for class 11 .pdf
Arihant handbook biology for class 11 .pdfArihant handbook biology for class 11 .pdf
Arihant handbook biology for class 11 .pdf
 
Mastering the Unannounced Regulatory Inspection
Mastering the Unannounced Regulatory InspectionMastering the Unannounced Regulatory Inspection
Mastering the Unannounced Regulatory Inspection
 
Introduction to AI in Higher Education_draft.pptx
Introduction to AI in Higher Education_draft.pptxIntroduction to AI in Higher Education_draft.pptx
Introduction to AI in Higher Education_draft.pptx
 
TataKelola dan KamSiber Kecerdasan Buatan v022.pdf
TataKelola dan KamSiber Kecerdasan Buatan v022.pdfTataKelola dan KamSiber Kecerdasan Buatan v022.pdf
TataKelola dan KamSiber Kecerdasan Buatan v022.pdf
 
BASLIQ CURRENT LOOKBOOK LOOKBOOK(1) (1).pdf
BASLIQ CURRENT LOOKBOOK  LOOKBOOK(1) (1).pdfBASLIQ CURRENT LOOKBOOK  LOOKBOOK(1) (1).pdf
BASLIQ CURRENT LOOKBOOK LOOKBOOK(1) (1).pdf
 
Código Creativo y Arte de Software | Unidad 1
Código Creativo y Arte de Software | Unidad 1Código Creativo y Arte de Software | Unidad 1
Código Creativo y Arte de Software | Unidad 1
 
Kisan Call Centre - To harness potential of ICT in Agriculture by answer farm...
Kisan Call Centre - To harness potential of ICT in Agriculture by answer farm...Kisan Call Centre - To harness potential of ICT in Agriculture by answer farm...
Kisan Call Centre - To harness potential of ICT in Agriculture by answer farm...
 

2019 PMED Spring Course - Multiple Decision Treatment Regimes: Fundamentals - Marie Davidian, February 13, 2019

Decision points
Decision points depend on the context: For example
• Acute leukemia: Decisions are at milestones in the disease
progression (diagnosis, evaluation of response)
• HIV infection: Decisions are according to a schedule (at monthly
clinic visits)
• Cardiovascular disease (CVD): Decisions are made upon
occurrence of an event (myocardial infarction)
Perspective:
• Timing of decisions can be fixed (HIV) or random (leukemia)
• If any individual would reach all K decision points, the distinction is
not important; we assume this going forward
• If different individuals can experience different numbers of decisions
(CVD), the number of decisions reached is random, and a more
specialized framework is required
• The latter also is the case if the outcome is a time to an event
220
Example: K = 2, acute leukemia
Decision 1: A1 = {C1,C2}; x1 = h1 includes age (years), baseline white
blood cell count WBC1 (×10³/µl)
• Example rule:
d1(h1) = C2 I(C) + C1 I(Cᶜ), C = {age < 50, WBC1 < 10}
Decision 2: A2 = {M1, M2, S1, S2}
• x2 and thus h2 includes Decision 2 WBC2; ECOG, EVENT (≥ grade 3
adverse event from induction therapy), RESP
• Example rule:
d2(h2) = I(RESP = 1){M1 I(M) + M2 I(Mᶜ)}
       + I(RESP = 0){S1 I(S) + S2 I(Sᶜ)}   (5.1)
M = {WBC1 < 11.2, WBC2 < 10.5, EVENT = 0, ECOG ≤ 2}
S = {age > 60, WBC2 < 11.0, ECOG ≥ 2}
221
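The example rules above are just indicator-based case splits, so they can be sketched directly in code. A minimal illustration, assuming histories stored as dictionaries with hypothetical keys such as "wbc1":

```python
def d1(h1):
    """Decision 1 rule: C2 if age < 50 and WBC1 < 10, else C1."""
    return "C2" if h1["age"] < 50 and h1["wbc1"] < 10 else "C1"

def d2(h2):
    """Decision 2 rule as in (5.1): maintenance options for responders,
    salvage options for nonresponders."""
    if h2["resp"] == 1:
        in_M = (h2["wbc1"] < 11.2 and h2["wbc2"] < 10.5
                and h2["event"] == 0 and h2["ecog"] <= 2)
        return "M1" if in_M else "M2"
    in_S = h2["age"] > 60 and h2["wbc2"] < 11.0 and h2["ecog"] >= 2
    return "S1" if in_S else "S2"
```

Note that d2 returns only maintenance options when RESP = 1 and only salvage options when RESP = 0, mirroring the restriction of the rule to the feasible options for each history.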
Recursive representation of rules
Fact: If the K rules in d ∈ D are followed by an individual, the options
selected at each decision depend only on the evolving x1, . . . , xK
• Decision 1: Option selected is d1(h1) = d1(x1)
• Between Decisions 1 and 2: x2
• Decision 2: Rule d2(h2) = d2(x2, a1); the option selected depends on
the option selected at Decision 1:
d2{x2, d1(x1)}
• Between Decisions 2 and 3: x3
• Decision 3: Rule d3(h3) = d3(x3, a2) = d3(x3, a1, a2); the option
selected depends on those at Decisions 1 and 2:
d3[x3, d1(x1), d2{x2, d1(x1)}]
• And so on. . .
222
Recursive representation of rules
Concise representation: For k = 2, . . . , K
d2(x2) = [d1(x1), d2{x2, d1(x1)}]
d3(x3) = [d1(x1), d2{x2, d1(x1)}, d3{x3, d2(x2)}]
...                                                  (5.2)
dK(xK) = [d1(x1), d2{x2, d1(x1)}, . . . , dK{xK, dK−1(xK−1)}]
• dk(xk) comprises the options selected through Decision k
• d(x) = dK(xK)
• This representation will be useful later
223
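The recursion in (5.2) can be computed by feeding each rule the covariates so far and the options already selected. A minimal sketch, assuming each rule is a function of (xbar_k, abar_{k-1}):

```python
def options_through_k(rules, xs):
    """Compute dbar_k(xbar_k) as in (5.2): the sequence of options
    selected through Decision k when the rules in d are followed along
    the evolving covariates x1, ..., xk.
    rules[j] is d_{j+1}, mapping (xbar, abar) to an option in A_{j+1}."""
    a = []
    for k, rule in enumerate(rules, start=1):
        # d_k depends on (x1,...,xk) and the options selected so far
        a.append(rule(tuple(xs[:k]), tuple(a)))
    return a
```

For example, with toy rules d1(x1) = x1 and d2(xbar2, a1) = x2 + a1, the selected options along xs = [1, 2] are [1, 3]: each later option is built from earlier ones exactly as in (5.2).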
Redundancy
From the perspective of an individual following the K rules in d:
• The definition of rules dk(hk) = dk(xk, ak−1) is redundant
• a1 is determined by x1, a2 is determined by x2 = (x1, x2), . . .
• But the definition of dk as a function of hk = (xk, ak−1) is useful for
characterizing and estimating an optimal regime later
Illustration of redundancy: K = 2, A1 = {0, 1}, A2 = {0, 1},
X1 = {0, 1}, X2 = {0, 1} (x1 and x2 are binary)
• d = (d1, d2) ∈ D, d1 : H1 = X1 → A1, d2 : H2 = X2 × A1 → A2
• For each value of h1 = x1, d1 must return a value in A1, e.g.,
d1(x1 = 0) = 0, d1(x1 = 1) = 1
224
Redundancy
Illustration of redundancy, continued:
• For each of the 2³ = 8 possible values of h2 = (x1, x2, a1), d2 must
return a value in A2, e.g.,
d2(x1 = 0, x2 = 0, a1 = 0) = 0
d2(x1 = 0, x2 = 0, a1 = 1) = 1∗
d2(x1 = 0, x2 = 1, a1 = 0) = 1
d2(x1 = 0, x2 = 1, a1 = 1) = 1∗
d2(x1 = 1, x2 = 0, a1 = 0) = 0∗
d2(x1 = 1, x2 = 0, a1 = 1) = 1
d2(x1 = 1, x2 = 1, a1 = 0) = 0∗
d2(x1 = 1, x2 = 1, a1 = 1) = 0
• Configurations with ∗ could never occur if an individual followed
regime d
225
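The starred configurations can be enumerated mechanically: h2 = (x1, x2, a1) is unreachable under d exactly when a1 ≠ d1(x1). A minimal sketch using the example rule d1(x1) = x1 from the previous slide:

```python
from itertools import product

def unreachable_h2(d1):
    """List histories h2 = (x1, x2, a1) that can never occur when an
    individual follows d, i.e., those with a1 != d1(x1)."""
    return [(x1, x2, a1)
            for x1, x2, a1 in product((0, 1), repeat=3)
            if a1 != d1(x1)]
```

With d1(x1) = x1 this returns exactly the four starred configurations in the table above.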
5. Multiple Decision Treatment Regimes: Framework and
Fundamentals
5.1 Multiple Decision Treatment Regimes
5.2 Statistical Framework
5.3 The g-Computation Algorithm
5.4 Estimation of the Value of a Fixed Regime
5.5 Key References
226
Outcome of interest
Determination of outcome:
• Can be ascertained after Decision K, e.g., for HIV-infected patients,
outcome = viral load (viral RNA copies/mL) measured at a final
clinic visit after Decision K
• Can be defined using intervening information, e.g., for HIV-infected
patients with CD4 count (cells/mm³) measured at each clinic visit
(Decisions k = 2, . . . , K) and at a final clinic visit after Decision K,
outcome = total # CD4 counts > 200 cells/mm³
Convention: As in the single decision case, assume larger outcomes
are preferred
• E.g., because smaller viral load is better, take outcome = −viral load
227
Potential outcomes for K decisions
Intuitively: For a randomly chosen individual with history X1
• If he/she were to receive a1 ∈ A1 at Decision 1, the evolution of
his/her disease/disorder process after Decision 1 would be influenced
by a1
• Suggests: X*2(a1) = intervening information that would arise between
Decisions 1 and 2 after receiving a1 ∈ A1 at Decision 1
• If he/she then were to receive a2 ∈ A2 at Decision 2, the evolution of
his/her disease/disorder process after Decision 2 would be influenced
by a1 followed by a2
• Suggests: X*3(a2) = intervening information that would arise between
Decisions 2 and 3 after receiving a1 ∈ A1 at Decision 1 and a2 at
Decision 2
• And so on. . .
228
Potential outcomes for K decisions
Ultimately: If he/she were to receive options a = aK = (a1, . . . , aK) at
Decisions 1, . . . , K,
Y*(aK) = Y*(a) = outcome that would be achieved
Summarizing: Potential information at each decision and potential
outcome if an individual were to receive a = (a1, . . . , aK)
{X1, X*2(a1), X*3(a2), . . . , X*K(aK−1), Y*(a)}
• X1 may or may not be included
• Set of all possible potential outcomes for all a ∈ A
W* = {X*2(a1), X*3(a2), . . . , X*K(aK−1), Y*(a),
for a1 ∈ A1, a2 ∈ A2, . . . , aK−1 ∈ AK−1, a ∈ A}   (5.3)
229
Feasible sets and classes of treatment regimes
Example: Acute leukemia, K = 2
A1 = {C1,C2}, A2 = {M1, M2, S1, S2}
• At Decision 1, both options are feasible for all patients
• At Decision 2, only maintenance options are feasible for patients who
respond, and only salvage options are feasible for patients who do
not respond
• This is almost always the case in multiple decision problems at
decision points other than Decision 1
• Of course, it is also possible at Decision 1
230
Feasible sets and classes of treatment regimes
Formal specification: For any history hk = (xk, ak−1) ∈ Hk at
Decision k, k = 1, . . . , K
• The set of feasible treatment options at Decision k is
Ψk(hk) = Ψk(xk, ak−1) ⊆ Ak, k = 1, . . . , K   (5.4)
Ψ1(h1) = Ψ1(x1) ⊆ A1 (a0 null)
• Ψk(hk) is a function mapping Hk to the set of all possible subsets
of Ak
• Ψk(hk) can be a strict subset of Ak or all of Ak, depending on hk
• Collectively: Ψ = (Ψ1, . . . , ΨK)
231
Feasible sets and classes of treatment regimes
Example, revisited: Acute leukemia, K = 2
A1 = {C1,C2}, A2 = {M1, M2, S1, S2}
• Suppose C1 and C2 are feasible for all individuals regardless of h1:
Ψ1(h1) = A1 for all h1
• If h2 indicates response:
Ψ2(h2) = {M1, M2} ⊂ A2 for all such h2
• If h2 indicates nonresponse:
Ψ2(h2) = {S1, S2} ⊂ A2 for all such h2
• More complex specifications of feasible sets that take account of
additional information are of course possible
232
Feasible sets and classes of treatment regimes
Fancier example: Acute leukemia, K = 2
A1 = {C1,C2}, A2 = {M1, M2, S1, S2}
• If C1 is contraindicated for patients with renal impairment:
Ψ1(h1) = {C2} if h1 indicates renal impairment
        = {C1,C2} = A1 if h1 does not
• If S1 increases risk of adverse events in nonresponders with
low WBC2:
Ψ2(h2) = {S2} if h2 indicates nonresponse, WBC2 ≤ w
        = {S1, S2} if h2 indicates nonresponse, WBC2 > w
for some threshold w (×10³/µl)
233
Feasible sets and classes of treatment regimes
Ideally:
• Specification of the feasible sets is dictated by scientific
considerations
• Disease/disorder context, available treatment options, patient
population, etc.
• Specification of Ψk(hk), k = 1, . . . , K, should incorporate only
information in hk that is critical to treatment selection
Regimes and feasible sets: Given specified feasible sets Ψ
• A regime d = (d1, . . . , dK) whose rules select treatment options for
history hk from those in Ψk(hk) is defined in terms of Ψ
• Thus, regimes are Ψ-specific, and the relevant class of all possible
regimes D depends on Ψ (suppressed in the notation)
234
Feasible sets and classes of treatment regimes
In practice: At Decision k, there is a small number ℓk of subsets
Ak,l ⊆ Ak, l = 1, . . . , ℓk, that are feasible sets for all hk
• E.g., for acute leukemia, ℓ2 = 2:
A2,1 = {M1,M2}, A2,2 = {S1,S2}
• If r2 is the component of h2 indicating response:
Ψ2(h2) = A2,1 for h2 with r2 = 1
        = A2,2 for h2 with r2 = 0
Decision rules: Different rule for each subset
• E.g., for acute leukemia, as in (5.1):
d2(h2) = I(r2 = 1) d2,1(h2) + I(r2 = 0) d2,2(h2)
d2,1(h2) is a rule selecting maintenance therapy for responders
d2,2(h2) is a rule selecting salvage therapy for nonresponders
235
Feasible sets and classes of treatment regimes
In general: With ℓk distinct subsets Ak,l ⊆ Ak, l = 1, . . . , ℓk, as
feasible sets
• Define sk(hk) = 1, . . . , ℓk according to which of these subsets Ψk(hk)
corresponds for given hk
• dk,l(hk) is the rule corresponding to the lth subset Ak,l
• The decision rule at Decision k has the form
dk(hk) = Σ_{l=1}^{ℓk} I{sk(hk) = l} dk,l(hk)   (5.5)
• Henceforth, it is understood that dk(hk) may be expressed as in (5.5)
where appropriate
236
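Operationally, (5.5) is a dispatch: sk(hk) picks out which feasible subset applies, and the corresponding subrule dk,l selects an option from it. A minimal sketch for the leukemia Decision 2 example, with illustrative constant subrules:

```python
def make_rule(s_k, subrules):
    """Build d_k as in (5.5). s_k(h_k) returns the index l of the
    feasible subset for history h_k; subrules[l] is the rule d_{k,l}
    selecting an option from the subset A_{k,l}."""
    def d_k(h_k):
        return subrules[s_k(h_k)](h_k)
    return d_k

# Leukemia Decision 2: subset 1 = maintenance (responders),
# subset 2 = salvage (nonresponders); the subrules here are toy constants
s2 = lambda h2: 1 if h2["resp"] == 1 else 2
d2 = make_rule(s2, {1: lambda h2: "M1", 2: lambda h2: "S2"})
```

In a real regime each subrule dk,l would itself depend on hk, as in (5.1).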
Feasible sets and classes of treatment regimes
Note: For any Ψ-specific regime d = (d1, . . . , dK)
• At Decision k, dk(hk) = dk(xk, ak−1) returns only options in
Ψk(hk) = Ψk(xk, ak−1), i.e.,
dk(hk) = dk(xk, ak−1) ∈ Ψk(hk) ⊆ Ak
• Thus, dk need map only a subset of Hk = Xk × Ak−1 to Ak
• We discuss this more shortly
237
Potential outcomes for a fixed regime d ∈ D
Intuitively: If a randomly chosen individual with history X1 were to
receive treatment options by following the rules in d
• Decision 1: Treatment determined by d1
• X*2(d1) = intervening information that would arise between
Decisions 1 and 2
• Decision 2: Treatment determined by d2
• X*3(d2) = intervening information that would arise between
Decisions 2 and 3
...
• X*k(dk−1) = intervening information that would arise between
Decisions k − 1 and k, k = 2, . . . , K
• Y*(d) = Y*(dK) = outcome that would be achieved if all rules in d
were followed
Potential outcomes under regime d:
{X1, X*2(d1), X*3(d2), . . . , X*K(dK−1), Y*(d)}   (5.6)
238
Potential outcomes for a fixed regime d ∈ D
Formally: These potential outcomes are functions of W* in (5.3)
• Define the accrued potential information
X̄*k(ak−1) = {X1, X*2(a1), X*3(a2), . . . , X*k(ak−1)}, k = 2, . . . , K
• Then
X*2(d1) = Σ_{a1∈A1} X*2(a1) I{d1(X1) = a1}
X*k(dk−1) = Σ_{ak−1∈Ak−1} X*k(ak−1) Π_{j=1}^{k−1} I[dj{X̄*j(aj−1), aj−1} = aj],   (5.7)
k = 3, . . . , K
Y*(d) = Σ_{a∈A} Y*(a) Π_{j=1}^{K} I[dj{X̄*j(aj−1), aj−1} = aj]
• Also define
X̄*k(dk−1) = {X1, X*2(d1), X*3(d2), . . . , X*k(dk−1)}, k = 2, . . . , K
239
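(5.7) says that following d selects exactly one trajectory through the potential outcomes in W*. A toy sketch with K = 2 and binary options; the potential-outcome tables here are made-up illustrative values, not data:

```python
# Toy K = 2 illustration of (5.7). x2star[a1] = X*_2(a1);
# ystar[(a1, a2)] = Y*(a1, a2). All values are made up.
x1 = 1
x2star = {0: 0, 1: 1}
ystar = {(0, 0): 2.0, (0, 1): 1.0, (1, 0): 3.0, (1, 1): 5.0}

def y_star_of_d(d1, d2):
    """Y*(d): follow the rules in d through the potential outcomes,
    picking out the one trajectory consistent with d as in (5.7)."""
    a1 = d1(x1)              # option selected at Decision 1
    x2 = x2star[a1]          # intervening information X*_2(d1)
    a2 = d2((x1, x2), a1)    # option selected at Decision 2
    return ystar[(a1, a2)]
```

For instance, the regime with d1(x1) = x1 and d2 selecting the value of x2 yields a1 = 1, X*2(d1) = 1, a2 = 1, and hence Y*(d) = Y*(1, 1).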
Value of a K-decision regime
With these definitions: The value of d ∈ D is
V(d) = E{Y*(d)}
• And an optimal regime dopt ∈ D satisfies
E{Y*(dopt)} ≥ E{Y*(d)} for all d ∈ D   (5.8)
equivalently
dopt = arg max_{d∈D} E{Y*(d)} = arg max_{d∈D} V(d)
240
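In a tiny discrete problem, dopt in (5.8) can be found by brute force over D. An illustrative sketch with K = 1, binary x1 and a1, and a made-up table of conditional mean outcomes (X1 taken uniform on {0, 1}):

```python
from itertools import product

# Made-up model: ystar[x1][a1] = E{Y*(a1) | X1 = x1}
ystar = {0: {0: 1.0, 1: 3.0}, 1: {0: 4.0, 1: 2.0}}

def value(d1):
    """V(d) = E{Y*(d)} under the made-up model, averaging over X1."""
    return 0.5 * sum(ystar[x1][d1[x1]] for x1 in (0, 1))

# Brute force over the four rules d1: {0,1} -> {0,1}, each stored as a dict
rules = [dict(zip((0, 1), acts)) for acts in product((0, 1), repeat=2)]
best = max(rules, key=value)
```

Here the best rule assigns option 1 when x1 = 0 and option 0 when x1 = 1, with value 3.5; with K > 1 decisions, brute force is rarely practical, which motivates the methods in later sections.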
Optimal treatment options vs. optimal decisions
For a randomly chosen individual with history H1:
• The optimal sequence of treatment options for this individual is
arg max_{a∈A} Y*(a),
which of course is not knowable in practice
• All that is known at baseline is H1, and dopt1, . . . , doptK select the
(feasible) options corresponding to the largest expected outcome
• From the definition of Y*(d) in (5.7),
Y*(d) ≤ max_{a∈A} Y*(a) for all d ∈ D =⇒ Y*(dopt) ≤ max_{a∈A} Y*(a),
so an optimal regime might not select optimal options for this
individual at each decision
• Rather, dopt leads to the optimal sequence of decisions that can be
made based on the available information on this individual
241
Data
Ideally: For i = 1, . . . , n, i.i.d.
(X1i, A1i, X2i, A2i, . . . , XKi, AKi, Yi) = (XKi, AKi, Yi) = (Xi, Ai, Yi)   (5.9)
• X1 = baseline information at Decision 1, taking values in X1
• Ak = treatment option actually received at Decision k,
k = 1, . . . , K, taking values in Ak
• Xk = intervening information between Decisions k − 1 and k,
k = 2, . . . , K, taking values in Xk
• Xk = (X1, . . . , Xk), X = XK = (X1, . . . , XK), and
Ak = (A1, . . . , Ak), A = AK = (A1, . . . , AK)
• History H1 = X1,
Hk = (X1, A1, . . . , Xk−1, Ak−1, Xk) = (Xk, Ak−1), k = 2, . . . , K
• Y = observed outcome (after Decision K or function of HK)
242
Data
Data sources:
• Longitudinal observational study: Retrospective from an existing
database, completed conventional clinical trial with followup,
prospective cohort study
• Randomized study: Prospective clinical trial conducted specifically
for this purpose (SMART)
Longitudinal observational study: Challenges
• Time-dependent confounding: A subject’s history at each decision
point is both determined by past treatments and used to select
future treatments
• Characteristics associated with both treatment selection and future
characteristics/ultimate outcome may not be captured in the data
243
Data
Do we really need data like these? Can’t we just “piece together” an
optimal regime using single decision methods on data from separate
studies (with different subjects in each)?
• E.g., acute leukemia: Estimate dopt1 from a study comparing
{C1,C2} and dopt2 from separate studies comparing {M1, M2}
and {S1, S2}
• Delayed effects: The induction therapy with the highest proportion
of responders might have other effects that render subsequent
treatments less effective in regard to survival
• Require data from a study involving the same subjects over the
entire sequence of decisions
245
Statistical problem
Ultimate goal: Based on the data (5.9), estimate dopt ∈ D satisfying
(5.8), i.e.,
E{Y*(dopt)} ≥ E{Y*(d)} for all d ∈ D
Challenge: dopt is defined in terms of the potential outcomes (5.6)
• Must be able to express this definition in terms of the observed
data (5.9)
• In particular, for any d ∈ D, must be able to identify the
distribution of
{X1, X*2(d1), X*3(d2), . . . , X*K(dK−1), Y*(d)},
which depends on that of (X1, W*), from the distribution of
(X1, A1, X2, A2, . . . , XK, AK, Y)
• Possible under the following assumptions generalizing those in (3.4),
(3.5), and (3.6)
246
  • 32. Identifiability assumptions SUTVA (consistency): Xk = X* k (Ak−1) = Σ_{ak−1 ∈ Ak−1} X* k (ak−1)I(Ak−1 = ak−1), k = 2, . . . , K (5.10) Y = Y* (A) = Σ_{a ∈ A} Y* (a)I(A = a) • Observed intervening information and the final observed outcome are those that would potentially be seen under the treatments actually received at each decision point • Are the same (consistent) regardless of how the treatments are administered at each decision point 247
  • 33. Identifiability assumptions Sequential randomization assumption (SRA): Robins (1986) W* ⊥⊥ Ak |Xk , Ak−1, k = 1, . . . , K, where A0 is null (5.11) equivalently W* ⊥⊥ Ak |Hk , k = 1, . . . , K • (5.11) is the strongest version; weaker versions require treatment selection to be independent only of future potential outcomes • Unverifiable from the observed data • Unlikely to hold for data from a longitudinal observational study not carried out with estimation of treatment regimes in mind • But (5.11) holds by design in a SMART 248
  • 34. Identifiability assumptions Positivity assumption: With feasible sets of treatment options at each decision point, more complicated • Intuitively: To identify the distribution of the potential outcomes (5.6) from that of the observed data (5.9), all treatment options in the feasible sets Ψk (hk ) = Ψk (xk , ak−1) must be represented in the observed data, k = 1, . . . , K • That is, there must be individuals in the data who received each of the options in Ψk (hk ), k = 1, . . . , K • E.g., acute leukemia: Decision 2: there must be responders who received each of M1 and M2 and nonresponders who received each of S1 and S2 Built up recursively. . . 249
  • 35. Identifiability assumptions Decision 1: Set of all possible baseline info h1 = x1 Γ1 = {x1 ∈ X1 satisfying P(X1 = x1) > 0} ⊆ X1 = H1 Set of all possible histories and associated options in Ψ1(h1) Λ1 = {(x1, a1) such that x1 = h1 ∈ Γ1, a1 ∈ Ψ1(h1)} All options in Λ1 must be represented in the data P(A1 = a1|H1 = h1) > 0 for all (h1, a1) ∈ Λ1 (5.12) First component of the positivity assumption 250
  • 36. Identifiability assumptions Decision 2: All possible histories h2 consistent with following a Ψ-specific regime at Decision 1 Γ2 = {(x2, a1) ∈ X2 × A1 satisfying (x1, a1) ∈ Λ1 and P{X* 2(a1) = x2 | X1 = x1} > 0} ⊆ H2 Under SUTVA (5.10) and SRA (5.11), equivalently Γ2 = {(x2, a1) ∈ X2 × A1 satisfying (x1, a1) ∈ Λ1 and P(X2 = x2 | X1 = x1, A1 = a1) > 0} Set of all possible histories h2 and associated options in Ψ2(h2) Λ2 = {(x2, a2) such that (x2, a1) = h2 ∈ Γ2, a2 ∈ Ψ2(h2)} 251
  • 37. Identifiability assumptions Decision 2, continued: All options in Λ2 must be represented in the data P(A2 = a2 | H2 = h2) > 0 for all (h2, a2) ∈ Λ2 Second component of the positivity assumption ... Decision k: All histories hk consistent with a Ψ-specific regime through Decision k − 1 Γk = {(xk , ak−1) ∈ Xk × Ak−1 satisfying (xk−1, ak−1) ∈ Λk−1 and P{X* k (ak−1) = xk | X* k−1(ak−2) = xk−1} > 0} ⊆ Hk (5.13) = {(xk , ak−1) ∈ Xk × Ak−1 satisfying (xk−1, ak−1) ∈ Λk−1 and P(Xk = xk | Xk−1 = xk−1, Ak−1 = ak−1) > 0} (5.14) 252
  • 38. Identifiability assumptions Decision k, continued: Set of all possible histories hk and associated options in Ψk (hk ) Λk = {(xk , ak ) such that (xk , ak−1) = hk ∈ Γk , ak ∈ Ψk (hk )} All options in Ψk (hk ) must be represented in the data P(Ak = ak | Hk = hk ) > 0 for all (hk , ak ) ∈ Λk kth component of the positivity assumption ... 253
  • 39. Identifiability assumptions Positivity assumption: Summarizing P(Ak = ak |Hk = hk ) = P(Ak = ak |Xk = xk , Ak−1 = ak−1) > 0 for hk = (xk , ak−1) ∈ Γk and ak ∈ Ψk (hk ) = Ψk (xk , ak−1), k = 1, . . . , K (5.15) • The positivity assumption (5.15) holds in a SMART by design if there are subjects with history hk randomized to all options in Ψk (hk ) at each Decision k = 1, . . . , K • No guarantee that for a given Ψ (5.15) holds for data from a longitudinal observational study (more shortly) 254
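As an illustration of the bookkeeping this assumption entails, the following sketch flags history strata in which some feasible option was never actually received. All data, stratum labels, and the `check_positivity` helper are hypothetical, not part of the text:

```python
from collections import defaultdict

def check_positivity(records, feasible):
    """Flag history strata where some feasible option is never observed.

    records:  iterable of (stratum, action) pairs, where `stratum` is any
              hashable summary of the history h_k at a given decision
    feasible: dict mapping stratum -> set of feasible options Psi_k(h_k)
    """
    seen = defaultdict(set)
    for stratum, action in records:
        seen[stratum].add(action)
    # Return, for each stratum, the feasible options with no observed recipients
    return {s: opts - seen[s] for s, opts in feasible.items() if opts - seen[s]}

# Acute leukemia, Decision 2: responders need M1 and M2 represented,
# nonresponders need S1 and S2 represented
records = [("resp", "M1"), ("resp", "M2"), ("nonresp", "S1")]
feasible = {"resp": {"M1", "M2"}, "nonresp": {"S1", "S2"}}
violations = check_positivity(records, feasible)
# no nonresponder received S2, so positivity fails in that stratum
```

In a SMART such a check should come back empty by design; for observational data, nonempty output signals that the class of regimes must be restricted (the Ψmax construction discussed below).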
  • 40. Identifiability assumptions Equivalence of (5.13) and (5.14): Assuming SUTVA, SRA, need to show for any hk = (xk , ak−1) ∈ Γk in (5.13) P(Xk = xk | Xk−1 = xk−1, Ak−1 = ak−1) = P{X* k (ak−1) = xk | X* k−1(ak−2) = xk−1} (5.16) • Proof is by induction (k = 1, 2 are immediate) • Repeated use of the following lemma Lemma. Let A and H be random variables, assume W* ⊥⊥ A|H, and consider two functions f1(W*) and f2(W*) of W*. If the event {f2(W*) = f2, H = h, A = a} has positive probability, then P{f1(W* ) = f1|f2(W* ) = f2, H = h, A = a} = P{f1(W* ) = f1|f2(W* ) = f2, H = h} 255
  • 41. Identifiability assumptions Sketch of induction proof: • We need to show for any hk = (xk , ak−1) ∈ Γk in (5.13), k = 1, . . . , K P(Xk = xk | Xk−1 = xk−1, Ak−1 = ak−1) = P{X* k (ak−1) = xk | X* k−1(ak−2) = xk−1} (5.16) • (5.16) is trivial for k = 1 • k = 2: (5.12) implies P(X1 = x1, A1 = a1) > 0 for (x1, a1) ∈ Λ1, so P(X2 = x2 | X1 = x1, A1 = a1) is well defined. Then by SUTVA and SRA, (5.16) holds P(X2 = x2 | X1 = x1, A1 = a1) = P{X* 2(a1) = x2 | X1 = x1, A1 = a1} = P{X* 2(a1) = x2 | X1 = x1} • For general k: Need to show P(Xk−1 = xk−1, Ak−1 = ak−1) > 0, so that P(Xk = xk | Xk−1 = xk−1, Ak−1 = ak−1) is well defined, and then show (5.16) 256
  • 42. Identifiability assumptions • Assume this holds for k − 1 and then show it holds for k. Thus, for hk−1 = (xk−1, ak−2) ∈ Γk−1, P(Xk−2 = xk−2, Ak−2 = ak−2) > 0 and P(Xk−1 = xk−1 | Xk−2 = xk−2, Ak−2 = ak−2) = P{X* k−1(ak−2) = xk−1 | X* k−2(ak−3) = xk−2} > 0 • It follows that P(Xk−1 = xk−1, Ak−2 = ak−2) > 0 • Because hk−1 = (xk−1, ak−2) ∈ Γk−1 and ak−1 ∈ Ψk−1(xk−1, ak−2), P(Ak−1 = ak−1 | Xk−1 = xk−1, Ak−2 = ak−2) > 0 • It then follows that P(Xk−1 = xk−1, Ak−1 = ak−1) > 0 as required • Now show (5.16) using SUTVA, SRA, and the Lemma 257
  • 43. Identifiability assumptions • By repeated use of SUTVA and SRA P(Xk = xk | Xk−1 = xk−1, Ak−1 = ak−1) = P{X* k (ak−1) = xk |Xk−2 = xk−2, X* k−1(ak−2) = xk−1 Ak−3 = ak−3, Ak−2 = ak−2}, (5.17) • By the Lemma with f1(W* ) = X* k (ak−1), A = Ak−2, H = (Xk−2, Ak−3), and f2(W* ) = X* k−1(ak−2), (5.17) = P{X* k (ak−1) = xk |Xk−2 = xk−2, X* k−1(ak−2) = xk−1, Ak−3 = ak−3} = P{X* k (ak−1) = xk |Xk−3 = xk−3, X* k−2(ak−3) = xk−2 X* k−1(ak−2) = xk−1, Ak−4 = ak−4, Ak−3 = ak−3} (5.18) • By the Lemma with f1(W* ) = X* k (ak−1), A = Ak−3, H = (Xk−3, Ak−4), (5.18) = P{X* k (ak−1) = xk |Xk−3 = xk−3, X* k−2(ak−3) = xk−2, X* k−1(ak−2) = xk−1, Ak−4 = ak−4} 258
  • 44. Identifiability assumptions • Continuing to apply the Lemma and SUTVA leads to (5.16), P(Xk = xk | Xk−1 = xk−1, Ak−1 = ak−1) = P{X* k (ak−1) = xk | X* k−1(ak−2) = xk−1} • Because hk = (xk , ak−1) ∈ Γk , the RHS is > 0 • Thus, we have shown that (5.16) holds for k and P(Xk = xk | Xk−1 = xk−1, Ak−1 = ak−1) > 0 for hk = (xk , ak−1) ∈ Γk • Because this holds for k = 1, 2, the result follows by induction 259
  • 45. Identifiability assumptions More precise definition of a regime: A Ψ-specific regime d = (d1, . . . , dK ) satisfies • Each rule dk , k = 1, . . . , K, is a mapping from Γk ⊆ Hk into Ak for which dk (hk ) ∈ Ψk (hk ) for every hk ∈ Γk • The class D of Ψ-specific regimes is the set of all such d 260
  • 46. Identifiability assumptions Observational data: For given Ψ, no guarantee that all options in Ψk (hk ), k = 1, . . . , K, are represented in the data • Define for k = 1, . . . , K Γmax k = {(xk , ak−1) ∈ Xk × Ak−1 satisfying (xk−1, ak−1) ∈ Λmax k−1 and P(Xk = xk | Xk−1 = xk−1, Ak−1 = ak−1) > 0} Ψmax k (hk ) = {ak ∈ Ak satisfying P(Ak = ak |Hk = hk ) > 0 for all hk = (xk , ak−1) ∈ Γmax k } Λmax k = {(xk , ak ) such that (xk , ak−1) = hk ∈ Γmax k , ak ∈ Ψmax k (hk )} • The class of regimes based on Ψmax = (Ψmax 1 , . . . , Ψmax K ) is the largest that can be considered • So must have Ψk (hk ) ⊆ Ψmax k (hk ), k = 1, . . . , K for all hk ∈ Γk ⊆ Γmax k 261
  • 47. 5. Multiple Decision Treatment Regimes: Framework and Fundamentals 5.1 Multiple Decision Treatment Regimes 5.2 Statistical Framework 5.3 The g-Computation Algorithm 5.4 Estimation of the Value of a Fixed Regime 5.5 Key References 262
  • 48. Identifiability result Goal, again: For any Ψ-specific regime d ∈ D, demonstrate that we can identify the distribution of {X1, X* 2(d1), X* 3(d2), . . . , X* K (dK−1), Y* (d)} which depends on that of (X1, W*), from the distribution of (X1, A1, X2, A2, . . . , XK , AK , Y) Recall: Recursive representation dk (xk ) of the treatment options selected by d through Decision k in (5.2) if an individual follows d, k = 2, . . . , K dk (xk ) = [d1(x1), d2{x2, d1(x1)}, . . . , dk {xk , dk−1(xk−1)}] 263
  • 49. g-Computation algorithm Main result: Under SUTVA (5.10), SRA (5.11), and positivity assumption (5.15), the joint density of the potential outcomes {X1, X* 2(d1), X* 3(d2), . . . , X* K (dK−1), Y*(d)} can be obtained as pX1,X* 2(d1),X* 3(d2),...,X* K (dK−1),Y*(d)(x1, . . . , xK , y) (5.19) = pY|X ,A{y|x, d(x)} × pXK |XK−1,AK−1 {xK |xK−1, dK−1(xK−1)} ... (5.20) × pX2|X1,A1 {x2|x1, d1(x1)} × pX1 (x1) for any realization (x1, x2, . . . , xK , y) for which (5.19) is positive • Due to Robins (1986, 1987, 2004) 264
  • 50. g-Computation algorithm Additional definition: Relevant realizations (x1, x2, . . . , xK , y) are determined by feasible sets • Assume Y*(a) and Y take values y ∈ Y • Define ΓK+1 = {(x, a, y) ∈ X × A × Y satisfying (x, a) ∈ ΛK and P{Y* (a) = y | X* K (aK−1) = xK } > 0} = {(x, a, y) ∈ X × A × Y satisfying (x, a) ∈ ΛK and P(Y = y | X = x, A = a) > 0} • This equality can be shown similarly to that of (5.13) and (5.14) 265
  • 51. g-Computation algorithm Simplification: Take all random variables discrete, so that (5.19) and (5.20) become P{X1 = x1, X* 2(d1) = x2, . . . , X* K (dK−1) = xK , Y* (d) = y} (5.21) = P{Y = y | XK = xK , AK = dK (xK )} × P{XK = xK | XK−1 = xK−1, AK−1 = dK−1(xK−1)} ... (5.22) × P{X2 = x2 | X1 = x1, A1 = d1(x1)} × P(X1 = x1) for any realization (x1, x2, . . . , xK , y) such that P{X1 = x1, X* 2(d1) = x2, . . . , X* K (dK−1) = xK , Y* (d) = y} > 0 • Need to show (5.21) = (5.22) 266
  • 52. g-Computation algorithm Demonstration: Factorize (5.21) as P{X1 = x1, X* 2(d1) = x2, . . . , X* K (dK−1) = xK , Y* (d) = y} = P{Y* (d) = y | X* K (dK−1) = xK } × P{X* K (dK−1) = xK | X* K−1(dK−2) = xK−1} ... × P{X* 2(d1) = x2 | X1 = x1} × P(X1 = x1) • All components on RHS are positive because (5.21) > 0 267
  • 53. g-Computation algorithm • From (5.22), it suffices to show P{Y* (d) = y | X* K (dK−1) = xK } = P{Y = y | XK = xK , AK = dK (xK )} where P{Y = y | XK = xK , AK = dK (xK )} > 0 and for k = 2, . . . , K P{X* k (dk−1) = xk | X* k−1(dk−2) = xk−1} = P{Xk = xk | Xk−1 = xk−1, Ak−1 = dk−1(xk−1)} where P{Xk = xk | Xk−1 = xk−1, Ak−1 = dk−1(xk−1)} > 0 • These follow immediately if, for X* K+1(d) = Y*(d) {xk , dk−1(xk−1)} ∈ Γk , k = 2, . . . , K + 1 (5.23) which can be shown by induction (first show for k = 2) 268
  • 54. g-Computation algorithm Sketch of induction proof: First take k = 2 • Because P(X1 = x1) > 0, x1 ∈ Γ1 and d is a Ψ-specific regime, d1(x1) ∈ Ψ1(h1), so that {x1, d1(x1)} ∈ Λ1 • Because (5.21) > 0, P{X* 2(d1) = x2 | X1 = x1} > 0 =⇒ {x2, d1(x1)} ∈ Γ2 which is (5.23) for k = 2 • Now assume {xk−1, dk−2(xk−2)} ∈ Γk−1 is true • Because d is a Ψ-specific regime, dk−1(xk−1) ∈ Ψk−1(hk−1) and thus {xk−1, dk−1(xk−1)} ∈ Λk−1 • Because (5.21) > 0, P{X* k (dk−1) = xk | X* k−1(dk−2) = xk−1} > 0 =⇒ {xk , dk−1(xk−1)} ∈ Γk completing the induction proof 269
  • 55. g-Computation algorithm General result: Compactly stated pX1,X* 2(d1),X* 3(d2),...,X* K (dK−1),Y*(d)(x1, . . . , xK , y) (5.24) = pY|X ,A{y|x, dK (x)} [∏_{k=2}^{K} pXk |Xk−1,Ak−1 {xk |xk−1, dk−1(xk−1)}] pX1 (x1) which implies, for example, pY*(d)(y) = ∫_X pY|X ,A{y|x, d(x)} (5.25) × [∏_{k=2}^{K} pXk |Xk−1,Ak−1 {xk |xk−1, dk−1(xk−1)}] pX1 (x1) dνK (xK ) · · · dν1(x1) • dνK (xK ) · · · dν1(x1) is the dominating measure 270
  • 56. g-Computation algorithm Or the value E{Y* (d)} = ∫_X E{Y|X = x, A = d(x)} (5.26) × [∏_{k=2}^{K} pXk |Xk−1,Ak−1 {xk |xk−1, dk−1(xk−1)}] pX1 (x1) dνK (xK ) · · · dν1(x1) • Thus, the value V(d) = E{Y* (d)} of a regime d ∈ D can be expressed in terms of the observed data • So it should be possible to estimate V(d) from these data • As well as to estimate V(dopt ) (later). . . 271
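In the discrete case the integral in (5.26) reduces to a sum, so the value of a regime can be computed exactly from the conditional distributions. A minimal sketch with an invented two-decision generative model (all distributions and numbers below are illustrative, not the text's example), checked against direct simulation of subjects following the regime:

```python
import numpy as np

# Toy K = 2 example: binary X1, X2 and binary options at each decision.
# All conditional distributions below are invented for illustration.
p_x1 = {1: 0.6, 0: 0.4}                                   # P(X1 = x1)
def p_x2(x1, a1):                                          # P(X2 = 1 | x1, a1)
    return 0.3 + 0.4 * a1 + 0.1 * x1
def mean_y(x1, a1, x2, a2):                                # E(Y | x1, a1, x2, a2)
    return 1.0 + a1 + 2.0 * a2 * x2 + 0.5 * x1

d1 = lambda x1: x1                                         # rule 1: treat if X1 = 1
d2 = lambda x1, a1, x2: int(x2 == 1)                       # rule 2: treat if X2 = 1

# g-computation, discrete case of (5.26): sum over (x1, x2) of
#   E{Y | xbar2, d2bar(xbar2)} P{X2 = x2 | x1, d1(x1)} P(X1 = x1)
value = 0.0
for x1 in (0, 1):
    a1 = d1(x1)
    for x2 in (0, 1):
        p2 = p_x2(x1, a1) if x2 == 1 else 1 - p_x2(x1, a1)
        value += mean_y(x1, a1, x2, d2(x1, a1, x2)) * p2 * p_x1[x1]

# Check against direct simulation of subjects who follow d at both decisions
rng = np.random.default_rng(0)
n = 200_000
x1s = rng.binomial(1, 0.6, n)
a1s = x1s                                                   # follow d1
x2s = rng.binomial(1, 0.3 + 0.4 * a1s + 0.1 * x1s)
a2s = (x2s == 1).astype(int)                                # follow d2
ys = rng.normal(1.0 + a1s + 2.0 * a2s * x2s + 0.5 * x1s, 1.0)
# ys.mean() should agree with `value` up to Monte Carlo error
```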
  • 57. 5. Multiple Decision Treatment Regimes: Framework and Fundamentals 5.1 Multiple Decision Treatment Regimes 5.2 Statistical Framework 5.3 The g-Computation Algorithm 5.4 Estimation of the Value of a Fixed Regime 5.5 Key References 272
  • 58. Fixed regime d ∈ D Of interest: Estimation of the value V(d) = E{Y*(d)} of a given, or fixed, Ψ-specific regime d ∈ D • In its own right • As a stepping stone to estimation of an optimal regime dopt • We consider several methods • Throughout, take SUTVA (5.10), SRA (5.11), and positivity assumption (5.15) to hold 273
  • 59. Estimation via g-computation In principle: From (5.25) and (5.26), can estimate pY*(d)(y) or E{Y*(d)} for fixed d ∈ D • Posit parametric models, e.g., for (5.25) pY|X ,A(y|x, a; ζK+1) pXk |Xk−1,Ak−1 (xk |xk−1, ak−1; ζk ), k = 2, . . . , K pX1 (x1; ζ1) depending on ζ = (ζT 1 , . . . , ζT K+1)T (or for (5.26) a model for E(Y|X = x, A = a) instead) • Estimate ζ by maximizing the partial likelihood ∏_{i=1}^{n} pY|X ,A(Yi |Xi , Ai ; ζK+1) [∏_{k=2}^{K} pXk |Xk−1,Ak−1 (Xki |Xk−1,i , Ak−1,i ; ζk )] pX1 (X1i ; ζ1) in ζ to obtain ζ = (ζT 1 , . . . , ζT K+1)T 274
  • 60. Estimation via g-computation In principle: • Substitute the fitted models in (5.25) or (5.26) Major obstacle: (5.25) and (5.26) involve integration over the sample space X = X1 × · · · × XK • Except in the simplest situations, e.g., all x1, . . . , xK discrete and low-dimensional, the required integration is almost certainly analytically intractable and computationally insurmountable 275
  • 61. Estimation via g-computation Monte Carlo integration: Robins (1986) proposed approximating the distribution of Y*(d) for d ∈ D; for r = 1, . . . , M, simulate a realization from the distribution of Y*(d) as follows 1. Generate random x1r from pX1 (x1; ζ1) 2. Generate random x2r from pX2|X1,A1 {x2|x1r , d1(x1r ); ζ2} 3. Continue in this fashion, generating random xkr from pXk |Xk−1,Ak−1 {xk |xk−1,r , dk−1(xk−1,r ); ζk }, k = 3, . . . , K 4. Generate random yr from pY|X ,A{y|xr , dK (xr ); ζK+1} y1, . . . , yM are a sample from the fitted distribution of Y*(d) Estimator for V(d) = E{Y*(d)}: VGC(d) = M−1 Σ_{r=1}^{M} yr 276
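The four steps above can be sketched for K = 2, treating some invented parametric forms as if they were the fitted models from the previous slide (everything here is illustrative; the regime and "fitted" densities are assumptions, not the text's example):

```python
import numpy as np

rng = np.random.default_rng(1)

# Stand-in "fitted" models (parameters invented for illustration); in practice
# these would be the densities fit by maximum likelihood as described above.
def draw_x1(rng):                 # step 1: x1r ~ p_X1(x1; zeta1-hat)
    return rng.binomial(1, 0.5)
def draw_x2(x1, a1, rng):         # step 2: x2r ~ p_{X2|X1,A1}{x2 | x1r, d1(x1r); zeta2-hat}
    return rng.binomial(1, 0.2 + 0.5 * a1)
def draw_y(x1, a1, x2, a2, rng):  # step 4: yr ~ p_{Y|Xbar,Abar}{y | xbar_r, d2bar(xbar_r); zeta3-hat}
    return rng.normal(a1 + a2 + x2, 1.0)

d1 = lambda x1: 1                 # illustrative regime: treat at Decision 1,
d2 = lambda x1, a1, x2: x2        # then treat at Decision 2 only if X2 = 1

M = 50_000
ys = []
for _ in range(M):
    x1 = draw_x1(rng)
    a1 = d1(x1)
    x2 = draw_x2(x1, a1, rng)
    a2 = d2(x1, a1, x2)
    ys.append(draw_y(x1, a1, x2, a2, rng))

v_gc = float(np.mean(ys))         # VGC(d) = M^{-1} sum_r yr
```

For this toy model the value can be computed by hand (0.7 × 3 + 0.3 × 1 = 2.4), so the Monte Carlo average can be checked directly; with real fitted models no such closed form is available.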
  • 62. Estimation via g-computation Practical challenges: • Development of models can be daunting due to high dimension/complexity of x1, . . . , xK • Although specifying pY|X ,A(y|x, a; ζK+1) may be feasible for univariate Y, models pXk |Xk−1,Ak−1 (xk |xk−1, ak−1; ζk ), k = 2, . . . , K, pX1 (x1; ζ1) for multivariate Xk are more challenging to specify • E.g., Xk = (XT k1, XT k2)T continuous/discrete, can factor as pXk1|Xk2,Xk−1,Ak−1 (xk1|xk2, xk−1, ak−1; ζk1) × pXk2|Xk−1,Ak−1 (xk2|xk−1, ak−1; ζk2) or pXk2|Xk1,Xk−1,Ak−1 (xk2|xk1, xk−1, ak−1; ζk2) × pXk1|Xk−1,Ak−1 (xk1|xk−1, ak−1; ζk1) 277
  • 63. Estimation via g-computation Practical challenges, continued: • Moreover, simulation from such models can be demanding • Analytical derivation of approximate standard errors is not straightforward (frankly daunting!); use of a nonparametric bootstrap has been advocated, which is clearly highly computationally intensive Bottom line: Estimation of V(d) via the g-computation algorithm is not commonplace in practice • The main usefulness of g-computation is as a demonstration that it is possible to identify and estimate V(d) from observed data under SUTVA, SRA, and positivity • In principle, a possible approach to estimating dopt ∈ D is to maximize VGC(d) over all d ∈ D; clearly, this would be a formidable computational challenge (and is never done in practice) 278
  • 64. Inverse probability weighted estimator Motivation: Alternative representation of E{Y*(d)} in terms of the observed data (X1, A1, X2, A2, . . . , XK , AK , Y) depending on pA1|H1 (a1|h1) = P(A1 = a1|H1 = h1) = pA1|X1 (a1|x1) pAk |Hk (ak |hk ) = P(Ak = ak |Hk = hk ) = pAk |Xk ,Ak−1 (ak |xk , ak−1) k = 2, . . . , K Define: Evaluate at d1(X1), dk {Xk , dk−1(Xk−1)} πd,1(X1) = pA1|X1 {d1(X1)|X1} πd,k (Xk ) = pAk |Xk ,Ak−1 [dk {Xk , dk−1(Xk−1)}|Xk , dk−1(Xk−1)] k = 2, . . . , K 279
  • 65. Inverse probability weighted estimator Define: Indicator of consistency of options actually received with those selected by d at all K decisions Cd = CdK = I{A1 = d1(X1), . . . , AK = dK (XK )} = I{A = d(X )} Inverse probability weighted estimator for V(d) = E{Y*(d)}: VIPW (d) = n−1 Σ_{i=1}^{n} Cd,iYi / [{∏_{k=2}^{K} πd,k (Xki)} πd,1(X1i)] (5.27) • (5.27) is an unbiased estimator for V(d) because E( Cd Y / [{∏_{k=2}^{K} πd,k (Xk )} πd,1(X1)] ) = E{Y* (d)} (5.28) under SUTVA (5.10), SRA (5.11) and positivity assumption (5.15) 280
  • 66. Inverse probability weighted estimator Sketch of proof of (5.28): We show a more general result E( Cd f(X , Y) / [{∏_{k=2}^{K} πd,k (Xk )} πd,1(X1)] ) = E[f{X* K (dK−1), Y* (d)}] (5.29) so that (5.28) follows by taking f(x, y) = y • Using SUTVA and (5.7) E( Cd f(X , Y) / [{∏_{k=2}^{K} πd,k (Xk )} πd,1(X1)] ) = E( Cd f{X* K (dK−1), Y* (d)} / [{∏_{k=2}^{K} πd,k {X* k (dk−1)}} πd,1(X1)] ) = E[ E( I{A = d(X )}f{X* K (dK−1), Y* (d)} / [{∏_{k=2}^{K} πd,k {X* k (dk−1)}} πd,1(X1)] | X1, W* ) ] = E( P{A = d(X )|X1, W* }f{X* K (dK−1), Y* (d)} / [{∏_{k=2}^{K} πd,k {X* k (dk−1)}} πd,1(X1)] ) (5.30) 281
  • 67. Inverse probability weighted estimator From (5.30): Must show {∏_{k=2}^{K} πd,k {X* k (dk−1)}} πd,1(X1) = P{A = d(X )|X1, W* } > 0 (5.31) • For (x1, w) such that P(X1 = x1, W* = w) > 0, show P{A = d(X )|X1 = x1, W* = w} > 0 • There exist x2, . . . , xK such that P{X* k (dk−1) = xk } > 0, k = 2, . . . , K, because X* K (dK−1) is a function of W* • Using SUTVA P{A = d(X )|X1 = x1, W* = w} = P{A1 = d1(x1)|X1 = x1, W* = w} × ∏_{k=2}^{K} P[Ak = dk {xk , dk−1(xk−1)}|Ak−1 = dk−1(xk−1), X1 = x1, W* = w] 282
  • 68. Inverse probability weighted estimator • Must show each term in this factorization is well defined; true if P{Ak = dk (xk ), X1 = x1, W* = w} > 0, k = 1, . . . , K • By induction; when k = 1 P{A1 = d1(x1), X1 = x1, W* = w} = P{A1 = d1(x1) | X1 = x1, W* = w}P(X1 = x1, W* = w) is positive if P{A1 = d1(x1) | X1 = x1, W* = w} > 0 • This holds because P(X1 = x1) > 0 so x1 ∈ Γ1 and d1(x1) ∈ Ψ1(x1) so by positivity assumption P{A1 = d1(x1) | X1 = x1, W* = w} = P{A1 = d1(x1) | X1 = x1} > 0 • Now assume P{Ak = dk (xk ), X1 = x1, W* = w} > 0 and show P{Ak+1 = dk+1(xk+1), X1 = x1, W* = w} > 0 283
  • 69. Inverse probability weighted estimator • X* k (dk−1) includes X1, so (X1 = x1, W* = w) and {X* k+1(dk ) = xk+1, W* = w} are equivalent, and thus P{Ak+1 = dk+1(xk+1), X1 = x1, W* = w} = P[Ak+1 = dk+1{xk+1, dk (xk )}|Ak = dk (xk ), X* k+1(dk ) = xk+1, W* = w] × P{Ak = dk (xk ), X1 = x1, W* = w} • Must show first RHS term is > 0; by SUTVA and SRA, this term = P[Ak+1 = dk+1{xk+1, dk (xk )}|Xk+1 = xk+1, Ak = dk (xk ), W* = w] = P[Ak+1 = dk+1{xk+1, dk (xk )}|Xk+1 = xk+1, Ak = dk (xk )] • Because {x1, d1(x1)} ∈ Λ1 and P{X* k (dk−1) = xk } > 0, the argument leading to (5.23) yields {xk+1, dk (xk )} ∈ Γk+1, dk+1{xk+1, dk (xk )} ∈ Ψk+1(hk+1), so this term is > 0 284
  • 70. Inverse probability weighted estimator • Applying these results yields P{A = d(X )|X1, W* } = pA1|X1 {d1(X1)|X1} × ∏_{k=2}^{K} pAk |Xk ,Ak−1 [dk {Xk , dk−1(Xk−1)}|Xk , dk−1(Xk−1)] = πd,1(X1) ∏_{k=2}^{K} πd,k {X* k (dk−1)} which is the desired equality 285
  • 71. Inverse probability weighted estimator Result: (5.28) holds E( Cd Y / [{∏_{k=2}^{K} πd,k (Xk )} πd,1(X1)] ) = E{Y* (d)} • VIPW (d) is an unbiased estimator for V(d) • Alternative representation of E{Y*(d)} in terms of observed data • Taking instead for fixed y f{X* K (dK−1), Y* (d)} = I{Y* (d) = y} yields an alternative representation of the marginal density of Y*(d) 286
  • 72. Inverse probability weighted estimator More generally: For fixed (x1, x2, . . . , xK , y), treating all variables as discrete, taking f{X* K (dK−1), Y* (d)} = I{X1 = x1, X* 2(d1) = x2, . . . , X* K (dK−1) = xK , Y* (d) = y} yields an alternative representation of the joint density P{X1 = x1, X* 2(d1) = x2, . . . , X* K (dK−1) = xK , Y* (d) = y} = pX1,X* 2(d1),X* 3(d2),...,X* K (dK−1),Y*(d)(x1, . . . , xK , y) = E( Cd I(X1 = x1, X2 = x2, . . . , XK = xK , Y = y) / [{∏_{k=2}^{K} πd,k (Xk )} πd,1(X1)] ) 287
  • 73. Inverse probability weighted estimator Denominator: {∏_{k=2}^{K} πd,k (Xk )} πd,1(X1) • Can be interpreted as the propensity for receiving treatment consistent with regime d through all K decisions given observed history • Depends on the propensities of treatment given observed history pAk |Hk (ak |hk ) = P(Ak = ak | Hk = hk ), k = 1, . . . , K • In practice: Posit and fit models for the propensities depending on parameters γk and estimators γk , k = 1, . . . , K; considerations for this momentarily • These models induce models πd,1(X1; γ1) and πd,k (Xk ; γk ) 288
  • 74. Inverse probability weighted estimator In practice: IPW estimator VIPW (d) = n−1 Σ_{i=1}^{n} Cd,iYi / [{∏_{k=2}^{K} πd,k (Xki; γk )} πd,1(X1i; γ1)] (5.32) • Consistent estimator for V(d) as long as the propensity models are correctly specified 289
  • 75. Inverse probability weighted estimator Alternative estimator: VIPW∗ (d) = [ Σ_{i=1}^{n} Cd,i / [{∏_{k=2}^{K} πd,k (Xki; γk )} πd,1(X1i; γ1)] ]−1 × Σ_{i=1}^{n} Cd,iYi / [{∏_{k=2}^{K} πd,k (Xki; γk )} πd,1(X1i; γ1)] (5.33) • Weighted average, also a consistent estimator • Can be considerably more precise than VIPW (d) 290
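Both IPW estimators can be sketched on simulated SMART-like data, where every propensity is known to be 1/2 at each of K = 2 decisions. The generative model and the regime below are invented for illustration; the true value of this regime works out to 2.4 by hand, so both estimators can be checked against it:

```python
import numpy as np

rng = np.random.default_rng(2)
n = 200_000

# Simulated K = 2 SMART: both options randomized with probability 1/2
# (illustrative generative model, not the text's example)
x1 = rng.binomial(1, 0.5, n)
a1 = rng.binomial(1, 0.5, n)
x2 = rng.binomial(1, 0.2 + 0.5 * a1)      # intervening info depends on A1
a2 = rng.binomial(1, 0.5, n)
y = rng.normal(a1 + a2 + x2, 1.0)          # outcome depends on treatments and X2

# Illustrative regime d: d1(h1) = 1, d2(h2) = x2
d1 = np.ones(n, dtype=int)
d2 = x2
cd = (a1 == d1) & (a2 == d2)               # consistency indicator C_d

# Known propensities evaluated at the options d selects
pi_d1 = np.full(n, 0.5)                    # P{A1 = d1(X1) | X1}
pi_d2 = np.full(n, 0.5)                    # P{A2 = d2(.) | X2bar, A1 = d1(X1)}
w = cd / (pi_d1 * pi_d2)                   # inverse probability weights

v_ipw = float(np.mean(w * y))              # (5.27)/(5.32)
v_ipw_star = float(np.sum(w * y) / np.sum(w))   # (5.33), weighted average
```

Note that only about a quarter of the simulated subjects have Cd = 1, illustrating the remark below that for larger K very few subjects contribute to these estimators.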
  • 76. Considerations for propensity modeling Simplest case: Two options at each decision point, feasible for all individuals, Ak = {0, 1}, k = 1, . . . , K • Can work with propensity scores πk (hk ) = P(Ak = 1|Hk = hk ) = πk (xk , ak−1), k = 1, . . . , K pA1|X1 (a1|x1) = pA1|H1 (a1|h1) = π1(h1)^a1 {1 − π1(h1)}^(1−a1) pAk |Xk ,Ak−1 (ak |xk , ak−1) = pAk |Hk (ak |hk ) = πk (hk )^ak {1 − πk (hk )}^(1−ak ) k = 2, . . . , K • Using (5.2) πd,1(X1) = π1(X1)^d1(X1) {1 − π1(X1)}^{1−d1(X1)} πd,k (Xk ) = πk {Xk , dk−1(Xk−1)}^dk {Xk ,dk−1(Xk−1)} × [1 − πk {Xk , dk−1(Xk−1)}]^[1−dk {Xk ,dk−1(Xk−1)}] , k = 2, . . . , K 291
  • 77. Considerations for propensity modeling • Can posit a parametric model πk (hk ; γk ) for each k; e.g., logistic regression models as in (3.12) πk (hk ; γk ) = exp(γk1 + γT k2 hk ) / {1 + exp(γk1 + γT k2 hk )}, γk = (γk1, γT k2)T , k = 1, . . . , K hk = (1, hT k )T , k = 1, . . . , K, and fit via maximum likelihood to obtain γk , k = 1, . . . , K 292
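A minimal maximum likelihood fit of such a logistic propensity model can be sketched via Newton–Raphson (IRLS), rather than any particular software routine. The simulated Decision-k data and the true γ below are invented for illustration:

```python
import numpy as np

def fit_logistic(H, a, iters=25):
    """ML fit of pi_k(h; gamma) = expit(gamma' [1, h]) by Newton-Raphson (IRLS)."""
    X = np.column_stack([np.ones(len(a)), H])     # prepend intercept, as in hk~ = (1, hk')'
    gamma = np.zeros(X.shape[1])
    for _ in range(iters):
        p = 1.0 / (1.0 + np.exp(-X @ gamma))      # current fitted propensities
        W = p * (1 - p)
        # Newton step: gamma <- gamma + (X' W X)^{-1} X' (a - p)
        gamma += np.linalg.solve(X.T @ (W[:, None] * X), X.T @ (a - p))
    return gamma

# Simulated Decision-k data: one-dimensional history, true gamma = (-0.5, 1.0)
rng = np.random.default_rng(3)
h = rng.normal(size=50_000)
p_true = 1.0 / (1.0 + np.exp(-(-0.5 + 1.0 * h)))
a = rng.binomial(1, p_true)
gamma_hat = fit_logistic(h[:, None], a)   # should be near (-0.5, 1.0)
```

In practice one would fit a separate such model at each decision k, with hk containing the accrued history.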
  • 78. Considerations for propensity modeling More than 2 options: Feasible for all individuals, Ak = {1, . . . , mk } • As on Slide 170, with ωk (hk , ak ) = P(Ak = ak |Hk = hk ), k = 1, . . . , K or ωk (xk , ak−1, ak ) = P(Ak = ak |Xk = xk , Ak−1 = ak−1) where ωk (hk , mk ) = 1 − Σ_{ak =1}^{mk −1} ωk (hk , ak ) • Then πd,1(X1) = ∏_{a1=1}^{m1} ω1(X1, a1)^I{d1(X1)=a1} = Σ_{a1=1}^{m1} I{d1(X1) = a1}ω1(X1, a1) πd,k (Xk ) = ∏_{ak =1}^{mk } ωk {Xk , dk−1(Xk−1), ak }^I[dk {Xk ,dk−1(Xk−1)}=ak ] = Σ_{ak =1}^{mk } I[dk {Xk , dk−1(Xk−1)} = ak ]ωk {Xk , dk−1(Xk−1), ak }, k = 2, . . . , K 293
  • 79. Considerations for propensity modeling • Can posit parametric models ωk (hk , ak ; γk ) for each k e.g., multinomial (polytomous) logistic regression models ωk (hk , ak ; γk ) = exp(hT k γk,ak ) / {1 + Σ_{j=1}^{mk −1} exp(hT k γkj)}, ak = 1, . . . , mk − 1 hk = (1, hT k )T , γk = (γT k1, . . . , γT k,mk −1)T and fit via maximum likelihood to obtain γk , k = 1, . . . , K 294
  • 80. Considerations for propensity modeling Feasible sets: ℓk distinct subsets, each with ≥ 2 options in Ak Ak,l = {1, . . . , mkl}, l = 1, . . . , ℓk , k = 1, . . . , K • For k = 1, . . . , K, ak ∈ {1, . . . , mkl} = Ak,l ωk,l(hk , ak ) = P(Ak = ak |Hk = hk ) ωk,l(hk , mkl) = 1 − Σ_{ak =1}^{mkl −1} ωk,l(hk , ak ) • For k = 1, . . . , K, can posit ℓk separate logistic or multinomial (polytomous) logistic regression models ωk,l(hk , ak ; γkl), l = 1, . . . , ℓk 295
  • 81. Considerations for propensity modeling • For each k = 1, . . . , K, implies an overall model ωk (hk , ak ; γk ) = Σ_{l=1}^{ℓk} I{sk (hk ) = l} ωk,l(hk , ak ; γkl), γk = (γT k1, . . . , γT k,ℓk )T understood that ak takes values in the relevant distinct subset • For subsets with a single option, P(Ak = ak | Hk = hk ) = 1, and no model is needed • Induced models πd,1(X1; γ1) = Σ_{l=1}^{ℓ1} I{s1(h1) = l} ∏_{a1=1}^{m1l} ω1,l (X1, a1; γ1l )^I{d1(X1)=a1} πd,k (Xk ; γk ) = Σ_{l=1}^{ℓk} I{sk (hk ) = l} × ∏_{ak =1}^{mkl} ωk,l {Xk , dk−1(Xk−1), ak ; γkl }^I[dk {Xk ,dk−1(Xk−1)}=ak ] 296
  • 82. Considerations for propensity modeling In a SMART: Propensities are known • As on Slide 90, preferable to estimate the propensities • Estimate P(Ak = ak | Hk = hk ) by sample proportions corresponding to each distinct subset Ak,l • Example: Acute leukemia, ℓ2 = 2, r2 component of h2 indicating response Ψ2(h2) = {M1, M2} = A2,1, r2 = 1 = {S1, S2} = A2,2, r2 = 0 • Estimate P(A2 = a2 | H2 = h2) for a2 ∈ A2,1 by {Σ_{i=1}^{n} I(R2i = 1)}−1 Σ_{i=1}^{n} I(R2i = 1)I(A2i = a2) • Estimate P(A2 = a2 | H2 = h2) for a2 ∈ A2,2 by {Σ_{i=1}^{n} I(R2i = 0)}−1 Σ_{i=1}^{n} I(R2i = 0)I(A2i = a2) 297
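The sample-proportion estimates for the acute leukemia example can be sketched on simulated SMART data, assuming (for illustration) randomization probabilities of 1/2 within each feasible subset:

```python
import numpy as np

rng = np.random.default_rng(4)
n = 5000

# Simulated SMART Decision 2: responders (r2 = 1) randomized to M1/M2 and
# nonresponders (r2 = 0) to S1/S2, each with probability 1/2 (illustrative)
r2 = rng.binomial(1, 0.4, n)
a2 = np.where(r2 == 1,
              rng.choice(["M1", "M2"], n),
              rng.choice(["S1", "S2"], n))

def prop(option, resp_status):
    """Sample-proportion estimate of P(A2 = option | H2) within a response stratum."""
    mask = r2 == resp_status
    return float(np.mean(a2[mask] == option))

p_m1 = prop("M1", 1)   # estimates P(A2 = M1 | responder), truth 0.5
p_s2 = prop("S2", 0)   # estimates P(A2 = S2 | nonresponder), truth 0.5
```

Even though the randomization probabilities are known here, plugging in these estimated propensities is preferable, as noted on the slide.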
  • 83. Inverse probability weighted estimator Equivalent representation: For both VIPW (d) and VIPW∗(d) • When Cd = 1, A1 = d1(X1), and Ak = dk {Xk , dk−1(Xk−1)}, k = 2, . . . , K, and it is straightforward that the denominator {∏_{k=2}^{K} πd,k (Xki; γk )} πd,1(X1i; γ1) can be replaced by the fitted model for ∏_{k=1}^{K} pAk |Hk (Aki|Hki) without altering the value of either estimator 298
  • 84. Inverse probability weighted estimator • E.g., with 2 feasible options at each decision point, replace the denominator by ∏_{k=1}^{K} πk (Hki; γk )^Aki {1 − πk (Hki; γk )}^(1−Aki) • With more than 2 options, by ∏_{k=1}^{K} ωk (Hki, Aki; γk ) • In some literature accounts, the estimators are defined with these quantities in the denominators • This will be important later 299
  • 85. Inverse probability weighted estimator Remarks: • VIPW (d) and VIPW∗(d) are consistent estimators as long as the propensity models are correctly specified; can be inconsistent otherwise • Except for estimation of propensities, these estimators use data only from subjects for whom A = d(X ) • For larger K, there are likely very few such subjects • These estimators can be unstable in finite samples due to division by a small propensity of receiving treatment consistent with d and can exhibit large sampling variation • As in the single decision case, even if the propensities are known, it is preferable to estimate them via maximum likelihood • Large sample approximations to the sampling distributions of VIPW (d) and VIPW∗(d) follow from M-estimation theory 300
  • 86. Augmented inverse probability weighted estimator Drop out analogy: For individuals for whom A = d(X ) • Analogy to a monotone coarsening problem; i.e., “drop out” • Under SUTVA, an individual with A1 = d1(X1), . . . , Ak−1 = dk−1{Xk−1, dk−2(Xk−2)} but for whom Ak = dk {Xk , dk−1(Xk−1)} has X2 = X* 2(d1), . . . , Xk = X* k (dk−1), i.e., Xk = X* k (dk−1) • But his Xk+1, . . . , XK and Y do not reflect the potential outcomes he would have if he had continued to follow rules dk , . . . , dK • Effectively, in terms of receiving treatment options consistent with d, the individual has “dropped out” at Decision k 301
  • 87. Augmented inverse probability weighted estimator Improvement: From the perspective of information on {X1, X* K (dK−1), Y*(d)} and especially Y*(d) • X* k (dk−1) is observed, but X* k+1(dk ), . . . , X* K (dK−1), Y*(d) are “missing” • Can we exploit the partial information from such individuals to gain efficiency? Dropout analogy: Under SUTVA, SRA, and positivity, this “drop out” is according to a missing (coarsening) at random mechanism (shown by Zhang et al., 2013), allowing semiparametric theory for monotone coarsening at random to be used (Robins et al., 1994; Tsiatis, 2006) 302
  • 88. Augmented inverse probability weighted estimator Define: • Indicator of treatment options consistent with d through Decision k Cdk = I{Ak = dk (Xk )}, k = 1, . . . , K with Cd0 ≡ 1 • For brevity, write π̄d,k (Xk ) for the cumulative propensity product: π̄d,1(X1) = πd,1(X1) and π̄d,k (Xk ) = {∏_{j=2}^{k} πd,j(Xj)} πd,1(X1), k = 2, . . . , K with π̄d,0 ≡ 1 • Substitution of propensity models leads to models π̄d,k (Xk ; γk ), k = 1, . . . , K γk = (γT 1 , . . . , γT k )T , k = 1, . . . , K 303
  • 89. Augmented inverse probability weighted estimator Analogous to AIPW estimator for single decision: Under these conditions, if the models for the propensities pAk |Hk (ak |hk ), k = 1, . . . , K are correctly specified, from semiparametric theory, all consistent and asymptotically normal estimators for V(d) for fixed d ∈ D are asymptotically equivalent to an estimator of the form VAIPW (d) = n−1 Σ_{i=1}^{n} [ Cd,iYi / [{∏_{k=2}^{K} πd,k (Xki; γk )} πd,1(X1i; γ1)] (5.34) + Σ_{k=1}^{K} {Cdk−1,i / π̄d,k−1(Xk−1,i; γk−1) − Cdk ,i / π̄d,k (Xki; γk )} Lk (Xki) ] • Lk (xk ) are arbitrary functions of xk , k = 1, . . . , K • γk = (γT 1 , . . . , γT k )T , k = 1, . . . , K 304
  • 90. Augmented inverse probability weighted estimator Features of VAIPW (d): • Taking Lk (xk ) ≡ 0, k = 1, . . . , K, yields VIPW (d) in (5.32) • When K = 1, because Cd0 ≡ 1, πd,0 ≡ 1, and X1 = H1, (5.34) reduces to the AIPW estimator (3.16) for the single decision case • If all K propensity models are correctly specified, so there are true values γk,0 of γk , k = 1, . . . , K, the “augmentation term” in (5.34) evaluated at γk,0, k = 1, . . . , K, converges in probability to zero for arbitrary Lk (xk ), k = 1, . . . , K • Thus, (5.34) is a consistent estimator for V(d) with asymptotic variance depending on the choice of Lk (xk ) 305
  • 91. Augmented inverse probability weighted estimator Efficient estimator: From semiparametric theory, the estimator with smallest asymptotic variance among those in the class (5.34) takes Lk (xk ) = E{Y* (d) | X* k (dk−1) = xk }, k = 1, . . . , K (5.35) • The conditional expectations in (5.35) are functionals of the distribution of the potential outcomes {X1, X* 2(d1), . . . , X* k (dk−1), Y*(d)} and are unknown in practice • As in the single decision case, posit and fit models Qd,k (xk ; βk ), k = 1, . . . , K, (5.36) for E{Y*(d) | X* k (dk−1) = xk }, k = 1, . . . , K, and substitute in (5.34) • We present approaches to developing and fitting models (5.36) when we discuss optimal regimes later 306
  • 92. Augmented inverse probability weighted estimator Result: Given estimators βk , k = 1, . . . , K, obtained as we discuss later, the AIPW estimator is VAIPW (d) = n−1 Σ_{i=1}^{n} [ Cd,iYi / [{∏_{k=2}^{K} πd,k (Xki; γk )} πd,1(X1i; γ1)] (5.37) + Σ_{k=1}^{K} {Cdk−1,i / π̄d,k−1(Xk−1,i; γk−1) − Cdk ,i / π̄d,k (Xki; γk )} Qd,k (Xki; βk ) ] • VAIPW (d) in (5.37) is doubly robust; i.e., is consistent if either (1) the models for the propensities and thus for πd,1(x1) and πd,k (xk ), k = 2, . . . , K, or (2) the models for E{Y*(d) | X* k (dk−1) = xk } in (5.36) are correctly specified • If the observed data are from a SMART, the propensities are known, and VAIPW (d) is guaranteed to be consistent regardless of the models (5.36) 307
  • 93. Augmented inverse probability weighted estimator Efficient estimator: If all models are correct, VAIPW (d) is efficient among estimators in class (5.37), achieving the smallest asymptotic variance Remarks: • VAIPW (d) can exhibit considerably less sampling variation than the simple IPW estimators (5.32) and (5.33) • As for VIPW (d) and VIPW∗(d), large sample approximate sampling distribution can be obtained via M-estimation theory (although pretty involved) • Zhang et al. (2013) propose (5.37) in a different but equivalent form following directly from that in Tsiatis (2006) 308
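The AIPW estimator can be sketched for K = 2 on simulated SMART data with known propensities of 1/2, plugging in the true conditional expectations of Y*(d) as stand-ins for the fitted models Qd,k (a simplification for illustration; in practice these are posited and fit as in (5.36)). The generative model and regime are the same invented ones used for the IPW sketch, with true value 2.4:

```python
import numpy as np

rng = np.random.default_rng(5)
n = 100_000

# Illustrative K = 2 SMART: all propensities known to be 1/2
x1 = rng.binomial(1, 0.5, n)
a1 = rng.binomial(1, 0.5, n)
x2 = rng.binomial(1, 0.2 + 0.5 * a1)
a2 = rng.binomial(1, 0.5, n)
y = rng.normal(a1 + a2 + x2, 1.0)

# Regime d: d1 = 1, d2(h2) = x2; consistency indicators C_d1 and C_d2 = C_d
cd1 = (a1 == 1)
cd2 = cd1 & (a2 == x2)
pi1, pi2 = 0.5, 0.5        # pid,1 and pid,2; cumulative product pi1 * pi2

# Stand-ins for the fitted models Q_{d,k}: the true conditional expectations
# of Y*(d), computed by hand for this toy model
q1 = np.full(n, 2.4)       # E{Y*(d) | X1} (free of x1 here)
q2 = 1.0 + 2.0 * x2        # E{Y*(d) | X1, X2*(d1)} evaluated at observed X2

v_aipw = float(np.mean(
    cd2 * y / (pi1 * pi2)                       # IPW term of (5.37)
    + (1 - cd1 / pi1) * q1                      # k = 1 augmentation (C_d0 = 1)
    + (cd1 / pi1 - cd2 / (pi1 * pi2)) * q2      # k = 2 augmentation
))
```

Note that when Cd1 = 0 the k = 2 augmentation weight is exactly zero, so q2 is only ever used for subjects whose observed X2 equals X2*(d1), consistent with the drop-out analogy above.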
  • 94. Estimation via marginal structural models Alternative approach: When scientific interest focuses on regimes with simple rules that can be represented in terms of a low-dimensional parameter η • Formally, restrict to a subset Dη ⊂ D with elements dη = {d1(h1; η1), . . . , dK (hK ; ηK )}, η = (ηT 1 , . . . , ηT K )T • Goal: Estimate the value of a fixed regime dη∗ ∈ Dη corresponding to a particular η∗, V(dη∗ ) = E{Y*(dη∗ )} • As in the example coming next, Dη may be even simpler, with η = η1 = · · · = ηK 309
  • 95. Estimation via marginal structural models Example: Treatment of HIV-infected patients • K monthly clinic visits, decision on whether or not to administer antiretroviral (ARV) therapy for the next month based on CD4 T-cell count (cells/mm3) (larger is better) • Two feasible options: 1 = ARV therapy for the next month or 0 = no ARV therapy for the next month • Focus on rules involving a common CD4 threshold η below which ARV therapy is administered and above which it is not dk (hk ; η) = I(CD4k ≤ η), k = 1, . . . , K CD4k = CD4 T cell count (cells/mm3) immediately prior to Decision k • Dη comprises regimes dη with rules of this form • Final outcome: Y*(dη) = negative viral load (viral RNA copies/mL) measured 1 month after Decision K if ARV therapy were administered according to the rules in dη 310
Estimation via marginal structural models
Focus: Estimation of V(dη∗ ) for a particular threshold η∗
• Ideal: Data from a study in which HIV patients were given ARV therapy
according to dη∗ and viral load was ascertained after Decision K
• More likely: Available data are observational, with information Xk ,
k = 1, . . . , K, including CD4k ; the options Ak actually received; and final viral
load Y recorded
• In these data, there are likely very few if any individuals who received ARV
therapy according to the rules in dη∗
• An approach to using these data to estimate V(dη∗ ) is suggested by the work
of Orellana et al. (2010a,b)
311
Estimation via marginal structural models
Marginal structural model: A model for V(dη) as a function of η
• I.e., with V(dη) = µ(η) for some function µ( · ), posit a parametric model
V(dη) = E{Y*(dη)} = µ(η; α)   (5.38)
referred to as a marginal structural model (MSM)
• For example, a quadratic model
µ(η; α) = α1 + α2 η + α3 η², α = (α1, α2, α3)ᵀ
• Estimate α from the data by an appropriate method to obtain α̂, and estimate
V(dη∗ ) by V̂MSM(dη∗ ) = µ(η∗ ; α̂)
• η plays the role of a “covariate,” and µ(η∗ ; α̂) is the “predicted value” at the
particular value η∗ of interest
312
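As a sketch, the quadratic MSM and the resulting plug-in estimate of V(dη∗); the fitted coefficients below are invented for illustration, not obtained from data.

```python
def mu(eta, alpha):
    """Quadratic MSM mu(eta; alpha) = alpha_1 + alpha_2 * eta + alpha_3 * eta^2."""
    a1, a2, a3 = alpha
    return a1 + a2 * eta + a3 * eta ** 2

# Given a fitted alpha-hat (illustrative values only), the value of d_{eta*}
# is estimated by the MSM's "predicted value" at the threshold of interest:
alpha_hat = (1.0, 0.004, -5e-6)
V_msm = mu(350.0, alpha_hat)  # estimated V(d_{eta*}) at eta* = 350
```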
Estimation via marginal structural models
Hope: The MSM is a correct specification of the true relationship µ(η) across a
plausible range of thresholds of interest
Estimation of α: Ideally, data from a prospective randomized study
• Each subject i = 1, . . . , n is randomized to one of m predetermined
thresholds η(j), j = 1, . . . , m, in the range of interest and receives ARV therapy
according to dη(j)
• A natural estimator α̂ solves in α the estimating equation
Σ_{i=1}^n Σ_{j=1}^m Cdη(j),i {∂µ(η(j); α)/∂α} w(η(j)) {Yi − µ(η(j); α)} = 0   (5.39)
where w(η) is a weight function and Cdη = I{A = dη(X )}
• w(η) ≡ 1 yields OLS estimation
• Each subject i has treatment experience consistent with exactly one of the
η(j), j = 1, . . . , m
• (5.39) is an unbiased estimating equation if µ(η; α) is correctly specified
313
Estimation via marginal structural models
Observational data: The approach is motivated by (5.39)
• Some individuals have treatment experience consistent with ≥ 1 values of η;
e.g., with K = 2

Experience                                    Consistent with
CD41 = 300, A1 = 1, CD42 = 400, A2 = 0        300 ≤ η < 400
CD41 = 300, A1 = 1, CD42 = 400, A2 = 1        η ≥ 400
CD41 = 300, A1 = 0, CD42 = 400, A2 = 0        η < 300

• Others have treatment experience consistent with no value of η

Experience                                    Consistent with
CD41 = 300, A1 = 0, CD42 = 400, A2 = 1        no η
314
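The consistent set of thresholds can be computed mechanically: under the rule Ak = I(CD4k ≤ η), each treated decision forces CD4k ≤ η and each untreated one forces η < CD4k, so the consistent thresholds form an interval closed on the left. A sketch (helper name invented):

```python
def consistent_eta_range(experience):
    """Thresholds eta consistent with observed (CD4_k, A_k) pairs under
    the rule A_k = I(CD4_k <= eta).

    Returns (lo, hi) meaning lo <= eta < hi, or None if no threshold is
    consistent with the observed experience.
    """
    lo, hi = float("-inf"), float("inf")
    for cd4, a in experience:
        if a == 1:            # treated: requires cd4 <= eta
            lo = max(lo, cd4)
        else:                 # untreated: requires eta < cd4
            hi = min(hi, cd4)
    return (lo, hi) if lo < hi else None
```

Applied to the experiences tabulated above, the first is consistent with 300 ≤ η < 400 and the last with no η at all.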
Estimation via marginal structural models
Observational data: The estimator α̂ is the solution to

Σ_{i=1}^n ∫_{Dη} [ Cdη,i / { πdη,1(X1i; γ̂1) Π_{k=2}^K πdη,k (Xki; γ̂k) }
    × {∂µ(η; α)/∂α} w(η) {Yi − µ(η; α)} ] dν(dη) = 0

• dν(dη) is an appropriate dominating measure on dη ∈ Dη
• w(η) is a weight function
• A given individual i can contribute information on multiple thresholds or on
none at all
• As in the IPW estimators, each contribution is weighted by the reciprocal of
an estimator for the propensity of receiving treatment consistent with dη given
the observed history
315
Estimation via marginal structural models
For example: With interest in η ∈ [100, 500], take the equally spaced grid
η(j) = 100 + 400(j − 1)/(m − 1), j = 1, . . . , m, and let dν(dη) place point mass
on each η(j):

Σ_{i=1}^n Σ_{j=1}^m [ Cdη(j),i / { πη(j),1(X1i; γ̂1) Π_{k=2}^K πη(j),k (Xki; γ̂k) }
    × {∂µ(η(j); α)/∂α} w(η(j)) {Yi − µ(η(j); α)} ] = 0

• Under SUTVA, the SRA, and positivity, if µ(η; α) and the propensity models
are correctly specified, these estimating equations can be shown to be
unbiased, and α̂ is an M-estimator
• If there is a sufficient number of individuals who received treatment consistent
with at least one η(j), j = 1, . . . , m, α̂ should be a reasonable estimator in
practice
• An augmented version of the estimating equation is also possible
316
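Because the quadratic µ(η; α) is linear in α, the grid form of the estimating equation is just the normal equation of a weighted least-squares fit pooled over subjects i and grid points j. A sketch under that assumption, with all names invented and the consistency indicators and inverse propensity weights assumed precomputed:

```python
import numpy as np

def fit_msm(Y, consist, inv_prop, etas, w=None):
    """Solve the grid estimating equation for quadratic mu(eta; alpha).

    Y        : (n,) observed outcomes
    consist  : (n, m) consist[i, j] = C_{d_eta(j), i}
    inv_prop : (n, m) reciprocal of the estimated propensity of treatment
               consistent with d_eta(j) for subject i
    etas     : (m,) grid eta(1), ..., eta(m)
    w        : optional (m,) weight function w(eta(j)); defaults to 1
    """
    m = len(etas)
    X = np.column_stack([np.ones(m), etas, etas ** 2])  # d mu / d alpha at each eta(j)
    W = consist * inv_prop * (np.ones(m) if w is None else np.asarray(w))
    # Linearity of mu in alpha turns the estimating equation into weighted
    # least-squares normal equations pooled over all (i, j) pairs
    A = (X.T * W.sum(axis=0)) @ X
    b = X.T @ (W.T @ Y)
    return np.linalg.solve(A, b)
```

Each row of the weight matrix is zero wherever subject i's experience is inconsistent with η(j), so, as noted above, an individual contributes to several thresholds, one, or none.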
5. Multiple Decision Treatment Regimes: Framework and
Fundamentals
5.1 Multiple Decision Treatment Regimes
5.2 Statistical Framework
5.3 The g-Computation Algorithm
5.4 Estimation of the Value of a Fixed Regime
5.5 Key References
317
References
Orellana, L., Rotnitzky, A., and Robins, J. M. (2010a). Dynamic regime
marginal structural mean models for estimation of optimal dynamic treatment
regimes, part I: Main content. The International Journal of Biostatistics, 6.
Orellana, L., Rotnitzky, A., and Robins, J. M. (2010b). Dynamic regime
marginal structural mean models for estimation of optimal dynamic treatment
regimes, part II: Proofs and additional results. The International Journal of
Biostatistics, 6.
Robins, J. (1986). A new approach to causal inference in mortality studies with
sustained exposure periods - application to control of the healthy worker
survivor effect. Mathematical Modelling, 7, 1393–1512.
Robins, J. (1987). Addendum to: A new approach to causal inference in
mortality studies with sustained exposure periods - application to control of the
healthy worker survivor effect. Computers and Mathematics with Applications,
14, 923–945.
318
References
Robins, J. M. (2004). Optimal structural nested models for optimal sequential
decisions. In Lin, D. Y. and Heagerty, P., editors, Proceedings of the Second
Seattle Symposium on Biostatistics, pages 189–326, New York. Springer.
Zhang, B., Tsiatis, A. A., Laber, E. B., and Davidian, M. (2013). Robust
estimation of optimal dynamic treatment regimes for sequential treatment
decisions. Biometrika, 100, 681–694.
Zhao, Y., Zeng, D., Rush, A. J., and Kosorok, M. R. (2012). Estimating
individual treatment rules using outcome weighted learning. Journal of the
American Statistical Association, 107, 1106–1118.
319