2. HIV
• HIV is a sexually transmitted disease found
across the globe.
• Over 37.9 million people are living with HIV (PLHIV)
worldwide, and over 35 million have died so far
• Sub-Saharan Africa accounts for 25.7 million
PLHIV, over 65% of current worldwide estimates
3. Interventions
• Thankfully, HIV can be treated
• An intervention is a measure taken in order to limit the spread of HIV
• Interventions we deal with:
• ART – Anti Retroviral Therapy
• VMMC – Voluntary Male Medical Circumcision
• PrEP – Pre Exposure Prophylaxis
• HCT – Home Counselling/Testing
• ANC/PMTCT – AnteNatal Care/Prevention of Mother To Child Transmission
• They are expensive!
4. Motivation
• Money isn’t infinite
• Treating diseases like HIV is expensive, and if money is distributed
among treatment programs inefficiently, lives may be lost in the
inefficiency
• Our main goal is to develop a framework for intervention optimization
which, for a given budget, minimizes the effect of HIV.
7. EMOD Details
• Disease- and region-specific stochastic disease model.
• HIV
• South Africa & Kenya
• EMOD simulates a population over discrete timesteps, mimicking the behavior
of an endemic disease.
• EMOD can be likened to a massive Markov chain: agents move from state
to state based on their current status and internal probabilities.
• These probabilities are represented by the parameters which we will change.
• EMOD uses calibrated datasets to run ‘sub’ simulations
• Each ‘sub’ simulation is slightly different and generates one set of output files
• We average the outputs for ‘sub’ simulations to correspond to a single parameter set
• It takes 2 minutes to evaluate a single set of input parameters
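The Markov-chain analogy can be made concrete with a toy agent model. This is not the real EMOD engine; the states, transition probabilities, and function names below are all invented for illustration:

```python
import random

# Toy sketch of the EMOD idea: each agent occupies a state, and at every
# discrete timestep it transitions according to probabilities -- the
# analogue of the parameters we tune. All numbers are illustrative.
STATES = ["susceptible", "infected", "on_treatment"]

# Hypothetical per-timestep transition probabilities.
TRANSITIONS = {
    "susceptible":  {"infected": 0.02},
    "infected":     {"on_treatment": 0.30},
    "on_treatment": {"infected": 0.05},   # treatment interruption
}

def step(state, rng):
    """Move one agent forward one timestep."""
    r = rng.random()
    cumulative = 0.0
    for nxt, p in TRANSITIONS[state].items():
        cumulative += p
        if r < cumulative:
            return nxt
    return state  # no transition fired

def simulate(n_agents=1000, n_steps=50, seed=0):
    rng = random.Random(seed)
    agents = ["susceptible"] * n_agents
    for _ in range(n_steps):
        agents = [step(s, rng) for s in agents]
    return {s: agents.count(s) for s in STATES}

counts = simulate()
```

The real model tracks far richer state (age, gender, care cascade stage), but the structure is the same: per-agent stochastic transitions driven by tunable probabilities.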
8. Cost Calculation
• Cost is the total amount of money spent on interventions past our
threshold year, 𝑡0
• For each intervention:
• Intervention Cost = Σ_t N_t × C × df^(t − t0)
• N_t is the number of times this intervention is given in year t
• C is the intervention's unit cost
• df is the decay factor, normally 0.97
• Used to mitigate uncertainty caused by randomness
• Total cost is the sum of each intervention cost.
9. DALY Calculation
• DALYs stand for disability-adjusted life years: the amount of
healthy lifetime lost due to HIV.
• We begin counting DALYs after a threshold year, denoted 𝑡0
• For each person who died due to HIV:
• DALYs = (L_Expected − L_Realized) × df^(t − t0)
• L_Expected is the expected lifespan
• L_Realized is the actual lifespan
• df is the decay factor, normally 0.97
• Used to mitigate uncertainty caused by randomness
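Both discounted sums can be sketched in a few lines. All numbers are illustrative, and `intervention_cost` and `dalys` are hypothetical helper names, not part of any real toolkit:

```python
# df is the 0.97 decay factor from the slides; t0 the threshold year.
DF = 0.97
T0 = 2020

def intervention_cost(events, unit_cost, df=DF, t0=T0):
    """events: {year: times given that year}. Cost = sum N_t * C * df^(t - t0)."""
    return sum(n * unit_cost * df ** (year - t0) for year, n in events.items())

def dalys(deaths, df=DF, t0=T0):
    """deaths: list of (year_of_death, expected_lifespan, realized_lifespan)."""
    return sum((exp - real) * df ** (t - t0) for t, exp, real in deaths)

# Hypothetical inputs: 100 doses in 2021 and 120 in 2022 at $50 each;
# two HIV deaths with lifespans cut short.
c = intervention_cost({2021: 100, 2022: 120}, unit_cost=50.0)
d = dalys([(2021, 75.0, 40.0), (2023, 75.0, 60.0)])
```

Each lost year (or dollar) is discounted by how far past the threshold year it occurs, so near-term outcomes dominate the objective.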
10. Main Algorithm
• Initial sampling of parameter space to obtain ‘training’ data
• While True:
• Construct a surrogate model using training data
• Find the optimal allocation policy using surrogate model
• Simulate hypothesized optimal
• If simulation output satisfies our stopping criteria, break
• Augment training data using new data
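The loop above can be sketched with toy stand-ins. Here `simulate`, `fit`, and `optimize` are hypothetical placeholders for EMOD, the surrogate model, and the allocation search; the 1-D quadratic and nearest-neighbour surrogate are invented for illustration:

```python
import random

def simulate(x):                       # stand-in for a 2-minute EMOD run
    return (x - 0.3) ** 2

def fit(xs, ys):                       # stand-in surrogate: nearest neighbour
    return lambda x: min(zip(xs, ys), key=lambda p: abs(p[0] - x))[1]

def optimize(surrogate, candidates):   # stand-in allocation search
    return min(candidates, key=surrogate)

rng = random.Random(0)
xs = [rng.random() for _ in range(20)]          # initial sampling
ys = [simulate(x) for x in xs]                  # 'training' data

best = xs[0]
for _ in range(100):                            # cap iterations for safety
    surrogate = fit(xs, ys)                     # construct surrogate model
    best = optimize(surrogate, [rng.random() for _ in range(500)])
    y_best = simulate(best)                     # simulate hypothesized optimum
    if abs(y_best - surrogate(best)) < 1e-3:    # stopping criterion
        break
    xs.append(best)                             # augment training data
    ys.append(y_best)
```

The expensive simulator is only called once per outer iteration; all the heavy search happens against the cheap surrogate.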
12. Optimal Allocation Policy Algorithm
1. Evaluate 1M random parameter sets with a surrogate model
2. While True
1. Throw out those which violate the constraint (Cost > TargetCost)
2. Select the top 1/15th of the viable points according to DALY count
1. Break if only a single point is selected
3. Perturb the selected points into 10 new points
3. Repeat steps 1–2 five times; return the parameter
set with the lowest DALY count
1M evaluations at 2 min/evaluation = 2M mins ≈ 4 years!
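This search can be sketched on a toy 2-parameter problem. Here `cost` and `daly` are cheap invented stand-ins for the surrogate model, and 10K initial samples replace the 1M for speed:

```python
import random

rng = random.Random(42)
TARGET_COST = 1.0

def cost(p):                                     # toy cost surrogate
    return p[0] + p[1]

def daly(p):                                     # toy DALY surrogate
    return (p[0] - 0.4) ** 2 + (p[1] - 0.5) ** 2

def perturb(p, scale=0.05):
    """Jitter a point, clamped to the unit box."""
    return tuple(min(1.0, max(0.0, v + rng.gauss(0, scale))) for v in p)

best = None
for _ in range(5):                               # repeat the whole search 5 times
    points = [(rng.random(), rng.random()) for _ in range(10_000)]
    while True:
        # Throw out points violating the constraint (cost > target).
        viable = [p for p in points if cost(p) <= TARGET_COST]
        viable.sort(key=daly)
        top = viable[: max(1, len(viable) // 15)]  # keep best 1/15th by DALYs
        if len(top) <= 1:
            break
        points = [perturb(p) for p in top for _ in range(10)]  # 10 new points each
    if top and (best is None or daly(top[0]) < daly(best)):
        best = top[0]
```

Because each round keeps 1/15th of the points and spawns 10 perturbations per survivor, the population shrinks by roughly a third per round, so the loop always terminates.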
14. Tensor Train Motivation
• In order to work on higher-dimensional problems, we need a new
method for constructing surrogate models
• This is where tensor trains come in:
• Generalization of SVD to higher dimensions
• The version of tensor trains I used is not the original version.
15. Tensor Train Formulation
• Tensor trains try to represent a multivariate function as a
product of univariate polynomials with matrix coefficients
• f(x_0, x_1, …, x_n) = G_0^T(x_0) × G_1(x_1) × … × G_{n−1}(x_{n−1}) × G_n(x_n)
• G_i(x_i) = Σ_{j=0}^{k} b_j(x_i) × C_ij
• k is the order which we are using for our polynomial approximations
• C_ij are the coefficient matrices of the polynomial terms
• b_j(x) is the j-th basis function (Hermite polynomials, Taylor polynomials,
etc.)
• Each G_i is referred to as a 'cart'
• Together, they form a 'train'; hence, Tensor Train
16. Tensor Train Shape
• To help understand the train, it may be useful to have a visual
rendering:
• In this case, each univariate function is estimated by a third-order polynomial
• We want to solve for the coefficient matrices
• The main 'issue' arises when attempting to solve a cart in the middle of the
train: it is difficult to isolate
17. Fitting Process
• We will fit the model iteratively.
• Our objective is to manipulate terms into the least squares form 𝐴𝑥 = 𝑏,
where 𝐴 is some representation of our current cart, 𝑥 is some form of our
unknown coefficient matrices, and 𝑏 is our outputs.
• To explain fitting the model, we begin with the example of a single point 𝑥,
and the corresponding output 𝑦
We assume k is the order of the polynomial approximation and b_j(x_i) is the
j-th basis function evaluated at x_i
• For each component of 𝑥, generate 𝑏0(𝑥𝑖) through 𝑏𝑘(𝑥𝑖)
• Iterate over each cart, assume we are at cart 𝐺𝑖(𝑥𝑖):
• ”Fix” on the selected cart, evaluate the other carts at their respective 𝑥’s
19. Z Term
• We now compute three important terms: 𝑈, 𝑉, and 𝑍.
• 𝑈 is the product of all computed carts LEFT of the fixed cart
• 𝑉 is the product of all computed carts RIGHT of the fixed cart
• For our FIXED cart, G_i(x_i), we evaluate Z:
• Z = Σ_{j=0}^{k} b_j(x_i) × C_ij, where each C_ij is taken as a set of UNKNOWNS
(shape r_1 × r_2 × (k + 1)).
• Each entry of Z contains k + 1 unknowns.
20. Vectorization Trick
• We now apply a Kronecker identity:
• U^T Z V = Y
• (V^T ⊗ U^T) vec(Z) = vec(Y) = Y
• [Figure showing the shape of this equation omitted]
22. Least Squares
• Now we have an equation in which the coefficient matrix (V^T ⊗ U^T) and the
output Y are known and only vec(Z) is unknown, which can be solved very simply
with least squares.
• This method can be generalized as well:
• With additional sample points, both the coefficient matrix and Y grow in the
first dimension (one row per sample point).
• Since we are always solving for the same set of unknowns, least squares still
works!
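The vectorization trick can be checked numerically. This is a minimal sketch with invented shapes: each sample contributes one row (V^T ⊗ U^T) and one output y, and we recover vec(Z) by least squares:

```python
import numpy as np

rng = np.random.default_rng(0)
r1, r2 = 3, 4                            # illustrative ranks of the fixed cart
Z_true = rng.standard_normal((r1, r2))   # "unknowns" we will try to recover

rows, ys = [], []
for _ in range(20):                      # 20 sample points, one row each
    U = rng.standard_normal((r1, 1))     # product of carts LEFT of the fixed cart
    V = rng.standard_normal((r2, 1))     # product of carts RIGHT of the fixed cart
    y = (U.T @ Z_true @ V).item()        # scalar model output for this sample
    # Kronecker identity: vec(U^T Z V) = (V^T kron U^T) vec(Z)
    rows.append(np.kron(V.T, U.T).ravel())
    ys.append(y)

A, b = np.vstack(rows), np.array(ys)     # stack rows: A vec(Z) = b
vecZ, *_ = np.linalg.lstsq(A, b, rcond=None)
Z_hat = vecZ.reshape((r1, r2), order="F")  # undo the column-major vec
```

With 20 noiseless samples and only r1 × r2 = 12 unknowns, the system is overdetermined and the solve recovers Z exactly (up to floating-point error). Note the column-major (`order="F"`) vec is what matches the Kronecker identity.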
23. Single Point Evaluation
When using the train, to evaluate a single parameter set, 𝑥,
we go through the following process:
1. Iterate through each train cart, assume the current cart is cart 𝑖
1. Using the 𝑥𝑖 corresponding to the current cart, generate 𝑏0(𝑥𝑖) through
𝑏𝑘 𝑥𝑖 .
2. Compute G_i(x_i) = Σ_{j=0}^{k} b_j(x_i) × C_ij
2. Once each cart has been evaluated, multiply the carts together to
produce 𝑌.
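The evaluation steps above fit in a few lines. This sketch uses a monomial basis b_j(x) = x^j and random coefficient tensors as stand-ins; the 4-cart train, ranks, and order are all illustrative:

```python
import numpy as np

rng = np.random.default_rng(1)
k = 2                                   # polynomial order
ranks = [1, 3, 3, 3, 1]                 # boundary ranks of 1 give a scalar output
# coeffs[i] has shape (k+1, r_i, r_{i+1}): one matrix C_ij per basis function
coeffs = [rng.standard_normal((k + 1, ranks[i], ranks[i + 1]))
          for i in range(len(ranks) - 1)]

def evaluate(x):
    """Evaluate the tensor train at parameter set x (one entry per cart)."""
    result = np.eye(1)
    for C, xi in zip(coeffs, x):
        basis = np.array([xi ** j for j in range(k + 1)])  # b_0(x_i)..b_k(x_i)
        G = np.tensordot(basis, C, axes=1)                 # sum_j b_j(x_i) * C_ij
        result = result @ G                                # chain the carts
    return result.item()

y = evaluate([0.1, 0.5, 0.9, 0.3])
```

Evaluation cost is a handful of small matrix products, which is why querying the surrogate is essentially free compared with a 2-minute EMOD run.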
25. Advantages/Potential Issues
• Advantages
• Massively dimension-reducing
• Able to model complex interactions between variables
• Generalizable to higher dimensions
• Customizable (more on that later)
• Disadvantages
• Requires a minimum quantity of points to begin solving, otherwise an
underdetermined system
• Sometimes unstable
• Results may not adhere to physical limitations
27. Problem Description
• 11 parameters total:
• 8 for ART
• 1 for VMMC
• 1 for PrEP
• 1 for HCT
• 10 range from 0-1 and one ranges from 0-0.2
• Bounds built into the optimization algorithm
28. Parameters
| Parameter Name | Parameter Description | Default Value | Range |
| PresProb | Probability a symptomatic individual will get an HIV test | 0.94 | [0-1] |
| Child_6w | Probability that a 6-week-old child presents for HIV testing | 0.337 | [0-1] |
| TestUptake | Increased rate of testing after 2009 | 0.9 | [0-1] |
| Staging | Probability someone begins ART Staging from a positive diagnosis | 0.837 | [0-1] |
| PreART | Probability someone continues PreART, instead of being LTFU | 0.75 | [0-1] |
| FastART | Probability someone immediately begins taking ART from the first stage of testing after 2016 | 0.95 | [0-1] |
| On_ART | Probability that someone who has dropped out of ART is willing to re-enroll in ART | 0.9 | [0-1] |
| KeepART | Probability someone keeps using ART after 2016 | 0.9 | [0-1] |
| ARTInterrupted | Probability ART is interrupted due to unforeseen events | 0.2 | [0-0.2] |
| PrEP | Probability someone with a negative HIV test begins to use PrEP | 0.15 | [0-1] |
| VMMC | VMMC Coverage | 0.8 | [0-1] |
29. Results
• I began training with 1000 training points, which took 2 days to run
• Surrogate models were within 1% error on the training data
• They ran into some trouble with ‘unexplored’ regions, but the algorithm was
able to update the models
[Figure: parameter sets chosen by the optimization algorithm vs. randomly
selected points, all evaluated by EMOD]
30. Varying Price Points
• The algorithm was able to generate different investment
recommendations for different price points – $1M to $3M
• Algorithm was able to locate and cluster around local minima
| Cost | $1,113,055.47 | $1,465,936.00 | $1,978,418.95 | $2,513,331.67 | $2,985,608.23 |
| DALYs | 2,470.75 | 2,461.45 | 2,417.12 | 2,442.63 | 2,416.71 |
31. Important Parameters
• I sorted each simulation that I ran into one of 4 categories:
• Cheap – below $1.5M
• Expensive – above $1.5M
• Good – below 7K DALYs
• Bad – above 7K DALYs
• Calculated the mean and standard deviation of each parameter within each category
37. 42-dim Problem Description
• I took a new model (Kenya) and searched for every parameter I could
find:
• 12 parameters for PrEP
• 2 parameters for ANC
• 4 parameters for HCT/testing
• 6 parameters for ART
• 18 parameters for VMMC
38. Results
| Model Type | Many Degs of Freedom | Few Degs of Freedom | Many Degs of Freedom + Preconditioning |
| # Negative Points (1M random samples) | 2547 | 761 | 0 |
| # Points Over 100K DALYs | 1395 | 157 | 0 |
39. Kenya Results
• Cost was bounded at $3M
• The surrogate overestimated expected DALYs at ~10K, while actual DALYs ranged from ~1,800 to 2,500
• Cost ranged from $2.4M to $3M
• Each point was within the allowed range.
• Cost predictions were accurate to within $100K, probably because the expected cost sat in the middle of the
output range
• No clear local minima/groups of points
40. Future Work
• Using gradient descent to solve for optima instead of random
sampling.
• Account for ‘hidden costs’ of optimization
• Application to an accurate epidemiological problem
• The parameters chosen are not necessarily accurate to real-world capabilities,
and represent a proof of concept
• Testing work on other problems
• Rank–1 update instead of re-solving least squares?