Predicting human-driving behavior to help driverless vehicles drive:
random intercept Bayesian additive regression trees
Yaoyuan Vincent Tan^a (Email: vincetan@umich.edu), Carol Flannagan^b, Michael Elliott^{a,c}
^a University of Michigan Department of Biostatistics; ^b University of Michigan Transportation Research Institute; ^c Institute for Social Research
Background
• Develop a model to help engineers developing driverless vehicles predict
whether a human-driven vehicle would stop before executing a left turn
at an intersection (Tan et al., 2015).
• Dataset – naturalistic driving data (Sayer et al., 2011).
• Preliminary work suggested that Bayesian additive regression trees (BART)
produce more stable prediction performance than Super Learner (which
included elastic net, logistic regression, K-nearest neighbors, generalized
additive models, the mean of the outcomes, and BART).
Issue
• BART (Chipman et al., 2010) was developed assuming independent subject
observations, but our dataset consists of longitudinal observations.
• Current literature: Zhang et al. (2007) used distributions more commonly
seen in spatial statistics to handle within-subject correlation; Low-Kam
et al. (2015) proposed a model too complicated for our problem.
We propose ...
• A simple framework to extend BART to longitudinal datasets.
• Add a random intercept, with two alternative distributional assumptions:
(i) normal and (ii) the product of two independent normal distributions,
which induces a folded non-central t prior on the within-subject correlation
parameter (Gelman, 2006).
• Provide a strategy to easily implement riBART by making use of existing
BART implementation packages in R (R Core Team, 2015).
Random intercept BART (riBART)
Bayesian additive regression trees
• Maintains power when estimating non-linear associations and
high-dimensional interactions.
• Achieved by modeling the mean outcome given predictors using a sum of
weak regression trees.
Normal distribution random intercept
Suppose continuous outcomes $Y_{ik}$ and $p$ covariates $X_{ik}$, $k = 1, \ldots, K$,
$i = 1, \ldots, n_k$, where $k$ indexes the subjects and $i$ indexes the observations
within a subject. The riBART model is
$$Y_{ik} = \sum_{j=1}^{m} g(X_{ik}, T_j, M_j) + a_k + \epsilon_{ik}, \qquad (1)$$
• $\epsilon_{ik} \sim N(0, \sigma^2)$, $a_k \sim N(0, \tau^2)$, $a_k \perp \epsilon_{ik}$.
• $T_j$ is the $j$th binary regression tree structure.
• $M_j = \{\mu_{1j}, \ldots, \mu_{b_j j}\}$ is the set of $b_j$ terminal node parameters
associated with tree structure $T_j$.
• ak is the random intercept we introduce to BART.
• Binary outcomes – let $\Phi[\cdot]$ be the c.d.f. of a standard normal; replace
equation (1) with
$$P(Y_{ik} = 1 \mid X_{ik}) = \Phi[G_a(X_{ik})],$$
$$G_a(X_{ik}) = \sum_{j=1}^{m} g(X_{ik}, T_j, M_j) + a_k, \qquad a_k \sim N(0, \tau^2).$$
Priors
• $P(T_j)$ – uses three components: (i) the probability that a node at depth
$d = 0, 1, 2, \ldots$ is an internal node is $\alpha(1 + d)^{-\beta}$, $\alpha \in (0, 1)$,
$\beta \in [0, \infty)$. Smaller $\alpha$ makes a node less likely to split; smaller $\beta$
increases the number of terminal nodes. (ii) A uniform distribution is used
to choose which covariate enters the decision rule in an internal node.
(iii) A uniform distribution is used for the split value of the selected
covariate in an internal node.
• $\mu_{ij} \mid T_j \sim N(\mu_\mu, \sigma^2_\mu)$.
• $\sigma^2 \sim \nu\lambda/\chi^2_\nu$. For binary outcomes, $\sigma^2 \equiv 1$.
• $P(\tau) \propto 1$.
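The depth-penalizing tree prior in (i) can be made concrete with a short numeric check. This is a minimal sketch assuming the BART defaults $\alpha = 0.95$, $\beta = 2$ from Chipman et al. (2010); `split_prob` is a hypothetical helper name:

```python
# Tree-depth prior: P(node at depth d is internal) = alpha * (1 + d)^(-beta).
# alpha = 0.95, beta = 2 are the Chipman et al. (2010) defaults.

def split_prob(d, alpha=0.95, beta=2.0):
    """Prior probability that a node at depth d splits into two children."""
    return alpha * (1.0 + d) ** (-beta)

probs = [round(split_prob(d), 4) for d in range(4)]
# depth 0: 0.95, depth 1: 0.2375, depth 2: 0.1056, depth 3: 0.0594
```

The rapidly decaying probabilities show why each $T_j$ stays a weak learner: deep splits are heavily penalized a priori.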
Posterior
• For continuous outcomes – given $a_k$, $\tilde{Y}_{ik} = Y_{ik} - a_k$ reduces to a usual
BART model (Zhang et al., 2007).
• For binary outcomes – data augmentation (Tanner & Wong, 1987; Albert
& Chib, 1993):
$$Z_{ik} \mid Y_{ik} = 1 \sim N(G_a(X_{ik}), 1) \text{ truncated to } (0, \infty),$$
$$Z_{ik} \mid Y_{ik} = 0 \sim N(G_a(X_{ik}), 1) \text{ truncated to } (-\infty, 0).$$
Then $\tilde{Z}_{ik} = Z_{ik} - a_k$ reduces to a continuous BART model.
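The truncated-normal draws for the latent $Z_{ik}$ can be sketched with SciPy. This is a minimal illustration, not the authors' implementation; `draw_latent_z` is a hypothetical helper name:

```python
import numpy as np
from scipy.stats import truncnorm

rng = np.random.default_rng(0)

def draw_latent_z(g, y, rng):
    """Albert-Chib step: draw Z ~ N(g, 1) truncated to (0, inf) where y = 1
    and to (-inf, 0) where y = 0; g holds the current fitted values G_a(X)."""
    # truncnorm takes bounds standardized by (bound - loc) / scale,
    # so the truncation point 0 becomes -g with loc = g, scale = 1.
    lo = np.where(y == 1, -g, -np.inf)
    hi = np.where(y == 1, np.inf, -g)
    return truncnorm.rvs(lo, hi, loc=g, scale=1.0, random_state=rng)

g = np.array([0.5, -1.2, 0.0])
y = np.array([1, 0, 1])
z = draw_latent_z(g, y, rng)
# z is positive wherever y == 1 and negative wherever y == 0
```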
Folded non-central t prior for τ2
• $P(\tau) \propto 1$ may have inappropriate effects on inferences, especially when
$K$ is small or when $\tau^2$ is close to 0 (Gelman, 2006). A folded non-central t
(FNCT) prior for $\tau^2$ produces better results.
• Decompose $a_k$ in equation (1) as
$$a_k = \xi\eta_k, \qquad \xi \sim N(0, B^2), \quad \eta_k \sim N(0, \theta^2). \qquad (2)$$
• Same priors as normal riBART, with the addition of $\theta^2 \sim ef/\chi^2_e$.
• Evaluate the resulting posterior distribution with $B \to \infty$ and set
$e = f = 0.5$.
• Posterior draws follow normal riBART closely, with $a_k = \xi\eta_k$ and
$\tau = |\xi|\theta$.
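A quick Monte Carlo check makes the parameter expansion in equation (2) concrete: conditional on one draw of $\xi$ and $\theta$, the intercepts $a_k = \xi\eta_k$ have standard deviation $\tau = |\xi|\theta$. This is a sketch under arbitrary illustrative values of $B$ and $\theta$, not the full posterior computation:

```python
import numpy as np

rng = np.random.default_rng(1)

# Parameter expansion (equation (2)): a_k = xi * eta_k with
# xi ~ N(0, B^2) shared across subjects and eta_k ~ N(0, theta^2) per subject,
# so conditional on (xi, theta) the implied intercept SD is tau = |xi| * theta.
B, theta, K = 10.0, 0.7, 200_000   # illustrative values; K large to shrink MC error

xi = rng.normal(0.0, B)                # single shared scale draw
eta = rng.normal(0.0, theta, size=K)   # one eta_k per subject
a = xi * eta                           # random intercepts

tau_implied = abs(xi) * theta
tau_empirical = a.std()
# tau_empirical matches tau_implied up to Monte Carlo error of order 1/sqrt(2K)
```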
Implementation
1. Begin with an initial estimate of $\sigma$ ($\sigma = 1$ for all iterations with
binary outcomes) and $a_k$ (typically $a_k = 0$), or $\xi$ and $\eta_k$ (typically
$\xi = 1$ and $\eta_k = 0$).
2. Subtract $a_k$ from $Y_{ik}$ (or $Z_{ik}$ for binary outcomes) to obtain $\tilde{Y}_{ik}$
(or $\tilde{Z}_{ik}$).
3. Provide the outcomes $\tilde{Y}_{ik}$ (or $\tilde{Z}_{ik}$) with the covariates $X_{ik}$ to any
computer package or program that implements BART. Set the degrees of
freedom for the prior distribution of $\sigma$ to 100,000 and use the initial
estimate of $\sigma$ from Step 1 as the initial $\sigma$ estimate in the BART
program. Draw 100 posterior draws with 0 burn-ins for the $m$ $T_j$s and
$M_j$s.
4. Extract $\sum_{j=1}^{m} g(X_{ik}, T_j, M_j) \mid \tilde{Y}_{ik}, \sigma$ from the 100th
posterior draw of the BART program.
5. Use $\sum_{j=1}^{m} g(X_{ik}, T_j, M_j) \mid \tilde{Y}_{ik}, \sigma$ in the posterior draws of
$\sigma$, $a_k$, and $\tau$ (or $\xi$, $\eta_k$, and $\theta$).
6. Repeat Steps 2–5 to obtain the desired number of burn-ins and posterior
draws.
Key idea
• Posterior draws for the random intercept can be done outside of BART.
• Step 3 – Run 100 posterior draws instead of 1 posterior draw because
most BART packages initialize all $m$ $T_j$s with a single terminal node.
• Step 3 – Set the degrees of freedom to 100,000 to force each BART draw
of $\sigma$ to be close to the estimate we provide (for binary outcomes, $\sigma = 1$).
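The six steps above can be sketched as a Gibbs loop. This is a minimal sketch for continuous outcomes, not the authors' exact sampler: `fit_bart` is a hypothetical stand-in for any external BART routine (per Step 3, it would return its 100th posterior draw with zero burn-in), and the $\tau$ update shown is one common inverse-chi-squared form whose exact degrees of freedom depend on how the flat prior is parameterized:

```python
import numpy as np

def ribart_gibbs(X, Y, subj, fit_bart, n_iter=6000, burn=1000, seed=0):
    """Sketch of the riBART sampling loop (Steps 1-6) for continuous outcomes.
    fit_bart(X, y_tilde, sigma) is a hypothetical callable wrapping an external
    BART package; it returns (fitted_values, sigma_draw)."""
    rng = np.random.default_rng(seed)
    K = subj.max() + 1
    a = np.zeros(K)          # Step 1: initialize a_k = 0
    sigma = Y.std()          # Step 1: initial sigma estimate
    tau2 = 1.0
    keep = []
    for it in range(n_iter):
        y_tilde = Y - a[subj]                      # Step 2: subtract intercepts
        fit, sigma = fit_bart(X, y_tilde, sigma)   # Steps 3-4: external BART call
        resid = Y - fit                            # Step 5: draw a_k | rest
        for k in range(K):
            r_k = resid[subj == k]
            prec = r_k.size / sigma**2 + 1.0 / tau2
            mean = r_k.sum() / sigma**2 / prec
            a[k] = rng.normal(mean, prec ** -0.5)
        # Step 5 (cont.): draw tau | a -- illustrative inverse-chi-squared form
        tau2 = np.sum(a ** 2) / rng.chisquare(K - 1)
        if it >= burn:                             # Step 6: keep post-burn draws
            keep.append((sigma, np.sqrt(tau2)))
    return keep
```

The key point from the "Key idea" box survives in the sketch: the random-intercept draws happen entirely outside the `fit_bart` call, so any off-the-shelf BART package can be reused unchanged.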
Simulation study
Goal: Investigate whether riBART will improve prediction performance
compared to BART on a correlated dataset.
• $X_{ikq} \sim \text{Uniform}(0, 1)$, $q = 1, \ldots, 5$.
• $G_a(X_{ik}) = 1.35[\sin(\pi X_{ik1}X_{ik2}) + 2(X_{ik3} - 0.5)^2 - X_{ik4} - 0.5X_{ik5}] + a_k$,
where $a_k \sim N(0, \tau^2)$.
• $\pi_{G_a}(X_{ik}) = \Phi[G_a(X_{ik})]$.
• $Y_{ik} \sim \text{Bernoulli}(\pi_{G_a}(X_{ik}))$.
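The data-generating process above can be sketched directly. This is a minimal reconstruction from the bullets (the `simulate` helper name and the seed are illustrative, not from the poster):

```python
import numpy as np
from scipy.stats import norm

rng = np.random.default_rng(42)

def simulate(K, n_k, tau, rng):
    """One simulated dataset: 5 Uniform(0,1) covariates, the scaled
    Friedman-type mean function, N(0, tau^2) subject intercepts, probit link."""
    N = K * n_k
    subj = np.repeat(np.arange(K), n_k)            # subject index per row
    X = rng.uniform(0.0, 1.0, size=(N, 5))
    a = rng.normal(0.0, tau, size=K)               # a_k ~ N(0, tau^2)
    G = 1.35 * (np.sin(np.pi * X[:, 0] * X[:, 1])
                + 2.0 * (X[:, 2] - 0.5) ** 2
                - X[:, 3] - 0.5 * X[:, 4]) + a[subj]
    p = norm.cdf(G)                                # pi = Phi(G_a(X))
    Y = rng.binomial(1, p)                         # Bernoulli outcomes
    return X, Y, subj

# Scenario 1: 50 subjects, 5 repeated measures each, tau = 1
X, Y, subj = simulate(K=50, n_k=5, tau=1.0, rng=rng)
```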
Table 1: Description of simulation scenarios with 95% coverage results for
the posterior draw of τ under normal and folded non-central t (FNCT) prior
riBART.

             Rep. measures  No. of subjects   τ    95% Coverage
                                                   Normal   FNCT
Scenario 1         5              50          1     100%      0%
Scenario 2        20             100          1     100%      0%
Scenario 3         5              50         0.5      0%      0%
Scenario 4        20             100         0.5     37%      0%
• Implemented riBART using strategy outlined.
• 1,000 burn-ins and 5,000 posterior draws.
• 200 simulations. For each simulation, fix nk, K, and τ based on scenario,
then generate Xikq, Ga(Xik), and Yik.
Figure 1: Histograms of the 200 AUCs produced under normal riBART and
BART (x-axis: AUC; y-axis: frequency; legend: BART, riBART).
Top left: Scenario 1; Top right: Scenario 2; Bottom left: Scenario 3;
Bottom right: Scenario 4.
• Similar AUCs were produced by normal riBART and FNCT prior riBART.
• As the number of repeated measurements $n_k$ and the within-subject
correlation $\tau$ increase, riBART increasingly improves on the prediction
performance of BART.
Table 2: Mean bias in τ for normal riBART and folded non-central t (FNCT)
prior riBART.

             Normal     FNCT
Scenario 1   -0.093   -0.836
Scenario 2    1.546   -0.419
Scenario 3    0.400   -0.388
Scenario 4    0.531   -0.421
• Bias calculated as $\tau^{(l)}_{\text{bias}} = \hat{\tau}^{(l)} - \tau$, where $\hat{\tau}^{(l)}$
is the empirical posterior mean of $\tau$ in simulation $l$.
• The posterior mean of $\tau$ is biased under both normal riBART and FNCT
prior riBART.
• If estimation of $\tau$ is still desired, normal riBART should be used because
its coverage is better.
Predicting driver stop
• 108 drivers; 3,795 turns, of which 1,823 were left turns.
• On average 35 turns per driver (range: 8 to 139 turns).
• See Tan et al. (2015) for details about the dataset and data manipulations.
• Compare riBART, BART, logistic regression, and random intercept logistic
regression.
• Estimate the variance of the AUC using a linear approximation of the AUC
from Somers' D (Hanley & McNeil, 1982) to compute 95% CIs.
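The AUC and a Hanley & McNeil (1982)-style standard error can be sketched as follows. This is an illustration only: the poster's CIs use a linear approximation via Somers' D, and the closed-form variance shown here is the related Hanley–McNeil formula, not necessarily the authors' exact computation; both helper names are hypothetical:

```python
import numpy as np

def auc_mann_whitney(score, y):
    """AUC as the Mann-Whitney statistic: probability that a random positive
    outscores a random negative, counting ties as half."""
    pos, neg = score[y == 1], score[y == 0]
    diff = pos[:, None] - neg[None, :]
    return ((diff > 0).sum() + 0.5 * (diff == 0).sum()) / (pos.size * neg.size)

def hanley_mcneil_se(A, n1, n0):
    """Hanley & McNeil (1982) standard error of the AUC, with n1 positives
    and n0 negatives."""
    Q1 = A / (2.0 - A)
    Q2 = 2.0 * A ** 2 / (1.0 + A)
    var = (A * (1 - A) + (n1 - 1) * (Q1 - A ** 2)
           + (n0 - 1) * (Q2 - A ** 2)) / (n1 * n0)
    return np.sqrt(var)

# Toy example with 4 positives and 4 negatives
score = np.array([0.9, 0.8, 0.7, 0.6, 0.4, 0.3, 0.2, 0.1])
y     = np.array([1,   1,   1,   0,   1,   0,   0,   0])
A = auc_mann_whitney(score, y)                     # 15/16 = 0.9375
se = hanley_mcneil_se(A, 4, 4)
lo, hi = A - 1.96 * se, A + 1.96 * se              # approximate 95% CI
```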
Figure 2: AUC profiles with 95% CIs for riBART (incorporates within-driver
correlation), BART (ignores within-driver correlation), logistic regression
(assumes linearity and ignores complex interactions), and random intercept
logistic regression (incorporates within-driver correlation but assumes
linearity and ignores complex interactions). X-axis: distance from reference
(m), −80 to 0; y-axis: AUC, 0.6 to 1.0.
Conclusion
• In the application, riBART dramatically improves prediction of driver
stopping behavior ⇒ each driver's propensity to stop should be estimated
and used in prediction models.
• Use riBART when (i) the average number of repeated measurements is
large (around twenty) and (ii) the within-subject correlation $\tau$ is
suspected to be high.
• Caution should be exercised when using riBART for inference on $\tau$.
• The riBART implementation strategy is novel, but computational burden
remains an issue.
• Future work – generalize to multiple linear random effects.
References
• Albert, J. and Chib, S. (1993). Bayesian Analysis of Binary and Polychotomous Response Data. Journal
of the American Statistical Association, 88, 669-679.
• Chipman, H.A., George, E.I., McCulloch, R.E. (2010). BART: Bayesian Additive Regression Trees. The
Annals of Applied Statistics, 4(1):266-298.
• Gelman, A. (2006). Prior distributions for variance parameters in hierarchical models (Comment on
Article by Browne and Draper). Bayesian Analysis, 1, 515-534.
• Hanley, J. and McNeil, B. (1982). The Meaning and Use of the Area under a Receiver Operating Charac-
teristic (ROC) Curve. Radiology, 143, 29-36.
• Low-Kam, C., Telesca, D., Ji, Z., Zhang, H., Xia, T., Zink, J., and Nel, A. (2015). A Bayesian regression
tree approach to identify the effect of nanoparticles’ properties on toxicity profiles. The Annals of Applied
Statistics, 9, 383-401.
• R Core Team (2015). R: A Language and Environment for Statistical Computing. R Foundation for
Statistical Computing, Vienna, Austria.
• Sayer, J., Bogard, S., Buonarosa, M., LeBlanc, D., Funkhouser, D., Bao, S., Blankespoor, A., and Winkler,
C. (2011). Integrated Vehicle-Based Safety Systems Light-Vehicle Field Operational Test Key Findings
Report DOT HS 811 416. Retrieved August 26, 2015, from http://www.nhtsa.gov/DOT/NHTSA/
NVS/Crash%20Avoidance/Technical%20Publications/2011/811416.pdf
• Tan, Y., Elliott, M., and Flannagan, C. (2015). Development of a Real-time Prediction Model of Driver
Behavior at Intersections Using Kinematic Time Series Data. In JSM Proceedings, Transportation Statis-
tics Interest Group.
• Tanner, M. and Wong, W. (1987). The Calculation of Posterior Distributions by Data Augmentation.
Journal of the American Statistical Association, 82, 528-540.
• Zhang, S., Shih, Y., and Müller, P. (2007). A Spatially-adjusted Bayesian Additive Regression Tree Model
to Merge Two Datasets. Bayesian Analysis, 2, 611-634.
Acknowledgments
This work was supported jointly by Dr. Michael Elliott and an ATLAS Research Excellence Program project
awarded to Dr. Carol Flannagan. We would like to thank Dr. Jian Kang and Dr. Brisa Sánchez for their
suggestions.