Predicting human-driving behavior to help driverless vehicles drive:
random intercept Bayesian additive regression trees
Yaoyuan Vincent Tan(a) (Email: vincetan@umich.edu), Carol Flannagan(b), Michael Elliott(a,c)
(a) University of Michigan Department of Biostatistics, (b) University of Michigan Transportation Research Institute, (c) Institute for Social Research
Background
• Develop a model to help engineers developing driverless vehicles predict whether a human-driven vehicle will stop before executing a left turn at an intersection (Tan et al., 2015).
• Dataset: naturalistic driving data (Sayer et al., 2011).
• Preliminary work suggested that Bayesian additive regression trees (BART) produce more stable prediction performance than Super Learner (which included elastic net, logistic regression, K-nearest neighbors, generalized additive models, the mean of the outcomes, and BART).
Issue
• BART (Chipman et al., 2010) was developed assuming independent observations, but our dataset consists of longitudinal observations.
• Current literature: Zhang et al. (2007) used distributions more commonly seen in spatial statistics to handle within-subject correlation; the Low-Kam et al. (2015) model is too complicated for our problem.
We propose ...
• A simple framework to extend BART to longitudinal datasets.
• Add a random intercept, under two alternative distributional assumptions: (i) normal and (ii) the product of two independent normal distributions, which induces a folded non-central t prior on the within-subject correlation parameter (Gelman, 2006).
• Provide a strategy to implement riBART easily using existing BART packages in R (R Core Team, 2015).
Random intercept BART (riBART)
Bayesian additive regression trees
• Power is maintained when estimating non-linear associations and high-dimensional interactions.
• Achieved by modeling the mean outcome given the predictors as a sum of weak regression trees.
Normal distribution random intercept
Suppose continuous outcomes $Y_{ik}$ and $p$ covariates $X_{ik}$, $k = 1, \ldots, K$, $i = 1, \ldots, n_k$, where $k$ indexes subjects and $i$ indexes observations within a subject. The riBART model is
$$Y_{ik} = \sum_{j=1}^{m} g(X_{ik}, T_j, M_j) + a_k + \epsilon_{ik}, \quad (1)$$
• $\epsilon_{ik} \sim N(0, \sigma^2)$, $a_k \sim N(0, \tau^2)$, $a_k \perp \epsilon_{ik}$.
• $T_j$ is the $j$th binary regression tree structure.
• $M_j = \{\mu_{1j}, \ldots, \mu_{b_j j}\}$ is the set of $b_j$ terminal node parameters associated with tree structure $T_j$.
• $a_k$ is the random intercept we introduce to BART.
• Binary outcomes: let $\Phi[\cdot]$ be the c.d.f. of a standard normal and replace equation (1) with
$$P(Y_{ik} = 1 \mid X_{ik}) = \Phi[G_a(X_{ik})], \qquad G_a(X_{ik}) = \sum_{j=1}^{m} g(X_{ik}, T_j, M_j) + a_k, \quad a_k \sim N(0, \tau^2).$$
Priors
• $P(T_j)$ has three components: (i) the probability that a node at depth $d = 0, 1, 2, \ldots$ is internal is $\alpha(1 + d)^{-\beta}$, $\alpha \in (0, 1)$, $\beta \in [0, \infty)$; smaller $\alpha$ makes a node less likely to split, while smaller $\beta$ increases the number of terminal nodes. (ii) A uniform distribution over which covariate is selected for the decision rule at an internal node. (iii) A uniform distribution over the value of the selected covariate used in the decision rule at an internal node.
• $\mu_{ij} \mid T_j \sim N(\mu_\mu, \sigma^2_\mu)$.
• $\sigma^2 \sim \nu\lambda/\chi^2_\nu$. For binary outcomes, $\sigma^2 \equiv 1$.
• $P(\tau) \propto 1$.
Posterior
• For continuous outcomes: given $a_k$, working with $\tilde{Y}_{ik} = Y_{ik} - a_k$ reduces the model to a usual BART model (Zhang et al., 2007).
• For binary outcomes: data augmentation (Tanner & Wong, 1987; Albert & Chib, 1993):
$$(Z_{ik} \mid Y_{ik} = 1) \sim \max\{N(G_a(X_{ik}), 1), 0\}, \qquad (Z_{ik} \mid Y_{ik} = 0) \sim \min\{N(G_a(X_{ik}), 1), 0\}.$$
Working with $\tilde{Z}_{ik} = Z_{ik} - a_k$ then reduces the model to a continuous BART model.
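The augmentation step above draws each latent $Z_{ik}$ from a normal distribution truncated at 0. A minimal R sketch, assuming the truncnorm package; the inputs y (the 0/1 outcomes) and Ga (the current draw of $G_a(X_{ik})$) are hypothetical names:

```r
library(truncnorm)

# Draw latent Z from N(Ga, 1) truncated to (0, Inf) when y = 1
# and to (-Inf, 0) when y = 0 (Albert & Chib, 1993).
draw_z <- function(y, Ga) {
  z <- numeric(length(y))
  z[y == 1] <- rtruncnorm(sum(y == 1), a = 0,    b = Inf, mean = Ga[y == 1], sd = 1)
  z[y == 0] <- rtruncnorm(sum(y == 0), a = -Inf, b = 0,   mean = Ga[y == 0], sd = 1)
  z
}
```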
Folded non-central t prior for $\tau^2$
• $P(\tau) \propto 1$ may have inappropriate effects on inferences, especially when $K$ is small or when $\tau^2$ is close to 0 (Gelman, 2006). A folded non-central t (FNCT) prior for $\tau^2$ produces better results.
• Decompose $a_k$ in equation (1) as
$$a_k = \xi\eta_k, \qquad \xi \sim N(0, B^2), \quad \eta_k \sim N(0, \theta^2). \quad (2)$$
• Priors are similar to normal riBART, with the addition of $\theta^2 \sim ef/\chi^2_e$.
• Evaluate the resulting posterior distribution with $B \to \infty$ and set $e = f = 0.5$.
• Posterior draws follow normal riBART closely, with $a_k = \xi\eta_k$ and $\tau = |\xi|\theta$.
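Under decomposition (2), the conditional draws of $\eta_k$, $\xi$, and $\theta^2$ are standard conjugate and parameter-expansion updates. A sketch in R under the stated assumptions ($B \to \infty$, $e = f = 0.5$); the inputs are hypothetical: r holds the residuals after subtracting the sum-of-trees fit, and idx/nk give the observation indices and counts per subject:

```r
# One parameter-expansion sweep for the FNCT random intercept:
# r_ik = xi * eta_k + eps_ik, with eps_ik ~ N(0, sigma^2).
update_fnct <- function(r, idx, nk, xi, eta, theta2, sigma, e = 0.5, f = 0.5) {
  K <- length(eta)
  # eta_k | rest: N(0, theta2) prior, normal likelihood -> normal posterior
  for (k in seq_len(K)) {
    prec   <- nk[k] * xi^2 / sigma^2 + 1 / theta2
    eta[k] <- rnorm(1, xi * sum(r[idx[[k]]]) / sigma^2 / prec, sqrt(1 / prec))
  }
  # xi | rest: flat prior as B -> Inf, an ordinary regression-coefficient draw
  eta.obs <- numeric(length(r))
  for (k in seq_len(K)) eta.obs[idx[[k]]] <- eta[k]
  xi <- rnorm(1, sum(eta.obs * r) / sum(eta.obs^2), sigma / sqrt(sum(eta.obs^2)))
  # theta2 | rest: scaled inverse chi-square update from theta2 ~ e*f/chisq_e
  theta2 <- (e * f + sum(eta^2)) / rchisq(1, df = e + K)
  list(xi = xi, eta = eta, theta2 = theta2,
       a = xi * eta, tau = abs(xi) * sqrt(theta2))
}
```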
Implementation
1. Begin with an initial estimate of $\sigma$ ($\sigma = 1$ for all iterations with binary outcomes) and of $a_k$ (typically $a_k = 0$), or of $\xi$ and $\eta_k$ (typically $\xi = 1$ and $\eta_k = 0$).
2. Subtract $a_k$ from $Y_{ik}$ (or $Z_{ik}$ for binary outcomes) to obtain $\tilde{Y}_{ik}$ (or $\tilde{Z}_{ik}$).
3. Provide the outcomes $\tilde{Y}_{ik}$ (or $\tilde{Z}_{ik}$) and the covariates $X_{ik}$ to any computer package or program that implements BART. Set the degrees of freedom for the prior distribution of $\sigma$ to 100,000 and use the estimate of $\sigma$ from Step 1 as the initial $\sigma$ estimate in the BART program. Draw 100 posterior draws with 0 burn-in for the $m$ $T_j$s and $M_j$s.
4. Extract $\sum_{j=1}^{m} g(X_{ik}, T_j, M_j) \mid \tilde{Y}_{ik}, \sigma$ from the 100th posterior draw of the BART program.
5. Use $\sum_{j=1}^{m} g(X_{ik}, T_j, M_j) \mid \tilde{Y}_{ik}, \sigma$ in the posterior draws of $\sigma$, $a_k$, and $\tau$ (or of $\xi$, $\eta_k$, and $\theta$).
6. Repeat Steps 2–5 to obtain the desired number of burn-in and posterior draws.
Key idea
• Posterior draws for the random intercept can be done outside of BART.
• Step 3 – run 100 posterior draws instead of 1 because most BART packages initialize all $m$ $T_j$s with a single terminal node.
• Step 3 – set the degrees of freedom to 100,000 to force each BART draw of $\sigma$ to stay close to the estimate we provide (for binary outcomes, $\sigma = 1$). A minimal sketch of the whole loop in R follows.
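A minimal sketch of this strategy for continuous outcomes, assuming the dbarts package (BayesTree exposes the same interface); the data arguments y, X, and subject are hypothetical, and the conjugate draws of $a_k$ and $\tau^2$ are the standard normal-model forms implied by the flat prior on $\tau$:

```r
library(dbarts)

ribart <- function(y, X, subject, n.burn = 1000, n.post = 5000) {
  subject <- as.factor(subject)
  idx <- split(seq_along(y), subject)  # observation indices per subject
  nk  <- lengths(idx)                  # n_k, repeated measures per subject
  K   <- length(idx)

  sigma <- sd(y)                       # Step 1: initial estimates
  a     <- rep(0, K)
  tau2  <- 1
  tau.draws <- numeric(n.post)

  for (it in seq_len(n.burn + n.post)) {
    y.tilde <- y - a[as.integer(subject)]          # Step 2

    # Step 3: 100 tree draws, zero burn-in; the huge sigdf pins the
    # sigma draws near the value of sigest we pass in.
    fit <- bart(x.train = X, y.train = y.tilde,
                sigest = sigma, sigdf = 1e5,
                ndpost = 100, nskip = 0, verbose = FALSE)

    # Step 4: keep the 100th draw of the sum-of-trees fit and of sigma
    f.hat <- fit$yhat.train[100, ]
    sigma <- fit$sigma[length(fit$sigma)]

    # Step 5: conjugate draws of a_k given residuals, then tau^2 given a_k
    r <- y - f.hat
    for (k in seq_len(K)) {
      prec <- nk[k] / sigma^2 + 1 / tau2
      a[k] <- rnorm(1, sum(r[idx[[k]]]) / sigma^2 / prec, sqrt(1 / prec))
    }
    tau2 <- sum(a^2) / rchisq(1, df = K - 1)       # from P(tau) propto 1

    if (it > n.burn) tau.draws[it - n.burn] <- sqrt(tau2)  # Step 6
  }
  list(tau = tau.draws, a = a)
}
```

For binary outcomes, $\sigma$ stays fixed at 1 and y.tilde is replaced by $Z_{ik} - a_k$, with $Z_{ik}$ drawn as in the data-augmentation sketch above.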
Simulation study
Goal: Investigate whether riBART will improve prediction performance
compared to BART on a correlated dataset.
• $X_{ikq} \sim \mathrm{Uniform}(0, 1)$, $q = 1, \ldots, 5$.
• $G_a(X_{ik}) = 1.35[\sin(\pi X_{ik1} X_{ik2}) + 2(X_{ik3} - 0.5)^2 - X_{ik4} - 0.5 X_{ik5}] + a_k$, where $a_k \sim N(0, \tau^2)$.
• $\pi_{G_a}(X_{ik}) = \Phi[G_a(X_{ik})]$.
• $Y_{ik} \sim \mathrm{Bernoulli}(\pi_{G_a}(X_{ik}))$.
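Generating one simulated dataset under this design is straightforward; a sketch in R, where nk, K, and tau are set per scenario (the function name is ours):

```r
# Simulate one correlated binary dataset: 5 uniform covariates,
# a Friedman-style mean function, and a N(0, tau^2) subject intercept.
sim_data <- function(nk, K, tau) {
  n  <- nk * K
  X  <- matrix(runif(n * 5), n, 5)
  a  <- rnorm(K, 0, tau)
  Ga <- 1.35 * (sin(pi * X[, 1] * X[, 2]) + 2 * (X[, 3] - 0.5)^2 -
                  X[, 4] - 0.5 * X[, 5]) + rep(a, each = nk)
  data.frame(y = rbinom(n, 1, pnorm(Ga)), X = X,
             subject = rep(seq_len(K), each = nk))
}

# e.g., Scenario 1: 5 repeated measures, 50 subjects, tau = 1
d <- sim_data(nk = 5, K = 50, tau = 1)
```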
Table 1: Description of simulation scenarios, with 95% coverage results for the posterior draws of $\tau$ under normal and folded non-central t (FNCT) prior riBART.

                                                   95% Coverage
             Rep. measures   No. of subjects   τ     Normal   FNCT
Scenario 1   5               50                1     100%     0%
Scenario 2   20              100               1     100%     0%
Scenario 3   5               50                0.5   0%       0%
Scenario 4   20              100               0.5   37%      0%
• Implemented riBART using the strategy outlined above.
• 1,000 burn-in and 5,000 posterior draws.
• 200 simulations. For each simulation, fix $n_k$, $K$, and $\tau$ according to the scenario, then generate $X_{ikq}$, $G_a(X_{ik})$, and $Y_{ik}$.
Figure 1: Histograms of the 200 AUCs produced under normal riBART and under BART (x-axis: AUC; y-axis: frequency). Top left: Scenario 1; top right: Scenario 2; bottom left: Scenario 3; bottom right: Scenario 4.
• Normal riBART and FNCT prior riBART produced similar AUCs.
• As the number of repeated measurements, $n_k$, and the within-subject correlation, $\tau$, increase, riBART increasingly improves the prediction performance of BART.
Table 2: Mean bias in $\tau$ for normal riBART and folded non-central t (FNCT) prior riBART.

             Normal   FNCT
Scenario 1   -0.093   -0.836
Scenario 2    1.546   -0.419
Scenario 3    0.400   -0.388
Scenario 4    0.531   -0.421
• Bias calculated as $\tau^{(l)}_{\mathrm{bias}} = \hat{\tau}^{(l)} - \tau$, where $\hat{\tau}^{(l)}$ is the empirical posterior mean of $\tau$ in simulation $l$.
• The posterior mean of $\tau$ is biased under both normal riBART and FNCT prior riBART.
• If estimation of $\tau$ is still desired, normal riBART should be used because its coverage is better.
Predicting driver stop
• 108 drivers; 3,795 turns, of which 1,823 were left turns.
• On average 35 turns per driver (range: 8 to 139 turns).
• See Tan et al. (2015) for details about the dataset and data manipulations.
• Compare riBART, BART, logistic regression, and random intercept logistic regression (riLogistic).
• Estimate the variance of the AUC using the linear approximation of the AUC from Somers' D (Hanley & McNeil, 1982) to compute 95% CIs, as sketched below.
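The Hanley & McNeil (1982) approximation gives a closed-form standard error for an estimated AUC; a sketch in R, where n1 and n2 are the numbers of positive (stop) and negative (no-stop) cases — the exact bookkeeping in our analysis may differ:

```r
# Hanley & McNeil (1982) standard error for an AUC estimate A,
# and the resulting normal-approximation confidence interval.
auc_ci <- function(A, n1, n2, level = 0.95) {
  Q1 <- A / (2 - A)
  Q2 <- 2 * A^2 / (1 + A)
  se <- sqrt((A * (1 - A) + (n1 - 1) * (Q1 - A^2) +
                (n2 - 1) * (Q2 - A^2)) / (n1 * n2))
  A + c(-1, 1) * qnorm(1 - (1 - level) / 2) * se
}
```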
Figure 2: AUC profiles with 95% CIs for riBART (incorporates within-driver correlation), BART (ignores within-driver correlation), logistic regression (assumes linearity and ignores complex interactions), and random intercept logistic regression (incorporates within-driver correlation but assumes linearity and ignores complex interactions).
(x-axis: distance from reference (m), −80 to 0; y-axis: AUC, 0.6 to 1.0; curves: BART, riBART, Logistic, riLogistic.)
Conclusion
• In the application, riBART dramatically improves prediction of driver stopping behavior ⇒ each driver's propensity to stop should be estimated and used in prediction models.
• Use riBART when (i) the average number of repeated measurements is large (around twenty) and (ii) the within-subject correlation $\tau$ is suspected to be high.
• Caution should be exercised when using riBART for inference.
• Our riBART implementation is novel, but its computational burden is an issue.
• Future work: generalize to multiple linear random effects.
References
• Albert, J. and Chib, S. (1993). Bayesian Analysis of Binary and Polychotomous Response Data. Journal
of the American Statistical Association, 88, 669-679.
• Chipman, H.A., George, E.I., and McCulloch, R.E. (2010). BART: Bayesian Additive Regression Trees. The Annals of Applied Statistics, 4, 266-298.
• Gelman, A. (2006). Prior distributions for variance parameters in hierarchical models (Comment on
Article by Browne and Draper). Bayesian Analysis, 1, 515-534.
• Hanley, J. and McNeil, B. (1982). The Meaning and Use of the Area under a Receiver Operating Charac-
teristic (ROC) Curve. Radiology, 143, 29-36.
• Low-Kam, C., Telesca, D., Ji, Z., Zhang, H., Xia, T., Zink, J., and Nel, A. (2015). A Bayesian regression
tree approach to identify the effect of nanoparticles’ properties on toxicity profiles. The Annals of Applied
Statistics, 9, 383-401.
• R Core Team (2015). R: A Language and Environment for Statistical Computing. R Foundation for
Statistical Computing, Vienna, Austria.
• Sayer, J., Bogard, S., Buonarosa, M., LeBlanc, D., Funkhouser, D., Bao, S., Blankespoor, A., and Winkler,
C. (2011). Integrated Vehicle-Based Safety Systems Light-Vehicle Field Operational Test Key Findings
Report DOT HS 811 416. Retrieved August 26, 2015, from http://www.nhtsa.gov/DOT/NHTSA/
NVS/Crash%20Avoidance/Technical%20Publications/2011/811416.pdf
• Tan, Y., Elliott, M., and Flannagan, C. (2015). Development of a Real-time Prediction Model of Driver
Behavior at Intersections Using Kinematic Time Series Data. In JSM Proceedings, Transportation Statis-
tics Interest Group.
• Tanner, M. and Wong, W. (1987). The Calculation of Posterior Distributions by Data Augmentation.
Journal of the American Statistical Association, 82, 528-540.
• Zhang, S., Shih, Y., and Müller, P. (2007). A Spatially-adjusted Bayesian Additive Regression Tree Model to Merge Two Datasets. Bayesian Analysis, 2, 611-634.
Acknowledgments
This work was supported jointly by Dr. Michael Elliott and an ATLAS Research Excellence Program project awarded to Dr. Carol Flannagan. We would like to thank Dr. Jian Kang and Dr. Brisa Sánchez for their suggestions.