For the discovery of a regression relationship between y and x, a vector of p potential predictors, the flexible nonparametric nature of BART (Bayesian Additive Regression Trees) allows for a much richer set of possibilities than restrictive parametric approaches. To exploit the potential monotonicity of the predictors, we introduce mBART, a constrained version of BART that incorporates monotonicity with a multivariate basis of monotone trees, thereby avoiding the further confines of a full parametric form. Using mBART to estimate such effects yields (i) function estimates that are smoother and more interpretable, (ii) better out-of-sample predictive performance, and (iii) less post-data uncertainty. By simultaneously estimating both the increasing and the decreasing regions of a predictor, mBART also opens up a new approach to the discovery and estimation of the decomposition of a function into its monotone components.
(This is joint work with H. Chipman, R. McCulloch and T. Shively).
Considerate Approaches to ABC Model Selection - Michael Stumpf
The document discusses using approximate Bayesian computation (ABC) for model selection when directly evaluating likelihoods is computationally intractable, noting that ABC involves simulating data from models and comparing simulated and observed summary statistics, and that constructing minimally sufficient summary statistics is important for accurate ABC model selection.
This talk considers parameter estimation in two-component symmetric Gaussian mixtures in $d$ dimensions with $n$ independent samples. We show that, even in the absence of any separation between components, with high probability, the EM algorithm converges to an estimate in at most $O(\sqrt{n} \log n)$ iterations, which is within $O((d/n)^{1/4} (\log n)^{3/4})$ in Euclidean distance of the true parameter, provided that $n=\Omega(d \log^2 d)$. This is within a logarithmic factor of the minimax optimal rate of $(d/n)^{1/4}$. The proof relies on establishing (a) a non-linear contraction behavior of the population EM mapping and (b) concentration of the EM trajectory near its population version, in order to prove that random initialization works. This contrasts with the previous analysis of Daskalakis, Tzamos, and Zampetakis (2017), which requires sample splitting and restarts the EM iteration after normalization, and that of Balakrishnan, Wainwright, and Yu (2017), which requires strong conditions on both the separation of the components and the quality of the initialization. Furthermore, we obtain asymptotically efficient estimation when the signal is stronger than the minimax rate.
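For intuition, here is a minimal sketch (not from the talk) of the EM fixed-point update for the symmetric two-component mixture 0.5 N(θ, I) + 0.5 N(−θ, I) with random initialization; all names and the synthetic data are illustrative.

```python
import numpy as np

def em_symmetric_gmm(X, n_iter=200, rng=None):
    """EM for the mixture 0.5*N(theta, I) + 0.5*N(-theta, I), random initialization."""
    rng = np.random.default_rng(rng)
    n, d = X.shape
    theta = rng.normal(scale=1.0 / np.sqrt(d), size=d)  # random starting point
    for _ in range(n_iter):
        # E-step and M-step collapse into one fixed-point update:
        # theta <- (1/n) * sum_i tanh(<x_i, theta>) * x_i
        theta = (np.tanh(X @ theta)[:, None] * X).mean(axis=0)
    return theta

# illustrative use on synthetic data
rng = np.random.default_rng(0)
d, n = 10, 5000
theta_true = rng.normal(size=d)
signs = rng.choice([-1.0, 1.0], size=n)
X = signs[:, None] * theta_true + rng.normal(size=(n, d))
theta_hat = em_symmetric_gmm(X, rng=1)
err = min(np.linalg.norm(theta_hat - theta_true),
          np.linalg.norm(theta_hat + theta_true))  # sign is not identifiable
print(err)
```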
My data are incomplete and noisy: Information-reduction statistical methods f... - Umberto Picchini
We review parameter inference for stochastic modelling in complex scenarios, such as poor parameter initialization and near-chaotic dynamics. We show how state-of-the-art methods for state-space models can fail while, in some situations, reducing the data to summary statistics (information reduction) enables robust estimation. Wood's synthetic likelihood method is reviewed, and the lecture closes with an example of approximate Bayesian computation methodology.
Accompanying code is available at https://github.com/umbertopicchini/pomp-ricker and https://github.com/umbertopicchini/abc_g-and-k
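Independently of the linked repositories, here is a minimal sketch of the synthetic likelihood idea discussed in the talk: simulate summary statistics at a candidate parameter, fit a Gaussian to them, and evaluate the observed summaries under that Gaussian. The simulate_summaries argument is a placeholder for a model-specific simulator.

```python
import numpy as np
from scipy.stats import multivariate_normal

def synthetic_loglik(theta, s_obs, simulate_summaries, n_sim=500, rng=None):
    """Gaussian synthetic log-likelihood of the observed summaries s_obs at theta.

    simulate_summaries(theta, rng) must return one vector of summary statistics
    simulated from the model; it is a placeholder for a model-specific simulator.
    """
    rng = np.random.default_rng(rng)
    S = np.array([simulate_summaries(theta, rng) for _ in range(n_sim)])
    mu = S.mean(axis=0)
    cov = np.cov(S, rowvar=False) + 1e-8 * np.eye(S.shape[1])  # small ridge for stability
    return multivariate_normal.logpdf(s_obs, mean=mu, cov=cov)
```

The returned value can be plugged into any likelihood-based routine (e.g. a Metropolis sampler or a numerical optimizer) in place of the intractable likelihood.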
Readership lecture given at Lund University on 7 June 2016. The lecture is of a popular-science nature, so mathematical detail is kept to a minimum; however, numerous links and references are offered for further reading.
A 3-hour introductory lecture on Approximate Bayesian Computation (ABC), given as part of a PhD course at Lund University, February 2016. For sample code see http://www.maths.lu.se/kurshemsida/phd-course-fms020f-nams002-statistical-inference-for-partially-observed-stochastic-processes/
Random Matrix Theory and Machine Learning - Part 1 - Fabian Pedregosa
This document provides an introduction to random matrix theory and its applications in machine learning. It discusses several classical random matrix ensembles like the Gaussian Orthogonal Ensemble (GOE) and Wishart ensemble. These ensembles are used to model phenomena in fields like number theory, physics, and machine learning. Specifically, the GOE is used to model Hamiltonians of heavy nuclei, while the Wishart ensemble relates to the Hessian of least squares problems. The tutorial will cover applications of random matrix theory to analyzing loss landscapes, numerical algorithms, and the generalization properties of machine learning models.
Random Matrix Theory and Machine Learning - Part 3 - Fabian Pedregosa
ICML 2021 tutorial on random matrix theory and machine learning.
Part 3 covers: 1. Motivation: Average-case versus worst-case in high dimensions 2. Algorithm halting times (runtimes) 3. Outlook
Random Matrix Theory and Machine Learning - Part 4 - Fabian Pedregosa
Deep learning models with millions or billions of parameters should overfit according to classical theory, but they do not. The emerging theory of double descent seeks to explain why larger neural networks can generalize well. Random matrix theory provides a tractable framework to model double descent through random feature models, where the number of random features controls model capacity. In the high-dimensional limit, the test error of random feature regression exhibits a double descent shape that can be computed analytically.
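As a rough illustration of the random feature setting (not code from the tutorial), the sketch below sweeps the number of random ReLU features in a min-norm least-squares fit; the test error typically peaks near the interpolation threshold, where the number of features equals the number of training points.

```python
import numpy as np

rng = np.random.default_rng(0)
d, n_train, n_test = 20, 100, 2000
w_star = rng.normal(size=d) / np.sqrt(d)
X_tr, X_te = rng.normal(size=(n_train, d)), rng.normal(size=(n_test, d))
y_tr = X_tr @ w_star + 0.1 * rng.normal(size=n_train)
y_te = X_te @ w_star + 0.1 * rng.normal(size=n_test)

def test_error(n_features):
    # random ReLU features; min-norm least squares via the pseudo-inverse
    W = rng.normal(size=(d, n_features)) / np.sqrt(d)
    F_tr, F_te = np.maximum(X_tr @ W, 0), np.maximum(X_te @ W, 0)
    beta = np.linalg.pinv(F_tr) @ y_tr
    return np.mean((F_te @ beta - y_te) ** 2)

for p in [10, 50, 90, 100, 110, 200, 1000]:
    print(p, round(test_error(p), 3))  # error typically peaks near p ~ n_train
```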
An Altering Distance Function in Fuzzy Metric Fixed Point Theorems - ijtsrd
The aim of this paper is to improve conditions proposed in recently published fixed point results for complete and compact fuzzy metric spaces. For this purpose, altering distance functions are used. Moreover, in some of the results presented, the class of t-norms is extended by using the theory of countable extensions of t-norms. Dr C Vijender, "An Altering Distance Function in Fuzzy Metric Fixed Point Theorems", published in International Journal of Trend in Scientific Research and Development (ijtsrd), ISSN: 2456-6470, Volume-1, Issue-5, August 2017. URL: http://www.ijtsrd.com/papers/ijtsrd2293.pdf http://www.ijtsrd.com/mathemetics/other/2293/an-altering-distance-function-in-fuzzy-metric-fixed-point-theorems/dr-c-vijender
Some real life data analysis on stationary and non-stationary Time Series - Ajay Bidyarthy
This document discusses methods for constructing bootstrap confidence intervals for time series data. It first describes the block bootstrap method for preserving time series structure when resampling data. It then explains two specific bootstrap confidence interval methods - the bootstrap percentile method and bootstrap-t method. For each method, it provides the step-by-step procedures for generating bootstrap samples and determining the confidence intervals based on those samples without assuming a specific distribution for the time series data.
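A minimal sketch of the moving-block bootstrap with the percentile interval described above (the bootstrap-t variant is omitted); function and parameter names are illustrative.

```python
import numpy as np

def block_bootstrap_ci(x, stat=np.mean, block_len=20, n_boot=2000,
                       alpha=0.05, rng=None):
    """Moving-block bootstrap percentile CI for a statistic of a time series."""
    rng = np.random.default_rng(rng)
    n = len(x)
    blocks = np.array([x[i:i + block_len] for i in range(n - block_len + 1)])
    n_blocks = int(np.ceil(n / block_len))
    stats = np.empty(n_boot)
    for b in range(n_boot):
        idx = rng.integers(0, len(blocks), size=n_blocks)   # sample blocks with replacement
        resample = np.concatenate(blocks[idx])[:n]          # trim to the original length
        stats[b] = stat(resample)
    lo, hi = np.quantile(stats, [alpha / 2, 1 - alpha / 2])  # percentile interval
    return lo, hi

# illustrative use on an AR(1) series
rng = np.random.default_rng(0)
x = np.zeros(500)
for t in range(1, 500):
    x[t] = 0.7 * x[t - 1] + rng.normal()
print(block_bootstrap_ci(x, rng=1))
```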
Inference for stochastic differential equations via approximate Bayesian comp... - Umberto Picchini
Despite the title, the methods are appropriate for more general dynamical models (including state-space models). Presentation given at Nordstat 2012, Umeå. Relevant research paper at http://arxiv.org/abs/1204.5459 and software code at https://sourceforge.net/projects/abc-sde/
Understanding variable importances in forests of randomized trees - Gilles Louppe
This document discusses variable importances in random forests. It begins by introducing random forests and their strengths and weaknesses, specifically their loss of interpretability. It then discusses how variable importances can help recover interpretability by providing two main importance measures: mean decrease in impurity (MDI) and mean decrease in accuracy (MDA). The document focuses on MDI and presents three key results: 1) variable importances provide a three-level decomposition of information about the output, 2) importances only depend on relevant variables, and 3) most properties are lost when K > 1 in non-totally randomized trees.
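As a practical illustration (not from the document), scikit-learn exposes MDI through feature_importances_ and approximates MDA through permutation importance on held-out data:

```python
import numpy as np
from sklearn.datasets import make_classification
from sklearn.ensemble import RandomForestClassifier
from sklearn.inspection import permutation_importance
from sklearn.model_selection import train_test_split

X, y = make_classification(n_samples=1000, n_features=10, n_informative=3,
                           random_state=0)
X_tr, X_te, y_tr, y_te = train_test_split(X, y, random_state=0)

forest = RandomForestClassifier(n_estimators=300, random_state=0).fit(X_tr, y_tr)

mdi = forest.feature_importances_                      # mean decrease in impurity
mda = permutation_importance(forest, X_te, y_te,       # mean decrease in accuracy
                             n_repeats=10, random_state=0).importances_mean
for j in range(X.shape[1]):
    print(f"feature {j}: MDI={mdi[j]:.3f}  MDA={mda[j]:.3f}")
```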
Bayesian inference for mixed-effects models driven by SDEs and other stochast... - Umberto Picchini
An important, and well studied, class of stochastic models is given by stochastic differential equations (SDEs). In this talk, we consider Bayesian inference based on measurements from several individuals, to provide inference at the "population level" using mixed-effects modelling. We consider the case where dynamics are expressed via SDEs or other stochastic (Markovian) models. Stochastic differential equation mixed-effects models (SDEMEMs) are flexible hierarchical models that account for (i) the intrinsic random variability in the latent state dynamics, (ii) the variability between individuals, and (iii) measurement error. This flexibility gives rise to methodological and computational difficulties.
Fully Bayesian inference for nonlinear SDEMEMs is complicated by the typical intractability of the observed data likelihood, which motivates the use of sampling-based approaches such as Markov chain Monte Carlo. A Gibbs sampler is proposed to target the marginal posterior of all parameters of interest. The algorithm is made computationally efficient through careful use of blocking strategies, particle filters (sequential Monte Carlo) and correlated pseudo-marginal approaches. The resulting methodology is flexible, general, and able to deal with a large class of nonlinear SDEMEMs [1]. In a more recent work [2], we also explored ways to make inference even more scalable to an increasing number of individuals, while also dealing with state-space models driven by stochastic dynamic models other than SDEs, e.g. Markov jump processes and the nonlinear solvers typically used in systems biology.
[1] S. Wiqvist, A. Golightly, A. T. McLean, U. Picchini (2020). Efficient inference for stochastic differential mixed-effects models using correlated particle pseudo-marginal algorithms. CSDA, https://doi.org/10.1016/j.csda.2020.107151
[2] S. Persson, N. Welkenhuysen, S. Shashkova, S. Wiqvist, P. Reith, G. W. Schmidt, U. Picchini, M. Cvijovic (2021). PEPSDI: Scalable and flexible inference framework for stochastic dynamic single-cell models, bioRxiv doi:10.1101/2021.07.01.450748.
This document discusses various machine learning techniques including:
1. Tree pruning involves first growing a large tree and then pruning back branches that do not improve the objective function, rather than stopping tree growth too early.
2. Boosting uses multiple weak learners sequentially to obtain an additive model that approximates the regression function, combining many simple models into a powerful ensemble (a minimal sketch follows after this list).
3. Unsupervised learning techniques like principal component analysis and clustering are used to find patterns in data without an outcome variable. These include reducing dimensions and partitioning data into subgroups.
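A minimal sketch of the boosting idea from item 2, assuming least-squares gradient boosting with shallow regression trees as the weak learners; all names and data are illustrative.

```python
import numpy as np
from sklearn.tree import DecisionTreeRegressor

def boost(X, y, n_rounds=200, learning_rate=0.1, max_depth=1):
    """Least-squares gradient boosting: fit each weak learner to the current residuals."""
    f0 = y.mean()
    pred = np.full_like(y, f0, dtype=float)
    learners = []
    for _ in range(n_rounds):
        residuals = y - pred
        tree = DecisionTreeRegressor(max_depth=max_depth).fit(X, residuals)
        pred += learning_rate * tree.predict(X)
        learners.append(tree)
    return f0, learners

def boost_predict(f0, learners, X, learning_rate=0.1):
    # additive model: initial constant plus the shrunken contribution of each tree
    return f0 + learning_rate * sum(t.predict(X) for t in learners)

# illustrative use
rng = np.random.default_rng(0)
X = rng.uniform(-3, 3, size=(400, 1))
y = np.sin(X[:, 0]) + 0.2 * rng.normal(size=400)
f0, learners = boost(X, y)
print(np.mean((boost_predict(f0, learners, X) - y) ** 2))
```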
A likelihood-free version of the stochastic approximation EM algorithm (SAEM)... - Umberto Picchini
I show how to obtain approximate maximum likelihood inference for "complex" models having some latent (unobservable) component. By "complex" I mean models having a so-called intractable likelihood, where the latter is unavailable in closed form or is too difficult to approximate. I construct a version of SAEM (an EM-type algorithm) that makes it possible to conduct inference for complex models. Traditionally, SAEM is implementable only for models that are fairly tractable analytically. By introducing the concept of a synthetic likelihood, where information is captured by a series of user-defined summary statistics (as in approximate Bayesian computation), it is possible to automate SAEM to run on any model having some latent component.
Bias-variance decomposition in Random Forests - Gilles Louppe
This document discusses bias-variance decomposition in random forests. It explains that combining predictions from multiple randomized models can achieve better results than a single model by reducing variance. Random forests work by constructing decision trees on randomly selected subsets of data and features, averaging their predictions. This randomization increases bias but reduces variance, providing an effective bias-variance tradeoff. The document provides theorems on how the expected generalization error of random forests and individual trees can be decomposed into noise, bias, and variance components.
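A toy illustration (not from the document) of the decomposition: estimate bias and variance empirically over repeated training sets for a single tree versus a forest, under an assumed sine-wave regression function.

```python
import numpy as np
from sklearn.ensemble import RandomForestRegressor
from sklearn.tree import DecisionTreeRegressor

rng = np.random.default_rng(0)
f = lambda x: np.sin(3 * x[:, 0])          # true regression function
X_test = rng.uniform(-1, 1, size=(200, 1))
noise_sd = 0.3

def decompose(make_model, n_repeats=50, n_train=200):
    """Estimate squared bias and variance of a learner over repeated training sets."""
    preds = np.empty((n_repeats, len(X_test)))
    for r in range(n_repeats):
        X = rng.uniform(-1, 1, size=(n_train, 1))
        y = f(X) + noise_sd * rng.normal(size=n_train)
        preds[r] = make_model().fit(X, y).predict(X_test)
    bias2 = np.mean((preds.mean(axis=0) - f(X_test)) ** 2)
    var = np.mean(preds.var(axis=0))
    return bias2, var

print("single tree :", decompose(lambda: DecisionTreeRegressor()))
print("forest (100):", decompose(lambda: RandomForestRegressor(n_estimators=100)))
```

Averaging over the randomized trees typically leaves the squared bias roughly unchanged while shrinking the variance, which is the trade-off the document describes.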
ABC with data cloning for MLE in state space models - Umberto Picchini
An application of the "data cloning" method for parameter estimation via MLE aided by Approximate Bayesian Computation. The relevant paper is http://arxiv.org/abs/1505.06318
This document discusses image transformation, which represents an image as a series summation of unitary matrices. It defines unitary and orthogonal matrices and describes how an arbitrary image can be represented as a series of orthonormal basis vectors. Separable unitary transformations are introduced to reduce computational complexity from O(N^4) to O(2N^3) by applying the same unitary matrix separately along rows and columns. An example demonstrates computing the transformed image and basis images given an input image and unitary matrix. Inverse transformation recovers the original image from the transformed image and basis vectors.
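A minimal numerical sketch of a separable transform, assuming a real orthogonal (hence unitary) matrix A: the transform is applied along columns and then rows, the inverse recovers the image, and each coefficient is the inner product of the image with a basis image.

```python
import numpy as np

rng = np.random.default_rng(0)
N = 8
# an arbitrary real orthogonal NxN matrix obtained via QR decomposition
A, _ = np.linalg.qr(rng.normal(size=(N, N)))
X = rng.normal(size=(N, N))          # the "image"

# separable forward transform: apply A along columns, then along rows
V = A @ X @ A.T                      # two O(N^3) matrix products instead of one O(N^4) operator
X_rec = A.T @ V @ A                  # inverse transform recovers the image
print(np.allclose(X, X_rec))         # True

# basis image for coefficient (k, l): outer product of rows k and l of A
k, l = 2, 5
B_kl = np.outer(A[k], A[l])
print(np.isclose(V[k, l], np.sum(X * B_kl)))  # coefficient = <image, basis image>
```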
L1-based compression of random forest models - Arnaud Joly
Random forests are effective supervised learning methods applicable to large-scale datasets. However, the space complexity of tree ensembles, in terms of their total number of nodes, is often prohibitive, especially in the context of problems with very high-dimensional input spaces. We propose to study their compressibility by applying an L1-based regularization to the set of indicator functions defined by all their nodes. We show experimentally that preserving or even improving the model accuracy while significantly reducing its space complexity is indeed possible.
Link to the paper http://orbi.ulg.ac.be/handle/2268/124834
The document discusses using random forest algorithms to model and map soil organic carbon. It provides instructions on fitting a random forest regression model in R using sample soil data, assessing model performance, and using the model to predict and map soil organic carbon across a study area. Key steps include randomly splitting the data into training and testing sets, fitting a random forest model with 1000 trees, calculating error metrics like RMSE to evaluate the model, using the model to predict SOC for the testing data, plotting predicted vs observed values, and finally using the fitted model to predict and plot a soil organic carbon map for the study area.
This document discusses blended dynamics theorem and its applications for networked dynamic systems. It begins by introducing blended dynamics theorem, which states that under strong coupling, heterogeneous multi-agent systems will converge to the behavior of the blended dynamics. Several examples of applying blended dynamics to problems are then presented, including economic power dispatch for smart grids, network size estimation, and distributed control of multi-channel plants. The document discusses how blended dynamics allows for distributed operation, decentralized design, and plug-and-play capabilities in multi-agent systems.
X01 Supervised learning problem linear regression one feature theory - Marco Moldenhauer
1. The document describes supervised learning problems, specifically linear regression with one feature. It defines key concepts like the hypothesis function, cost function, and gradient descent algorithm.
2. A data set with one input feature and one output is defined. The goal is to learn a linear function that maps the input to the output to best fit the training data.
3. The hypothesis function is defined as h(x) = θ₀ + θ₁x, where θ₀ and θ₁ are parameters to be estimated. Gradient descent is used to minimize the cost function and find the optimal θ values.
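A minimal sketch of the setup in items 1-3: batch gradient descent on the squared-error cost for h(x) = θ₀ + θ₁x, with illustrative synthetic data.

```python
import numpy as np

def gradient_descent(x, y, alpha=0.5, n_iter=2000):
    """Minimize J(theta) = 1/(2m) * sum((theta0 + theta1*x - y)^2) by batch gradient descent."""
    theta0, theta1 = 0.0, 0.0
    for _ in range(n_iter):
        h = theta0 + theta1 * x                   # hypothesis h(x) = theta0 + theta1*x
        theta0 -= alpha * np.mean(h - y)          # partial derivative w.r.t. theta0
        theta1 -= alpha * np.mean((h - y) * x)    # partial derivative w.r.t. theta1
    return theta0, theta1

# illustrative use: data generated from y = 2 + 3x plus noise
rng = np.random.default_rng(0)
x = rng.uniform(0, 1, 100)
y = 2.0 + 3.0 * x + rng.normal(scale=0.5, size=100)
print(gradient_descent(x, y))  # approximately (2, 3)
```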
This document proposes an approximate Bayesian inference method for estimating propensity scores under nonresponse. It involves treating the estimating equations as random variables and assigning a prior distribution to the transformed parameters. Samples are drawn from the posterior distribution of the parameters given the observed data to make inferences. The method is shown to be asymptotically consistent and confidence regions can be constructed from the posterior samples. Extensions are discussed to incorporate auxiliary variables and perform Bayesian model selection by assigning a spike-and-slab prior over the model parameters.
This document summarizes basic 2D and 3D transformations including translation, rotation, scaling, and perspective transformations. It discusses how multiple transformations can be represented by a single 4x4 transformation matrix and how the order of transformations is important as matrix operations are non-commutative. It also provides an overview of the image formation process through a pinhole camera model and how this relates cartesian and homogeneous coordinate systems.
Markov Chain Monte Carlo (MCMC) methods use Markov chains to sample from probability distributions for use in Monte Carlo simulations. The Metropolis-Hastings algorithm proposes transitions to new states in the chain and either accepts or rejects those states based on a probability calculation, allowing it to sample from complex, high-dimensional distributions. The Gibbs sampler is a special case of MCMC where each variable is updated conditional on the current values of the other variables, ensuring all proposed moves are accepted. These MCMC methods allow approximating integrals that are difficult to compute directly.
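A minimal sketch of random-walk Metropolis (a symmetric proposal, so the Hastings correction cancels), applied to a toy bimodal target known only up to a normalizing constant; all names are illustrative.

```python
import numpy as np

def metropolis_hastings(log_target, x0, n_samples=10000, step=0.5, rng=None):
    """Random-walk Metropolis: propose x' ~ N(x, step^2 I), accept with prob min(1, pi(x')/pi(x))."""
    rng = np.random.default_rng(rng)
    x = np.atleast_1d(np.asarray(x0, dtype=float))
    logp = log_target(x)
    samples = np.empty((n_samples, x.size))
    for i in range(n_samples):
        prop = x + step * rng.normal(size=x.size)
        logp_prop = log_target(prop)
        if np.log(rng.uniform()) < logp_prop - logp:   # accept/reject step
            x, logp = prop, logp_prop
        samples[i] = x                                 # rejected moves repeat the current state
    return samples

# illustrative target: a bimodal 1-D density known only up to a constant
log_target = lambda x: np.logaddexp(-0.5 * (x[0] - 3) ** 2, -0.5 * (x[0] + 3) ** 2)
draws = metropolis_hastings(log_target, x0=0.0, n_samples=20000, step=1.0, rng=0)
print(draws.mean(), draws.std())
```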
Approximate Bayesian model choice via random forests - Christian Robert
The document describes approximate Bayesian computation (ABC) methods for model choice when likelihoods are intractable. ABC generates parameter-dataset pairs from the prior and retains those where the simulated and observed datasets are similar according to a distance measure on summary statistics. For model choice, ABC approximates posterior model probabilities by the proportion of simulations from each model that are retained. Machine learning techniques can also be used to infer the most likely model directly from the simulated summary statistics.
This document discusses deconvolution methods and their applications in analyzing gamma-ray spectra. It provides background on several deconvolution algorithms, including Tikhonov-Miller regularization, Van Cittert, Janson, Gold, Richardson-Lucy, and Muller algorithms. These algorithms aim to improve spectral resolution by mathematically removing instrument smearing effects. They are based on solving systems of linear equations using direct or iterative methods, with regularization techniques to produce stable solutions. Examples are given to illustrate the sensitivity of non-regularized solutions to noise and the need for regularization.
This document summarizes the Bayesian Additive Regression Trees (BART) model and the Monotone BART (MBART) extension. BART approximates an unknown function using an ensemble of regression trees with a regularization prior. It connects to ideas in Bayesian nonparametrics, dynamic random basis elements, and gradient boosting. The document outlines the BART MCMC algorithm and how it can provide automatic uncertainty quantification and variable selection. It then introduces MBART, which constrains trees to be monotonic, and describes an MCMC algorithm for fitting MBART models. Examples illustrate BART and MBART fits to simulated monotonic and non-monotonic functions.
The document discusses various methods for modeling input distributions in simulation models, including trace-driven simulation, empirical distributions, and fitting theoretical distributions to real data. It provides examples of several continuous and discrete probability distributions commonly used in simulation, including the exponential, normal, gamma, Weibull, binomial, and Poisson distributions. Key parameters and properties of each distribution are defined. Methods for selecting an appropriate input distribution based on summary statistics of real data are also presented.
This document discusses approximate Bayesian computation (ABC) for model choice between multiple models. It introduces the ABC algorithm for model choice, which approximates the posterior probabilities of models given the data by simulating parameters from the prior and accepting simulations based on the distance between simulated and observed sufficient statistics. Issues with choosing sufficient statistics that apply to all models are discussed. The document also examines the limiting behavior of the ABC approximation to the Bayes factor as the tolerance approaches 0 and infinity. It notes that discrepancies can arise if sufficient statistics are not cross-model sufficient. An example comparing Poisson and geometric models demonstrates this.
This talk introduces the theory behind Bayesian deep learning, which has recently been attracting attention, and its recent applications. It briefly explains the theory of Bayesian inference and then presents the theory and applications of Yarin Gal's Monte Carlo Dropout.
Tree models with Scikit-Learn: Great models with little assumptions - Gilles Louppe
This talk gives an introduction to tree-based methods, both from a theoretical and practical point of view. It covers decision trees, random forests and boosting estimators, along with concrete examples based on Scikit-Learn about how they work, when they work and why they work.
The document describes Approximate Bayesian Computation (ABC), a technique for performing Bayesian inference when the likelihood function is intractable or impossible to evaluate directly. ABC works by simulating data under different parameter values, and accepting simulations that are close to the observed data according to a distance measure and tolerance level. ABC provides an approximation to the posterior distribution that improves as the tolerance level decreases and more informative summary statistics are used. The document discusses the ABC algorithm, properties of the exact ABC posterior distribution, and challenges in selecting appropriate summary statistics.
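A minimal sketch of the rejection-ABC scheme described above, with the tolerance set as a small quantile of the simulated distances; prior_sampler and simulate_summaries are placeholders for model-specific functions.

```python
import numpy as np

def abc_rejection(s_obs, prior_sampler, simulate_summaries, n_sim=100_000,
                  quantile=0.01, rng=None):
    """Rejection ABC: keep the parameter draws whose simulated summaries are
    closest to the observed summaries s_obs.

    prior_sampler(rng) -> theta and simulate_summaries(theta, rng) -> summaries
    are placeholders for model-specific functions.
    """
    rng = np.random.default_rng(rng)
    thetas, dists = [], []
    for _ in range(n_sim):
        theta = prior_sampler(rng)
        s_sim = simulate_summaries(theta, rng)
        thetas.append(theta)
        dists.append(np.linalg.norm(np.asarray(s_sim) - np.asarray(s_obs)))
    thetas, dists = np.asarray(thetas), np.asarray(dists)
    eps = np.quantile(dists, quantile)           # tolerance level
    return thetas[dists <= eps]                  # approximate posterior sample
```

Shrinking the quantile (tolerance) and using more informative summaries tightens the approximation, at the cost of more simulations per accepted draw.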
Understanding Random Forests: From Theory to Practice - Gilles Louppe
This document provides an overview of random forests and their implementation. It begins with motivating random forests as a way to reduce variance in decision trees. It then discusses growing and interpreting random forests through variable importances. The document presents theoretical results on the decomposition and properties of variable importances. It concludes by describing the efficient implementation of random forests in scikit-learn, including its modular design and optimizations for speed.
This document provides a summary of Bayesian phylogenetic inference and Markov chain Monte Carlo (MCMC) methods. It begins with an introduction to probability distributions and stochastic processes relevant to phylogenetic modeling. It then discusses how Bayesian inference is applied to phylogenetics by combining prior distributions on tree topologies and other model parameters with the likelihood of the data to obtain posterior distributions. MCMC methods like the Metropolis-Hastings algorithm are introduced as a way to sample from these posterior distributions. Issues around convergence, mixing, and tuning MCMC proposals are also covered.
I. Random subspace with trees is an ensemble method for feature selection under memory constraints where each tree is grown from a random subset of features of size q.
II. Sequential random subspace improves upon random subspace by filling part of each random subset with previously identified relevant features, speeding up convergence while maintaining asymptotic guarantees.
III. Experiments on benchmark datasets show sequential random subspace more accurately ranks features and achieves better prediction performance than random subspace, especially with large numbers of irrelevant features.
This document summarizes techniques for approximating marginal likelihoods and Bayes factors, which are important quantities in Bayesian inference. It discusses Geyer's 1994 logistic regression approach, links to bridge sampling, and how mixtures can be used as importance sampling proposals. Specifically, it shows how optimizing the logistic pseudo-likelihood relates to the bridge sampling optimal estimator. It also discusses non-parametric maximum likelihood estimation based on simulations.
This document provides an overview of generalized linear models (GLMs) and maximum likelihood estimation (MLE).
It discusses the exponential family distribution framework for GLMs, which allows the use of the same tools of inference across different distributions. It presents examples of link functions and canonical forms for the Gaussian and Poisson distributions.
The document also covers likelihood theory and how to calculate parameter estimates and their uncertainty through taking derivatives of the log-likelihood function. It introduces MLE as a method for finding the parameter values that maximize the likelihood of observing the data. Computational estimation of GLMs is performed through an iterative least squares method using weights from the distributions' Fisher information.
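A minimal sketch of the iterative least squares idea for one canonical case, a Poisson GLM with log link, where the Fisher-information weights equal the fitted means; names and data are illustrative.

```python
import numpy as np

def irls_poisson(X, y, n_iter=25):
    """Fit a Poisson GLM with log link by iteratively reweighted least squares."""
    n, p = X.shape
    beta = np.zeros(p)
    for _ in range(n_iter):
        eta = X @ beta                           # linear predictor
        mu = np.exp(eta)                         # fitted means (inverse link)
        W = mu                                   # Fisher-information weights
        z = eta + (y - mu) / mu                  # working response
        XtW = X.T * W                            # weighted least squares step:
        beta = np.linalg.solve(XtW @ X, XtW @ z) # beta = (X'WX)^{-1} X'Wz
    return beta

# illustrative use
rng = np.random.default_rng(0)
n = 1000
X = np.column_stack([np.ones(n), rng.normal(size=n)])
beta_true = np.array([0.5, 0.8])
y = rng.poisson(np.exp(X @ beta_true))
print(irls_poisson(X, y))   # close to [0.5, 0.8]
```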
The document discusses Markov chain Monte Carlo (MCMC) methods for posterior simulation. MCMC methods generate dependent samples from the posterior distribution using iterative sampling algorithms like the Metropolis algorithm and Gibbs sampler. The Metropolis algorithm uses an accept-reject rule to propose new samples from a jumping distribution and either accepts or rejects them, while the Gibbs sampler directly samples from conditional posterior distributions one parameter at a time. Both algorithms are proven to converge to the true posterior distribution given enough iterations. The document provides details on how to implement the Metropolis and Gibbs sampling algorithms.
1. The document discusses approximate Bayesian computation (ABC), a technique used when the likelihood function is intractable. ABC works by simulating parameters from the prior and simulating data, rejecting simulations that are not close to the observed data based on a tolerance level.
2. Random forests can be used in ABC to select informative summary statistics from a large set of possibilities and estimate parameters. The random forests classify simulations as accepted or rejected based on the summaries, implicitly selecting important summaries.
3. Calibrating the tolerance level in ABC is important but difficult, as it determines how close simulations must be to the observed data. Methods discussed include using quantiles of prior predictive simulations or asymptotic convergence properties.
This document summarizes a presentation on controlled sequential Monte Carlo. It discusses state space models, sequential Monte Carlo, and particle marginal Metropolis-Hastings for parameter inference. Controlled sequential Monte Carlo is proposed to lower the variance of the marginal likelihood estimator compared to standard sequential Monte Carlo, improving the performance of parameter inference methods. The method is illustrated on a neuroscience example where it reduces variance for different particle sizes.
This document discusses various methods for approximating marginal likelihoods and Bayes factors, including:
1. Geyer's 1994 logistic regression approach for approximating marginal likelihoods using importance sampling.
2. Bridge sampling and its connection to Geyer's approach. Optimal bridge sampling requires knowledge of unknown normalizing constants.
3. Using mixtures of importance distributions and the target distribution as proposals to estimate marginal likelihoods through Rao-Blackwellization. This connects to bridge sampling estimates.
4. The document discusses various methods for approximating marginal likelihoods and comparing hypotheses using Bayes factors. It outlines the historical development and connections between different approximation techniques.
My talk at the International Conference on Monte Carlo Methods and Applications (MCM 2023), on advances in mathematical aspects of stochastic simulation and Monte Carlo methods, given at Sorbonne Université on June 28, 2023, about my recent works (i) "Numerical Smoothing with Hierarchical Adaptive Sparse Grids and Quasi-Monte Carlo Methods for Efficient Option Pricing" (link: https://doi.org/10.1080/14697688.2022.2135455), and (ii) "Multilevel Monte Carlo with Numerical Smoothing for Robust and Efficient Computation of Probabilities and Densities" (link: https://arxiv.org/abs/2003.05708).
The document discusses machine learning techniques for multivariate data analysis using the TMVA toolkit. It describes several common classification problems in high energy physics (HEP) and summarizes several machine learning algorithms implemented in TMVA for supervised learning, including rectangular cut optimization, likelihood methods, neural networks, boosted decision trees, support vector machines and rule ensembles. It also discusses challenges like nonlinear correlations between input variables and techniques for data preprocessing and decorrelation.
Gaussian process regression is a non-parametric Bayesian approach for supervised learning problems. It can be used to model an unknown function by specifying a prior directly over functions, such that the posterior incorporates the constraints from the training data. The kernel trick allows specifying an infinite-dimensional feature space without explicitly defining features. This allows selecting valid covariance functions that implicitly define features and incorporate prior knowledge about the solution.
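A minimal sketch of GP regression with a squared-exponential kernel, following the standard Cholesky-based posterior equations; the kernel choice and data are illustrative.

```python
import numpy as np

def rbf_kernel(A, B, length_scale=1.0, variance=1.0):
    """Squared-exponential covariance function."""
    d2 = np.sum(A**2, 1)[:, None] + np.sum(B**2, 1)[None, :] - 2 * A @ B.T
    return variance * np.exp(-0.5 * d2 / length_scale**2)

def gp_posterior(X_train, y_train, X_test, noise_sd=0.1, **kernel_args):
    """Posterior mean and pointwise variance of a zero-mean GP at the test inputs."""
    K = rbf_kernel(X_train, X_train, **kernel_args) + noise_sd**2 * np.eye(len(X_train))
    K_s = rbf_kernel(X_train, X_test, **kernel_args)
    K_ss = rbf_kernel(X_test, X_test, **kernel_args)
    L = np.linalg.cholesky(K)
    alpha = np.linalg.solve(L.T, np.linalg.solve(L, y_train))
    v = np.linalg.solve(L, K_s)
    mean = K_s.T @ alpha
    var = np.diag(K_ss) - np.sum(v**2, axis=0)
    return mean, var

# illustrative use
rng = np.random.default_rng(0)
X_train = rng.uniform(-3, 3, size=(30, 1))
y_train = np.sin(X_train[:, 0]) + 0.1 * rng.normal(size=30)
X_test = np.linspace(-3, 3, 100)[:, None]
mean, var = gp_posterior(X_train, y_train, X_test)
```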
Introduction to Machine Learning Lectures - ssuserfece35
This lecture discusses ensemble methods in machine learning. It introduces bagging, which trains multiple models on random subsets of the training data and averages their predictions, in order to reduce variance and prevent overfitting. Bagging is effective because it decreases the correlation between predictions. Random forests apply bagging to decision trees while also introducing more randomness by selecting a random subset of features to consider at each node. The next lecture will cover boosting, which aims to reduce bias by training models sequentially to focus on examples previously misclassified.
Recently, the machine learning community has expressed strong interest in applying latent variable modeling strategies to causal inference problems with unobserved confounding. Here, I discuss one of the big debates that occurred over the past year, and how we can move forward. I will focus specifically on the failure of point identification in this setting, and discuss how this can be used to design flexible sensitivity analyses that cleanly separate identified and unidentified components of the causal model.
I will discuss paradigmatic statistical models of inference and learning from high dimensional data, such as sparse PCA and the perceptron neural network, in the sub-linear sparsity regime. In this limit the underlying hidden signal, i.e., the low-rank matrix in PCA or the neural network weights, has a number of non-zero components that scales sub-linearly with the total dimension of the vector. I will provide explicit low-dimensional variational formulas for the asymptotic mutual information between the signal and the data in suitable sparse limits. In the setting of support recovery these formulas imply sharp 0-1 phase transitions for the asymptotic minimum mean-square-error (or generalization error in the neural network setting). A similar phase transition was analyzed recently in the context of sparse high-dimensional linear regression by Reeves et al.
Many different measurement techniques are used to record neural activity in the brains of different organisms, including fMRI, EEG, MEG, lightsheet microscopy and direct recordings with electrodes. Each of these measurement modes have their advantages and disadvantages concerning the resolution of the data in space and time, the directness of measurement of the neural activity and which organisms they can be applied to. For some of these modes and for some organisms, significant amounts of data are now available in large standardized open-source datasets. I will report on our efforts to apply causal discovery algorithms to, among others, fMRI data from the Human Connectome Project, and to lightsheet microscopy data from zebrafish larvae. In particular, I will focus on the challenges we have faced both in terms of the nature of the data and the computational features of the discovery algorithms, as well as the modeling of experimental interventions.
1) The document presents a statistical modeling approach called targeted smooth Bayesian causal forests (tsbcf) to smoothly estimate heterogeneous treatment effects over gestational age using observational data from early medical abortion regimens.
2) The tsbcf method extends Bayesian additive regression trees (BART) to estimate treatment effects that evolve smoothly over gestational age, while allowing for heterogeneous effects across patient subgroups.
3) The tsbcf analysis of early medical abortion regimen data found the simultaneous administration to be similarly effective overall to the interval administration, but identified some patient subgroups where effectiveness may vary more over gestational age.
Difference-in-differences is a widely used evaluation strategy that draws causal inference from observational panel data. Its causal identification relies on the assumption of parallel trends, which is scale-dependent and may be questionable in some applications. A common alternative is a regression model that adjusts for the lagged dependent variable, which rests on the assumption of ignorability conditional on past outcomes. In the context of linear models, Angrist and Pischke (2009) show that the difference-in-differences and lagged-dependent-variable regression estimates have a bracketing relationship. Namely, for a true positive effect, if ignorability is correct, then mistakenly assuming parallel trends will overestimate the effect; in contrast, if the parallel trends assumption is correct, then mistakenly assuming ignorability will underestimate the effect. We show that the same bracketing relationship holds in general nonparametric (model-free) settings. We also extend the result to semiparametric estimation based on inverse probability weighting.
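A toy simulation (not from the paper) contrasting the two estimators on a two-period panel in which ignorability given the lagged outcome holds by construction; consistent with the bracketing result, DiD overestimates the positive effect while the lagged-dependent-variable regression recovers it.

```python
import numpy as np

rng = np.random.default_rng(0)
n = 20000
treated = rng.integers(0, 2, size=n)
# treated units have lower pre-period outcomes; outcomes partly revert to the mean
y0 = -1.0 * treated + rng.normal(size=n)                # pre-period outcome
tau = 2.0                                               # true treatment effect
y1 = 0.5 * y0 + tau * treated + rng.normal(size=n)      # post-period outcome

# difference-in-differences estimate (assumes parallel trends)
did = (y1[treated == 1].mean() - y0[treated == 1].mean()) \
    - (y1[treated == 0].mean() - y0[treated == 0].mean())

# lagged-dependent-variable regression estimate (assumes ignorability given y0)
X = np.column_stack([np.ones(n), treated, y0])
ldv = np.linalg.lstsq(X, y1, rcond=None)[0][1]

print(f"DiD: {did:.2f}  LDV: {ldv:.2f}  truth: {tau}")  # DiD overestimates in this setup
```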
We develop sensitivity analyses for weak nulls in matched observational studies while allowing unit-level treatment effects to vary. In contrast to randomized experiments and paired observational studies, we show for general matched designs that over a large class of test statistics, any valid sensitivity analysis for the weak null must be unnecessarily conservative if Fisher's sharp null of no treatment effect for any individual also holds. We present a sensitivity analysis valid for the weak null, and illustrate why it is conservative if the sharp null holds through connections to inverse probability weighted estimators. An alternative procedure is presented that is asymptotically sharp if treatment effects are constant, and is valid for the weak null under additional assumptions which may be deemed reasonable by practitioners. The methods may be applied to matched observational studies constructed using any optimal without-replacement matching algorithm, allowing practitioners to assess robustness to hidden bias while allowing for treatment effect heterogeneity.
This document discusses difference-in-differences (DiD) analysis, a quasi-experimental method used to estimate treatment effects. The author notes that while widely applicable, DiD relies on strong assumptions about the counterfactual. She recommends approaches like matching on observed variables between similar populations, thoughtfully specifying regression models to adjust for confounding factors, testing for parallel pre-treatment trends under different assumptions, and considering more complex models that allow for different types of changes over time. The overall message is that DiD requires careful consideration and testing of its underlying assumptions to draw valid causal conclusions.
We present recent advances and statistical developments for evaluating Dynamic Treatment Regimes (DTR), which allow the treatment to be dynamically tailored according to evolving subject-level data. Identification of an optimal DTR is a key component for precision medicine and personalized health care. Specific topics covered in this talk include several recent projects with robust and flexible methods developed for the above research area. We will first introduce a dynamic statistical learning method, adaptive contrast weighted learning (ACWL), which combines doubly robust semiparametric regression estimators with flexible machine learning methods. We will further develop a tree-based reinforcement learning (T-RL) method, which builds an unsupervised decision tree that maintains the nature of batch-mode reinforcement learning. Unlike ACWL, T-RL handles the optimization problem with multiple treatment comparisons directly through a purity measure constructed with augmented inverse probability weighted estimators. T-RL is robust, efficient and easy to interpret for the identification of optimal DTRs. However, ACWL seems more robust against tree-type misspecification than T-RL when the true optimal DTR is non-tree-type. At the end of this talk, we will also present a new Stochastic-Tree Search method called ST-RL for evaluating optimal DTRs.
A fundamental feature of evaluating causal health effects of air quality regulations is that air pollution moves through space, rendering health outcomes at a particular population location dependent upon regulatory actions taken at multiple, possibly distant, pollution sources. Motivated by studies of the public-health impacts of power plant regulations in the U.S., this talk introduces the novel setting of bipartite causal inference with interference, which arises when 1) treatments are defined on observational units that are distinct from those at which outcomes are measured and 2) there is interference between units in the sense that outcomes for some units depend on the treatments assigned to many other units. Interference in this setting arises due to complex exposure patterns dictated by physical-chemical atmospheric processes of pollution transport, with intervention effects framed as propagating across a bipartite network of power plants and residential zip codes. New causal estimands are introduced for the bipartite setting, along with an estimation approach based on generalized propensity scores for treatments on a network. The new methods are deployed to estimate how emission-reduction technologies implemented at coal-fired power plants causally affect health outcomes among Medicare beneficiaries in the U.S.
Laine Thomas presented information about how causal inference is being used to determine the costs and benefits of the two most common surgical treatments for women: hysterectomy and myomectomy.
We provide an overview of some recent developments in machine learning tools for dynamic treatment regime discovery in precision medicine. The first development is a new off-policy reinforcement learning tool for continual learning in mobile health to enable patients with type 1 diabetes to exercise safely. The second development is a new inverse reinforcement learning tool that enables the use of observational data to learn how clinicians balance competing priorities for treating depression and mania in patients with bipolar disorder. Both practical and technical challenges are discussed.
The method of differences-in-differences (DID) is widely used to estimate causal effects. The primary advantage of DID is that it can account for time-invariant bias from unobserved confounders. However, the standard DID estimator will be biased if there is an interaction between history in the after period and the groups. That is, bias will be present if an event besides the treatment occurs at the same time and affects the treated group in a differential fashion. We present a method of bounds based on DID that accounts for an unmeasured confounder that has a differential effect in the post-treatment time period. These DID bracketing bounds are simple to implement and only require partitioning the controls into two separate groups. We also develop two key extensions for DID bracketing bounds. First, we develop a new falsification test to probe the key assumption that is necessary for the bounds estimator to provide consistent estimates of the treatment effect. Next, we develop a method of sensitivity analysis that adjusts the bounds for possible bias based on differences between the treated and control units from the pretreatment period. We apply these DID bracketing bounds and the new methods we develop to an application on the effect of voter identification laws on turnout. Specifically, we focus on estimating whether the enactment of voter identification laws in Georgia and Indiana had an effect on voter turnout.
This document summarizes a simulation study evaluating causal inference methods for assessing the effects of opioid and gun policies. The study used real US state-level data to simulate the adoption of policies by some states and estimated the effects using different statistical models. It found that with fewer adopting states, type 1 error rates were too high, and most models lacked power. It recommends using cluster-robust standard errors and lagged outcomes to improve model performance. The study aims to help identify best practices for policy evaluation studies.
We study experimental design in large-scale stochastic systems with substantial uncertainty and structured cross-unit interference. We consider the problem of a platform that seeks to optimize supply-side payments p in a centralized marketplace where different suppliers interact via their effects on the overall supply-demand equilibrium, and propose a class of local experimentation schemes that can be used to optimize these payments without perturbing the overall market equilibrium. We show that, as the system size grows, our scheme can estimate the gradient of the platform’s utility with respect to p while perturbing the overall market equilibrium by only a vanishingly small amount. We can then use these gradient estimates to optimize p via any stochastic first-order optimization method. These results stem from the insight that, while the system involves a large number of interacting units, any interference can only be channeled through a small number of key statistics, and this structure allows us to accurately predict feedback effects that arise from global system changes using only information collected while remaining in equilibrium.
We discuss a general roadmap for generating causal inference from observational studies used to generate real-world evidence. We review targeted minimum loss estimation (TMLE), which provides a general template for the construction of asymptotically efficient plug-in estimators of a target estimand for realistic (i.e., infinite-dimensional) statistical models. TMLE is a two-stage procedure that first uses ensemble machine learning, termed super-learning, to estimate the relevant stochastic relations between the treatment, censoring, covariates and outcome of interest. The super-learner allows one to fully utilize all the advances in machine learning (in addition to more conventional parametric model based estimators) to build a single most powerful ensemble machine learning algorithm. We present the Highly Adaptive Lasso as an important machine learning algorithm to include.
In the second step, TMLE involves maximizing a parametric likelihood along a so-called least favorable parametric model through the super-learner fit of the relevant stochastic relations in the observed data. This second step bridges the state of the art in machine learning to estimators of target estimands for which statistical inference is available (i.e., confidence intervals, p-values, etc.). We also review recent advances in collaborative TMLE, in which the fit of the treatment and censoring mechanism is tailored with respect to the performance of the TMLE. We also discuss asymptotically valid bootstrap-based inference. Simulations and data analyses are provided as demonstrations.
We describe different approaches for specifying models and prior distributions for estimating heterogeneous treatment effects using Bayesian nonparametric models. We make an affirmative case for direct, informative (or partially informative) prior distributions on heterogeneous treatment effects, especially when the treatment effect size and treatment effect variation are small relative to other sources of variability. We also consider how to provide scientifically meaningful summaries of complicated, high-dimensional posterior distributions over heterogeneous treatment effects with appropriate measures of uncertainty.
Climate change mitigation has traditionally been analyzed as some version of a public goods game (PGG) in which a group is most successful if everybody contributes, but players are best off individually by not contributing anything (i.e., “free-riding”)—thereby creating a social dilemma. Analysis of climate change using the PGG and its variants has helped explain why global cooperation on GHG reductions is so difficult, as nations have an incentive to free-ride on the reductions of others. Rather than inspire collective action, it seems that the lack of progress in addressing the climate crisis is driving the search for a “quick fix” technological solution that circumvents the need for cooperation.
This document discusses various types of academic writing and provides tips for effective academic writing. It outlines common academic writing formats such as journal papers, books, and reports. It also lists writing necessities like having a clear purpose, understanding your audience, using proper grammar and being concise. The document cautions against plagiarism and not proofreading. It provides additional dos and don'ts for writing, such as using simple language and avoiding filler words. Overall, the key message is that academic writing requires selling your ideas effectively to the reader.
Machine learning (including deep and reinforcement learning) and blockchain are two of the most noticeable technologies in recent years. The first is the foundation of artificial intelligence and big data, and the second has significantly disrupted the financial industry. Both technologies are data-driven, and thus there is rapidly growing interest in integrating them for more secure and efficient data sharing and analysis. In this paper, we review the research on combining blockchain and machine learning technologies and demonstrate that they can collaborate efficiently and effectively. In the end, we point out some future directions and expect more research on deeper integration of the two promising technologies.
In this talk, we discuss QuTrack, a Blockchain-based approach to track experiment and model changes primarily for AI and ML models. In addition, we discuss how change analytics can be used for process improvement and to enhance the model development and deployment processes.
MUMS: Bayesian, Fiducial, and Frequentist Conference - Multidimensional Monotonicity Discovery with mBART, Ed George, April 29, 2019
1. Multidimensional Monotonicity Discovery
with mBART
Ed George
Wharton, University of Pennsylvania
edgeorge@upenn.edu
Collaborations with:
Hugh Chipman (Acadia)
Rob McCulloch (Arizona State)
Tom Shively (UT Austin)
Bayesian, Fiducial, and Frequentist Conference
Duke University
April 29, 2019
1 / 45
2. Plan
I) Review BART
II) Introduce Monotone BART: mBART
III) Monotonicity Discovery with mBART
2 / 45
3. Part I. BART (Bayesian Additive Regression Trees)
Data: n observations of y and x = (x1, ..., xp)
Suppose: Y = f(x) + ε, ε symmetric with mean 0
Bayesian Ensemble Idea: Approximate unknown f (x) by the form
f (x) = g(x; θ1) + g(x; θ2) + ... + g(x; θm)
θ1, θ2, . . . , θm iid ∼ π(θ)
and use the posterior of f given y for inference.
BART is obtained when each g(x; θj ) is a regression tree.
Key calibration: Using y, set π(θ) so that Var(f ) ≈ Var(y).
3 / 45
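As a concrete illustration (mine, not from the talk), the sum-of-trees model can be fit in R with the CRAN package BART; the test function, sample size and number of trees below are arbitrary choices, and wbart() is used only as one readily available implementation.
library(BART)
set.seed(1)
n <- 500; p <- 5
x <- matrix(runif(n * p), n, p)                                 # p potential predictors
y <- 10 * sin(pi * x[, 1] * x[, 2]) + 5 * x[, 3] + rnorm(n)     # an arbitrary test function plus noise
fit <- wbart(x.train = x, y.train = y, ntree = 200)             # sum of m = 200 regression trees
c(var_fhat = var(fit$yhat.train.mean), var_y = var(y))          # the fitted f's spread tracks Var(y)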
5. Bayesian CART: Just add a prior π(M, T)
Bayesian CART Model Search
(Chipman, George, McCulloch 1998)
π(M, T) = π(M | T)π(T)
π(M | T): (µ1, µ2, . . . , µb) ∼ Nb(0, τ²I)
π(T): Stochastic process to generate tree skeleton plus uniform
prior on splitting variables and splitting rules.
Closed form for π(T | y) facilitates MCMC stochastic search for
promising trees.
5 / 45
6. Moving on to BART
Bayesian Additive Regression Trees
(Chipman, George, McCulloch 2010)
The BART ensemble model
Y = g(x; T1, M1) + g(x; T2, M2) + · · · + g(x; Tm, Mm) + σz, z ∼ N(0, 1)
Each (Ti , Mi ) identifies a single tree.
E(Y | x, T1, M1, . . . , Tm, Mm) is the sum of m bottom node µ’s,
one from each tree.
Number of trees m can be much larger than sample size n.
g(x; T1, M1), g(x; T2, M2), ..., g(x; Tm, Mm) is a highly redundant
“over-complete basis” with many many parameters.
6 / 45
7. Complete the Model with a Regularization Prior
π((T1, M1), (T2, M2), . . . , (Tm, Mm), σ)
π applies the Bayesian CART prior to each (Tj , Mj ) independently
so that:
Each T small.
Each µ small.
σ will be compatible with the observed variation of y.
The observed variation of y is used to guide the choice of the
hyperparameters for the µ and σ priors.
π is a “regularization prior” as it keeps the contribution of each
g(x; Ti , Mi ) small, explaining only a small portion of the fit.
7 / 45
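To make this calibration concrete, here is a small sketch (mine) in the spirit of the default choice described in Chipman, George and McCulloch (2010): with y rescaled to [-0.5, 0.5], each bottom-node mean gets a N(0, σµ²) prior with σµ = 0.5/(k√m), so the prior standard deviation of the m-tree sum is 0.5/k. The constants m and k and the example response below are assumptions for illustration only.
m <- 200; k <- 2                                          # number of trees and shrinkage constant
y <- faithful$eruptions                                   # any numeric response, used only for scaling
y_scaled <- (y - min(y)) / (max(y) - min(y)) - 0.5        # rescale y to [-0.5, 0.5]
sigma_mu <- 0.5 / (k * sqrt(m))                           # prior sd of each bottom-node mu
sqrt(m) * sigma_mu                                        # = 0.5/k, the prior sd of the sum of m trees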
8. Build up the fit by adding up tiny bits of fit ..
8 / 45
9. Connections to Other Modeling Ideas
Bayesian Nonparametrics:
Lots of parameters (to make model flexible)
A strong prior to shrink towards simple structure (regularization)
BART shrinks towards additive models with some interaction
Dynamic Random Basis:
g(x;T1,M1), ..., g(x;Tm,Mm) are dimensionally adaptive
Gradient Boosting:
Overall fit becomes the cumulative effort of many “weak learners”
9 / 45
10. A Sketch of the BART MCMC Algorithm
Bayesian Backfitting: Outer loop is a “simple” Gibbs sampler
(Ti , Mi ) | Y , all other (Tj , Mj ), and σ
σ | Y , (T1, M1, . . . , . . . , Tm, Mm)
To draw (Ti , Mi ) above, subtract the contributions of the other
trees from both sides to get a simple one-tree model.
We integrate out M to draw T and then draw M | T.
10 / 45
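A minimal sketch (mine, with made-up toy data) of the residual-subtraction step: to update tree i, the fits of the other trees are subtracted from y, leaving a one-tree model for the partial residuals.
m <- 10; n <- 100
x <- runif(n)
y <- sin(2 * pi * x) + rnorm(n, sd = 0.2)
tree_fit <- matrix(0, nrow = n, ncol = m)                  # current fitted values g(x; Tj, Mj), initialized at 0
i <- 3
partial_resid <- y - rowSums(tree_fit[, -i, drop = FALSE]) # response for the single-tree update of (Ti, Mi)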
11. For the draw of T we use a Metropolis-Hastings within Gibbs step.
Because p(T | data) is available in closed form (up to a norming constant), our proposal moves around tree space by proposing local modifications such as the “birth-death” step: propose a more complex tree (birth) or propose a simpler tree (death).
Such modifications are accepted according to their compatibility with p(T | data).
... as the MCMC runs, each tree in the sum will grow and shrink,
swapping fit amongst them ....
11 / 45
12. Using the MCMC Output to Draw Inference
Each iteration d results in a draw from the posterior of f
ˆfd (·) = g(·; T1d , M1d ) + · · · + g(·; Tmd , Mmd )
To estimate f (x) we simply average the ˆfd (·) draws at x
Posterior uncertainty is captured by variation of the ˆfd (x)
e.g., a 95% credible region is estimated by the middle 95% of values
Can do the same with functionals of f .
12 / 45
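For example, a minimal sketch (mine) of turning a matrix of posterior draws into a point estimate and 95% pointwise intervals; the placeholder matrix below stands in for the ˆfd values an actual run would produce (e.g., the draws returned by wbart on a test grid).
draws <- matrix(rnorm(1000 * 50), nrow = 1000, ncol = 50)    # placeholder: row d is fhat_d on a 50-point grid
f_hat   <- colMeans(draws)                                   # posterior mean estimate of f on the grid
f_lower <- apply(draws, 2, quantile, probs = 0.025)          # 95% pointwise credible band
f_upper <- apply(draws, 2, quantile, probs = 0.975)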
13. Out of Sample Prediction
Predictive comparisons on 42 data sets.
Data from Kim, Loh, Shih and Chaudhuri (2006) (thanks Wei-Yin Loh!)
p = 3 to 65, n = 100 to 7,000.
for each data set, 20 random splits into 5/6 train and 1/6 test
use 5-fold cross-validation on train to pick hyperparameters (except BART-default!)
gives 20 × 42 = 840 out-of-sample predictions; for each prediction, divide the RMSE of each method by the smallest
+ each boxplot represents 840 predictions for a method
+ 1.2 means you are 20% worse than the best
+ BART-cv best
+ BART-default (use default prior) does amazingly well!!
[Boxplots of relative RMSE (1.0 to 1.5) for Random Forests, Neural Net, Boosting, BART-cv, and BART-default.]
13 / 45
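A small sketch (mine) of how the relative-RMSE display is constructed; the rmse matrix below is a random placeholder, not the study's results.
rmse <- matrix(abs(rnorm(840 * 5, mean = 2)), ncol = 5,
               dimnames = list(NULL, c("RF", "NNet", "Boost", "BART-cv", "BART-def")))  # placeholder RMSEs
rel_rmse <- rmse / apply(rmse, 1, min)       # 1.2 means 20% worse than the best method on that split
boxplot(rel_rmse, ylab = "RMSE relative to best")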
14. Automatic Uncertainty Quantification
A simple simulated 1-dimensional example
[Figure: two panels plotting y against x on [−1, 1], showing 95% pointwise posterior intervals, the posterior mean and the true f, for BART (left) and mBART (right).]
Note: mBART on the right plot to be introduced next
14 / 45
15. Part II. Monotone BART - mBART
Multidimensional Monotone BART
(Chipman, George, McCulloch, Shively 2018)
Idea:
Approximate multivariate monotone functions by the sum of many
single tree models, each of which is monotonic.
15 / 45
16. An Example of a Monotonic Tree
[Figure: three different views of a bivariate monotonic tree, f(x) plotted over (x1, x2).]
16 / 45
17. What makes this single tree monotonic?
A function g is said to be monotonic in xi if for any δ > 0,
g(x1, x2, . . . , xi + δ, xi+1, . . . , xk; T, M)
≥ g(x1, x2, . . . , xi , xi+1, . . . , xk; T, M).
For simplicity and wlog, let’s restrict attention to monotone
nondecreasing functions.
17 / 45
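One way to check this numerically (my sketch, not the talk's code) is to evaluate a fitted surface on a grid and test that it is nondecreasing along the chosen coordinate.
is_monotone_up <- function(fhat, grid1, grid2) {
  vals <- outer(grid1, grid2, fhat)          # matrix of fhat(x1, x2) values over the grid
  all(diff(vals) >= 0)                       # nondecreasing down each column, i.e. in x1
}
is_monotone_up(function(a, b) a * b, seq(0.1, 0.9, 0.1), seq(0.1, 0.9, 0.1))   # TRUE: x1*x2 is monotone up in x1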
18. Constraining a tree to be monotone is easy: we simply constrain
the mean level of a node to be greater than those of its below
neighbors and less than those of its above neighbors.
[Figure: partition of the (x1, x2) unit square into the tree's bottom-node regions, labelled 4, 7, 10, 11, 12 and 13.]
node 7 is disjoint from node 4.
node 10 is a below neighbor of node 13.
node 7 is an above neighbor of node 13.
The mean level of node 13 must be greater than those of 10 and
12 and less than that of node 7.
18 / 45
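An illustrative sketch (mine) of how such a constrained mean could be drawn, using inverse-CDF sampling from a normal truncated to the interval set by the neighbors; draw_constrained_mu and its arguments are hypothetical names, not the mBART implementation.
draw_constrained_mu <- function(post_mean, post_sd, below_mus, above_mus) {
  lo <- if (length(below_mus)) max(below_mus) else -Inf      # must exceed every below neighbor
  hi <- if (length(above_mus)) min(above_mus) else  Inf      # must stay under every above neighbor
  u  <- runif(1, pnorm(lo, post_mean, post_sd), pnorm(hi, post_mean, post_sd))
  qnorm(u, post_mean, post_sd)
}
draw_constrained_mu(0.5, 0.2, below_mus = c(0.3, 0.45), above_mus = 0.8)   # e.g. node 13 between nodes 10/12 and 7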
19. The mBART Prior
Recall the BART parameter
θ = ((T1, M1), (T2, M2), . . . , (Tm, Mm), σ)
Let S = {θ : every tree is monotonic in a desired subset of xi s}
To impose the monotonicity we simply truncate the BART prior
π(θ) to the set S
π∗(θ) ∝ π(θ) IS(θ)
where IS (θ) is 1 if every tree in θ is monotonic.
19 / 45
20. A New BART MCMC “Christmas Tree” Algorithm
π((T1, M1), (T2, M2), . . . , (Tm, Mm), σ | y)
Bayesian backfitting again: Iteratively sample each (Tj , Mj ) given
(y, σ) and other (Tj , Mj )’s
Each (T0, M0) → (T1, M1) update is sampled as follows:
Denote the move as (T0, M0_Common, M0_Old) → (T1, M0_Common, M1_New)
Propose T∗ via birth, death, etc.
If M-H with π(T, M | y) accepts (T∗, M0_Common):
Set (T1, M1_Common) = (T∗, M0_Common)
Sample M1_New from π(M_New | T1, M1_Common, y)
Only M0_Old → M1_New needs to be updated.
Works for both BART and mBART.
20 / 45
21. Example: Product of two x’s
Let’s consider a very simple simulated monotone example:
Y = x1 x2 + ε, xi ∼ Uniform(0, 1).
Here is the plot of the true function f (x1, x2) = x1 x2
[Perspective plot of f(x1, x2) = x1 x2 over the unit square.]
21 / 45
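A minimal sketch (mine) of simulating this example; the noise level is an arbitrary assumption. The next slides show a single unconstrained tree, unconstrained BART, and constrained mBART fit to data of this kind.
set.seed(99)
n <- 1000
x <- cbind(x1 = runif(n), x2 = runif(n))
y <- x[, 1] * x[, 2] + rnorm(n, sd = 0.1)   # Y = x1 x2 + eps, with an arbitrary noise sd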
22. First we try a single (just one tree), unconstrained tree model.
Here is the graph of the fit.
[Perspective plot of the single unconstrained tree fit over (x1, x2).]
The fit is not terrible, but there are some aspects of the fit which
violate monotonicity.
22 / 45
23. Here is the graph of the fit with the monotone constraint:
[Perspective plot of the monotone-constrained single-tree fit over (x1, x2).]
We see that our fit is monotonic, and more representative of the
true f .
23 / 45
24. Here is the unconstrained BART fit:
[Perspective plot of the unconstrained BART fit over (x1, x2).]
Much better (of course) but not monotone!
24 / 45
25. And, finally, the constrained BART fit:
[Perspective plot of the constrained mBART fit over (x1, x2).]
Not Bad!
Same method works with any number of x’s!
25 / 45
26. Example: MSE Reduction by Monotone Regularization
Y = x1 x2² + x3 x4³ + x5 + ε, ε ∼ N(0, σ²), xi ∼ Uniform(0, 1).
For various values of σ, we simulated 5,000 observations.
26 / 45
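A minimal sketch (mine) of this simulation design, using the formula as reconstructed above and one arbitrary σ; the unconstrained fit uses the CRAN BART package, and the same MSE-against-the-truth comparison would then be made with an mBART fit.
library(BART)
set.seed(7)
n <- 5000; sigma <- 0.2                                     # one arbitrary noise level
x <- matrix(runif(n * 5), n, 5, dimnames = list(NULL, paste0("x", 1:5)))
f_true <- x[, 1] * x[, 2]^2 + x[, 3] * x[, 4]^3 + x[, 5]
y <- f_true + rnorm(n, sd = sigma)
fit <- wbart(x.train = x, y.train = y)                      # unconstrained BART fit
mean((fit$yhat.train.mean - f_true)^2)                      # MSE against the true f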
28. Part III. Discovering Monotonicity with mBART
Suppose we don’t know if f (x) is monotone up, monotone down or
even monotone at all.
Of course, a simple strategy would be to simply compare the fits from BART and mBART.
Good news, we can do much better than this!
As we’ll now see, mBART can be deployed to simultaneously
estimate all the monotone components of f .
With this strategy, monotonicity can be discovered rather than
imposed!
28 / 45
29. The Monotone Decomposition of a Function
To begin simply, suppose x is one-dimensional and f is of bounded
variation.
Any such f can be uniquely written (up to an additive
constant) as the sum of a monotone up function and a
monotone down function
f (x) = fup(x) + fdown(x)
where
when f (x) is increasing, fup(x) increases at the same
rate and is flat otherwise,
when f (x) is decreasing, fdown(x) decreases at the
same rate and is flat otherwise.
29 / 45
30. More precisely, when f is differentiable,
f′up(x) = f′(x) when f′(x) > 0, and 0 when f′(x) ≤ 0
and
f′down(x) = f′(x) when f′(x) < 0, and 0 when f′(x) ≥ 0
Notice the orthogonal decomposition of f′
f′(x) = f′up(x) + f′down(x), with f′up(x) f′down(x) ≡ 0
30 / 45
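A small numerical sketch (mine) of this decomposition for a one-dimensional f on a grid: accumulate the positive and negative increments of f separately.
x <- seq(0, 2 * pi, length.out = 400)
f <- sin(x)                                   # any function of bounded variation
d <- diff(f)
f_up   <- c(0, cumsum(pmax(d, 0)))            # increases where f increases, flat otherwise
f_down <- c(0, cumsum(pmin(d, 0)))            # decreases where f decreases, flat otherwise
max(abs((f_up + f_down) - (f - f[1])))        # ~0: f = fup + fdown up to an additive constant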
31. The Discovery Strategy with mBART
Key Idea: To discover the monotone decomposition of f , we
embed f (x) as a two-dimensional function in R2,
f(x) = f∗(x, x) = fup(x) + fdown(x).
Letting x1 = x2 = x be duplicate copies of x, we apply mBART to
estimate f ∗(x1, x2)
constrained to be monotone up in the x1 direction, and
constrained to be monotone down in the x2 direction.
We are effectively estimating the monotone projections of
f ∗(x1, x2) onto the x1 and x2 axes
P[x1]f ∗(x1, x2) = fup(x1)
P[x2]f ∗(x1, x2) = fdown(x2)
31 / 45
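A sketch (mine) of the construction: duplicate the predictor and request opposite monotonicity constraints on the two copies. The mbart() call shown is a hypothetical interface, not the actual software's API (see https://bitbucket.org/remcc/mbart).
x  <- runif(500)
y  <- sin(3 * x) + rnorm(500, sd = 0.1)        # has both increasing and decreasing regions on (0, 1)
X2 <- cbind(x1 = x, x2 = x)                    # x1 = x2 = x, duplicate copies of x
# fit <- mbart(X2, y, mono = c(+1, -1))        # hypothetical call: monotone up in x1, down in x2
# The x1-projection of the fitted surface estimates fup, the x2-projection estimates fdown.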
36. Example: House Price Data
Let’s look at a simple real example where y = house price,
and x = three characteristics of each house.
> head(x)
nbhd size brick
[1,] 2 1.79 0
[2,] 2 2.03 0
[3,] 2 1.74 0
[4,] 2 1.98 0
[5,] 2 2.13 0
[6,] 1 1.78 0
> dim(x)
[1] 128 3
> summary(x)
nbhd size brick
Min. :1.000 Min. :1.450 Min. :0.0000
1st Qu.:1.000 1st Qu.:1.880 1st Qu.:0.0000
Median :2.000 Median :2.000 Median :0.0000
Mean :1.961 Mean :2.001 Mean :0.3281
3rd Qu.:3.000 3rd Qu.:2.140 3rd Qu.:1.0000
Max. :3.000 Max. :2.590 Max. :1.0000
> summary(y)
Min. 1st Qu. Median Mean 3rd Qu. Max.
69.1 111.3 126.0 130.4 148.2 211.2
y: dollars (thousands).
x: nbhd (categorical), size (sq ft, thousands), brick (indicator).
36 / 45
37. Call:
lm(formula = price ~ nbhd + size + brick, data = hdat)
Residuals:
Min 1Q Median 3Q Max
-30.049 -8.519 0.137 7.640 36.912
Coefficients:
Estimate Std. Error t value Pr(>|t|)
(Intercept) 18.725 10.766 1.739 0.0845 .
nbhd2 5.556 2.779 1.999 0.0478 *
nbhd3 36.770 2.958 12.430 < 2e-16 ***
size 46.109 5.527 8.342 1.25e-13 ***
brickYes 19.152 2.438 7.855 1.69e-12 ***
---
Residual standard error: 12.5 on 123 degrees of freedom
Multiple R-squared: 0.7903, Adjusted R-squared: 0.7834
F-statistic: 115.9 on 4 and 123 DF, p-value: < 2.2e-16
If the linear model is correct, we are monotone up in all three
variables.
Remark: For the linear model we have to dummy up nbhd, but for
BART and mBART we can simply leave it as an ordered numerical
categorical variable.
37 / 45
41. Let’s now look at the effect of size conditionally on the six possible
values of (nbdh, brick)
[Figure: three panels of price (thousands of dollars) versus size (1.7 to 2.3), one curve for each of the six (nbhd, brick) combinations, with N marking non-brick and B marking brick houses. Panel titles: "BART: conditional effect of size" (with NBHD1, NBHD2, NBHD3 labelled), "mBART: conditional effect of size", and "mBARTD: conditional effect of size".]
The conditionally monotone effect of size is becoming clearer!
41 / 45
And finally, the effect of size conditionally on the six possible values of (nbhd, brick) via ˆfup and ˆfdown
[Figure: two panels of price versus size (1.7 to 2.3) for the six (nbhd, brick) combinations, N marking non-brick and B marking brick houses. Panel titles: "mBART, fup: conditional effect of size" and "mBARTD, fdown: conditional effect of size".]
Price is clearly conditionally monotone up in all three variables!
By simultaneously estimating ˆfup + ˆfdown, we have discovered
monotonicity without any imposed assumptions!!!
42 / 45
43. Concluding Remarks
mBARTD = ˆfup + ˆfdown provides a monotonicity-assumption-free approach for the discovery of the monotone components of f in multidimensional settings.
Discovering such regions of monotonicity may be of scientific interest in real applications.
We have used informal variable selection to identify the monotone components here. More formal variable selection can be used in higher dimensional settings.
As a doubly adaptive shape-constrained regularization approach,
mBARTD will adapt to mBART when monotonicity is present,
mBARTD will adapt to BART when monotonicity is absent,
mBARTD seems at least as good as, and maybe better than, the best of mBART and BART in general.
43 / 45
44. Concluding Remarks
The fully Bayesian nature of BART greatly facilitates
extensions such as mBART, mBARTD and many others.
Despite its many compelling successes in practice, theoretical frequentist support for BART is only now beginning to appear.
For example, Rockova and van der Pas (2017), "Posterior Concentration for Bayesian Regression Trees and Their Ensembles," recently laid out a framework for the first theoretical results for Bayesian CART and BART, showing near-minimax posterior concentration when p > n for classes of Hölder continuous functions.
The monotone BART paper is available on arXiv. Software for mBART is available at https://bitbucket.org/remcc/mbart.
44 / 45