This document summarizes Hill's method for numerically approximating the eigenvalues and eigenfunctions of differential operators. Hill's method has two main steps:
1. Perform a Floquet-Bloch decomposition to reduce the problem from the real line to the interval [0,L] with periodic boundary conditions, parameterized by the Floquet exponent μ. This gives an operator with a compact resolvent.
2. Approximate the solutions by Fourier series, reducing the problem to a matrix eigenvalue problem that can be solved numerically.
The method is straightforward to implement and effective for various problems involving differential operators on the real line or with periodic boundary conditions. Convergence rates and error bounds for Hill's method are also presented.
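To make step 2 concrete, here is a small numpy sketch (an illustration, not taken from the document) that assembles the Fourier-truncated matrix of the Hill operator $L = -d^2/dx^2 + q(x)$ for the Mathieu-type potential $q(x) = 2\cos(2x)$ and Floquet exponent $\mu = 0$; the potential and truncation size are assumptions.

```python
import numpy as np

N = 64                                   # Fourier truncation: modes k = -N..N (assumption)
k = np.arange(-N, N + 1)

# In the basis e^{i k x}, -d^2/dx^2 contributes k^2 on the diagonal, and the
# e^{±2ix} terms of q(x) = 2*cos(2x) couple modes k and k ± 2 with weight 1.
A = np.diag(k.astype(float) ** 2)
A[np.abs(k[:, None] - k[None, :]) == 2] += 1.0

evals = np.sort(np.linalg.eigvalsh(A))  # the reduced matrix eigenvalue problem
print(evals[:5])                         # lowest periodic eigenvalues (approx.)
```

For a nonzero Floquet exponent $\mu$, the diagonal becomes $(k + \mu)^2$, giving one matrix problem per value of $\mu$.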
Clustering in Hilbert geometry for machine learning, by Frank Nielsen
- The document discusses different geometric approaches for clustering multinomial distributions, including total variation distance, Fisher-Rao distance, Kullback-Leibler divergence, and Hilbert cross-ratio metric.
- It benchmarks k-means clustering using these four geometries on the probability simplex, finding that Hilbert geometry clustering yields good performance with theoretical guarantees.
- The Hilbert cross-ratio metric defines a non-Riemannian Hilbert geometry on the simplex with polytopal balls, and satisfies information monotonicity properties desirable for clustering distributions.
This document discusses prior selection for mixture estimation. It begins by introducing mixture models and their common parameterization. It then discusses several types of weakly informative priors that can be used for mixture models, including empirical Bayes priors, hierarchical priors, and reparameterizations. It notes challenges with using improper priors for mixture models. The document also discusses saturated priors when the number of components is not known beforehand. It covers Jeffreys priors for mixtures and issues around propriety. It proposes some reparameterizations of mixtures, like using moments or a spherical reparameterization, that allow proper Jeffreys-like priors to be defined.
This document discusses priors for mixture models. It introduces weakly informative priors like symmetric empirical Bayes priors and dependent priors. Improper independent priors are problematic for mixtures. Reparameterization techniques are discussed to define proper Jeffreys priors, including expressing components as local perturbations, using moments, and spherical reparameterization. Specific examples for Gaussian and Poisson mixtures show valid reparameterizations that lead to proper posteriors.
Bayesian hybrid variable selection under generalized linear models, by Caleb (Shiqiang) Jin
This document presents a method for Bayesian variable selection under generalized linear models. It begins by introducing the model setting and Bayesian model selection framework. It then discusses three algorithms for model search: deterministic search, stochastic search, and a hybrid search method. The key contribution is a method to simultaneously evaluate the marginal likelihoods of all neighbor models, without parallel computing. This is achieved by decomposing the coefficient vectors and estimating additional coefficients conditioned on the current model's coefficients. Newton-Raphson iterations are used to solve the system of equations and obtain the maximum a posteriori estimates for all neighbor models simultaneously in a single computation. This allows for a fast, inexpensive search of the model space.
11. Generalized and subset integrated autoregressive moving average bilinear t..., by Alexander Decker
This document proposes generalized integrated autoregressive moving average bilinear (GBL) time series models and subset generalized integrated autoregressive moving average bilinear (GSBL) models to achieve stationarity for all nonlinear time series. It presents the models' formulations and discusses their properties, including stationarity, convergence, and parameter estimation. An algorithm is provided to fit the one-dimensional models. The generalized models are applied to Wolfer sunspot numbers and the GBL model is found to perform better than the GSBL model.
Approximate Bayesian Computation with Quasi-Likelihoods, by Stefano Cabras
This document describes ABC-MCMC algorithms that use quasi-likelihoods as proposals. It introduces quasi-likelihoods as approximations to true likelihoods that can be estimated from pilot runs. The ABCql algorithm uses a quasi-likelihood estimated from a pilot run as the proposal in an ABC-MCMC algorithm. Examples applying ABCql to mixture of normals, coalescent, and gamma models are provided to demonstrate its effectiveness compared to standard ABC-MCMC.
On the solvability of a system of forward-backward linear equations with unbo..., by Nikita V. Artamonov
The document discusses a system of forward-backward linear evolution equations (FBEE) with unbounded operator coefficients. It introduces the necessary mathematical framework including a triple of Banach spaces and associated operators. It then defines the system of FBEE, discusses mild solutions, and relates it to a differential operator Riccati equation. The main result is a theorem stating that under certain assumptions on the operators, including accretivity of A, the Riccati equation has a unique mild solution.
New Mathematical Tools for the Financial Sector, by SSA KPI
AACIMP 2010 Summer School lecture by Gerhard Wilhelm Weber. "Applied Mathematics" stream. "Modern Operational Research and Its Mathematical Methods with a Focus on Financial Mathematics" course. Part 5.
More info at http://summerschool.ssa.org.ua
Approximate Bayesian Computation (ABC) methods allow approximating intractable likelihoods in Bayesian inference. ABC rejection sampling simulates parameters from the prior and keeps those where the simulated data is close to the observed data. ABC Markov chain Monte Carlo creates a Markov chain over the parameters where proposed moves are accepted if simulated data is similar to observed. Population Monte Carlo and ABC-MCMC improve on rejection sampling by using sequential importance sampling and MCMC moves to propose parameters in high density regions.
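A toy sketch of the ABC rejection step described above, assuming a normal model with known unit variance, the sample mean as summary statistic, and illustrative tolerance and sample sizes:

```python
import numpy as np

rng = np.random.default_rng(0)
y_obs = rng.normal(2.0, 1.0, size=50)          # "observed" data (toy example)
s_obs = y_obs.mean()                           # summary statistic

eps, n_sim = 0.05, 100_000                     # tolerance, number of simulations
theta = rng.normal(0.0, 5.0, size=n_sim)       # 1. simulate parameters from the prior
s_sim = rng.normal(theta, 1.0 / np.sqrt(50))   # 2. simulate the summary (mean of n=50 draws)

accepted = theta[np.abs(s_sim - s_obs) < eps]  # 3. keep draws whose data are close
print(accepted.mean(), accepted.size)          # approximate posterior mean, sample size
```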
Information geometry: Dualistic manifold structures and their uses, by Frank Nielsen
Information geometry: Dualistic manifold structures and their uses
by Frank Nielsen
Talk given at ICML GIMLI2018
http://gimli.cc/2018/
See tutorial at:
https://arxiv.org/abs/1808.08271
"An elementary introduction to information geometry"
An elementary introduction to information geometry, by Frank Nielsen
This document provides an elementary introduction to information geometry. It discusses how information geometry generalizes concepts from Riemannian geometry to study the geometry of decision making and model fitting. Specifically, it introduces:
1. Dually coupled connections (∇, ∇*) that are compatible with a metric tensor g and define dual parallel transport on a manifold.
2. The fundamental theorem of information geometry, which states that if a connection ∇ has constant curvature κ, then its dual connection ∇* has the same constant curvature κ.
3. Examples of statistical manifolds with dually flat geometry that arise from Bregman divergences and f-divergences, making them useful for modeling relationships between probability distributions.
Maximum likelihood estimation of regularisation parameters in inverse problem..., by Valentin De Bortoli
This document discusses an empirical Bayesian approach for estimating regularization parameters in inverse problems using maximum likelihood estimation. It proposes the Stochastic Optimization with Unadjusted Langevin (SOUL) algorithm, which uses Markov chain sampling to approximate gradients in a stochastic projected gradient descent scheme for optimizing the regularization parameter. The algorithm is shown to converge to the maximum likelihood estimate under certain conditions on the log-likelihood and prior distributions.
The document discusses exponential decay of solutions to a second-order linear differential equation involving a self-adjoint positive operator A and an accretive damping operator D. Several theorems establish conditions under which the associated operator semigroup or pencil generates exponential decay. If D is accretive and satisfies certain positivity conditions, the semigroup will decay exponentially. Explicit bounds on the rate of decay and estimates of the spectrum are provided depending on properties of A and D.
This document discusses approximate Bayesian computation (ABC). ABC allows Bayesian inference when the likelihood function is intractable or impossible to evaluate directly. It introduces ABC, describes how it originated from population genetics models, and outlines some of its limitations and advances, including various related computational methods like ABC with empirical likelihoods. The document also examines how ABC relates to other simulation-based statistical methods and considers in what sense ABC can be regarded as genuinely Bayesian.
This document discusses Approximate Bayesian Computation (ABC), a simulation-based method for conducting Bayesian inference when the likelihood function is intractable or impossible to evaluate directly. ABC produces an approximation of the posterior distribution by simulating data under different parameter values and accepting simulations that match the observed data. The document provides background on how ABC originated from population genetics models and outlines some of the advances in ABC, including how it can be used as an inference machine to estimate parameters from simulated data.
The document summarizes a talk given by Mark Girolami on manifold Monte Carlo methods. It discusses using stochastic diffusions and geometric concepts to improve MCMC methods. Specifically, it proposes using discretized Langevin and Hamiltonian diffusions across a Riemann manifold as an adaptive proposal mechanism. This is founded on deterministic geodesic flows on the manifold. Examples presented include a warped bivariate Gaussian, Gaussian mixture model, and log-Gaussian Cox process.
Hamilton-Jacobi approach for second order traffic flow models, by Guillaume Costeseque
This document summarizes a presentation on using a Hamilton-Jacobi approach for second order traffic flow models. It begins with an introduction to traffic modeling, discussing both Eulerian and Lagrangian representations of traffic. It then discusses using a variational principle to apply to generic second order traffic flow models (GSOM), which account for additional driver attributes beyond just density. Specifically, it discusses formulating GSOM models in Lagrangian coordinates using a Hamilton-Jacobi framework. The document outlines solving the HJ PDE using characteristics, and decomposing problems into elementary blocks defined by piecewise affine initial, upstream and internal boundary conditions.
Omiros' talk on the Bernoulli factory problem, by BigMC
This document summarizes previous work on simulating events of unknown probability using reverse time martingales. It discusses von Neumann's solution to the Bernoulli factory problem where f(p)=1/2. It also summarizes the Keane-O'Brien existence result, the Nacu-Peres Bernstein polynomial approach, and issues with implementing the Nacu-Peres algorithm at large n due to the large number of strings involved. It proposes developing a reverse time martingale approach to address these issues.
The document discusses Approximate Bayesian Computation (ABC). ABC allows inference for statistical models where the likelihood function is not available in closed form. ABC works by simulating data under different parameter values and comparing simulated to observed data. ABC has been used for model choice by comparing evidence for different models. Consistency of ABC for model choice depends on the criterion used and asymptotic identifiability of the parameters.
Signal Processing Course: Sparse Regularization of Inverse Problems, by Gabriel Peyré
The document discusses sparse regularization for inverse problems. It describes how sparse regularization can be used for tasks like denoising, inpainting, and image separation by posing them as optimization problems that minimize data fidelity and an L1 sparsity prior on the coefficients. Iterative soft thresholding is presented as an algorithm for solving the noisy sparse regularization problem. Examples are given of how sparse wavelet regularization can outperform other regularizers like Sobolev for tasks like image deblurring.
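The iterative soft thresholding scheme mentioned above is short enough to sketch; the following minimal version for $\min_x \frac{1}{2}\|y - Ax\|^2 + \lambda\|x\|_1$ uses a random matrix and a synthetic sparse signal as stand-ins for a real measurement operator (all sizes and the value of $\lambda$ are assumptions):

```python
import numpy as np

def ista(A, y, lam, n_iter=500):
    """Iterative soft thresholding for 0.5*||y - Ax||^2 + lam*||x||_1."""
    L = np.linalg.norm(A, 2) ** 2              # Lipschitz constant of the gradient
    x = np.zeros(A.shape[1])
    for _ in range(n_iter):
        z = x - A.T @ (A @ x - y) / L          # gradient step on the data fidelity
        x = np.sign(z) * np.maximum(np.abs(z) - lam / L, 0.0)  # soft threshold
    return x

rng = np.random.default_rng(0)
A = rng.normal(size=(40, 100))
x_true = np.zeros(100); x_true[:5] = 3.0       # sparse ground truth
y = A @ x_true + 0.01 * rng.normal(size=40)
print(np.nonzero(ista(A, y, lam=1.0))[0])      # approximately recovers the support
```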
Rao-Blackwellisation schemes for accelerating Metropolis-Hastings algorithms, by Christian Robert
Aggregate of three different papers on Rao-Blackwellisation, from Casella & Robert (1996), to Douc & Robert (2010), to Banterle et al. (2015), presented during an OxWaSP workshop on MCMC methods, Warwick, Nov 20, 2015
Dependent processes in Bayesian Nonparametrics, by Julyan Arbel
This document summarizes dependent processes in Bayesian nonparametrics. It motivates the need for dependent random probability measures to accommodate temporal dependence structures beyond the exchangeability assumption. It describes modeling collections of random probability measures indexed by time as either discrete-time or continuous-time processes. The diffusive Dirichlet process is introduced as a dependent Dirichlet process with Dirichlet marginal distributions at each time point and continuous sample paths. Simulation and estimation methods are discussed for this model.
This document provides a probability cheatsheet compiled by William Chen and Joe Blitzstein with contributions from others. It is licensed under CC BY-NC-SA 4.0 and contains information on topics like counting rules, probability definitions, random variables, expectations, independence, and more. The cheatsheet is designed to summarize essential concepts in probability.
This document provides a probability cheatsheet compiled by William Chen and Joe Blitzstein with contributions from others. It is licensed under CC BY-NC-SA 4.0 and contains information on topics like counting rules, probability definitions, random variables, moments, and more. The cheatsheet is regularly updated with comments and suggestions submitted through a GitHub repository.
Talk at CIRM on Poisson equation and debiasing techniques, by Pierre Jacob
- The document discusses debiasing techniques for Markov chain Monte Carlo (MCMC) algorithms.
- It introduces the concept of "fishy functions" which are solutions to Poisson's equation and can be used for control variates to reduce bias and variance in MCMC estimators.
- The document outlines different sections including revisiting unbiased estimation through Poisson's equation, asymptotic variance estimation using a novel "fishy function" estimator, and experiments on different examples.
Asymptotics for discrete random measures, by Julyan Arbel
This document provides an introduction to asymptotics for discrete random measures, specifically the Dirichlet process and the two-parameter Poisson-Dirichlet process. It covers several key aspects:
1) It outlines the stick-breaking construction of the two-parameter Poisson-Dirichlet process and defines related notation.
2) It introduces the truncation error Rn and discusses how its asymptotic behavior differs between the Dirichlet and two-parameter Poisson-Dirichlet cases.
3) It briefly describes applications of these processes in mixture modeling and summarizes sampling approaches, such as blocked Gibbs and slice sampling, that rely on truncation of the infinite-dimensional distributions.
Since the advent of the horseshoe priors for regularization, global-local shrinkage methods have proved to be a fertile ground for the development of Bayesian theory and methodology in machine learning. They have achieved remarkable success in computation, and enjoy strong theoretical support. Much of the existing literature has focused on the linear Gaussian case. The purpose of the current talk is to demonstrate that the horseshoe priors are useful more broadly, by reviewing both methodological and computational developments in complex models that are more relevant to machine learning applications. Specifically, we focus on methodological challenges in horseshoe regularization in nonlinear and non-Gaussian models; multivariate models; and deep neural networks. We also outline the recent computational developments in horseshoe shrinkage for complex models along with a list of available software implementations that allows one to venture out beyond the comfort zone of the canonical linear regression problems.
An introduction to the information theory basis of image/video coding: entropy, rate-distortion theory, and entropy coding methods such as Huffman coding and arithmetic coding.
This document presents Joe Suzuki's work on Bayes independence tests. It discusses both discrete and continuous cases. For the discrete case, it estimates mutual information using maximum likelihood and proposes a Bayesian estimation using Lempel-Ziv compression. This Bayesian estimation is shown to be consistent. For the continuous case, it constructs a generalized Bayesian estimation that is also consistent. It also discusses the Hilbert-Schmidt independence criterion (HSIC) and its limitations. Experiments show the proposed method performs well on both synthetic and real data, while HSIC performs poorly in some cases. The proposed method also has significantly better execution time than HSIC.
The document presents the cooperative-Lasso, a regularization method for variable selection in regression that assumes a sign-coherent group structure. It begins by introducing generalized linear models and the group Lasso estimator. It then notes two limitations of the group Lasso: it does not allow for single zeros within groups, and it does not enforce sign coherence within groups. The cooperative-Lasso is introduced as a penalty that assumes the parameters within each group are either all non-positive, all non-negative, or all null. Examples of applications that could benefit from sign coherence between variables within groups are given.
Equational axioms for probability calculus and modelling of Likelihood ratio ..., by Advanced-Concepts-Team
Based on the theory of meadows, an equational axiomatisation is given for probability functions on finite event spaces. Completeness of the axioms is stated, with some pointers to how that is shown. Then a simplified model of courtroom subjective probabilistic reasoning is provided in terms of a protocol with two proponents: the trier of fact (TOF, the judge) and the moderator of evidence (MOE, the scientific witness). The idea is then outlined of performing a step of Bayesian reasoning by applying a transformation of the subjective probability function of TOF on the basis of different pieces of information obtained from MOE. The central role of the so-called Adams transformation is outlined. A simple protocol is considered where MOE transfers to TOF first a likelihood ratio for a hypothesis H and a potential piece of evidence E, and thereupon the additional assertion that E holds true. As an alternative, a second protocol is considered where MOE transfers two successive likelihoods (the quotient of both being the mentioned ratio), followed by the factuality of E. It is outlined how the Adams transformation allows one to describe information processing on the TOF side in both protocols, and that the resulting probability distribution is the same in both cases. Finally, it is indicated how the Adams transformation also allows the required update of subjective probability on the MOE side, so that both sides in the protocol may be assumed to comply with the demands of subjective probability.
Individualized treatment rules (ITR) assign treatments according to different patients' characteristics. Despite recent advances on the estimation of ITRs, much less attention has been given to uncertainty assessments for the estimated rules. We propose a hypothesis testing procedure for the estimated ITRs from a general framework that directly optimizes overall treatment benefit equipped with sparse penalties. Specifically, we construct a local test for testing low-dimensional components of high-dimensional linear decision rules. The procedure can apply to observational studies by taking into account the additional variability from the estimation of the propensity score. Theoretically, our test extends the decorrelated score test proposed in Ning and Liu (2017, Ann. Stat.) and is valid whether or not model selection consistency for the true parameters holds. The proposed methodology is illustrated with numerical studies and a real data example on electronic health records of patients with Type-II Diabetes.
Hypothesis testing on individualized treatment rules, by Young-Geun Choi
Invited talk in Joint Statistical Meetings 2017, Baltimore, Maryland.
Individualized treatment rules (ITR) assign treatments according to different patients' characteristics. Despite recent advances on the estimation of ITRs, much less attention has been given to uncertainty assessments for the estimated rules. We propose a hypothesis testing procedure for the estimated ITRs from a general framework that directly optimizes overall treatment benefit. Specifically, we construct a local test for testing low-dimensional components of high-dimensional linear decision rules. Our test extends the decorrelated score test proposed in Ning and Liu (2017) and is valid whether or not model selection consistency for the true parameters holds. The proposed methodology is illustrated with a numerical study and data examples.
We propose a regularized method for multivariate linear regression when the number of predictors may exceed the sample size. This method is designed to strengthen the estimation and the selection of the relevant input features with three ingredients: it takes advantage of the dependency pattern between the responses by estimating the residual covariance; it performs selection on direct links between predictors and responses; and selection is driven by prior structural information. To this end, we build on a recent reformulation of the multivariate linear regression model to a conditional Gaussian graphical model and propose a new regularization scheme accompanied with an efficient optimization procedure. On top of showing very competitive performance on artificial and real data sets, our method demonstrates capabilities for fine interpretation of its parameters, as illustrated in applications to genetics, genomics and spectroscopy.
This document discusses nested sampling, a technique for Bayesian computation and evidence evaluation. It begins by introducing Bayesian inference and the evidence integral. It then shows that nested sampling transforms the multidimensional evidence integral into a one-dimensional integral over the prior mass constrained to have likelihood above a given value. The document outlines the nested sampling algorithm and shows that it provides samples from the posterior distribution. It also discusses termination criteria and choices of sample size for the algorithm. Finally, it provides a numerical example of nested sampling applied to a Gaussian model.
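A bare-bones nested sampling loop matching this description, for a toy one-dimensional Gaussian likelihood with a uniform prior on [0, 1]; the constrained-prior draw uses naive rejection, which is only viable for such toy problems (all settings are assumptions):

```python
import numpy as np

rng = np.random.default_rng(0)

def loglike(th):                       # toy Gaussian likelihood centered at 0.5
    return -0.5 * ((th - 0.5) / 0.1) ** 2 - np.log(0.1 * np.sqrt(2 * np.pi))

N, steps = 100, 1000                   # live points, iterations
live = rng.uniform(0.0, 1.0, N)        # draws from the uniform prior
logL = loglike(live)
logZ, logX = -np.inf, 0.0              # evidence accumulator, log prior mass

for i in range(steps):
    worst = np.argmin(logL)            # lowest-likelihood live point
    logX_new = -(i + 1) / N            # expected shrinkage of the prior mass
    logw = np.log(np.exp(logX) - np.exp(logX_new))   # weight w_i = X_{i-1} - X_i
    logZ = np.logaddexp(logZ, logw + logL[worst])    # Z = sum_i L_i * w_i
    logX = logX_new
    while True:                        # replace it by a prior draw with L > L_min
        cand = rng.uniform(0.0, 1.0)
        if loglike(cand) > logL[worst]:
            break
    live[worst], logL[worst] = cand, loglike(cand)

logZ = np.logaddexp(logZ, logX + np.log(np.mean(np.exp(logL))))  # leftover mass
print(logZ)                            # close to log(1) = 0 for this toy problem
```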
This document provides a concise probability cheatsheet compiled by William Chen and others. It covers key probability concepts like counting rules, sampling tables, definitions of probability, independence, unions and intersections, joint/marginal/conditional probabilities, Bayes' rule, random variables and their distributions, expected value, variance, indicators, moment generating functions, and independence of random variables. The cheatsheet is licensed under CC BY-NC-SA 4.0 and the last updated date is March 20, 2015.
This is a progress report presented to the Phylogenomics Group at UVigo in May 2013 about the current status of the software guenomu and the Bayesian model implemented.
At that time I was experimenting with a mixture model, which has since been abandoned, and the Hdist, which is still experimental. The presentation also describes the exchange algorithm for doubly-intractable distributions, the generalized Multiple-Try Metropolis, and the parallel PRNG used to minimize communication between jobs.
This document presents a new model of decision making under risk and uncertainty called the Harmonic Probability Weighting Function (HPWF) model. The HPWF model incorporates mental states using a weak harmonic transitivity axiom and an abstract harmonic representation of noise. It explains phenomena like the conjunction fallacy and preference reversal. The HPWF uses a harmonic component controlled by a phase function to characterize how a decision maker's mental states influence probability weighting. Maximum entropy methods can be used to derive a coherent harmonic probability weighting function from the HPWF model.
Similar to Logit stick-breaking priors for partially exchangeable count data
The Ipsos - AI - Monitor 2024 Report.pdf, by Social Samosa
According to Ipsos AI Monitor's 2024 report, 65% of Indians said that products and services using AI have profoundly changed their daily life in the past 3-5 years.
The Building Blocks of QuestDB, a Time Series Database, by javier ramirez
Talk Delivered at Valencia Codes Meetup 2024-06.
Traditionally, databases have treated timestamps just as another data type. However, when performing real-time analytics, timestamps should be first class citizens and we need rich time semantics to get the most out of our data. We also need to deal with ever-growing datasets while remaining performant, which is as fun as it sounds.
It is no wonder time-series databases are now more popular than ever before. Join me in this session to learn about the internal architecture and building blocks of QuestDB, an open source time-series database designed for speed. We will also review a history of some of the changes we have gone through over the past two years to deal with late and unordered data, non-blocking writes, read-replicas, and faster batch ingestion.
Learn SQL from basic queries to advanced queries, by manishkhaire30
Dive into the world of data analysis with our comprehensive guide on mastering SQL! This presentation offers a practical approach to learning SQL, focusing on real-world applications and hands-on practice. Whether you're a beginner or looking to sharpen your skills, this guide provides the tools you need to extract, analyze, and interpret data effectively.
Key Highlights:
Foundations of SQL: Understand the basics of SQL, including data retrieval, filtering, and aggregation.
Advanced Queries: Learn to craft complex queries to uncover deep insights from your data.
Data Trends and Patterns: Discover how to identify and interpret trends and patterns in your datasets.
Practical Examples: Follow step-by-step examples to apply SQL techniques in real-world scenarios.
Actionable Insights: Gain the skills to derive actionable insights that drive informed decision-making.
Join us on this journey to enhance your data analysis capabilities and unlock the full potential of SQL. Perfect for data enthusiasts, analysts, and anyone eager to harness the power of data!
#DataAnalysis #SQL #LearningSQL #DataInsights #DataScience #Analytics
End-to-end pipeline agility - Berlin Buzzwords 2024, by Lars Albertsson
We describe how we achieve high change agility in data engineering by eliminating the fear of breaking downstream data pipelines through end-to-end pipeline testing, and by using schema metaprogramming to safely eliminate boilerplate involved in changes that affect whole pipelines.
A quick poll on agility in changing pipelines from end to end indicated a huge span in capabilities. For the question "How long time does it take for all downstream pipelines to be adapted to an upstream change," the median response was 6 months, but some respondents could do it in less than a day. When quantitative data engineering differences between the best and worst are measured, the span is often 100x-1000x, sometimes even more.
A long time ago, we suffered at Spotify from fear of changing pipelines due to not knowing what the impact might be downstream. We made plans for a technical solution to test pipelines end-to-end to mitigate that fear, but the effort failed for cultural reasons. We eventually solved this challenge, but in a different context. In this presentation we will describe how we test full pipelines effectively by manipulating workflow orchestration, which enables us to make changes in pipelines without fear of breaking downstream.
Making schema changes that affect many jobs also involves a lot of toil and boilerplate. Using schema-on-read mitigates some of it, but has drawbacks since it makes it more difficult to detect errors early. We will describe how we have rejected this tradeoff by applying schema metaprogramming, eliminating boilerplate but keeping the protection of static typing, thereby further improving agility to quickly modify data pipelines without fear.
STATATHON: Unleashing the Power of Statistics in a 48-Hour Knowledge Extravag..., by sameer shah
"Join us for STATATHON, a dynamic 2-day event dedicated to exploring statistical knowledge and its real-world applications. From theory to practice, participants engage in intensive learning sessions, workshops, and challenges, fostering a deeper understanding of statistical methodologies and their significance in various fields."
Codeless Generative AI Pipelines
(GenAI with Milvus)
https://ml.dssconf.pl/user.html#!/lecture/DSSML24-041a/rate
Discover the potential of real-time streaming in the context of GenAI as we delve into the intricacies of Apache NiFi and its capabilities. Learn how this tool can significantly simplify the data engineering workflow for GenAI applications, allowing you to focus on the creative aspects rather than the technical complexities. I will guide you through practical examples and use cases, showing the impact of automation on prompt building. From data ingestion to transformation and delivery, witness how Apache NiFi streamlines the entire pipeline, ensuring a smooth and hassle-free experience.
Timothy Spann
https://www.youtube.com/@FLaNK-Stack
https://medium.com/@tspann
https://www.datainmotion.dev/
milvus, unstructured data, vector database, zilliz, cloud, vectors, python, deep learning, generative ai, genai, nifi, kafka, flink, streaming, iot, edge
Logit stick-breaking priors for partially exchangeable count data
1. Logit stick-breaking priors for partially exchangeable count data
Tommaso Rigon
http://tommasorigon.github.io
Bocconi University
SIS 2018, Palermo, 22-06-2018
2. Introduction
Partial exchangeability
A bivariate sequence $(X_i, Y_j)_{i,j \geq 1}$ is partially exchangeable if
$$(X_1, \ldots, X_{n_1}, Y_1, \ldots, Y_{n_2}) \overset{d}{=} (X_{\sigma(1)}, \ldots, X_{\sigma(n_1)}, Y_{\sigma'(1)}, \ldots, Y_{\sigma'(n_2)}),$$
for any $n_1, n_2 \geq 1$ and any permutations $\sigma$ and $\sigma'$.
de Finetti's representation theorem
The sequence $(X_i, Y_j)_{i,j \geq 1}$ is partially exchangeable if and only if
$$P(X_1 \in A_1, \ldots, X_{n_1} \in A_{n_1}, Y_1 \in B_1, \ldots, Y_{n_2} \in B_{n_2}) = \int_{\mathcal{P}^2} \prod_{i=1}^{n_1} p_1(A_i) \prod_{j=1}^{n_2} p_2(B_j) \, Q_2(dp_1, dp_2).$$
3. Introduction
Partial exchangeability
Thus, a draw from $(X_i, Y_j)_{i,j \geq 1}$ can be expressed hierarchically:
$$(X_i \mid p_1) \overset{iid}{\sim} p_1, \qquad (Y_j \mid p_2) \overset{iid}{\sim} p_2, \qquad (p_1, p_2) \sim Q_2,$$
where each $(X_i \mid p_1)$ is independent of each $(Y_j \mid p_2)$.
The quantity $(p_1, p_2)$ is a vector of random probability measures and $Q_2$ can be interpreted as their prior law.
If $p_1 \perp\!\!\!\perp p_2$, then the observations $(X_1, \ldots, X_{n_1})$ and $(Y_1, \ldots, Y_{n_2})$ can be modeled separately and independently.
Dependence between $p_1$ and $p_2$ allows for borrowing of information across the sequences.
4. Introduction
Partial exchangeability with count data
Let $Y_1, \ldots, Y_n \in \mathbb{N}$ be a collection of count response variables, each corresponding to a qualitative covariate $x_i \in \{1, \ldots, J\}$.
Each data point $y_i$ is a conditionally independent draw from
$$(Y_i \mid x_i = j) \overset{ind}{\sim} p_j, \qquad i = 1, \ldots, n,$$
where $p_j$ denotes the probability mass function of $(Y_i \mid x_i = j)$.
This is an instance of partial exchangeability with count data.
Model elicitation is completed by specifying a prior law $Q_J$ for the vector of random probability distributions $(p_1, \ldots, p_J) \sim Q_J$.
5. Introduction
Desiderata
We seek a Bayesian inferential procedure which:
- provides a flexible, i.e. nonparametric, estimate for each law $p_j$;
- allows for borrowing of information across the $J$ groups;
- is scalable, in the sense that it is computationally feasible for large $n$ or large $p$;
- has a reasonable interpretation, thus facilitating the incorporation of prior information.
7. Introduction
Bayesian nonparametric mixture models
A flexible Bayesian model for density estimation assumes
$$p(y) = \int_\Theta K(y; \theta) \, dP(\theta),$$
where $K(y; \theta)$ is a known parametric kernel (e.g. Poisson, negative binomial), and $P(\theta)$ is a prior mixing measure.
If the mixing measure is a Dirichlet process (Lo 1984), then exploiting the stick-breaking construction:
$$p(y) = \int_\Theta K(y; \theta) \, dP(\theta) = \sum_{h=1}^{\infty} \pi_h K(y; \theta_h), \qquad \pi_h = \nu_h \prod_{l=1}^{h-1} (1 - \nu_l),$$
with $\theta_h \overset{iid}{\sim} P_0$ and $\nu_h \overset{iid}{\sim} \mathrm{Beta}(1, \alpha)$, for $h = 1, \ldots, \infty$.
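As a concrete illustration of the stick-breaking construction above (my own sketch, not from the slides), the following numpy code draws from a truncated Dirichlet-process Poisson mixture; the truncation level and all parameter values are assumptions:

```python
import numpy as np

rng = np.random.default_rng(0)
alpha, H, n = 1.0, 50, 1000              # DP concentration, truncation, sample size
a0, b0 = 2.0, 0.5                        # base measure P0 = Gamma(a0, b0) for Poisson rates

# Stick-breaking: pi_h = nu_h * prod_{l<h} (1 - nu_l), with nu_h ~ Beta(1, alpha)
nu = rng.beta(1.0, alpha, size=H)
nu[-1] = 1.0                             # truncate so that the weights sum to one
pi = nu * np.concatenate(([1.0], np.cumprod(1.0 - nu[:-1])))

theta = rng.gamma(a0, 1.0 / b0, size=H)  # atoms theta_h ~ P0

G = rng.choice(H, size=n, p=pi)          # mixture component of each observation
y = rng.poisson(theta[G])                # draws from p(y) = sum_h pi_h Pois(y; theta_h)
print(pi[:5].round(3), y[:10])
```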
8. Introduction
The hierarchical Dirichlet process
A popular extension of the Lo model for partially exchangeable data is the hierarchical Dirichlet process (Teh et al. 2006).
In the hierarchical Dirichlet process, for $j = 1, \ldots, J$,
$$p_j(y) = \int_\Theta K(y; \theta) \, dP_j(\theta) = \sum_{h=1}^{\infty} \pi_{hj} K(y; \theta_h), \qquad (P_j \mid P_0) \overset{iid}{\sim} \mathrm{DP}(\alpha P_0), \qquad P_0 \sim \mathrm{DP}(\alpha_0 P_{00}).$$
Under this specification, different groups share the same atoms, while having different mixture weights $\Longrightarrow$ borrowing of information.
Alternative models? Simple conditional algorithms?
9. Introduction
Main contributions
We explored computational, interpretational and theoretical aspects of the logit stick-breaking process (LSBP) of Ren et al. (2011) in the partially exchangeable setting, using count data.
The LSBP can be constructed via sequential logistic regressions, allowing a clearer interpretation of the parameters involved.
For the LSBP we derived an efficient Gibbs sampler based on a Pólya-gamma data augmentation.
We also provide further theoretical support.
10. Logit stick-breaking process
The LSBP model
Our proposal has the same structure as the HDP:
$$p_j(y) = \int_\Theta \mathrm{Pois}(y; \theta) \, dP_j(\theta) = \sum_{h=1}^{\infty} \pi_{hj} \mathrm{Pois}(y; \theta_h), \qquad j = 1, \ldots, J,$$
with conditionally conjugate prior for the atoms $\theta_h \overset{iid}{\sim} \mathrm{Gamma}(a_\theta, b_\theta)$.
The $p_j(y)$'s share the atoms $\theta_h$ and are characterized by group-specific mixing weights.
The mixing weights $\pi_{hj}$ have a stick-breaking representation. Moreover, the prior of the LSBP is different from that of the HDP.
11. Logit stick-breaking process
Hierarchical representation
Samples from a LSBP model can be obtained hierarchically.
For each data point $y_i$, sample the group indicator $G_i$ denoting the mixture component:
$$\mathrm{pr}(G_i = h \mid x_i = j) = \pi_{hj} = \nu_{hj} \prod_{l=1}^{h-1} (1 - \nu_{lj}).$$
Then, conditionally on $G_i$, sample the count response variable from
$$(Y_i \mid G_i = h) \sim \mathrm{Pois}(\theta_h).$$
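A minimal sketch of this two-stage sampling scheme, assuming a truncated process with H components and illustrative group-specific weights (parameter values are assumptions, not from the slides):

```python
import numpy as np

rng = np.random.default_rng(1)
H, J, n = 20, 2, 500                          # truncation, groups, sample size

# Group-specific weights pi_{hj} = nu_{hj} * prod_{l<h} (1 - nu_{lj})
nu = rng.beta(1.0, 1.0, size=(H, J))
nu[-1, :] = 1.0                               # truncation: last stick takes the rest
pi = nu * np.vstack([np.ones(J), np.cumprod(1.0 - nu[:-1, :], axis=0)])

theta = rng.gamma(2.0, 2.0, size=H)           # shared Poisson atoms theta_h
x = rng.integers(J, size=n)                   # group label x_i of each unit

# Stage 1: component indicator G_i given group x_i; stage 2: Poisson draw
G = np.array([rng.choice(H, p=pi[:, j]) for j in x])
y = rng.poisson(theta[G])
```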
12. Logit stick-breaking process
Sequential interpretation
Can we interpret the stick-breaking weights $\nu_{hj}$?
Yes, indeed they can be rearranged as:
$$\nu_{hj} = \frac{\pi_{hj}}{1 - \sum_{l=1}^{h-1} \pi_{lj}} = \frac{\mathrm{pr}(G_i = h \mid x_i = j)}{\mathrm{pr}(G_i > h - 1 \mid x_i = j)} = \mathrm{pr}(G_i = h \mid G_i > h - 1, x_i = j).$$
Each $\nu_{hj}$ is the probability of being allocated to component $h$, conditionally on the event of having survived the previous components.
Each $\mathbb{I}(G_i = h) = \zeta_{ih}$ is the assignment indicator of each unit to the $h$-th component:
$$\zeta_{ih} = z_{ih} \prod_{l=1}^{h-1} (1 - z_{il}), \qquad (z_{ih} \mid x_i = j) \sim \mathrm{Bern}(\nu_{hj}).$$
13. Logit stick-breaking process
Continuation-ratio logistic regressions
We need some prior specification for the stick-breaking weights $\nu_{hj}$.
Consistently with classical generalized linear models, a natural choice is to define
$$\mathrm{logit}(\nu_{hj}) = \alpha_{hj}, \qquad \text{with } \alpha_h = (\alpha_{h1}, \ldots, \alpha_{hJ})^\intercal \overset{iid}{\sim} N_J(\mu_\alpha, \Sigma_\alpha),$$
independently for every $h = 1, \ldots, \infty$.
If the matrix $\Sigma_\alpha$ is diagonal, then the mixture weights $\pi_{hj}$ are a priori independent across groups.
Stronger borrowing of information, i.e. dependence across the mixing weights, can be induced by non-diagonal choices of $\Sigma_\alpha$.
14. Logit stick-breaking process
Prior quantities
Prior moments
Let $(P_1, \ldots, P_J)$ be a vector of random probability measures induced by the LSBP. Then, for any measurable set $B$ and for any $j$ and $j'$,
$$E\{P_j(B)\} = P_0(B), \qquad \mathrm{cov}\{P_j(B), P_{j'}(B)\} = P_0(B)(1 - P_0(B)) \, \frac{E(\nu_{1j} \nu_{1j'})}{E(\nu_{1j}) + E(\nu_{1j'}) - E(\nu_{1j} \nu_{1j'})}.$$
These expectations do not have a closed-form solution, but they can easily be obtained numerically.
The correlation $\mathrm{corr}\{P_j(B), P_{j'}(B)\}$ does not depend on $B$, and is therefore often interpreted as a global measure of dependence.
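For instance, a Monte Carlo sketch of the correlation under an illustrative correlated bivariate normal prior on $(\alpha_{11}, \alpha_{12})$, using the slide's formula (with $j' = j$ for the variance factors); all parameter values are assumptions:

```python
import numpy as np

rng = np.random.default_rng(2)
Sigma = np.array([[1.0, 0.8],
                  [0.8, 1.0]])                  # non-diagonal => dependence
alpha = rng.multivariate_normal(np.zeros(2), Sigma, size=200_000)
nu = 1.0 / (1.0 + np.exp(-alpha))               # nu_{1j} = logit^{-1}(alpha_{1j})

E1, E2 = nu.mean(axis=0)                        # E(nu_1j), E(nu_1j')
E12 = (nu[:, 0] * nu[:, 1]).mean()              # E(nu_1j nu_1j')
E11, E22 = (nu ** 2).mean(axis=0)               # second moments

cov_f = E12 / (E1 + E2 - E12)                   # covariance factor from the slide
var1 = E11 / (2 * E1 - E11)                     # variance factor (set j' = j)
var2 = E22 / (2 * E2 - E22)
print(round(cov_f / np.sqrt(var1 * var2), 3))   # corr{P_j(B), P_j'(B)}
```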
15. Logit stick-breaking process
Deterministic truncation of the infinite process
The LSBP is an infinite-dimensional process $\Longrightarrow$ computational challenges.
We propose a truncated version of the vector of random probability measures $(P_1, \ldots, P_J)$, which can be regarded as an approximation of the infinite process.
We induce the truncation by letting $\nu_{Hj} = 1$ for some integer $H > 1$, which guarantees that $\sum_{h=1}^{H} \pi_{hj} = 1$ almost surely.
According to Theorem 1 in Rigon and Durante (2018), the "discrepancy" between the two processes is exponentially decreasing in $H$.
18. Posterior inference
The Pólya-gamma data augmentation
The Gibbs sampler is based on the Pólya-gamma data augmentation (Polson et al. 2013), which relies on the integral identity
$$\frac{e^{z_{ih} \psi(x_i)^\intercal \alpha_h}}{1 + e^{\psi(x_i)^\intercal \alpha_h}} = \frac{1}{2} \int_{\mathbb{R}^+} p(\omega_{ih}) \exp\left\{ (z_{ih} - 0.5) \, \psi(x_i)^\intercal \alpha_h - \omega_{ih} (\psi(x_i)^\intercal \alpha_h)^2 / 2 \right\} d\omega_{ih},$$
where $p(\omega_{ih})$ is the density of a Pólya-gamma $\mathrm{PG}(1, 0)$ random variable and $\psi(x_i) = \{\mathbb{I}(x_i = 1), \ldots, \mathbb{I}(x_i = J)\}^\intercal$.
The augmented log-likelihood has a quadratic form $\Longrightarrow$ simple computations and conjugacy with Gaussian priors.
The conditional distribution of $(\omega_{ih} \mid -)$ is still in the class of Pólya-gamma distributions $\Longrightarrow$ conjugacy.
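As a quick sanity check (not in the slides), the identity can be verified numerically using the known Laplace transform of a PG(1, 0) variable, $E[e^{-\omega t}] = 1/\cosh(\sqrt{t/2})$, which with $t = \psi^2/2$ puts the right-hand side in closed form:

```python
import numpy as np

def lhs(z, psi):
    # e^{z psi} / (1 + e^{psi})
    return np.exp(z * psi) / (1.0 + np.exp(psi))

def rhs(z, psi):
    # (1/2) e^{(z - 0.5) psi} E[exp(-omega psi^2 / 2)] with omega ~ PG(1, 0);
    # the PG(1, 0) Laplace transform gives E[exp(-omega t)] = 1/cosh(sqrt(t/2))
    return 0.5 * np.exp((z - 0.5) * psi) / np.cosh(psi / 2.0)

for z in (0.0, 1.0):
    for psi in (-2.0, 0.3, 1.7):
        assert np.isclose(lhs(z, psi), rhs(z, psi))
print("identity verified")
```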
19. Posterior inference
Posterior inference via Gibbs sampling
For $i$ from 1 to $n$: update $G_i$ from the discrete variable with probabilities
$$\mathrm{pr}(G_i = h \mid -) = \frac{\pi_{h x_i} \, \mathrm{Pois}(y_i; \theta_h)}{\sum_{q=1}^{H} \pi_{q x_i} \, \mathrm{Pois}(y_i; \theta_q)},$$
for every $h = 1, \ldots, H$. From $G_i$ derive the associated $z_{ih}$ indicators.
For $h$ from 1 to $H - 1$: update the logit stick-breaking parameters $\alpha_h$. For every $i$ such that $G_i > h - 1$, sample the Pólya-gamma data $\omega_{ih}$ from
$$(\omega_{ih} \mid -) \sim \mathrm{PG}\{1, \psi(x_i)^\intercal \alpha_h\}.$$
Given the Pólya-gamma augmented data, update $\alpha_h$ from the full conditional
$$(\alpha_h \mid -) \sim N_J(\mu_{\alpha_h}, \Sigma_{\alpha_h}),$$
as in a standard Bayesian linear regression.
For $h$ from 1 to $H$: update each kernel parameter $\theta_h$ from
$$(\theta_h \mid -) \sim \mathrm{Gamma}\Big(a_\theta + \sum_{i: G_i = h} y_i, \; b_\theta + \sum_{i=1}^{n} \mathbb{I}(G_i = h)\Big).$$
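To make the sweep concrete, here is a compact Python sketch of one Gibbs iteration for a truncated LSBP Poisson mixture. It is an illustration under stated assumptions, not the authors' code: $\psi(x_i)$ is taken as a one-hot vector, the prior is $\alpha_h \sim N_J(\texttt{mu0}, \texttt{Q0}^{-1})$ with prior precision Q0, and PG draws are delegated to random_polyagamma from the third-party polyagamma package (an assumption; any PG(1, z) sampler would do):

```python
import numpy as np
from polyagamma import random_polyagamma   # assumed external PG(1, z) sampler

def gibbs_step(y, x, alpha, theta, a_th, b_th, mu0, Q0, rng):
    """One sweep of the sketched Gibbs sampler (illustrative, simplified)."""
    n, (H, J) = len(y), alpha.shape          # last row of alpha unused: nu_H = 1
    nu = 1.0 / (1.0 + np.exp(-alpha))
    nu[-1, :] = 1.0                          # truncation
    pi = nu * np.vstack([np.ones(J), np.cumprod(1.0 - nu[:-1, :], axis=0)])

    # 1. update the component indicators G_i
    logp = y[:, None] * np.log(theta)[None, :] - theta[None, :]  # Poisson, up to const.
    w = pi[:, x].T * np.exp(logp - logp.max(axis=1, keepdims=True))
    w /= w.sum(axis=1, keepdims=True)
    G = np.array([rng.choice(H, p=w[i]) for i in range(n)])

    # 2. update alpha_h via Polya-gamma augmentation, h = 1, ..., H-1
    for h in range(H - 1):
        active = G >= h                      # units with G_i > h - 1 (0-based h)
        z = (G[active] == h).astype(float)   # z_{ih} indicators
        Psi = np.eye(J)[x[active]]           # psi(x_i) as one-hot rows
        omega = random_polyagamma(1, Psi @ alpha[h])
        V = np.linalg.inv(Psi.T @ (omega[:, None] * Psi) + Q0)   # posterior cov.
        m = V @ (Psi.T @ (z - 0.5) + Q0 @ mu0)                   # posterior mean
        alpha[h] = rng.multivariate_normal(m, V)

    # 3. update the Poisson atoms theta_h
    for h in range(H):
        sel = G == h
        theta[h] = rng.gamma(a_th + y[sel].sum(), 1.0 / (b_th + sel.sum()))
    return G, alpha, theta
```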
20. Illustration
Application to the seizure dataset
We apply the LSBP Poisson mixture model to the seizure dataset, which is also available in the flexmix R package.
The dataset consists of daily myoclonic seizure counts (seizures) for a single subject, comprising a series of n = 140 days.
After 27 days of baseline observation (Treatment: No), the subject received monthly infusions of intravenous gamma globulin (Treatment: Yes).
We aim to compare the J = 2 groups: days with treatment and days without treatment.
22. Discussion and conclusions
Possible extensions
The LSBP for partially exchangeable random variables could be used as a building block for more sophisticated models.
For instance, one could use the partially exchangeable LSBP as a prior for infinite hidden Markov models or for topic modeling, where the HDP is usually employed.
The computational advantages of the LSBP might lead to major improvements in those settings.
23. Discussion and conclusions
Summary
We proposed a Bayesian nonparametric mixture model for partially exchangeable count data.
We explored some of its theoretical properties and developed a simple Gibbs sampler for posterior inference.
References
Polson, N. G., Scott, J. G. and Windle, J. (2013). Bayesian inference for logistic models using Pólya-gamma latent variables. Journal of the American Statistical Association, 108(504), 1339–1349.
Ren, L., Du, L., Carin, L. and Dunson, D. B. (2011). Logistic stick-breaking process. Journal of Machine Learning Research, 12, 203–239.
Rigon, T. and Durante, D. (2018). Logit stick-breaking priors for Bayesian density regression. arXiv.
Rodriguez, A. and Dunson, D. B. (2011). Nonparametric Bayesian models through probit stick-breaking processes. Bayesian Analysis, 6(1), 145–178.
Teh, Y. W., Jordan, M. I., Beal, M. J. and Blei, D. M. (2006). Hierarchical Dirichlet processes. Journal of the American Statistical Association, 101(476), 1566–1581.