Sequential Monte Carlo methods use importance sampling and resampling to estimate distributions in state space models recursively over time. This document discusses strategies for sampling in sequential Monte Carlo methods, including:
- Using the optimal proposal distribution, which incorporates the one-step-ahead predictive distribution, to minimize the variance of the importance weights.
- Approximating the predictive distribution using mixtures, expansions, auxiliary variables, or Markov chain Monte Carlo methods.
- Considering blocks of variables over time rather than individual time steps to better diffuse particles, such as using a lagged block, reweighting particles before resampling, or sampling an extended block with an augmented state space.
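As a rough illustration of the simplest of these sampling schemes, here is a minimal bootstrap particle filter in Python, a sketch on an assumed linear-Gaussian model that uses the transition prior (not the optimal proposal) and resamples at every step:

```python
import math
import random

# Minimal bootstrap particle filter for an assumed linear-Gaussian model:
#   x_t = 0.9 x_{t-1} + N(0, 1),   y_t = x_t + N(0, 0.5^2).
# Particles move through the transition prior, are reweighted by the
# observation likelihood, and are multinomially resampled each step.

def bootstrap_filter(ys, n_particles=500, seed=0):
    rng = random.Random(seed)
    xs = [rng.gauss(0.0, 1.0) for _ in range(n_particles)]
    log_lik = 0.0
    for y in ys:
        # propagate through the transition prior (the "bootstrap" proposal)
        xs = [0.9 * x + rng.gauss(0.0, 1.0) for x in xs]
        # unnormalized weights from the observation density N(y; x, 0.5^2)
        ws = [math.exp(-0.5 * ((y - x) / 0.5) ** 2) for x in xs]
        # accumulate the likelihood estimate: log of the mean weight
        log_lik += math.log(sum(ws) / n_particles / (0.5 * math.sqrt(2 * math.pi)))
        # multinomial resampling to counteract weight degeneracy
        xs = rng.choices(xs, weights=ws, k=n_particles)
    return log_lik, xs
```

Resampling at every step is the simplest choice; resampling adaptively when the effective sample size drops is the usual refinement.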
This document summarizes controlled sequential Monte Carlo, which aims to efficiently estimate intractable likelihoods p(y|θ) in state space models. It does this by defining a target path measure P(dx_{0:T}) and a proposal Markov chain Q(dx_{0:T}) that approximates P(dx_{0:T}). Standard sequential Monte Carlo (SMC) methods provide unbiased estimates but can perform poorly for practical particle numbers N when the discrepancy between P and Q is large. The document proposes twisted path measures that depend on the observations to bring Q closer to P, by defining proposal transitions P(dx_t | x_{t-1}, y_{t:T}) that incorporate backward information filters ψ*_t(x_t) = p(y_{t:T} | x_t).
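The backward information filter has a simple closed form in a finite-state model. The sketch below computes ψ*_t for a hypothetical two-state HMM by backward recursion; the transition and emission values are illustrative, not from the paper:

```python
# Backward information filter psi*_t(x) = p(y_{t:T} | x_t = x) for a
# hypothetical two-state HMM (transition/emission values are illustrative).
# In controlled SMC, these quantities twist the proposal toward the data.

trans = [[0.8, 0.2], [0.3, 0.7]]   # trans[i][j] = P(x_{t+1} = j | x_t = i)
emit = [[0.9, 0.1], [0.2, 0.8]]    # emit[i][k]  = P(y_t = k | x_t = i)

def backward_filter(ys):
    T = len(ys)
    psi = [[0.0, 0.0] for _ in range(T)]
    psi[T - 1] = [emit[i][ys[T - 1]] for i in range(2)]
    for t in range(T - 2, -1, -1):
        for i in range(2):
            # psi_t(i) = p(y_t | i) * sum_j p(j | i) * psi_{t+1}(j)
            psi[t][i] = emit[i][ys[t]] * sum(
                trans[i][j] * psi[t + 1][j] for j in range(2))
    return psi
```

In general state spaces ψ*_t is intractable, which is why the paper works with approximate, iteratively refined twisting functions.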
Rao-Blackwellisation schemes for accelerating Metropolis-Hastings algorithms - Christian Robert
Aggregate of three different papers on Rao-Blackwellisation, from Casella & Robert (1996), to Douc & Robert (2010), to Banterle et al. (2015), presented during an OxWaSP workshop on MCMC methods, Warwick, Nov 20, 2015
This document discusses computational issues that arise in Bayesian statistics. It provides examples of latent variable models like mixture models that make computation difficult due to the large number of terms that must be calculated. It also discusses time series models like the AR(p) and MA(q) models, noting that they have complex parameter spaces due to stationarity constraints. The document outlines the Metropolis-Hastings algorithm, Gibbs sampler, and other methods like Population Monte Carlo and Approximate Bayesian Computation that can help address these computational challenges.
Stochastic Control and Information Theoretic Dualities (Complete Version) - Haruki Nishimura
1) The document discusses stochastic optimal control theory and information theoretic control theory. It derives the stochastic Hamilton-Jacobi-Bellman (HJB) equation, which defines optimality in stochastic optimal control problems via dynamic programming.
2) It introduces Wiener processes and stochastic differential equations to model stochastic dynamics. It then derives the stochastic HJB equation by taking the expectation of the value function and applying Itô's lemma.
3) Solving the stochastic HJB yields the optimal closed-loop control policy, but it results in a high-dimensional PDE that is difficult to solve directly except in special cases like linear quadratic Gaussian control.
Short course at CIRM, Bayesian Masterclass, October 2018 - Christian Robert
Markov Chain Monte Carlo (MCMC) methods generate dependent samples from a target distribution using a Markov chain. The Metropolis-Hastings algorithm constructs a Markov chain with a desired stationary distribution by proposing moves to new states and accepting or rejecting them probabilistically. The algorithm is used to approximate integrals that are difficult to compute directly. It has been shown to converge to the target distribution as the number of iterations increases.
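The propose/accept step described above fits in a few lines. This is a random-walk Metropolis sketch on an assumed standard-normal target; the proposal scale is an illustrative choice:

```python
import math
import random

# Random-walk Metropolis sketch for an assumed standard-normal target;
# the target density need only be known up to a normalizing constant.

def metropolis(log_target, n_iter=10000, scale=1.0, seed=1):
    rng = random.Random(seed)
    x, chain = 0.0, []
    for _ in range(n_iter):
        prop = x + rng.gauss(0.0, scale)
        # accept with probability min(1, pi(prop) / pi(x))
        if math.log(rng.random()) < log_target(prop) - log_target(x):
            x = prop
        chain.append(x)
    return chain

chain = metropolis(lambda x: -0.5 * x * x)   # log N(0, 1) density, up to a constant
```

Averages over the chain then approximate expectations under the target, which is how the algorithm handles integrals that are difficult to compute directly.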
The document discusses statistical representation of random inputs in continuum models. It provides examples of representing random fields using the Karhunen-Loeve expansion, which expresses a random field as the sum of orthogonal deterministic basis functions and random variables. Common choices for the covariance function in the expansion include the radial basis function and limiting cases of fully correlated and uncorrelated fields. The covariance function can be approximated from samples of the random field to enable representation in applications.
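A minimal sketch of the sample-based route mentioned above: estimate the covariance from draws of a hypothetical random field, then extract the leading Karhunen-Loeve mode by power iteration (the field model and grid are illustrative assumptions):

```python
import math
import random

# Sample-based Karhunen-Loeve sketch: estimate the covariance of a
# hypothetical random field on a 1-D grid from draws, then extract the
# leading KL mode by power iteration. The field
#   f(s) = a sin(pi s) + b sin(2 pi s),  a ~ N(0, 1), b ~ N(0, 0.25),
# is illustrative; its leading mode is proportional to sin(pi s).

grid = [i / 10 for i in range(11)]
rng = random.Random(0)

def sample_field():
    a, b = rng.gauss(0.0, 1.0), rng.gauss(0.0, 0.5)
    return [a * math.sin(math.pi * s) + b * math.sin(2 * math.pi * s)
            for s in grid]

samples = [sample_field() for _ in range(2000)]
n = len(grid)
cov = [[sum(f[i] * f[j] for f in samples) / len(samples) for j in range(n)]
       for i in range(n)]

# power iteration for the leading eigenvector (the first KL mode)
v = [1.0] * n
for _ in range(200):
    w = [sum(cov[i][j] * v[j] for j in range(n)) for i in range(n)]
    norm = math.sqrt(sum(x * x for x in w))
    v = [x / norm for x in w]
```

In practice one would use a full eigendecomposition of the estimated covariance to retain several modes, ordered by eigenvalue.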
- Bayesian techniques can be used for parameter estimation problems where parameters are considered random variables with associated densities rather than fixed unknown values.
- Markov chain Monte Carlo (MCMC) methods like the Metropolis algorithm are commonly used to sample from the posterior distribution when direct sampling is impossible due to high-dimensional integration. The algorithm constructs a Markov chain whose stationary distribution is the target posterior density.
- At each step, a candidate value is generated from a proposal distribution and accepted or rejected based on the posterior ratio to the previous value. Over many iterations, the chain samples converge to the posterior distribution.
The document discusses techniques for parameter selection and sensitivity analysis when estimating parameters from observational data. It introduces local sensitivity analysis based on derivatives to determine how sensitive model outputs are to individual parameters. Global sensitivity analysis techniques like ANOVA (analysis of variance) are also discussed, which quantify how parameter uncertainties contribute to uncertainty in model outputs. The ANOVA approach uses a Sobol decomposition to represent models as sums of parameter main effects and interactions, allowing variance-based sensitivity indices to be defined that quantify the influence of individual parameters and groups of parameters.
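The variance-based indices can be illustrated on a toy additive model where they are known exactly. The pick-freeze estimator below is one standard way of computing first-order Sobol indices, not necessarily the variant used in the document:

```python
import random

# Pick-freeze Monte Carlo estimate of first-order Sobol indices for the
# additive toy model f(x1, x2) = x1 + 0.5 x2 with independent U(0,1)
# inputs, where analytically S1 = 0.8 and S2 = 0.2. The identity used is
#   S_i = Cov(f(X), f(X with every input except i resampled)) / Var(f(X)).

def f(x1, x2):
    return x1 + 0.5 * x2

rng = random.Random(0)
N = 100000
fa, f1, f2 = [], [], []
for _ in range(N):
    x1, x2 = rng.random(), rng.random()
    y1, y2 = rng.random(), rng.random()
    fa.append(f(x1, x2))
    f1.append(f(x1, y2))   # keep x1, resample x2
    f2.append(f(y1, x2))   # keep x2, resample x1
mean = sum(fa) / N
var = sum((w - mean) ** 2 for w in fa) / N
S1 = sum((a - mean) * (b - mean) for a, b in zip(fa, f1)) / N / var
S2 = sum((a - mean) * (b - mean) for a, b in zip(fa, f2)) / N / var
```

For an additive model the first-order indices sum to one; interaction terms in the Sobol decomposition would show up as a shortfall in that sum.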
This document discusses nested sampling, a technique for Bayesian computation and evidence evaluation. It begins by introducing Bayesian inference and the evidence integral. It then shows that nested sampling transforms the multidimensional evidence integral into a one-dimensional integral over the prior mass constrained to have likelihood above a given value. The document outlines the nested sampling algorithm and shows that it provides samples from the posterior distribution. It also discusses termination criteria and choices of sample size for the algorithm. Finally, it provides a numerical example of nested sampling applied to a Gaussian model.
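A minimal sketch of the algorithm outline above, on an assumed one-dimensional problem where the evidence is known analytically (rejection sampling from the constrained prior is workable only in toy settings):

```python
import math
import random

# Nested sampling sketch for an assumed 1-D problem: prior U(0,1) and
# likelihood L(t) = exp(-(t - 0.5)^2 / (2 * 0.1^2)), so the evidence
# Z = integral of L over the prior is about 0.1 * sqrt(2 pi) ~ 0.2507.

def loglike(t):
    return -0.5 * ((t - 0.5) / 0.1) ** 2

def nested_sampling(n_live=100, n_iter=600, seed=0):
    rng = random.Random(seed)
    live = [rng.random() for _ in range(n_live)]
    Z, X = 0.0, 1.0
    for k in range(n_iter):
        worst = min(live, key=loglike)
        X_new = math.exp(-(k + 1) / n_live)   # deterministic prior-mass shrinkage
        Z += math.exp(loglike(worst)) * (X - X_new)
        X = X_new
        # replace the worst point by a prior draw above the likelihood floor
        while True:
            t = rng.random()
            if loglike(t) > loglike(worst):
                live[live.index(worst)] = t
                break
    # contribution of the remaining live points at termination
    Z += X * sum(math.exp(loglike(t)) for t in live) / n_live
    return Z

Z = nested_sampling()
```

The discarded points, weighted by their prior-mass increments, double as posterior samples, which is the property the document emphasizes.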
This document summarizes a talk given by Heiko Strathmann on using partial posterior paths to estimate expectations from large datasets without full posterior simulation. The key ideas are:
1. Construct a path of "partial posteriors" by sequentially adding mini-batches of data and computing expectations over these posteriors.
2. "Debias" the path of expectations to obtain an unbiased estimator of the true posterior expectation using a technique from stochastic optimization literature.
3. This approach allows estimating posterior expectations with sub-linear computational cost in the number of data points, without requiring full posterior simulation or imposing restrictions on the likelihood.
Experiments on synthetic and real-world examples demonstrate competitive performance versus standard MCMC.
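The debiasing step (idea 2) can be sketched with a stand-in sequence of partial-posterior expectations; the convergent toy sequence and the geometric truncation below are illustrative assumptions, not the talk's actual estimator:

```python
import random

# Debiasing sketch: if phi_1, phi_2, ... converge to the full-posterior
# expectation, then for a random truncation level T with P(T >= t) = w_t,
#   phi_1 + sum_{t=2}^{T} (phi_t - phi_{t-1}) / w_t
# is unbiased for the limit. Here phi_t is a toy stand-in; in the talk it
# is an expectation under the posterior given the first t mini-batches.

def phi(t):
    return 1.0 - 2.0 ** (-t)   # toy sequence with limit 1

def debiased_estimate(rng, p=0.5, max_t=40):
    est, t = phi(1), 1
    while t < max_t and rng.random() < p:   # P(T >= t) = p ** (t - 1)
        t += 1
        est += (phi(t) - phi(t - 1)) / p ** (t - 1)
    return est

rng = random.Random(0)
estimates = [debiased_estimate(rng) for _ in range(20000)]
mean = sum(estimates) / len(estimates)
```

Because the truncation level is usually small, each replicate touches only a few mini-batches, which is where the sub-linear cost in the number of data points comes from.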
Poster for Bayesian Statistics in the Big Data Era conference - Christian Robert
The document proposes a new version of Hamiltonian Monte Carlo (HMC) sampling that is essentially calibration-free. It achieves this by learning the optimal leapfrog scale from the distribution of integration times using the No-U-Turn Sampler algorithm. Compared to the original NUTS algorithm on benchmark models, this new enhanced HMC (eHMC) exhibits significantly improved efficiency with no hand-tuning of parameters required. The document tests eHMC on a Susceptible-Infected-Recovered model of disease transmission.
Approximate Bayesian Computation with Quasi-Likelihoods - Stefano Cabras
This document describes ABC-MCMC algorithms that use quasi-likelihoods as proposals. It introduces quasi-likelihoods as approximations to true likelihoods that can be estimated from pilot runs. The ABCql algorithm uses a quasi-likelihood estimated from a pilot run as the proposal in an ABC-MCMC algorithm. Examples applying ABCql to mixture of normals, coalescent, and gamma models are provided to demonstrate its effectiveness compared to standard ABC-MCMC.
Bayesian hybrid variable selection under generalized linear models - Caleb (Shiqiang) Jin
This document presents a method for Bayesian variable selection under generalized linear models. It begins by introducing the model setting and Bayesian model selection framework. It then discusses three algorithms for model search: deterministic search, stochastic search, and a hybrid search method. The key contribution is a method to simultaneously evaluate the marginal likelihoods of all neighbor models, without parallel computing. This is achieved by decomposing the coefficient vectors and estimating additional coefficients conditioned on the current model's coefficients. Newton-Raphson iterations are used to solve the system of equations and obtain the maximum a posteriori estimates for all neighbor models simultaneously in a single computation. This allows for a fast, inexpensive search of the model space.
Reinforcement learning: hidden theory, and new super-fast algorithms
Lecture presented at the Center for Systems and Control (CSC@USC) and Ming Hsieh Institute for Electrical Engineering,
February 21, 2018
Stochastic Approximation algorithms are used to approximate solutions to fixed point equations that involve expectations of functions with respect to possibly unknown distributions. The most famous examples today are TD- and Q-learning algorithms. The first half of this lecture will provide an overview of stochastic approximation, with a focus on optimizing the rate of convergence. A new approach to optimize the rate of convergence leads to the new Zap Q-learning algorithm. Analysis suggests that its transient behavior is a close match to a deterministic Newton-Raphson implementation, and numerical experiments confirm super fast convergence.
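Tabular Q-learning, the canonical stochastic-approximation example from the first half of the lecture, can be sketched on a tiny MDP with a known fixed point (the MDP and the 1/n step size are illustrative choices, not the Zap algorithm):

```python
import random

# Tabular Q-learning sketch on a tiny deterministic MDP: two states, and
# action a moves to state a, with reward 1 when a = 1 and discount 0.5.
# The fixed point is Q*(s, 0) = 1 and Q*(s, 1) = 2 for both states.

def q_learning(n_steps=50000, gamma=0.5, seed=0):
    rng = random.Random(seed)
    Q = [[0.0, 0.0], [0.0, 0.0]]
    counts = [[0, 0], [0, 0]]
    s = 0
    for _ in range(n_steps):
        a = rng.randrange(2)            # uniform exploratory behavior
        s_next, r = a, (1.0 if a == 1 else 0.0)
        counts[s][a] += 1
        alpha = 1.0 / counts[s][a]      # 1/n step size
        # stochastic-approximation update toward the Bellman fixed point
        Q[s][a] += alpha * (r + gamma * max(Q[s_next]) - Q[s][a])
        s = s_next
    return Q

Q = q_learning()
```

The step-size choice is exactly where the variance issues discussed in the lecture arise; Zap Q-learning replaces the scalar gain with a matrix gain approximating a Newton-Raphson step.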
Based on
@article{devmey17a,
Title = {Fastest Convergence for {Q-learning}},
Author = {Devraj, Adithya M. and Meyn, Sean P.},
Journal = {NIPS 2017 and ArXiv e-prints},
Year = 2017}
International Conference on Monte Carlo techniques
Closing conference of thematic cycle
Paris, July 5-8, 2016
Campus les Cordeliers
Slides of Richard Everitt's presentation
The document summarizes a talk given by Mark Girolami on manifold Monte Carlo methods. It discusses using stochastic diffusions and geometric concepts to improve MCMC methods. Specifically, it proposes using discretized Langevin and Hamiltonian diffusions across a Riemann manifold as an adaptive proposal mechanism. This is founded on deterministic geodesic flows on the manifold. Examples presented include a warped bivariate Gaussian, Gaussian mixture model, and log-Gaussian Cox process.
This document provides an overview of Markov chain Monte Carlo (MCMC) methods. It begins with motivations for using MCMC, such as dealing with latent variable models where the likelihood function is intractable. It then covers random variable generation techniques before introducing the key MCMC algorithms: the Metropolis-Hastings algorithm and the Gibbs sampler. The document outlines the remaining topics to be covered, which include Monte Carlo integration, notions of Markov chains, and further advanced topics.
Reinforcement Learning: Hidden Theory and New Super-Fast Algorithms - Sean Meyn
A tutorial, and very new algorithms -- more details on arXiv and at NIPS 2017 https://arxiv.org/abs/1707.03770
Part of the Data Science Summer School at École Polytechnique: http://www.ds3-datascience-polytechnique.fr/program/
---------
2018 Updates:
See Zap slides from ISMP 2018 for new inverse-free optimal algorithms
Simons tutorial, March 2018 [one month before most discoveries announced at ISMP]
Part I (Basics, with focus on variance of algorithms)
https://www.youtube.com/watch?v=dhEF5pfYmvc
Part II (Zap Q-learning)
https://www.youtube.com/watch?v=Y3w8f1xIb6s
Big 2017 survey on variance in SA:
Fastest convergence for Q-learning
https://arxiv.org/abs/1707.03770
You will find the infinite-variance Q result there.
Our NIPS 2017 paper is distilled from this.
The proof complexity of matrix algebra - Newton Institute, Cambridge 2006 - Michael Soltys
The document discusses several topics in proof complexity and matrix algebra that can be expressed in Quantified Permutation Frege (QPK). It summarizes Mulmuley's algorithm for computing matrix rank in NC2 and shows how the Steinitz Exchange Lemma can be used to prove properties like the existence of matrix powers and the Cayley-Hamilton theorem in the theory of Quantified Propositional Logic with Arithmetic (QLA). Specifically, it shows that QLA can prove Cayley-Hamilton using Steinitz Exchange Lemma and the principle of Strong Linear Independence.
Subgradient Methods for Huge-Scale Optimization Problems - Юрий Нестеров (Yurii Nesterov), Cat... - Yandex
We consider a new class of huge-scale problems: problems with sparse subgradients. The most important functions of this type are piecewise linear. For optimization problems with uniform sparsity of the corresponding linear operators, we suggest a very efficient implementation of subgradient iterations, whose total cost depends only logarithmically on the dimension. This technique is based on recursively updating the results of matrix/vector products and the values of symmetric functions. It works well, for example, for matrices with few nonzero diagonals and for max-type functions.
We show that the updating technique can be efficiently coupled with the simplest subgradient methods. Similar results can be obtained for a new non-smooth random variant of a coordinate descent scheme. We also present promising results of preliminary computational experiments.
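The recursive-update idea can be sketched for a max-type function: maintain u = Ax by columns, so that a coordinate change costs only the nonzeros it touches (the matrix below is an illustrative toy):

```python
# Recursive-update sketch for a max-type objective f(x) = max_i (A x)_i
# with a sparse A stored by columns. When one coordinate of x changes,
# the residual u = A x is refreshed at the cost of that column's nonzeros
# rather than a full matrix-vector product.

cols = {0: [(0, 1.0), (2, -1.0)],   # cols[j] = list of (row, value) pairs
        1: [(1, 2.0)],
        2: [(0, 0.5), (1, -0.5)]}
n_rows = 3

x = [0.0, 0.0, 0.0]
u = [0.0] * n_rows                  # u = A x, maintained incrementally

def update_coordinate(j, delta):
    """Apply x[j] += delta and refresh u = A x at sparse cost."""
    x[j] += delta
    for i, value in cols[j]:
        u[i] += value * delta

update_coordinate(0, 2.0)
update_coordinate(2, 1.0)
f_val = max(u)                      # evaluate the max-type function cheaply
```

A subgradient of f at x is the row of A achieving the max, which is itself sparse, closing the loop that makes the whole iteration cheap.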
The document summarizes the Metropolis-adjusted Langevin algorithm (MALA) for sampling from log-concave probability measures in high dimensions. It introduces MALA and different proposal distributions, including random walk, Ornstein-Uhlenbeck, and Euler proposals. It discusses known results on optimal scaling, diffusion limits, ergodicity, and mixing time bounds. The main result is a contraction property for the MALA transition kernel under appropriate assumptions, implying dimension-independent bounds on mixing times.
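A minimal MALA sketch on an assumed standard-normal target, showing the Euler (Langevin) proposal and its Metropolis-Hastings correction:

```python
import math
import random

# MALA sketch for an assumed standard-normal target: the Euler proposal
# drifts along grad log pi(x) = -x, and the Metropolis-Hastings correction
# removes the bias of discretizing the Langevin diffusion.

def mala(n_iter=20000, h=0.5, seed=2):
    rng = random.Random(seed)

    def log_pi(x):
        return -0.5 * x * x

    def drift(x):
        return x - 0.5 * h * x      # x + (h/2) * grad log pi(x)

    def log_q(xp, x):               # proposal log density, up to a constant
        return -0.5 * (xp - drift(x)) ** 2 / h

    x, chain = 0.0, []
    for _ in range(n_iter):
        prop = drift(x) + math.sqrt(h) * rng.gauss(0.0, 1.0)
        log_alpha = (log_pi(prop) + log_q(x, prop)
                     - log_pi(x) - log_q(prop, x))
        if math.log(rng.random()) < log_alpha:
            x = prop
        chain.append(x)
    return chain

chain = mala()
```

Dropping the drift recovers the random-walk proposal, and dropping the accept/reject step recovers the unadjusted Langevin scheme, the two baselines the document compares against.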
The document discusses sequential sampling rules, specifically the Composite Limited Adaptive Sequential Test (CLAST) rule. A sequential sampling rule allows for a more flexible procedure for hypothesis testing regarding sample size compared to fixed sampling procedures. The CLAST rule sets a lower and upper boundary on sample size and divides the region under the statistic's distribution into three areas: 1) a rejection area, 2) a maintenance area, and 3) an uncertainty area that demands an increase in sample size. The CLAST rule calculates the test statistic and associated p-value at each step, adding more subjects to the sample if the p-value falls within the uncertainty area until stopping conditions are met. The CLAST has parameters including the reference sample size for the
In spite of recent developments in surrogate modeling techniques, the low fidelity of these models often limits their use in practical engineering design optimization. When surrogate models are used to represent the behavior of a complex system, it is challenging to simultaneously obtain high accuracy over the entire design space. When such surrogates are used for optimization, it becomes challenging to find the optimum (or optima) with certainty. Sequential sampling methods offer a powerful solution to this challenge by providing the surrogate with reasonable accuracy where and when needed. When surrogate-based design optimization (SBDO) is performed using sequential sampling, the typical SBDO process is repeated multiple times, where each time the surrogate is improved by the addition of new sample points. This paper presents a new adaptive approach to adding infill points during SBDO, called Adaptive Sequential Sampling (ASS). In this approach, both local exploitation and global exploration are considered when updating the surrogate during optimization, and multiple iterations of the SBDO process are performed to increase the quality of the optimal solution. The approach adaptively improves the accuracy of the surrogate in the region of the current global optimum as well as in regions of higher relative error. Based on the initial sample points and the fitted surrogate, the ASS method adds infill points at each iteration at: (i) the current optimum found on the fitted surrogate; and (ii) points generated by cross-over between sample points with relatively higher cross-validation errors. The Nelder-Mead simplex method is adopted as the optimization algorithm. The effectiveness of the proposed method is illustrated on a series of standard numerical test problems.
This document discusses various methods for collecting data and sampling strategies in nursing research. It begins by defining key terms like census, sample survey, experiment, and observational study as different data collection methods. It then covers advantages and disadvantages of these methods. The document primarily focuses on explaining different sampling strategies, including probability and non-probability techniques. It discusses considerations for calculating appropriate sample sizes and provides an example of a sample size calculation. Overall, the document provides an overview of collecting evidence and sampling in nursing research.
Field research and interaction design: course #3 - Nicolas Nova
Third deck of slides from the Field Research and Interaction Design, a Master course at the Geneva University of Art and Design, in the Media Design program taught in 2009-2010
The document discusses different types of sampling designs used in research, including probability and non-probability sampling. Probability sampling methods aim to give all members of the population an equal chance of being selected and include simple random sampling, systematic sampling, stratified sampling, and cluster sampling. Non-probability sampling methods do not use random selection and include convenience sampling, purposive sampling, and quota sampling. The key factors to consider in sampling design are determining the target population, parameters of interest, sampling frame, appropriate sampling method, and sample size.
This document discusses research methodology and sampling techniques. It defines key terms like population, sample, census, and probability and non-probability sampling. It describes different sampling methods like simple random sampling, systematic sampling, stratified sampling, cluster sampling, and their advantages and disadvantages. Finally, it discusses issues around internet sampling and methods like using web site visitors, panels, and opt-in lists.
This document discusses different types of sampling methods used in qualitative research. It defines key terms like sample, random sampling, and non-probability sampling. It then explains different sampling techniques in more detail, including simple random sampling, systematic random sampling, stratified random sampling, multi-stage cluster sampling, convenience sampling, snowball sampling, quota sampling, accidental sampling, panel sampling, and improving response rates. The document emphasizes that qualitative researchers are more concerned with understanding phenomena in depth than statistical validity or generalizability.
The document provides information on various sampling techniques used in research. It defines key terms like population, sample, sampling, and element. It describes different probability sampling techniques like simple random sampling, stratified random sampling, systematic random sampling, and cluster sampling. It also covers non-probability sampling techniques such as purposive sampling and convenience sampling. The document discusses the purposes, processes, merits, and limitations of different sampling methods.
Sampling Methods in Qualitative and Quantitative ResearchSam Ladner
This document discusses different types of sampling methods used in qualitative and quantitative research. It outlines the different assumptions researchers make regarding sampling in qualitative versus quantitative studies. A variety of sampling techniques are described for different research contexts such as ethnographic fieldwork, interviews, and content analysis.
The document discusses sample and sampling techniques used in research. It defines key terms like population, sample, sampling, and element. It describes two main sampling techniques - probability sampling which uses random selection, and non-probability sampling which uses non-random methods. Some examples of probability sampling techniques include simple random sampling, systematic sampling, stratified random sampling, cluster sampling, and multi-stage sampling. Examples of non-probability sampling include convenience sampling, quota sampling, and purposive sampling. Sample size is determined using formulas like Slovin's formula.
Sampling is the process of selecting a subset of individuals from within a population to estimate characteristics of the whole population. There are several sampling techniques including simple random sampling, stratified sampling, cluster sampling, systematic sampling, and non-probability sampling. Each technique has advantages and disadvantages related to accuracy, cost, and generalizability. Proper sampling helps reduce sampling errors and increase the reliability of making inferences about the population from a sample.
The document discusses key concepts in sampling, including:
- The target population is the group to which results will be generalized.
- Sampling units are the smallest elements that can be selected from the sampling frame.
- The sampling frame is the list from which potential respondents are drawn.
- Probability sampling methods like simple random sampling, stratified sampling, and cluster sampling aim to select a representative sample and allow estimates of sampling error. Non-probability methods do not involve random selection.
Computer hardware devices include webcams, scanners, mice, speakers, trackballs, and light pens. Webcams connect via USB or network and are used for video calls and conferencing. Scanners optically scan images and documents into digital formats. Mice are pointing devices that detect motion to move a cursor. Speakers have internal amplifiers and audio jacks. Trackballs contain ball and sensors to detect rotation for cursor movement. Light pens allow pointing directly on CRT displays.
This was a presentation that was carried out in our research method class by our group. It will be useful for PHD and master students quantitative and qualitative method. It consist sample definition, purpose of sampling, stages in the selection of a sample, types of sampling in quantitative researches, types of sampling in qualitative researches, and ethical Considerations in Data Collection.
The document discusses particle filtering and state-space processes. It provides an overview of two commonly used particle filters: the bootstrap filter and auxiliary particle filter. It also presents an example of applying particle filtering to a stochastic volatility model.
This document summarizes a presentation on controlled sequential Monte Carlo. It discusses state space models, sequential Monte Carlo, and particle marginal Metropolis-Hastings for parameter inference. Controlled sequential Monte Carlo is proposed to lower the variance of the marginal likelihood estimator compared to standard sequential Monte Carlo, improving the performance of parameter inference methods. The method is illustrated on a neuroscience example where it reduces variance for different particle sizes.
A crash coarse in stochastic Lyapunov theory for Markov processes (emphasis is on continuous time)
See also the survey for models in discrete time,
https://netfiles.uiuc.edu/meyn/www/spm_files/MarkovTutorial/MarkovTutorialUCSB2010.html
Looking Inside Mechanistic Models of CarcinogenesisSascha Zöllner
This talk discusses the basic mathematical approaches and motivations underlying mechanistic models of carcinogenesis, specifically multi-stage models. After discussing simple ODE-based deterministic models, stochastic cancer models are introduced. On the simplest examples of the 1-stage (Poisson) process and a minimal 2-stage model, the basic features of such models are laid out. We then proceed to treat the widely used two-stage model with clonal expansion (TSCE), and its application to calculating risks due to external agents, such as radiation.
Random Matrix Theory and Machine Learning - Part 3Fabian Pedregosa
ICML 2021 tutorial on random matrix theory and machine learning.
Part 3 covers: 1. Motivation: Average-case versus worst-case in high dimensions 2. Algorithm halting times (runtimes) 3. Outlook
Markov chain Monte Carlo methods and some attempts at parallelizing themPierre Jacob
Markov chain Monte Carlo (MCMC) methods are commonly used to approximate properties of target probability distributions. However, MCMC estimators are generally biased for any fixed number of samples. The document discusses various techniques for constructing unbiased estimators from MCMC output, including regeneration, sequential Monte Carlo samplers, and coupled Markov chains. Specifically, running two Markov chains in parallel and taking the difference in their values at meeting times can yield an unbiased estimator, though certain conditions must hold.
This document discusses recent advances in Markov chain Monte Carlo (MCMC) and sequential Monte Carlo (SMC) methods. It introduces Markov chain and sequential Monte Carlo techniques such as the Hastings-Metropolis algorithm, Gibbs sampling, data augmentation, and space alternating data augmentation. These techniques are applied to problems such as parameter estimation for finite mixtures of Gaussians.
The main machine learning algorithms are built upon various mathematical foundations such as statistics, optimization, and probability. Will this also hold true for Artificial Intelligence? In this presentation, I will showcase some recent examples of interactions between machine learning and mathematics.
Colloquium @ CEREMADE (October 3, 2023)
This document describes the adaptive restore algorithm, a non-reversible Markov chain Monte Carlo method. It begins with an overview of the restore process, which takes regenerations from an underlying diffusion or jump process to construct a reversible Markov chain with a target distribution. The adaptive restore process enriches this by allowing the regeneration distribution to adapt over time. It converges almost surely to the minimal regeneration distribution. Parameters like the initial regeneration distribution and rates are discussed. Examples are provided for the adaptive Brownian restore algorithm and calibrating the parameters.
The document discusses modeling dynamic systems and earthquake response. It covers basic concepts like Fourier transforms, single and multi-degree of freedom systems, modal analysis, and elastic response spectra. Numerical methods are presented for dynamic analysis in the frequency and time domains, including the finite element method and method of complex response. Examples of earthquake records, harmonic motion, and Fourier transforms are shown.
The document discusses modeling dynamic systems and earthquake response. It covers basic concepts like Fourier transforms, single and multi-degree of freedom systems, modal analysis, and elastic response spectra. Numerical methods are presented for dynamic analysis in the frequency and time domains, including the finite element method and method of complex response. Examples of earthquake records and harmonic motion are shown.
The document discusses modeling dynamic systems and earthquake response. It covers basic concepts like Fourier transforms, single and multi-degree of freedom systems, modal analysis, and elastic response spectra. Numerical methods are presented for dynamic analysis in the frequency and time domains, including the finite element method and method of complex response. Examples of earthquake records and harmonic motion are shown.
This document summarizes a talk given by Pierre E. Jacob on recent developments in unbiased Markov chain Monte Carlo methods. It discusses:
1. The bias inherent in standard MCMC estimators due to the initial distribution not being the target distribution.
2. A method for constructing unbiased estimators using coupled Markov chains, where two chains are run in parallel until they meet, at which point an estimator involving the differences in the chains' values is returned.
3. Conditions under which the coupled chain estimators are unbiased and have finite variance. Examples are given of how to construct coupled versions of common MCMC algorithms like Metropolis-Hastings and Gibbs sampling.
This document describes unbiased Markov chain Monte Carlo (MCMC) methods using coupled Markov chains. It begins by discussing how standard MCMC estimators are biased due to initialization and finite simulation length. It then introduces the idea of running two coupled Markov chains such that they meet and become equal after some meeting time τ. The difference in function values between the chains can then be used to construct an unbiased estimator. Several methods for designing coupled chains that meet this criterion are described, including couplings of popular MCMC algorithms like Metropolis-Hastings. Conditions under which the resulting estimators are guaranteed to be unbiased and have good statistical properties are also outlined.
Sequential quasi-Monte Carlo (SQMC) is a quasi-Monte Carlo (QMC) version of sequential Monte Carlo (or particle filtering), a popular class of Monte Carlo techniques used to carry out inference in state space models. In this talk I will first review the SQMC methodology as well as some theoretical results. Although SQMC converges faster than the usual Monte Carlo error rate its performance deteriorates quickly as the dimension of the hidden variable increases. However, I will show with an example that SQMC may perform well for some "high" dimensional problems. I will conclude this talk with some open problems and potential applications of SQMC in complicated settings.
This document discusses the calculus of variations and its application to optimal control problems. It begins by introducing the fundamental problem of finding functions that minimize cost functionals, which are functions of other functions. It then derives the necessary conditions for an extremum by taking variations of the functional. This leads to the Euler-Lagrange equation, the analogue of setting the gradient to zero for functions. The document provides examples of applying these concepts to problems with scalar functions and vector functions, as well as problems with free terminal times.
Allele Frequencies as Stochastic Processes: Mathematical & Statistical Approa...Gota Morota
The document discusses modeling allele frequency changes over time as stochastic processes. It describes allele frequencies changing as random walks or Brownian motion. It presents the Fokker-Planck equation for describing the probability distribution of allele frequencies over time under various evolutionary forces like genetic drift, selection, and mutation. The steady state distribution of allele frequencies and solutions to the Fokker-Planck equation are discussed for different evolutionary scenarios. Time series analysis methods are introduced for modeling allele frequency change as a discrete time process. An example application to cattle genotype data is shown.
Unbiased Markov chain Monte Carlo methods Pierre Jacob
This document describes unbiased Markov chain Monte Carlo methods for approximating integrals with respect to a target probability distribution π. It introduces the idea of coupling two Markov chains such that their states are equal with positive probability, which can be used to construct an unbiased estimator of integrals of the form Eπ[h(X)]. The document outlines conditions under which the proposed estimator is unbiased and has finite variance. It also discusses implementations of coupled Markov chains for common MCMC algorithms like Metropolis-Hastings and Gibbs sampling.
Similar to Sampling strategies for Sequential Monte Carlo (SMC) methods (20)
Sampling strategies for Sequential Monte Carlo (SMC) methods
1. Sampling strategies for Sequential Monte Carlo methods
Arnaud Doucet¹, Stéphane Sénécal²
¹ Department of Engineering, University of Cambridge
² The Institute of Statistical Mathematics
2004
thanks to the Japanese Ministry of Education and the Japan Society for the Promotion of Science
1
2. Overview
– Introduction : state space models, Monte Carlo methods
– Sequential Importance Sampling/Resampling
– Strategies for sampling
– Examples, applications
– References
2
3. Estimation of state space models
xt = ft(xt−1, ut)    yt = gt(xt, vt)
p(x0:t|y1:t) → p(xt|y1:t) = ∫ p(x0:t|y1:t) dx0:t−1
distribution of x0:t ⇒ computation of estimates of x0:t :
x̂0:t = ∫ x0:t p(x0:t|y1:t) dx0:t → Ep(·|y1:t){f(x0:t)}
x̂0:t = arg maxx0:t p(x0:t|y1:t)
3
4. Computation of the estimates
p(x0:t|y1:t) ⇒ multidimensional, non-standard distributions :
→ analytical, numerical approximations
→ integration, optimisation methods
⇒ Monte Carlo techniques
4
5. Monte Carlo approach
compute estimates for distribution π(·) → samples x1, . . . , xN ∼ π
[figure: density π(x) with samples x_1, …, x_N]
⇒ distribution πN = (1/N) Σi=1..N δxi approximates π(·)
5
6. Monte Carlo estimates
SN (f) = (1/N) Σi=1..N f(xi) −→ ∫ f(x)π(x) dx = Eπ{f(x)}
arg max(xi)1≤i≤N πN (xi) approximates arg maxx π(x)
⇒ sampling xi ∼ π difficult
→ importance sampling techniques
6
8. Importance Sampling
xi ∼ g ≠ π → (xi, wi) weighted sample
⇒ weight wi = π(xi) / g(xi)
[figure: proposal density g(x) and target π(x) with samples x_1, …, x_N]
8
9. Estimation
importance sampling → computation of Monte Carlo estimates
e. g. expectations Eπ{f(x)} :
∫ f(x) [π(x)/g(x)] g(x) dx = ∫ f(x)π(x) dx
Σi=1..N wi f(xi) → ∫ f(x)π(x) dx = Eπ{f(x)}
dynamic model (xt, yt) ⇒ recursive estimation x0:t−1 → x0:t
Monte Carlo techniques ⇒ sampling sequences x(i)0:t−1 → x(i)0:t
9
10. Sequential simulation
sampling sequences x(i)0:t ∼ πt(x0:t) recursively :
[figure: target distributions p(x, t1) and p(x, t2) over the state variable x at times t1 < t2, with samples x_t1 ∼ p(x_t1) and x_t2 ∼ p(x_t2)]
10
11. Sequential simulation : importance sampling
samples x(i)0:t ∼ πt(x0:t) approximated by weighted particles (x(i)0:t, w(i)t)1≤i≤N
[figure: weighted particles under the target densities p(x, t1) and p(x, t2)]
11
12. Sequential importance sampling
diffusing particles x(i)0:t1 → x(i)0:t2
[figure: particles propagated from p(x, t1) to p(x, t2)]
⇒ sampling scheme x(i)0:t−1 → x(i)0:t
12
13. Sequential importance sampling
updating weights w(i)t1 → w(i)t2
[figure: particle weights updated between p(x, t1) and p(x, t2)]
⇒ updating rule w(i)t−1 → w(i)t
13
14. Sequential Importance Sampling
x0:t ∼ πt(x0:t) ⇒ (x(i)0:t, w(i)t)1≤i≤N
Simulation scheme t − 1 → t :
– Sampling step x(i)t ∼ qt(xt|x(i)0:t−1)
– Updating weights
w(i)t ∝ w(i)t−1 × [πt(x(i)0:t−1, x(i)t)] / [πt−1(x(i)0:t−1) qt(x(i)t|x(i)0:t−1)]
incremental weight (iw)
normalizing : Σi=1..N w(i)t = 1
14
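The sampling/weight-update scheme above translates almost line for line into code. Below is a minimal sketch (not from the slides); the helper names `sis_step`, `sample_q`, `incr_weight` and the toy model and parameter values are assumptions made for illustration:

```python
import numpy as np

# Generic SIS step (a sketch of the scheme above): extend each particle
# path by one step and update its normalized weight.
def sis_step(rng, particles, weights, sample_q, incr_weight):
    x_new = sample_q(rng, particles)              # x_t^(i) ~ q_t(.|x_{0:t-1}^(i))
    w = weights * incr_weight(particles, x_new)   # w_t^(i) propto w_{t-1}^(i) * iw
    return x_new, w / w.sum()                     # normalize: sum_i w_t^(i) = 1

# Toy check with the bootstrap choice q_t = p(x_t|x_{t-1}) for a linear
# Gaussian model, so that iw is proportional to p(y_t|x_t); the values of
# alpha, sigma_u, sigma_v, y_t below are illustrative assumptions.
alpha, sigma_u, sigma_v, y_t = 0.9, 1.0, 0.5, 0.3
rng = np.random.default_rng(0)
N = 1000
x_prev = rng.normal(size=N)
w_prev = np.full(N, 1.0 / N)
x1, w1 = sis_step(
    rng, x_prev, w_prev,
    lambda rng, xp: alpha * xp + sigma_u * rng.normal(size=xp.shape),
    lambda xp, xn: np.exp(-0.5 * (y_t - xn) ** 2 / sigma_v ** 2),
)
```

Passing the proposal and incremental weight as callables keeps the step generic: any of the proposal designs discussed later plugs into the same loop.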
18. Sequential Importance Sampling/Resampling
Simulation scheme t − 1 → t :
– Sampling step x'(i)t ∼ qt(x't|x(i)0:t−1)
– Updating weights w(i)t ∝ w(i)t−1 × [πt(x(i)0:t−1, x'(i)t)] / [πt−1(x(i)0:t−1) qt(x'(i)t|x(i)0:t−1)]
→ parallel computing
– ⇒ Resampling step : sample N paths from (x(i)0:t−1, x'(i)t)1≤i≤N
→ particles interacting : computation at least O(N)
18
19. SISR for recursive estimation of state space models
xt = ft(xt−1, ut) → p(xt|xt−1)
yt = gt(xt, vt) → p(yt|xt)
Usual SISR : Bootstrap filter (Gordon et al. 93, Kitagawa 96) :
– Sampling step x(i)t ∼ p(xt|x(i)t−1)
– Updating weights : incremental weight w(i)t ∝ w(i)t−1 × iw, with iw ∝ p(yt|x(i)t)
– Stratified/Deterministic resampling
efficient, easy, fast for a wide class of models
→ tracking, time series
19
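The bootstrap filter fits in a few lines. The following is an illustrative sketch, not the authors' code: the scalar linear Gaussian model, its parameter values, and multinomial resampling at every step are assumptions made for the demo (the slides use stratified/deterministic and, later, adaptive resampling).

```python
import numpy as np

# Bootstrap filter sketch for x_t = a*x_{t-1} + u_t, y_t = x_t + v_t,
# u_t ~ N(0, su^2), v_t ~ N(0, sv^2); resampling at every step for simplicity.
def bootstrap_filter(ys, N, a=0.9, su=1.0, sv=0.5, seed=1):
    rng = np.random.default_rng(seed)
    x = rng.normal(0.0, 1.0, size=N)                 # x_0^(i) ~ N(0, 1)
    means = []
    for y in ys:
        x = a * x + su * rng.normal(size=N)          # sample from prior p(x_t|x_{t-1})
        w = np.exp(-0.5 * (y - x) ** 2 / sv ** 2)    # iw proportional to p(y_t|x_t)
        w = w / w.sum()
        means.append(np.dot(w, x))                   # estimate of E[x_t | y_{1:t}]
        x = x[rng.choice(N, size=N, p=w)]            # multinomial resampling
    return np.array(means)

# Simulate data from the same model, then filter it.
rng = np.random.default_rng(2)
T, a, su, sv = 50, 0.9, 1.0, 0.5
xs, x = np.zeros(T), rng.normal()
for t in range(T):
    x = a * x + su * rng.normal()
    xs[t] = x
ys = xs + sv * rng.normal(size=T)
est = bootstrap_filter(ys, N=2000)
rmse = np.sqrt(np.mean((est - xs) ** 2))
```

On this model the filtering RMSE should sit well below both the observation noise scale and the prior standard deviation of the state.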
20. Overview - Break
– Introduction :
→ state space models
→ estimation, computing estimates via Monte Carlo methods
→ importance sampling
– recursive estimation → sequential simulation
⇒ Sequential Importance Sampling/Resampling
– ⇒ Strategies for sampling :
→ designing/sampling “optimal” candidate distribution
→ considering blocks of variables : reweighting, → sampling
– Examples and applications
20
21. Improving simulation
sampling multimodal, multidimensional distributions
model with informative observation → peaky likelihood
→ prior dynamics to diffuse particles : poor approximation results
→ efficient propagation for a finite number of particles N
⇒ need for good sampling proposals
21
22. Improving simulation
Optimal proposal distribution qt(xt|x(i)0:t−1)
→ minimizing variance of incremental weight (w(i)t ∝ w(i)t−1 × iw)
iw = [πt(x(i)0:t−1, x(i)t)] / [πt−1(x(i)0:t−1) qt(x(i)t|x(i)0:t−1)]
⇒ 1-step ahead predictive :
πt(xt|x0:t−1) = p(xt|xt−1, yt)
⇒ incremental weight :
iw → πt(x0:t−1)/πt−1(x0:t−1) = p(x0:t−1|y1:t)/p(x0:t−1|y1:t−1)
∝ p(yt|xt−1) = ∫ p(yt|xt) p(xt|xt−1) dxt
22
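For the linear Gaussian model used in the examples later in the deck, the optimal proposal and its incremental weight have closed forms (a product of two Gaussians). The sketch below is an illustrative implementation under that assumption, with parameter values chosen for the demo:

```python
import numpy as np

# Optimal proposal for x_t = a*x_{t-1} + u_t, y_t = x_t + v_t,
# u_t ~ N(0, su^2), v_t ~ N(0, sv^2):
#   p(x_t|x_{t-1}, y_t) = N(m, s2) with
#   s2 = 1/(1/su^2 + 1/sv^2),  m = s2*(a*x_{t-1}/su^2 + y_t/sv^2),
# and the incremental weight no longer depends on the sampled x_t:
#   iw proportional to p(y_t|x_{t-1}) = N(y_t; a*x_{t-1}, su^2 + sv^2).
def optimal_proposal_step(rng, x_prev, w_prev, y, a=0.9, su=1.0, sv=0.1):
    s2 = 1.0 / (1.0 / su ** 2 + 1.0 / sv ** 2)
    m = s2 * (a * x_prev / su ** 2 + y / sv ** 2)
    x_new = m + np.sqrt(s2) * rng.normal(size=x_prev.shape)
    iw = np.exp(-0.5 * (y - a * x_prev) ** 2 / (su ** 2 + sv ** 2))
    w = w_prev * iw
    return x_new, w / w.sum()

rng = np.random.default_rng(3)
N = 1000
x_prev = rng.normal(size=N)
x1, w1 = optimal_proposal_step(rng, x_prev, np.full(N, 1.0 / N), y=0.5)
ess = 1.0 / np.sum(w1 ** 2)
```

Because iw depends only on x_{t−1}, weights vary far less across particles than under the prior proposal, which is exactly the variance reduction the slide describes.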
23. Approximations
Sampling the predictive distribution πt(xt|x0:t−1) = p(xt|xt−1, yt) :
– expansions of the p.d.f. or log(p.d.f.), Taylor
– mixture models : Gaussian Σi πi N(µi, σi²)
– Accept/Reject schemes
– Markov chain schemes : Metropolis-Hastings, Gibbs sampler
– dynamic stochastic simulation (Hybrid Monte Carlo)
– augmented sampling spaces :
→ slice samplers
→ auxiliary variables
23
24. Auxiliary variables
Pitt and Shephard 99 : approximating predictive p(xt|x(k)t−1, yt)
via augmented sampling space → p(xt, k|x(k)t−1, yt)
[figure: particles x(1)t−1, …, x(N)t−1 under p(xt−1|yt−1), with offspring counts (0, 1, 2, 3, …) and offspring x(j)t under p(xt|yt)]
index of particle k (→ number of offspring(s) of particle x(k)t−1) ∼ ·|yt
⇒ boost particles with high likelihood
24
25. Auxiliary variables
→ importance sampling for p(xt, k|x(k)t−1, yt) :
candidate distribution :
g(xt, k|xt−1, yt) ∝ p(yt|µ(k)t) p(xt|x(k)t−1)
where µ(k)t = mean, mode, or draw from xt|x(k)t−1
[figure: transition density p(xt|x(k)t−1), with µ(k)t taken as its mean or maximum]
25
26. Auxiliary variables
– sample (x(j)t, kj)1≤j≤R from g(xt, k|x(k)t−1, yt) :
k ∼ g(k|xt−1, yt) ∝ p(yt|µ(k)t) ∫ p(xt|x(k)t−1) dxt = p(yt|µ(k)t)
xt ∼ p(xt|x(k)t−1)
– reweighting (x(j)t, kj) with
wj ∝ p(yt|x(j)t) / p(yt|µ(kj)t)
– resample N paths from (x(kj)0:t−1, x(j)t)1≤j≤R with second stage weights wj
26
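One step of this sampler can be sketched as follows, assuming the linear Gaussian model as a stand-in and taking µt(k) to be the mean of the transition; the function name and parameter values are illustrative, not from the slides:

```python
import numpy as np

# One auxiliary-particle-filter step (Pitt and Shephard 99 sketch), with
# mu_t^(k) = a*x_{t-1}^(k) for x_t = a*x_{t-1} + u_t, y_t = x_t + v_t.
def apf_step(rng, x_prev, y, R, a=0.9, su=1.0, sv=0.5):
    mu = a * x_prev                                       # mu_t^(k): transition mean
    g = np.exp(-0.5 * (y - mu) ** 2 / sv ** 2)            # first stage: p(y_t|mu_t^(k))
    k = rng.choice(x_prev.size, size=R, p=g / g.sum())    # sample auxiliary indices
    x_new = a * x_prev[k] + su * rng.normal(size=R)       # propagate selected particles
    w = np.exp(-0.5 * (y - x_new) ** 2 / sv ** 2)         # second stage weight:
    w = w / np.exp(-0.5 * (y - mu[k]) ** 2 / sv ** 2)     #   p(y_t|x_t^(j)) / p(y_t|mu_t^(k_j))
    return x_new, w / w.sum(), k

rng = np.random.default_rng(4)
x_prev = rng.normal(size=500)
x1, w1, k = apf_step(rng, x_prev, y=0.2, R=500)
```

The first-stage draw of indices k is what "boosts particles with high likelihood": parents that explain yt well are propagated many times, and the second-stage weights correct the approximation p(yt|µt(k)) ≈ p(yt|xt).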
28. Approaches using a block of variables
– discrete distributions : Meirovitch 85
– reweighting before resampling :
auxiliary variables Pitt and Shephard 99,
Wang et al. 02
⇒ discrete distribution → analytical form for proposal
xt ∼ πt+L(xt|x0:t−1) = ∫ πt+L(xt:t+L|x0:t−1) dxt+1:t+L
Meirovitch 85 : growing a polymer, random walk in discrete space
→ complexity |X|^L for lag L
28
32. Approaches using a block of variables
→ auxiliary variables : Pitt and Shephard 99
proposal distribution :
p(xt, k|x(k)t−1, yt:t+L) ∝ ∫ p(xt:t+L, k|x(k)t−1, yt:t+L) dxt+1:t+L
approximated with importance sampling :
g(xt, k|x(k)t−1, yt:t+L) = p(yt+L|µ(k)t+L) · · · p(yt|µ(k)t) p(xt|x(k)t−1)
→ sample (x(j)t, kj)1≤j≤R
kj ∼ g(k|yt:t+L) ∝ p(yt+L|µ(k)t+L) · · · p(yt|µ(k)t)
x(j)t ∼ p(xt|x(kj)t−1)
32
33. Approaches using a block of variables
auxiliary variables → resampling from (x(j)t)1≤j≤R :
→ propagate/sample x(j)t+1 → x(j)t+L with prior transitions p(xt|xt−1)
→ use second stage weights : w(j)t ∝ w(j)t−1 × iw
iw ∝ [p(yt+L|x(j)t+L) · · · p(yt|x(j)t)] / [p(yt+L|µ(kj)t+L) · · · p(yt|µ(kj)t)]
for resampling N paths (x(i)0:t)1≤i≤N
33
34. Approaches using a block of variables
→ reweighting before resampling : Wang et al. 02
[figure: particle x(i)t with weight w(i)t at time t, propagated to x(i_j)t+L with weights a(i_j)t over the window t to t+L]
propagate particles x(i)t → x(i_j)t+1:t+L for j = 1, . . . , R
compute weights a(i_j)t, particle path x(i)0:t reweighted with e.g.
a(i)t = [Σj=1..R a(i_j)t]^α, resampling from the set (x(i)0:t, a(i)t)i=1,...,N
34
35. Reweighting
→ need to sample/propagate xt from a/by block of variables :
πt+L(xt|x0:t−1) = ∫ πt+L(xt:t+L|x0:t−1) dxt+1:t+L
⇒ sampling a block of variables
→ design a proposal/candidate distribution
35
36. Sampling recursively a block of variables
[figure: timeline t−L, t−L+1, …, t−1, t]
xt−L:t−1 → xt−L+1:t : imputing xt and re-imputing xt−L+1:t−1
36
37. Sampling a block of variables
[figure: timeline t−L, …, t with old path x0:t−1 and proposed block x't−L+1:t]
direct sampling :
xt−L+1:t ∼ qt(xt−L+1:t|x0:t−1)
37
38. Sampling a block of variables
[figure: timeline t−L, …, t with paths x0:t−L, xt−L+1:t−1 and proposed block x't−L+1:t]
proposal/candidate distribution for the block :
(x0:t−L, xt−L+1:t) ∼ ∫ πt−1(x0:t−1) qt(xt−L+1:t|x0:t−1) dxt−L+1:t−1
38
39. Sampling a block of variables
⇒ Idea : consider extended block of variables
(x0:t−L, xt−L+1:t) → (x0:t−L, xt−L+1:t−1, xt−L+1:t) = (x0:t−1, xt−L+1:t)
[figure: timeline t−L, …, t with path x0:t−L, kept block xt−L+1:t−1 and proposed block x't−L+1:t]
39
40. Sampling a block of variables
candidate distribution for extended block (x0:t−L, xt−L+1:t−1, xt−L+1:t) = (x0:t−1, xt−L+1:t) :
(x0:t−1, xt−L+1:t) ∼ πt−1(x0:t−1) qt(xt−L+1:t|x0:t−1)
direct sampling :
(x0:t−L, xt−L+1:t) ∼ ∫ πt−1(x0:t−1) qt(xt−L+1:t|x0:t−1) dxt−L+1:t−1
40
41. Sampling a block of variables
target distribution for the block (x0:t−L, xt−L+1:t) :
πt(x0:t−L, xt−L+1:t)
⇒ auxiliary target distribution for the extended block
(x0:t−1, xt−L+1:t) = (x0:t−L, xt−L+1:t−1, xt−L+1:t) :
πt(x0:t−L, xt−L+1:t)rt(xt−L+1:t−1|x0:t−L, xt−L+1:t)
with rt = any conditional distribution
⇒ proposal + target distributions → importance sampling
41
43. Sampling techniques for a block of variables
sampling the block xt−L+1:t ∼ qt(xt−L+1:t|x0:t−1) :
→ forward-backward recursion : e. g. Carter and Kohn 94
[figure: timeline t−L, t−L+1, …, t−1, t]
xt−L:t−1 → xt−L+1:t : imputing xt and re-imputing xt−L+1:t−1
43
44. Sampling techniques for a block of variables
→ forward-backward recursion :
[figure: timeline t−L, t−L+1, …, t−1, t]
xt−L:t−1 → xt−L+1:t : imputing xt and re-imputing xt−L+1:t−1
→ approximations : expansions, mixture models, MCMC, . . .
44
45. Improving simulation
Optimal proposal distribution qt(xt−L+1:t|x0:t−1) :
→ minimizing variance of incremental weight w(i)t ∝ w(i)t−1 × iw :
iw = [πt(x0:t−L, xt−L+1:t) rt(xt−L+1:t−1|x0:t−L, xt−L+1:t)] / [πt−1(x0:t−1) qt(xt−L+1:t|x0:t−1)]
⇒ qt = L-step ahead predictive
πt(xt−L+1:t|x0:t−L) = p(xt−L+1:t|xt−L, yt−L+1:t)
For one variable : optimal qt = 1-step ahead predictive
πt(xt|x0:t−1) = p(xt|xt−1, yt)
45
46. Improving simulation
→ block of variables ⇒ optimal proposal and target distribution
minimizing variance of incremental weight w(i)t ∝ w(i)t−1 × iw
iw = [πt(x0:t−L, xt−L+1:t) rt(xt−L+1:t−1|x0:t−L, xt−L+1:t)] / [πt−1(x0:t−1) qt(xt−L+1:t|x0:t−1)]
→ optimal conditional distribution rt(xt−L+1:t−1|x0:t−L, xt−L+1:t)
⇒ rt = (L − 1)-step ahead predictive
πt−1(xt−L+1:t−1|x0:t−L) = p(xt−L+1:t−1|xt−L, yt−L+1:t−1)
46
47. Improving simulation
For optimal qt and rt, incremental weight w(i)t ∝ w(i)t−1 × iw :
iw → πt(x0:t−L)/πt−1(x0:t−L) = p(x0:t−L|y1:t)/p(x0:t−L|y1:t−1)
∝ p(yt|xt−L, yt−L+1:t−1)
∝ ∫ p(yt, xt−L+1:t|xt−L, yt−L+1:t−1) dxt−L+1:t
SISR for one variable with optimal proposal qt :
iw → πt(x0:t−1)/πt−1(x0:t−1) ∝ p(yt|xt−1) = ∫ p(yt|xt) p(xt|xt−1) dxt
Bootstrap filter : iw = p(yt|xt)
47
49. Overview - Break
– Introduction : state space models, Monte Carlo methods
– Sequential Importance Sampling/Resampling
– Strategies for sampling :
→ “optimal” candidate distribution
sampling with e.g. auxiliary variables
→ considering a block of variables : reweighting
⇒ sampling a block of variables :
definition of importance sampling for a block
performing sampling → “optimal” candidate distribution
– ⇒ Examples, applications :
→ simple, complex models
→ why can the sampling strategy for particles be crucial?
49
50. Example
Linear and Gaussian state space model :
xt = αxt−1 + ut   x0, ut ∼ N(0, 1)
yt = xt + vt   vt ∼ N(0, σ²)
Sequential Monte Carlo methods :
– Bootstrap filter, proposal p(xt|xt−1)
– SISR with optimal proposal p(xt|xt−1, yt)
– SISR for blocks with optimal proposal p(xt−L+1:t|xt−L, yt−L+1:t)
computed by forward-backward exact recursions
⇒ estimates compared with Kalman filter results
⇒ approximation of target distribution p(xt|y1:t)
50
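As a rough, illustrative reproduction of the comparison reported on the following slides (not the authors' code: resampling is performed at every step here, so the numbers differ from the tables), one can compare the average ESS of the bootstrap proposal against the optimal proposal on this model:

```python
import numpy as np

# Average ESS of the bootstrap proposal p(x_t|x_{t-1}) vs. the optimal
# proposal p(x_t|x_{t-1}, y_t) on x_t = a*x_{t-1} + u_t, y_t = x_t + v_t,
# with (a, sv) = (0.9, 0.1), su = 1; resampling every step for simplicity.
def run(ys, N, proposal, a=0.9, sv=0.1, seed=5):
    rng = np.random.default_rng(seed)
    x = rng.normal(size=N)
    ess_vals = []
    for y in ys:
        if proposal == "bootstrap":
            x_new = a * x + rng.normal(size=N)              # q = p(x_t|x_{t-1})
            w = np.exp(-0.5 * (y - x_new) ** 2 / sv ** 2)   # iw prop. to p(y_t|x_t)
        else:  # optimal: q = p(x_t|x_{t-1}, y_t), iw prop. to p(y_t|x_{t-1})
            s2 = 1.0 / (1.0 + 1.0 / sv ** 2)
            m = s2 * (a * x + y / sv ** 2)
            x_new = m + np.sqrt(s2) * rng.normal(size=N)
            w = np.exp(-0.5 * (y - a * x) ** 2 / (1.0 + sv ** 2))
        w = w / w.sum()
        ess_vals.append(1.0 / np.sum(w ** 2))
        x = x_new[rng.choice(N, size=N, p=w)]               # multinomial resampling
    return np.mean(ess_vals)

rng = np.random.default_rng(6)
T, a, sv = 100, 0.9, 0.1
xs, x = np.zeros(T), rng.normal()
for t in range(T):
    x = a * x + rng.normal()
    xs[t] = x
ys = xs + sv * rng.normal(size=T)
ess_boot = run(ys, 100, "bootstrap")
ess_opt = run(ys, 100, "optimal")
```

With the very informative observation (σv = 0.1), the bootstrap ESS collapses while the optimal proposal keeps it near N, qualitatively matching the tables that follow.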
51. Estimation
[figure: x(t) and estimates vs. time index, 0 to 100]
model (α, σ) = (0.9, 0.1), x̂t = Σi=1..N w(i)t x(i)t, N = 100
51
52. Approximation of the target distribution
⇒ Effective Sample Size :
ESS = 1 / Σi=1..N [w(i)t]²
w(i) = 1/N ∀i : ESS = N
[figure: uniform particle weights under π(xt)]
w(i) ≈ 0 ∀i except one : ESS = 1
[figure: degenerate particle weights under π(xt)]
⇒ Resampling performed for ESS ≤ N/2, N/10
52
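The ESS and the adaptive resampling rule above can be written directly; a minimal sketch (the function names are illustrative):

```python
import numpy as np

# ESS = 1 / sum_i [w_t^(i)]^2 for normalized weights; resample only when
# the ESS falls below a chosen threshold (e.g. N/2 or N/10).
def ess(w):
    return 1.0 / np.sum(w ** 2)

def maybe_resample(rng, x, w, threshold):
    if ess(w) <= threshold:
        idx = rng.choice(x.size, size=x.size, p=w)   # multinomial resampling
        return x[idx], np.full(x.size, 1.0 / x.size) # reset weights to 1/N
    return x, w

N = 100
uniform = np.full(N, 1.0 / N)    # all weights equal: ESS = N
degenerate = np.zeros(N)
degenerate[0] = 1.0              # all mass on one particle: ESS = 1
rng = np.random.default_rng(8)
x = rng.normal(size=N)
x_r, w_r = maybe_resample(rng, x, degenerate, threshold=N / 2)
```

The two extreme weight vectors reproduce the boundary cases on the slide: ESS = N for uniform weights and ESS = 1 for a degenerate weight vector.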
53. Approximation of the target distribution
Resampling for ESS ≤ N/2, N = 100
[figure: ESS vs. time index for Bootstrap (→), SISR (→) and SISR for blocks of 2 variables (→) with optimal proposals]
53
54. Approximation of the target distribution
Resampling for ESS ≤ N/2, N = 100, 100 time steps
algorithm ESS resampling steps CPU time
Bootstrap 11.19 99 0.84
optimal SISR 77.1 2 0.12
Block-SISR L = 2 99.1 1 0.23
54
55. Approximation of the target distribution
Resampling for ESS ≤ N/2, N = 100, ∞ time steps
algorithm ESS resampling steps CPU time
Bootstrap 10 100% ∝0.84
optimal SISR 75 0.04% ∝0.12
Block-SISR L = 2 99 0% ∝0.23
55
56. Approximation of the target distribution
Resampling for ESS ≤ N/2, various N
algorithm ESS resampling steps
Bootstrap 10%N 100%
optimal SISR 75%N 0.04%
Block-SISR L = 2 99%N 0%
computational complexity : resampling O(N) → CPU time
56
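The resampling step itself can be kept low-variance and essentially O(N) with a systematic scheme, one of the stratified/deterministic alternatives to multinomial resampling mentioned earlier. A minimal sketch (illustrative, not the slides' implementation; `np.searchsorted` adds a log factor over a pure linear scan):

```python
import numpy as np

# Systematic resampling: a single uniform draw placed on a regular grid,
# then inversion of the cumulative weights to get ancestor indices.
def systematic_resample(rng, w):
    N = w.size
    u = (rng.random() + np.arange(N)) / N      # one draw, N stratified positions
    return np.searchsorted(np.cumsum(w), u)    # ancestor index for each position

w = np.array([0.5, 0.25, 0.125, 0.125])
rng = np.random.default_rng(7)
idx = systematic_resample(rng, w)              # particle 0 gets exactly N*w[0] = 2 copies
```

Because the grid is deterministic given one uniform draw, each particle receives a number of copies within one of N·w(i), which reduces resampling noise compared with N independent multinomial draws.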
57. CPU time / number of particles N
Resampling for ESS ≤ N/2, 1,000 time steps
[figure: CPU time vs. number of particles N (100 to 500) for Bootstrap (→), SISR (→) and SISR for blocks of 2 variables (→) with optimal proposals]
57
58. CPU time / number of particles N
Resampling for ESS ≤ N/2, 1,000 time steps
[figure: CPU time vs. number of particles N (up to 10,000) for SISR (→) and SISR for blocks of 2 variables (→) with optimal proposals]
58
59. Sequential Monte Carlo methods
for this model :
– Same estimation results as Kalman filtering
– sampling scheme ⇒ quality of the approximation of the target distribution
– N ≤ 500 : computational complexity, CPU time
→ SISR with optimal proposal p(xt|xt−1, yt)
– N ≥ 500 : → block SISR with optimal proposal
p(xt−L+1:t|xt−L, yt−L+1:t)
59
60. Sampling strategies
xt = αxt−1 + ut   x0, ut ∼ N(0, σu²)
yt = xt + vt   vt ∼ N(0, σv²)
– σv = 0.1 → observation yt very informative relative to the prior (σu = 1.0)
⇒ take yt into account for diffusing particles
p(xt|xt−1) → p(xt|xt−1, yt) ⇒ ESS increases
– α = 0.9 → variables (xt)t correlated
⇒ sampling by block xt−L+1:t
block of observations yt−L+1:t more informative than a single one yt
p(xt|xt−1, yt) → p(xt−L+1:t|xt−L, yt−L+1:t) ⇒ ESS increases
60
61. Approximation of the target distribution
Resampling for ESS ≤ N/2, N = 100
[figure: ESS for Bootstrap (→), SISR with optimal proposals for 1 (→), 2 (→) and 10 variables (→); σu = 1.0, left/right : σv = 0.1/1.0, top/bottom : α = 0.9/0.5]
61
62. CPU time / number of particles N
Resampling for ESS ≤ N/2, 1,000 time steps
[figure: CPU time vs. N (100 to 500) for Bootstrap (→), SISR with optimal proposals for 1 (→), 2 (→) and 10 variables (→); σu = 1.0, left/right : σv = 0.1/1.0, top/bottom : α = 0.9/0.5]
62
63. CPU time / number of particles N
Resampling for ESS ≤ N/2, 1,000 time steps
[figure: CPU time vs. N (up to 10,000) for SISR with optimal proposals for 1 (→), 2 (→) and 10 variables (→); σu = 1.0, left/right : σv = 0.1/1.0, top/bottom : α = 0.9/0.5]
63
64. Approximation of the target distribution
Resampling for ESS ≤ N/2, N = 100
[figure: ESS for Bootstrap (→), SISR with optimal proposals for 1 (→), 2 (→) and 10 variables (→); σu = 0.1, left/right : σv = 0.1/1.0, top/bottom : α = 0.9/0.5]
64
65. CPU time / number of particles N
Resampling for ESS ≤ N/2, 1,000 time steps
[figure: CPU time vs. N (100 to 500) for Bootstrap (→), SISR with optimal proposals for 1 (→), 2 (→) and 10 variables (→); σu = 0.1, left/right : σv = 0.1/1.0, top/bottom : α = 0.9/0.5]
65
66. CPU time / number of particles N
Resampling for ESS ≤ N/2, 1,000 time steps
[figure: CPU time vs. N (up to 10,000) for SISR with optimal proposals for 1 (→), 2 (→) and 10 variables (→); σu = 0.1, left/right : σv = 0.1/1.0, top/bottom : α = 0.9/0.5]
66
67. Overview- Break
– Introduction : state space models, Monte Carlo methods
– Sequential Importance Sampling/Resampling
– Strategies for sampling
– Applications : Linear and Gaussian model
⇒ sampling strategy :
→ approximation of the target distribution, CPU time
→ information in the observation, dynamics of the state variable
→ nonlinear non-Gaussian models
67
68. Example
Nonlinear state space model :
xt = α(xt−1 + β x³t−1) + ut,  x0, ut ∼ N(0, σ²u)
yt = xt + vt,  vt ∼ N(0, σ²v)
Sequential Monte Carlo methods :
– Bootstrap filter, proposal p(xt|xt−1)
– SISR with optimal proposal p(xt|xt−1, yt) approximated by KF/EKF
– SISR for blocks with optimal proposal p(xt−L+1:t|xt−L, yt−L+1:t)
approximated by forward-backward recursions with KF/EKF
Parameter values α=0.9, β=0.2, σu=0.1 and σv=0.05
⇒ approximation of target distribution p(xt|y1:t)
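As a concrete illustration of the bootstrap filter applied to this model, the sketch below simulates a trajectory and runs a particle filter with the ESS ≤ N/2 resampling rule. This is a minimal assumed implementation in Python/NumPy with the parameter values from the slide, not the code behind the reported results:

```python
import numpy as np

rng = np.random.default_rng(0)

# Parameter values from the slide
alpha, beta, sigma_u, sigma_v = 0.9, 0.2, 0.1, 0.05
T, N = 100, 100  # time steps, particles

def f(x):
    # State transition mean: x_t = alpha * (x_{t-1} + beta * x_{t-1}^3) + u_t
    return alpha * (x + beta * x**3)

# Simulate one trajectory of states and observations
x = np.zeros(T)
y = np.zeros(T)
x[0] = rng.normal(0.0, sigma_u)
y[0] = x[0] + rng.normal(0.0, sigma_v)
for t in range(1, T):
    x[t] = f(x[t - 1]) + rng.normal(0.0, sigma_u)
    y[t] = x[t] + rng.normal(0.0, sigma_v)

# Bootstrap filter: propose from p(x_t | x_{t-1}), weight by p(y_t | x_t),
# resample whenever ESS <= N/2
particles = rng.normal(0.0, sigma_u, size=N)
logw = np.zeros(N)
est = np.zeros(T)
for t in range(T):
    if t > 0:
        particles = f(particles) + rng.normal(0.0, sigma_u, size=N)
    logw += -0.5 * ((y[t] - particles) / sigma_v) ** 2
    w = np.exp(logw - logw.max())
    w /= w.sum()
    est[t] = np.sum(w * particles)        # estimate of E[x_t | y_{1:t}]
    ess = 1.0 / np.sum(w**2)
    if ess <= N / 2:
        idx = rng.choice(N, size=N, p=w)  # multinomial resampling
        particles = particles[idx]
        logw = np.zeros(N)

mse = np.mean((est - x) ** 2)
print(f"filter MSE: {mse:.4f}")
```

With σv=0.05 the observations are very informative, so even this plain bootstrap filter tracks the state closely; the SISR variants on the next slide improve on it mainly through higher ESS and fewer resampling steps.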
69. Simulation results
algorithm    MSE     ESS   RS      CPU
Bootstrap    0.0021  36.8  70.3 %  0.68
SISR-KF      0.0019  64.7  19.3 %  0.44
SISR-EKF     0.0019  65.8  19.2 %  0.48
BSISR-KF     0.0018  72.3   0.9 %  0.21
BSISR-EKF    0.0018  73.5   0.8 %  0.24
N = 100 particles, 100 runs of particle filters for a single and for a
block of L = 2 variables (MSE from KF/EKF = 0.0034).
70. Approximation of the target distribution
Resampling for ESS ≤ N/2, N = 100
[Figure: Approximated Effective Sample Size vs. time index for a realization
of the Bootstrap filter (dotted), the SISR with Kalman filter proposal for a
single variable (dashdotted) and for a block of L=2 variables (straight).]
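The resampling trigger used throughout these experiments (resample when ESS ≤ N/2) is based on the effective sample size of the normalised importance weights, ESS = 1/Σ w²ᵢ. A minimal sketch (Python/NumPy assumed):

```python
import numpy as np

def ess(weights):
    """Effective sample size 1 / sum(w_i^2) of normalised importance weights."""
    w = np.asarray(weights, dtype=float)
    w = w / w.sum()
    return 1.0 / np.sum(w**2)

# Uniform weights give ESS = N (up to floating point);
# a near-degenerate weight vector gives ESS close to 1.
print(ess(np.ones(100)))
print(ess([0.97] + [0.001] * 30))
```

This is the quantity plotted against the time index in the figure above: it drops towards 1 as the weights degenerate, and jumps back to N after each resampling step.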
71. Simulation results
block size L  N=100  N=500  N=1000  RS
2             74     370    715     0.9 %
3             96     493    985     0.9 %
4             99     496    989     1 %
5             98     494    988     1 %
10            97     486    972     2.5 %
Approximated ESS averaged over 100 runs of particle filters for
blocks of L variables, considering N particles.
72. CPU time / number of particles N
Resampling for ESS ≤ N/2, 1,000 time steps
[Figure: CPU time vs. N for bootstrap filter (dotted), SISR with KF proposal
for a single variable (KF : dashed, EKF : dashdotted) and for a block
of L=2 variables (straight), 100 realizations.]
73. CPU time / number of particles N
Resampling for ESS ≤ N/2, 1,000 time steps
[Figure: Computational time vs. N for the block sampling scheme with lags from
L=2 (bottom), 3, 4, 5, 10 (top), 100 realizations.]
74. Sequential Monte Carlo methods
For this model :
– good approximations of the target distribution
– sampling strategy ⇒ quality of the approximation of the target distribution
– even for small N, block SISR with approximated optimal proposal
p(xt−L+1:t|xt−L, yt−L+1:t) is efficient for L=3, 4, 5
– informative observations : σu=0.1, σv=0.05
75. Conclusion
⇒ Importance of proposal/candidate distribution for
Sequential Monte Carlo simulation methods
design of proposal :
→ information in observation, dynamic of the state variable :
p(xt|xt−1) ←→ p(xt|yt, xt−1) ←→ p(xt|yt)
→ sampling a block/fixed lag of variables can be useful :
– for intermittent/informative observation, correlated variables
– applications ⇒ tracking, radar, navigation, positioning . . .
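For the example model above the comparison p(xt|xt−1) ←→ p(xt|yt, xt−1) can be made concrete: because the observation equation yt = xt + vt is linear and Gaussian given xt−1, the one-step optimal proposal p(xt|xt−1, yt) is itself Gaussian in closed form. The sketch below is an illustrative assumption-laden Python/NumPy version (parameter values from the example slide), not code from the presentation:

```python
import numpy as np

# Parameter values from the example slide
alpha, beta, sigma_u, sigma_v = 0.9, 0.2, 0.1, 0.05

def optimal_proposal(x_prev, y_t, rng):
    """Sample x_t ~ p(x_t | x_{t-1}, y_t) for the example model.

    Given x_{t-1}, the prior x_t ~ N(m, sigma_u^2) with
    m = alpha * (x_{t-1} + beta * x_{t-1}^3) combines with the Gaussian
    likelihood y_t ~ N(x_t, sigma_v^2), so the conditional is Gaussian
    with precision 1/sigma_u^2 + 1/sigma_v^2.
    """
    m = alpha * (x_prev + beta * x_prev**3)
    var = 1.0 / (1.0 / sigma_u**2 + 1.0 / sigma_v**2)
    mean = var * (m / sigma_u**2 + y_t / sigma_v**2)
    return rng.normal(mean, np.sqrt(var)), mean, var

rng = np.random.default_rng(1)
sample, mean, var = optimal_proposal(0.5, 0.6, rng)
# Under this proposal the incremental weight is p(y_t | x_{t-1}) =
# N(y_t; m, sigma_u^2 + sigma_v^2), which does not depend on the sampled
# x_t -- this is what makes the proposal optimal (minimum weight variance).
```

With σv much smaller than σu the proposal mean is pulled strongly towards the observation, which is exactly the "information in observation" regime where the bootstrap proposal p(xt|xt−1) wastes particles.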
76. References - SISR, Sequential Monte Carlo
– N. Gordon, D. Salmond, and A. F. M. Smith, “Novel approach to
nonlinear and non-Gaussian Bayesian state estimation,”
Proceedings IEE-F, vol. 140, pp. 107–113, 1993.
– G. Kitagawa, “Monte Carlo filter and smoother for non-Gaussian
nonlinear state space models,” J. Comput. Graph. Statist., vol. 5,
pp. 1–25, 1996.
– A. Doucet, N. de Freitas, and N. Gordon, Eds., Sequential Monte
Carlo methods in practice, Statistics for engineering and
information science. Springer, 2001.
77. References - block/fixed lag approaches
– H. Meirovitch, “Scanning method as an unbiased simulation
technique and its application to the study of self-avoiding random
walks,” Phys. Rev. A, vol. 32, pp. 3699–3708, 1985.
– M. K. Pitt and N. Shephard, “Filtering via simulation : auxiliary
particle filter,” J. Am. Stat. Assoc., vol. 94, pp. 590–599, 1999.
– X. Wang, R. Chen, and D. Guo, “Delayed-pilot sampling for
mixture Kalman filter with application in fading channels,” IEEE
Trans. Sig. Proc., vol. 50, pp. 241–253, 2002.
78. References - block/fixed lag sampling methods
– A. Doucet and S. Sénécal, “Fixed-Lag Sequential Monte Carlo”,
accepted at EUSIPCO 2004.
– S. Sénécal and A. Doucet, “An example of sequential Monte Carlo
block sampling method,” AIC2003 Science of Modeling,
pp. 418–419, 2003.
– C. K. Carter and R. Kohn, “On the Gibbs sampling for state space
models,” Biometrika, vol. 81, pp. 541–553, 1994.