This document provides a brief history of Markov chain Monte Carlo (MCMC) methods. It describes how MCMC originated from early Monte Carlo methods developed during World War II to simulate nuclear weapons. The first true MCMC algorithm, known as the Metropolis algorithm, was published in 1953 and aimed to sample from complicated probability distributions by constructing a Markov chain with a desired stationary distribution. However, MCMC methods did not gain widespread use in statistics until the late 1980s and early 1990s, partly due to lack of computing power and understanding of Markov chains.
Why should you care about Markov Chain Monte Carlo methods?
→ They are in the list of "Top 10 Algorithms of 20th Century"
→ They allow you to make inference with Bayesian Networks
→ They are used everywhere in Machine Learning and Statistics
Markov Chain Monte Carlo methods are a class of algorithms used to sample from complicated distributions, typically posterior distributions in Bayesian Networks (Belief Networks).
These slides cover the following topics.
→ Motivation and Practical Examples (Bayesian Networks)
→ Basic Principles of MCMC
→ Gibbs Sampling
→ Metropolis–Hastings
→ Hamiltonian Monte Carlo
→ Reversible-Jump Markov Chain Monte Carlo
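As a taste of the Metropolis–Hastings material listed above, here is a minimal sketch (not taken from the slides; target and step size are illustrative choices) of a random-walk Metropolis sampler drawing from a standard normal:

```python
import math
import random

random.seed(42)

def log_target(x):
    # Log-density of the target, up to an additive constant (standard normal)
    return -0.5 * x * x

def metropolis_hastings(n_samples, step=1.0, x0=0.0):
    x = x0
    samples = []
    for _ in range(n_samples):
        proposal = x + random.gauss(0.0, step)           # symmetric random walk
        log_alpha = log_target(proposal) - log_target(x)
        if math.log(random.random()) < log_alpha:        # accept w.p. min(1, alpha)
            x = proposal
        samples.append(x)
    return samples

samples = metropolis_hastings(20000)
mcmc_mean = sum(samples) / len(samples)
mcmc_var = sum((s - mcmc_mean) ** 2 for s in samples) / len(samples)
```

Because only the ratio of target densities appears, the normalizing constant is never needed, which is exactly why the method suits intractable posteriors.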
This document provides an introduction to genetic algorithms and genetic programming. It discusses how genetic algorithms are inspired by natural selection and genetics, using operations like crossover and mutation to evolve solutions to problems. It also outlines the basic steps of a genetic programming framework, including generating an initial population randomly, evaluating fitness, selecting parents, performing crossover and mutation to create offspring, and iterating until a solution is found. Representation using syntax trees and example genetic operators like single point crossover are described.
This document provides an introduction to Bayesian analysis and Metropolis-Hastings Markov chain Monte Carlo (MCMC). It explains the foundations of Bayesian analysis and how MCMC sampling methods like Metropolis-Hastings can be used to draw samples from posterior distributions that are intractable. The Metropolis-Hastings algorithm works by constructing a Markov chain with the target distribution as its stationary distribution. The document provides an example of using MCMC to perform linear regression in a Bayesian framework.
This document discusses the relationships between various performance measures used to evaluate and compare risk prediction models, including the net reclassification improvement (NRI) and decision-analytic measures. It explains that the NRI at a risk threshold T equals the difference in sensitivity plus specificity (the Youden index) between two models. Decision-analytic measures such as net benefit and relative utility are preferable to the NRI because they account for the different costs of misclassifying patients with and without the outcome. The document also presents a case study comparing an ovarian tumor risk prediction model with and without the CA-125 tumor marker, finding a negative NRI but positive net benefit and relative utility at threshold T = 5%.
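The identity above can be made concrete with a small sketch (the toy data here is invented for illustration, not from the case study): NRI(T) is the change in sensitivity plus the change in specificity at a single risk threshold.

```python
def nri_at_threshold(y, p_old, p_new, T):
    # A patient is classified "high risk" when the predicted probability >= T.
    events = [(po, pn) for yi, po, pn in zip(y, p_old, p_new) if yi == 1]
    nonevents = [(po, pn) for yi, po, pn in zip(y, p_old, p_new) if yi == 0]
    sens_old = sum(po >= T for po, pn in events) / len(events)
    sens_new = sum(pn >= T for po, pn in events) / len(events)
    spec_old = sum(po < T for po, pn in nonevents) / len(nonevents)
    spec_new = sum(pn < T for po, pn in nonevents) / len(nonevents)
    # NRI(T) = change in sensitivity + change in specificity
    return (sens_new - sens_old) + (spec_new - spec_old)

y     = [1, 1, 1, 0, 0, 0]
p_old = [0.9, 0.4, 0.2, 0.6, 0.3, 0.1]
p_new = [0.8, 0.6, 0.3, 0.4, 0.2, 0.1]
nri = nri_at_threshold(y, p_old, p_new, T=0.5)   # = 2/3 for this toy data
```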
This document provides an overview of online random forest, a machine learning algorithm that can handle streaming data. It discusses how traditional supervised learning algorithms require a static data matrix as input, while streaming data has an explicit time order and new observations can arrive at any time. Online random forest allows trees to learn incrementally from each new observation and drop trees from the forest that perform poorly, enabling it to adapt to changes in important predictors over time. It also scales well by distributing tree computations across actors in a distributed system.
An Introduction to Causal Discovery, a Bayesian Network Approach (COST Action BM1006)
This gene ranked 152nd based on correlation alone. Using causal reasoning and Bayesian networks, the researchers were able to better identify genes that could causally influence the disease state, rather than just being correlated. This integrative approach combining genetic and gene expression data provided more insights into disease causality than traditional correlation-based methods alone.
Principal component analysis (PCA) is a technique used to simplify complex datasets. It works by converting a set of observations of possibly correlated variables into a set of linearly uncorrelated variables called principal components. PCA identifies patterns in data and expresses the data in such a way as to highlight their similarities and differences. The main implementations of PCA are eigenvalue decomposition and singular value decomposition. PCA is useful for data compression, reducing dimensionality for visualization and building predictive models. However, it works best for data that follows a multidimensional normal distribution.
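Of the two implementations mentioned, singular value decomposition is usually preferred numerically. A minimal sketch (illustrative synthetic data, not from the document):

```python
import numpy as np

rng = np.random.default_rng(0)
# Toy data: two correlated variables (the second is a noisy copy of the first)
x = rng.normal(size=(500, 1))
X = np.hstack([x, x + 0.1 * rng.normal(size=(500, 1))])

Xc = X - X.mean(axis=0)                  # center each variable
U, S, Vt = np.linalg.svd(Xc, full_matrices=False)
scores = Xc @ Vt.T                       # coordinates in the principal basis
explained = S**2 / np.sum(S**2)          # fraction of variance per component
```

For strongly correlated data like this, the first component captures nearly all the variance, which is what makes PCA useful for compression and visualization.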
1) OpenDA is a data assimilation toolbox that allows for both data assimilation and model calibration in a generic way.
2) It has an object oriented design that allows components like models and algorithms to be easily exchanged.
3) OpenDA supports parallel computing concepts and various ways of integrating models, including keeping models as "black boxes".
The document summarizes trigonometric addition formulas and related formulas. It provides proofs of the formulas using properties of coordinates on the unit circle. Specifically, it proves formulas for cosine, sine, and tangent of the sum or difference of two angles α and β using the x-y coordinates of points on two superimposed unit circles with angles of α, β, α+β, and -β.
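For reference, the addition formulas the document proves are:

```latex
\begin{aligned}
\cos(\alpha \pm \beta) &= \cos\alpha\cos\beta \mp \sin\alpha\sin\beta \\
\sin(\alpha \pm \beta) &= \sin\alpha\cos\beta \pm \cos\alpha\sin\beta \\
\tan(\alpha \pm \beta) &= \frac{\tan\alpha \pm \tan\beta}{1 \mp \tan\alpha\tan\beta}
\end{aligned}
```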
This document summarizes a lecture on discrete structures. It discusses logical equivalences, De Morgan's laws, tautologies and contradictions. It also covers laws of logic like distribution, identity and negation. Conditional propositions are defined as relating two propositions with "if-then". Truth tables are used to check logical equivalence and interpret conditionals. The contrapositive and biconditional are also introduced.
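The truth-table check for logical equivalence described in the lecture is easy to mechanize. A small sketch (the encodings below are standard, not taken from the lecture):

```python
from itertools import product

def equivalent(f, g, n_vars):
    # Two formulas are logically equivalent iff they agree on every
    # row of the truth table.
    return all(f(*row) == g(*row)
               for row in product([True, False], repeat=n_vars))

# De Morgan's law: not (p and q)  ==  (not p) or (not q)
de_morgan = equivalent(lambda p, q: not (p and q),
                       lambda p, q: (not p) or (not q), 2)

# Contrapositive: (p -> q) == (not q -> not p), writing a -> b as (not a) or b
contrapositive = equivalent(lambda p, q: (not p) or q,
                            lambda p, q: q or (not p), 2)
```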
This document provides an overview of a lecture on analyzing pathogens using BEAST (Bayesian Evolutionary Analysis Sampling Trees). The lecture covers what makes pathogens special from an evolutionary perspective, an introduction to Bayesian analysis and Markov chain Monte Carlo (MCMC) methods, an overview of the BEAST software package and its components, and a demonstration of running a BEAST analysis. The lecture discusses building phylogenetic trees incorporating sample time data and estimating parameters like substitution rates and population dynamics from molecular sequences using Bayesian methods in BEAST.
The document discusses the history and development of Monte Carlo simulation methods in financial engineering. Some key points:
1) Monte Carlo simulation techniques originated from games of chance and probabilistic concepts in the 17th century. They were later applied to calculating integrals and solving differential equations.
2) In the 1940s/50s, the techniques were developed and applied at Los Alamos National Laboratory, coining the term "Monte Carlo."
3) In the 1970s, Monte Carlo methods became widely used in finance, with Black-Scholes options pricing and models simulating random asset price movements. They allow calculating expected option payoffs and fair values.
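The option-pricing use described in point 3 can be sketched in a few lines (parameters are illustrative; the risk-neutral geometric Brownian motion terminal price is standard, and the Black–Scholes value for these inputs is about 10.45):

```python
import math
import random

random.seed(7)

def mc_call_price(S0, K, r, sigma, T, n_paths):
    # Simulate terminal prices S_T = S0 * exp((r - sigma^2/2) T + sigma sqrt(T) Z)
    # and discount the average European call payoff max(S_T - K, 0).
    drift = (r - 0.5 * sigma ** 2) * T
    vol = sigma * math.sqrt(T)
    payoff_sum = 0.0
    for _ in range(n_paths):
        ST = S0 * math.exp(drift + vol * random.gauss(0.0, 1.0))
        payoff_sum += max(ST - K, 0.0)
    return math.exp(-r * T) * payoff_sum / n_paths

price = mc_call_price(S0=100.0, K=100.0, r=0.05, sigma=0.2, T=1.0,
                      n_paths=200000)
```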
Event link: http://www.meetup.com/NYC-Open-Data/events/161342472/
A free R workshop given by SupStat Inc at New York R user group and NYC Open Data Meetup group
Modeling of players' activity, by Michel Pierfitte, Director of Game Analytics ...
This document discusses key performance indicators (KPIs) for modeling player activity and retention in games. It summarizes different retention curves for lifetime, playtime, and purchasing behavior. Lifetime retention looks at how long players remain active over time while playtime retention examines total active time among long-term players. Revenue models are also presented that break down revenue per user into conversion rate, average payment, and purchasing frequency. Various benchmarks are provided for different game types and a number of important KPIs are highlighted for monitoring player progression and monetization.
This document provides an overview of spectral clustering. It begins with a review of clustering and introduces the similarity graph and graph Laplacian. It then describes the spectral clustering algorithm and interpretations from the perspectives of graph cuts, random walks, and perturbation theory. Practical details like constructing the similarity graph, computing eigenvectors, choosing the number of clusters, and which graph Laplacian to use are also discussed. The document aims to explain the mathematical foundations and intuitions behind spectral clustering.
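The core computation can be sketched compactly (synthetic two-blob data and a Gaussian kernel width chosen for illustration; this uses the unnormalized Laplacian, one of the variants the document compares):

```python
import numpy as np

rng = np.random.default_rng(1)
# Two well-separated 2-D blobs of 20 points each
X = np.vstack([rng.normal(0.0, 0.3, (20, 2)),
               rng.normal(3.0, 0.3, (20, 2))])

# Fully connected similarity graph with a Gaussian kernel
d2 = ((X[:, None, :] - X[None, :, :]) ** 2).sum(axis=-1)
W = np.exp(-d2 / 2.0)
np.fill_diagonal(W, 0.0)

# Unnormalized graph Laplacian L = D - W
L = np.diag(W.sum(axis=1)) - W

# The eigenvector of the second-smallest eigenvalue (the Fiedler vector)
# is nearly piecewise constant and splits the two clusters by sign.
eigvals, eigvecs = np.linalg.eigh(L)
labels = (eigvecs[:, 1] > 0).astype(int)
```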
This document introduces Monte Carlo and Markov chain Monte Carlo methods for numerical integration. These methods generate random points, rather than equally spaced points, to approximate integrals. The error of direct Monte Carlo depends on the square root of the number of points, whereas the error of other methods depends on the number of dimensions. Markov chain Monte Carlo generates a chain of configurations whose distribution matches the desired distribution by means of transition rules.
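The direct Monte Carlo estimator described above is just an average of function values at random points; its error shrinks like 1/sqrt(N) regardless of dimension. A one-dimensional sketch (the integrand is an arbitrary illustrative choice):

```python
import random

random.seed(0)

def mc_integral(f, n):
    # Estimate the integral of f over [0, 1] as the average of f at
    # n uniform random points; the error shrinks like 1/sqrt(n).
    return sum(f(random.random()) for _ in range(n)) / n

est = mc_integral(lambda x: x * x, 100000)   # true value is 1/3
```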
That's like, so random! Monte Carlo for Data Science, by Corey Chivers
1. Monte Carlo simulation can be used to understand obscure statistics, create your own statistics, avoid difficult math, understand inferences from data, and propagate uncertainty in complex models.
2. It allows running 'what if' scenarios, such as understanding how a surge in patients in one hospital unit would propagate to the rest of the hospital.
3. The talk introduces the concept of 'simudidactic', meaning to understand complex systems using randomization and computation to create models of real-world phenomena.
This document discusses error analysis for quasi-Monte Carlo methods. It introduces the trio error identity that decomposes the error into three terms: the variation of the integrand, the discrepancy of the sampling measure from the probability measure, and the alignment between the integrand and the difference between the measures. Several examples are provided to illustrate the identity, including integration over a reproducing kernel Hilbert space. The discrepancy term can be evaluated in O(n^2) operations and converges at different rates depending on the sampling method and properties of the integrand.
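In a much simpler setting than the trio identity, the convergence advantage of low-discrepancy sampling can be demonstrated with a van der Corput sequence (a standard quasi-Monte Carlo point set; the integrand is an illustrative choice, not from the document):

```python
def van_der_corput(n, base=2):
    # Radical-inverse (van der Corput) sequence: a classic
    # low-discrepancy point set on [0, 1).
    points = []
    for i in range(1, n + 1):
        q, bk, x = i, 1.0 / base, 0.0
        while q > 0:
            q, r = divmod(q, base)
            x += r * bk
            bk /= base
        points.append(x)
    return points

n = 4096
pts = van_der_corput(n)
qmc_est = sum(x * x for x in pts) / n        # integrates x^2 over [0, 1]
qmc_err = abs(qmc_est - 1.0 / 3.0)
```

For a smooth integrand like this, the quasi-Monte Carlo error decays close to O(log n / n), much faster than the O(1/sqrt(n)) of plain Monte Carlo.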
This document provides an introduction to Bayesian methods for theory, computation, inference and prediction. It discusses key concepts in Bayesian statistics including the likelihood principle, the likelihood function, Bayes' theorem, and using Markov chain Monte Carlo methods like the Metropolis-Hastings algorithm to perform posterior integration when closed-form solutions are not possible. Examples are provided on using Bayesian regression to model the relationship between salmon body length and egg mass while incorporating prior information. The summary concludes that the Bayesian approach provides a coherent way to quantify uncertainty and make predictions accounting for both aleatory and epistemic sources of variation.
This document provides an introduction to machine learning. It discusses that machine learning focuses on learning about processes in the world rather than just memorizing data. It also covers the main types of machine learning: supervised learning which learns mappings between examples and labels; unsupervised learning which learns structure from unlabeled examples; and reinforcement learning which learns to take actions to maximize rewards. The document explains that machine learning requires representing data as feature vectors and using models with optimization techniques to find parameters that generalize to new data rather than overfitting the training data.
Conditional Image Generation with PixelCNN Decoders
The document summarizes research on conditional image generation using PixelCNN decoders. It discusses how PixelCNNs sequentially predict pixel values rather than the whole image at once. Previous work used PixelRNNs, but these were slow to train. The proposed approach uses a Gated PixelCNN that removes blind spots in the receptive field by combining horizontal and vertical feature maps. It also conditions PixelCNN layers on class labels or embeddings to generate conditional images. Experimental results show the Gated PixelCNN outperforms PixelCNN and achieves performance close to PixelRNN on CIFAR-10 and ImageNet, while training faster. It can also generate portraits conditioned on embeddings of people.
Improving Variational Inference with Inverse Autoregressive Flow, by Tatsuya Shirakawa
These slides were created for a NIPS 2016 study meetup.
IAF and other related research are briefly explained.
paper:
Diederik P. Kingma et al., "Improving Variational Inference with Inverse Autoregressive Flow", 2016
https://papers.nips.cc/paper/6581-improving-variational-autoencoders-with-inverse-autoregressive-flow
Monte Carlo simulation is a statistical technique that uses random numbers and probability to simulate real-world processes. It was developed in the 1940s by scientists working on nuclear weapons research. Monte Carlo simulation provides approximate solutions to problems by running simulations many times. It allows for sensitivity analysis and scenario analysis. Some examples include estimating pi by randomly generating points within a circle, and approximating integrals by treating the area under a curve as a target for random darts. The technique provides probabilistic results and allows modeling of correlated inputs.
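The pi-estimation example mentioned above fits in a few lines (a standard textbook sketch, not taken from the document):

```python
import random

random.seed(123)

n = 100000
# A point (x, y) uniform on the unit square lands inside the quarter
# circle x^2 + y^2 <= 1 with probability pi/4.
inside = sum(1 for _ in range(n)
             if random.random() ** 2 + random.random() ** 2 <= 1.0)
pi_est = 4.0 * inside / n
```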
Von Neumann worked on cellular automata in the late 1940s and 1950s as an abstraction of self-replication. His ideas about the propagation of information from parent cells to later cycles of a cellular automaton could offer explanations for the geometry of the space-time grid, the limit on the speed of light, Heisenberg's uncertainty principle, the principles of quantum theory and relativity, the elementary particles of physics, why the universe is expanding, and more. If this simple mechanism could explain so many things, why did it not become a prominent field of research? The answer is simple but quite unexpected: one application of Von Neumann's postwar work on cellular automata was cryptography, so his results were classified and are still kept top secret by the US government. It is time, however, to look at this subject from a different point of view: can this mechanism be used to explain the physical universe? We are more interested in the secret of existence than in the encryption and decryption of text or data: where did all these galaxies, stars, and cosmological objects come from, when did it start, and what was it like at the beginning of time and space?
This document provides a summary of key developments in the foundations of quantum mechanics. It discusses Planck's discovery that led to defining Planck's constant h, which established that energy is quantized. Einstein's work on the photoelectric effect supported this and introduced the photon concept. Bohr used classical mechanics and energy quantization to develop his model of the hydrogen atom. The document outlines the revolutionary changes brought by quantum theory and its greater scope and applicability compared to classical physics. It provides context for understanding quantum mechanics from first principles.
1. The document introduces quantum mechanics and its importance in describing phenomena at the nanoscale and for systems where classical mechanics fails, such as atoms and molecules.
2. It discusses how quantum mechanics was developed due to failures of classical mechanics and outlines some early discoveries that contributed to quantum mechanics, such as Planck's blackbody radiation law and Bohr's model of the hydrogen atom.
3. The document focuses on energy quantization in quantum systems and uses the example of the quantized emission spectrum of hydrogen atoms to illustrate this phenomenon of discrete energy levels.
A Short Introduction to Quantum Information and Quantum Computation
This document provides an introduction to the field of quantum information and quantum computation. It discusses how quantum information builds upon fundamental principles of quantum mechanics, such as quantum superposition and entanglement, which allow quantum systems to encode and process information in ways not possible classically. Specifically, it introduces the concept of a quantum bit (qubit) which can represent superpositions of 0 and 1, exponential scaling of information with entangled states, and algorithms like Shor's algorithm that achieve an exponential speedup over classical computers. The document serves as an overview of the novel possibilities opened up by manipulating and observing individual quantum objects.
Eden by Wire: Webcameras and the Telepresent Landscape, by Thomas J. Campanella
Eden by Wire: Webcameras and the Telepresent Landscape
THOMAS J. CAMPANELLA
Thomas J. Campanella received his Ph.D. from the Department of Urban Studies and Planning at the Massachusetts Institute of Technology. Presently he is an Assistant Professor in the Department of City and Regional Planning at the University of North Carolina at Chapel Hill, where he teaches courses in the Theory and Practice of Urban Design, Making the American Urban Landscape, and Site Planning and Sustainable Design, among others. His most recent books include The Resilient City: How Modern Cities Recover from Disaster (2004) [co-authored with Lawrence J. Vale], Republic of Shade: New England and the American Elm (2003), and Cities from the Sky: An Aerial Portrait of America (2001). The following is an excerpt from Dr. Campanella's article "Eden by Wire," which originally appeared in The Robot in the Garden: Telerobotics and Telepistemology in the Age of the Internet (2000), edited by Ken Goldberg and published by MIT Press.

Getting Started
Have you ever looked up and noticed all around you—in and on buildings, on streets, in stores, banks, and lobbies—cameras or webcameras? Campanella refers to these cameras as "a set of wired eyes, a digital extension of the human faculty of vision.' But what are these cameras looking at? Are they necessary for our security or are they an intrusion into our lives? Who is watching us and what records are they keeping for what purposes? What do these webcameras mean for privacy and individual rights? How important are these cameras in an age of terrorism and fear? How do you feel about constantly being watched?
Hello, and welcome to my webcam; it points out of my window here in Cambridge, and looks toward the centre of town. Wake up to find out that you are the eyes of the World.
The sun never sets on the cyberspatial empire; somewhere on the globe, at any hour, an electronic retina is receiving light, converting sunbeams into a stream of ones and zeros. Since the popularization of the Internet several years ago, hundreds of "webcameras" have gone live, a globe-spanning matrix of electro-optical devices serving images to the World Wide Web. The scenes they afford range from the sublime to the ridiculous, from toilets to the Statue of Liberty. Among the most compelling are those webcameras trained on urban and rural landscapes, which enable the remote observation of distant outdoor scenes in real or close to real time. Webcameras indeed constitute something of a grassroots global telepresence project. William J. Mitchell has described the Internet as "a worldwide, time-zone-spanning optic nerve with electronic eyeballs at its endpoints." Webcameras are those eyeballs. If the Internet and World Wide Web represent the augmentation of collective memory, then webcameras are a set of wired eyes, a digital extension of the human faculty of vision.
Before the advent of webcameras, the synchronous observation of ...
Eden by Wire: Webcameras and the Telepresent Landscape, by Thomas J. Campanella
This document summarizes Thomas Campanella's article "Eden by Wire: Webcameras and the Telepresent Landscape". It discusses how webcameras have enabled remote observation of distant places in real-time over the internet, shrinking the vastness of geographic distance. Webcameras represent an expansion of our personal space-time envelope by allowing us to view hundreds of destinations at any hour. While not a perfect substitute for direct experience, webcameras introduce a degree of physical sight and presence into the otherwise disembodied digital world, making us remotely present in faraway places.
This document provides an overview of quantum computers, including their advantages over classical computers, key concepts like superposition and entanglement, and the history and current state of quantum computing research and development. It discusses how quantum computers work using quantum bits rather than binary bits to store information, and how companies like D-Wave are developing quantum processors. The timeline details major advances, from early theoretical work in the 1970s-1980s to experimental demonstrations of quantum gates and algorithms in the 1990s-2000s to current multi-qubit systems being researched.
This is a short presentation for a 15-minute talk at Bayesian Inference for Stochastic Processes 7, on the SMC^2 algorithm.
http://arxiv.org/abs/1101.1528
The 2022 Nobel Prize in Physics was awarded to Alain Aspect, John Clauser, and Anton Zeilinger for their groundbreaking experimental work on quantum entanglement and violations of Bell's inequalities. John Clauser performed the first conclusive experiment in 1972 showing violations of Bell's inequalities. Alain Aspect then designed experiments in the 1980s enforcing stricter locality conditions. Anton Zeilinger demonstrated quantum teleportation in 1997 and performed another key Bell violation experiment in 1998. Together, their work confirmed the predictions of quantum mechanics and ruled out local hidden variable theories, resolving a decades-long debate between Einstein and Bohr. This established the foundations for the rapidly growing field of quantum information science.
1. Quantum entanglement describes a phenomenon where two quantum particles interact in such a way that they become linked regardless of distance, so that measuring one particle instantly affects the state of the other.
2. Einstein was critical of quantum mechanics and its implications of "spooky action at a distance," which led to the development of experiments to test theories of quantum entanglement.
3. Repeated experiments confirmed the existence of quantum entanglement and disproved Einstein's theories, showing that entangled particles are truly linked regardless of distance.
This document discusses differentially private distributed Bayesian linear regression with Markov chain Monte Carlo (MCMC) methods. It proposes adding noise to the summaries (S) and coefficients (z) of local linear regression models on different devices to provide differential privacy. Gibbs sampling is used to simulate the genuine posterior distribution over the linear model parameters (theta, sigma_y, Sigma_x, z1:J, S1:J) in a distributed manner while maintaining privacy. Alternative approaches like exploiting approximate posteriors from all devices or learning iteratively are also mentioned.
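The private, distributed scheme in the paper is specific, but the Gibbs sampling machinery it relies on can be illustrated generically (this sketch is not the paper's algorithm): each variable is drawn in turn from its full conditional given the others. For a bivariate normal with correlation rho, both full conditionals are known in closed form.

```python
import random

random.seed(5)

def gibbs_bivariate_normal(n, rho):
    # Alternately draw each coordinate from its full conditional:
    # x | y ~ N(rho * y, 1 - rho^2), and symmetrically for y | x.
    sd = (1.0 - rho * rho) ** 0.5
    x = y = 0.0
    draws = []
    for _ in range(n):
        x = random.gauss(rho * y, sd)
        y = random.gauss(rho * x, sd)
        draws.append((x, y))
    return draws

draws = gibbs_bivariate_normal(20000, rho=0.8)
m = len(draws)
mean_x = sum(x for x, _ in draws) / m
cov = sum((x - mean_x) * (y - sum(b for _, b in draws) / m)
          for x, y in draws) / m
var_x = sum((x - mean_x) ** 2 for x, _ in draws) / m
var_y = sum((y - sum(b for _, b in draws) / m) ** 2 for _, y in draws) / m
corr = cov / (var_x * var_y) ** 0.5
```

The empirical correlation of the chain recovers the target's rho, confirming the sampler simulates the joint distribution despite only ever using one-dimensional conditionals.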
This document discusses mixture models and approximations to computing model evidence. It contains:
1) An overview of mixtures of distributions and common priors used for mixtures.
2) Approximations to computing marginal likelihoods or model evidence using Chib's representation and Rao-Blackwellization. Permutations are used to address label switching issues.
3) Methods for more efficient sampling for computing model evidence, including iterative bridge sampling and dual importance sampling with approximations to reduce the number of permutations considered.
Sequential Monte Carlo is also briefly mentioned as an alternative approach.
This document describes the adaptive Restore algorithm, a non-reversible Markov chain Monte Carlo method. It begins with an overview of the Restore process, which uses regenerations from an underlying diffusion or jump process to construct a Markov chain with a given target distribution. The adaptive Restore process enriches this by allowing the regeneration distribution to adapt over time; it converges almost surely to the minimal regeneration distribution. Parameters such as the initial regeneration distribution and regeneration rates are discussed. Examples are provided for the adaptive Brownian Restore algorithm and for calibrating its parameters.
This document summarizes techniques for approximating marginal likelihoods and Bayes factors, which are important quantities in Bayesian inference. It discusses Geyer's 1994 logistic regression approach, links to bridge sampling, and how mixtures can be used as importance sampling proposals. Specifically, it shows how optimizing the logistic pseudo-likelihood relates to the bridge sampling optimal estimator. It also discusses non-parametric maximum likelihood estimation based on simulations.
This document discusses Bayesian restricted likelihood methods for situations where the likelihood cannot be fully trusted. It presents several approaches including empirical likelihood, Bayesian empirical likelihood, using insufficient statistics, approximate Bayesian computation (ABC), and MCMC on manifolds. The key ideas are developing Bayesian tools that are robust to model misspecification by questioning the likelihood, prior, and other assumptions.
This document discusses various methods for approximating marginal likelihoods and Bayes factors, including:
1. Geyer's 1994 logistic regression approach for approximating marginal likelihoods using importance sampling.
2. Bridge sampling and its connection to Geyer's approach. Optimal bridge sampling requires knowledge of unknown normalizing constants.
3. Using mixtures of importance distributions and the target distribution as proposals to estimate marginal likelihoods through Rao-Blackwellization. This connects to bridge sampling estimates.
4. The document discusses various methods for approximating marginal likelihoods and comparing hypotheses using Bayes factors. It outlines the historical development and connections between different approximation techniques.
1. The document discusses approximate Bayesian computation (ABC), a technique used when the likelihood function is intractable. ABC works by simulating parameters from the prior and simulating data, rejecting simulations that are not close to the observed data based on a tolerance level.
2. Random forests can be used in ABC to select informative summary statistics from a large set of possibilities and estimate parameters. The random forests classify simulations as accepted or rejected based on the summaries, implicitly selecting important summaries.
3. Calibrating the tolerance level in ABC is important but difficult, as it determines how close simulations must be to the observed data. Methods discussed include using quantiles of prior predictive simulations or asymptotic convergence properties.
The document summarizes Approximate Bayesian Computation (ABC). It discusses how ABC provides a way to approximate Bayesian inference when the likelihood function is intractable or too computationally expensive to evaluate directly. ABC works by simulating data under different parameter values and accepting simulations that are close to the observed data according to a distance measure and tolerance level. Key points discussed include:
- ABC provides an approximation to the posterior distribution by sampling from simulations that fall within a tolerance of the observed data.
- Summary statistics are often used to reduce the dimension of the data and improve the signal-to-noise ratio when applying the tolerance criterion.
- Random forests can help select informative summary statistics and provide semi-automated ABC
This document describes a new method called component-wise approximate Bayesian computation (ABCG or ABC-Gibbs) that combines approximate Bayesian computation (ABC) with Gibbs sampling. ABCG aims to more efficiently explore parameter spaces when the number of parameters is large. It works by alternately sampling each parameter from its ABC-approximated conditional distribution given current values of other parameters. The document provides theoretical analysis showing ABCG converges to a stationary distribution under certain conditions. It also presents examples demonstrating ABCG can better separate estimates from the prior compared to simple ABC, especially for hierarchical models.
ABC stands for approximate Bayesian computation. It is a method for performing Bayesian inference when the likelihood function is intractable or impossible to evaluate directly. ABC produces samples from an approximate posterior distribution by simulating parameter and summary statistic values that match the observed summary statistics within a tolerance level. The choice of summary statistics is important but difficult, as there is typically no sufficient statistic. Several strategies have been developed for selecting good summary statistics, including using random forests or the Lasso to evaluate and select from a large set of potential summaries.
The document describes a new method called component-wise approximate Bayesian computation (ABC) that combines ABC with Gibbs sampling. It aims to improve ABC's ability to efficiently explore parameter spaces when the number of parameters is large. The method works by alternating sampling from each parameter's ABC posterior conditional distribution given current values of other parameters and the observed data. The method is proven to converge to a stationary distribution under certain assumptions, especially for hierarchical models where conditional distributions are often simplified. Numerical experiments on toy examples demonstrate the method can provide a better approximation of the true posterior than vanilla ABC.
1) Likelihood-free Bayesian experimental design is discussed as an intractable likelihood optimization problem, where the goal is to find the optimal design d that minimizes expected loss without using the full posterior distribution.
2) Several Bayesian tools are proposed to make the design problem more Bayesian, including Bayesian non-parametrics, annealing algorithms, and placing a posterior on the design d.
3) Gaussian processes are a default modeling choice for complex unknown functions in these problems, but their accuracy is difficult to assess and they may incur a dimension curse.
1. A Short History of Markov Chain Monte Carlo: Subjective Recollections from Incomplete Data
A Short History of Markov Chain Monte Carlo:
Subjective Recollections from Incomplete Data
Christian P. Robert and George Casella
Université Paris-Dauphine, IUF, & CREST
and University of Florida
April 2, 2011
2. In memoriam, Julian Besag, 1945–2010
4. Introduction
Markov Chain Monte Carlo (MCMC) methods have been around for almost
as long as Monte Carlo techniques, even though their impact on
Statistics was not truly felt until the late 1980s / early 1990s.
Contents: distinction between Metropolis-Hastings-based
algorithms and those related to Gibbs sampling, with a brief entry
into the "second-generation MCMC revolution".
6. Introduction
A few landmarks
Realization that Markov chains could be used in a wide variety of
situations only came to “mainstream statisticians” with Gelfand
and Smith (1990) despite earlier publications in the statistical
literature like Hastings (1970) and growing awareness in spatial
statistics (Besag, 1986)
Several reasons:
lack of computing machinery
lack of background on Markov chains
lack of trust in the practicality of the method
7. Before the revolution
Los Alamos
Bombs before the revolution
Monte Carlo methods were born in Los Alamos, New Mexico, during
WWII, mostly among physicists working on atomic bombs, eventually
producing the Metropolis algorithm in the early 1950s.
[Metropolis, Rosenbluth, Rosenbluth, Teller and Teller, 1953]
9. Monte Carlo genesis
Monte Carlo method usually traced to Ulam and von Neumann:
Stanislaw Ulam associates the idea with an intractable
combinatorial computation attempted in 1946 about "solitaire"
idea enthusiastically adopted by John von Neumann for
implementation on neutron diffusion
name "Monte Carlo" suggested by Nicholas Metropolis
[Eckhardt, 1987]
14. Monte Carlo with computers
Very close "coincidence" with the appearance of the very first
computer, ENIAC, born Feb. 1946, on which von Neumann
implemented Monte Carlo in 1947
The same year, Ulam and von Neumann (re)invented inversion and
accept-reject techniques
In 1949, very first symposium on Monte Carlo and very first paper
[Metropolis and Ulam, 1949]
15. Before the revolution
Metropolis et al., 1953
The Metropolis et al. (1953) paper
Very first MCMC algorithm, associated with the second computer,
MANIAC, at Los Alamos, early 1952.
Besides Metropolis, Arianna W. Rosenbluth, Marshall N. Rosenbluth,
Augusta H. Teller, and Edward Teller contributed to the creation of
the Metropolis algorithm...
16. Motivating problem
Computation of integrals of the form

I = \frac{\int F(p,q)\, \exp\{-E(p,q)/kT\}\, dp\, dq}{\int \exp\{-E(p,q)/kT\}\, dp\, dq},

with energy E defined as

E(p,q) = \frac{1}{2} \sum_{i=1}^{N} \sum_{\substack{j=1 \\ j \neq i}}^{N} V(d_{ij}),

with N the number of particles, V a potential function, and d_{ij}
the distance between particles i and j.
17. Boltzmann distribution
Boltzmann distribution \exp\{-E(p,q)/kT\}, parameterised by the
temperature T, k being the Boltzmann constant, with normalisation
factor

Z(T) = \int \exp\{-E(p,q)/kT\}\, dp\, dq

not available in closed form.
18. Computational challenge
Since p and q are 2N -dimensional vectors, numerical integration is
impossible
Plus, standard Monte Carlo techniques fail to correctly
approximate I: exp{−E(p, q)/kT } is very small for most
realizations of random configurations (p, q) of the particle system.
19. Metropolis algorithm
Consider a random walk modification of the N particles: for each
1 ≤ i ≤ N, values

x_i' = x_i + \alpha \xi_{1i} \quad \text{and} \quad y_i' = y_i + \alpha \xi_{2i}

are proposed, where both \xi_{1i} and \xi_{2i} are uniform U(−1, 1).
The energy difference between the new and the previous configuration
is \Delta E, and the new configuration is accepted with probability

1 \wedge \exp\{-\Delta E / kT\},

and otherwise the previous configuration is replicated*

* counting one more time in the average of the F(p_t, q_t)'s over the
τ moves of the random walk.
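The update rule above translates directly into code. A minimal one-dimensional sketch, with an illustrative energy function and step size rather than the original N-particle system:

```python
import math
import random

def metropolis(energy, x0, alpha, n_steps, kT=1.0, seed=0):
    """Random-walk Metropolis: propose x' = x + alpha * xi, xi ~ U(-1, 1),
    accept with probability min(1, exp(-(E(x') - E(x)) / kT))."""
    rng = random.Random(seed)
    x, e = x0, energy(x0)
    chain = []
    for _ in range(n_steps):
        x_new = x + alpha * rng.uniform(-1.0, 1.0)
        e_new = energy(x_new)
        # Downhill moves are always accepted, uphill moves with
        # probability exp(-dE/kT); on rejection the previous state is
        # replicated, i.e. counted one more time in the average.
        if e_new <= e or rng.random() < math.exp(-(e_new - e) / kT):
            x, e = x_new, e_new
        chain.append(x)
    return chain

# Toy target: Boltzmann weight exp(-E(x)) with E(x) = x**2 / 2,
# i.e. a standard normal distribution.
chain = metropolis(lambda x: 0.5 * x * x, x0=0.0, alpha=1.0, n_steps=50_000)
mean = sum(chain) / len(chain)
var = sum((c - mean) ** 2 for c in chain) / len(chain)
```

With this toy energy the empirical mean and variance of the chain approach 0 and 1, the moments of the standard normal target.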
22. Convergence
Validity of the algorithm established by proving
1. irreducibility
2. ergodicity, that is convergence to the stationary distribution.
Second part obtained via discretization of the space: Metropolis et
al. note that the proposal is reversible, then establish that
exp{−E/kT } is invariant.
Application to the specific problem of the rigid-sphere collision
model. The number of iterations of the Metropolis algorithm
seems to be limited: 16 steps for burn-in and 48 to 64 subsequent
iterations (that still required four to five hours on the Los Alamos
MANIAC).
23. Physics and chemistry
The method of Markov chain Monte Carlo immediately
had wide use in physics and chemistry.
[Geyer & Thompson, 1992]
Hammersley and Handscomb, 1967
Piekaar and Clarenburg, 1967
Kennedy and Kutil, 1985
Sokal, 1989
&tc...
24. Physics and chemistry
Statistics has always been fuelled by energetic mining of
the physics literature.
[Clifford, 1993]
26. Hastings, 1970
A fair generalisation
In Biometrika 1970, Hastings defines MCMC methodology for
finite and reversible Markov chains, the continuous case being
discretised:
Generic acceptance probability for a move from state i to state j is

\alpha_{ij} = \frac{s_{ij}}{1 + \dfrac{\pi_i\, q_{ij}}{\pi_j\, q_{ji}}},

where s_{ij} is a symmetric function.
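A small numeric sketch of this generic form, showing how standard choices of the symmetric function s_ij recover Barker's and Metropolis's acceptance probabilities (the function names and the example values of π and q are illustrative):

```python
def hastings_alpha(s, pi_i, pi_j, q_ij, q_ji):
    """Hastings's generic acceptance alpha_ij = s_ij / (1 + t), with
    t = (pi_i * q_ij) / (pi_j * q_ji) and s a symmetric function of (i, j)."""
    t = (pi_i * q_ij) / (pi_j * q_ji)
    return s(t) / (1.0 + t)

# Barker (1965): s_ij = 1, so alpha_ij = pi_j*q_ji / (pi_i*q_ij + pi_j*q_ji).
barker = lambda t: 1.0

# Metropolis et al. (1953): s_ij = 1 + min(t, 1/t), symmetric since
# t_ji = 1/t_ij, which recovers alpha_ij = min(1, pi_j*q_ji / (pi_i*q_ij)).
metropolis = lambda t: 1.0 + min(t, 1.0 / t)

# Symmetric proposal, pi_i = 2 * pi_j, hence t = 2:
a_barker = hastings_alpha(barker, 2.0, 1.0, 0.5, 0.5)          # 1/3
a_metropolis = hastings_alpha(metropolis, 2.0, 1.0, 0.5, 0.5)  # 1/2
```

In this example the Metropolis choice yields the larger acceptance probability, consistent with Peskun's later ordering result.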
27. State of the art
Note
Generic form that encompasses both Metropolis et al. (1953) and
Barker (1965).
Peskun's ordering not yet discovered: Hastings mentions that little
is known about the relative merits of those two choices, even
though Metropolis's method may be preferable.
Warning against high rejection rates as indicative of a poor choice
of transition matrix, but no mention of the opposite pitfall of low
rejection rates.
28. What else?!
Items included in the paper are
a Poisson target with a ±1 random walk proposal;
a normal target with a uniform random walk proposal mixed with its
reflection (i.e. centered at −X(t) rather than X(t));
a multivariate target where Hastings introduces Gibbs sampling,
updating one component at a time and defining the composed
transition as satisfying the stationarity condition because each
component leaves the target invariant;
a reference to Erhman, Fosdick and Handscomb (1960) as a
preliminary if specific instance of this Metropolis-within-Gibbs
sampler;
an importance sampling version of MCMC;
some remarks about error assessment; and
a Gibbs sampler for random orthogonal matrices.
30. Three years later
Peskun (1973) compares Metropolis’ and Barker’s acceptance
probabilities and shows (again in a discrete setup) that Metropolis’
is optimal (in terms of the asymptotic variance of any empirical
average).
Proof direct consequence of Kemeny and Snell (1960) on
asymptotic variance. Peskun also establishes that this variance can
improve upon the iid case if and only if the eigenvalues of P − A
are all negative, when A is the transition matrix corresponding to
the iid simulation and P the transition matrix corresponding to the
Metropolis algorithm, but he concludes that the trace of P − A is
always positive.
32. Julian's early works (1)
Early 1970’s, Hammersley, Clifford, and Besag were working on the
specification of joint distributions from conditional distributions
and on necessary and sufficient conditions for the conditional
distributions to be compatible with a joint distribution.
[Hammersley and Clifford, 1971]
33. Julian's early works (1)
What is the most general form of the conditional probability
functions that define a coherent joint function? And what will the
joint look like?
[Besag, 1972]
34. Hammersley-Clifford theorem
Theorem (Hammersley-Clifford)
The joint distribution of a vector associated with a dependence
graph must be represented as a product of functions over the cliques
of the graph, i.e., of functions depending only on the components
indexed by the labels in the clique.
[Cressie, 1993; Lauritzen, 1996]
35. Hammersley-Clifford theorem
Theorem (Hammersley-Clifford)
A probability distribution P with positive and continuous density f
satisfies the pairwise Markov property with respect to an undirected
graph G if and only if it factorizes according to G, i.e., (F) ≡ (G)
[Cressie, 1993; Lauritzen, 1996]
36. Hammersley-Clifford theorem
Theorem (Hammersley-Clifford)
Under the positivity condition, the joint distribution g satisfies

g(y_1, \ldots, y_p) \propto \prod_{j=1}^{p} \frac{g_j(y_j \mid y_1, \ldots, y_{j-1}, y_{j+1}', \ldots, y_p')}{g_j(y_j' \mid y_1, \ldots, y_{j-1}, y_{j+1}', \ldots, y_p')}

for every permutation on {1, 2, . . . , p} and every y' ∈ Y.
[Cressie, 1993; Lauritzen, 1996]
37. An apocryphal theorem
The Hammersley-Clifford theorem was never published by its
authors, but only through Grimmett (1973), Preston (1973),
Sherman (1973), and Besag (1974). The authors were dissatisfied
with the positivity constraint: the joint density could only be
recovered from the full conditionals when the support of the joint
was the product of the supports of the full conditionals (with
obvious counter-examples). Moussouris's counter-example put a full
stop to their endeavors.
[Hammersley, 1974]
39. To Gibbs or not to Gibbs?
Julian Besag should certainly be credited to a large extent with the
(re?)discovery of the Gibbs sampler.
The simulation procedure is to consider the sites
cyclically and, at each stage, to amend or leave unaltered
the particular site value in question, according to a
probability distribution whose elements depend upon the
current value at neighboring sites (...) However, the
technique is unlikely to be particularly helpful in many
other than binary situations and the Markov chain itself
has no practical interpretation.
[Besag, 1974]
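The cyclic site-updating scheme in this quote is what later became known as the Gibbs sampler. A minimal sketch on a toy bivariate normal target rather than Besag's binary lattice setting (all names and the choice of target are illustrative):

```python
import math
import random

def gibbs_bivariate_normal(rho, n_steps, seed=0):
    """Systematic-scan Gibbs sampler for a standard bivariate normal with
    correlation rho: each full conditional is N(rho * other, 1 - rho**2)."""
    rng = random.Random(seed)
    x = y = 0.0
    sd = math.sqrt(1.0 - rho * rho)
    samples = []
    for _ in range(n_steps):
        # Visit the two "sites" cyclically, amending each one according to
        # a distribution that depends on the current value at the other site.
        x = rng.gauss(rho * y, sd)
        y = rng.gauss(rho * x, sd)
        samples.append((x, y))
    return samples

samples = gibbs_bivariate_normal(rho=0.7, n_steps=50_000)
mean_x = sum(x for x, _ in samples) / len(samples)
corr = sum(x * y for x, y in samples) / len(samples)  # estimates rho
```

Only full conditionals are ever simulated, yet the chain's empirical moments recover those of the joint target.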
40. Broader perspective
In 1964, Hammersley and Handscomb wrote a (the first?)
textbook on Monte Carlo methods, covering such topics as
"Crude Monte Carlo";
importance sampling;
control variates; and
"Conditional Monte Carlo", which looks surprisingly like a
missing-data Gibbs completion approach.
They state in the Preface:
We are convinced nevertheless that Monte Carlo methods
will one day reach an impressive maturity.
41. Clicking in
After Peskun (1973), MCMC mostly dormant in mainstream
statistical world for about 10 years, then several papers/books
highlighted its usefulness in specific settings:
Geman and Geman (1984)
Besag (1986)
Strauss (1986)
Ripley (Stochastic Simulation, 1987)
Tanner and Wong (1987)
Younes (1988)
42. Enters the Gibbs sampler
Geman and Geman (1984), building on Metropolis et al. (1953),
Hastings (1970), and Peskun (1973), constructed a Gibbs sampler
for optimisation in a discrete image processing problem without
completion.
Responsible for the name Gibbs sampling, because the method was
used for the Bayesian study of Gibbs random fields, linked to the
physicist Josiah Willard Gibbs (1839–1903)
Back to Metropolis et al., 1953: the Gibbs sampler is used as a
simulated annealing algorithm and ergodicity is proven on the
collection of global maxima
43. A Short History of Markov Chain Monte Carlo: Subjective Recollections from Incomplete Data
Before the revolution
Julian’s early works
Besag (1986) integrates the Gibbs sampler (GS) for simulated annealing (SA)...
...easy to construct the transition matrix Q, of a discrete
time Markov chain, with state space Ω and limit
distribution (4). Simulated annealing proceeds by
running an associated time inhomogeneous Markov chain
with transition matrices QT , where T is progressively
decreased according to a prescribed “schedule” to a value
close to zero.
[Besag, 1986]
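The annealing scheme Besag describes can be made concrete. Below is a minimal, hypothetical sketch (my own illustration, not from the slides) of simulated annealing with single-site heat-bath (Gibbs-style) updates on a small one-dimensional Ising-type energy, where the temperature T is lowered according to a geometric schedule, in the spirit of the time-inhomogeneous chain with transition matrices QT above.

```python
import math
import random

def anneal_ising_chain(n=30, sweeps=300, t_start=2.0, t_end=0.05, seed=1):
    """Simulated annealing with heat-bath (Gibbs-style) single-site updates
    on a 1-D Ising chain with energy E(x) = -sum_i x[i] * x[i+1]."""
    rng = random.Random(seed)
    x = [rng.choice([-1, 1]) for _ in range(n)]
    ratio = (t_end / t_start) ** (1.0 / (sweeps - 1))  # geometric cooling schedule
    temp = t_start
    for _ in range(sweeps):
        for i in range(n):
            # local field = sum of the neighbouring spins
            h = (x[i - 1] if i > 0 else 0) + (x[i + 1] if i < n - 1 else 0)
            # heat-bath step: sample x[i] from its full conditional at temperature T
            p_plus = 1.0 / (1.0 + math.exp(-2.0 * h / temp))
            x[i] = 1 if rng.random() < p_plus else -1
        temp *= ratio
    energy = -sum(x[i] * x[i + 1] for i in range(n - 1))
    return x, energy

x, energy = anneal_ising_chain()
print(energy)  # close to the ground-state energy -(n-1) = -29
```

As T approaches zero the conditional draws become nearly deterministic, so the chain settles into a low-energy configuration, which is the optimisation use of the Gibbs sampler described above.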
44. A Short History of Markov Chain Monte Carlo: Subjective Recollections from Incomplete Data
Before the revolution
Julian’s early works
...and links with Metropolis-Hastings...
There are various related methods of constructing a
manageable QT (Hastings, 1970). Geman and Geman
(1984) adopt the simplest, which they term the ”Gibbs
sampler” (...) time reversibility, a common ingredient in
this type of problem (see, for example, Besag, 1977a), is
present at individual stages but not over complete cycles,
though Peter Green has pointed out that it returns if QT
is taken over a pair of cycles, the second of which visits
pixels in reverse order
[Besag, 1986]
45. A Short History of Markov Chain Monte Carlo: Subjective Recollections from Incomplete Data
Before the revolution
Julian’s early works
...seeing the larger picture,...
As Geman and Geman (1984) point out, any property of
the (posterior) distribution P (x|y) can be simulated by
running the Gibbs sampler at “temperature” T = 1.
Thus, if x̂i maximizes P(xi |y), then it is the most
frequently occurring colour at pixel i in an infinite
realization of the Markov chain with transition matrix Q
of Section 2.3. The x̂i ’s can therefore be simultaneously
estimated from a single finite realization of the chain. It
is not yet clear how long the realization needs to be,
particularly for estimation near colour boundaries, but the
amount of computation required is generally prohibitive
for routine purposes
[Besag, 1986]
46. A Short History of Markov Chain Monte Carlo: Subjective Recollections from Incomplete Data
Before the revolution
Julian’s early works
...seeing the larger picture,...
P (x|y) can be simulated using the Gibbs sampler, as
suggested by Grenander (1983) and by Geman and
Geman (1984). My dismissal of such an approach for
routine applications was somewhat cavalier:
purpose-built array processors could become relatively
inexpensive (...) suppose that, for 100 complete cycles
say, images have been collected from the Gibbs sampler
(or by Metropolis’ method), following a “settling-in”
period of perhaps another 100 cycles, which should cater
for fairly intricate priors (...) These 100 images should
often be adequate for estimating properties of the
posterior (...) and for making approximate associated
confidence statements, as mentioned by Mr Haslett.
[Besag, 1986]
47. A Short History of Markov Chain Monte Carlo: Subjective Recollections from Incomplete Data
Before the revolution
Julian’s early works
...if not going fully Bayes!
...a neater and more efficient procedure [for parameter
estimation] is to adopt maximum ”pseudo-likelihood”
estimation (Besag, 1975)
48. A Short History of Markov Chain Monte Carlo: Subjective Recollections from Incomplete Data
Before the revolution
Julian’s early works
...if not going fully Bayes!
...a neater and more efficient procedure [for parameter
estimation] is to adopt maximum ”pseudo-likelihood”
estimation (Besag, 1975)
I have become increasingly enamoured with the Bayesian
paradigm
[Besag, 1986]
49. A Short History of Markov Chain Monte Carlo: Subjective Recollections from Incomplete Data
Before the revolution
Julian’s early works
...if not going fully Bayes!
...a neater and more efficient procedure [for parameter
estimation] is to adopt maximum ”pseudo-likelihood”
estimation (Besag, 1975)
I have become increasingly enamoured with the Bayesian
paradigm
[Besag, 1986]
The pair (xi , βi ) is then a (bivariate) Markov field and
can be reconstructed as a bivariate process by the
methods described in Professor Besag’s paper.
[Clifford, 1986]
50. A Short History of Markov Chain Monte Carlo: Subjective Recollections from Incomplete Data
Before the revolution
Julian’s early works
...if not going fully Bayes!
...a neater and more efficient procedure [for parameter
estimation] is to adopt maximum ”pseudo-likelihood”
estimation (Besag, 1975)
I have become increasingly enamoured with the Bayesian
paradigm
[Besag, 1986]
The simulation-based estimator Epost Ψ(X) will differ
from the m.a.p. estimator Ψ̂(x).
[Silverman, 1986]
51. A Short History of Markov Chain Monte Carlo: Subjective Recollections from Incomplete Data
Before the revolution
Julian’s early works
Discussants of Besag (1986)
Impressive who’s who: D.M. Titterington, P. Clifford, P. Green, P.
Brown, B. Silverman, F. Critchley, F. Kelly, K. Mardia, C.
Jennison, J. Kent, D. Spiegelhalter, H. Wynn, D. and S. Geman, J.
Haslett, J. Kay, H. Künsch, P. Switzer, B. Torsney, &tc
52. A Short History of Markov Chain Monte Carlo: Subjective Recollections from Incomplete Data
Before the revolution
Julian’s early works
A comment on Besag (1986)
While special purpose algorithms will determine the
utility of the Bayesian methods, the general purpose
methods-stochastic relaxation and simulation of solutions
of the Langevin equation (Grenander, 1983; Geman and
Geman, 1984; Gidas, 1985a; Geman and Hwang, 1986)
have proven enormously convenient and versatile. We are
able to apply a single computer program to every new
problem by merely changing the subroutine that
computes the energy function in the Gibbs representation
of the posterior distribution.
[Geman and McClure, 1986]
53. A Short History of Markov Chain Monte Carlo: Subjective Recollections from Incomplete Data
Before the revolution
Julian’s early works
Another one
It is easy to compute exact marginal and joint posterior
probabilities of currently unobserved features, conditional
on those clinical findings currently available
(Spiegelhalter, 1986a,b), the updating taking the form of
‘propagating evidence’ through the network (...) it would
be interesting to see if the techniques described tonight,
which are of intermediate complexity, may have any
applications in this new and exciting area [causal
networks].
[Spiegelhalter, 1986]
54. A Short History of Markov Chain Monte Carlo: Subjective Recollections from Incomplete Data
Before the revolution
Julian’s early works
The candidate’s formula
Representation of the marginal likelihood as
m(x) = π(θ) f(x|θ) / π(θ|x)
or of the marginal predictive as
pn (y′|y) = f(y′|θ) πn (θ|y) / πn+1 (θ|y, y′)
[Besag, 1989]
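The candidate's formula can be verified numerically in a conjugate setting. The sketch below is my own illustration (assuming a Beta(a, b) prior and a Binomial likelihood, which are not in the slides): it confirms that π(θ)f(x|θ)/π(θ|x) yields the same marginal likelihood m(x) whatever the value of θ at which it is evaluated.

```python
import math

def log_beta(a, b):
    """log of the Beta function B(a, b)."""
    return math.lgamma(a) + math.lgamma(b) - math.lgamma(a + b)

def candidate_marginal(k, n, a, b, theta):
    """Evaluate m(x) = pi(theta) f(x|theta) / pi(theta|x) at an arbitrary theta.
    Conjugacy: the posterior is Beta(a + k, b + n - k)."""
    log_prior = (a - 1) * math.log(theta) + (b - 1) * math.log(1 - theta) - log_beta(a, b)
    log_lik = math.log(math.comb(n, k)) + k * math.log(theta) + (n - k) * math.log(1 - theta)
    log_post = ((a + k - 1) * math.log(theta) + (b + n - k - 1) * math.log(1 - theta)
                - log_beta(a + k, b + n - k))
    return math.exp(log_prior + log_lik - log_post)

# exact Beta-Binomial marginal for comparison (uniform prior: m(x) = 1/(n+1))
k, n, a, b = 3, 10, 1.0, 1.0
exact = math.comb(n, k) * math.exp(log_beta(a + k, b + n - k) - log_beta(a, b))
print(candidate_marginal(k, n, a, b, 0.2), candidate_marginal(k, n, a, b, 0.7), exact)
```

The point of the formula is that the left-hand side does not depend on θ, which is what makes it usable as a marginal likelihood estimator once π(θ|x) can be approximated.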
55. A Short History of Markov Chain Monte Carlo: Subjective Recollections from Incomplete Data
Before the revolution
Julian’s early works
The candidate’s formula
Representation of the marginal likelihood as
m(x) = π(θ) f(x|θ) / π(θ|x)
or of the marginal predictive as
pn (y′|y) = f(y′|θ) πn (θ|y) / πn+1 (θ|y, y′)
[Besag, 1989]
Why candidate?
“Equation (2) appeared without explanation in a Durham
University undergraduate final examination script of 1984.
Regrettably, the student’s name is no longer known to me.”
56. A Short History of Markov Chain Monte Carlo: Subjective Recollections from Incomplete Data
Before the revolution
Julian’s early works
Implications
Newton and Raftery (1994) used this representation to derive
the [infamous] harmonic mean approximation to the marginal
likelihood
Gelfand and Dey (1994)
Geyer and Thompson (1995)
Chib (1995)
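The instability behind the "[infamous]" label is easy to see in a toy run. The following sketch is my own illustration (using the Beta-Binomial model with a uniform prior, where the exact marginal likelihood is 1/(n+1)): it computes the harmonic mean estimator from posterior draws, whose infinite variance is the source of its bad reputation.

```python
import math
import random

def harmonic_mean_marginal(k, n, draws=50000, seed=2):
    """Newton & Raftery's harmonic mean estimator of m(x):
    m_hat = 1 / mean( 1 / f(x|theta_i) ), with theta_i ~ posterior.
    Uniform prior => posterior is Beta(k + 1, n - k + 1)."""
    rng = random.Random(seed)
    binom = math.comb(n, k)
    total = 0.0
    for _ in range(draws):
        theta = rng.betavariate(k + 1, n - k + 1)
        lik = binom * theta ** k * (1 - theta) ** (n - k)
        total += 1.0 / lik  # heavy-tailed term: the cause of the infinite variance
    return draws / total

k, n = 5, 10
print(harmonic_mean_marginal(k, n), 1.0 / (n + 1))  # estimate vs exact 1/11
```

Even in this tiny example the estimate wanders around the truth, since rare draws of θ near 0 or 1 contribute enormous values of 1/f(x|θ).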
57. A Short History of Markov Chain Monte Carlo: Subjective Recollections from Incomplete Data
Before the revolution
Julian’s early works
Implications
Newton and Raftery (1994)
Gelfand and Dey (1994) also relied on this formula for the
same purpose in a more general perspective
Geyer and Thompson (1995)
Chib (1995)
58. A Short History of Markov Chain Monte Carlo: Subjective Recollections from Incomplete Data
Before the revolution
Julian’s early works
Implications
Newton and Raftery (1994)
Gelfand and Dey (1994)
Geyer and Thompson (1995) derived MLEs by a Monte Carlo
approximation to the normalising constant
Chib (1995)
59. A Short History of Markov Chain Monte Carlo: Subjective Recollections from Incomplete Data
Before the revolution
Julian’s early works
Implications
Newton and Raftery (1994)
Gelfand and Dey (1994)
Geyer and Thompson (1995)
Chib (1995) uses this representation to build a MCMC
approximation to the marginal likelihood
60. A Short History of Markov Chain Monte Carlo: Subjective Recollections from Incomplete Data
The Revolution
Final steps to
Impact
“This is surely a revolution.”
[Clifford, 1993]
Geman and Geman (1984) is one more spark that led to the
explosion, as it had a clear influence on Gelfand, Green, Smith,
Spiegelhalter and others.
Sparked new interest in Bayesian methods, statistical computing,
algorithms, and stochastic processes through the use of computing
algorithms such as the Gibbs sampler and the Metropolis–Hastings
algorithm.
61. A Short History of Markov Chain Monte Carlo: Subjective Recollections from Incomplete Data
The Revolution
Final steps to
Impact
“[Gibbs sampler] use seems to have been isolated in the spatial
statistics community until Gelfand and Smith (1990)”
[Geyer, 1990]
Geman and Geman (1984) is one more spark that led to the
explosion, as it had a clear influence on Gelfand, Green, Smith,
Spiegelhalter and others.
Sparked new interest in Bayesian methods, statistical computing,
algorithms, and stochastic processes through the use of computing
algorithms such as the Gibbs sampler and the Metropolis–Hastings
algorithm.
62. A Short History of Markov Chain Monte Carlo: Subjective Recollections from Incomplete Data
The Revolution
Final steps to
Data augmentation
Tanner and Wong (1987) has essentially the same ingredients as
Gelfand and Smith (1990): simulating from conditionals is
simulating from the joint
63. A Short History of Markov Chain Monte Carlo: Subjective Recollections from Incomplete Data
The Revolution
Final steps to
Data augmentation
Tanner and Wong (1987) has essentially the same ingredients as
Gelfand and Smith (1990): simulating from conditionals is
simulating from the joint
Lower impact:
emphasis on missing data problems (hence data augmentation)
MCMC approximation to the target at every iteration
π̂(θ|x) ≈ (1/K) Σ_{k=1}^{K} π(θ|x, z^{t,k}) ,   z^{t,k} ∼ π_{t−1}(z|x) ,
too close to Rubin’s (1978) multiple imputation
theoretical backup based on functional analysis (Markov kernel had
to be uniformly bounded and equicontinuous)
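The "simulating from conditionals is simulating from the joint" idea can be shown on a toy case. Below is a hypothetical sketch of my own (not Tanner and Wong's actual setup): a two-component normal mixture with known component means, where the missing data z are the component labels and the chain alternates z | w, x and w | z, x.

```python
import math
import random

def gibbs_mixture_weight(x, mu0=0.0, mu1=3.0, iters=2000, burn=500, seed=3):
    """Data augmentation for w * N(mu0, 1) + (1 - w) * N(mu1, 1):
    alternate z_i | w, x_i  and  w | z  (Beta(1, 1) prior on w)."""
    rng = random.Random(seed)
    w, trace = 0.5, []
    for t in range(iters):
        # completion step: sample each missing label from its conditional
        n0 = 0
        for xi in x:
            p0 = w * math.exp(-0.5 * (xi - mu0) ** 2)
            p1 = (1 - w) * math.exp(-0.5 * (xi - mu1) ** 2)
            if rng.random() < p0 / (p0 + p1):
                n0 += 1
        # parameter step: w | z ~ Beta(1 + n0, 1 + n1)
        w = rng.betavariate(1 + n0, 1 + len(x) - n0)
        if t >= burn:
            trace.append(w)
    return sum(trace) / len(trace)

# synthetic data with true weight 0.7 on the first component
rng = random.Random(0)
data = [rng.gauss(0.0, 1.0) if rng.random() < 0.7 else rng.gauss(3.0, 1.0)
        for _ in range(400)]
print(gibbs_mixture_weight(data))  # posterior mean of w, near 0.7
```

The two alternating draws are exactly the two conditionals of the joint π(w, z | x), so the chain targets the correct posterior on w.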
64. A Short History of Markov Chain Monte Carlo: Subjective Recollections from Incomplete Data
The Revolution
Gelfand and Smith, 1990
Epiphany
In June 1989, at a Bayesian workshop in Sherbrooke,
Québec, Adrian Smith exposed for the first time (?)
the generic features of the Gibbs sampler, exhibiting a ten
line Fortran program handling a random effect model
Yij = θi + εij ,  i = 1, . . . , K,  j = 1, . . . , J,
θi ∼ N(µ, σθ²) ,  εij ∼ N(0, σε²) ,
by full conditionals on µ, σθ , σε ...
[Gelfand and Smith, 1990]
This was enough to convince the whole audience!
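A modern rendering of that ten-line program might look as follows. This is my own sketch (in Python rather than Fortran, with conjugate inverse-gamma priors on the variances and a flat prior on µ, choices the slide does not specify): each pass draws every parameter from its full conditional.

```python
import numpy as np

def gibbs_random_effects(y, iters=3000, burn=1000, a=2.0, b=1.0, seed=4):
    """Gibbs sampler for Y_ij = theta_i + eps_ij,
    theta_i ~ N(mu, s2t), eps_ij ~ N(0, s2e),
    flat prior on mu, InvGamma(a, b) priors on s2t and s2e."""
    rng = np.random.default_rng(seed)
    K, J = y.shape
    mu, s2t, s2e = y.mean(), 1.0, 1.0
    keep = []
    for t in range(iters):
        # theta_i | rest: precision-weighted combination of the cell mean and mu
        prec = J / s2e + 1.0 / s2t
        mean = (J * y.mean(axis=1) / s2e + mu / s2t) / prec
        theta = rng.normal(mean, np.sqrt(1.0 / prec))
        # mu | rest (flat prior)
        mu = rng.normal(theta.mean(), np.sqrt(s2t / K))
        # variances | rest: inverse-gamma draws via 1 / Gamma
        s2t = 1.0 / rng.gamma(a + K / 2, 1.0 / (b + 0.5 * ((theta - mu) ** 2).sum()))
        s2e = 1.0 / rng.gamma(a + K * J / 2,
                              1.0 / (b + 0.5 * ((y - theta[:, None]) ** 2).sum()))
        if t >= burn:
            keep.append(mu)
    return float(np.mean(keep))

rng = np.random.default_rng(0)
K, J = 20, 10
theta_true = rng.normal(5.0, 1.0, K)
y = theta_true[:, None] + rng.normal(0.0, 1.0, (K, J))
print(gibbs_random_effects(y))  # posterior mean of mu, near 5
```

The appeal Smith demonstrated is visible here: each conditional is a standard distribution, so the whole hierarchical model is handled by a handful of textbook draws.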
65. A Short History of Markov Chain Monte Carlo: Subjective Recollections from Incomplete Data
The Revolution
Gelfand and Smith, 1990
Garden of Eden
In the early 1990s, researchers found that Gibbs and then
Metropolis–Hastings algorithms would crack almost any problem!
Flood of papers followed applying MCMC:
linear mixed models (Gelfand et al., 1990; Zeger and Karim, 1991;
Wang et al., 1993, 1994)
generalized linear mixed models (Albert and Chib, 1993)
mixture models (Tanner and Wong, 1987; Diebolt and X., 1990,
1994; Escobar and West, 1993)
changepoint analysis (Carlin et al., 1992)
point processes (Grenander and Møller, 1994)
&tc
66. A Short History of Markov Chain Monte Carlo: Subjective Recollections from Incomplete Data
The Revolution
Gelfand and Smith, 1990
Garden of Eden
In the early 1990s, researchers found that Gibbs and then
Metropolis–Hastings algorithms would crack almost any problem!
Flood of papers followed applying MCMC:
genomics (Stephens and Smith, 1993; Lawrence et al., 1993;
Churchill, 1995; Geyer and Thompson, 1995)
ecology (George and X, 1992; Dupuis, 1995)
variable selection in regression (George and McCulloch, 1993)
spatial statistics (Raftery and Banfield, 1991)
longitudinal studies (Lange et al., 1992)
&tc
67. A Short History of Markov Chain Monte Carlo: Subjective Recollections from Incomplete Data
The Revolution
Gelfand and Smith, 1990
[some of the] early theoretical advances
“It may well be remembered as the afternoon of the 11 Bayesians”
[Clifford, 1993]
Geyer and Thompson, 1992, relied on MCMC methods for ML
estimation
Smith and Roberts, 1993
Besag and Green, 1993
Tierney, 1994
Liu, Wong and Kong, 1994,95
Mengersen and Tweedie, 1996
68. A Short History of Markov Chain Monte Carlo: Subjective Recollections from Incomplete Data
The Revolution
Gelfand and Smith, 1990
[some of the] early theoretical advances
“It may well be remembered as the afternoon of the 11 Bayesians”
[Clifford, 1993]
Geyer and Thompson, 1992,
Smith and Roberts, 1993 discussed convergence diagnoses and
applications, incl. mixtures for Gibbs and Metropolis–Hastings
Besag and Green, 1993
Tierney, 1994
Liu, Wong and Kong, 1994,95
Mengersen and Tweedie, 1996
69. A Short History of Markov Chain Monte Carlo: Subjective Recollections from Incomplete Data
The Revolution
Gelfand and Smith, 1990
[some of the] early theoretical advances
“It may well be remembered as the afternoon of the 11 Bayesians”
[Clifford, 1993]
Geyer and Thompson, 1992,
Smith and Roberts, 1993
Besag and Green, 1993 stated the desiderata for
convergence, and connected MCMC with auxiliary and
antithetic variables
Tierney, 1994
Liu, Wong and Kong, 1994,95
Mengersen and Tweedie, 1996
70. A Short History of Markov Chain Monte Carlo: Subjective Recollections from Incomplete Data
The Revolution
Gelfand and Smith, 1990
[some of the] early theoretical advances
“It may well be remembered as the afternoon of the 11 Bayesians”
[Clifford, 1993]
Geyer and Thompson, 1992,
Smith and Roberts, 1993
Besag and Green, 1993
Tierney, 1994 laid out all of the assumptions needed to
analyze the Markov chains and then developed their
properties, in particular, convergence of ergodic averages and
central limit theorems
Liu, Wong and Kong, 1994,95
Mengersen and Tweedie, 1996
71. A Short History of Markov Chain Monte Carlo: Subjective Recollections from Incomplete Data
The Revolution
Gelfand and Smith, 1990
[some of the] early theoretical advances
“It may well be remembered as the afternoon of the 11 Bayesians”
[Clifford, 1993]
Geyer and Thompson, 1992,
Smith and Roberts, 1993
Besag and Green, 1993
Tierney, 1994
Liu, Wong and Kong, 1994,95 analyzed the covariance
structure of Gibbs sampling, and were able to formally
establish the validity of Rao-Blackwellization in Gibbs
sampling
Mengersen and Tweedie, 1996
72. A Short History of Markov Chain Monte Carlo: Subjective Recollections from Incomplete Data
The Revolution
Gelfand and Smith, 1990
[some of the] early theoretical advances
“It may well be remembered as the afternoon of the 11 Bayesians”
[Clifford, 1993]
Geyer and Thompson, 1992,
Smith and Roberts, 1993
Besag and Green, 1993
Tierney, 1994
Liu, Wong and Kong, 1994,95
Mengersen and Tweedie, 1996 set the tone for the study of
the speed of convergence of MCMC algorithms to the target
distribution
73. A Short History of Markov Chain Monte Carlo: Subjective Recollections from Incomplete Data
The Revolution
Gelfand and Smith, 1990
[some of the] early theoretical advances
“It may well be remembered as the afternoon of the 11 Bayesians”
[Clifford, 1993]
Geyer and Thompson, 1992,
Smith and Roberts, 1993
Besag and Green, 1993
Tierney, 1994
Liu, Wong and Kong, 1994,95
Mengersen and Tweedie, 1996
Gilks, Clayton and Spiegelhalter, 1993
&tc...
74. A Short History of Markov Chain Monte Carlo: Subjective Recollections from Incomplete Data
The Revolution
Convergence diagnoses
Convergence diagnoses
Can we really tell when a complicated Markov chain has
reached equilibrium? Frankly, I doubt it.
[Clifford, 1993]
Explosion of methods
Gelman and Rubin (1991)
Besag and Green (1992)
Geyer (1992)
Raftery and Lewis (1992)
Cowles and Carlin (1996) coda
Brooks and Roberts (1998)
&tc
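Of the diagnostics listed above, Gelman and Rubin's remains the most widely used. A minimal version (my own sketch of the standard between/within-chain variance ratio, not code from the references above) is:

```python
import math
import random

def gelman_rubin(chains):
    """Potential scale reduction factor R-hat from m chains of length n:
    compares the between-chain variance B with the within-chain variance W."""
    m, n = len(chains), len(chains[0])
    means = [sum(c) / n for c in chains]
    grand = sum(means) / m
    B = n / (m - 1) * sum((mu - grand) ** 2 for mu in means)
    W = sum(sum((x - mu) ** 2 for x in c) / (n - 1)
            for c, mu in zip(chains, means)) / m
    var_plus = (n - 1) / n * W + B / n  # pooled variance estimate
    return math.sqrt(var_plus / W)

# independent, well-mixed chains: R-hat should be close to 1
rng = random.Random(5)
chains = [[rng.gauss(0.0, 1.0) for _ in range(1000)] for _ in range(4)]
print(gelman_rubin(chains))
```

Values of R-hat much larger than 1 indicate that the parallel chains have not yet mixed into a common distribution, which is the practical answer offered to Clifford's doubt above.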
75. A Short History of Markov Chain Monte Carlo: Subjective Recollections from Incomplete Data
After the Revolution
Particle systems
Particles, again
Iterating importance sampling is about as old as Monte Carlo
methods themselves!
[Hammersley and Morton,1954; Rosenbluth and Rosenbluth, 1955]
Found in the molecular simulation literature of the 50’s with
self-avoiding random walks and signal processing
[Marshall, 1965; Handschin and Mayne, 1969]
76. A Short History of Markov Chain Monte Carlo: Subjective Recollections from Incomplete Data
After the Revolution
Particle systems
Particles, again
Iterating importance sampling is about as old as Monte Carlo
methods themselves!
[Hammersley and Morton,1954; Rosenbluth and Rosenbluth, 1955]
Found in the molecular simulation literature of the 50’s with
self-avoiding random walks and signal processing
[Marshall, 1965; Handschin and Mayne, 1969]
Use of the term “particle” dates back to Kitagawa (1996), and Carpenter
et al. (1997) coined the term “particle filter”.
77. A Short History of Markov Chain Monte Carlo: Subjective Recollections from Incomplete Data
After the Revolution
Particle systems
Bootstrap filter and sequential Monte Carlo
Gordon, Salmond and Smith (1993) introduced the bootstrap filter
which, while formally connected with importance sampling,
involves past simulations and possible MCMC steps (Gilks and
Berzuini, 2001).
Sequential imputation was developed in Kong, Liu and Wong
(1994), while Liu and Chen (1995) first formally pointed out the
importance of resampling in “sequential Monte Carlo”, a term they
coined
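A bare-bones bootstrap filter in the spirit of Gordon, Salmond and Smith (1993) can be sketched as follows (my own toy example on a linear Gaussian AR(1) state space model; the model and parameter values are illustrative, not from the paper): propagate the particles through the dynamics, weight them by the observation likelihood, then resample.

```python
import math
import random

def bootstrap_filter(ys, n_part=500, phi=0.9, sx=1.0, sy=0.5, seed=6):
    """Bootstrap filter for x_t = phi x_{t-1} + N(0, sx^2), y_t = x_t + N(0, sy^2).
    Returns the filtered means E[x_t | y_{1:t}]."""
    rng = random.Random(seed)
    parts = [rng.gauss(0.0, 1.0) for _ in range(n_part)]
    means = []
    for y in ys:
        # propagate each particle through the state dynamics
        parts = [phi * p + rng.gauss(0.0, sx) for p in parts]
        # weight by the observation likelihood
        ws = [math.exp(-0.5 * ((y - p) / sy) ** 2) for p in parts]
        total = sum(ws)
        means.append(sum(w * p for w, p in zip(ws, parts)) / total)
        # multinomial resampling: the step Liu and Chen emphasised
        parts = rng.choices(parts, weights=ws, k=n_part)
    return means

# simulate a trajectory and filter it
rng = random.Random(0)
xs, x = [], 0.0
for _ in range(100):
    x = 0.9 * x + rng.gauss(0.0, 1.0)
    xs.append(x)
ys = [x + rng.gauss(0.0, 0.5) for x in xs]
means = bootstrap_filter(ys)
mse = sum((m - x) ** 2 for m, x in zip(means, xs)) / len(xs)
print(mse)  # well below the prior variance of x_t
```

Without the resampling step the weights degenerate onto a single particle after a few observations, which is exactly the importance Liu and Chen (1995) pointed out.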
78. A Short History of Markov Chain Monte Carlo: Subjective Recollections from Incomplete Data
After the Revolution
Particle systems
pMC versus pMCMC
Recycling of past simulations legitimate to build better
importance sampling functions as in population Monte Carlo
[Iba, 2000; Cappé et al., 2004; Del Moral et al., 2007]
Recent synthesis by Andrieu, Doucet, and Holenstein (2010)
using particles to build an evolving MCMC kernel p̂θ (y1:T ) in
state space models p(x1:T )p(y1:T |x1:T ), along with Andrieu’s
and Roberts’ (2009) use of approximations in MCMC
acceptance steps
[Kennedy and Kuti, 1985]
79. A Short History of Markov Chain Monte Carlo: Subjective Recollections from Incomplete Data
After the Revolution
Reversible jump
Reversible jump
Generally considered as the second Revolution.
The formalisation of a Markov chain moving across
models and parameter spaces allows for the
Bayesian processing of a wide variety of models
and led to the success of Bayesian model choice.
Definition of a proper balance condition on cross-model Markov
kernels gives a generic setup for exploring variable dimension
spaces, even when the number of models under comparison is
infinite.
[Green, 1995]
80. A Short History of Markov Chain Monte Carlo: Subjective Recollections from Incomplete Data
After the Revolution
Perfect sampling
Perfect sampling
Seminal paper of Propp and Wilson (1996) showed how to use
MCMC methods to produce an exact (or perfect) simulation from
the target.
81. A Short History of Markov Chain Monte Carlo: Subjective Recollections from Incomplete Data
After the Revolution
Perfect sampling
Perfect sampling
Seminal paper of Propp and Wilson (1996) showed how to use
MCMC methods to produce an exact (or perfect) simulation from
the target.
Outburst of papers, particularly from Jesper Møller and coauthors,
but the excitement somehow died down [except in dedicated areas],
as construction of perfect samplers is hard and coalescence times
very high...
[Møller and Waagepetersen, 2003]
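Propp and Wilson's coupling-from-the-past construction can be demonstrated on a toy chain. The sketch below is my own illustration (not from the papers cited): a monotone ±1 random walk on {0,…,4} is run from ever further in the past, reusing the same innovations between attempts, until all initial states coalesce; the coalesced value is an exact draw from the stationary (here uniform) distribution.

```python
import random

def cftp_walk(rng, n_states=5):
    """Coupling from the past for a +/-1 random walk on {0, ..., n_states-1},
    held at the boundaries. The update is monotone in x, so it suffices to
    track the minimal (0) and maximal (n_states-1) starting states."""
    def step(x, u):
        x = x + 1 if u < 0.5 else x - 1
        return max(0, min(n_states - 1, x))

    us = []  # innovations u_{-1}, u_{-2}, ... (reused between attempts!)
    T = 1
    while True:
        while len(us) < T:
            us.append(rng.random())
        lo, hi = 0, n_states - 1
        for i in range(T - 1, -1, -1):   # run from time -T to -1, same randomness
            lo, hi = step(lo, us[i]), step(hi, us[i])
        if lo == hi:                     # coalesced: exact stationary draw
            return lo
        T *= 2                           # otherwise restart further in the past

rng = random.Random(7)
draws = [cftp_walk(rng) for _ in range(2000)]
freqs = [draws.count(s) / len(draws) for s in range(5)]
print(freqs)  # each frequency close to 0.2 (the stationary law is uniform here)
```

The difficulty mentioned above is visible even here: the construction hinges on reusing past randomness and on a monotonicity property of the update, and neither ingredient is easy to arrange for realistic targets.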
82. A Short History of Markov Chain Monte Carlo: Subjective Recollections from Incomplete Data
After the Revolution
Envoi
To be continued...
...standing on the shoulders of giants