This document discusses Approximate Bayesian Computation (ABC), a simulation-based method for conducting Bayesian inference when the likelihood function is intractable or impossible to evaluate directly. ABC produces an approximation of the posterior distribution by simulating data under different parameter values and accepting simulations that match the observed data. The document provides background on how ABC originated from population genetics models and outlines some of the advances in ABC, including how it can be used as an inference machine to estimate parameters from simulated data.
In the study of probabilistic integrators for deterministic ordinary differential equations, one goal is to establish the convergence (in an appropriate topology) of the random solutions to the true deterministic solution of an initial value problem defined by some operator. The challenge is to identify the right conditions on the additive noise with which one constructs the probabilistic integrator, so that the convergence of the random solutions has the same order as the underlying deterministic integrator. In the context of ordinary differential equations, Conrad et al. (Stat. Comput., 2017) established the mean-square convergence of the solutions for globally Lipschitz vector fields, under the assumptions of i.i.d., state-independent, mean-zero Gaussian noise. We extend their analysis by considering vector fields that need not be globally Lipschitz, and by considering non-Gaussian, non-i.i.d. noise that can depend on the state and can have nonzero mean. A key assumption is a uniform moment bound condition on the noise. We obtain convergence in the stronger topology of the uniform norm, and establish results that connect this topology to the regularity of the additive noise. Joint work with A. M. Stuart (Caltech) and T. J. Sullivan (Free University of Berlin).
The standard Galerkin formulation of acoustic wave propagation, governed by the Helmholtz partial differential equation (PDE), is indefinite for large wavenumbers. However, the Helmholtz PDE itself is in general not indefinite. The lack of coercivity (indefiniteness) is one of the major difficulties for the approximation and simulation of heterogeneous-media wave propagation models, including applications to Quasi-Monte Carlo (QMC) analysis of stochastic wave propagation. We will present a new class of sign-definite continuous and discrete preconditioned FEM Helmholtz wave propagation models.
Image sciences, image processing, image restoration, photo manipulation. Image and video representation. Digital versus analog imagery. Quantization and sampling. Sources and models of noise in digital CCD imagery: photon, thermal, and readout noise. Sources and models of blur. Convolutions and point spread functions. Overview of other standard models, problems, and tasks: salt-and-pepper and impulse noise, halftoning, inpainting, super-resolution, compressed sensing, high dynamic range imagery, demosaicing. Short introduction to other types of imagery: SAR, sonar, ultrasound, CT, and MRI. Linear and ill-posed restoration problems.
"The Metropolis adjusted Langevin Algorithm
for log-concave probability measures in high dimensions", talk by Andreas Eberle at the BigMC seminar, 9th June 2011, Paris.
The presentation is an introduction to decision making with approximate Bayesian Methods. It consists of a review of Bayesian Decision Theory and Variational Inference along with a description of Loss Calibrated Variational Inference.
Many mathematical models use a large number of poorly known parameters as inputs. Quantifying the influence of each of these parameters is one of the aims of sensitivity analysis. Global Sensitivity Analysis is an important paradigm for understanding model behavior, characterizing uncertainty, improving model calibration, etc. Input uncertainty is modeled by a probability distribution, and various sensitivity measures have been built within that paradigm. This tutorial focuses on the so-called Sobol' indices, based on functional variance analysis. Estimation procedures will be presented, and the choice of the designs of experiments on which these procedures are based will be discussed. As Sobol' indices have no clear interpretation in the presence of statistical dependences between inputs, it also seems promising to measure sensitivity with Shapley effects, based on the notion of Shapley value, a solution concept in cooperative game theory.
Control of Uncertain Nonlinear Systems Using Ellipsoidal Reachability Calculus, by Leo Asselborn
This paper proposes an approach to algorithmically synthesize control strategies for discrete-time nonlinear uncertain systems based on reachable set computations using the ellipsoidal calculus. For given ellipsoidal initial sets and bounded ellipsoidal disturbances, the proposed algorithm iterates over conservatively approximating, LMI-constrained optimization problems to compute stabilizing controllers. The method uses a first-order Taylor approximation of the nonlinear dynamics and a conservative approximation of the Lagrange remainder.
AACIMP 2010 Summer School lecture by Leonidas Sakalauskas. "Applied Mathematics" stream. "Stochastic Programming and Applications" course. Part 5.
More info at http://summerschool.ssa.org.ua
Scientific Computing with Python Webinar, 9/18/2009: Curve Fitting, by Enthought, Inc.
This webinar will provide an overview of the tools that SciPy and NumPy provide for regression analysis, including linear and non-linear least squares, and a brief look at handling other error metrics. We will also demonstrate simple GUI tools that can make some problems easier, and provide a quick overview of the new scikits package statsmodels, whose API is maturing in a separate package but should be incorporated into SciPy in the future.
Presentation given at the V Jornada de Formació Sanitària Especialitzada ICS, Tarragona, 2011.
It covers the design of the strategic digital-identity plan for a teaching-support service of a university hospital (Hospital Universitari Sagrat Cor).
The present study concerns the feedback stabilisation of the unstable equilibria of a two-dimensional nonlinear pool-boiling system with essentially heterogeneous temperature distributions in the fluid-heater interface. Regulation of such equilibria has great potential for application in, for instance, micro-electronics cooling. Here, as a first step, stabilisation of these equilibria is considered. To this end a control law is implemented that regulates the heat supply to the heater as a function of the Fourier-Chebyshev modes of its internal temperature distribution. These modes are intimately related to the physical eigenmodes of the system and thus admit robust and efficient regulation on the basis of the natural composition of the temperature field. Key to this modal-control strategy and its application to controller design are two equivalent and interchangeable PDE and state-space forms of the linearised pool-boiling model. Derivation of these forms is a central theme of this study. Performance of modal controllers thus designed is demonstrated and analysed by simulations of the nonlinear closed-loop system.
How to find a cheap surrogate to approximate Bayesian Update Formula and to a..., by Alexander Litvinenko
We suggest a new vision of the classical Bayesian update formula. We expand all ingredients in a Polynomial Chaos Expansion (PCE) and write out a new formula for the Bayesian update of the PCE coefficients. This formula is derived from minimum mean square estimation. One starts with the prior PCE, takes measurements into account, and obtains the posterior PCE coefficients, without any MCMC sampling.
Non-sampling functional approximation of linear and non-linear Bayesian Update, by Alexander Litvinenko
We offer a non-sampling functional approximation of a non-linear surrogate to the classical Bayesian update formula. We start with a prior Polynomial Chaos Expansion (PCE), express the log-likelihood in a PCE basis, and obtain a new posterior PCE.
The main idea is to update not the probability density but the basis coefficients.
We present a proof of the Generalized Riemann Hypothesis (GRH) based on asymptotic expansions and operations on series. The advantage of our method is that it only uses undergraduate mathematics, which makes it accessible to a wider audience.
A crash course in stochastic Lyapunov theory for Markov processes (emphasis on continuous time).
See also the survey for models in discrete time:
https://netfiles.uiuc.edu/meyn/www/spm_files/MarkovTutorial/MarkovTutorialUCSB2010.html
Linear Discriminant Analysis (LDA) Under f-Divergence Measures, by Anmol Dwivedi
For more details, please have a look at:
1. https://www.mdpi.com/1099-4300/24/2/188
2. https://ieeexplore.ieee.org/document/9518004
Abstract:
In statistical inference, the information-theoretic performance limits can often be expressed in terms of a notion of divergence between the underlying statistical models (e.g., in binary hypothesis testing, the total error probability is equal to the total variation between the models). As the data dimension grows, computing the statistics involved in decision-making and the attendant performance limits (divergence measures) faces complexity and stability challenges. Dimensionality reduction addresses these challenges at the expense of compromising the performance (divergence is reduced, by the data processing inequality for divergence). This paper considers linear dimensionality reduction such that the divergence between the models is maximally preserved. Specifically, the paper focuses on Gaussian models and characterizes an optimal projection of the data onto a lower-dimensional subspace with respect to four f-divergence measures (Kullback-Leibler, χ², Hellinger, and total variation). There are two key observations. First, projections are not necessarily along the dominant modes of the covariance matrix of the data, and in some situations they can even be along the least dominant modes. Second, under specific regimes, the optimal design of the subspace projection is identical under all the f-divergence measures considered, rendering a degree of universality to the design independent of the inference problem of interest.
1. Approximate Bayesian Computation:
how Bayesian can it be?
Christian P. Robert
ISBA LECTURES ON BAYESIAN FOUNDATIONS
ISBA 2012, Kyoto, Japan
Université Paris-Dauphine, IUF, & CREST
http://www.ceremade.dauphine.fr/~xian
June 21, 2012
2. This talk is dedicated to the memory of our dear friend
and fellow Bayesian, George Casella, 1951–2012
4. Approximate Bayesian computation
Introduction
Monte Carlo basics
simulation-based methods in Econometrics
Approximate Bayesian computation
ABC as an inference machine
5. General issue
Given a density π known up to a normalizing constant, e.g. a posterior density, and an integrable function h, how can one use
I_h = ∫ h(x)π(x)µ(dx) = ∫ h(x)π̃(x)µ(dx) / ∫ π̃(x)µ(dx)
when ∫ h(x)π̃(x)µ(dx) is intractable?
6. Monte Carlo basics
Generate an iid sample x_1, ..., x_N from π and estimate I_h by
Î_N^MC(h) = N^{-1} Σ_{i=1}^N h(x_i).
[LLN] Î_N^MC(h) → I_h almost surely.
Furthermore, if I_{h²} = ∫ h²(x)π(x)µ(dx) < ∞,
[CLT] √N ( Î_N^MC(h) − I_h ) ⇝ N( 0, I[{h − I_h}²] ).
Caveat
Often impossible or inefficient to simulate directly from π
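A minimal R sketch (not from the slides) of the vanilla Monte Carlo estimator above, under the illustrative assumption that π is a standard normal and h(x) = x², so that I_h = 1:
# Vanilla Monte Carlo: estimate I_h = E_pi[h(X)] from an iid sample drawn from pi
# (illustrative assumption: pi = N(0,1), h(x) = x^2, so the true value is 1)
set.seed(1)
N <- 1e5
h <- function(x) x^2
x <- rnorm(N)                 # iid draws from pi
Ihat <- mean(h(x))            # LLN: converges a.s. to I_h
se   <- sd(h(x)) / sqrt(N)    # CLT: asymptotic standard error
c(Ihat, se)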
8. Importance Sampling
For Q a proposal distribution with density q,
I_h = ∫ h(x) {π/q}(x) q(x) µ(dx).
Principle
Generate an iid sample x_1, ..., x_N ∼ Q and estimate I_h by
Î_{Q,N}^IS(h) = N^{-1} Σ_{i=1}^N h(x_i) {π/q}(x_i).
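A minimal R sketch of this importance sampling estimator, under the illustrative assumptions that π = N(0,1), h(x) = x² and the proposal Q is a Student t with 3 degrees of freedom (all choices mine, not from the talk):
# Importance sampling: draw from the proposal q, reweight by pi/q
set.seed(1)
N <- 1e5
h <- function(x) x^2
x <- rt(N, df = 3)                 # iid draws from Q = t_3
w <- dnorm(x) / dt(x, df = 3)      # importance weights pi/q
Ihat_IS <- mean(h(x) * w)          # unbiased IS estimate of I_h
Ihat_IS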
10. Importance Sampling (convergence)
Then
[LLN] Î_{Q,N}^IS(h) → I_h almost surely,
and if Q((hπ/q)²) < ∞,
[CLT] √N ( Î_{Q,N}^IS(h) − I_h ) ⇝ N( 0, Q{(hπ/q − I_h)²} ).
Caveat
If the normalizing constant is unknown, impossible to use Î_{Q,N}^IS(h)
Generic problem in Bayesian Statistics: π(θ|x) ∝ f(x|θ)π(θ).
12. Self-normalised importance Sampling
Self-normalised version
Î_{Q,N}^SNIS(h) = { Σ_{i=1}^N {π/q}(x_i) }^{-1} Σ_{i=1}^N h(x_i) {π/q}(x_i).
[LLN] Î_{Q,N}^SNIS(h) → I_h almost surely,
and if π((1 + h²)(π/q)) < ∞,
[CLT] √N ( Î_{Q,N}^SNIS(h) − I_h ) ⇝ N( 0, π{(π/q)(h − I_h)²} ).
Caveat
If π cannot be computed, impossible to use Î_{Q,N}^SNIS(h)
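A minimal R sketch of the self-normalised estimator, where only an unnormalised target is needed (illustrative assumptions, not from the talk: unnormalised target π̃(x) ∝ exp(−x²/2), proposal t_3, h(x) = x²):
# Self-normalised IS: pi only needs to be known up to a constant
set.seed(1)
N <- 1e5
h   <- function(x) x^2
pit <- function(x) exp(-x^2 / 2)        # unnormalised target
x <- rt(N, df = 3)                      # draws from the proposal q = t_3
w <- pit(x) / dt(x, df = 3)             # unnormalised weights
Ihat_SNIS <- sum(w * h(x)) / sum(w)     # self-normalised estimate of I_h
Ihat_SNIS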
15. Econom'ections
Model choice
Similar exploration of simulation-based and approximation techniques in Econometrics
Simulated method of moments
Method of simulated moments
Simulated pseudo-maximum-likelihood
Indirect inference
[Gouriéroux & Monfort, 1996]
16. Simulated method of moments
Given observations y^o_{1:n} from a model
y_t = r(y_{1:(t−1)}, ε_t, θ),  ε_t ∼ g(·)
1. simulate ε_{1:n}, derive
y_t(θ) = r(y_{1:(t−1)}, ε_t, θ)
2. and estimate θ by
arg min_θ Σ_{t=1}^n (y^o_t − y_t(θ))²
17. Simulated method of moments
Given observations y^o_{1:n} from a model
y_t = r(y_{1:(t−1)}, ε_t, θ),  ε_t ∼ g(·)
1. simulate ε_{1:n}, derive
y_t(θ) = r(y_{1:(t−1)}, ε_t, θ)
2. and estimate θ by
arg min_θ ( Σ_{t=1}^n y^o_t − Σ_{t=1}^n y_t(θ) )²
18. Method of simulated moments
Given a statistic vector K(y) with
E_θ[K(Y_t)|y_{1:(t−1)}] = k(y_{1:(t−1)}; θ),
find an unbiased estimator of k(y_{1:(t−1)}; θ),
k̃(ε_t, y_{1:(t−1)}; θ)
Estimate θ by
arg min_θ ‖ Σ_{t=1}^n { K(y_t) − (1/S) Σ_{s=1}^S k̃(ε_t^s, y_{1:(t−1)}; θ) } ‖
[Pakes & Pollard, 1989]
19. Indirect inference
Minimise [in θ] a distance between estimators β̂ based on a pseudo-model for genuine observations and for observations simulated under the true model and the parameter θ.
[Gouriéroux, Monfort & Renault, 1993; Smith, 1993; Gallant & Tauchen, 1996]
20. Indirect inference (PML vs. PSE)
Example of the pseudo-maximum-likelihood (PML)
β̂(y) = arg max_β Σ_t log f(y_t | β, y_{1:(t−1)})
leading to
arg min_θ ‖ β̂(y^o) − β̂(y_1(θ), ..., y_S(θ)) ‖²
when
y_s(θ) ∼ f(y|θ),  s = 1, ..., S
21. Indirect inference (PML vs. PSE)
Example of the pseudo-score-estimator (PSE)
β̂(y) = arg min_β Σ_t ( ∂ log f / ∂β (y_t | β, y_{1:(t−1)}) )²
leading to
arg min_θ ‖ β̂(y^o) − β̂(y_1(θ), ..., y_S(θ)) ‖²
when
y_s(θ) ∼ f(y|θ),  s = 1, ..., S
22. Consistent indirect inference
"...in order to get a unique solution the dimension of the auxiliary parameter β must be larger than or equal to the dimension of the initial parameter θ. If the problem is just identified the different methods become easier..."
Consistency depends on the criterion and on the asymptotic identifiability of θ
[Gouriéroux & Monfort, 1996, p. 66]
24. Choice of pseudo-model
Pick the pseudo-model such that
1. β̂(θ) is not flat (i.e. sensitive to changes in θ)
2. β̂(θ) is not dispersed (i.e. robust against changes in y_s(θ))
[Frigessi & Heggland, 2004]
25. Empirical likelihood
Another approximation method (not yet related with simulation)
Definition
For a dataset y = (y_1, ..., y_n) and a parameter of interest θ, pick constraints
E[h(Y, θ)] = 0
uniquely identifying θ, and define the empirical likelihood as
L_el(θ|y) = max_p Π_{i=1}^n p_i
for p in the set {p ∈ [0, 1]^n : Σ_i p_i = 1, Σ_i p_i h(y_i, θ) = 0}.
[Owen, 1988]
26. Empirical likelihood
Another approximation method (not yet related with simulation)
Example
When θ = E_f[Y], the empirical likelihood is the maximum of
p_1 · · · p_n
under the constraint
p_1 y_1 + ... + p_n y_n = θ
27. ABCel
Another approximation method (now related with simulation!)
Importance sampling implementation
Algorithm 1: Raw ABCel sampler
Given observation y
for i = 1 to M do
  Generate θ_i from the prior distribution π(·)
  Set the weight ω_i = L_el(θ_i | y)
end for
Proceed with the pairs (θ_i, ω_i) as in regular importance sampling
[Mengersen, Pudlo & Robert, 2012]
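A minimal R sketch of the raw ABCel sampler for the simple case where θ is the mean of the data, so the constraint is h(y, θ) = y − θ; the helper el_mean and the toy data are illustrative additions, not from the talk:
# Empirical likelihood of a scalar mean theta: L_el(theta|y) = max prod(p_i)
# subject to sum(p_i) = 1 and sum(p_i * (y_i - theta)) = 0.
# Assumes theta lies strictly inside the range of the data; returns 0 otherwise.
el_mean <- function(y, theta) {
  d <- y - theta
  if (max(d) <= 0 || min(d) >= 0) return(0)      # constraint set empty
  g <- function(lam) sum(d / (1 + lam * d))      # Lagrange-multiplier equation
  lam <- uniroot(g, c(-1/max(d) + 1e-8, -1/min(d) - 1e-8))$root
  prod(1 / (length(y) * (1 + lam * d)))
}

# Raw ABCel sampler: simulate from the prior, weight by the empirical likelihood
set.seed(1)
y <- rnorm(30, mean = 2)                   # observed data (toy)
M <- 1e4
theta <- rnorm(M, 0, 10)                   # illustrative prior N(0, 10^2)
omega <- sapply(theta, function(t) el_mean(y, t))
sum(omega * theta) / sum(omega)            # IS estimate of the posterior mean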
28. A?B?C?
A stands for approximate [wrong likelihood]
B stands for Bayesian
C stands for computation [producing a parameter sample]
31. How much Bayesian?
asymptotically so (meaningful?)
approximation error unknown (w/o simulation)
pragmatic Bayes (there is no other solution!)
32. Approximate Bayesian computation
Introduction
Approximate Bayesian computation
Genetics of ABC
ABC basics
Advances and interpretations
Alphabet (compu-)soup
ABC as an inference machine
33. Genetic background of ABC
ABC is a recent computational technique that only requires being able to sample from the likelihood f(·|θ).
This technique stemmed from population genetics models, about 15 years ago, and population geneticists still contribute significantly to methodological developments of ABC.
[Griffiths et al., 1997; Tavaré et al., 1999]
34. Demo-genetic inference
Each model is characterized by a set of parameters θ that cover historical (time divergence, admixture time, ...), demographic (population sizes, admixture rates, migration rates, ...) and genetic (mutation rate, ...) factors.
The goal is to estimate these parameters from a dataset of polymorphism (DNA sample) y observed at the present time.
Problem: most of the time, we cannot calculate the likelihood of the polymorphism data f(y|θ).
36. A genuine example of application
Approximate Bayesian Computation: an example of application to the pygmies
Pygmy populations: do they have a common origin? Were there many exchanges between pygmy and non-pygmy populations?
38. Intractable likelihood
Missing (too missing!) data structure:
f(y|θ) = ∫_G f(y|G, θ) f(G|θ) dG
cannot be computed in a manageable way...
The genealogies are considered as nuisance parameters.
This modelling clearly differs from the phylogenetic perspective, where the tree is the parameter of interest.
40. Intractable likelihood
So, what can we do when the likelihood function f(y|θ) is well-defined but impossible / too costly to compute...?
MCMC cannot be implemented!
Shall we give up Bayesian inference altogether?!
Or settle for an almost Bayesian inference...?
42. ABC methodology
Bayesian setting: target is π(θ)f(x|θ)
When the likelihood f(x|θ) is not in closed form, likelihood-free rejection technique:
Foundation
For an observation y ∼ f(y|θ), under the prior π(θ), if one keeps jointly simulating
θ' ∼ π(θ),  z ∼ f(z|θ'),
until the auxiliary variable z is equal to the observed value, z = y, then the selected
θ' ∼ π(θ|y)
[Rubin, 1984; Diggle & Gratton, 1984; Tavaré et al., 1997]
45. Why does it work?!
The proof is trivial:
f(θ_i) ∝ Σ_{z ∈ D} π(θ_i) f(z|θ_i) I_y(z)
       ∝ π(θ_i) f(y|θ_i)
       = π(θ_i|y).
[Accept–Reject 101]
46. A as A...pproximative
When y is a continuous random variable, strict equality z = y is replaced with a tolerance zone
ρ(y, z) ≤ ε
where ρ is a distance
Output distributed from
π(θ) P_θ{ρ(y, z) < ε}, which is by definition proportional to π(θ | ρ(y, z) < ε)
[Pritchard et al., 1999]
48. ABC algorithm
In most implementations, a further degree of A...pproximation:
Algorithm 1 Likelihood-free rejection sampler
for i = 1 to N do
  repeat
    generate θ' from the prior distribution π(·)
    generate z from the likelihood f(·|θ')
  until ρ{η(z), η(y)} ≤ ε
  set θ_i = θ'
end for
where η(y) defines a (not necessarily sufficient) statistic
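A minimal R sketch of this rejection sampler (illustrative assumptions, not from the talk: data y_1, ..., y_20 ∼ N(θ, 1), prior θ ∼ N(0, 10²), summary statistic η = sample mean, distance ρ = absolute difference):
# Likelihood-free rejection sampler (Algorithm 1), toy instantiation
set.seed(1)
yobs <- rnorm(20, mean = 3)          # observed data (toy)
eta  <- mean                         # summary statistic
eps  <- 0.1                          # tolerance
N    <- 1000
abc  <- numeric(N)
for (i in 1:N) {
  repeat {
    theta <- rnorm(1, 0, 10)                    # draw from the prior
    z     <- rnorm(length(yobs), theta, 1)      # simulate pseudo-data
    if (abs(eta(z) - eta(yobs)) <= eps) break   # rho{eta(z), eta(y)} <= eps
  }
  abc[i] <- theta
}
mean(abc)    # ABC approximation of the posterior mean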
49. Output
The likelihood-free algorithm samples from the marginal in z of:
π_ε(θ, z|y) = π(θ) f(z|θ) I_{A_{ε,y}}(z) / ∫_{A_{ε,y}×Θ} π(θ) f(z|θ) dz dθ,
where A_{ε,y} = {z ∈ D : ρ(η(z), η(y)) < ε}.
The idea behind ABC is that the summary statistics coupled with a small tolerance should provide a good approximation of the posterior distribution:
π_ε(θ|y) = ∫ π_ε(θ, z|y) dz ≈ π(θ|y).
...does it?!
52. Convergence of ABC (first attempt)
What happens when ε → 0?
If f(·|θ) is continuous in y, uniformly in θ [!], then for an arbitrary δ > 0 there exists ε_0 such that ε < ε_0 implies
π(θ) ∫ f(z|θ) I_{A_{ε,y}}(z) dz / ∫_{A_{ε,y}×Θ} π(θ) f(z|θ) dz dθ
  ∈ π(θ) f(y|θ)(1 ∓ δ) µ(B_ε) / { ∫_Θ π(θ) f(y|θ) dθ (1 ± δ) µ(B_ε) }
where the µ(B_ε) factors cancel, so the ratio lies within a factor (1 ∓ δ)/(1 ± δ) of the posterior π(θ|y)
[Proof extends to other continuous-in-0 kernels K]
56. Convergence of ABC (second attempt)
What happens when ε → 0?
For B ⊂ Θ, we have
∫_B [ ∫_{A_{ε,y}} f(z|θ) dz / ∫_{A_{ε,y}×Θ} π(θ) f(z|θ) dz dθ ] π(θ) dθ
  = ∫_{A_{ε,y}} [ ∫_B f(z|θ) π(θ) dθ / m(z) ] [ m(z) / ∫_{A_{ε,y}×Θ} π(θ) f(z|θ) dz dθ ] dz
  = ∫_{A_{ε,y}} π(B|z) [ m(z) / ∫_{A_{ε,y}×Θ} π(θ) f(z|θ) dz dθ ] dz,
which indicates convergence for a continuous π(B|z).
58. Convergence (do not attempt!)
...and the above does not apply to insufficient statistics:
If η(y) is not a sufficient statistic, the best one can hope for is
π(θ|η(y)), not π(θ|y)
If η(y) is an ancillary statistic, the whole information contained in y is lost! and the "best" one can hope for is
π(θ|η(y)) = π(θ)
Bummer!!!
62. Probit modelling on Pima Indian women
Example (R benchmark)
200 Pima Indian women with observed variables
plasma glucose concentration in oral glucose tolerance test
diastolic blood pressure
diabetes pedigree function
presence/absence of diabetes
Probability of diabetes as a function of the variables
P(y = 1|x) = Φ(x_1 β_1 + x_2 β_2 + x_3 β_3),
200 observations of Pima.tr and g-prior modelling:
β ∼ N_3(0, n (X^T X)^{-1})
importance function inspired by the MLE estimator distribution
β ∼ N(β̂, Σ̂)
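A rough R sketch of this importance sampler (assumptions on my part: the MASS dataset Pima.tr with columns glu, bp, ped and type, the mvtnorm package for multivariate normal densities, and no intercept, as on the slide):
# Importance sampling for the probit posterior with a g-prior,
# using the proposal N(beta_hat, Sigma_hat) built from the probit MLE
library(MASS)       # provides Pima.tr
library(mvtnorm)    # provides dmvnorm / rmvnorm
y <- as.numeric(Pima.tr$type == "Yes")
X <- as.matrix(Pima.tr[, c("glu", "bp", "ped")])
n <- nrow(X)
fit  <- glm(y ~ X - 1, family = binomial(link = "probit"))
bhat <- coef(fit); Sig <- vcov(fit)
Vprior <- n * solve(t(X) %*% X)                       # g-prior covariance
logpost <- function(b) {                              # log prior + log likelihood
  p <- pnorm(X %*% b)
  dmvnorm(b, sigma = Vprior, log = TRUE) + sum(dbinom(y, 1, p, log = TRUE))
}
N <- 1e4
betas <- rmvnorm(N, mean = bhat, sigma = Sig)         # proposal draws
logw  <- apply(betas, 1, logpost) -
         dmvnorm(betas, mean = bhat, sigma = Sig, log = TRUE)
w <- exp(logw - max(logw)); w <- w / sum(w)
colSums(w * betas)                                    # posterior mean of beta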
66. Pima Indian benchmark
Figure: Comparison between density estimates of the marginals on β_1 (left), β_2 (center) and β_3 (right) from ABC rejection samples (red) and MCMC samples (black).
67. MA example
MA(q) model:
x_t = ε_t + Σ_{i=1}^q ϑ_i ε_{t−i}
Simple prior: uniform over the inverse [real and complex] roots in
Q(u) = 1 − Σ_{i=1}^q ϑ_i u^i
under the identifiability conditions
68. MA example
MA(q) model:
x_t = ε_t + Σ_{i=1}^q ϑ_i ε_{t−i}
Simple prior: uniform prior over the identifiability zone, i.e. the triangle for MA(2)
69. MA example (2)
ABC algorithm thus made of
1. picking a new value (ϑ_1, ϑ_2) in the triangle
2. generating an iid sequence (ε_t)_{−q < t ≤ T}
3. producing a simulated series (x'_t)_{1 ≤ t ≤ T}
Distance: basic distance between the series
ρ( (x_t)_{1≤t≤T}, (x'_t)_{1≤t≤T} ) = Σ_{t=1}^T (x_t − x'_t)²
or distance between summary statistics like the q = 2 autocorrelations
τ_j = Σ_{t=j+1}^T x_t x_{t−j}
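A minimal R sketch of this ABC scheme for an MA(2) model, using the two lag summaries τ_1, τ_2 as statistics; the uniform draw over the identifiability triangle is done by a simple accept/reject step, and the true parameter and sample sizes are illustrative choices of mine:
# ABC rejection for an MA(2) model with autocovariance-type summaries
set.seed(1)
Tn <- 200
theta_true <- c(0.6, 0.2)
xobs <- arima.sim(n = Tn, model = list(ma = theta_true))   # observed series
tau <- function(x) {                                       # lag-1 and lag-2 summaries
  n <- length(x)
  c(sum(x[-1] * x[-n]), sum(x[-(1:2)] * x[-((n - 1):n)]))
}
runif_triangle <- function() {        # uniform draw over the MA(2) identifiability triangle
  repeat {
    t1 <- runif(1, -2, 2); t2 <- runif(1, -1, 1)
    if (t2 > t1 - 1 && t2 > -t1 - 1) return(c(t1, t2))
  }
}
N <- 1e4
theta <- t(replicate(N, runif_triangle()))
s_obs <- tau(xobs)
dist  <- numeric(N)
for (i in 1:N) {
  z <- arima.sim(n = Tn, model = list(ma = theta[i, ]))
  dist[i] <- sum((tau(z) - s_obs)^2)
}
eps <- quantile(dist, 0.01)            # keep the 1% closest simulations
colMeans(theta[dist <= eps, ])         # ABC posterior mean of (theta1, theta2)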
71. Comparison of distance impact
Evaluation of the tolerance on the ABC sample against both distances (ε = 100%, 10%, 1%, 0.1%) for an MA(2) model
72. Comparison of distance impact
[Figure: marginal densities of θ_1 and θ_2 for the ABC samples]
Evaluation of the tolerance on the ABC sample against both distances (ε = 100%, 10%, 1%, 0.1%) for an MA(2) model
74. ABC (simul') advances
How approximative is ABC?
Simulating from the prior is often poor in efficiency.
Either modify the proposal distribution on θ to increase the density of x's within the vicinity of y...
[Marjoram et al., 2003; Bortot et al., 2007; Sisson et al., 2007]
...or view the problem as conditional density estimation and develop techniques to allow for a larger ε
[Beaumont et al., 2002]
...or even include ε in the inferential framework [ABCµ]
[Ratmann et al., 2009]
78. ABC-NP
Better usage of [prior] simulations by adjustment: instead of throwing away θ' such that ρ(η(z), η(y)) > ε, replace the θ's with locally regressed transforms (use with BIC)
θ* = θ' − {η(z) − η(y)}^T β̂   [Csilléry et al., TEE, 2010]
where β̂ is obtained by [NP] weighted least squares regression on (η(z) − η(y)) with weights
K_δ{ρ(η(z), η(y))}
[Beaumont et al., 2002, Genetics]
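A minimal R sketch of this local-linear adjustment for a scalar θ and a scalar summary statistic; the toy model (θ ∼ N(0, 10²), 20 observations N(θ, 1), η = sample mean) and the Epanechnikov-type weights are illustrative assumptions:
# ABC with local-linear regression adjustment (Beaumont et al., 2002 style)
set.seed(1)
yobs  <- rnorm(20, mean = 3)
N     <- 1e5
theta <- rnorm(N, 0, 10)
eta_z <- rnorm(N, theta, 1 / sqrt(20))      # eta(z) = mean of 20 simulated points (shortcut)
rho   <- abs(eta_z - mean(yobs))
delta <- quantile(rho, 0.01)                # bandwidth = 1% quantile of the distances
keep  <- rho <= delta
w     <- 1 - (rho[keep] / delta)^2          # Epanechnikov-type weights K_delta
fit   <- lm(theta[keep] ~ I(eta_z[keep] - mean(yobs)), weights = w)
theta_adj <- theta[keep] - coef(fit)[2] * (eta_z[keep] - mean(yobs))
c(mean(theta[keep]), mean(theta_adj))       # plain vs regression-adjusted ABC mean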
79. ABC-NP (regression)
Also found in the subsequent literature, e.g. in Fearnhead & Prangle (2012):
weight each simulation directly by
K_δ{ρ(η(z(θ)), η(y))}
or
(1/S) Σ_{s=1}^S K_δ{ρ(η(z_s(θ)), η(y))}
[a consistent estimate of f(η|θ)]
Curse of dimensionality: poor estimate when d = dim(η) is large...
81. ABC-NP (density estimation)
Use of the kernel weights
K_δ{ρ(η(z(θ)), η(y))}
leads to the NP estimate of the posterior expectation
Σ_i θ_i K_δ{ρ(η(z(θ_i)), η(y))} / Σ_i K_δ{ρ(η(z(θ_i)), η(y))}
[Blum, JASA, 2010]
82. ABC-NP (density estimation)
Use of the kernel weights
K_δ{ρ(η(z(θ)), η(y))}
leads to the NP estimate of the posterior conditional density
Σ_i K̃_b(θ_i − θ) K_δ{ρ(η(z(θ_i)), η(y))} / Σ_i K_δ{ρ(η(z(θ_i)), η(y))}
[Blum, JASA, 2010]
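A small R sketch of these kernel-weighted estimates, reusing the toy setup of the previous sketch (assumptions mine: θ ∼ N(0, 10²), η(z) ∼ N(θ, 1/20), Epanechnikov-type K_δ, and the Gaussian kernel of R's density() playing the role of K̃_b):
# Kernel-weighted ABC estimates of the posterior mean and conditional density
set.seed(2)
yobs  <- rnorm(20, mean = 3)
N     <- 1e5
theta <- rnorm(N, 0, 10)
eta_z <- rnorm(N, theta, 1 / sqrt(20))
delta <- 0.2
Kd    <- pmax(0, 1 - ((eta_z - mean(yobs)) / delta)^2)   # weights K_delta
post_mean <- sum(theta * Kd) / sum(Kd)                   # NP posterior expectation (slide 81)
dens <- density(theta[Kd > 0], weights = Kd[Kd > 0] / sum(Kd[Kd > 0]))   # slide 82
post_mean      # and plot(dens) shows the conditional density estimate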
83. ABC-NP (density estimations)
Other versions incorporating regression adjustments
Σ_i K̃_b(θ*_i − θ) K_δ{ρ(η(z(θ_i)), η(y))} / Σ_i K_δ{ρ(η(z(θ_i)), η(y))}
In all cases, error
E[ĝ(θ|y)] − g(θ|y) = c_b b² + c_δ δ² + O_P(b² + δ²) + O_P(1/nδ^d)
var(ĝ(θ|y)) = c/(n b δ^d) (1 + o_P(1))
[Blum, JASA, 2010; standard NP calculations]
86. ABC-NCH
Incorporating non-linearities and heteroscedasticities:
θ* = m̂(η(y)) + [θ − m̂(η(z))] σ̂(η(y)) / σ̂(η(z))
where
m̂(η) is estimated by non-linear regression (e.g., a neural network)
σ̂(η) is estimated by non-linear regression on the residuals
log{θ_i − m̂(η_i)}² = log σ²(η_i) + ξ_i
[Blum & François, 2009]
88. ABC-NCH (2)
Why a neural network?
fights the curse of dimensionality
selects relevant summary statistics
provides automated dimension reduction
offers a model choice capability
improves upon multinomial logistic
[Blum & François, 2009]
90. ABC-MCMC
Markov chain (θ^(t)) created via the transition function
θ^(t+1) = θ'   if θ' ∼ K_ω(θ'|θ^(t)), x ∼ f(x|θ') is such that x = y,
               and u ∼ U(0, 1) ≤ π(θ') K_ω(θ^(t)|θ') / [π(θ^(t)) K_ω(θ'|θ^(t))],
θ^(t+1) = θ^(t) otherwise,
has the posterior π(θ|y) as stationary distribution
[Marjoram et al., 2003]
92. ABC-MCMC (2)
Algorithm 2 Likelihood-free MCMC sampler
Use Algorithm 1 to get (θ^(0), z^(0))
for t = 1 to N do
  Generate θ' from K_ω(·|θ^(t−1)),
  Generate z' from the likelihood f(·|θ'),
  Generate u from U[0,1],
  if u ≤ π(θ') K_ω(θ^(t−1)|θ') / [π(θ^(t−1)) K_ω(θ'|θ^(t−1))] × I_{A_{ε,y}}(z') then
    set (θ^(t), z^(t)) = (θ', z')
  else
    (θ^(t), z^(t)) = (θ^(t−1), z^(t−1)),
  end if
end for
93. Why does it work?
The acceptance probability does not involve calculating the likelihood, since
π_ε(θ', z'|y) / π_ε(θ^(t−1), z^(t−1)|y) × q(θ^(t−1)|θ') f(z^(t−1)|θ^(t−1)) / [ q(θ'|θ^(t−1)) f(z'|θ') ]
  = { π(θ') f(z'|θ') I_{A_{ε,y}}(z') / [ π(θ^(t−1)) f(z^(t−1)|θ^(t−1)) I_{A_{ε,y}}(z^(t−1)) ] }
    × q(θ^(t−1)|θ') f(z^(t−1)|θ^(t−1)) / [ q(θ'|θ^(t−1)) f(z'|θ') ]
where the f(z'|θ') and f(z^(t−1)|θ^(t−1)) terms cancel and I_{A_{ε,y}}(z^(t−1)) = 1 (the previous pseudo-data was accepted), leaving
  = π(θ') q(θ^(t−1)|θ') / [ π(θ^(t−1)) q(θ'|θ^(t−1)) ] × I_{A_{ε,y}}(z')
96. A toy example
Case of
x ∼ (1/2) N(θ, 1) + (1/2) N(−θ, 1)
under prior θ ∼ N(0, 10)
ABC sampler
thetas=rnorm(N,sd=10)
zed=sample(c(1,-1),N,rep=TRUE)*thetas+rnorm(N,sd=1)
eps=quantile(abs(zed-x),.01)
abc=thetas[abs(zed-x)<eps]
98. A toy example
Case of
x ∼ (1/2) N(θ, 1) + (1/2) N(−θ, 1)
under prior θ ∼ N(0, 10)
ABC-MCMC sampler
metas=rep(0,N)
metas[1]=rnorm(1,sd=10)
zed[1]=x
for (t in 2:N){
  metas[t]=rnorm(1,mean=metas[t-1],sd=5)
  zed[t]=rnorm(1,mean=(1-2*(runif(1)<.5))*metas[t],sd=1)
  if ((abs(zed[t]-x)>eps)||(runif(1)>dnorm(metas[t],sd=10)/dnorm(metas[t-1],sd=10))){
    metas[t]=metas[t-1]
    zed[t]=zed[t-1]}
}
104. A toy example
[Figure: density of the ABC sample over θ for the observation x = 50]
105. ABC-PMC
Use of a transition kernel as in population Monte Carlo, with a manageable IS correction
Generate a sample at iteration t by
π̂_t(θ^(t)) ∝ Σ_{j=1}^N ω_j^(t−1) K_t(θ^(t) | θ_j^(t−1))
modulo acceptance of the associated x_t, and use an importance weight associated with an accepted simulation θ_i^(t):
ω_i^(t) ∝ π(θ_i^(t)) / π̂_t(θ_i^(t)).
© Still likelihood free
[Beaumont et al., 2009]
106. Our ABC-PMC algorithm
Given a decreasing sequence of approximation levels ε_1 ≥ ... ≥ ε_T,
1. At iteration t = 1,
   For i = 1, ..., N
     Simulate θ_i^(1) ∼ π(θ) and x ∼ f(x|θ_i^(1)) until ρ(x, y) < ε_1
     Set ω_i^(1) = 1/N
   Take τ² as twice the empirical variance of the θ_i^(1)'s
2. At iteration 2 ≤ t ≤ T,
   For i = 1, ..., N, repeat
     Pick θ_i* from the θ_j^(t−1)'s with probabilities ω_j^(t−1)
     Generate θ_i^(t)|θ_i* ∼ N(θ_i*, σ_t²) and x ∼ f(x|θ_i^(t))
   until ρ(x, y) < ε_t
   Set ω_i^(t) ∝ π(θ_i^(t)) / Σ_{j=1}^N ω_j^(t−1) φ( σ_t^{−1}(θ_i^(t) − θ_j^(t−1)) )
   Take τ_{t+1}² as twice the weighted empirical variance of the θ_i^(t)'s
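A compact R rendering of this ABC-PMC scheme for the earlier toy example x ∼ (1/2)N(θ,1) + (1/2)N(−θ,1) with prior θ ∼ N(0, 10²); the tolerance schedule eps and the population size N are arbitrary choices made for illustration only:
# ABC-PMC sketch for the toy normal mixture
x=2; N=1000
eps=c(2,1,.5,.25)                           # assumed decreasing tolerance schedule
simdata=function(theta) sample(c(1,-1),1)*theta+rnorm(1)
theta=numeric(N)
for (i in 1:N){                             # t = 1: rejection sampling from the prior
  repeat{ theta[i]=rnorm(1,sd=10)
          if (abs(simdata(theta[i])-x)<eps[1]) break }
}
w=rep(1/N,N); tau2=2*var(theta)
for (t in 2:length(eps)){
  thetanew=numeric(N); wnew=numeric(N)
  for (i in 1:N){
    repeat{
      pick=sample(theta,1,prob=w)           # resample from the previous population
      prop=rnorm(1,mean=pick,sd=sqrt(tau2)) # kernel move N(theta*, tau_t^2)
      if (abs(simdata(prop)-x)<eps[t]) break
    }
    thetanew[i]=prop
    wnew[i]=dnorm(prop,sd=10)/sum(w*dnorm(prop,mean=theta,sd=sqrt(tau2)))  # IS correction
  }
  theta=thetanew; w=wnew/sum(wnew)
  tau2=2*sum(w*(theta-sum(w*theta))^2)      # twice the weighted empirical variance
}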
107. Sequential Monte Carlo
SMC is a simulation technique to approximate a sequence of related probability distributions π_n, with π_0 “easy” and π_T as target.
Iterated IS as PMC: particles moved from time n−1 to time n via a kernel K_n, with a sequence of extended targets π̃_n
  π̃_n(z_{0:n}) = π_n(z_n) Π_{j=0}^{n−1} L_j(z_{j+1}, z_j)
where the L_j's are backward Markov kernels [check that π_n(z_n) is a marginal]
[Del Moral, Doucet & Jasra, Series B, 2006]
108. Sequential Monte Carlo (2)
Algorithm 3 SMC sampler
sample z_i^(0) ∼ γ_0(x) (i = 1, . . . , N)
compute weights w_i^(0) = π_0(z_i^(0)) / γ_0(z_i^(0))
for t = 1 to N do
  if ESS(w^(t−1)) < N_T then
    resample N particles z^(t−1) and set weights to 1
  end if
  generate z_i^(t) ∼ K_t(z_i^(t−1), ·) and set weights to
    w_i^(t) = w_i^(t−1) π_t(z_i^(t)) L_{t−1}(z_i^(t), z_i^(t−1)) / [π_{t−1}(z_i^(t−1)) K_t(z_i^(t−1), z_i^(t))]
end for
[Del Moral, Doucet & Jasra, Series B, 2006]
109. ABC-SMC
[Del Moral, Doucet & Jasra, 2009]
True derivation of an SMC-ABC algorithm
Use of a kernel K_n associated with the target π_{ε_n} and derivation of the backward kernel
  L_{n−1}(z, z′) = π_{ε_n}(z′) K_n(z′, z) / π_{ε_n}(z)
Update of the weights
  w_{in} ∝ w_{i(n−1)} [ Σ_{m=1}^M I_{A_{ε_n}}(x_{in}^m) ] / [ Σ_{m=1}^M I_{A_{ε_{n−1}}}(x_{i(n−1)}^m) ]
when x_{in}^m ∼ K(x_{i(n−1)}, ·)
110. ABC-SMCM
Modification: makes M repeated simulations of the pseudo-data z given the parameter, rather than using a single [M = 1] simulation, leading to a weight proportional to the number of accepted z_i's
  ω(θ) = (1/M) Σ_{i=1}^M I_{ρ(η(y),η(z_i)) < ε}
[limit in M means exact simulation from the (tempered) target]
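In R, this ABC-SMCM weight for a given parameter value is just the proportion of the M pseudo-data sets falling within the tolerance; a one-function sketch in which the simulator, summary η, distance ρ and tolerance are placeholders to be supplied by the user:
# omega(theta) = (1/M) sum_i I{ rho(eta(y), eta(z_i)) < eps }
smcm_weight=function(theta,y,M,simulate,eta,rho,eps)
  mean(replicate(M, rho(eta(simulate(theta)),eta(y))<eps))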
111. Properties of ABC-SMC
The ABC-SMC method properly uses a backward kernel L(z, z′) to simplify the importance weight and to remove the dependence on the unknown likelihood from this weight. The update of the importance weights reduces to the ratio of the proportions of surviving particles.
Major assumption: the forward kernel K is supposed to be invariant with respect to the true target [tempered version of the true posterior]
Adaptivity in the ABC-SMC algorithm is only found in the on-line construction of the thresholds ε_t, decreased slowly enough to keep a large number of accepted transitions
113. A mixture example (1)
Toy model of Sisson et al. (2007): if
  θ ∼ U(−10, 10) ,  x|θ ∼ 0.5 N(θ, 1) + 0.5 N(θ, 1/100) ,
then the posterior distribution associated with y = 0 is the normal mixture
  θ|y = 0 ∼ 0.5 N(0, 1) + 0.5 N(0, 1/100)
restricted to [−10, 10].
Furthermore, the true ABC target is available in closed form as
  π(θ | |x| < ε) ∝ Φ(ε − θ) − Φ(−ε − θ) + Φ(10(ε − θ)) − Φ(−10(ε + θ)) .
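The closed-form target above is easy to code and provides a benchmark for any ABC output on this example; a short R sketch with an arbitrary tolerance ε = 0.1 and a numerical normalisation over a grid:
# unnormalised true ABC target pi(theta | |x| < eps) for the Sisson et al. toy model
abctarget=function(theta,eps)
  (pnorm(eps-theta)-pnorm(-eps-theta)+
   pnorm(10*(eps-theta))-pnorm(-10*(eps+theta)))*(abs(theta)<10)
eps=0.1
grid=seq(-3,3,length=500)
dens=abctarget(grid,eps)
dens=dens/sum(dens*diff(grid)[1])         # normalise numerically on the grid
plot(grid,dens,type="l",xlab=expression(theta))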
114. A mixture example (2)
Recovery of the target, whether using a fixed standard deviation of τ = 0.15 or τ = 1/0.15, or a sequence of adaptive τ_t's.
[Figure: five density estimates over θ ∈ (−3, 3), each on a (0, 1) vertical scale, all recovering the target mixture]
115. Yet another ABC-PRC
Another version of ABC-PRC called PRC-ABC, with a proposal distribution M_t, S replications of the pseudo-data, and a PRC step:
PRC-ABC Algorithm
1. Select θ* at random from the θ_i^(t−1)'s with probabilities ω_i^(t−1)
2. Generate θ_i^(t) ∼ K_t, x_1, . . . , x_S ∼ f(x|θ_i^(t)) i.i.d., and set
   ω_i^(t) = Σ_s K_t{η(x_s) − η(y)} / K_t(θ_i^(t)) ,
   accepting with probability p(i) = ω_i^(t)/c_{t,N}
3. Set ω_i^(t) ∝ ω_i^(t)/p(i) (1 ≤ i ≤ N)
[Peters, Sisson & Fan, Stat. Computing, 2012]
116. Yet another ABC-PRC
Another version of ABC-PRC called PRC-ABC, with a proposal distribution M_t, S replications of the pseudo-data, and a PRC step:
  Use of a NP estimate M_t(θ) = Σ_i ω_i^(t−1) ψ(θ|θ_i^(t−1)) (with a fixed variance?!)
  S = 1 recommended for “greatest gains in sampler performance”
  c_{t,N} upper quantile of non-zero weights at time t
117. ABC inference machine
Introduction
Approximate Bayesian computation
ABC as an inference machine
Error inc.
Exact BC and approximate targets
summary statistic
Series B discussion
118. How much Bayesian?
maybe a convergent method of inference (meaningful? sufficient?)
approximation error unknown (w/o simulation)
pragmatic Bayes (there is no other solution!)
many calibration issues (tolerance, distance, statistics)
...should Bayesians care?!
120. ABCµ
Idea: infer about the error ε as well:
Use of a joint density
  f(θ, ε|y) ∝ ξ(ε|y, θ) × π_θ(θ) × π_ε(ε)
where y is the data, and ξ(ε|y, θ) is the prior predictive density of ρ(η(z), η(y)) given θ and y when z ∼ f(z|θ)
Warning! Replacement of ξ(ε|y, θ) with a non-parametric kernel approximation.
[Ratmann, Andrieu, Wiuf and Richardson, 2009, PNAS]
123. ABCµ details
Multidimensional distances ρ_k (k = 1, . . . , K) and errors
  ε_k = ρ_k(η_k(z), η_k(y)), with
  ε_k ∼ ξ_k(ε|y, θ) ≈ ξ̂_k(ε|y, θ) = (1/(B h_k)) Σ_b K[ {ε_k − ρ_k(η_k(z_b), η_k(y))}/h_k ]
then used in replacing ξ(ε|y, θ) with min_k ξ̂_k(ε|y, θ)
ABCµ involves the acceptance probability
  [π(θ′, ε′) q(θ′, θ) q(ε′, ε) min_k ξ̂_k(ε′|y, θ′)] / [π(θ, ε) q(θ, θ′) q(ε, ε′) min_k ξ̂_k(ε|y, θ)]
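A sketch in R of the kernel approximation ξ̂_k for a single distance ρ_k, assuming a Gaussian kernel K; the simulator, summary η_k, distance ρ_k and bandwidth h_k are placeholders for the problem at hand:
# xi_hat_k(eps | y, theta) from B pseudo-data sets, as used by ABCmu
xi_hat=function(eps,y,theta,B,simulate,eta_k,rho_k,h_k){
  rhos=replicate(B, rho_k(eta_k(simulate(theta)),eta_k(y)))
  mean(dnorm((eps-rhos)/h_k))/h_k         # (1/(B h_k)) sum_b K[(eps - rho_b)/h_k]
}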
127. Questions about ABCµ
For each model under comparison, the marginal posterior on ε is used to assess the fit of the model (HPD includes 0 or not).
  Is the data informative about ε? [Identifiability]
  How much does the prior π(ε) impact the comparison?
  How is using both ξ(ε|x_0, θ) and π_ε(ε) compatible with a standard probability model? [reminiscent of Wilkinson's eABC]
  Where is the penalisation for complexity in the model comparison?
[X, Mengersen & Chen, 2010, PNAS]
129. Wilkinson’s exact BC (not exactly!)
ABC approximation error (i.e. non-zero tolerance) replaced with exact simulation from a controlled approximation to the target, the convolution of the true posterior with a kernel function
  π_ε(θ, z|y) = π(θ) f(z|θ) K_ε(y − z) / ∫ π(θ) f(z|θ) K_ε(y − z) dz dθ ,
with K_ε a kernel parameterised by bandwidth ε.
[Wilkinson, 2008]
Theorem
The ABC algorithm based on the assumption of a randomised observation y = ỹ + ξ, ξ ∼ K_ε, and an acceptance probability of
  K_ε(y − z)/M
gives draws from the posterior distribution π(θ|y).
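A sketch of this scheme in R for a scalar observation, taking K_ε to be a Gaussian density (so that M = K_ε(0) is a valid bound); the prior, the sampling model and the value of ε are illustrative assumptions:
# exact-BC / randomised-observation rejection sampler with Gaussian kernel K_eps
y=2; eps=.5; N=1e5
theta=rnorm(N,sd=10)                        # prior draws
z=rnorm(N,mean=theta,sd=1)                  # pseudo-data from f(.|theta)
M=dnorm(0,sd=eps)                           # bound on K_eps
accept=runif(N)<dnorm(y-z,sd=eps)/M         # acceptance probability K_eps(y-z)/M
post=theta[accept]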
131. How exact a BC?
“Using ε to represent measurement error is straightforward, whereas using ε to model the model discrepancy is harder to conceptualize and not as commonly used”
[Richard Wilkinson, 2008]
132. How exact a BC?
Pros
  Pseudo-data from the true model and observed data from the noisy model
  Interesting perspective in that the outcome is completely controlled
  Link with ABCµ and with assuming y is observed with a measurement error with density K_ε
  Relates to the theory of model approximation
  [Kennedy & O'Hagan, 2001]
Cons
  Requires K_ε to be bounded by M
  True approximation error never assessed
  Requires a modification of the standard ABC algorithm
133. ABC for HMMs
Specific case of a hidden Markov model
  X_{t+1} ∼ Q_θ(X_t, ·)
  Y_{t+1} ∼ g_θ(·|x_t)
where only y_{1:n}^0 is observed.
[Dean, Singh, Jasra & Peters, 2011]
Use of specific constraints, adapted to the Markov structure:
  y_1 ∈ B(y_1^0, ε) × · · · × y_n ∈ B(y_n^0, ε)
135. ABC-MLE for HMMs
ABC-MLE defined by
  θ̂_n = arg max_θ P_θ( Y_1 ∈ B(y_1^0, ε), . . . , Y_n ∈ B(y_n^0, ε) )
Exact MLE for the likelihood: same basis as Wilkinson!
  p_θ^ε(y_1^0, . . . , y_n^0)
corresponding to the perturbed process
  (x_t, y_t + ε z_t)_{1≤t≤n} ,  z_t ∼ U(B(0, 1))
[Dean, Singh, Jasra & Peters, 2011]
137. ABC-MLE is biased
ABC-MLE is asymptotically (in n) biased, with target
  l^ε(θ) = E_{θ*}[ log p_θ^ε(Y_1 | Y_{−∞:0}) ]
but ABC-MLE converges to the true value in the sense
  l_n^ε(θ_n) → l^ε(θ)
for all sequences (θ_n) converging to θ, as n → ∞
139. Noisy ABC-MLE
Idea: modify instead the data from the start
  (y_1^0 + εζ_1, . . . , y_n^0 + εζ_n)
[see Fearnhead-Prangle]
noisy ABC-MLE estimate
  arg max_θ P_θ( Y_1 ∈ B(y_1^0 + εζ_1, ε), . . . , Y_n ∈ B(y_n^0 + εζ_n, ε) )
[Dean, Singh, Jasra & Peters, 2011]
140. Consistent noisy ABC-MLE
Degrading the data improves the estimation performances:
  Noisy ABC-MLE is asymptotically (in n) consistent
  under further assumptions, the noisy ABC-MLE is asymptotically normal
  increase in variance of order ε^{−2}
  likely degradation in precision or computing time due to the lack of summary statistic [curse of dimensionality]
141. SMC for ABC likelihood
Algorithm 4 SMC-ABC for HMMs
Given θ
for k = 1, . . . , n do
  generate proposals (x_k^1, y_k^1), . . . , (x_k^N, y_k^N) from the model
  weigh each proposal with ω_k^l = I_{B(y_k^0 + ζ_k, ε)}(y_k^l)
  renormalise the weights and sample the x_k^l's accordingly
end for
approximate the likelihood by
  Π_{k=1}^n ( Σ_{l=1}^N ω_k^l / N )
[Jasra, Singh, Martin & McCoy, 2010]
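An R sketch of Algorithm 4 for a toy scalar HMM with random-walk hidden states (X_{t+1} = X_t + N(0, σ²)) and unit-variance Gaussian observation noise; the initial distribution, the tolerance ε and the perturbations ζ_k are assumptions made for the example:
# SMC approximation of the ABC likelihood for a toy HMM, theta = sigma
abc_lik=function(sigma,yobs,N=1000,eps=.5,zeta=rep(0,length(yobs))){
  n=length(yobs)
  x=rnorm(N)                                # assumed N(0,1) initial hidden states
  lik=1
  for (k in 1:n){
    x=x+rnorm(N,sd=sigma)                   # propagate hidden states
    y=x+rnorm(N)                            # propose observations from the model
    w=as.numeric(abs(y-(yobs[k]+zeta[k]))<eps)  # indicator of the ball B(y_k^0+zeta_k, eps)
    if (sum(w)==0) return(0)                # all proposals rejected
    lik=lik*mean(w)
    x=sample(x,N,replace=TRUE,prob=w)       # resample surviving states
  }
  lik
}
A call such as abc_lik(1, rnorm(10)) then returns a Monte Carlo estimate of the quantity maximised by (noisy) ABC-MLE.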
142. Which summary?
Fundamental difficulty of the choice of the summary statistic when there are no non-trivial sufficient statistics [except when provided by the experimenters in the field]
Starting from a large collection of available summary statistics, Joyce and Marjoram (2008) consider their sequential inclusion into the ABC target, with a stopping rule based on a likelihood ratio test
  Does not take into account the sequential nature of the tests
  Depends on parameterisation
  Order of inclusion matters
145. Semi-automatic ABC
Fearnhead and Prangle (2010) study ABC and the selection of the summary statistic in close proximity to Wilkinson's proposal
ABC is then considered from a purely inferential viewpoint and calibrated for estimation purposes
Use of a randomised (or ‘noisy’) version of the summary statistics
  η̃(y) = η(y) + τε
Derivation of a well-calibrated version of ABC, i.e. an algorithm that gives proper predictions for the distribution associated with this randomised summary statistic [calibration constraint: ABC approximation with the same posterior mean as the true randomised posterior]
147. Summary statistics
Optimality of the posterior expectation E[θ|y] of the parameter of interest as summary statistic η(y)!
Use of the standard quadratic loss function
  (θ − θ_0)^T A (θ − θ_0) .
149. Details on Fearnhead and Prangle (FP) ABC
Use of a summary statistic S(·), an importance proposal g(·), a kernel K(·) ≤ 1 and a bandwidth h > 0 such that
  (θ, y_sim) ∼ g(θ) f(y_sim|θ)
is accepted with probability (hence the bound)
  K[ {S(y_sim) − s_obs}/h ]
and the corresponding importance weight defined by
  π(θ) / g(θ)
[Fearnhead & Prangle, 2012]
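A sketch of this accept/weight step in R, under assumptions made purely for illustration: a normal model with 20 observations per data set, the sample mean as summary S, the prior as importance proposal g (so the weights are constant), and a Gaussian-shaped kernel rescaled so that K(0) = 1:
# Fearnhead-Prangle style importance-sampling ABC
sobs=.3; h=.1; N=1e5
theta=rnorm(N,sd=10)                          # g = prior here
ysim=matrix(rnorm(N*20,mean=theta,sd=1),nrow=N)  # one simulated data set per row
S=rowMeans(ysim)                              # summary statistic S(y_sim)
K=function(u) exp(-u^2/2)                     # kernel with K <= 1
keep=runif(N)<K((S-sobs)/h)                   # accept with probability K[(S-sobs)/h]
post=theta[keep]                              # weights pi(theta)/g(theta)=1 since g is the prior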
150. Errors, errors, and errors
Three levels of approximation
  π(θ|y_obs) by π(θ|s_obs): loss of information [ignored]
  π(θ|s_obs) by
    π_ABC(θ|s_obs) = ∫ π(s) K[{s − s_obs}/h] π(θ|s) ds / ∫ π(s) K[{s − s_obs}/h] ds
  [noisy observations]
  π_ABC(θ|s_obs) by importance Monte Carlo based on N simulations, represented by var(a(θ)|s_obs)/N_acc [expected number of acceptances]
[M. Twain/B. Disraeli]
151. Average acceptance asymptotics
For the average acceptance probability/approximate likelihood
  p(θ|s_obs) = ∫ f(y_sim|θ) K[{S(y_sim) − s_obs}/h] dy_sim ,
the overall acceptance probability is
  p(s_obs) = ∫ p(θ|s_obs) π(θ) dθ = π(s_obs) h^d + o(h^d)
[FP, Lemma 1]
152. Optimal importance proposal
Best choice of importance proposal in terms of effective sample size
  g(θ|s_obs) ∝ π(θ) p(θ|s_obs)^{1/2}
[Not particularly useful in practice]
  note that p(θ|s_obs) is an approximate likelihood
  reminiscent of parallel tempering
  could be approximately achieved by attrition of half of the data
154. Calibration of h
“This result gives insight into how S(·) and h affect the Monte Carlo error. To minimize Monte Carlo error, we need h^d to be not too small. Thus ideally we want S(·) to be a low dimensional summary of the data that is sufficiently informative about θ that π(θ|s_obs) is close, in some sense, to π(θ|y_obs)” (FP, p.5)
  turns h into an absolute value while it should be context-dependent and user-calibrated
  only addresses one term in the approximation error and acceptance probability (“curse of dimensionality”)
  h large prevents π_ABC(θ|s_obs) from being close to π(θ|s_obs)
  d small prevents π(θ|s_obs) from being close to π(θ|y_obs) (“curse of [dis]information”)
155. Calibrating ABC
“If π_ABC is calibrated, then this means that probability statements that are derived from it are appropriate, and in particular that we can use π_ABC to quantify uncertainty in estimates” (FP, p.5)
Definition
For 0 < q < 1 and a subset A, the event E_q(A) is made of those s_obs such that Pr_ABC(θ ∈ A|s_obs) = q. Then ABC is calibrated if
  Pr(θ ∈ A|E_q(A)) = q
  unclear meaning of conditioning on E_q(A)
157. Calibrated ABC
Theorem (FP)
Noisy ABC, where
  s_obs = S(y_obs) + hε ,  ε ∼ K(·)
is calibrated
[Wilkinson, 2008]
  no condition on h!!
158. Calibrated ABC
Consequence: when h = ∞
Theorem (FP)
The prior distribution is always calibrated
is this a relevant property then?
159. More questions about calibrated ABC
“Calibration is not universally accepted by Bayesians. It is even more questionable here as we care how statements we make relate to the real world, not to a mathematically defined posterior.” R. Wilkinson
  Same reluctance about the prior being calibrated
  Property depending on prior, likelihood, and summary
  Calibration is a frequentist property (almost a p-value!)
  More sensible to account for the simulator's imperfections than to use noisy ABC against a meaningless ε-based measure
[Wilkinson, 2012]
160. Converging ABC
Theorem (FP)
For noisy ABC, the expected noisy-ABC log-likelihood,
  E{ log[p(θ|s_obs)] } = ∫∫ log[ p(θ|S(y_obs) + ε) ] π(y_obs|θ_0) K(ε) dy_obs dε ,
has its maximum at θ = θ_0.
True for any choice of summary statistic? Even ancillary statistics?!
[Imposes at least identifiability...]
Relevant in asymptotia and not for the data
161. Converging ABC
Corollary
For noisy ABC, the ABC posterior converges onto a point mass on the true parameter value as m → ∞.
For standard ABC, not always the case (unless h goes to zero).
  Strength of regularity conditions (c1) and (c2) in Bernardo & Smith, 1994? [out-of-reach constraints on likelihood and posterior]
  Again, there must be conditions imposed upon the summary statistics...
162. Loss motivated statistic
Under the quadratic loss function,
Theorem (FP)
(i) The minimal posterior error E[L(θ, θ̂)|y_obs] occurs when θ̂ = E(θ|y_obs) (!)
(ii) When h → 0, E_ABC(θ|s_obs) converges to E(θ|y_obs)
(iii) If S(y_obs) = E[θ|y_obs] then for θ̂ = E_ABC[θ|s_obs]
  E[L(θ, θ̂)|y_obs] = trace(AΣ) + h² ∫ x^T A x K(x) dx + o(h²) .
  measure-theoretic difficulties?
  dependence of s_obs on h makes me uncomfortable: inherent to noisy ABC
  Relevant for choice of K?
163. Optimal summary statistic
“We take a different approach, and weaken the requirement for π_ABC to be a good approximation to π(θ|y_obs). We argue for π_ABC to be a good approximation solely in terms of the accuracy of certain estimates of the parameters.” (FP, p.5)
From this result, FP
  derive their choice of summary statistic,
    S(y) = E(θ|y)
  [almost sufficient]
  suggest
    h = O(N^{−1/(2+d)}) and h = O(N^{−1/(4+d)})
  as optimal bandwidths for noisy and standard ABC.
164. Optimal summary statistic
“We take a different approach, and weaken the requirement for π_ABC to be a good approximation to π(θ|y_obs). We argue for π_ABC to be a good approximation solely in terms of the accuracy of certain estimates of the parameters.” (FP, p.5)
From this result, FP
  derive their choice of summary statistic,
    S(y) = E(θ|y)
  [wow! E_ABC[θ|S(y_obs)] = E[θ|y_obs]]
  suggest
    h = O(N^{−1/(2+d)}) and h = O(N^{−1/(4+d)})
  as optimal bandwidths for noisy and standard ABC.
165. Caveat
Since E(θ|yobs ) is most usually unavailable, FP suggest
(i) use a pilot run of ABC to determine a region of non-negligible
posterior mass;
(ii) simulate sets of parameter values and data;
(iii) use the simulated sets of parameter values and data to
estimate the summary statistic; and
(iv) run ABC with this choice of summary statistic.
where is the assessment of the first stage error?
167. Approximating the summary statistic
Like Beaumont et al. (2002) and Blum and François (2010), FP use a linear regression to approximate E(θ|y_obs):
  θ_i = β_0 + β f(y_obs^(i)) + ε_i
with f made of standard transforms
[Further justifications?]
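In R, the regression step amounts to fitting simulated parameters against transforms of simulated data and using the fitted linear predictor as the summary statistic; a minimal sketch, where the pilot region, the model and the transforms f are all assumptions for illustration:
# estimate S(y) ~ E(theta|y) by linear regression on pilot simulations
M=1e4
theta=runif(M,-5,5)                           # pilot draws of the parameter
y=t(sapply(theta,function(th) rnorm(5,mean=th,sd=1)))   # 5 observations per data set
f=cbind(rowMeans(y),apply(y,1,median))        # "standard transforms" of the data
fit=lm(theta~f)
S=function(ynew) sum(coef(fit)*c(1,mean(ynew),median(ynew)))   # summary for a new data set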
168. [my] questions about semi-automatic ABC
dependence on h and S(·) in the early stage
reduction of Bayesian inference to point estimation
approximation error in step (i) not accounted for
not parameterisation invariant
practice shows that proper approximation to genuine posterior
distributions stems from using a (much) larger number of
summary statistics than the dimension of the parameter
the validity of the approximation to the optimal summary
statistic depends on the quality of the pilot run
important inferential issues like model choice are not covered
by this approach.
[Robert, 2012]
169. More about semi-automatic ABC
[End of section derived from comments on Read Paper, just appeared]
“The apparently arbitrary nature of the choice of summary statistics
has always been perceived as the Achilles heel of ABC.” M.
Beaumont
“Curse of dimensionality” linked with the increase of the
dimension of the summary statistic
Connection with principal component analysis
[Itan et al., 2010]
Connection with partial least squares
[Wegman et al., 2009]
Beaumont et al.'s (2002) postprocessed output is used as input by FP to run a second ABC
171. Wood’s alternative
Instead of a non-parametric kernel approximation to the likelihood
  (1/R) Σ_r K{ η(y_r) − η(y_obs) }
Wood (2010) suggests a normal approximation
  η(y(θ)) ∼ N_d(µ_θ, Σ_θ)
whose parameters can be approximated based on the R simulations (for each value of θ).
  Parametric versus non-parametric rate [Uh?!]
  Automatic weighting of components of η(·) through Σ_θ
  Dependence on normality assumption (pseudo-likelihood?)
[Cornebise, Girolami & Kosmidis, 2012]
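A sketch of this synthetic (normal) likelihood in R, written with base R only; the simulator and the vector of summaries η are placeholders, and the Gaussian log-density is computed via a Cholesky factorisation:
# Wood-style synthetic log-likelihood at a given theta
synlik=function(theta,sobs,R,simulate,eta){
  S=matrix(replicate(R,eta(simulate(theta))),nrow=R,byrow=TRUE)  # R x d simulated summaries
  mu=colMeans(S); Sigma=cov(S)
  d=sobs-mu
  L=chol(Sigma)                              # Sigma = t(L) %*% L
  q=backsolve(L,d,transpose=TRUE)            # solves t(L) q = d
  -0.5*sum(q^2)-sum(log(diag(L)))-0.5*length(d)*log(2*pi)
}
The resulting function can then be plugged into a standard MCMC run over θ in place of the exact log-likelihood.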
173. Reinterpretation and extensions
Reinterpretation of ABC output as joint simulation from
  π̄(x, y|θ) = f(x|θ) π̄_{Y|X}(y|x)
where
  π̄_{Y|X}(y|x) = K_ε(y − x)
Reinterpretation of noisy ABC:
if ȳ|y_obs ∼ π̄_{Y|X}(·|y_obs), then marginally
  ȳ ∼ π̄_{Y|θ}(·|θ_0)
This explains the consistency of Bayesian inference based on ȳ and π̄
[Lee, Andrieu & Doucet, 2012]
175. ABC for Markov chains
Rewriting the posterior as
  π(θ)^{1−n} π(θ|x_1) Π_{t=2}^n π(θ|x_{t−1}, x_t)
where π(θ|x_{t−1}, x_t) ∝ f(x_t|x_{t−1}, θ) π(θ)
Allows for a stepwise ABC, replacing each π(θ|x_{t−1}, x_t) by an ABC approximation
Similarity with FP's multiple sources of data (and also with Dean et al., 2011)
[White et al., 2010, 2012]
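A crude R illustration of the stepwise idea for a scalar AR(1)-type chain x_t = θ x_{t−1} + N(0, 1) with a flat prior on (−1, 1), both assumptions for the example: each factor π(θ|x_{t−1}, x_t) is approximated by simple rejection ABC on one transition, and the factors are recombined on a grid according to the decomposition above:
# stepwise ABC for a scalar Markov chain
set.seed(1)
x=as.numeric(arima.sim(list(ar=.5),n=50))     # synthetic "observed" chain
N=1e4; eps=.1
grid=seq(-1,1,length=200)
logpost=rep(0,length(grid))                   # pi(theta)^{1-n} is constant for the flat prior
for (t in 2:length(x)){
  th=runif(N,-1,1)                            # prior draws
  xt=th*x[t-1]+rnorm(N)                       # simulate one transition
  abc=th[abs(xt-x[t])<eps]                    # ABC sample from pi(theta | x_{t-1}, x_t)
  dens=density(abc,from=-1,to=1,n=200)        # kernel estimate of that factor on the grid
  logpost=logpost+log(dens$y+1e-12)
}
post=exp(logpost-max(logpost))
post=post/sum(post*diff(grid)[1])             # normalise on the grid
plot(grid,post,type="l",xlab=expression(theta))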