This document presents multiple estimators for approximating integrals by Monte Carlo simulation. It introduces techniques such as multiple importance sampling, Rao-Blackwellisation, and delayed acceptance that combine several estimators to improve accuracy, and then turns to approaches, such as mixtures as proposals, global adaptation, and nonparametric maximum likelihood estimation (NPMLE), that frame Monte Carlo estimation as a statistical estimation problem. Among the advantages of the statistical formulation is the ability to estimate the simulation error directly from the Fisher information. Overall, the document surveys techniques for combining Monte Carlo estimators to obtain more accurate integral approximations.
Multiple estimators for Monte Carlo approximations
1. Multiple estimators for Monte Carlo approximations
Christian P. Robert
Université Paris-Dauphine, Paris & University of Warwick, Coventry
with M. Banterle, G. Casella, R. Douc, V. Elvira, J.M. Marin
3. Monte Carlo problems
Approximation of an integral
$I = \int_{\mathcal{X}} H(x) \,\text{d}x$
based on pseudo-random simulations
$x_1, \ldots, x_T \overset{\text{i.i.d.}}{\sim} f(x)$
and expectation representation
$I = \int_{\mathcal{X}} h(x) f(x) \,\text{d}x = \mathbb{E}_f[h(X)]$
[Casella and Robert, 1996]
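As a baseline, here is a minimal sketch of the vanilla estimator above, assuming a standard normal $f$ and $h(x) = x^2$ (so that $I = 1$ exactly); the figures are purely illustrative.

```python
import numpy as np

rng = np.random.default_rng(0)
T = 10_000

x = rng.standard_normal(T)        # x_1, ..., x_T iid from f = N(0,1)
h = x**2                          # h(x) = x^2, so I = E_f[h(X)] = 1 exactly
I_hat = h.mean()
se = h.std(ddof=1) / np.sqrt(T)   # Monte Carlo standard error
print(f"I_hat = {I_hat:.4f} +/- {se:.4f}")
```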
5. Multiple estimators
Even with a single sample, several representations are available:
• path sampling
• safe harmonic means
• mixtures like AMIS (aka IMIS, MAMIS)
• Rao-Blackwellisations
• recycling schemes
• quasi-Monte Carlo
• measure averaging (NPMLE) à la Kong et al. (2003)
depending on how black the black-box mechanism is
[Chen et al., 2000]
6. An outlier
Given
$X_{ij} \sim f_i(x) = c_i g_i(x)$
with $g_i$ known and $c_i$ unknown ($i = 1, \ldots, k$, $j = 1, \ldots, n_i$), the constants $c_i$ are estimated by a “reverse logistic regression” based on the quasi-likelihood
$L(\eta) = \sum_{i=1}^{k} \sum_{j=1}^{n_i} \log p_i(x_{ij}, \eta)$
with
$p_i(x, \eta) = \exp\{\eta_i\} g_i(x) \Big/ \sum_{l=1}^{k} \exp\{\eta_l\} g_l(x)$
[Anderson, 1972; Geyer, 1992]
Approximation
$\log \hat{c}_i = \log(n_i/n) - \hat{\eta}_i$
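A small sketch of this reverse logistic regression, assuming two unnormalised densities whose constants are known so the output can be checked. Under this parametrisation $e^{\eta_i}$ absorbs $n_i c_i$ up to a common additive constant in $\eta$, so only ratios of the $c_i$ are reported (sign conventions for $\eta$ vary across references).

```python
import numpy as np
from scipy.optimize import minimize

rng = np.random.default_rng(1)

# Two unnormalised densities g_i with known constants c_i, for checking:
# g1(x) = exp(-x^2/2)  -> c1 = 1/sqrt(2 pi)   (N(0,1))
# g2(x) = exp(-|x|)    -> c2 = 1/2            (Laplace(0,1))
n1 = n2 = 5_000
x = np.concatenate([rng.standard_normal(n1), rng.laplace(size=n2)])
labels = np.repeat([0, 1], [n1, n2])
log_g = np.column_stack([-x**2 / 2, -np.abs(x)])      # log g_i at every draw

def neg_quasi_loglik(eta):
    # -sum_ij log p_i(x_ij, eta), with the softmax form of the slide
    a = log_g + eta
    lse = np.logaddexp(a[:, 0], a[:, 1])
    return -(a[np.arange(len(x)), labels] - lse).sum()

# eta is identified only up to an additive constant: fix eta_1 = 0
res = minimize(lambda e: neg_quasi_loglik(np.array([0.0, e[0]])), x0=[0.0])
eta_hat = np.array([0.0, res.x[0]])

# e^{eta_i} absorbs n_i c_i (up to a constant), so with n1 = n2:
print("estimated log(c2/c1):", eta_hat[1] - eta_hat[0])
print("true      log(c2/c1):", np.log(0.5) + 0.5 * np.log(2 * np.pi))
```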
8. A statistical framework
Existence of a central limit theorem:
$\sqrt{n}\,(\hat{\eta}_n - \eta) \overset{\mathcal{L}}{\longrightarrow} \mathcal{N}_k(0, B^+ A B^+)$
[Geyer, 1992; Doss & Tan, 2015]
• strong convergence properties
• asymptotic approximation of the precision
• connection with bridge sampling and auxiliary model [mixture]
• ...but nothing statistical there [no estimation or data]
[Kong et al., 2003; Chopin & Robert, 2011]
10. Evidence from a posterior sample
Harmonic identity
$\mathbb{E}_f\left[\frac{\varphi(X)}{h(X) f(X)}\right] = \int \frac{\varphi(x)}{h(x) f(x)} \, \frac{h(x) f(x)}{I} \,\text{d}x = \frac{1}{I}$
no matter what the proposal density $\varphi(x)$ is
[Gelfand & Dey, 1994; Bartolucci et al., 2006]
11. Harmonic mean estimator
Constraint opposed to usual importance sampling constraints: $\varphi(x)$ must have lighter (rather than fatter) tails than $h(x) g(x)$ for the approximation
$\hat{I} = 1 \Bigg/ \frac{1}{T} \sum_{t=1}^{T} \frac{\varphi(x^{(t)})}{h(x^{(t)}) g(x^{(t)})}$
to have a finite variance [safe harmonic mean]
[Robert & Wraith, 2012]
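A toy illustration of the safe harmonic mean, assuming a conjugate normal model in which the evidence $I$ is available in closed form; the light-tailed $\varphi$ is chosen by hand here.

```python
import numpy as np
from scipy.stats import norm

rng = np.random.default_rng(2)
T = 100_000

# Conjugate toy model: prior g = N(0,1), likelihood h(x) = N(y = 0 | x, 1),
# hence posterior N(0, 1/2) and evidence I = N(0 | 0, 2)
post = rng.normal(0.0, np.sqrt(0.5), T)       # draws from the posterior

# phi with LIGHTER tails than h(x) g(x): a N(0, 0.5^2) density
phi = norm.pdf(post, 0.0, 0.5)
hg = norm.pdf(0.0, post, 1.0) * norm.pdf(post, 0.0, 1.0)

I_hat = 1.0 / np.mean(phi / hg)
print(f"I_hat = {I_hat:.4f}, true I = {norm.pdf(0.0, 0.0, np.sqrt(2.0)):.4f}")
```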
12. “The Worst Monte Carlo Method Ever”
“The good news is that the Law of Large Numbers guarantees that this estimator is consistent, i.e., it will very likely be very close to the correct answer if you use a sufficiently large number of points from the posterior distribution.
The bad news is that the number of points required for this estimator to get close to the right answer will often be greater than the number of atoms in the observable universe. The even worse news is that it’s easy for people to not realize this, and to naïvely accept estimates that are nowhere close to the correct value of the marginal likelihood.”
[Radford Neal’s blog, Aug. 23, 2008]
14. Mixtures as proposals
Design a specific mixture for simulation purposes, with density
$\tilde{\varphi}(x) \propto \omega_1 h(x) g(x) + \varphi(x),$
where $\varphi(x)$ is arbitrary (but normalised)
Note: $\omega_1$ is not a probability weight
[Chopin & Robert, 2011]
17. Multiple Importance Sampling
Recycling: given several proposals $Q_1, \ldots, Q_T$, for $1 \leq t \leq T$ generate an iid sample
$x_1^t, \ldots, x_N^t \sim Q_t$
and estimate $I$ by
$\hat{I}_{Q,N}(h) = T^{-1} \sum_{t=1}^{T} N^{-1} \sum_{i=1}^{N} h(x_i^t) \, \omega_i^t$
where
$\omega_i^t = \frac{f(x_i^t)}{q_t(x_i^t)}$
correct...
18. Multiple Importance Sampling
Recycling: given several proposals $Q_1, \ldots, Q_T$, for $1 \leq t \leq T$ generate an iid sample
$x_1^t, \ldots, x_N^t \sim Q_t$
and estimate $I$ by
$\hat{I}_{Q,N}(h) = T^{-1} \sum_{t=1}^{T} N^{-1} \sum_{i=1}^{N} h(x_i^t) \, \omega_i^t$
where
$\omega_i^t = \frac{f(x_i^t)}{T^{-1} \sum_{\ell=1}^{T} q_\ell(x_i^t)}$
still correct!
19. Mixture representation
Deterministic mixture correction of the weights proposed by Owen and Zhou (JASA, 2000)
• The corresponding estimator is still unbiased [if not self-normalised]
• All particles are on the same weighting scale rather than their own
• Large-variance proposals $Q_t$ do not take over
• Variance reduction thanks to weight stabilisation & recycling
• [k.o.] removes the randomness in the component choice [= Rao-Blackwellisation]
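A sketch contrasting the per-proposal weights of slide 17 with the deterministic-mixture weights of slide 18, assuming a $N(0,1)$ target, three hand-picked normal proposals and $h(x) = x^2$; the mixture weights typically show a visibly smaller spread.

```python
import numpy as np
from scipy.stats import norm

rng = np.random.default_rng(3)

# Target f = N(0,1); three normal proposals q_t; h(x) = x^2 so that I = 1
locs, scales = np.array([-1.0, 0.0, 2.0]), np.array([1.0, 3.0, 0.5])
N = 2_000
x = rng.normal(locs, scales, size=(N, 3))           # column t holds draws from q_t

f = norm.pdf(x)                                     # target density at every draw
q_all = norm.pdf(x[..., None], locs, scales)        # q_l(x_i^t) for every l

w_standard = f / norm.pdf(x, locs, scales)          # f(x)/q_t(x)        (slide 17)
w_mixture = f / q_all.mean(axis=-1)                 # deterministic mix  (slide 18)

h = x**2
for name, w in [("standard", w_standard), ("mixture ", w_mixture)]:
    print(f"{name}: I_hat = {(h * w).mean():.3f}, max weight = {w.max():.1f}")
```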
20. Global adaptation
AMIS algorithm
At iteration $t = 1, \ldots, T$:
1. For $1 \leq i \leq N_1$, generate $x_i^t \sim T_3(\hat{\mu}^{t-1}, \hat{\Sigma}^{t-1})$
2. Calculate the mixture importance weight of particle $x_i^t$
$\omega_i^t = f(x_i^t) \big/ \delta_i^t$
where
$\delta_i^t = \sum_{l=0}^{t-1} q_{T_3}(x_i^t; \hat{\mu}^l, \hat{\Sigma}^l)$
21. Backward reweighting
3. If $t \geq 2$, update the weights of all past particles $x_i^l$, $1 \leq l \leq t-1$:
$\omega_i^l = f(x_i^l) \big/ \delta_i^l$
where
$\delta_i^l \leftarrow \delta_i^l + q_{T_3}(x_i^l; \hat{\mu}^{t-1}, \hat{\Sigma}^{t-1})$
4. Compute IS estimates $\hat{\mu}^t$ and $\hat{\Sigma}^t$ of the target mean and variance, where
$\hat{\mu}_j^t = \sum_{l=1}^{t} \sum_{i=1}^{N_1} \omega_i^l (x_j)_i^l \Big/ \sum_{l=1}^{t} \sum_{i=1}^{N_1} \omega_i^l \quad \ldots$
[Cornuet et al., 2012; Marin et al., 2017]
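A compact sketch of the AMIS loop of slides 20-21 on a toy Gaussian target; the $T_3$ proposal and the backward reweighting follow the steps above, but the tuning constants ($T$, $N_1$, the initial moments, the jitter on $\hat{\Sigma}$) are illustrative choices.

```python
import numpy as np
from scipy.stats import multivariate_t

def amis(log_f, d=2, T=10, N1=500, seed=4):
    """Toy AMIS: multivariate T_3 proposal, backward reweighting, moment update."""
    rng = np.random.default_rng(seed)
    mu, Sigma = np.zeros(d), 4.0 * np.eye(d)
    xs, deltas, props = [], [], []
    for t in range(T):
        prop = multivariate_t(loc=mu, shape=Sigma, df=3)
        x = prop.rvs(N1, random_state=rng)             # step 1: draw from T_3
        props.append(prop)
        # step 3 (backward reweighting): add new proposal density to past deltas
        deltas = [dl + prop.pdf(xl) for dl, xl in zip(deltas, xs)]
        # step 2: delta for the new particles sums all proposal densities so far
        deltas.append(sum(p.pdf(x) for p in props))
        xs.append(x)
        X = np.vstack(xs)
        w = np.exp(log_f(X)) / np.concatenate(deltas)  # omega = f(x) / delta
        w /= w.sum()
        mu = w @ X                                     # step 4: IS moment estimates
        Sigma = (X - mu).T @ ((X - mu) * w[:, None]) + 1e-6 * np.eye(d)
    return X, w

# Example: Gaussian target N(m, I_2); the weighted mean should approach m
m = np.array([3.0, -2.0])
X, w = amis(lambda X: -0.5 * ((X - m) ** 2).sum(axis=1))
print("estimated mean:", w @ X)
```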
22. NPMLE for Monte Carlo
“The task of estimating an integral by Monte Carlo methods is formulated as a statistical model using simulated observations as data.
The difficulty in this exercise is that we ordinarily have at our disposal all of the information required to compute integrals exactly by calculus or numerical integration, but we choose to ignore some of the information for simplicity or computational feasibility.”
[Kong, McCullagh, Meng, Nicolae & Tan, 2003]
23. NPMLE for Monte Carlo
“Our proposal is to use a semiparametric statistical model that makes explicit what information is ignored and what information is retained. The parameter space in this model is a set of measures on the sample space, which is ordinarily an infinite dimensional object. None-the-less, from simulated data the base-line measure can be estimated by maximum likelihood, and the required integrals computed by a simple formula previously derived by Geyer and by Meng and Wong using entirely different arguments.”
[Kong, McCullagh, Meng, Nicolae & Tan, 2003]
24. NPMLE for Monte Carlo
“By contrast with Geyer’s retrospective likelihood, a correct estimate of simulation error is available directly from the Fisher information. The principal advantage of the semiparametric model is that variance reduction techniques are associated with submodels in which the maximum likelihood estimator in the submodel may have substantially smaller variance than the traditional estimator.”
[Kong, McCullagh, Meng, Nicolae & Tan, 2003]
25. NPMLE for Monte Carlo
“At first glance, the problem appears to be an exercise in calculus or numerical analysis, and not amenable to statistical formulation”
• use of Fisher information
• non-parametric MLE based on simulations
• comparison of sampling schemes through variances
• Rao–Blackwellised improvements by invariance constraints
26. NPMLE for Monte Carlo
Observing
$Y_{ij} \sim F_i(t) = c_i^{-1} \int_{-\infty}^{t} \omega_i(x) \,\text{d}F(x)$
with $\omega_i$ known and $F$ unknown
27. NPMLE for Monte Carlo
Observing
$Y_{ij} \sim F_i(t) = c_i^{-1} \int_{-\infty}^{t} \omega_i(x) \,\text{d}F(x)$
with $\omega_i$ known and $F$ unknown
“Maximum likelihood estimate” defined by the weighted empirical cdf
$\sum_{i,j} \omega_i(y_{ij}) \, p(y_{ij}) \, \delta_{y_{ij}}$
maximising in $p$
$\prod_{ij} c_i^{-1} \omega_i(y_{ij}) \, p(y_{ij})$
28. NPMLE for Monte Carlo
Observing
$Y_{ij} \sim F_i(t) = c_i^{-1} \int_{-\infty}^{t} \omega_i(x) \,\text{d}F(x)$
with $\omega_i$ known and $F$ unknown
“Maximum likelihood estimate” defined by the weighted empirical cdf
$\sum_{i,j} \omega_i(y_{ij}) \, p(y_{ij}) \, \delta_{y_{ij}}$
maximising in $p$
$\prod_{ij} c_i^{-1} \omega_i(y_{ij}) \, p(y_{ij})$
Result such that
$\sum_{ij} \frac{\hat{c}_r^{-1} \omega_r(y_{ij})}{\sum_s n_s \hat{c}_s^{-1} \omega_s(y_{ij})} = 1$
[Vardi, 1985]
29. NPMLE for Monte Carlo
Observing
$Y_{ij} \sim F_i(t) = c_i^{-1} \int_{-\infty}^{t} \omega_i(x) \,\text{d}F(x)$
with $\omega_i$ known and $F$ unknown
Result such that
$\sum_{ij} \frac{\hat{c}_r^{-1} \omega_r(y_{ij})}{\sum_s n_s \hat{c}_s^{-1} \omega_s(y_{ij})} = 1$
[Vardi, 1985]
Bridge sampling estimator: solution of the same fixed-point equation
$\sum_{ij} \frac{\hat{c}_r^{-1} \omega_r(y_{ij})}{\sum_s n_s \hat{c}_s^{-1} \omega_s(y_{ij})} = 1$
[Gelman & Meng, 1998; Tan, 2004]
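A sketch of solving the Vardi fixed point by plain self-consistency iteration, assuming $\omega_1 \equiv 1$ so that $c_1 = 1$ anchors the scale and the answer can be checked in closed form.

```python
import numpy as np

rng = np.random.default_rng(5)
n1 = n2 = 5_000

# F = N(0,1), unknown to the method; omega_1 = 1 and omega_2(x) = exp(x),
# so F_2 is N(1,1) and c_2 = E_F[exp(X)] = exp(1/2), while c_1 = 1
y = np.concatenate([rng.normal(0, 1, n1), rng.normal(1, 1, n2)])
omega = np.vstack([np.ones_like(y), np.exp(y)])   # omega_s(y_ij), shape (2, n)
n_s = np.array([n1, n2])

c = np.ones(2)
for _ in range(200):
    # NPMLE masses p(y_ij) proportional to 1 / sum_s n_s c_s^{-1} omega_s(y_ij)
    p = 1.0 / ((n_s / c) @ omega)
    p /= p.sum()                     # F is a probability measure
    c = omega @ p                    # c_r = sum_ij omega_r(y_ij) p(y_ij)

print("c_hat:", c, "  truth:", [1.0, np.exp(0.5)])
```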
30. end of the 2002 discussion
“...essentially every Monte Carlo activity may be interpreted as parameter estimation by maximum likelihood in a statistical model. We do not claim that this point of view is necessary; nor do we seek to establish a working principle from it.”
• restriction to discrete support measures [may be] suboptimal
[Ritov & Bickel, 1990; Robins et al., 1997, 2000, 2003]
• group averaging versions in-between multiple mixture estimators and quasi-Monte Carlo versions
[Veach & Guibas, 1995; Owen & Zhou, 2000; Cornuet et al., 2012; Owen, 2003]
• statistical analogy provides at best a narrative thread
31. end of the 2002 discussion
“The hard part of the exercise is to construct a submodel such that the gain in precision is sufficient to justify the additional computational effort”
• garden of forking paths, with infinite possibilities
• no free lunch (variance, budget, time)
• Rao–Blackwellisation may be detrimental in Markov setups
33. Accept-Reject
Given a density $f(\cdot)$ to simulate, take a density $g(\cdot)$ such that
$f(x) \leq M g(x)$
for some $M \geq 1$
To simulate $X \sim f$, it is sufficient to generate
$Y \sim g, \qquad U \mid Y = y \sim \mathcal{U}(0, M g(y))$
until
$0 < u < f(y)$
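A minimal accept-reject sketch, assuming a Beta(2,2) target and a uniform proposal for which the bound $M$ is exact.

```python
import numpy as np

rng = np.random.default_rng(6)

f = lambda x: 6.0 * x * (1.0 - x)        # Beta(2,2) density on [0, 1]
M = 1.5                                  # sup f = f(1/2), so f(x) <= M g(x), g = 1

def accept_reject(n):
    out = []
    while len(out) < n:
        y = rng.uniform(size=n)                  # Y ~ g = U(0,1)
        u = rng.uniform(0.0, M, size=n)          # U | Y = y ~ U(0, M g(y))
        out.extend(y[u < f(y)])
    return np.asarray(out[:n])

x = accept_reject(10_000)
print("mean, var:", x.mean(), x.var())   # Beta(2,2): mean 0.5, variance 0.05
```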
34. Much ado about...
[Exercise 3.33, MCSM]
Raw outcome: iid sequences $Y_1, Y_2, \ldots, Y_t \sim g$ and $U_1, U_2, \ldots, U_t \sim \mathcal{U}(0, 1)$
Random number $N$ of proposals required to accept $t$ of the $Y_i$’s:
$P(N = n) = \binom{n-1}{t-1} (1/M)^t (1 - 1/M)^{n-t},$
35. Much ado about...
[Exercise 3.33, MCSM]
Raw outcome: iid sequences $Y_1, Y_2, \ldots, Y_t \sim g$ and $U_1, U_2, \ldots, U_t \sim \mathcal{U}(0, 1)$
Joint density of $(N, \mathbf{Y}, \mathbf{U})$:
$P(N = n, Y_1 \leq y_1, \ldots, Y_n \leq y_n, U_1 \leq u_1, \ldots, U_n \leq u_n)$
$= \int_{-\infty}^{y_n} g(t_n)(u_n \wedge w_n)\,\text{d}t_n \int_{-\infty}^{y_1} \cdots \int_{-\infty}^{y_{n-1}} g(t_1) \cdots g(t_{n-1})$
$\times \sum_{(i_1, \cdots, i_{t-1})} \prod_{j=1}^{t-1} (w_{i_j} \wedge u_{i_j}) \prod_{j=t}^{n-1} (u_{i_j} - w_{i_j})^+ \,\text{d}t_1 \cdots \text{d}t_{n-1},$
where $w_i = f(y_i)/M g(y_i)$ and the sum is over all subsets of $\{1, \ldots, n-1\}$ of size $t-1$
36. Much ado about...
[Exercise 3.33, MCSM]
Raw outcome: iid sequences $Y_1, Y_2, \ldots, Y_t \sim g$ and $U_1, U_2, \ldots, U_t \sim \mathcal{U}(0, 1)$
Marginal joint density of $(Y_i, U_i) \mid N = n$, $i < n$:
$P(N = n, Y_1 \leq y, U_1 \leq u_1) = \binom{n-1}{t-1} \left(\frac{1}{M}\right)^{t-1} \left(1 - \frac{1}{M}\right)^{n-t-1}$
$\times \left[ \frac{t-1}{n-1} (w_1 \wedge u_1) \left(1 - \frac{1}{M}\right) + \frac{n-t}{n-1} (u_1 - w_1)^+ \frac{1}{M} \right] \int_{-\infty}^{y} g(t_1)\,\text{d}t_1$
and marginal distribution of $Y_i$
$m(y) = \frac{t-1}{n-1} f(y) + \frac{n-t}{n-1} \, \frac{g(y) - \rho f(y)}{1 - \rho}$
$P(U_1 \leq w(y) \mid Y_1 = y, N = n) = \frac{g(y) \, w(y) \, M \, (t-1)/(n-1)}{m(y)}$
37. Much ado about noise
Accept-reject sample $(X_1, \ldots, X_m)$ associated with $(U_1, \ldots, U_N)$ and $(Y_1, \ldots, Y_N)$
$N$ is a stopping time for the acceptance of $m$ variables among the $Y_j$’s
Rewrite the estimator of $\mathbb{E}[h]$ as
$\frac{1}{m} \sum_{i=1}^{m} h(X_i) = \frac{1}{m} \sum_{j=1}^{N} h(Y_j) \, \mathbb{I}_{U_j \leq w_j},$
with $w_j = f(Y_j)/M g(Y_j)$
[Casella and Robert (1996)]
38. Much ado about noise
Rao-Blackwellisation: smaller variance produced by integrating out the $U_i$’s,
$\frac{1}{m} \sum_{j=1}^{N} \mathbb{E}[\mathbb{I}_{U_j \leq w_j} \mid N, Y_1, \ldots, Y_N] \, h(Y_j) = \frac{1}{m} \sum_{i=1}^{N} \rho_i h(Y_i),$
where ($i < n$)
$\rho_i = P(U_i \leq w_i \mid N = n, Y_1, \ldots, Y_n) = w_i \, \frac{\sum_{(i_1, \ldots, i_{m-2})} \prod_{j=1}^{m-2} w_{i_j} \prod_{j=m-1}^{n-2} (1 - w_{i_j})}{\sum_{(i_1, \ldots, i_{m-1})} \prod_{j=1}^{m-1} w_{i_j} \prod_{j=m}^{n-1} (1 - w_{i_j})},$
and $\rho_n = 1$.
The numerator sum is over all subsets of $\{1, \ldots, i-1, i+1, \ldots, n-1\}$ of size $m-2$, and the denominator sum over all subsets of size $m-1$
[Casella and Robert (1996)]
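A brute-force evaluation of the $\rho_i$ weights for small $n$, enumerating the subsets with itertools; the toy $w_j$ and $y_j$ below stand in for genuine accept-reject output ($w_j = f(Y_j)/M g(Y_j)$), and the cost grows combinatorially in $n$.

```python
import numpy as np
from itertools import combinations

def subset_sum(w, size):
    """Sum over subsets S of |S| = size of prod_{j in S} w_j prod_{j not in S} (1 - w_j)."""
    idx = range(len(w))
    return sum(
        np.prod([w[j] if j in S else 1.0 - w[j] for j in idx])
        for S in map(set, combinations(idx, size))
    )

def rb_weights(w, m):
    """rho_i = P(U_i <= w_i | N = n, Y_1, ..., Y_n), with n = len(w) and m acceptances."""
    n = len(w)
    rho = np.empty(n)
    rho[-1] = 1.0                                   # the n-th proposal is accepted
    denom = subset_sum(w[:-1], m - 1)               # subsets of {1..n-1}, size m-1
    for i in range(n - 1):
        others = np.delete(w[:-1], i)               # exclude i (and n)
        rho[i] = w[i] * subset_sum(others, m - 2) / denom
    return rho

rng = np.random.default_rng(7)
w = rng.uniform(0.1, 0.9, 8)          # stand-ins for w_j = f(Y_j) / M g(Y_j)
y = rng.normal(size=8)                # stand-ins for the proposals Y_j
rho = rb_weights(w, m=3)
print("rho:", np.round(rho, 3))
print("RB estimate (h = identity):", (rho * y).sum() / 3)
```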
39. extension to Metropolis–Hastings case
Sample produced by a Metropolis–Hastings algorithm
$x^{(1)}, \ldots, x^{(T)}$
based on two samples: the proposals $y_1, \ldots, y_T$ and the uniforms $u_1, \ldots, u_T$
[Casella and Robert (1996)]
40. extension to Metropolis–Hastings case
Sample produced by a Metropolis–Hastings algorithm
$x^{(1)}, \ldots, x^{(T)}$
based on two samples: the proposals $y_1, \ldots, y_T$ and the uniforms $u_1, \ldots, u_T$
Ergodic mean rewritten as
$\delta^{MH} = \frac{1}{T} \sum_{t=1}^{T} h(x^{(t)}) = \frac{1}{T} \sum_{t=1}^{T} h(y_t) \sum_{i=t}^{T} \mathbb{I}_{x^{(i)} = y_t}$
[Casella and Robert (1996)]
41. extension to Metropolis–Hastings case
Sample produced by a Metropolis–Hastings algorithm
$x^{(1)}, \ldots, x^{(T)}$
based on two samples: the proposals $y_1, \ldots, y_T$ and the uniforms $u_1, \ldots, u_T$
Conditional expectation
$\delta^{RB} = \frac{1}{T} \sum_{t=1}^{T} h(y_t) \, \mathbb{E}\left[\sum_{i=t}^{T} \mathbb{I}_{X^{(i)} = y_t} \,\Big|\, y_1, \ldots, y_T\right] = \frac{1}{T} \sum_{t=1}^{T} h(y_t) \sum_{i=t}^{T} P(X^{(i)} = y_t \mid y_1, \ldots, y_T)$
with smaller variance
[Casella and Robert (1996)]
42. weight derivation
Take
$\rho_{ij} = \frac{f(y_j)/q(y_j|y_i)}{f(y_i)/q(y_i|y_j)} \wedge 1 \quad (j > i),$
$\bar{\rho}_{ij} = \rho_{ij} \, q(y_{j+1}|y_j), \qquad \tilde{\rho}_{ij} = (1 - \rho_{ij}) \, q(y_{j+1}|y_i) \quad (i < j < T),$
$\zeta_{jj} = 1, \qquad \zeta_{jt} = \prod_{l=j+1}^{t} \tilde{\rho}_{jl} \quad (j < t < T),$
$\tau_0 = 1, \qquad \tau_j = \sum_{t=0}^{j-1} \tau_t \, \zeta_{t(j-1)} \, \bar{\rho}_{tj}, \qquad \tau_T = \sum_{t=0}^{T-1} \tau_t \, \zeta_{t(T-1)} \, \rho_{tT},$
$\omega_T^i = 1, \qquad \omega_i^j = \bar{\rho}_{ji} \, \omega_{i+1}^i + \tilde{\rho}_{ji} \, \omega_{i+1}^j \quad (0 \leq j < i < T).$
[Casella and Robert (1996)]
43. weight derivation
Theorem
The estimator $\delta^{RB}$ satisfies
$\delta^{RB} = \frac{\sum_{i=0}^{T} \varphi_i \, h(y_i)}{\sum_{i=0}^{T-1} \tau_i \, \zeta_{i(T-1)}},$
with ($i < T$)
$\varphi_i = \tau_i \left[ \sum_{j=i}^{T-1} \zeta_{ij} \, \omega_{j+1}^i + \zeta_{i(T-1)} (1 - \rho_{iT}) \right]$
and $\varphi_T = \tau_T$.
[Casella and Robert (1996)]
45. Some properties of the Metropolis–Hastings algorithm
Alternative representation of the Metropolis–Hastings estimator $\delta$ as
$\delta = \frac{1}{n} \sum_{t=1}^{n} h(x^{(t)}) = \frac{1}{n} \sum_{i=1}^{M_n} n_i \, h(z_i),$
where
• $\{x^{(t)}\}_t$ is the original Markov (MCMC) chain, with stationary distribution $\pi(\cdot)$ and transition $q(\cdot|\cdot)$
• the $z_i$’s are the accepted $y_j$’s
• $M_n$ is the number of accepted $y_j$’s up to time $n$
• $n_i$ is the number of times $z_i$ appears in the sequence $\{x^{(t)}\}_t$
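A quick check of this representation on a toy random-walk Metropolis run; the starting value is kept as $z_0$ for bookkeeping so that the two sums agree exactly.

```python
import numpy as np

rng = np.random.default_rng(8)

# Random-walk Metropolis for pi = N(0,1); check the two forms of delta agree
log_pi = lambda x: -0.5 * x * x
n, x = 10_000, 0.0
chain = []
z, counts = [x], [0]      # z_i: accepted values (z_0 = start), n_i: multiplicities
for _ in range(n):
    y = x + rng.normal(0, 2.0)
    if np.log(rng.uniform()) < log_pi(y) - log_pi(x):
        x = y
        z.append(y)
        counts.append(0)
    counts[-1] += 1        # the current z_i fills one more slot of the chain
    chain.append(x)

chain, z, counts = map(np.asarray, (chain, z, counts))
h = lambda t: t**2
print(np.mean(h(chain)), (counts * h(z)).sum() / n)   # identical by construction
```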
46. The ”accepted candidates”
Define
$\tilde{q}(\cdot|z_i) = \frac{\alpha(z_i, \cdot) \, q(\cdot|z_i)}{p(z_i)} \leq \frac{q(\cdot|z_i)}{p(z_i)}$
where $p(z_i) = \int \alpha(z_i, y) \, q(y|z_i) \,\text{d}y$
To simulate from $\tilde{q}(\cdot|z_i)$:
1. Propose a candidate $y \sim q(\cdot|z_i)$
2. Accept with probability
$\tilde{q}(y|z_i) \Big/ \frac{q(y|z_i)}{p(z_i)} = \alpha(z_i, y)$
Otherwise, reject it and start again.
This is the transition of the MH algorithm
47. The ”accepted candidates”
Define
$\tilde{q}(\cdot|z_i) = \frac{\alpha(z_i, \cdot) \, q(\cdot|z_i)}{p(z_i)} \leq \frac{q(\cdot|z_i)}{p(z_i)}$
where $p(z_i) = \int \alpha(z_i, y) \, q(y|z_i) \,\text{d}y$
The transition kernel $\tilde{q}$ admits $\tilde{\pi}$ as a stationary distribution:
$\tilde{\pi}(x) \, \tilde{q}(y|x) = \underbrace{\frac{\pi(x) p(x)}{\int \pi(u) p(u) \,\text{d}u}}_{\tilde{\pi}(x)} \, \underbrace{\frac{\alpha(x, y) \, q(y|x)}{p(x)}}_{\tilde{q}(y|x)}$
48. The ”accepted candidates”
Define
$\tilde{q}(\cdot|z_i) = \frac{\alpha(z_i, \cdot) \, q(\cdot|z_i)}{p(z_i)} \leq \frac{q(\cdot|z_i)}{p(z_i)}$
where $p(z_i) = \int \alpha(z_i, y) \, q(y|z_i) \,\text{d}y$
The transition kernel $\tilde{q}$ admits $\tilde{\pi}$ as a stationary distribution:
$\tilde{\pi}(x) \, \tilde{q}(y|x) = \frac{\pi(x) \, \alpha(x, y) \, q(y|x)}{\int \pi(u) p(u) \,\text{d}u}$
49. The ”accepted candidates”
Define
$\tilde{q}(\cdot|z_i) = \frac{\alpha(z_i, \cdot) \, q(\cdot|z_i)}{p(z_i)} \leq \frac{q(\cdot|z_i)}{p(z_i)}$
where $p(z_i) = \int \alpha(z_i, y) \, q(y|z_i) \,\text{d}y$
The transition kernel $\tilde{q}$ admits $\tilde{\pi}$ as a stationary distribution:
$\tilde{\pi}(x) \, \tilde{q}(y|x) = \frac{\pi(y) \, \alpha(y, x) \, q(x|y)}{\int \pi(u) p(u) \,\text{d}u}$
50. The ”accepted candidates”
Define
$\tilde{q}(\cdot|z_i) = \frac{\alpha(z_i, \cdot) \, q(\cdot|z_i)}{p(z_i)} \leq \frac{q(\cdot|z_i)}{p(z_i)}$
where $p(z_i) = \int \alpha(z_i, y) \, q(y|z_i) \,\text{d}y$
The transition kernel $\tilde{q}$ admits $\tilde{\pi}$ as a stationary distribution:
$\tilde{\pi}(x) \, \tilde{q}(y|x) = \tilde{\pi}(y) \, \tilde{q}(x|y),$
51. The ”accepted chain”
Lemma (Douc & X., AoS, 2011)
The sequence $(z_i, n_i)$ satisfies
1. $(z_i, n_i)_i$ is a Markov chain;
2. $z_{i+1}$ and $n_i$ are independent given $z_i$;
3. $n_i$ is distributed as a geometric random variable with probability parameter
$p(z_i) := \int \alpha(z_i, y) \, q(y|z_i) \,\text{d}y; \quad (1)$
4. $(z_i)_i$ is a Markov chain with transition kernel $\tilde{Q}(z, \text{d}y) = \tilde{q}(y|z)\,\text{d}y$ and stationary distribution $\tilde{\pi}$ such that
$\tilde{q}(\cdot|z) \propto \alpha(z, \cdot) \, q(\cdot|z) \quad \text{and} \quad \tilde{\pi}(\cdot) \propto \pi(\cdot) \, p(\cdot).$
57. Importance sampling perspective
1. A natural idea:
$\delta^* = \frac{\sum_{i=1}^{M_n} h(z_i)/p(z_i)}{\sum_{i=1}^{M_n} 1/p(z_i)} = \frac{\sum_{i=1}^{M_n} \frac{\pi(z_i)}{\tilde{\pi}(z_i)} h(z_i)}{\sum_{i=1}^{M_n} \frac{\pi(z_i)}{\tilde{\pi}(z_i)}}.$
2. But $p$ is not available in closed form.
58. Importance sampling perspective
1. A natural idea:
$\delta^* = \frac{\sum_{i=1}^{M_n} h(z_i)/p(z_i)}{\sum_{i=1}^{M_n} 1/p(z_i)} = \frac{\sum_{i=1}^{M_n} \frac{\pi(z_i)}{\tilde{\pi}(z_i)} h(z_i)}{\sum_{i=1}^{M_n} \frac{\pi(z_i)}{\tilde{\pi}(z_i)}}.$
2. But $p$ is not available in closed form.
3. The geometric $n_i$ is the replacement, an obvious solution that is used in the original Metropolis–Hastings estimate since $\mathbb{E}[n_i] = 1/p(z_i)$.
59. The Bernoulli factory
The crude estimate of $1/p(z_i)$,
$n_i = 1 + \sum_{j=1}^{\infty} \prod_{\ell \leq j} \mathbb{I}\{u_\ell \geq \alpha(z_i, y_\ell)\},$
can be improved:
Lemma (Douc & X., AoS, 2011)
If $(y_j)_j$ is an iid sequence with distribution $q(y|z_i)$, the quantity
$\hat{\xi}_i = 1 + \sum_{j=1}^{\infty} \prod_{\ell \leq j} \{1 - \alpha(z_i, y_\ell)\}$
is an unbiased estimator of $1/p(z_i)$ whose variance, conditional on $z_i$, is lower than the conditional variance of $n_i$, $\{1 - p(z_i)\}/p^2(z_i)$.
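A quick simulation contrasting the crude geometric $n_i$ with $\hat{\xi}_i$ at a fixed $z$, assuming a $N(0,1)$ target and a $N(z,1)$ random-walk proposal; the infinite sum is truncated once the running product is numerically negligible.

```python
import numpy as np

rng = np.random.default_rng(9)

# Fixed current state z; symmetric RW proposal q(.|z) = N(z,1); target N(0,1)
z = 1.0
alpha = lambda y: min(1.0, np.exp(0.5 * (z * z - y * y)))   # MH acceptance prob

reps, tol = 10_000, 1e-10
ns, xis = np.empty(reps), np.empty(reps)
for r in range(reps):
    n, xi, prod_n, prod_xi = 1.0, 1.0, 1.0, 1.0
    while prod_xi > tol or prod_n > 0:
        a = alpha(rng.normal(z, 1.0))
        prod_n *= (rng.uniform() >= a)   # indicator product: geometric count n_i
        prod_xi *= 1.0 - a               # Rao-Blackwellised product for xi_i
        n += prod_n
        xi += prod_xi
    ns[r], xis[r] = n, xi

print("means:", ns.mean(), xis.mean())   # both estimate 1/p(z)
print("vars :", ns.var(), xis.var())     # var(xi) is smaller
```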
60. Rao-Blackwellised, for sure?
$\hat{\xi}_i = 1 + \sum_{j=1}^{\infty} \prod_{\ell \leq j} \{1 - \alpha(z_i, y_\ell)\}$
1. An infinite sum, but with finitely many nonzero terms whenever some $\alpha(z_i, y_\ell) = 1$, which happens with positive probability:
$\alpha(x^{(t)}, y_t) = \min\left(1, \frac{\pi(y_t)}{\pi(x^{(t)})} \, \frac{q(x^{(t)}|y_t)}{q(y_t|x^{(t)})}\right)$
For example: take a symmetric random walk as a proposal.
2. What if we wish to be sure that the sum is finite? Finite horizon $k$ version:
$\hat{\xi}_i^k = 1 + \sum_{j=1}^{\infty} \prod_{1 \leq \ell \leq k \wedge j} \{1 - \alpha(z_i, y_\ell)\} \prod_{k+1 \leq \ell \leq j} \mathbb{I}\{u_\ell \geq \alpha(z_i, y_\ell)\}$
62. Variance improvement
Proposition (Douc & X., AoS, 2011)
If $(y_j)_j$ is an iid sequence with distribution $q(y|z_i)$ and $(u_j)_j$ is an iid uniform sequence, for any $k \geq 0$, the quantity
$\hat{\xi}_i^k = 1 + \sum_{j=1}^{\infty} \prod_{1 \leq \ell \leq k \wedge j} \{1 - \alpha(z_i, y_\ell)\} \prod_{k+1 \leq \ell \leq j} \mathbb{I}\{u_\ell \geq \alpha(z_i, y_\ell)\}$
is an unbiased estimator of $1/p(z_i)$ with an almost surely finite number of terms.
63. Variance improvement
Proposition (Douc & X., AoS, 2011)
If $(y_j)_j$ is an iid sequence with distribution $q(y|z_i)$ and $(u_j)_j$ is an iid uniform sequence, for any $k \geq 0$, the quantity
$\hat{\xi}_i^k = 1 + \sum_{j=1}^{\infty} \prod_{1 \leq \ell \leq k \wedge j} \{1 - \alpha(z_i, y_\ell)\} \prod_{k+1 \leq \ell \leq j} \mathbb{I}\{u_\ell \geq \alpha(z_i, y_\ell)\}$
is an unbiased estimator of $1/p(z_i)$ with an almost surely finite number of terms. Moreover, for $k \geq 1$,
$\mathbb{V}\big[\hat{\xi}_i^k \mid z_i\big] = \frac{1 - p(z_i)}{p^2(z_i)} - \frac{1 - (1 - 2p(z_i) + r(z_i))^k}{2p(z_i) - r(z_i)} \, \frac{2 - p(z_i)}{p^2(z_i)} \, (p(z_i) - r(z_i)),$
where $p(z_i) := \int \alpha(z_i, y) \, q(y|z_i) \,\text{d}y$ and $r(z_i) := \int \alpha^2(z_i, y) \, q(y|z_i) \,\text{d}y$.
64. Variance improvement
Proposition (Douc & X., AoS, 2011)
If $(y_j)_j$ is an iid sequence with distribution $q(y|z_i)$ and $(u_j)_j$ is an iid uniform sequence, for any $k \geq 0$, the quantity
$\hat{\xi}_i^k = 1 + \sum_{j=1}^{\infty} \prod_{1 \leq \ell \leq k \wedge j} \{1 - \alpha(z_i, y_\ell)\} \prod_{k+1 \leq \ell \leq j} \mathbb{I}\{u_\ell \geq \alpha(z_i, y_\ell)\}$
is an unbiased estimator of $1/p(z_i)$ with an almost surely finite number of terms. Therefore, we have
$\mathbb{V}\big[\hat{\xi}_i \mid z_i\big] \leq \mathbb{V}\big[\hat{\xi}_i^k \mid z_i\big] \leq \mathbb{V}\big[\hat{\xi}_i^0 \mid z_i\big] = \mathbb{V}\big[n_i \mid z_i\big].$
66. Non-informative inference for mixture models
Standard mixture of distributions model
$\sum_{i=1}^{k} w_i \, f(x|\theta_i), \quad \text{with} \quad \sum_{i=1}^{k} w_i = 1. \quad (1)$
[Titterington et al., 1985; Frühwirth-Schnatter (2006)]
Jeffreys’ prior for mixtures not available for computational reasons: it had not been tested so far
[Jeffreys, 1939]
Warning: Jeffreys’ prior improper in some settings
[Grazian & Robert, 2017]
67. Non-informative inference for mixture models
Grazian & Robert (2017) consider the genuine Jeffreys’ prior for the complete set of parameters in (1), deduced from Fisher’s information matrix
Computation of the prior density is costly, relying on many integrals like
$\int_{\mathcal{X}} \frac{\partial^2 \log \sum_{i=1}^{k} w_i f(x|\theta_i)}{\partial \theta_h \, \partial \theta_j} \; \sum_{i=1}^{k} w_i f(x|\theta_i) \,\text{d}x$
Integrals with no analytical expression, hence involving (costly) numerical or Monte Carlo integration
68. Non-informative inference for mixture models
When building a Metropolis-Hastings proposal over the $(w_i, \theta_i)$’s, the prior ratio is more expensive than the likelihood and proposal ratios
Suggestion: split the acceptance rule
$\alpha(x, y) := 1 \wedge r(x, y), \qquad r(x, y) := \frac{\pi(y|D) \, q(y, x)}{\pi(x|D) \, q(x, y)}$
into
$\tilde{\alpha}(x, y) := \left(1 \wedge \frac{f(D|y) \, q(y, x)}{f(D|x) \, q(x, y)}\right) \times \left(1 \wedge \frac{\pi(y)}{\pi(x)}\right)$
69. The “Big Data” plague
Simulation from a posterior distribution with large sample size $n$
• computing time at least of order $O(n)$
• solutions using the likelihood decomposition
$\prod_{i=1}^{n} \ell(\theta|x_i)$
and handling subsets on different processors (CPU), graphical units (GPU), or computers
[Korattikara et al. (2013), Scott et al. (2013)]
• no consensus on a method of choice, with instabilities from removing most prior input and uncalibrated approximations
[Neiswanger et al. (2013), Wang and Dunson (2013)]
70. Proposed solution
“There is no problem an absence of decision cannot solve.” Anonymous
Given $\alpha(x, y) := 1 \wedge r(x, y)$, factorise
$r(x, y) = \prod_{k=1}^{d} \rho_k(x, y)$
under the constraint $\rho_k(x, y) = \rho_k(y, x)^{-1}$
Delayed Acceptance Markov kernel given by
$\tilde{P}(x, A) := \int_A q(x, y) \, \tilde{\alpha}(x, y) \,\text{d}y + \left(1 - \int_{\mathcal{X}} q(x, y) \, \tilde{\alpha}(x, y) \,\text{d}y\right) \mathbb{1}_A(x)$
where
$\tilde{\alpha}(x, y) := \prod_{k=1}^{d} \{1 \wedge \rho_k(x, y)\}.$
71. Proposed solution
“There is no problem an absence of decision cannot solve.” Anonymous
Algorithm 1 Delayed Acceptance
To sample from $\tilde{P}(x, \cdot)$:
1. Sample $y \sim Q(x, \cdot)$.
2. For $k = 1, \ldots, d$:
• with probability $1 \wedge \rho_k(x, y)$, continue
• otherwise stop and output $x$
3. Output $y$
Arrange the terms in the product so that the most computationally intensive ones are calculated ‘at the end’, hence least often
72. Proposed solution
“There is no problem an absence of decision cannot solve.” Anonymous
Algorithm 1 Delayed Acceptance
To sample from $\tilde{P}(x, \cdot)$:
1. Sample $y \sim Q(x, \cdot)$.
2. For $k = 1, \ldots, d$:
• with probability $1 \wedge \rho_k(x, y)$, continue
• otherwise stop and output $x$
3. Output $y$
Generalisation of Fox and Nicholls (1997) and Christen and Fox (2005), where testing for acceptance with an approximation before computing the exact likelihood was first suggested
More recent occurrences in the literature
[Shestopaloff & Neal, 2013; Golightly, 2014]
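A sketch of Algorithm 1 with $d = 2$ factors on a toy normal model, using the cheap prior ratio as the first factor and the $O(n)$ likelihood ratio as the last; the data, prior scale, and proposal scale are illustrative choices.

```python
import numpy as np

rng = np.random.default_rng(10)

# Toy posterior: N(theta, 1) observations, N(0, 10^2) prior on theta
data = rng.normal(1.5, 1.0, 1_000)
log_prior = lambda th: -0.5 * th**2 / 100.0
log_lik = lambda th: -0.5 * ((data - th) ** 2).sum()

def da_mh(n_iter=5_000, scale=0.1):
    th, out, lik_evals = 0.0, [], 0
    for _ in range(n_iter):
        prop = th + rng.normal(0, scale)
        # rho_1: cheap prior ratio (proposal is symmetric); test it first
        if np.log(rng.uniform()) < log_prior(prop) - log_prior(th):
            lik_evals += 1
            # rho_2: expensive O(n) likelihood ratio, reached only if rho_1 survives
            if np.log(rng.uniform()) < log_lik(prop) - log_lik(th):
                th = prop
        out.append(th)
    print(f"likelihood touched in {lik_evals} of {n_iter} iterations")
    return np.array(out)

chain = da_mh()
print("posterior mean estimate:", chain.mean())
```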
73. Potential drawbacks
• Delayed Acceptance efficiently reduces computing cost only when the approximation $\tilde{\pi}$ is “good enough” or “flat enough”
• The probability of acceptance is always smaller than in the original Metropolis–Hastings scheme
• Decomposition of the original data into likelihood bits may however lead to a deterioration of algorithmic properties without impacting computational efficiency...
• ...e.g., a term that explodes at $x = 0$ and is computed on its own: leaving $x = 0$ becomes nearly impossible
74. Potential drawbacks
Figure: (left) Fit of the delayed Metropolis–Hastings algorithm on a Beta-binomial posterior $p|x \sim \text{Be}(x + a, n + b - x)$ when $N = 100$, $x = 32$, $a = 7.5$ and $b = .5$. The binomial $\mathcal{B}(N, p)$ likelihood is replaced with a product of 100 Bernoulli terms. Histogram based on $10^5$ iterations, with an overall acceptance rate of 9%; (centre) raw sequence of $p$’s in the Markov chain; (right) autocorrelogram of the above sequence.
75. The “Big Data” plague
Delayed Acceptance intended for likelihoods or priors, but not a clear solution for “Big Data” problems:
1. all product terms must be computed
2. all terms previously computed must either be stored for future comparison or recomputed
3. the sequential approach limits parallel gains...
4. ...unless a prefetching scheme is added to the delays
[Angelino et al. (2014), Strid (2010)]
76. Validation of the method
Lemma (1)
For any Markov chain with transition kernel $\Pi$ of the form
$\Pi(x, A) = \int_A q(x, y) \, a(x, y) \,\text{d}y + \left(1 - \int_{\mathcal{X}} q(x, y) \, a(x, y) \,\text{d}y\right) \mathbb{1}_A(x),$
and satisfying detailed balance, the function $a(\cdot)$ satisfies (for $\pi$-a.e. $x, y$)
$\frac{a(x, y)}{a(y, x)} = r(x, y).$
77. Validation of the method
Lemma (2)
$(\tilde{X}_n)_{n \geq 1}$, the Markov chain associated with $\tilde{P}$, is a $\pi$-reversible Markov chain.
Proof.
From Lemma 1 we just need to check that
$\frac{\tilde{\alpha}(x, y)}{\tilde{\alpha}(y, x)} = \prod_{k=1}^{d} \frac{1 \wedge \rho_k(x, y)}{1 \wedge \rho_k(y, x)} = \prod_{k=1}^{d} \rho_k(x, y) = r(x, y),$
since $\rho_k(y, x) = \rho_k(x, y)^{-1}$ and $(1 \wedge a)/(1 \wedge a^{-1}) = a$
78. Comparisons of $P$ and $\tilde{P}$
The acceptance probability ordering
$\tilde{\alpha}(x, y) = \prod_{k=1}^{d} \{1 \wedge \rho_k(x, y)\} \leq 1 \wedge \prod_{k=1}^{d} \rho_k(x, y) = 1 \wedge r(x, y) = \alpha(x, y),$
follows from $(1 \wedge a)(1 \wedge b) \leq (1 \wedge ab)$ for $a, b \in \mathbb{R}^+$.
Remark
By construction of $\tilde{P}$,
$\text{var}(f, P) \leq \text{var}(f, \tilde{P})$
for any $f \in L^2(\mathcal{X}, \pi)$, using the Peskun ordering (Peskun, 1973; Tierney, 1998), since $\tilde{\alpha}(x, y) \leq \alpha(x, y)$ for any $(x, y) \in \mathcal{X}^2$.
79. Comparisons of $P$ and $\tilde{P}$
Condition (1)
Defining $A := \{(x, y) \in \mathcal{X}^2 : r(x, y) \geq 1\}$, there exists $c$ such that
$\inf_{(x,y) \in A} \min_{k \in \{1, \ldots, d\}} \rho_k(x, y) \geq c.$
Ensures that when $\alpha(x, y) = 1$, the acceptance probability $\tilde{\alpha}(x, y)$ is uniformly lower-bounded by a positive constant.
Reversibility then implies $\tilde{\alpha}(x, y)$ is uniformly lower-bounded by a constant multiple of $\alpha(x, y)$ for all $x, y \in \mathcal{X}$.
80. Comparisons of $P$ and $\tilde{P}$
Condition (1)
Defining $A := \{(x, y) \in \mathcal{X}^2 : r(x, y) \geq 1\}$, there exists $c$ such that
$\inf_{(x,y) \in A} \min_{k \in \{1, \ldots, d\}} \rho_k(x, y) \geq c.$
Proposition (1)
Under Condition (1), Lemma 34 in Andrieu & Lee (2013) implies
$\text{Gap}(\tilde{P}) \geq \epsilon \, \text{Gap}(P)$
and
$\text{var}(f, \tilde{P}) \leq (\epsilon^{-1} - 1) \, \text{var}_\pi(f) + \epsilon^{-1} \, \text{var}(f, P)$
with $f \in L_0^2(E, \pi)$, $\epsilon = c^{d-1}$.
81. Comparisons of $P$ and $\tilde{P}$
Proposition (1)
Under Condition (1), Lemma 34 in Andrieu & Lee (2013) implies
$\text{Gap}(\tilde{P}) \geq \epsilon \, \text{Gap}(P)$
and
$\text{var}(f, \tilde{P}) \leq (\epsilon^{-1} - 1) \, \text{var}_\pi(f) + \epsilon^{-1} \, \text{var}(f, P)$
with $f \in L_0^2(E, \pi)$, $\epsilon = c^{d-1}$.
Hence if $P$ has a right spectral gap, then so does $\tilde{P}$.
Plus, quantitative bounds on the asymptotic variance of MCMC estimates using $(\tilde{X}_n)_{n \geq 1}$ relative to those using $(X_n)_{n \geq 1}$ are available
82. Comparisons of $P$ and $\tilde{P}$
Easiest use of the above: modify any candidate factorisation
Given a factorisation of $r$
$r(x, y) = \prod_{k=1}^{d} \tilde{\rho}_k(x, y),$
satisfying the balance condition, define a sequence of functions $\rho_k$ such that both $r(x, y) = \prod_{k=1}^{d} \rho_k(x, y)$ and Condition 1 hold.
83. Comparisons of $P$ and $\tilde{P}$
Take $c \in (0, 1]$, define $b = c^{1/(d-1)}$ and set
$\tilde{\rho}_k(x, y) := \min\left\{\frac{1}{b}, \max\{b, \rho_k(x, y)\}\right\}, \quad k \in \{1, \ldots, d-1\},$
and
$\tilde{\rho}_d(x, y) := \frac{r(x, y)}{\prod_{k=1}^{d-1} \tilde{\rho}_k(x, y)}.$
Then:
Proposition (2)
Under this scheme, the previous proposition holds with
$\epsilon = c^2 = b^{2(d-1)}.$
85. Variance reduction
Given estimators $I_1, \ldots, I_d$ of the same integral $I$, optimal combination
$\hat{I} = \sum_{i=1}^{d} \hat{\omega}_i I_i$
where
$\hat{\omega} = \arg\min_{\omega : \sum_i \omega_i = 1} \omega^T \Sigma \, \omega$
with $\Sigma$ the covariance matrix of $(I_1, \ldots, I_d)$,
the covariance matrix being estimated from the samples behind the $I_i$’s
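A sketch of this optimal combination, assuming three overlapping sample means of the same quantity; $\Sigma$ is estimated from replications and the weights solve the constrained minimisation via $\hat{\omega} \propto \hat{\Sigma}^{-1} \mathbf{1}$.

```python
import numpy as np

rng = np.random.default_rng(11)

# Three unbiased, correlated estimators of I = E[X] = 0, X ~ N(0,1),
# built from overlapping stretches of a common sample; 200 replications
reps = 200
batches = rng.standard_normal((reps, 1_000))
I_rep = np.column_stack([
    batches[:, :400].mean(axis=1),
    batches[:, 300:700].mean(axis=1),
    batches[:, 600:].mean(axis=1),
])

Sigma = np.cov(I_rep, rowvar=False)    # covariance estimated from replications
ones = np.ones(3)
w = np.linalg.solve(Sigma, ones)
w /= ones @ w                          # argmin_{sum w = 1} w' Sigma w

print("weights:", np.round(w, 3))
print("best single var:", Sigma.diagonal().min())
print("combined var   :", w @ Sigma @ w)
```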
87. Issues
• A linear combination of estimators cannot exclude outliers or infinite-variance estimators
• Worse: it retains infinite variance from any component
• Ignores the case of estimators based on identical simulations (e.g., multiple AMIS solutions)
• Does not measure the added value of a new estimator
88. Issues
Comparison of several combinations of three importance samples of length $10^3$ [olive stands for h-median cdf]
89. Combining via cdfs
Revisit Monte Carlo estimates as expectations of distribution estimates
A collection of estimators produces, as a by-product, evaluations of the cdf of $h(X)$ as
$\hat{F}_j(h) = \sum_{i=1}^{n} \omega_{ij} \, \mathbb{I}_{h(x_{ij}) < h}$
New perspective on combining cdf estimates with different distributions and potential correlations. The average of cdfs is a poor choice (infinite variance, variance ignored)
91. Combining via cdf median
Towards robustness and elimination of infinite variance, use of the [vertical] median cdf
$\hat{F}_{\text{med}}(h) = \text{median}_j \, \hat{F}_j(h)$
[Figure: empirical cdfs of the component estimators]
which does not account for correlation and repetitions
92. Combining via cdf median
Alternative horizontal median cdf
$\hat{F}_{\text{hmed}}^{-1}(\alpha) = \text{median}_j \, \hat{F}_j^{-1}(\alpha)$
which does not show visible variation from the vertical median
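A sketch of the vertical median cdf on three self-normalised importance sampling approximations of the cdf of $h(X) = X$ under a $N(0,1)$ target; the grid and the proposal scales are illustrative choices.

```python
import numpy as np
from scipy.stats import norm

rng = np.random.default_rng(12)

# Three self-normalised IS approximations of the cdf of h(X) = X, X ~ N(0,1)
n, scales = 1_000, [1.5, 2.0, 3.0]
samples, weights = [], []
for s in scales:
    x = rng.normal(0, s, n)                      # proposal N(0, s^2)
    w = norm.pdf(x) / norm.pdf(x, 0, s)          # importance ratios
    samples.append(x)
    weights.append(w / w.sum())

grid = np.linspace(-12, 12, 601)
# F_j(t) = sum_i omega_ij I(h(x_ij) < t) on the grid
F = np.array([[w[x < t].sum() for t in grid] for x, w in zip(samples, weights)])
F_med = np.median(F, axis=0)                     # vertical median cdf

# E[h(X)] read off the median cdf by a Stieltjes sum over the grid
I_hat = (grid[1:] * np.diff(F_med)).sum()
print("I_hat from median cdf:", I_hat, " (truth 0)")
```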