Unbiased Bayes for Big Data:
Paths of Partial Posteriors
Heiko Strathmann
Gatsby Unit, University College London
Oxford ML lunch, February 25, 2015
Joint work
Being Bayesian: Averaging beliefs of the unknown
φ = ∫ ϕ(θ) p(θ|D) dθ
where the posterior satisfies p(θ|D) ∝ p(D|θ) p(θ) (likelihood of the data × prior)
Metropolis Hastings Transition Kernel
Target π(θ) ∝ p(θ|D)
At iteration j + 1, with state θ(j):
Propose θ′ ∼ q(θ′ | θ(j))
Accept θ(j+1) ← θ′ with probability min{ [π(θ′) q(θ(j)|θ′)] / [π(θ(j)) q(θ′|θ(j))], 1 }
Reject (θ(j+1) ← θ(j)) otherwise.
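A minimal sketch of one such transition (assuming user-supplied callables: `log_pi` for the unnormalised log-target, `propose` to draw from q, and `log_q(a, b)` for log q(a|b); all names are illustrative):

```python
import numpy as np

def mh_step(theta, log_pi, propose, log_q, rng):
    """One Metropolis-Hastings transition for an unnormalised log-target."""
    theta_prop = propose(theta, rng)                  # theta' ~ q(.|theta)
    log_ratio = (log_pi(theta_prop) - log_pi(theta)   # pi(theta') / pi(theta)
                 + log_q(theta, theta_prop)           # q(theta | theta')
                 - log_q(theta_prop, theta))          # q(theta' | theta)
    if np.log(rng.uniform()) < min(log_ratio, 0.0):   # accept w.p. min(ratio, 1)
        return theta_prop
    return theta                                      # reject: stay at theta(j)
```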
Big D & MCMC
Need to evaluate
p(θ|D) ∝ p(D|θ)p(θ)
in every iteration.
For example, for D = {x1, . . . , xN},
p(D|θ) = ∏_{i=1}^{N} p(xi|θ)
Infeasible for growing N
Lots of current research: Can we use subsets of D?
Desiderata for Bayesian estimators
1. No (additional) bias
2. Finite & controllable variance
3. Computational costs sub-linear in N
4. No problems with transition kernel design
Outline
Literature Overview
Partial Posterior Path Estimators
Experiments & Extensions
Discussion
Outline
Literature Overview
Partial Posterior Path Estimators
Experiments & Extensions
Discussion
Stochastic gradient Langevin (Welling & Teh 2011)
∆θ(j) = (ε_j/2) [ ∇θ log p(θ)|_{θ=θ(j)} + Σ_{i=1}^{N} ∇θ log p(xi|θ)|_{θ=θ(j)} ] + η_j,  η_j ∼ N(0, ε_j)
Two changes:
1. Noisy gradients with mini-batches: let I ⊆ {1, . . . , N} and use the rescaled log-likelihood gradient (N/|I|) Σ_{i∈I} ∇θ log p(xi|θ)|_{θ=θ(j)}
2. Don't evaluate the MH ratio, but always accept; decrease the step size/noise ε_j → 0 to compensate, with
Σ_{j=1}^{∞} ε_j = ∞ and Σ_{j=1}^{∞} ε_j² < ∞
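A minimal sketch of one SGLD update under these two changes (the N/|I| rescaling makes the mini-batch gradient an unbiased estimate of the full one; the step-size decay exponent and all names are illustrative):

```python
import numpy as np

def sgld_step(theta, j, grad_log_prior, grad_log_lik, X, batch_size, eps0, rng):
    """One stochastic gradient Langevin step with a decaying step size."""
    eps_j = eps0 * (j + 1) ** (-0.55)       # satisfies sum eps = inf, sum eps^2 < inf
    I = rng.choice(len(X), size=batch_size, replace=False)
    # Unbiased estimate of the full log-likelihood gradient via N/|I| rescaling.
    grad = grad_log_prior(theta) + (len(X) / batch_size) * sum(
        grad_log_lik(X[i], theta) for i in I)
    noise = rng.normal(scale=np.sqrt(eps_j), size=np.shape(theta))
    return theta + 0.5 * eps_j * grad + noise   # always accepted, no MH ratio
```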
Austerity (Korattikara, Chen, Welling 2014)
Idea: rewrite MH ratio as hypothesis test
At iteration j, draw u ∼ Uniform[0, 1] and compute
µ0 = (1/N) log[ u × p(θ(j))/p(θ′) × q(θ′|θ(j))/q(θ(j)|θ′) ]
µ = (1/N) Σ_{i=1}^{N} li,  where li := log p(xi|θ′) − log p(xi|θ(j))
Accept if µ > µ0; reject otherwise
Subsample the li; central limit theorem, t-test
Increase data if no significance; multiple testing correction
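A simplified sketch of the test (a single one-shot t-test on one subsample, rather than the paper's increasing-batch schedule with multiple-testing correction; `log_q(a, b)` again denotes log q(a|b), and all names are illustrative):

```python
import numpy as np
from scipy import stats

def austerity_decision(theta, theta_prop, log_lik, X, log_prior, log_q, rng,
                       batch_size=100, alpha=0.05):
    """Approximate MH accept decision from a subsample of log-likelihood ratios."""
    N = len(X)
    u = rng.uniform()
    mu0 = (np.log(u) + log_prior(theta) - log_prior(theta_prop)
           + log_q(theta_prop, theta) - log_q(theta, theta_prop)) / N
    I = rng.choice(N, size=min(batch_size, N), replace=False)
    l = np.array([log_lik(X[i], theta_prop) - log_lik(X[i], theta) for i in I])
    t, p = stats.ttest_1samp(l, popmean=mu0)   # is mean(l) significantly != mu0?
    if p < alpha:
        return l.mean() > mu0                  # confident accept/reject decision
    return None                                # undecided: would increase the batch
```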
Bardenet, Doucet, Holmes 2014
Similar to Austerity, but with analysis:
Concentration bounds for MH (CLT might not hold)
Bound for probability of wrong decision
For uniformly ergodic original kernel
Approximate kernel converges
Bound for TV distance of approximation and target
Limitations:
Still approximate
Only random walk
Uses all data on hard (?) problems
Firefly MCMC (Maclaurin & Adams 2014)
First asymptotically exact MCMC kernel using sub-sampling
Augment state space with binary indicator variables
Only a few data points are "bright"
Dark points approximated by a lower bound on the likelihood
Limitations:
Bound might not be available
Loose bounds → worse than standard MCMC → need MAP estimate
Linear in N: likelihood evaluations at least q_{dark→bright} · N
Mixing time cannot be better than 1/q_{dark→bright}
Alternative transition kernels
Existing methods construct alternative transition kernels.
(Welling & Teh 2011), (Korattikara, Chen, Welling 2014), (Bardenet, Doucet, Holmes 2014),
(Maclaurin & Adams 2014), (Chen, Fox, Guestrin 2014).
They
use mini-batches
inject noise
augment the state space
make clever use of approximations
Problem: Most methods
are biased
have no convergence guarantees
mix badly
Reminder: where we came from (expectations)
E_{p(θ|D)}{ϕ(θ)},  ϕ : Θ → R
Idea: Assuming the goal is estimation, give up on simulation.
Outline
Literature Overview
Partial Posterior Path Estimators
Experiments & Extensions
Discussion
Idea Outline
1. Construct partial posterior distributions
2. Compute partial expectations (biased)
3. Remove bias
Note:
No simulation from p(θ|D)
Partial posterior expectations less challenging
Exploit standard MCMC methodology & engineering
But not restricted to MCMC
Disclaimer
Goal is not to replace posterior sampling, but to provide a ...
different perspective when the goal is estimation
Method does not do uniformly better than MCMC, but ...
we show cases where computational gains can be achieved
Partial Posterior Paths
Model p(x, θ) = p(x|θ)p(θ), data D = {x1, . . . , xN}
Full posterior πN := p(θ|D) ∝ p(x1, . . . , xN|θ)p(θ)
L subsets Dl of sizes |Dl| = nl
Here: n1 = a, n2 = 2¹a, n3 = 2²a, . . . , nL = 2^{L−1} a
Partial posterior π̃l := p(θ|Dl) ∝ p(Dl|θ) p(θ)
Path from prior to full posterior:
p(θ) = π̃0 → π̃1 → π̃2 → · · · → π̃L = πN = p(θ|D)
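A sketch of the doubling subset schedule; here `subset_sizes` only builds the sizes nl, and any estimator of Ê_{π̃l}{ϕ(θ)} (e.g. MCMC run on the subset) can be plugged in on top (names illustrative):

```python
def subset_sizes(N, a):
    """Doubling schedule n_l = 2**(l-1) * a, capped at N (the full data set)."""
    sizes = []
    n = a
    while n < N:
        sizes.append(n)
        n *= 2
    sizes.append(N)
    return sizes   # [a, 2a, 4a, ..., N]; L = len(sizes)

# Example: subset_sizes(10_000, 10) -> [10, 20, 40, ..., 5120, 10000]
```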
Gaussian Mean, Conjugate Prior
[Figure: partial posterior path for a 2D Gaussian mean (µ1, µ2) under a conjugate prior; contours move from the prior through partial posteriors on 1/100, 2/100, 4/100, 8/100, 16/100, 32/100, 64/100, and 100/100 of the data.]
Partial posterior path statistics
For partial posterior paths
p(θ) = π̃0 → π̃1 → π̃2 → · · · → π̃L = πN = p(θ|D)
define a sequence {φt}_{t=1}^{∞} by
φt := Ê_{π̃t}{ϕ(θ)} for t < L
φt := φ := Ê_{πN}{ϕ(θ)} for t ≥ L
This gives
φ1 → φ2 → · · · → φL = φ
Ê_{π̃t}{ϕ(θ)} is an empirical estimate, not necessarily from MCMC.
Debiasing Lemma (Rhee & Glynn 2012, 2014)
Let φ and {φt}_{t=1}^{∞} be real-valued random variables. Assume
lim_{t→∞} E|φt − φ|² = 0
T is an integer random variable with P[T ≥ t] > 0 for t ∈ N
Assume
Σ_{t=1}^{∞} E|φ_{t−1} − φ|² / P[T ≥ t] < ∞
Then an unbiased estimator of E{φ} is
φ*_T = Σ_{t=1}^{T} (φt − φ_{t−1}) / P[T ≥ t]
Here: P[T ≥ t] = 0 for t > L is fine, since φ_{t+1} − φt = 0
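A minimal sketch of one replication of the resulting estimator (assuming `phi_t(t)` returns the partial posterior expectation Ê_{π̃t}{ϕ(θ)}, `sample_T` draws the truncation level, and `P_geq(t)` returns P[T ≥ t]; names illustrative):

```python
def debiased_estimate(phi_t, sample_T, P_geq, rng):
    """One replication of the Rhee & Glynn unbiased estimator phi*_T."""
    T = sample_T(rng)                        # random truncation level
    # phi_0, ..., phi_T; phi_0 is a fixed start (e.g. prior expectation, or 0).
    phi = [phi_t(t) for t in range(T + 1)]
    return sum((phi[t] - phi[t - 1]) / P_geq(t) for t in range(1, T + 1))

# Averaging R independent replications gives an unbiased estimate of E{phi}.
```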
Algorithm illustration
[Figure sequence, eight frames in the (µ1, µ2) plane: individual debiasing estimates ϕ*_r accumulate, starting from the prior mean; the final frame overlays the debiasing estimates ϕ*_r, their running average (1/R) Σ_{r=1}^{R} ϕ*_r, and the true posterior mean.]
Computational complexity
Assume geometric batch size increase nt and truncation probabilities
Λt := P(T = t) ∝ 2^{−αt},  α ∈ (0, 1)
Average computational cost is sub-linear:
O( a (N/a)^{1−α} )
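A quick sketch for sampling T from this truncated geometric law and checking the average cost Σ_{t≤T} nt empirically (reuses `subset_sizes` from the earlier sketch; names illustrative):

```python
import numpy as np

def sample_T(alpha, L, rng):
    """Sample T from Lambda_t ∝ 2**(-alpha * t), t = 1..L."""
    t = np.arange(1, L + 1)
    p = 2.0 ** (-alpha * t)
    return rng.choice(t, p=p / p.sum())

rng = np.random.default_rng(0)
a, N, alpha = 10, 10_000, 0.5
sizes = subset_sizes(N, a)
costs = [sum(sizes[:sample_T(alpha, len(sizes), rng)]) for _ in range(1000)]
print(np.mean(costs))   # grows like a * (N/a)**(1 - alpha), sub-linear in N
```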
Variance-computation tradeos in Big Data
Variance
E{(φ*_T)²} = Σ_{t=1}^{∞} [ E|φ_{t−1} − φ|² − E|φt − φ|² ] / P[T ≥ t]
If we assume that for all t ≤ L there are constants c and β > 0 such that
E|φ_{t−1} − φ|² ≤ c / nt^β
and furthermore α < β, then
Σ_{t=1}^{L} E|φ_{t−1} − φ|² / P[T ≥ t] = O(1)
and the variance stays bounded as N → ∞.
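To see why α < β suffices, a short derivation under the stated assumptions (plugging in nt = 2^{t−1}a and writing P[T ≥ t] ≥ C 2^{−αt} for some constant C, as the geometric truncation above provides):

```latex
\sum_{t=1}^{L} \frac{\mathbb{E}\,|\phi_{t-1}-\phi|^{2}}{\mathbb{P}[T \ge t]}
\;\le\; \sum_{t=1}^{L} \frac{c\,(2^{t-1}a)^{-\beta}}{C\,2^{-\alpha t}}
\;=\; \frac{c\,2^{\beta}}{C\,a^{\beta}} \sum_{t=1}^{L} 2^{-(\beta-\alpha)t}
\;\le\; \frac{c\,2^{\beta}}{C\,a^{\beta}} \cdot \frac{1}{2^{\beta-\alpha}-1}
```

The geometric series converges precisely because β − α > 0, and the bound is uniform in L, hence in N.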
Outline
Literature Overview
Partial Posterior Path Estimators
Experiments & Extensions
Discussion
Synthetic log-Gaussian
[Figure: estimate of the posterior mean of σ for log N(0, σ²) data, as a function of the number of data nt (10¹ to 10⁵).]
(Bardenet, Doucet, Holmes 2014): ends up using all data
(Korattikara, Chen, Welling 2014): wrong result
Synthetic log-Gaussian: debiasing
[Figure: running average (1/R) Σ_{r=1}^{R} ϕ*_r over replications r, converging to the posterior mean of σ for log N(µ, σ²); inset: data used per replication, Σ_{t=1}^{Tr} nt.]
Truly large-scale version: N ≈ 10⁸
Sum of likelihood evaluations: ≈ 0.25 N
Non-factorising likelihoods
No need for
p(D|θ) = ∏_{i=1}^{N} p(xi|θ)
Example: approximate Gaussian Process regression
Estimate the predictive mean
k*ᵀ (K + λI)^{−1} y
No MCMC (!)
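A sketch of the partial-data plug-in for this setting: the path statistic is the predictive mean computed from the first nt training points, and debiasing proceeds exactly as before (a squared-exponential kernel with lengthscale `ell` is assumed here; names illustrative):

```python
import numpy as np

def rbf(X, Z, ell=1.0):
    # Squared-exponential kernel matrix between row sets X and Z.
    d2 = ((X[:, None, :] - Z[None, :, :]) ** 2).sum(-1)
    return np.exp(-0.5 * d2 / ell ** 2)

def partial_predictive_mean(X, y, X_star, n, lam=1e-2):
    """GP predictive mean k_* (K + lam I)^{-1} y from the first n training points."""
    Xn, yn = X[:n], y[:n]
    K = rbf(Xn, Xn) + lam * np.eye(n)
    return rbf(X_star, Xn) @ np.linalg.solve(K, yn)   # no MCMC needed
```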
Toy example
N = 10⁴, D = 1
m = 100 random Fourier features (Rahimi, Recht, 2007)
Predictive mean on 1000 test data
[Figure, "GP Regression, predictive mean". Left panel, MSE convergence: MSE against the number of data nt (10¹ to 10⁴). Right panel, Debiasing: MSE over replications R. Average cost: 469.]
Gaussian Processes for Big Data
(Hensman, Fusi, Lawrence, 2013): SVI & inducing variables
Airline delays, N = 700,000, D = 8
Estimate predictive mean on 100,000 test data
[Figure: RMSE over replications R. Average cost: 2773.]
Outline
Literature Overview
Partial Posterior Path Estimators
Experiments & Extensions
Discussion
Conclusions
If goal is estimation rather than simulation, we arrive at
1. No bias
2. Finite & controllable variance
3. Data complexity sub-linear in N
4. No problems with transition kernel design
Practical:
Not limited to MCMC
Not limited to factorising likelihoods
Competitive initial results
Parallelisable, re-uses existing engineering effort
Still biased?
MCMC and finite time
The MCMC estimator Ê_{π̃t}{ϕ(θ)} is not unbiased
Could imagine two-stage process
Apply debiasing to MC estimator
Use to debias partial posterior path
Need conditions on MC convergence to control variance,
(Agapiou, Roberts, Vollmer, 2014)
Memory restrictions
Partial posterior expectations need to be computable
Memory limitations cause bias
e.g. large-scale GMRF (Lyne et al, 2014)
Free lunch? Not uniformly better than MCMC
Need P[T ≥ t] > 0 for all t
Negative example: a9a dataset (Welling & Teh, 2011)
N ≈ 32,000
Converges, but full posterior sampling is likely
[Figure: running estimate of the regression coefficient β1 over replications r.]
Useful for very large (redundant) datasets
Xi'an's 'Og, Feb 2015
Discussion of M. Betancourt's note on HMC and subsampling.
...the information provided by the whole data is only available
when looking at the whole data.
See http://goo.gl/bFQvd6
Thank you
Questions?