Stochastic Gradient MCMC for
Independent and Correlated Data
Yi-An (Yian) Ma
University of California, Berkeley
Tianqi Chen, Emily Fox, Nick Foti, Felix Ye
Goal: Large-Data Posteriors
Posterior over parameters given observations
Issue = Efficiency: complex posterior (multiple modes, strongly correlated across dimensions)
Issue = Scalability: large number of observations
E.g.:
Wikipedia corpus analysis,
Human genome sequence,
Ion channel recordings
Classical Approach:
MCMC via Jump Processes
“Standard” method = Metropolis-Hastings (MH)
– Propose θ’ from kernel depending on past value θ
– Accept or reject
Example of a jump process
Often, inefficiently explores posterior
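A minimal sketch of the propose/accept-or-reject loop described above, assuming a one-dimensional standard normal target and a symmetric Gaussian random-walk kernel (both are illustrative choices, not taken from the slides). With a symmetric kernel the proposal ratio cancels, so only the target ratio appears.

```python
import math
import random

def log_target(theta):
    # Unnormalized log-density of a standard normal (illustrative target).
    return -0.5 * theta * theta

def random_walk_mh(n_samples, step=1.0, seed=0):
    """Random-walk Metropolis-Hastings with a symmetric Gaussian kernel."""
    rng = random.Random(seed)
    theta = 0.0
    samples = []
    for _ in range(n_samples):
        # Propose theta' from a kernel that depends on the past value theta.
        proposal = theta + rng.gauss(0.0, step)
        # Accept with probability min(1, pi(theta') / pi(theta)).
        log_ratio = log_target(proposal) - log_target(theta)
        if rng.random() < math.exp(min(0.0, log_ratio)):
            theta = proposal
        samples.append(theta)
    return samples
```

Small values of `step` give high acceptance but slow, diffusive exploration, which is the inefficiency the slide points to.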
Continuous dynamic based samplers
aka Grad-MCMC
Use (stochastic) dynamics on energy landscape
to simulate distant proposal
Example: Hamiltonian Monte Carlo (HMC)
Hamiltonian (total energy H) = Potential Energy + Kinetic Energy
Target posterior of θ defines the Potential Energy
Add auxiliary “momentum” variable r, which defines the Kinetic Energy
Simulate Hamiltonian Dynamics
Use Hamiltonian dynamics to collect samples on a fixed (continuous-time) interval
Resample Momentum for Ergodicity
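The HMC loop can be sketched as follows. This is an illustrative version under stated assumptions: a one-dimensional standard normal target with potential U(θ) = θ²/2, unit mass, a leapfrog integrator, and a Metropolis-Hastings correction for the discretization error. The function names and default step settings are hypothetical.

```python
import math
import random

def U(theta):
    # Potential energy: negative log of the (unnormalized) target.
    return 0.5 * theta * theta

def grad_U(theta):
    return theta

def leapfrog(theta, r, eps, n_steps):
    """Approximate Hamiltonian dynamics for n_steps of size eps."""
    r -= 0.5 * eps * grad_U(theta)          # initial half step for momentum
    for i in range(n_steps):
        theta += eps * r                    # full step for position
        if i < n_steps - 1:
            r -= eps * grad_U(theta)        # full step for momentum
    r -= 0.5 * eps * grad_U(theta)          # final half step for momentum
    return theta, r

def hmc(n_samples, eps=0.2, n_steps=10, seed=0):
    rng = random.Random(seed)
    theta = 0.0
    samples = []
    for _ in range(n_samples):
        r = rng.gauss(0.0, 1.0)             # resample momentum for ergodicity
        h_old = U(theta) + 0.5 * r * r      # total energy H before the move
        theta_new, r_new = leapfrog(theta, r, eps, n_steps)
        h_new = U(theta_new) + 0.5 * r_new * r_new
        # Accept with probability min(1, exp(H_old - H_new)).
        if rng.random() < math.exp(min(0.0, h_old - h_new)):
            theta = theta_new
        samples.append(theta)
    return samples
```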
Continuous Dynamic-Based Samplers
Dynamic 1, Dynamic 2, Dynamic 3
Want: target distribution invariant under these dynamics (+ ergodicity)
Simulated samples are desired posterior samples
Is there a general recipe for construction?
Exploring “Correct” Dynamics
Figure: within all continuous Markov processes, the subset of processes with ps(θ) = π(θ) contains Langevin dynamics (LD), HMC, Riemann Manifold LD, and Riemann Manifold HMC
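A Recipe for Continuous Dynamics MCMC
Assume target distribution p(z) ∝ exp(−H(z)), with z = (parameters θ, auxiliary vars r) and H the target total energy
SDE: dz = f(z) dt + √(2D(z)) dW(t), with W(t) a d-dim Wiener process
Choose f(z) = −[D(z) + Q(z)] ∇H(z) + Γ(z), where Γ_i(z) = Σ_j ∂(D_ij(z) + Q_ij(z))/∂z_j
For any PSD D and skew-sym Q, the target with total energy H is invariant
Ma, Chen, Fox, NIPS 2015.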
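As an illustration of the recipe in Ma, Chen, Fox (NIPS 2015), one Euler step for constant D and Q can be written as z ← z − ε (D + Q) ∇H(z) + N(0, 2εD). The correction term that involves derivatives of D and Q vanishes when both are constant. The sketch below assumes a diagonal D and a Gaussian total energy H(z) = ½‖z‖²; all function names are hypothetical.

```python
import math
import random

def grad_H_gaussian(z):
    # Gradient of the illustrative total energy H(z) = 0.5 * ||z||^2.
    return list(z)

def recipe_step(z, grad_H, D, Q, eps, rng):
    """One Euler step of the (D, Q) dynamics; D is assumed constant and diagonal."""
    d = len(z)
    g = grad_H(z)
    drift = [sum((D[i][j] + Q[i][j]) * g[j] for j in range(d)) for i in range(d)]
    return [
        z[i] - eps * drift[i] + math.sqrt(2.0 * eps * D[i][i]) * rng.gauss(0.0, 1.0)
        for i in range(d)
    ]

def run_chain(z0, D, Q, eps, n_steps, seed=0):
    rng = random.Random(seed)
    z = list(z0)
    path = []
    for _ in range(n_steps):
        z = recipe_step(z, grad_H_gaussian, D, Q, eps, rng)
        path.append(z)
    return path

# Langevin dynamics on z = (theta,): D = I, Q = 0.
LD = ([[1.0]], [[0.0]])
# HMC with friction on z = (theta, r): D acts on r only, Q couples theta and r.
HMC_FRICTION = ([[0.0, 0.0], [0.0, 1.0]], [[0.0, -1.0], [1.0, 0.0]])
```

For example, `run_chain([0.0, 0.0], *HMC_FRICTION, eps=0.05, n_steps=50000)` produces θ values whose variance is close to 1, up to the discretization error of the Euler scheme.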
Recipe is Complete
Figure: within all continuous Markov processes, those with ps(z) = π(z) coincide with the SDEs defined by D(z), Q(z)
All existing samplers can be written in framework:
– HMC
– Riemann HMC
– LD
– Riemann LD
Any valid sampler has a D and Q in our framework
Ma, Chen, Fox, NIPS 2015.
Continuous dynamic based samplers
aka Grad-MCMC
The nitty gritty practical issues
A Practical Algorithm
Consider ε-discretization, m steps
Some discretization error
Construct irreversible MH to correct this bias
Example: Metropolis-Hastings
Propose sample using proposal dist. q
Calculate accept-reject ratio (includes ratio of reverse and forward proposals)
Accept or reject
Irreversible Jump: Naïve Revision
Forward proposal f, reverse proposal g
Leads to wrong stationary distribution
Correcting the Algorithm
Introduce auxiliary variable zp ~ U{1,-1}:
– zp = 1 or zp = -1 tracks which process is in use
– Jump to adjoint process upon rejection
I-Jump Algorithm: Ma, Fox, Chen, Wu, arXiv 2016.
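A one-dimensional sketch of the lifting idea, written as a guided-walk-style sampler rather than the authors' exact I-Jump algorithm. It assumes a standard normal target, that the forward proposal moves right (zp = 1) and the reverse proposal moves left (zp = -1), and that both use the same half-normal step length, so the ratio of reverse to forward proposal densities cancels.

```python
import math
import random

def log_pi(z):
    # Unnormalized log-density of a standard normal (illustrative target).
    return -0.5 * z * z

def irreversible_jump(n_samples, step=1.0, seed=0):
    rng = random.Random(seed)
    z, zp = 0.0, 1                      # zp in {1, -1} tracks the process in use
    samples = []
    for _ in range(n_samples):
        # zp = 1 proposes to the right, zp = -1 proposes to the left.
        z_new = z + zp * abs(rng.gauss(0.0, step))
        log_ratio = log_pi(z_new) - log_pi(z)
        if rng.random() < math.exp(min(0.0, log_ratio)):
            z = z_new                   # accepted: keep moving in the same direction
        else:
            zp = -zp                    # rejected: jump to the adjoint process
        samples.append(z)
    return samples
```

Because the direction only flips on rejection, the chain tends to sweep across the target instead of retracing its steps as a reversible random walk does.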
Irreversible MALA Algorithm
Continuous dynamical process (e.g., irreversible SDE): use as f(z|y)
Adjoint process (Q → −Q): use as g(z|y)
Compute for one step of dynamics
I-MALA Algorithm: Ma, Fox, Chen, Wu, arXiv 2016.
A Practical Algorithm
Consider ε-discretization, m steps
Must compute gradient!
– Computations costly for large data
– Cannot handle streaming data
Scaling Grad-MCMC:
Handling large datasets efficiently
Scaling up: Stochastic Gradients
Compute noisy gradient based on minibatch consisting of n i.i.d. observations:
– For minibatches sampled uniformly at random from data, the noisy gradient is an unbiased estimate of the true gradient
Only requires examining n data points
and we assume (appealing to CLT): noisy gradient = true gradient + Gaussian noise
Scalable Version of Algorithm
Original update rule:
Modified stochastic gradient update rule:
As ε → 0, SG noise decreases and bias → 0
Use small, finite ε in practice (allow some bias)
Subtract estimate of variance of SG noise
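A minimal sketch of a stochastic gradient update, using the simplest case (SGLD: D = I, Q = 0) on a toy model. It assumes observations x_i ~ N(θ, 1) with a N(0, prior_var) prior on θ, and it omits the noise-variance correction mentioned above (the estimate is treated as zero), so a small bias remains. All names and settings are hypothetical.

```python
import math
import random

def sgld_gaussian_mean(data, n_steps, eps, batch_size, prior_var=100.0, seed=0):
    """SGLD for the mean of a unit-variance Gaussian (toy example)."""
    rng = random.Random(seed)
    N = len(data)
    theta = 0.0
    samples = []
    for _ in range(n_steps):
        # Minibatch sampled uniformly at random: only batch_size points examined.
        batch = rng.sample(data, batch_size)
        # Noisy gradient of U(theta) = -log prior - sum_i log likelihood,
        # with the minibatch sum rescaled by N / batch_size (unbiased).
        noisy_grad = theta / prior_var + (N / batch_size) * sum(theta - x for x in batch)
        # Langevin update with injected noise N(0, 2 * eps); no variance correction.
        theta = theta - eps * noisy_grad + math.sqrt(2.0 * eps) * rng.gauss(0.0, 1.0)
        samples.append(theta)
    return samples
```

With 1000 data points, a step size around 1e-4 keeps the update stable. The sample mean after burn-in tracks the exact posterior mean, while the sample variance is inflated by the uncorrected stochastic gradient noise.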
Example D and Q of Past Algorithms
Figure: past algorithms (SGLD, SGRLD, SGHMC, SGNHT) placed in the space of D(z) and Q(z), with a region labeled “(D, Q) space not previously explored”
SGRiemannHMC
• Use existing D, Q building blocks to define new samplers
• SGRHMC existence previously only speculated
– Naïve(ish) approach has wrong stationary distribution
• Take D and Q of SGHMC and make state dependent (state-dependent pos. def. matrix)
Ma, Chen, Fox, NIPS 2015.
Streaming Wikipedia Analysis
Applied SGRHMC (using Fisher info metric) to online LDA
– Latent Dirichlet allocation (LDA) = mixed membership document model
Scraped Wikipedia entries in a streaming manner
– Each entry was analyzed on-the-fly
Figure: perplexity vs. iteration for SGLD, SGHMC, SGRLD, and SGRHMC (step size selected via grid search)
Ma, Chen, Fox, NIPS 2015.
Scaling Approach 2:
Stochastic Gradient MCMC
for Correlated datasets
Scaling up: Stochastic Gradients
Compute noisy gradient based on minibatch consisting of n i.i.d. observations:
– For minibatches sampled uniformly at random from data, the noisy gradient is an unbiased estimate of the true gradient
Only requires examining n data points
and we assume (appealing to CLT): noisy gradient = true gradient + Gaussian noise
Welling, Teh, ICML 2011. Chen, Fox, Guestrin, ICML 2014.
i.i.d. data: OK
What about non-i.i.d. data?
E.g., time series: How?
Hidden Markov Models (HMMs)
– discrete state sequence
– observations
– parameters: transition probabilities, observation parameters
Batch learning for HMMs: A quick review
Use current parameters to form local state beliefs:
– Propagate info forwards
– Propagate info backwards
– Combine to form smoothed local state belief
Given local beliefs, update global parameter
Issue: Cost is O(K²T) per global update!
Costly when using uninformed initializations or observations are redundant
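The forward and backward passes summarized above can be sketched as a standard forward-backward routine. The argument names and the list-of-lists layout are assumptions made for illustration. Each time step costs O(K²) work, which gives the O(K²T) total per pass.

```python
import math

def forward_backward(pi0, A, lik):
    """Smoothed state beliefs and log marginal likelihood for an HMM.

    pi0[k]    : initial state distribution
    A[i][j]   : P(x_t = j | x_{t-1} = i)
    lik[t][k] : p(y_t | x_t = k)
    """
    T, K = len(lik), len(pi0)
    alpha, log_marginal, prev = [], 0.0, None
    for t in range(T):                      # propagate info forwards
        if t == 0:
            pred = list(pi0)
        else:
            pred = [sum(prev[i] * A[i][j] for i in range(K)) for j in range(K)]
        unnorm = [pred[k] * lik[t][k] for k in range(K)]
        c = sum(unnorm)
        log_marginal += math.log(c)         # accumulates log p(y_1, ..., y_T)
        prev = [u / c for u in unnorm]
        alpha.append(prev)
    beta = [[1.0] * K for _ in range(T)]
    for t in range(T - 2, -1, -1):          # propagate info backwards
        b = [sum(A[i][j] * lik[t + 1][j] * beta[t + 1][j] for j in range(K))
             for i in range(K)]
        s = sum(b)
        beta[t] = [x / s for x in b]
    gamma = []
    for t in range(T):                      # combine into smoothed local beliefs
        g = [alpha[t][k] * beta[t][k] for k in range(K)]
        s = sum(g)
        gamma.append([x / s for x in g])
    return gamma, log_marginal
```

The returned `log_marginal` is the likelihood with the state sequence marginalized out, which is the quantity a gradient-based sampler over continuous parameters would work with.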
Minibatch learning for HMMs
via SG-MCMC
Why is this not so straightforward?
SG-MCMC assumes continuous parameter space
– Typical HMM MCMC algorithms iterate on
(sampling) latent discrete-valued state sequence
Need to prove that correct stationary distribution
is maintained in presence of:
– Incomplete observations in subsequences
– Mutually correlated subsequences per minibatch
Ma, Foti, Fox, ICML 2017
Marginal Likelihood Representation
Marginalize x
Rewriting in terms of a specific subsequence
Random dynamical system:
Synchronizes as
Ye, Ma, Qian, arXiv 2017
Potential energy for Grad-MCMC
Issues with the gradient computation
– sum over all subsequences
– q,π calculations involve touching (nearly) all T obs!
A Stochastic Gradient Approach
Approximating Gradient Terms with Buffering
Figure: subsequence s of length L with a buffer of length B on each side
How much buffering is sufficient?
Set buffer length B by estimating the Lyapunov exponent
of the underlying random dynamical system
Ye, Ma, Qian, arXiv 2017
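The effect of buffering can be illustrated empirically for the forward direction only. The sketch below compares the filtered state belief at position s computed from the start of the sequence with the belief computed from only the B observations before s, starting from a uniform belief. It does not estimate a Lyapunov exponent. The helper names and the randomly generated likelihood values are assumptions for illustration.

```python
import random

def normalize(v):
    s = sum(v)
    return [x / s for x in v]

def forward_filter(A, liks, start):
    """Run predict/update steps over liks, starting from belief `start`."""
    K = len(A)
    belief = list(start)
    for lik_t in liks:
        pred = [sum(belief[i] * A[i][j] for i in range(K)) for j in range(K)]
        belief = normalize([pred[k] * lik_t[k] for k in range(K)])
    return belief

def buffer_error(A, liks, s, B):
    """Max abs difference between the full filter and a B-step buffered filter at s."""
    K = len(A)
    uniform = [1.0 / K] * K
    exact = forward_filter(A, liks[:s], uniform)        # touches all s observations
    approx = forward_filter(A, liks[s - B:s], uniform)  # touches only B observations
    return max(abs(e - a) for e, a in zip(exact, approx))

def make_demo_likelihoods(T, K, seed=0):
    # Hypothetical per-step likelihood values, drawn at random for the demo.
    rng = random.Random(seed)
    return [[rng.uniform(0.1, 1.0) for _ in range(K)] for _ in range(T)]
```

For a transition matrix with strictly positive entries, `buffer_error` shrinks roughly geometrically as B grows, which is why a modest buffer can stand in for the full history.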
Minibatch = Set of Subsequences
Subsequences are correlated! Reduces efficiency
Mitigating Subsequence Correlations
Minimum gap: ν, based on 2nd largest eig(A)
Resulting Gradient Approximation
Plugs into SG-MCMC theory
Ion Channel Analysis – Segmentations
1 MHz recording of single alamethicin channel [Rosenstein et al. 2013]
Our dataset: 209,634 observations
Figure: segmentation panels labeled with runtimes
Batch Grad-MCMC: 716.19 sec, 7245.14 sec, 2124.45 sec
SG-MCMC: 44.05 sec, 138.51 sec, 466.82 sec
Ion Channel Analysis – Estimation
Consequence of non-i.i.d. data & importance of buffering
Figure: panels for Diagonally Dominant and Reversed Cycles; metrics: log predictive prob and || A − Atrue ||F
Thank You
Irreversibility:
Increasing Sampler Efficiency
When Q(z) is non-zero, process is irreversible
– i.e., time-reversed process is statistically
distinguishable from forward process
– Saw greater efficiency for such processes
(e.g., HMC, Riemann HMC, SGHMC, SGRHMC,…)
Reversibility: Continuous Dynamics
Skew-symmetric
Hwang, Hwang-Ma and Sheu (1993, 2005); Rey-Bellet and Spiliopoulos (2014)
Reversibility: Jump Processes
Reversibility: symmetric dynamics
– Easy! e.g., Metropolis-Hastings
Irreversibility: asymmetric, cyclic motion; explores distribution in a directed manner
– Hard
Chen, Lovász and Pak (1999); Diaconis, Holmes and Neal (2000); Bierkens (2015)
What can we do?
For continuous dynamic part: SDE(f(z))
– f(z) defines diffusion: hard to specify!
– Instead use SDE(D(z),Q(z)) with D PSD and Q skew-sym; Q defines irreversibility
For jump part: MJP(W(z|x))
– Transition kernel W: hard to specify! MH is one choice, but reversible
– Instead use MJP(S(x,z),A(x,z)) with symmetric kernel S and antisymmetric kernel A; A defines irreversibility
“Only” need
Ma, Fox, Chen, Wu, arXiv 2016.
Correcting the Algorithm
Introduce auxiliary variable zp ~ U{1,-1} (zp = 1 or zp = -1)
Jump to adjoint process upon rejection
Lifting Method: Turitsyn, Chertkov and Vucelja (2011)
Simple Irreversible Jump Algorithm
Keep track of process in use
Flip the process
Can show:
Possible choice: f, g
Figure: sample path in the extended space (z, zp) through states (z,1), (z1,1), (z2,1), (z2,-1), (z3,-1), (z4,-1); a rejected proposal (z*,1) flips zp to give (z,-1)
Ma, Fox, Chen, Wu, arXiv 2016.
Moving to Higher Dimensions
In 1D, there’s only one choice of direction: zp
In higher dimensions:
1. Let zp be uniformly distributed on unit ball
2. Flip sign of zp upon rejection
3. After multiple rejections, resample zp
Combining Approach 1 & 2
Approach 1: jump process with π invariant
Approach 2: continuous dynamics with π invariant
Combined: irreversible MALA
Irreversible MALA Algorithm
Continuous dynamical process (e.g., irreversible SDE): use as f(z|y)
Adjoint process: use as g(z|y)
Can compute for one step of dynamics
Metropolis Adjusted Langevin Algorithm (MALA): Xifara, Sherlock, Livingstone, Byrne and Girolami (2014); Zig-Zag: Bierkens and Roberts (2016)
Thank You

QMC Program: Trends and Advances in Monte Carlo Sampling Algorithms Workshop, Stochastic Gradient MCMC for Independent & Correlated Data - Yian Ma, Dec 11, 2017

“Oh GOSH! Reflecting on Hackteria's Collaborative Practices in a Global Do-It...
 
MENTAL STATUS EXAMINATION format.docx
MENTAL     STATUS EXAMINATION format.docxMENTAL     STATUS EXAMINATION format.docx
MENTAL STATUS EXAMINATION format.docx
 
Call Girls in Dwarka Mor Delhi Contact Us 9654467111
Call Girls in Dwarka Mor Delhi Contact Us 9654467111Call Girls in Dwarka Mor Delhi Contact Us 9654467111
Call Girls in Dwarka Mor Delhi Contact Us 9654467111
 
Alper Gobel In Media Res Media Component
Alper Gobel In Media Res Media ComponentAlper Gobel In Media Res Media Component
Alper Gobel In Media Res Media Component
 
Staff of Color (SOC) Retention Efforts DDSD
Staff of Color (SOC) Retention Efforts DDSDStaff of Color (SOC) Retention Efforts DDSD
Staff of Color (SOC) Retention Efforts DDSD
 
URLs and Routing in the Odoo 17 Website App
URLs and Routing in the Odoo 17 Website AppURLs and Routing in the Odoo 17 Website App
URLs and Routing in the Odoo 17 Website App
 
Software Engineering Methodologies (overview)
Software Engineering Methodologies (overview)Software Engineering Methodologies (overview)
Software Engineering Methodologies (overview)
 
Q4-W6-Restating Informational Text Grade 3
Q4-W6-Restating Informational Text Grade 3Q4-W6-Restating Informational Text Grade 3
Q4-W6-Restating Informational Text Grade 3
 
Measures of Central Tendency: Mean, Median and Mode
Measures of Central Tendency: Mean, Median and ModeMeasures of Central Tendency: Mean, Median and Mode
Measures of Central Tendency: Mean, Median and Mode
 
Crayon Activity Handout For the Crayon A
Crayon Activity Handout For the Crayon ACrayon Activity Handout For the Crayon A
Crayon Activity Handout For the Crayon A
 
Presentation by Andreas Schleicher Tackling the School Absenteeism Crisis 30 ...
Presentation by Andreas Schleicher Tackling the School Absenteeism Crisis 30 ...Presentation by Andreas Schleicher Tackling the School Absenteeism Crisis 30 ...
Presentation by Andreas Schleicher Tackling the School Absenteeism Crisis 30 ...
 
PSYCHIATRIC History collection FORMAT.pptx
PSYCHIATRIC   History collection FORMAT.pptxPSYCHIATRIC   History collection FORMAT.pptx
PSYCHIATRIC History collection FORMAT.pptx
 
How to Make a Pirate ship Primary Education.pptx
How to Make a Pirate ship Primary Education.pptxHow to Make a Pirate ship Primary Education.pptx
How to Make a Pirate ship Primary Education.pptx
 
Industrial Policy - 1948, 1956, 1973, 1977, 1980, 1991
Industrial Policy - 1948, 1956, 1973, 1977, 1980, 1991Industrial Policy - 1948, 1956, 1973, 1977, 1980, 1991
Industrial Policy - 1948, 1956, 1973, 1977, 1980, 1991
 
Arihant handbook biology for class 11 .pdf
Arihant handbook biology for class 11 .pdfArihant handbook biology for class 11 .pdf
Arihant handbook biology for class 11 .pdf
 
Hybridoma Technology ( Production , Purification , and Application )
Hybridoma Technology  ( Production , Purification , and Application  ) Hybridoma Technology  ( Production , Purification , and Application  )
Hybridoma Technology ( Production , Purification , and Application )
 
SOCIAL AND HISTORICAL CONTEXT - LFTVD.pptx
SOCIAL AND HISTORICAL CONTEXT - LFTVD.pptxSOCIAL AND HISTORICAL CONTEXT - LFTVD.pptx
SOCIAL AND HISTORICAL CONTEXT - LFTVD.pptx
 
Employee wellbeing at the workplace.pptx
Employee wellbeing at the workplace.pptxEmployee wellbeing at the workplace.pptx
Employee wellbeing at the workplace.pptx
 
Model Call Girl in Bikash Puri Delhi reach out to us at 🔝9953056974🔝
Model Call Girl in Bikash Puri  Delhi reach out to us at 🔝9953056974🔝Model Call Girl in Bikash Puri  Delhi reach out to us at 🔝9953056974🔝
Model Call Girl in Bikash Puri Delhi reach out to us at 🔝9953056974🔝
 

QMC Program: Trends and Advances in Monte Carlo Sampling Algorithms Workshop, Stochastic Gradient MCMC for Independent & Correlated Data - Yian Ma, Dec 11, 2017

  • 1. Stochastic Gradient MCMC for Independent and Correlated Data. Yi-An (Yian) Ma, University of California, Berkeley. Tianqi Chen, Emily Fox, Nick Foti, Felix Ye
  • 2. Goal: Large-Data Posteriors (parameters, observations). Complex: multiple modes, strongly correlated across dimensions. Issue = Efficiency
  • 3. Goal: Large-Data Posteriors (parameters, observations). Large, e.g.: Wikipedia corpus analysis, human genome sequence, ion channel recordings. Issue = Scalability
  • 4. Classical Approach: MCMC via Jump Processes. “Standard” method = Metropolis-Hastings (MH), an example of a jump process: – Propose θ′ from a kernel depending on the past value θ – Accept or reject. Often explores the posterior inefficiently
  • 5. Continuous dynamic based samplers aka Grad-MCMC Use (stochastic) dynamics on energy landscape to simulate distant proposal
  • 6. Example: Hamiltonian Monte Carlo (HMC). Focus on the target posterior of θ and add an auxiliary “momentum” variable r. Hamiltonian (total energy H) = potential energy + kinetic energy
  • 7. Simulate Hamiltonian Dynamics Use Hamiltonian dynamics to collect samples on a fixed (continuous-time) interval
  • 8. . . . Resample Momentum for Ergodicity
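The HMC loop described in slides 6 to 8 can be sketched as follows. This is a minimal illustration, not the speaker's code: it assumes an identity mass matrix, a leapfrog integrator, and a Metropolis-Hastings correction for the discretization error. The names `hmc_sample`, `U`, `grad_U`, `eps` and `n_leapfrog` are hypothetical.

```python
import numpy as np

def hmc_sample(U, grad_U, theta0, n_samples=4000, eps=0.1, n_leapfrog=20, seed=0):
    """Sketch of HMC with identity mass matrix (an assumption of this example)."""
    rng = np.random.default_rng(seed)
    theta = np.atleast_1d(np.asarray(theta0, dtype=float))
    samples = np.empty((n_samples, theta.size))
    for i in range(n_samples):
        r = rng.standard_normal(theta.size)      # resample momentum for ergodicity
        h_old = U(theta) + 0.5 * r @ r           # total energy = potential + kinetic
        th, rr = theta.copy(), r.copy()
        rr -= 0.5 * eps * grad_U(th)             # leapfrog: initial half step in momentum
        for step in range(n_leapfrog):
            th += eps * rr                       # full step in position
            if step < n_leapfrog - 1:
                rr -= eps * grad_U(th)           # full step in momentum
        rr -= 0.5 * eps * grad_U(th)             # final half step in momentum
        h_new = U(th) + 0.5 * rr @ rr
        if np.log(rng.uniform()) < h_old - h_new:  # MH correction of integration error
            theta = th
        samples[i] = theta
    return samples
```

For a standard normal target, `U = lambda t: 0.5 * float(t @ t)` and `grad_U = lambda t: t` should give samples with mean near 0 and variance near 1.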
  • 9. Continuous Dynamic-Based Samplers (Dynamic 1, Dynamic 2, Dynamic 3). Want: target distribution invariant under these dynamics (+ ergodicity), so that simulated samples are the desired posterior samples
  • 10. Exploring “Correct” Dynamics. Among all continuous Markov processes, consider the processes with stationary distribution p^s(θ) = π(θ): Langevin dynamics (LD), HMC, Riemann Manifold HMC, Riemann Manifold LD. Is there a general recipe for construction?
  • 11. A Recipe for Continuous Dynamics MCMC. Assume a target distribution over z = (parameters, auxiliary vars), defined through the target total energy H. SDE with PSD D, skew-symmetric Q and a d-dim Wiener process: the target defined by the total energy H is invariant. Ma, Chen, Fox, NIPS 2015.
  • 12. A Recipe for Continuous Dynamics MCMC. Assume a target distribution over z = (parameters, auxiliary vars), defined through the target total energy H. Ma, Chen, Fox, NIPS 2015.
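The slide equations did not survive in this transcript. For orientation, the recipe as stated in the cited paper (Ma, Chen, Fox, NIPS 2015) has the following form, written here from the paper rather than from the slides, so the exact notation on the slides may differ:

```latex
\begin{aligned}
\pi(z) &\propto \exp\{-H(z)\}, \\
\mathrm{d}z &= f(z)\,\mathrm{d}t + \sqrt{2D(z)}\,\mathrm{d}W(t), \\
f(z) &= -\big[D(z) + Q(z)\big]\nabla H(z) + \Gamma(z), \qquad
\Gamma_i(z) = \sum_{j=1}^{d} \frac{\partial}{\partial z_j}\big(D_{ij}(z) + Q_{ij}(z)\big).
\end{aligned}
```

Here D(z) is positive semidefinite, Q(z) is skew-symmetric, and W(t) is the d-dimensional Wiener process; the correction term Γ(z) vanishes when D and Q do not depend on z.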
  • 13. Recipe is Complete. Among all continuous Markov processes, those with p^s(z) = π(z) are the SDEs defined by D(z), Q(z). All existing samplers can be written in the framework: – HMC – Riemann HMC – LD – Riemann LD. Any valid sampler has a D and Q in our framework. Ma, Chen, Fox, NIPS 2015.
  • 14. Continuous dynamic based samplers aka Grad-MCMC The nitty gritty practical issues
  • 15. A Practical Algorithm. Consider an ε-discretization with m steps. Some discretization error: construct irreversible MH to correct this bias
  • 16. Example: Metropolis-Hastings Propose sample using proposal dist. q Calculate accept-reject ratio Ratio of reverse and forward proposals Accept or reject
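The accept-reject step on this slide can be sketched as below. This is a generic one-dimensional Metropolis-Hastings loop under stated assumptions, not code from the talk; `metropolis_hastings`, `propose` and `log_q` are hypothetical names, with `log_q(a, b)` meaning the log proposal density of `a` given current state `b`.

```python
import numpy as np

def metropolis_hastings(log_pi, propose, log_q, z0, n_samples=20000, seed=0):
    """Generic 1D MH sketch: log_pi is the unnormalized log target."""
    rng = np.random.default_rng(seed)
    z = float(z0)
    out = np.empty(n_samples)
    for i in range(n_samples):
        y = propose(z, rng)                          # propose sample using proposal dist. q
        # log accept-reject ratio: target ratio times reverse/forward proposal ratio
        log_ratio = (log_pi(y) + log_q(z, y)) - (log_pi(z) + log_q(y, z))
        if np.log(rng.uniform()) < log_ratio:        # accept
            z = y
        out[i] = z                                   # on rejection the old state repeats
    return out
```

With a symmetric random-walk proposal, for example `propose = lambda z, rng: z + rng.standard_normal()` and `log_q = lambda a, b: -0.5 * (a - b) ** 2`, the proposal terms cancel and only the target ratio remains.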
  • 17. Irreversible Jump: Naïve Revision. Naïve irreversible scheme: forward proposal f, reverse proposal g. Leads to wrong stationary distribution
  • 18. Correcting the Algorithm Jump to adjoint process upon rejection Introduce auxiliary variable zp ~ U{1,-1}: zp = 1 zp = -1 I-Jump Algorithm: Ma, Fox, Chen, Wu, arXiv 2016.
  • 19. Irreversible MALA Algorithm. Continuous dynamical process (e.g., irreversible SDE): use as f(z|y). Adjoint process (Q → −Q): use as g(z|y). Compute for one step of dynamics. I-MALA Algorithm: Ma, Fox, Chen, Wu, arXiv 2016.
  • 20. A Practical Algorithm. Consider an ε-discretization with m steps. Must compute gradient! – Computations costly for large data – Cannot handle streaming data
  • 21. Scaling Grad-MCMC: Handling large datasets efficiently
  • 22. Scaling up: Stochastic Gradients. Compute a noisy gradient based on a minibatch consisting of n i.i.d. observations: – For minibatches sampled uniformly at random from the data, the noisy gradient matches the true gradient in expectation – Only requires examining n data points. We assume (appealing to the CLT) that the noisy gradient equals the true gradient plus approximately Gaussian noise
  • 23. Scalable Version of Algorithm. Original update rule vs. modified stochastic gradient update rule (subtract an estimate of the variance of the SG noise). As ε → 0, SG noise decreases and bias → 0. Use a small, finite ε in practice (allow some bias)
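A minimal sketch of a stochastic gradient update of this kind, under simplifying assumptions that are not from the slides: the sampler is SGLD (D = I, Q = 0), the model is a Gaussian mean with unit observation variance and a N(0, `prior_var`) prior, and the estimate of the SG noise variance is set to zero rather than subtracted. The function name and defaults are hypothetical.

```python
import numpy as np

def sgld_gaussian_mean(x, n_batch=100, eps=1e-4, n_steps=5000, prior_var=100.0, seed=0):
    """SGLD sketch for the posterior of theta, with x_i ~ N(theta, 1)."""
    rng = np.random.default_rng(seed)
    n_data = x.size
    theta = 0.0
    trace = np.empty(n_steps)
    for t in range(n_steps):
        batch = rng.choice(x, size=n_batch, replace=False)   # uniform minibatch
        # noisy gradient of U(theta) = -log prior - log likelihood,
        # with the minibatch sum rescaled by N / n so it is unbiased
        grad = theta / prior_var + (n_data / n_batch) * np.sum(theta - batch)
        # Langevin step: drift plus injected noise of variance 2 * eps
        # (the SG-noise variance correction is omitted in this sketch)
        theta = theta - eps * grad + np.sqrt(2.0 * eps) * rng.standard_normal()
        trace[t] = theta
    return trace
```

Because the noise correction is omitted and ε is finite, the sampled variance is inflated relative to the true posterior, which is the bias the slide refers to; the posterior mean is still recovered closely in this conjugate example.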
  • 24. Example D and Q of Past Algorithms. [Figure: the (D(z), Q(z)) space, locating SGLD, SGRLD, SGHMC and SGNHT, with a region marked “space not previously explored”]
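The per-algorithm expressions are not preserved in this transcript. As a reference point, the (D, Q) choices given in Ma, Chen, Fox (NIPS 2015) for three of the samplers are reproduced below from the paper; the symbols U (potential energy), G (metric), M (mass matrix) and C (friction) follow the paper and are assumptions relative to the slide text. SGNHT is left out of this sketch.

```latex
\begin{aligned}
\text{SGLD:}\quad & z = \theta,\; H = U(\theta),\; D = I,\; Q = 0; \\
\text{SGRLD:}\quad & z = \theta,\; H = U(\theta),\; D = G(\theta)^{-1},\; Q = 0; \\
\text{SGHMC:}\quad & z = (\theta, r),\; H = U(\theta) + \tfrac{1}{2} r^{\top} M^{-1} r,\;
D = \begin{pmatrix} 0 & 0 \\ 0 & C \end{pmatrix},\;
Q = \begin{pmatrix} 0 & -I \\ I & 0 \end{pmatrix}.
\end{aligned}
```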
  • 25. SGRiemannHMC (SGRHMC). • Use existing D, Q building blocks to define new samplers • Existence of SGRHMC was previously only speculated – A naïve(ish) approach has the wrong stationary distribution • Take D and Q of SGHMC and make them state dependent, using a state-dependent pos. def. matrix. Ma, Chen, Fox, NIPS 2015.
  • 26. Streaming Wikipedia Analysis. Applied SGRHMC (using the Fisher information metric) to online LDA – Latent Dirichlet allocation (LDA) = mixed membership document model. Scraped Wikipedia entries in a streaming manner – Each entry was analyzed on-the-fly. Step size selected via grid search. [Figure: perplexity vs. iteration for SGLD, SGHMC, SGRLD and SGRHMC] Ma, Chen, Fox, NIPS 2015.
  • 27. Scaling Approach 2: Stochastic Gradient MCMC for Correlated datasets
  • 28. Scaling up: Stochastic Gradients. Compute a noisy gradient based on a minibatch consisting of n i.i.d. observations: – For minibatches sampled uniformly at random from the data, the noisy gradient matches the true gradient in expectation – Only requires examining n data points. We assume (appealing to the CLT) that the noisy gradient equals the true gradient plus approximately Gaussian noise. Welling, Teh, ICML 2011. Chen, Fox, Guestrin, ICML 2014.
  • 29. Scaling up: Stochastic Gradients (as on the previous slide; Welling, Teh, ICML 2011. Chen, Fox, Guestrin, ICML 2014). i.i.d. data: OK. What about non-i.i.d. data, e.g., time series? How?
  • 30. Hidden Markov Models (HMMs) discrete state sequence observations transition probabilities, observation parameters
  • 31. Batch learning for HMMs: A quick review
  • 32. Batch Learning for HMMs. Use the current parameter estimate to form local state beliefs: – Propagate info forwards (forward messages)
  • 33. Batch Learning for HMMs. Use the current parameter estimate to form local state beliefs: – Propagate info backwards (backward messages)
  • 34. Batch Learning for HMMs. Combine to form the smoothed local state belief
  • 35. Batch Learning for HMMs. Given local beliefs, update the global parameter. Issue: cost is O(K²T) per global update! Costly when using uninformed initializations or when observations are redundant
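The forward and backward passes reviewed in slides 32 to 35 can be sketched as a standard forward-backward smoother. This is a textbook version with normalized messages, offered as an illustration rather than the speaker's implementation; `forward_backward`, `pi0`, `A` and `lik` are hypothetical names.

```python
import numpy as np

def forward_backward(pi0, A, lik):
    """Smoothed state marginals for an HMM.

    pi0: (K,) initial state distribution
    A:   (K, K) transition matrix, rows sum to 1
    lik: (T, K) observation likelihoods p(y_t | x_t = k)
    """
    T, K = lik.shape
    alpha = np.empty((T, K))
    beta = np.ones((T, K))
    a = pi0 * lik[0]
    alpha[0] = a / a.sum()
    for t in range(1, T):                     # propagate info forwards
        a = (alpha[t - 1] @ A) * lik[t]       # O(K^2) work per time step
        alpha[t] = a / a.sum()
    for t in range(T - 2, -1, -1):            # propagate info backwards
        b = A @ (lik[t + 1] * beta[t + 1])    # O(K^2) work per time step
        beta[t] = b / b.sum()
    gamma = alpha * beta                      # combine into smoothed beliefs
    return gamma / gamma.sum(axis=1, keepdims=True)
```

Each pass costs O(K²) per time step over T steps, which is where the O(K²T) cost per global update comes from.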
  • 36. Minibatch learning for HMMs via SG-MCMC
  • 37. Why is this not so straightforward? SG-MCMC assumes continuous parameter space – Typical HMM MCMC algorithms iterate on (sampling) latent discrete-valued state sequence Need to prove that correct stationary distribution is maintained in presence of: – Incomplete observations in subsequences – Mutually correlated subsequences per minibatch Ma, Foti, Fox, ICML 2017
  • 40. Rewriting in terms of a specific subsequence … …
  • 41. Rewriting in terms of a specific subsequence Random dynamical system: Synchronizes as Ye, Ma, Qian, arXiv 2017
  • 42. Potential energy for Grad-MCMC
  • 43. Issues with the gradient computation q,π calculations involve touching (nearly) all T obs! sum over all subsequences
  • 44. A Stochastic Gradient Approach. q, π calculations involve touching (nearly) all T obs! Sum over all subsequences
  • 45. Approximating Gradient Terms with Buffering
  • 47. How much buffering is sufficient? Set buffer length B by estimating the Lyapunov exponent of the underlying random dynamical system. [Diagram: subsequence s of length L with a buffer of length B on each side] Ye, Ma, Qian, arXiv 2017
  • 48. Minibatch = Set of Subsequences. Subsequences are correlated! Reduces efficiency
  • 49. Mitigating Subsequence Correlations. Minimum gap: ν, based on 2nd largest eig(A)
  • 50. Resulting Gradient Approximation. Plugs into SG-MCMC theory
  • 51. Ion Channel Analysis – Segmentations. 1 MHz recording of a single alamethicin channel [Rosenstein et al. 2013]. Our dataset: 209,634 observations. [Figure: segmentations from batch Grad-MCMC and SG-MCMC; panel run times: 716.19 sec, 7245.14 sec, 2124.45 sec, 44.05 sec, 138.51 sec, 466.82 sec]
  • 52. Ion Channel Analysis – Estimation
  • 53. Consequence of non-i.i.d. data & importance of buffering. [Figure: panels “Diagonally Dominant” and “Reversed Cycles”; axes: log predictive prob and ||A − A_true||_F]
  • 56. Reversibility: Continuous Dynamics. When the skew-symmetric Q(z) is non-zero, the process is irreversible – i.e., the time-reversed process is statistically distinguishable from the forward process – Saw greater efficiency for such processes (e.g., HMC, Riemann HMC, SGHMC, SGRHMC, …). Hwang, Hwang-Ma and Sheu (1993, 2005); Rey-Bellet and Spiliopoulos (2014)
  • 57. Reversibility: Jump Processes. Reversibility: symmetric dynamics. Irreversibility: asymmetric, cyclic motion; explores the distribution in a directed manner. Chen, Lovász and Pak (1999); Diaconis, Holmes and Neal (2000); Bierkens (2015)
  • 58. Reversibility: Jump Processes. Reversibility: easy! e.g., Metropolis-Hastings. Irreversibility: hard. Chen, Lovász and Pak (1999); Diaconis, Holmes and Neal (2000); Bierkens (2015)
  • 59. Example: Metropolis-Hastings Propose sample using proposal dist. q Calculate accept-reject ratio Ratio of reverse and forward proposals Accept or reject
  • 60. Irreversible Jump: Naïve Revision. Naïve irreversible scheme: forward proposal f, reverse proposal g. Leads to wrong stationary distribution
  • 61. What can we do? For continuous dynamic part: SDE(f(z)) Defines diffusion Hard to specify!
  • 62. What can we do? For continuous dynamic part: SDE(D(z), Q(z)), with D PSD and Q skew-symmetric; Q defines irreversibility. For jump part: MJP(W(z|x)), where the transition kernel W is hard to specify! MH is one choice, but reversible
  • 63. What can we do? For continuous dynamic part: SDE(D(z), Q(z)), with D PSD and Q skew-symmetric; Q defines irreversibility. For jump part: MJP(S(x,z), A(x,z)), with symmetric kernel S and antisymmetric kernel A; A defines irreversibility. “Only” need. Ma, Fox, Chen, Wu, arXiv 2016.
  • 65. Correcting the Algorithm Jump to adjoint process upon rejection Introduce auxiliary variable zp ~ U{1,-1}: zp = 1 zp = -1 Lifting Method: Turitsyn, Chertkov and Vucelja (2011)
  • 66. Simple Irreversible Jump Algorithm Keep track of process in use Flip the process Can show: Ma, Fox, Chen, Wu, arXiv 2016.
  • 67. Simple Irreversible Jump Algorithm Possible choice: f g f g Ma, Fox, Chen, Wu, arXiv 2016.
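A one-dimensional sketch of an irreversible jump sampler of this kind is given below. The proposal pair on the slide is not recoverable from the transcript, so this example assumes a particular choice: f moves right by a half-normal step and g moves left by the same law, which makes the proposal ratio g(z|y)/f(y|z) equal to 1. The direction variable `zp` tracks which process is in use and is flipped on rejection. All names are hypothetical.

```python
import numpy as np

def i_jump_1d(log_pi, z0, n_samples=20000, step=1.0, seed=0):
    """Irreversible jump sketch in 1D with directional half-normal proposals."""
    rng = np.random.default_rng(seed)
    z, zp = float(z0), 1                     # zp = 1: use f, zp = -1: use g
    out = np.empty(n_samples)
    for i in range(n_samples):
        # f: y = z + |xi|, g: y = z - |xi|, with xi ~ N(0, step^2)
        y = z + zp * abs(step * rng.standard_normal())
        # acceptance ratio pi(y) g(z|y) / (pi(z) f(y|z)); the proposal
        # densities cancel for this symmetric pair of step laws
        if np.log(rng.uniform()) < log_pi(y) - log_pi(z):
            z = y                            # accept: keep moving in the same direction
        else:
            zp = -zp                         # reject: flip to the adjoint process
        out[i] = z
    return out
```

The chain keeps moving in one direction until a rejection occurs, which is the directed exploration contrasted with reversible MH earlier in the deck.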
  • 70. Moving to Higher Dimensions. In 1D, there’s only one choice of direction for z^p. In higher dimensions: 1. Let z^p be uniformly distributed on the unit ball 2. Flip the sign of z^p upon rejection 3. After multiple rejections, resample z^p
  • 71. Combining Approach 1 & 2. Approach 1: jump process with π invariant. Approach 2: continuous dynamics with π invariant. Combined: irreversible MALA
  • 72. Irreversible MALA Algorithm. Continuous dynamical process (e.g., irreversible SDE): use as f(z|y). Adjoint process: use as g(z|y). Can compute for one step of dynamics. Metropolis Adjusted Langevin Algorithm (MALA): Xifara, Sherlock, Livingstone, Byrne and Girolami (2014); Zig-Zag: Bierkens and Roberts (2016)
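A sketch of how the two ingredients might be combined, under assumptions not stated on the slides: D = I, a constant skew-symmetric Q, one Euler step of the SDE as the forward proposal f, and one Euler step with Q replaced by −Q as the adjoint proposal g. The function and argument names are hypothetical.

```python
import numpy as np

def i_mala(H, grad_H, z0, Q, eps=0.2, n_samples=20000, seed=0):
    """Irreversible MALA sketch with D = I and constant skew-symmetric Q."""
    rng = np.random.default_rng(seed)
    z = np.asarray(z0, dtype=float)
    d = z.size
    eye = np.eye(d)
    zp = 1                                        # +1: forward process f, -1: adjoint g

    def drift_mean(x, sign):
        # one Euler step of dz = -(D + sign * Q) grad H dt, with D = I
        return x - eps * (eye + sign * Q) @ grad_H(x)

    def log_q(y, x, sign):
        # Gaussian log density of y given x (covariance 2 * eps * I),
        # up to a constant that is the same for f and g
        diff = y - drift_mean(x, sign)
        return -(diff @ diff) / (4.0 * eps)

    out = np.empty((n_samples, d))
    for i in range(n_samples):
        y = drift_mean(z, zp) + np.sqrt(2.0 * eps) * rng.standard_normal(d)
        # ratio of target times adjoint proposal (reverse move)
        # over target times current proposal (forward move)
        log_alpha = (-H(y) + log_q(z, y, -zp)) - (-H(z) + log_q(y, z, zp))
        if np.log(rng.uniform()) < log_alpha:
            z = y
        else:
            zp = -zp                              # jump to the adjoint process on rejection
        out[i] = z
    return out
```

For a Gaussian target with precision matrix `P`, one would pass `H = lambda z: 0.5 * z @ P @ z`, `grad_H = lambda z: P @ z` and, for example, `Q = np.array([[0.0, 1.0], [-1.0, 0.0]])`.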