EXACT MCMC ON BIG DATA: 
THE TIP OF AN ICEBERG 
University of Helsinki 
Gianvito Siciliano 
(2014 - Probabilistic Models for Big Data Seminar)
AGENDA 
1. MCMC intro: 
• Bayesian Inference 
• Sampling methods (Gibbs, MH) 
2. MCMC and Big Data 
• Issues 
• Approximate solutions (SGLD, SGFS, MH Test) 
3. Firefly Monte Carlo 
4. Conclusions
BAYESIAN MODELING 
• To obtain quantities of interest from the posterior we usually need to evaluate an 
integral of this form: 
$$E[f(\theta) \mid X] = \int f(\theta)\, P(\theta \mid X)\, d\theta$$
• The problem is that these integrals are usually impossible to evaluate analytically. 
• Bayes' rule lets us express the posterior over parameters in terms of the prior 
and likelihood terms: 
$$P(\theta \mid X) \propto \prod_{i=1}^{N} P(x_i \mid \theta)\, P(\theta)$$
MCMC 
• Monte Carlo: simulation to draw 
quantities of interest from the 
distribution 
• Markov Chain: stochastic process in 
which future states are independent of 
past states given the present state. 
• Hence, MCMC is a class of methods with 
which we can simulate draws that are 
slightly dependent and come 
approximately from the posterior 
distribution.
HOW TO SAMPLE? 
In Bayesian statistics, there are generally two algorithms you can use to draw 
pseudo-random samples from a distribution: 
Gibbs Sampler 
Metropolis-Hastings algorithm. 
Used to sample from a joint distribution when 
we know the full conditional distribution 
of each parameter: 
JD = p(θ1, . . . , θk) 
The full conditional distribution is the 
distribution of a parameter conditional on 
the known information and all the other 
parameters: 
FCD = p(θj | θ−j, X) 
Used when… 
• the posterior doesn’t look like any distribution 
we know (no conjugacy) 
• the posterior consists of more than 2 
parameters (grid approximations become 
intractable) 
• some (or all) of the full conditionals do not 
look like any distribution we know (no 
Gibbs sampling for the parameters whose full 
conditionals we don’t know)
Gibbs Sampler 
1. Pick a vector of starting values θ(0). 
2. Start with any θ (order does not matter). Draw a value θ1(1) from the full conditional p(θ1 | θ2(0), θ3(0), y). 
3. Draw a value θ2(1) (again, order does not matter) from the full conditional p(θ2 | θ1(1), θ3(0), y). Note that we must use the updated value θ1(1). 
4. Repeat (for all parameters) until we get M draws, with each draw being a vector θ(t). 
5. Optional burn-in and/or thinning.
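A minimal runnable sketch of this recipe, assuming a toy target whose full conditionals are known in closed form: a bivariate normal with correlation ρ = 0.8. The model, the value of ρ, and all function names are illustrative assumptions, not from the slides.

```python
import numpy as np

def gibbs_bivariate_normal(n_draws, rho=0.8, seed=0):
    """Gibbs sampling for (theta1, theta2) ~ N(0, [[1, rho], [rho, 1]]).

    Each full conditional is univariate normal:
      theta1 | theta2 ~ N(rho * theta2, 1 - rho**2)  (and symmetrically).
    """
    rng = np.random.default_rng(seed)
    theta1, theta2 = 0.0, 0.0           # step 1: starting values theta^(0)
    draws = np.empty((n_draws, 2))
    sd = np.sqrt(1.0 - rho**2)
    for t in range(n_draws):
        # step 2: draw theta1 from p(theta1 | theta2)
        theta1 = rng.normal(rho * theta2, sd)
        # step 3: draw theta2 from p(theta2 | theta1), using the UPDATED theta1
        theta2 = rng.normal(rho * theta1, sd)
        draws[t] = theta1, theta2       # step 4: store the vector theta^(t)
    return draws

samples = gibbs_bivariate_normal(5000)[1000:]   # step 5: drop burn-in
print(samples.mean(axis=0), np.corrcoef(samples.T)[0, 1])
```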
MH Algorithm 
1. Choose a starting value θ(0). 
2. At iteration t, draw a candidate θ∗ from a jumping distribution Jt(θ∗ | θ(t−1)). 
3. Compute the acceptance ratio: 
r = [p(θ∗ | y) / Jt(θ∗ | θ(t−1))] / [p(θ(t−1) | y) / Jt(θ(t−1) | θ∗)] 
4. Accept θ∗ as θ(t) with probability min(r, 1). If θ∗ is not accepted, set θ(t) = θ(t−1). 
5. Repeat steps 2-4 M times to get M draws from p(θ | y), with optional burn-in and/or thinning.
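A minimal sketch of the same recipe, assuming a toy one-dimensional target known only up to a constant and a Gaussian random-walk jumping distribution; both are illustrative choices, not from the slides. With a symmetric walk the Jt terms cancel, which is exactly the simplification shown on the next slide.

```python
import numpy as np

def log_post(theta):
    # Unnormalized log posterior log p(theta | y); toy choice: standard normal.
    return -0.5 * theta**2

def metropolis_hastings(n_draws, step=1.0, seed=0):
    rng = np.random.default_rng(seed)
    theta = 0.0                          # step 1: starting value theta^(0)
    draws = np.empty(n_draws)
    for t in range(n_draws):
        # step 2: candidate from the jumping distribution J_t
        cand = rng.normal(theta, step)
        # step 3: acceptance ratio (J is symmetric here, so its terms cancel)
        log_r = log_post(cand) - log_post(theta)
        # step 4: accept with probability min(r, 1), else keep theta^(t-1)
        if np.log(rng.uniform()) < log_r:
            theta = cand
        draws[t] = theta                 # step 5 (burn-in/thinning) up to caller
    return draws
```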
MH Algorithm (symmetric proposal) 
When the jumping distribution is symmetric, Jt(θ∗ | θ(t−1)) = Jt(θ(t−1) | θ∗), 
the proposal terms cancel and the acceptance ratio reduces to: 
r = p(θ∗ | y) / p(θ(t−1) | y) 
Steps 1, 2, 4 and 5 are unchanged from the previous slide.
MCMC and BIG DATA 
Propose: $\theta' \sim Q(\theta' \mid \theta)$ 
Accept with probability 
$$\alpha = \min\left[1,\; \frac{Q(\theta \mid \theta')\, P(\theta') \prod_{i=1}^{N} P(x_i \mid \theta')}{Q(\theta' \mid \theta)\, P(\theta) \prod_{i=1}^{N} P(x_i \mid \theta)}\right]$$ 
If accepted: $\theta \leftarrow \theta'$ 
• The canonical MCMC algorithm proposes samples from a distribution Q and 
accepts/rejects each proposal with a rule that must examine the likelihood 
of every data item 
• All the data are processed at each iteration, so the 
run time may be excessive: with N = 10^8 data points and 10^5 iterations, 
that is already 10^13 likelihood evaluations!
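To make the bottleneck concrete, here is a hedged sketch of the log acceptance ratio of canonical MH; all function names are placeholders. Every single accept/reject decision touches all N data points:

```python
import numpy as np

def full_data_log_accept(theta, cand, data, log_lik, log_prior, log_q_ratio):
    """Log acceptance ratio of canonical MH: touches ALL N data points.

    log_q_ratio = log Q(theta | cand) - log Q(cand | theta).
    """
    log_r = log_prior(cand) - log_prior(theta) + log_q_ratio
    for x in data:                        # O(N) likelihood evaluations
        log_r += log_lik(cand, x) - log_lik(theta, x)
    return min(0.0, log_r)                # log alpha = log min(1, r)
```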
MCMC APPROXIMATE SOLUTIONS FOR BIG DATA 
IDEA 
• Assume that you have T units of computation to achieve the lowest 
possible error. 
• Your MCMC procedure has a knob to control the bias/variance 
tradeoff 
So, during the sampling phase… 
Turn left => SLOW: small bias, high variance 
Turn right => FAST: strong bias, low variance
SGLD & SGFS: knob = stepsize 
Stochastic Gradient Langevin Dynamics 
Langevin dynamics based on stochastic gradients 
[Welling & Teh, ICML 2011] 
• The idea is to extend the stochastic gradient descent optimization algorithm with Gaussian noise injected via Langevin dynamics (see the sketch after this list). 
• One advantage of SGLD is that the entire data set never has to be held in memory 
• Disadvantages: 
• it has to read a fresh mini-batch from external data at each iteration 
• gradients are computationally expensive 
• it needs a suitable preconditioning matrix to set the step size of the transition operator. 
Stochastic Gradient Fisher Scoring 
[Ahn et al., ICML 2012] 
Builds on SGLD and tries to improve on its predecessor with a three-phase procedure: 
1. Burn-in: large stepsize. 
2. Once the target distribution is reached: keep the large stepsize and sample from the asymptotic Gaussian approximation of the posterior. 
3. Further annealing: smaller stepsize to generate increasingly accurate samples from the true posterior. 
• With this approach the algorithm tries to reduce bias during the burn-in phase and then starts sampling to reduce variance.
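A minimal sketch of one SGLD update following the Welling & Teh (2011) rule; the gradient callbacks, the data layout, and the stepsize handling are illustrative assumptions.

```python
import numpy as np

def sgld_step(theta, data, grad_log_prior, grad_log_lik, eps, batch_size, rng):
    """One Stochastic Gradient Langevin Dynamics update.

    theta_{t+1} = theta_t
                  + (eps / 2) * (grad log prior
                                 + (N / n) * sum of minibatch lik. gradients)
                  + Normal(0, eps) noise.
    `data` is an array of observations; gradients may be scalars or arrays.
    """
    N = len(data)
    batch = data[rng.choice(N, size=batch_size, replace=False)]
    grad = grad_log_prior(theta)
    grad += (N / batch_size) * sum(grad_log_lik(theta, x) for x in batch)
    noise = rng.normal(0.0, np.sqrt(eps), size=np.shape(theta))
    return theta + 0.5 * eps * grad + noise
```

With a decaying stepsize schedule εt = a(b + t)^−γ the injected Gaussian noise eventually dominates the minibatch-gradient noise and the iterates approximate posterior samples; a constant stepsize is exactly the bias/variance knob described above.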
MH TEST: knob = confidence 
CUTTING THE MH ALGORITHM BUDGET 
[Korattikara et al., ICML 2014] 
…by conducting sequential hypothesis tests to decide whether to accept or reject a given sample, making the majority of these 
decisions based on only a small fraction of the data (a sketch follows) 
• Works directly on the accept/reject step of the MH algorithm 
• Accepts a proposal with a given confidence 
• Applicable to problems where computing gradients is impossible
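A rough sketch of the sequential-test idea, assuming the standard reformulation of the MH accept rule as a comparison of the mean log-likelihood ratio μ against a threshold μ0. The stopping rule below is a plain t-test and only loosely follows the paper's exact statistic (finite-population corrections are omitted); all names are placeholders.

```python
import numpy as np
from scipy import stats

def approx_mh_accept(theta, cand, log_lik, data, log_prior_prop_ratio,
                     eps=0.05, batch_size=100, rng=None):
    """Approximate MH test: decide accept/reject from a fraction of the data.

    Exact MH accepts iff  mu > mu0,  where
      mu  = (1/N) * sum_i [log l_i(cand) - log l_i(theta)]
      mu0 = (1/N) * (log u - log [prior and proposal ratio]).
    Estimate mu on a growing minibatch; stop once a t-test is confident
    (at level eps) about which side of mu0 it falls on.
    """
    rng = rng or np.random.default_rng()
    N = len(data)
    mu0 = (np.log(rng.uniform()) - log_prior_prop_ratio) / N
    perm = rng.permutation(N)
    diffs = []
    n = 0
    while n < N:
        idx = perm[n:n + batch_size]
        diffs.extend(log_lik(cand, data[i]) - log_lik(theta, data[i])
                     for i in idx)
        n = len(diffs)
        mean = np.mean(diffs)
        sd = max(np.std(diffs, ddof=1), 1e-12)
        t = (mean - mu0) / (sd / np.sqrt(n))
        # Confident enough (or out of data)? Decide with n <= N points.
        if 2 * (1 - stats.t.cdf(abs(t), df=n - 1)) < eps or n >= N:
            return mean > mu0
```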
FIREFLY EXACT SOLUTION 
ISSUE 1: the prohibitive cost of evaluating every likelihood term at every iteration (for 
big data sets) 
ISSUE 2: the preceding procedures construct an approximate transition operator (using 
subsets of the data) 
GOAL: obtain an exact procedure that leaves the true full-data posterior distribution 
invariant! 
HOW: query the likelihood of only a potentially small subset of the data at each 
iteration, yet simulate from the exact posterior 
IDEA: introduce a collection of Bernoulli variables that turn on (and off) the data points 
for which the likelihoods must be calculated
FLYMC: HOW IT WORKS 
Assuming we have: 
1. a target distribution and 2. a likelihood function 
Computing all N likelihoods at every iteration is the bottleneck! 
3. Assume that each likelihood term Ln can be bounded by a cheaper lower bound Bn 
5. Each zn then has a Bernoulli distribution conditioned on θ and xn 
6. Augment the posterior with these N variables
FLYMC: WHY EXACT? WHY FIREFLY? 
Why exact? 
• Marginalizing the zn out of the augmented joint leaves the marginal distribution of θ 
unchanged: it is still the correct posterior given in equation 1 
Why firefly? 
• From this joint distribution we evaluate only those 
likelihood terms for which zn = 1 (the bright terms), so data points 
light up and go dark across iterations
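The equations these slides refer to were embedded as images and did not survive extraction. The following LaTeX reconstruction of equations 1, 3, 5, and 6, based on the Firefly Monte Carlo paper the talk presents, is a plausible restoration rather than a verbatim copy of the slides.

```latex
% Target posterior (eq. 1) and likelihood terms:
p(\theta \mid x) \propto p(\theta) \prod_{n=1}^{N} L_n(\theta),
\qquad L_n(\theta) = p(x_n \mid \theta)

% Cheap lower bound (eq. 3):
0 \le B_n(\theta) \le L_n(\theta)

% Conditional Bernoulli distribution of each z_n (eq. 5):
p(z_n \mid x_n, \theta) =
  \left(\frac{L_n(\theta) - B_n(\theta)}{L_n(\theta)}\right)^{z_n}
  \left(\frac{B_n(\theta)}{L_n(\theta)}\right)^{1 - z_n}

% Augmented posterior (eq. 6); summing each z_n over {0, 1} gives
% (L_n - B_n) + B_n = L_n and recovers eq. 1 exactly:
p(\theta, z \mid x) \propto p(\theta) \prod_{n=1}^{N}
  \bigl(L_n(\theta) - B_n(\theta)\bigr)^{z_n}\, B_n(\theta)^{1 - z_n}
```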
FLYMC: THE REDUCED SPACE 
• We simulate the Markov 
chain on the augmented (θ, zn) space: 
zn = 0 => dark point (no likelihood computed) 
zn = 1 => bright point (likelihood computed) 
• If the bounds are tight, the Markov chain 
will tend to occupy zn = 0, so most points stay dark
ALGORITHM IMPL.
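The algorithm figure on this slide was also lost in extraction; below is a hedged Python sketch of one FlyMC-style iteration for illustration. It alternates (a) a Metropolis step on θ that touches only the bright points plus the cached product of bounds, and (b) Gibbs resampling of a random subset of the zn from their conditional Bernoullis (eq. 5). All helper names and the subset-resampling fraction are assumptions in the spirit of the method, not the paper's exact pseudocode.

```python
import numpy as np

def flymc_iteration(theta, z, data, log_L, log_B, log_B_total,
                    propose, log_prior, frac_resample=0.1, rng=None):
    """One FlyMC-style sweep: update theta given z, then resample some z_n.

    log_B_total(theta) = sum_n log B_n(theta); available in closed form for
    exponential-family bounds, so the dark points cost O(1) in total.
    """
    rng = rng or np.random.default_rng()
    bright = np.flatnonzero(z)

    def log_joint(th):
        # p(theta, z | x) = p(theta) * prod_n B_n(theta)
        #                   * prod_{bright} (L_n - B_n) / B_n   (eq. 6)
        s = log_prior(th) + log_B_total(th)
        for n in bright:
            # A real implementation would compute log(L_n - B_n) stably.
            s += (np.log(np.exp(log_L(th, data[n]))
                         - np.exp(log_B(th, data[n])))
                  - log_B(th, data[n]))
        return s

    cand = propose(theta, rng)            # symmetric proposal assumed
    if np.log(rng.uniform()) < log_joint(cand) - log_joint(theta):
        theta = cand

    # Gibbs-resample a random subset of the z_n from eq. 5.
    k = max(1, int(frac_resample * len(data)))
    for n in rng.choice(len(data), size=k, replace=False):
        p_bright = 1.0 - np.exp(log_B(theta, data[n]) - log_L(theta, data[n]))
        z[n] = rng.uniform() < p_bright
    return theta, z
```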
FLYMC: LOWER BOUND 
The lower bound Bn(θ) on each data point’s likelihood Ln(θ) should 
satisfy 2 properties: 
• Tightness, which determines the number of bright data points (M is the average number): 
• The product of the bounds must be easy to compute (e.g., using scaled exponential-family lower bounds, as sketched below) 
With this setting we achieve a speedup of roughly N/M over the O(ND) per-iteration evaluation time of regular MCMC
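As a sketch of why the second property matters, assume each bound takes a scaled exponential-family form; this parameterization is an assumption based on the paper's rationale, not a formula from the slides. The product over all N points then collapses onto cached sufficient statistics:

```latex
% Scaled exponential-family bound:
B_n(\theta) = \exp\{\, a(\theta) + b(\theta)^{\top} s(x_n) \,\}

% The full product then needs only the precomputed statistic
% S = \sum_{n=1}^{N} s(x_n):
\prod_{n=1}^{N} B_n(\theta) = \exp\{\, N\, a(\theta) + b(\theta)^{\top} S \,\}

% i.e. O(1) in N per iteration once S is cached.
```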
MAP-OPTIMISATION 
…in order to find an approximate maximum a posteriori (MAP) value of θ and to construct Bn to 
be tight there. 
The algorithm variants used in the experiments are: 
• Untuned FlyMC, with the choice ε = 1.5 for all data points. 
• MAP-tuned FlyMC, which runs a gradient-descent optimization to find an ε value for 
each data point (this yields bounds that are tighter near the MAP value of 
θ). 
• Regular full-posterior MCMC (for comparison)
EXPERIMENTS 
Expectation: 
• slower mixing (per iteration) 
• faster iterating 
Results: 
• FlyMC offers a speedup of at 
least one order of magnitude 
compared with regular MCMC
CONCLUSIONS 
FlyMC is an exact procedure that has the true full-data posterior as its target 
The introduction of the binary latent variables is a simple and efficient idea 
The lower bound is a requirement, and it can be difficult to obtain for many 
problems
Acknowledgements 
Dr. Antti Honkela 
Dr. Arto Klami 
Reviewers
Thank you! 
(gianvito.siciliano@gmail.com)
