d-VMP:
Distributed Variational Message Passing
Andrés R. Masegosa1, Ana M. Martínez2,
Helge Langseth1, Thomas D. Nielsen2, Antonio Salmerón3,
Darío Ramos-López3, Anders L. Madsen2,4
1Department of Computer Science, Aalborg University, Denmark
2 Department of Computer and Information Science,
The Norwegian University of Science and Technology, Norway
3Department of Mathematics, University of Almería, Spain
4 Hugin Expert A/S, Aalborg, Denmark
Int. Conf. on Probabilistic Graphical Models, Lugano, Sept. 6–9, 2016 1
Outline
1 Motivation
2 Variational Message Passing
3 d-VMP
4 Experimental results
5 Conclusions
Motivation
Large
Imbalance
? values
Complex distributions
[Figure: density (y-axis) against attribute range (x-axis); legend: SVI 1%, SVI 5%, SVI 10%, VMP 1%.]
Motivation
Goal: learn a generative model for a financial dataset to monitor
customers and make predictions for a single customer.
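[Figure: plate model with observed variables X_i and local hidden variables H_i for i = 1, …, N, global parameters θ, and hyperparameters α.]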
Popular existing approach: SVI
Stochastic Variational Inference: iteratively updates the model
parameters based on subsampled data batches.
Does not estimate all the local hidden variables of the model.
Does not compute the lower bound.
Poor fit if a data batch is not representative of the full data set.
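As a rough illustration of the batch-based update described above, the following sketch runs an SVI-style step on a toy model with a single global parameter (the mean of a unit-variance Gaussian) and no local hidden variables. The function name `svi_gaussian_mean`, the step-size schedule `(t + tau0) ** (-kappa)`, and all default values are assumptions made for this example and do not come from the slides.

```python
import random


def svi_gaussian_mean(data, batch_size, n_iters, kappa=0.75, tau0=1.0,
                      prior_mean=0.0, prior_prec=1.0, seed=0):
    """SVI-style estimate of q(mu) for x_i ~ N(mu, 1), mu ~ N(prior_mean, 1/prior_prec).

    q(mu) is stored by its natural parameters w.r.t. the sufficient
    statistics (mu, mu^2): (precision * mean, -precision / 2).
    """
    rng = random.Random(seed)
    n = len(data)
    prior = (prior_prec * prior_mean, -0.5 * prior_prec)
    lam = prior
    for t in range(1, n_iters + 1):
        batch = rng.sample(data, batch_size)
        scale = n / batch_size  # pretend the batch is the whole data set
        lam_hat = (prior[0] + scale * sum(batch),
                   prior[1] + scale * (-0.5) * len(batch))
        rho = (t + tau0) ** (-kappa)  # assumed decreasing step size
        lam = ((1.0 - rho) * lam[0] + rho * lam_hat[0],
               (1.0 - rho) * lam[1] + rho * lam_hat[1])
    prec = -2.0 * lam[1]
    return lam[0] / prec, prec
```

Each step only sees one batch, so the estimate of the mean depends on how representative the sampled batches are.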
Our contribution:
d-VMP: a distributed message passing scheme.
Defined for a broader class of models (than SVI).
Better and faster convergence results compared to SVI.
Posterior over all local latent variables and lower bound.
Models:
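Bayesian learning on i.i.d. data using conjugate exponential family Bayesian network (BN) models:
\[ \ln p(X) = \ln h_X + s_X \cdot \eta - A_X(\eta) \]
[Figure: plate model with observed variables X_i and local hidden variables H_i for i = 1, …, N, global parameters θ, and hyperparameters α.]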
We want to calculate p(θ, H|D).
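To make the exponential family form above concrete, here is a small sketch that writes a univariate Gaussian as ln h + s(x) · η − A(η) and evaluates its log density that way. The function names are hypothetical; the Gaussian is chosen only as a familiar member of the family.

```python
import math


def gaussian_natural_params(mean, var):
    """Natural parameters eta for sufficient statistics s(x) = (x, x^2)."""
    return (mean / var, -0.5 / var)


def gaussian_log_partition(eta):
    """Log-normalizer A(eta) of the Gaussian in natural form."""
    eta1, eta2 = eta
    return -eta1 ** 2 / (4.0 * eta2) - 0.5 * math.log(-2.0 * eta2)


def gaussian_log_density_expfam(x, mean, var):
    """ln p(x) = ln h + s(x) . eta - A(eta), with ln h = -0.5 * ln(2 * pi)."""
    eta = gaussian_natural_params(mean, var)
    log_h = -0.5 * math.log(2.0 * math.pi)
    s = (x, x * x)
    dot = s[0] * eta[0] + s[1] * eta[1]
    return log_h + dot - gaussian_log_partition(eta)
```

The result should agree with the usual formula −0.5 ln(2πσ²) − (x − μ)² / (2σ²) up to floating-point error.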
Variational Inference:
Approximate the (often intractable) posterior p(θ, H|D) by a tractable
distribution q ∈ Q that minimizes the KL divergence:
\[ \min_{q(\theta, H) \in Q} \mathrm{KL}\big(q(\theta, H) \,\|\, p(\theta, H \mid D)\big) \]
In the mean field variational approach, Q is assumed to fully
factorize:
\[ q(\theta, H) = \prod_{k=1}^{M} q(\theta_k) \prod_{i=1}^{N} \prod_{j=1}^{J} q(H_{i,j}) \]
Variational Inference exploits:
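\[ \underbrace{\ln P(D)}_{\text{constant}} = \underbrace{\mathcal{L}(q(\theta, H))}_{\text{maximize}} + \underbrace{\mathrm{KL}\big(q(\theta, H) \,\|\, p(\theta, H \mid D)\big)}_{\text{minimize}} \]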
Iterative coordinate ascent of the variational distributions.
Updating the variational distribution of a variable only
involves the variables in its Markov blanket.
Coordinate ascent algorithm formulated as a message passing
scheme.
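The coordinate ascent idea above can be sketched on a standard textbook model that is not part of the slides: data x_i ~ N(μ, 1/τ) with a Normal-Gamma prior and a mean-field q(μ) q(τ). The function name `cavi_gaussian` and the prior defaults are assumptions for illustration.

```python
import random


def cavi_gaussian(data, mu0=0.0, lam0=1.0, a0=1.0, b0=1.0, n_iters=50):
    """Coordinate ascent for q(mu) = N(mu_n, 1/lam_n), q(tau) = Gamma(a_n, b_n).

    Model: x_i ~ N(mu, 1/tau), mu | tau ~ N(mu0, 1/(lam0 * tau)),
    tau ~ Gamma(a0, b0) with rate b0.
    """
    n = len(data)
    xbar = sum(data) / n
    mu_n = (lam0 * mu0 + n * xbar) / (lam0 + n)  # does not depend on q(tau)
    a_n = a0 + 0.5 * (n + 1)                      # does not depend on q(mu)
    e_tau = a0 / b0                               # initial E[tau]
    for _ in range(n_iters):
        # Update q(mu): only needs E[tau], i.e. its Markov blanket.
        lam_n = (lam0 + n) * e_tau
        # Update q(tau): only needs first and second moments of q(mu).
        e_sq = sum((x - mu_n) ** 2 for x in data) + n / lam_n
        e_prior = lam0 * ((mu_n - mu0) ** 2 + 1.0 / lam_n)
        b_n = b0 + 0.5 * (e_sq + e_prior)
        e_tau = a_n / b_n
    return mu_n, lam_n, a_n, b_n
```

Each factor is refreshed using only expectations taken under the other factor, which is the pattern that the message passing formulation generalizes.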
Variational Message Passing, VMP:
Message from parent to child: moment parameters
(expectation of the sufficient statistics).
Message from child to parent: natural parameters
(based on the messages received from the co-parents).
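A minimal sketch of the two message types, assuming a toy model that is not taken from the slides: a parent μ ~ N(m0, 1/p0) with observed children X_i ~ N(μ, 1/τ), where the precision τ is known (so the co-parent message reduces to a constant). All function names are hypothetical.

```python
def child_to_parent_message(x, tau):
    """Natural-parameter message from an observed child X ~ N(mu, 1/tau)
    to its parent mu, w.r.t. the sufficient statistics (mu, mu^2)."""
    return (tau * x, -0.5 * tau)


def parent_to_child_message(eta):
    """Moment message from mu: (E[mu], E[mu^2]) under q(mu) with natural
    parameters eta = (precision * mean, -precision / 2)."""
    prec = -2.0 * eta[1]
    mean = eta[0] / prec
    return (mean, mean * mean + 1.0 / prec)


def vmp_update_mean(data, tau, m0, p0):
    """New natural parameters of q(mu): prior term plus all child messages."""
    prior = (p0 * m0, -0.5 * p0)
    msgs = [child_to_parent_message(x, tau) for x in data]
    return (prior[0] + sum(m[0] for m in msgs),
            prior[1] + sum(m[1] for m in msgs))
```

For example, `vmp_update_mean([1.0, 2.0, 3.0], tau=2.0, m0=0.0, p0=1.0)` gives natural parameters corresponding to precision 7 and mean 12/7.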
Distributed optimization of the lower bound:
[Figure: a master node holds the global parameters θ (with hyperparameters α) and q^(t)(θ); slave nodes 1, 2, and 3 each hold a copy of the plate model (X, H, i = 1, …, N) and a local distribution q^(t)(H_1), q^(t)(H_2), q^(t)(H_3).]
Objective:
\[ \max_{q \in Q} \mathcal{L}(q(\theta, H)) \]
q^(t)(θ) is broadcast to all the slave nodes.
Local update:
\[ q^{(t+1)}(H) = \arg\max_{q(H)} \mathcal{L}(q(H), q^{(t)}(\theta)) \]
which each slave node n solves on its own data:
\[ q^{(t+1)}(H_n) = \arg\max_{q(H_n)} \mathcal{L}_n(q(H_n), q^{(t)}(\theta)) \]
Global update (the global parameters may be coupled):
\[ q^{(t+1)}(\theta) = \arg\max_{q(\theta)} \mathcal{L}(q^{(t)}(H), q(\theta)) \]
Candidate solutions:
Resort to a generalized mean-field approximation, as SVI does, which
does not factorize over the global parameters.
Prohibitive for models with a large number of global (coupled)
parameters, e.g. linear regression.
Our proposal: VMP as a distributed projected natural
gradient ascent (PNGA) algorithm.
d-VMP as a projected natural gradient ascent
Insight 1: VMP can be expressed as a projected natural
gradient ascent algorithm.
\[ \eta_X^{(t+1)} = \eta_X^{(t)} + \rho_{X,t} \big[ \hat{\nabla}_{\eta} \mathcal{L}(\eta^{(t)}) \big]^{+}_{X} \qquad (1) \]
[·]^+ is the projection operator.
Insight 2: The natural gradient of the lower bound can be
expressed as follows:
\[ \hat{\nabla}_{\eta_\theta} \mathcal{L} = m_{Pa(\theta) \to \theta} + \sum_{i=1}^{N} m_{H_i \to \theta} \]
The gradient can be computed in parallel.
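The parallel structure can be sketched as follows: each worker sums the messages from its own data shard, and the master only adds the partial sums. The toy model (observed X_i ~ N(θ, 1/τ) with known τ), the function names, and the use of a thread pool to stand in for slave nodes are all assumptions for illustration. The sketch also subtracts the current natural parameter from the summed messages, which is the usual form of the natural gradient in conjugate models and is assumed here rather than read off the slide.

```python
from concurrent.futures import ThreadPoolExecutor


def shard_message_sum(shard, tau=1.0):
    """Sum of child-to-parent messages for one data shard, w.r.t. the
    sufficient statistics (theta, theta^2)."""
    return (sum(tau * x for x in shard), -0.5 * tau * len(shard))


def natural_gradient(eta, prior_msg, shards, tau=1.0):
    """Natural gradient for theta: prior message plus summed data messages,
    minus the current natural parameters (assumed conjugate form)."""
    with ThreadPoolExecutor(max_workers=len(shards)) as pool:
        partial = list(pool.map(lambda s: shard_message_sum(s, tau), shards))
    total = (prior_msg[0] + sum(p[0] for p in partial),
             prior_msg[1] + sum(p[1] for p in partial))
    return (total[0] - eta[0], total[1] - eta[1])
```

Because the messages enter only through a sum, splitting the data into more or fewer shards leaves the gradient unchanged; only the partial sums need to travel to the master.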
Insight 3: Global parameters are “coupled” only if they belong
to each other’s Markov blanket.
Define a disjoint partition of the global parameters:
\[ R = \{J_1, \dots, J_S\} \]
d-VMP performs independent global updates over the global
parameters in each block J_r of the partition:
\[ \eta_{J_r}^{(t+1)} = \eta_{J_r}^{(t)} + \rho_{r,t} \big[ \hat{\nabla}_{\eta} \mathcal{L}(\eta^{(t)}) \big]^{+}_{J_r} \]
ρ_{r,t} is the learning rate. If |J_r| = 1, then ρ_{r,t} = 1.
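The role of the learning rate can be illustrated with a small sketch. It assumes a single block whose summed messages are held fixed, and a natural gradient of the form "summed messages minus current natural parameters"; both are simplifications made for this example. The projection is omitted because, in this toy setting with 0 < ρ ≤ 1, each iterate is a convex combination of valid natural parameters and so stays valid. All names are hypothetical.

```python
def natural_gradient(eta, message_sum):
    """Assumed conjugate form: summed messages minus current parameters."""
    return tuple(m - e for m, e in zip(message_sum, eta))


def block_update(eta, grad, rho):
    """One gradient step eta + rho * grad for a block of parameters."""
    return tuple(e + rho * g for e, g in zip(eta, grad))


def run(eta0, message_sum, rho, n_steps):
    """Repeat the block update n_steps times from eta0."""
    eta = eta0
    for _ in range(n_steps):
        eta = block_update(eta, natural_gradient(eta, message_sum), rho)
    return eta
```

With ρ = 1 a single step lands on the summed messages, which matches a plain VMP update; with ρ = 0.5 the iterate approaches the same point geometrically. In a genuinely coupled block the messages would change between steps, which this sketch does not model.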
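d-VMP as a distributed PNGA algorithm:
[Figure: the master node holds the global parameters θ (with hyperparameters α) and computes η^(t+1)_{J_r}; slave nodes 1, 2, and 3 each hold a copy of the plate model (X, H, i = 1, …, N) and local natural parameters η^(t+1)_1, η^(t+1)_2, η^(t+1)_3.]
For all J_r ∈ R:
\[ \eta_{J_r}^{(t+1)} = \eta_{J_r}^{(t)} + \rho_{r,t} \big[ \hat{\nabla}_{\eta} \mathcal{L}(\eta^{(t)}) \big]^{+}_{J_r} \]
Objective:
\[ \max_{q \in Q} \mathcal{L}(q(\theta, H)) \]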
Model fit to the data
[Figure: plate model with nodes X_ij, H_ij, Y_i, H_i, θ, and α, with plates over j = 1, …, J and i = 1, …, N.]
[Figure: density (y-axis) against attribute range (x-axis); legend: SVI 1%, SVI 5%, SVI 10%, VMP 1%.]
Representative sample of 55K clients (N) and 33 attributes (J).
“Unrolled” model of more than 3.5M nodes (75% latent variables).
[Figure: global lower bound (y-axis, about −1.5e+07 to 0) against time in seconds (x-axis, 0 to 2000) for SVI with batch sizes (BS) of 1%, 5%, and 10% of the data and learning rates (LR) of 0.55, 0.75, and 0.99, compared with d-VMP.]
Test marginal log-likelihood
Algorithm   BS (% data)   LR     Log-likelihood
SVI         1 %           0.55   -180902.87
SVI         1 %           0.75   -298564.03
SVI         1 %           0.99   -426979.52
SVI         5 %           0.55   -177302.24
SVI         5 %           0.75   -333264.16
SVI         5 %           0.99   -628105.70
SVI         10 %          0.55   -347035.22
SVI         10 %          0.75   -397525.45
SVI         10 %          0.99   -538087.13
d-VMP                     1.0    67265.34
Mixtures of learnt posteriors for one attribute
[Figure: mixtures of learnt posteriors for one attribute; legend: SVI 1%, SVI 5%, SVI 10%, VMP 1%.]
Scalability settings
Generated data set of 42 million samples per client and 12
variables.
“Unrolled” model of more than 1 billion (10^9) nodes
(75% latent variables).
AMIDST Toolbox with Apache Flink.
Amazon Web Services (AWS) as distributed computing
environment.
Scalability results
Conclusions
Variational methods can be scaled using distributed
computation instead of sampling techniques.
Bayesian learning in a model with more than 1 billion nodes
(75% of them hidden).
Thank you for your attention
Questions?
You can download our open source Java toolbox:
amidsttoolbox.com
Acknowledgments: This project has received funding from the European Union’s
Seventh Framework Programme for research, technological development and
demonstration under grant agreement no 619209
Salas, V. (2024) "John of St. Thomas (Poinsot) on the Science of Sacred Theol...
Studia Poinsotiana
 
Nutraceutical market, scope and growth: Herbal drug technology
Nutraceutical market, scope and growth: Herbal drug technologyNutraceutical market, scope and growth: Herbal drug technology
Nutraceutical market, scope and growth: Herbal drug technology
Lokesh Patil
 
Seminar of U.V. Spectroscopy by SAMIR PANDA
 Seminar of U.V. Spectroscopy by SAMIR PANDA Seminar of U.V. Spectroscopy by SAMIR PANDA
Seminar of U.V. Spectroscopy by SAMIR PANDA
SAMIR PANDA
 
Introduction to Mean Field Theory(MFT).pptx
Introduction to Mean Field Theory(MFT).pptxIntroduction to Mean Field Theory(MFT).pptx
Introduction to Mean Field Theory(MFT).pptx
zeex60
 
Earliest Galaxies in the JADES Origins Field: Luminosity Function and Cosmic ...
Earliest Galaxies in the JADES Origins Field: Luminosity Function and Cosmic ...Earliest Galaxies in the JADES Origins Field: Luminosity Function and Cosmic ...
Earliest Galaxies in the JADES Origins Field: Luminosity Function and Cosmic ...
Sérgio Sacani
 
如何办理(uvic毕业证书)维多利亚大学毕业证本科学位证书原版一模一样
如何办理(uvic毕业证书)维多利亚大学毕业证本科学位证书原版一模一样如何办理(uvic毕业证书)维多利亚大学毕业证本科学位证书原版一模一样
如何办理(uvic毕业证书)维多利亚大学毕业证本科学位证书原版一模一样
yqqaatn0
 
DERIVATION OF MODIFIED BERNOULLI EQUATION WITH VISCOUS EFFECTS AND TERMINAL V...
DERIVATION OF MODIFIED BERNOULLI EQUATION WITH VISCOUS EFFECTS AND TERMINAL V...DERIVATION OF MODIFIED BERNOULLI EQUATION WITH VISCOUS EFFECTS AND TERMINAL V...
DERIVATION OF MODIFIED BERNOULLI EQUATION WITH VISCOUS EFFECTS AND TERMINAL V...
Wasswaderrick3
 
原版制作(carleton毕业证书)卡尔顿大学毕业证硕士文凭原版一模一样
原版制作(carleton毕业证书)卡尔顿大学毕业证硕士文凭原版一模一样原版制作(carleton毕业证书)卡尔顿大学毕业证硕士文凭原版一模一样
原版制作(carleton毕业证书)卡尔顿大学毕业证硕士文凭原版一模一样
yqqaatn0
 
Observation of Io’s Resurfacing via Plume Deposition Using Ground-based Adapt...
Observation of Io’s Resurfacing via Plume Deposition Using Ground-based Adapt...Observation of Io’s Resurfacing via Plume Deposition Using Ground-based Adapt...
Observation of Io’s Resurfacing via Plume Deposition Using Ground-based Adapt...
Sérgio Sacani
 
Chapter 12 - climate change and the energy crisis
Chapter 12 - climate change and the energy crisisChapter 12 - climate change and the energy crisis
Chapter 12 - climate change and the energy crisis
tonzsalvador2222
 
nodule formation by alisha dewangan.pptx
nodule formation by alisha dewangan.pptxnodule formation by alisha dewangan.pptx
nodule formation by alisha dewangan.pptx
alishadewangan1
 
3D Hybrid PIC simulation of the plasma expansion (ISSS-14)
3D Hybrid PIC simulation of the plasma expansion (ISSS-14)3D Hybrid PIC simulation of the plasma expansion (ISSS-14)
3D Hybrid PIC simulation of the plasma expansion (ISSS-14)
David Osipyan
 

Recently uploaded (20)

What is greenhouse gasses and how many gasses are there to affect the Earth.
What is greenhouse gasses and how many gasses are there to affect the Earth.What is greenhouse gasses and how many gasses are there to affect the Earth.
What is greenhouse gasses and how many gasses are there to affect the Earth.
 
PRESENTATION ABOUT PRINCIPLE OF COSMATIC EVALUATION
PRESENTATION ABOUT PRINCIPLE OF COSMATIC EVALUATIONPRESENTATION ABOUT PRINCIPLE OF COSMATIC EVALUATION
PRESENTATION ABOUT PRINCIPLE OF COSMATIC EVALUATION
 
bordetella pertussis.................................ppt
bordetella pertussis.................................pptbordetella pertussis.................................ppt
bordetella pertussis.................................ppt
 
Nucleic Acid-its structural and functional complexity.
Nucleic Acid-its structural and functional complexity.Nucleic Acid-its structural and functional complexity.
Nucleic Acid-its structural and functional complexity.
 
Unveiling the Energy Potential of Marshmallow Deposits.pdf
Unveiling the Energy Potential of Marshmallow Deposits.pdfUnveiling the Energy Potential of Marshmallow Deposits.pdf
Unveiling the Energy Potential of Marshmallow Deposits.pdf
 
Toxic effects of heavy metals : Lead and Arsenic
Toxic effects of heavy metals : Lead and ArsenicToxic effects of heavy metals : Lead and Arsenic
Toxic effects of heavy metals : Lead and Arsenic
 
Richard's aventures in two entangled wonderlands
Richard's aventures in two entangled wonderlandsRichard's aventures in two entangled wonderlands
Richard's aventures in two entangled wonderlands
 
Lateral Ventricles.pdf very easy good diagrams comprehensive
Lateral Ventricles.pdf very easy good diagrams comprehensiveLateral Ventricles.pdf very easy good diagrams comprehensive
Lateral Ventricles.pdf very easy good diagrams comprehensive
 
Salas, V. (2024) "John of St. Thomas (Poinsot) on the Science of Sacred Theol...
Salas, V. (2024) "John of St. Thomas (Poinsot) on the Science of Sacred Theol...Salas, V. (2024) "John of St. Thomas (Poinsot) on the Science of Sacred Theol...
Salas, V. (2024) "John of St. Thomas (Poinsot) on the Science of Sacred Theol...
 
Nutraceutical market, scope and growth: Herbal drug technology
Nutraceutical market, scope and growth: Herbal drug technologyNutraceutical market, scope and growth: Herbal drug technology
Nutraceutical market, scope and growth: Herbal drug technology
 
Seminar of U.V. Spectroscopy by SAMIR PANDA
 Seminar of U.V. Spectroscopy by SAMIR PANDA Seminar of U.V. Spectroscopy by SAMIR PANDA
Seminar of U.V. Spectroscopy by SAMIR PANDA
 
Introduction to Mean Field Theory(MFT).pptx
Introduction to Mean Field Theory(MFT).pptxIntroduction to Mean Field Theory(MFT).pptx
Introduction to Mean Field Theory(MFT).pptx
 
Earliest Galaxies in the JADES Origins Field: Luminosity Function and Cosmic ...
Earliest Galaxies in the JADES Origins Field: Luminosity Function and Cosmic ...Earliest Galaxies in the JADES Origins Field: Luminosity Function and Cosmic ...
Earliest Galaxies in the JADES Origins Field: Luminosity Function and Cosmic ...
 
如何办理(uvic毕业证书)维多利亚大学毕业证本科学位证书原版一模一样
如何办理(uvic毕业证书)维多利亚大学毕业证本科学位证书原版一模一样如何办理(uvic毕业证书)维多利亚大学毕业证本科学位证书原版一模一样
如何办理(uvic毕业证书)维多利亚大学毕业证本科学位证书原版一模一样
 
DERIVATION OF MODIFIED BERNOULLI EQUATION WITH VISCOUS EFFECTS AND TERMINAL V...
DERIVATION OF MODIFIED BERNOULLI EQUATION WITH VISCOUS EFFECTS AND TERMINAL V...DERIVATION OF MODIFIED BERNOULLI EQUATION WITH VISCOUS EFFECTS AND TERMINAL V...
DERIVATION OF MODIFIED BERNOULLI EQUATION WITH VISCOUS EFFECTS AND TERMINAL V...
 
原版制作(carleton毕业证书)卡尔顿大学毕业证硕士文凭原版一模一样
原版制作(carleton毕业证书)卡尔顿大学毕业证硕士文凭原版一模一样原版制作(carleton毕业证书)卡尔顿大学毕业证硕士文凭原版一模一样
原版制作(carleton毕业证书)卡尔顿大学毕业证硕士文凭原版一模一样
 
Observation of Io’s Resurfacing via Plume Deposition Using Ground-based Adapt...
Observation of Io’s Resurfacing via Plume Deposition Using Ground-based Adapt...Observation of Io’s Resurfacing via Plume Deposition Using Ground-based Adapt...
Observation of Io’s Resurfacing via Plume Deposition Using Ground-based Adapt...
 
Chapter 12 - climate change and the energy crisis
Chapter 12 - climate change and the energy crisisChapter 12 - climate change and the energy crisis
Chapter 12 - climate change and the energy crisis
 
nodule formation by alisha dewangan.pptx
nodule formation by alisha dewangan.pptxnodule formation by alisha dewangan.pptx
nodule formation by alisha dewangan.pptx
 
3D Hybrid PIC simulation of the plasma expansion (ISSS-14)
3D Hybrid PIC simulation of the plasma expansion (ISSS-14)3D Hybrid PIC simulation of the plasma expansion (ISSS-14)
3D Hybrid PIC simulation of the plasma expansion (ISSS-14)
 

d-VMP: Distributed Variational Message Passing (PGM2016)

  • 1. d-VMP: Distributed Variational Message Passing. Andrés R. Masegosa (1), Ana M. Martínez (2), Helge Langseth (1), Thomas D. Nielsen (2), Antonio Salmerón (3), Darío Ramos-López (3), Anders L. Madsen (2,4). (1) Department of Computer Science, Aalborg University, Denmark; (2) Department of Computer and Information Science, The Norwegian University of Science and Technology, Norway; (3) Department of Mathematics, University of Almería, Spain; (4) Hugin Expert A/S, Aalborg, Denmark. Int. Conf. on Probabilistic Graphical Models, Lugano, Sept. 6–9, 2016.
  • 2. Outline: 1 Motivation; 2 Variational Message Passing; 3 d-VMP; 4 Experimental results; 5 Conclusions.
  • 4. Motivation (figure labels): Large; Imbalance; ? values; Complex distributions.
  • 5. Motivation (figure labels): Large; Imbalance; ? values; Complex distributions. [Plot: density against attribute range; legend: SVI 1%, SVI 5%, SVI 10%, VMP 1%.]
  • 6. Motivation. Goal: learn a generative model for a financial dataset to monitor the customers and make predictions for a single customer. [Plate model: nodes $X_i$, $H_i$, $\theta$, $\alpha$; plate over $i = 1, \dots, N$.]
  • 7. Popular existing approach: SVI. Stochastic Variational Inference iteratively updates the model parameters based on subsampled data batches. No estimation of all local hidden variables of the model. No generation of lower bound. Poor fit if the batch of data is not representative of all the data.
  • 8. Our contribution: d-VMP, a distributed message passing scheme. Defined for a broader class of models (than SVI). Better and faster convergence results compared to SVI. Posterior over all local latent variables and lower bound.
  • 9. Outline: 1 Motivation; 2 Variational Message Passing; 3 d-VMP; 4 Experimental results; 5 Conclusions.
  • 10. Models: Bayesian learning on i.i.d. data using conjugate exponential BN models: $\ln p(X) = \ln h_X + s_X \cdot \eta - A_X(\eta)$. [Plate model: nodes $X_i$, $H_i$, $\theta$, $\alpha$; plate over $i = 1, \dots, N$.] We want to calculate $p(\theta, H \mid D)$.
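As a concrete instance of the exponential-family form on slide 10, the sketch below writes a univariate Gaussian as $\ln h + s(x) \cdot \eta - A(\eta)$ and compares it with the usual log-density. The choice of a Gaussian and all function names are illustrative assumptions, not something the slides specify.

```python
import math

def gaussian_natural(mu, sigma2):
    """Natural parameters eta of N(mu, sigma2) for sufficient statistics s(x) = (x, x^2)."""
    return (mu / sigma2, -0.5 / sigma2)

def log_partition(eta):
    """Log-normalizer A(eta) of the univariate Gaussian."""
    return -eta[0] ** 2 / (4.0 * eta[1]) - 0.5 * math.log(-2.0 * eta[1])

def log_density_expfam(x, eta):
    """ln p(x) = ln h + s(x) . eta - A(eta), with h = 1 / sqrt(2 * pi)."""
    log_h = -0.5 * math.log(2.0 * math.pi)
    s = (x, x * x)
    return log_h + s[0] * eta[0] + s[1] * eta[1] - log_partition(eta)

def log_density_direct(x, mu, sigma2):
    """Textbook Gaussian log-density, used only as a cross-check."""
    return -0.5 * math.log(2.0 * math.pi * sigma2) - (x - mu) ** 2 / (2.0 * sigma2)

eta = gaussian_natural(mu=1.5, sigma2=0.8)
difference = log_density_expfam(0.3, eta) - log_density_direct(0.3, 1.5, 0.8)
print(difference)  # should be zero up to floating-point rounding
```

Working with natural parameters is what makes the later message-passing updates simple sums.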
  • 11. Variational Inference: approximate $p(\theta, H \mid D)$ (often intractable) by finding tractable posterior distributions $q \in Q$ by minimizing: $\min_{q(\theta, H) \in Q} \mathrm{KL}(q(\theta, H) \,\|\, p(\theta, H \mid D))$.
  • 12. Variational Inference: approximate $p(\theta, H \mid D)$ (often intractable) by finding tractable posterior distributions $q \in Q$ by minimizing: $\min_{q(\theta, H) \in Q} \mathrm{KL}(q(\theta, H) \,\|\, p(\theta, H \mid D))$. In the mean field variational approach, $Q$ is assumed to fully factorize: $q(\theta, H) = \prod_{k=1}^{M} q(\theta_k) \prod_{i=1}^{N} \prod_{j=1}^{J} q(H_{i,j})$.
  • 13. Variational Inference exploits the decomposition $\underbrace{\ln P(D)}_{\text{constant}} = \underbrace{L(q(\theta, H))}_{\text{maximize}} + \underbrace{\mathrm{KL}(q(\theta, H) \,\|\, p(\theta, H \mid D))}_{\text{minimize}}$. Iterative coordinate ascent of the variational distributions. Updates in the variational distribution of a variable only involve variables in its Markov blanket. Coordinate ascent algorithm formulated as a message passing scheme.
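The decomposition on slide 13 can be checked numerically on a tiny example. The sketch below assumes a single binary latent variable and a made-up joint table (the numbers are hypothetical, chosen only for illustration); it shows that the lower bound plus the KL term equals the log evidence for any $q$.

```python
import math

# Hypothetical joint p(H = h, D) for one fixed observation D and a binary latent H.
joint = {0: 0.12, 1: 0.28}
evidence = sum(joint.values())                      # p(D)
posterior = {h: p / evidence for h, p in joint.items()}

def elbo(q):
    """Lower bound L(q) = E_q[ln p(H, D)] - E_q[ln q(H)]."""
    return sum(q[h] * (math.log(joint[h]) - math.log(q[h])) for h in q if q[h] > 0)

def kl(q, p):
    """KL(q || p) for discrete distributions over the same support."""
    return sum(q[h] * (math.log(q[h]) - math.log(p[h])) for h in q if q[h] > 0)

q = {0: 0.5, 1: 0.5}                                # an arbitrary variational distribution
gap = math.log(evidence) - (elbo(q) + kl(q, posterior))
print(gap)  # should be zero up to floating-point rounding
```

Because the left-hand side does not depend on $q$, raising the lower bound necessarily lowers the KL term, and the bound is tight exactly when $q$ equals the posterior.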
  • 14. Variational Message Passing, VMP: Message from parent to child: moment parameters (expectation of the sufficient statistics). Message from child to parent: natural parameters (based on the messages received from the co-parents).
  • 15. Outline: 1 Motivation; 2 Variational Message Passing; 3 d-VMP; 4 Experimental results; 5 Conclusions.
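The two message types on slide 14 can be illustrated with a minimal sketch. It assumes a toy model that is not taken from the slides: a Gaussian mean $\theta$ with a Gaussian prior and observations of known precision, so there are no co-parent messages to combine. All names and values are hypothetical.

```python
def prior_natural(m0, tau0):
    """Natural parameters of the prior N(m0, 1/tau0) over theta, for s(theta) = (theta, theta^2)."""
    return (tau0 * m0, -0.5 * tau0)

def child_to_parent(x, tau):
    """Message from an observed child x ~ N(theta, 1/tau) to theta: natural parameters."""
    return (tau * x, -0.5 * tau)

def parent_to_child(eta):
    """Message from theta to its children: moment parameters (E[theta], E[theta^2])."""
    precision = -2.0 * eta[1]
    mean = eta[0] / precision
    return (mean, mean * mean + 1.0 / precision)

def vmp_posterior(data, m0, tau0, tau):
    """VMP update for theta: prior natural parameters plus all child-to-parent messages."""
    eta1, eta2 = prior_natural(m0, tau0)
    for x in data:
        msg1, msg2 = child_to_parent(x, tau)
        eta1 += msg1
        eta2 += msg2
    return (eta1, eta2)

eta = vmp_posterior([1.0, 2.0, 3.0], m0=0.0, tau0=1.0, tau=1.0)
print(parent_to_child(eta))
```

In this conjugate setting a single update already gives the exact posterior; models with hidden variables need repeated sweeps of such messages.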
  • 16. Distributed optimization of the lower bound, $\max_{q \in Q} L(q(\theta, H))$. [Diagram: a master node with $\theta$, $\alpha$ holding $q^{(t)}(\theta)$; slaves 1 to 3, each showing nodes $X$, $H$, $\theta_n$ in a plate over $i = 1, \dots, N$ and holding $q^{(t)}(H_n)$.] $q^{(t)}(\theta)$ is broadcast to all the slave nodes.
  • 17. Distributed optimization of the lower bound, $\max_{q \in Q} L(q(\theta, H))$. [Same master/slave diagram.] $q^{(t+1)}(H) = \arg\max_{q(H)} L(q(H), q^{(t)}(\theta))$.
  • 18. Distributed optimization of the lower bound, $\max_{q \in Q} L(q(\theta, H))$. [Same master/slave diagram.] $q^{(t+1)}(H_n) = \arg\max_{q(H_n)} L_n(q(H_n), q^{(t)}(\theta))$.
  • 19. Distributed optimization of the lower bound, $\max_{q \in Q} L(q(\theta, H))$. [Same master/slave diagram.] $q^{(t+1)}(\theta) = \arg\max_{q(\theta)} L(q^{(t)}(H), q(\theta))$ (may be coupled).
  • 20. Candidate solutions: Resort to a generalized mean-field approximation as in SVI, which does not factorize over the global parameters. This is prohibitive for models with a large number of global (coupled) parameters, e.g. linear regression. Our proposal: VMP as a distributed projected natural gradient ascent (PNGA) algorithm.
  • 21. d-VMP as a projected natural gradient ascent. Insight 1: VMP can be expressed as a projected natural gradient ascent algorithm: $\eta_X^{(t+1)} = \eta_X^{(t)} + \rho_{X,t} [\hat{\nabla}_{\eta} L(\eta^{(t)})]^{+}_{X}$ (1), where $[\cdot]^{+}_{X}$ is the projection operator.
  • 22. d-VMP as a projected natural gradient ascent. Insight 2: The natural gradient of the lower bound can be expressed as follows: $\hat{\nabla}_{\eta_\theta} L = m_{\mathrm{Pa}(\theta) \to \theta} + m_{H_i \to \theta}$. The gradient can be computed in parallel.
  • 23. d-VMP as a projected natural gradient ascent. Insight 3: Global parameters are “coupled” only if they belong to each other’s Markov blanket. Define a disjoint partition of the global parameters: $R = \{J_1, \dots, J_S\}$.
  • 24. d-VMP as a projected natural gradient ascent. d-VMP is based on performing independent global updates over the global parameters of each partition: $\eta_{J_r}^{(t+1)} = \eta_{J_r}^{(t)} + \rho_{r,t} [\hat{\nabla}_{\eta} L(\eta^{(t)})]^{+}_{J_r}$, where $\rho_{r,t}$ is the learning rate. If $|J_r| = 1$ then $\rho_{r,t} = 1$.
  • 25. d-VMP as a distributed PNGA algorithm, $\max_{q \in Q} L(q(\theta, H))$. [Diagram: a master node with $\theta$, $\alpha$ holding $\eta_{J_r}^{(t+1)}$; slaves 1 to 3, each showing nodes $X$, $H$, $\theta_n$ in a plate over $i = 1, \dots, N$ and holding $\eta_n^{(t+1)}$.] Master update, for all $J_r \in R$: $\eta_{J_r}^{(t)} + \rho_{r,t} [\hat{\nabla}_{\eta} L(\eta^{(t)})]^{+}_{J_r}$.
  • 26. Outline: 1 Motivation; 2 Variational Message Passing; 3 d-VMP; 4 Experimental results; 5 Conclusions.
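The master/slave loop of slides 16 to 25 can be sketched in a single process. This is a minimal illustration under stated assumptions, not the AMIDST implementation: the model is a hypothetical one-dimensional mixture of Gaussians with equal weights, known precision `TAU` and unknown component means; the natural gradient is taken in the standard conjugate form (prior plus summed child-to-parent messages minus the current natural parameters); and no explicit projection is applied, because for a step size in (0, 1] the update here stays inside the valid parameter domain.

```python
import math

TAU = 1.0               # known observation precision (assumed)
M0, TAU0 = 0.0, 0.01    # prior N(M0, 1/TAU0) on every component mean (assumed)

def moments(eta):
    """Moment parameters (E[theta], E[theta^2]) from Gaussian natural parameters."""
    precision = -2.0 * eta[1]
    mean = eta[0] / precision
    return mean, mean * mean + 1.0 / precision

def slave_step(shard, etas):
    """Update q(H_i) on the local shard, then sum the messages sent to each theta_k."""
    mom = [moments(eta) for eta in etas]
    msgs = [[0.0, 0.0] for _ in etas]
    for x in shard:
        logits = [TAU * (x * m1 - 0.5 * m2) for m1, m2 in mom]
        top = max(logits)
        weights = [math.exp(value - top) for value in logits]
        total = sum(weights)
        for k, w in enumerate(weights):
            r = w / total                    # q(H_i = k)
            msgs[k][0] += r * TAU * x
            msgs[k][1] += r * (-0.5 * TAU)
    return msgs

def master_step(etas, all_msgs, rho):
    """Natural gradient step on each global parameter from the aggregated messages."""
    prior = (TAU0 * M0, -0.5 * TAU0)
    updated = []
    for k, eta in enumerate(etas):
        target = [prior[j] + sum(m[k][j] for m in all_msgs) for j in range(2)]
        grad = [target[j] - eta[j] for j in range(2)]
        updated.append([eta[j] + rho * grad[j] for j in range(2)])
    return updated

def d_vmp(shards, init_means, rho=1.0, iterations=50):
    etas = [[TAU0 * m, -0.5 * TAU0] for m in init_means]
    for _ in range(iterations):
        all_msgs = [slave_step(shard, etas) for shard in shards]  # parallel in a real cluster
        etas = master_step(etas, all_msgs, rho)
    return [moments(eta)[0] for eta in etas]

shards = [[-2.1, -1.9, 2.0], [1.8, 2.2, -2.0]]
means = d_vmp(shards, init_means=[-1.0, 1.0])
print(means)
```

Since the slaves only return sums of messages, the result does not depend on how the data are split across shards, and with `rho = 1.0` each master step coincides with the ordinary VMP update for that parameter.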
  • 27. Model fit to the data. [Plate model: nodes $X_{ij}$, $H_{ij}$, $Y_i$, $H_i$, $\theta$, $\alpha$; plates over $j = 1, \dots, J$ and $i = 1, \dots, N$.] [Plot: density against attribute range; legend: SVI 1%, SVI 5%, SVI 10%, VMP 1%.] Representative sample of 55K clients (N) and 33 attributes (J). “Unrolled” model of more than 3.5M nodes (75% latent variables).
  • 28. Model fit to the data.
  • 29. Model fit to the data. [Plot: global lower bound against time in seconds (0 to 2000); legend, given as algorithm with batch size (% of data) / learning rate: SVI 1%/0.55, SVI 1%/0.75, SVI 1%/0.99, SVI 5%/0.55, SVI 5%/0.75, SVI 5%/0.99, SVI 10%/0.55, SVI 10%/0.75, SVI 10%/0.99, d-VMP.]
  • 30. Test marginal log-likelihood (BS: batch size; LR: learning rate):
        Alg.     BS (% data)   LR     Log-Likel.
        SVI      1 %           0.55   -180902.87
        SVI      1 %           0.75   -298564.03
        SVI      1 %           0.99   -426979.52
        SVI      5 %           0.55   -177302.24
        SVI      5 %           0.75   -333264.16
        SVI      5 %           0.99   -628105.70
        SVI      10 %          0.55   -347035.22
        SVI      10 %          0.75   -397525.45
        SVI      10 %          0.99   -538087.13
        d-VMP                  1.0    67265.34
  • 31. Mixtures of learnt posteriors for one attribute. [Plot legend: SVI 1%, SVI 5%, SVI 10%, VMP 1%.]
  • 32. Scalability settings: Generated data set of 42 million samples per client and 12 variables. “Unrolled” model of more than 1 billion ($10^9$) nodes (75% latent variables). AMIDST Toolbox with Apache Flink. Amazon Web Services (AWS) as distributed computing environment.
  • 33. Scalability results.
  • 34. Outline: 1 Motivation; 2 Variational Message Passing; 3 d-VMP; 4 Experimental results; 5 Conclusions.
  • 35. Conclusions: Variational methods can be scaled using distributed computation instead of sampling techniques. Bayesian learning in a model with more than 1 billion nodes (75% of them hidden).
  • 36. Thank you for your attention. Questions? You can download our open source Java toolbox: amidsttoolbox.com. Acknowledgments: This project has received funding from the European Union’s Seventh Framework Programme for research, technological development and demonstration under grant agreement no. 619209.