Stochastic Discriminative EM (sdEM)

Andrés R. Masegosa
Dept. of Computer and Information Science, Norwegian University of Science and Technology, Trondheim, Norway
Dept. of Computer Science and A.I., University of Granada, Granada, Spain
andres@idi.ntnu.no, andrew@decsai.ugr.es
Abstract

Stochastic discriminative EM (sdEM) is an online-EM-type algorithm for discriminative training of probabilistic generative models belonging to the natural exponential family. In this work, we introduce and justify this algorithm as a stochastic natural gradient descent method, i.e. a method which accounts for the information geometry of the parameter space of the statistical model. We show how this learning algorithm can be used to train probabilistic generative models by minimizing different discriminative loss functions, such as the negative conditional log-likelihood and the hinge loss. The models trained by sdEM are always generative (i.e. they define a joint probability distribution) and, in consequence, allow us to deal with missing data and latent variables in a principled way, both during learning and when making predictions. The performance of this method is illustrated on several text classification problems for which a multinomial naive Bayes classifier and a latent Dirichlet allocation based classifier are learned in an online manner using different discriminative loss functions.
1 Introduction 
2 Models & Assumptions 
Y denotes the random variable (continuous, discrete or vector-valued) to be predicted, X denotes the observable predictive variables and Z the non-observable or hidden variables.

• The generative data model belongs to a natural exponential family,

    p(y, z, x | θ) ∝ exp(⟨s(y, z, x), θ⟩ − A_l(θ)),

  where θ ∈ Θ is the natural parameter, s(y, z, x) ∈ S is the vector of sufficient statistics and A_l is the log-partition function.

• A conjugate prior distribution,

    p(θ | α) ∝ exp(⟨s(θ), α⟩ − A_g(α)),

  where the sufficient statistics are s(θ) = (θ, −A_l(θ)) and the hyperparameter α has two components (ν, ᾱ): ν is a positive scalar and ᾱ is a vector.

• The transformation from the expectation parameter μ = E[s(y, z, x) | θ] to the natural parameter θ, written θ = θ(μ), is available in closed form. Or, equivalently, the maximum likelihood parameters associated with a given vector of sufficient statistics can be computed in closed form (a small illustration follows).
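As a concrete instance of the last assumption, both directions of the map are available in closed form for a categorical (multinomial) model, the building block of the naive Bayes and LDA classifiers used in Section 5. The sketch below is illustrative only; the helper names and parametrization are ours, not taken from the poster.

import numpy as np

# Categorical model: expectation parameters are the class probabilities
# mu_k = E[1{x = k}]; natural parameters are theta_k = ln mu_k
# (an overcomplete but valid natural parametrization).

def to_natural(mu):
    # closed-form map theta(mu) from expectation to natural parameters
    return np.log(np.asarray(mu, dtype=float))

def mle_from_stats(counts):
    # closed-form maximum likelihood parameters from sufficient statistics (counts)
    counts = np.asarray(counts, dtype=float)
    return counts / counts.sum()

mu_hat = mle_from_stats([2, 5, 3])   # -> array([0.2, 0.5, 0.3])
theta_hat = to_natural(mu_hat)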
3 The sdEM Algorithm 
Learning Settings 
A data set D with n observations {(y₁, x₁), …, (yₙ, xₙ)} and a discriminative loss function ℓ(yᵢ, xᵢ, θ).
Our learning problem consists in minimizing the following objective function:

    L(θ) = Σᵢ₌₁ⁿ ℓ(yᵢ, xᵢ, θ) − ln p(θ | α) ∝ E_p̃[ℓ(y, x, θ)] − (1/n) ln p(θ | α) = E_p̃[ℓ̄(y, x, θ)],

where p̃ is the empirical distribution of the data D and ℓ̄(y, x, θ) = ℓ(y, x, θ) − (1/n) ln p(θ | α) denotes the loss regularized by the prior.
The stochastic updating equation of sdEM

sdEM can be interpreted as a stochastic gradient descent algorithm that iterates over the expectation parameters and is guided by the natural gradient in their Riemannian space:

    μₜ₊₁ = μₜ − ρₜ I(μₜ)⁻¹ ∂ℓ̄(yₜ, xₜ, θ(μₜ)) / ∂μ,

where ρₜ is the learning rate and I(μ) is the Fisher information matrix in the expectation parametrization.
Theorem 1. In a natural exponential family, the natural gradient of a loss function with respect to the expectation parameters equals the gradient of the loss function with respect to the natural parameters,

    I(μ)⁻¹ ∂ℓ̄(y, x, θ(μ)) / ∂μ = ∂ℓ̄(y, x, θ) / ∂θ.
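A short justification of this identity (our sketch, using the same symbols as above) follows from the standard exponential-family duality μ = ∇_θ A_l(θ), so that ∂μ/∂θ = ∇²_θ A_l(θ) = I(θ) and the Fisher information in the expectation parametrization satisfies I(μ) = I(θ)⁻¹:

    % chain rule from theta to mu, then cancel the Fisher information matrices
    \frac{\partial \bar{\ell}}{\partial \mu}
      = \Big(\frac{\partial \theta}{\partial \mu}\Big)^{\top} \frac{\partial \bar{\ell}}{\partial \theta}
      = I(\theta)^{-1} \frac{\partial \bar{\ell}}{\partial \theta}
    \quad\Longrightarrow\quad
    I(\mu)^{-1} \frac{\partial \bar{\ell}}{\partial \mu}
      = I(\theta)\, I(\theta)^{-1}\, \frac{\partial \bar{\ell}}{\partial \theta}
      = \frac{\partial \bar{\ell}}{\partial \theta}.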
Pseudo-Code Description of sdEM 
Require: D is randomly shuffled. 
 1: initialize μ₀ according to the prior;
 2: θ₀ = θ(μ₀);
 3: t = 0;
 4: repeat
 5:   for i = 1, …, n do
 6:     E-Step: μₜ₊₁ = μₜ − (1/(1 + t)) ∂ℓ̄(yᵢ, xᵢ, θₜ)/∂θ;
 7:     Check-Step: μₜ₊₁ = Check(μₜ₊₁, S);
 8:     M-Step: θₜ₊₁ = θ(μₜ₊₁);
 9:     t = t + 1;
10:   end for
11: until convergence
12: return θ(μₜ);
Recent results in information geometry [25] show that sdEM can also be interpreted as a mirror descent algorithm or a proximal gradient method with a Bregman divergence as the proximity measure.
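As a concrete illustration of the loop above, here is a minimal Python sketch of sdEM; the model-specific pieces (grad_loss, to_natural, check) are hypothetical placeholders that a user would supply for their exponential-family model, not the authors' code.

import numpy as np

def sdem(data, mu0, grad_loss, to_natural, check, n_epochs=10):
    """Minimal sdEM sketch under an assumed interface.

    data       : list of (y, x) pairs, randomly shuffled
    mu0        : initial expectation parameters (from the prior)
    grad_loss  : grad_loss(y, x, theta) -> gradient of the regularized loss
                 with respect to the natural parameters (Theorem 1)
    to_natural : closed-form map theta(mu) from expectation to natural parameters
    check      : projects mu back onto the valid region S if the step left it
    """
    mu = np.asarray(mu0, dtype=float)
    theta = to_natural(mu)
    t = 0
    for _ in range(n_epochs):                        # "repeat ... until convergence"
        for y, x in data:
            rho = 1.0 / (1.0 + t)                    # decreasing learning rate
            mu = mu - rho * grad_loss(y, x, theta)   # E-step: natural-gradient step
            mu = check(mu)                           # Check-step: keep mu inside S
            theta = to_natural(mu)                   # M-step: closed-form re-estimation
            t += 1
    return theta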
4 Discriminative Loss Functions 
Negative conditional log-likelihood (NCLL) 
This loss function (which is valid for classification, regression, multi-label, etc.) is defined as follows: 
    ℓ_NCLL(yₜ, xₜ, θ) = −ln p(yₜ | xₜ, θ) = −ln ∫ p(yₜ, z, xₜ | θ) dz + ln ∫ p(y, z, xₜ | θ) dy dz,

and its gradient is computed as

    ∂ℓ_NCLL(yₜ, xₜ, θ)/∂θ = −E_z[s(yₜ, z, xₜ) | θ] + E_{y,z}[s(y, z, xₜ) | θ].
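For a fully observed classifier (no latent z), this gradient is simply the gap between the sufficient statistics of the observed class and their expectation under the class posterior. A small sketch under that assumption; suff_stats and log_joint are hypothetical helpers supplied by the user.

import numpy as np

def ncll_grad(y_t, x_t, theta, classes, suff_stats, log_joint):
    """NCLL gradient sketch for a fully observed generative classifier.

    suff_stats(y, x)        -> sufficient-statistics vector s(y, x)
    log_joint(y, x, theta)  -> ln p(y, x | theta), up to a constant in y
    """
    # class posterior p(y | x_t, theta)
    logits = np.array([log_joint(y, x_t, theta) for y in classes])
    post = np.exp(logits - logits.max())
    post /= post.sum()
    # E_y[s(y, x_t) | theta] under the class posterior
    expected = sum(p * suff_stats(y, x_t) for p, y in zip(post, classes))
    # gradient: -s(y_t, x_t) + E_y[s(y, x_t) | theta]
    return -suff_stats(y_t, x_t) + expected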
The Hinge loss (Hinge) for probabilistic generative models 
We build on LeCun et al.’s [21] ideas about energy-based learning for prediction problems. We define 
the hinge loss (which is only valid for classification) as follows 
    ℓ_Hinge(yₜ, xₜ, θ) = max(0, 1 − ln [p(yₜ, xₜ | θ) / p(ȳₜ, xₜ | θ)]),    (1)

where ȳₜ denotes the most offending incorrect answer, ȳₜ = arg max_{y ≠ yₜ} p(y, xₜ | θ).
The gradient of this loss function can be computed simply as

    ∂ℓ_Hinge(yₜ, xₜ, θ)/∂θ =  0,                                                        if ln [p(yₜ, xₜ | θ) / p(ȳₜ, xₜ | θ)] ≥ 1;
                              −E_z[s(yₜ, z, xₜ) | θ] + E_z[s(ȳₜ, z, xₜ) | θ],           otherwise.
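A minimal sketch of this gradient for a fully observed classifier (no latent z), reusing the assumed suff_stats and log_joint helpers from above:

import numpy as np

def hinge_grad(y_t, x_t, theta, classes, suff_stats, log_joint):
    """Hinge-loss gradient sketch for a fully observed generative classifier."""
    # most offending incorrect answer: highest joint score among the wrong classes
    wrong = [y for y in classes if y != y_t]
    y_bar = max(wrong, key=lambda y: log_joint(y, x_t, theta))
    margin = log_joint(y_t, x_t, theta) - log_joint(y_bar, x_t, theta)
    if margin >= 1.0:                      # loss is zero: flat region of the hinge
        return np.zeros_like(suff_stats(y_t, x_t))
    # otherwise push s(y_t, x_t) up and s(y_bar, x_t) down
    return -suff_stats(y_t, x_t) + suff_stats(y_bar, x_t)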
sdEM updating equations for partially observed data

NLL:    μₜ₊₁ = (1 − ρₜ(1 + ν/n)) μₜ + ρₜ ( E_z[s(yₜ, z, xₜ) | θ(μₜ)] + ᾱ/n )

NCLL:   μₜ₊₁ = (1 − ρₜ ν/n) μₜ + ρₜ ( E_z[s(yₜ, z, xₜ) | θ(μₜ)] − E_{y,z}[s(y, z, xₜ) | θ(μₜ)] + ᾱ/n )

Hinge:  μₜ₊₁ = (1 − ρₜ ν/n) μₜ + ρₜ · ( ᾱ/n,                                                          if ln [∫ p(yₜ, z, xₜ | θ) dz / ∫ p(ȳₜ, z, xₜ | θ) dz] ≥ 1;
                                        E_z[s(yₜ, z, xₜ) | θ(μₜ)] − E_z[s(ȳₜ, z, xₜ) | θ(μₜ)] + ᾱ/n,   otherwise )

where ȳₜ = arg max_{y ≠ yₜ} ∫ p(y, z, xₜ | θ) dz.
5 Experimental Analysis 
Simulated Data 
[Figure: class-conditional densities (y = 1 and y = −1) learned on simulated data, comparing generative NLL training against discriminative training. (a) NCLL loss vs. NLL; (b) Hinge loss vs. NLL.]
Discriminative Learning with Multinomial-NB and LDA 
[Figure: classification accuracy per epoch. (a) Discriminative Multinomial Naive Bayes: NCLL-MNB and Hinge-MNB on the r52, 20ng and Cade datasets. (b) Discriminative Online LDA: NLL-LDA, NCLL-LDA and Hinge-LDA compared with sLDA on the web-kb and reuters-R8 datasets.]
Acknowledgements

This work has been partially funded by the European Union's Seventh Framework Programme for research, technological development and demonstration under grant agreement no. 619209 (AMIDST project).
