SlideShare a Scribd company logo
PRML
14.5 5560 ota
Conditional Mixture Models
p14.5.1 Mixtures of linear regression models
p14.5.2 Mixtures of logistic models
p14.5.3 Mixtures of experts
Conditional Mixture Models
Decision trees
Positive
interpretability
Negative
hard assignment
Splits: axis-aligned of the input space.
Regression:
piecewise-constant predictions
discontinuities at the split boundaries.
<latexit sha1_base64="(null)">(null)</latexit>
<latexit sha1_base64="(null)">(null)</latexit>
Conditional Mixture Models
Positive
soft probabilistic splits of the input space
Splits : functions of all of the input variables
Negative
no interpretability
<latexit sha1_base64="(null)">(null)</latexit>
<latexit sha1_base64="(null)">(null)</latexit>
<latexit sha1_base64="(null)">(null)</latexit>
<latexit sha1_base64="(null)">(null)</latexit>
<latexit sha1_base64="(null)">(null)</latexit>
<latexit sha1_base64="(null)">(null)</latexit>
Fig.14.9
Model A
<latexit sha1_base64="(null)">(null)</latexit>
<latexit sha1_base64="(null)">(null)</latexit>
Model B
Model
Mixture of Experts Model (MoE)
<latexit sha1_base64="(null)">(null)</latexit>
n a fully probabilistic tree-based model
Hierarchical Mixture of Experts (HME)
<latexit sha1_base64="(null)">(null)</latexit>
Ex)
Mixtures of linear regression models
Gaussian Mixture Models
n Clustering (unsupervised Learning)
Fig. 9.5
<latexit sha1_base64="(null)">(null)</latexit>
Observed data
If k = 3
k
<latexit sha1_base64="(null)">(null)</latexit>
<latexit sha1_base64="(null)">(null)</latexit>
<latexit sha1_base64="(null)">(null)</latexit>
<latexit sha1_base64="(null)">(null)</latexit>
Mixtures of linear regression models
<latexit sha1_base64="(null)">(null)</latexit>
n A general of switching regression
, 9 3 79 7 97 3 9 8 7 70 7 A7 1 7 3 3 7 )93 . 7
) 1 (
Linear regression
Learning the parameters
EM algorithm
<latexit sha1_base64="(null)">(null)</latexit>
<latexit sha1_base64="(null)">(null)</latexit>
<latexit sha1_base64="(null)">(null)</latexit>
<latexit sha1_base64="(null)">(null)</latexit>
<latexit sha1_base64="(null)">(null)</latexit>
<latexit sha1_base64="(null)">(null)</latexit>
<latexit sha1_base64="(null)">(null)</latexit>
Parameters
Data set <latexit sha1_base64="(null)">(null)</latexit>
<latexit sha1_base64="(null)">(null)</latexit>
E step
responsibilities
initialized <latexit sha1_base64="(null)">(null)</latexit>
<latexit sha1_base64="(null)">(null)</latexit>
<latexit sha1_base64="(null)">(null)</latexit>
<latexit sha1_base64="(null)">(null)</latexit>
<latexit sha1_base64="(null)">(null)</latexit>
The expectation of the complete-data log likelihood
M step
Fixed <latexit sha1_base64="(null)">(null)</latexit>
<latexit sha1_base64="(null)">(null)</latexit> <latexit sha1_base64="(null)">(null)</latexit>
<latexit sha1_base64="(null)">(null)</latexit>
<latexit sha1_base64="(null)">(null)</latexit>
Maximize the function about
the constraint
1) <latexit sha1_base64="(null)">(null)</latexit>
the mixing coefficients
Using a Lagrange multiplier method,
M step
2) <latexit sha1_base64="(null)">(null)</latexit>
the parameter of the k-th linear regression model
<latexit sha1_base64="(null)">(null)</latexit>
Linear regression model (3.12)
<latexit sha1_base64="(null)">(null)</latexit>
weighted least squares problem
If k = 3
k
<latexit sha1_base64="(null)">(null)</latexit>
n = 1
<latexit sha1_base64="(null)">(null)</latexit>
<latexit sha1_base64="(null)">(null)</latexit> <latexit sha1_base64="(null)">(null)</latexit>
M step
<latexit sha1_base64="(null)">(null)</latexit>
<latexit sha1_base64="(null)">(null)</latexit>
<latexit sha1_base64="(null)">(null)</latexit>
<latexit sha1_base64="(null)">(null)</latexit>
the parameter of the k-th linear regression model
<latexit sha1_base64="(null)">(null)</latexit>
matrix notation
<latexit sha1_base64="(null)">(null)</latexit>
n are learned the data of the high responsibility .<latexit sha1_base64="(null)">(null)</latexit>
<latexit sha1_base64="(null)">(null)</latexit>
<latexit sha1_base64="(null)">(null)</latexit>
M step
a precision parameter<latexit sha1_base64="(null)">(null)</latexit>
<latexit sha1_base64="(null)">(null)</latexit>
<latexit sha1_base64="(null)">(null)</latexit>
Example Fig.14.8
<latexit sha1_base64="(null)">(null)</latexit>
<latexit sha1_base64="(null)">(null)</latexit>
<latexit sha1_base64="(null)">(null)</latexit>
<latexit sha1_base64="(null)">(null)</latexit>
<latexit sha1_base64="(null)">(null)</latexit>
<latexit sha1_base64="(null)">(null)</latexit>
The predictive conditional density
<latexit sha1_base64="(null)">(null)</latexit>
<latexit sha1_base64="(null)">(null)</latexit>
<latexit sha1_base64="(null)">(null)</latexit>
No independent of the input
Mixtures of logistic models
Mixtures of logistic models
<latexit sha1_base64="(null)">(null)</latexit>
<latexit sha1_base64="(null)">(null)</latexit>
logistic regression
Model
Learning the parameters
<latexit sha1_base64="(null)">(null)</latexit>
<latexit sha1_base64="(null)">(null)</latexit>
EM algorithm
Parameters
data set <latexit sha1_base64="(null)">(null)</latexit>
<latexit sha1_base64="(null)">(null)</latexit>
the complete-data log likelihood
<latexit sha1_base64="(null)">(null)</latexit>
<latexit sha1_base64="(null)">(null)</latexit>
<latexit sha1_base64="(null)">(null)</latexit>
<latexit sha1_base64="(null)">(null)</latexit>
<latexit sha1_base64="(null)">(null)</latexit>
<latexit sha1_base64="(null)">(null)</latexit>
}
E step
<latexit sha1_base64="(null)">(null)</latexit>
<latexit sha1_base64="(null)">(null)</latexit>
M step
<latexit sha1_base64="(null)">(null)</latexit>
Fixed <latexit sha1_base64="(null)">(null)</latexit>
<latexit sha1_base64="(null)">(null)</latexit>
<latexit sha1_base64="(null)">(null)</latexit>
Maximize the function about
<latexit sha1_base64="(null)">(null)</latexit>
the mixing coefficients
}
M step
<latexit sha1_base64="(null)">(null)</latexit>
the parameter of the k-th logistic regression model
does not have a closed-form solution.<latexit sha1_base64="(null)">(null)</latexit>
(IRLS) algorithm
<latexit sha1_base64="(null)">(null)</latexit>
n Need the gradient and Hessian
<latexit sha1_base64="(null)">(null)</latexit>
IRLS algorithm
<latexit sha1_base64="(null)">(null)</latexit>
<latexit sha1_base64="(null)">(null)</latexit>
Gradient
Hessian
<latexit sha1_base64="(null)">(null)</latexit>
While not convergence do
<latexit sha1_base64="(null)">(null)</latexit>
A mixture of logistic regression models
true probability of the class label single logistic regression mixture of two logistic regression
<latexit sha1_base64="(null)">(null)</latexit>
Model A
Model B
Mixtures of Experts
Mixtures of experts
<latexit sha1_base64="(null)">(null)</latexit>
Gating function:
determine which expert are dominant in which region.
be represented by a linear softmax or sigmoid.
n a mixture of experts model
Expert:
can model in different regions of input space.
predict in their own region.
<latexit sha1_base64="(null)">(null)</latexit>
<latexit sha1_base64="(null)">(null)</latexit>
Gating function
<latexit sha1_base64="(null)">(null)</latexit>
<latexit sha1_base64="(null)">(null)</latexit>
<latexit sha1_base64="(null)">(null)</latexit>
<latexit sha1_base64="(null)">(null)</latexit>
<latexit sha1_base64="(null)">(null)</latexit>
<latexit sha1_base64="(null)">(null)</latexit>
<latexit sha1_base64="(null)">(null)</latexit>
<latexit sha1_base64="(null)">(null)</latexit>
<latexit sha1_base64="(null)">(null)</latexit>
<latexit sha1_base64="(null)">(null)</latexit>
softmaxsigmoid
<latexit sha1_base64="(null)">(null)</latexit>
MoE Linear regression model
<latexit sha1_base64="(null)">(null)</latexit>
<latexit sha1_base64="(null)">(null)</latexit>
<latexit sha1_base64="(null)">(null)</latexit>
<latexit sha1_base64="(null)">(null)</latexit>
<latexit sha1_base64="(null)">(null)</latexit>
sigmoid
<latexit sha1_base64="(null)">(null)</latexit>
<latexit sha1_base64="(null)">(null)</latexit>
MoE
<latexit sha1_base64="(null)">(null)</latexit>
<latexit sha1_base64="(null)">(null)</latexit>
Omg!
Hierarchical Mixture of Experts
<latexit sha1_base64="(null)">(null)</latexit>
<latexit sha1_base64="(null)">(null)</latexit>
<latexit sha1_base64="(null)">(null)</latexit>
<latexit sha1_base64="(null)">(null)</latexit>
<latexit sha1_base64="(null)">(null)</latexit>
<latexit sha1_base64="(null)">(null)</latexit>
<latexit sha1_base64="(null)">(null)</latexit>
<latexit sha1_base64="(null)">(null)</latexit>
<latexit sha1_base64="(null)">(null)</latexit>
<latexit sha1_base64="(null)">(null)</latexit>
HME
Negative
l large number of parameters =>Bayesian HMoE (2003)
5.6 mixture density network
<latexit sha1_base64="(null)">(null)</latexit>
<latexit sha1_base64="(null)">(null)</latexit>
<latexit sha1_base64="(null)">(null)</latexit>
<latexit sha1_base64="(null)">(null)</latexit>
output
<latexit sha1_base64="(null)">(null)</latexit>
<latexit sha1_base64="(null)">(null)</latexit>
<latexit sha1_base64="(null)">(null)</latexit>
<latexit sha1_base64="(null)">(null)</latexit>
n share the hidden units of the neural network.
n the splits of the input space are relaxed, can be nolinear!
<latexit sha1_base64="(null)">(null)</latexit>
<latexit sha1_base64="(null)">(null)</latexit>
<latexit sha1_base64="(null)">(null)</latexit>
Bayesian Hierarchical Mixtures of Experts
Bishop, C. M. and M. Svense ́n (2003). Bayesian hierarchical mixtures of experts. In U. Kjaerulff and C. Meek (Eds.),
Proceedings Nineteenth Conference on Uncertainty in Artificial Intelligence, pp. 57–64. Morgan Kaufmann.
Application: the kinematics of robot arms
inverse problem
two pattern
Input: parameters and angles of the robot arm.
Output : the end effector of the robot arm .
Bayesian Hierarchical Mixtures of Experts
<latexit sha1_base64="(null)">(null)</latexit>
<latexit sha1_base64="(null)">(null)</latexit>
Expert <latexit sha1_base64="(null)">(null)</latexit>
<latexit sha1_base64="(null)">(null)</latexit>
<latexit sha1_base64="(null)">(null)</latexit>
<latexit sha1_base64="(null)">(null)</latexit>
<latexit sha1_base64="(null)">(null)</latexit>
<latexit sha1_base64="(null)">(null)</latexit>
<latexit sha1_base64="(null)">(null)</latexit>
<latexit sha1_base64="(null)">(null)</latexit>
Graphical Model
branch point
Probabilistic tree-based model
<latexit sha1_base64="(null)">(null)</latexit>
<latexit sha1_base64="(null)">(null)</latexit>
<latexit sha1_base64="(null)">(null)</latexit>
<latexit sha1_base64="(null)">(null)</latexit>
: The total number of experts <latexit sha1_base64="(null)">(null)</latexit>
<latexit sha1_base64="(null)">(null)</latexit>
<latexit sha1_base64="(null)">(null)</latexit>
<latexit sha1_base64="(null)">(null)</latexit>
HMoE Model
<latexit sha1_base64="(null)">(null)</latexit>
<latexit sha1_base64="(null)">(null)</latexit>
branch pointExpert
<latexit sha1_base64="(null)">(null)</latexit>
<latexit sha1_base64="(null)">(null)</latexit>
<latexit sha1_base64="(null)">(null)</latexit>
Data set
Likelihood
Joint probability
<latexit sha1_base64="(null)">(null)</latexit>
<latexit sha1_base64="(null)">(null)</latexit>
<latexit sha1_base64="(null)">(null)</latexit>
<latexit sha1_base64="(null)">(null)</latexit>
<latexit sha1_base64="(null)">(null)</latexit>
<latexit sha1_base64="(null)">(null)</latexit>
⇨ Variational Inference
fin.

More Related Content

What's hot

What's hot (20)

Poset in Relations(Discrete Mathematics)
Poset in Relations(Discrete Mathematics)Poset in Relations(Discrete Mathematics)
Poset in Relations(Discrete Mathematics)
 
data structures- back tracking
data structures- back trackingdata structures- back tracking
data structures- back tracking
 
Relation Hasse diagram
Relation Hasse diagramRelation Hasse diagram
Relation Hasse diagram
 
Genetic Algorithms - GAs
Genetic Algorithms - GAsGenetic Algorithms - GAs
Genetic Algorithms - GAs
 
Predicates and Quantifiers
Predicates and Quantifiers Predicates and Quantifiers
Predicates and Quantifiers
 
Predicate calculus
Predicate calculusPredicate calculus
Predicate calculus
 
Bayes Classification
Bayes ClassificationBayes Classification
Bayes Classification
 
Generating functions solve recurrence
Generating functions solve recurrenceGenerating functions solve recurrence
Generating functions solve recurrence
 
String matching algorithms(knuth morris-pratt)
String matching algorithms(knuth morris-pratt)String matching algorithms(knuth morris-pratt)
String matching algorithms(knuth morris-pratt)
 
2.2.ppt.SC
2.2.ppt.SC2.2.ppt.SC
2.2.ppt.SC
 
Theory of Computation
Theory of ComputationTheory of Computation
Theory of Computation
 
Stochastic Optimization
Stochastic OptimizationStochastic Optimization
Stochastic Optimization
 
Gauss jordan
Gauss jordanGauss jordan
Gauss jordan
 
Lab report for Prolog program in artificial intelligence.
Lab report for Prolog program in artificial intelligence.Lab report for Prolog program in artificial intelligence.
Lab report for Prolog program in artificial intelligence.
 
Ring
RingRing
Ring
 
Tsp branch and-bound
Tsp branch and-boundTsp branch and-bound
Tsp branch and-bound
 
5 csp
5 csp5 csp
5 csp
 
Introduction to prolog
Introduction to prologIntroduction to prolog
Introduction to prolog
 
svm classification
svm classificationsvm classification
svm classification
 
Gaussian Process Regression
Gaussian Process Regression  Gaussian Process Regression
Gaussian Process Regression
 

Similar to PRML 条件付き混合モデル 14.5

Scaling Up: How Switching to Apache Spark Improved Performance, Realizability...
Scaling Up: How Switching to Apache Spark Improved Performance, Realizability...Scaling Up: How Switching to Apache Spark Improved Performance, Realizability...
Scaling Up: How Switching to Apache Spark Improved Performance, Realizability...
Databricks
 
Scalable Applications with Scala
Scalable Applications with ScalaScalable Applications with Scala
Scalable Applications with Scala
Nimrod Argov
 
Meta-objective Lisp @名古屋 Reject 会議
Meta-objective Lisp @名古屋 Reject 会議Meta-objective Lisp @名古屋 Reject 会議
Meta-objective Lisp @名古屋 Reject 会議
dico_leque
 

Similar to PRML 条件付き混合モデル 14.5 (20)

Minimax learning with implications on domain adaptation and adversarial attack
Minimax learning with implications on domain adaptation and adversarial attackMinimax learning with implications on domain adaptation and adversarial attack
Minimax learning with implications on domain adaptation and adversarial attack
 
Introduction To Lisp
Introduction To LispIntroduction To Lisp
Introduction To Lisp
 
Lambda? You Keep Using that Letter
Lambda? You Keep Using that LetterLambda? You Keep Using that Letter
Lambda? You Keep Using that Letter
 
Eta
EtaEta
Eta
 
Beauty and the beast - Haskell on JVM
Beauty and the beast  - Haskell on JVMBeauty and the beast  - Haskell on JVM
Beauty and the beast - Haskell on JVM
 
Scaling up data science applications
Scaling up data science applicationsScaling up data science applications
Scaling up data science applications
 
Scaling Up: How Switching to Apache Spark Improved Performance, Realizability...
Scaling Up: How Switching to Apache Spark Improved Performance, Realizability...Scaling Up: How Switching to Apache Spark Improved Performance, Realizability...
Scaling Up: How Switching to Apache Spark Improved Performance, Realizability...
 
2006 Small Scheme
2006 Small Scheme2006 Small Scheme
2006 Small Scheme
 
Idea for ineractive programming language
Idea for ineractive programming languageIdea for ineractive programming language
Idea for ineractive programming language
 
Hello scala
Hello scalaHello scala
Hello scala
 
Scala vs Ruby
Scala vs RubyScala vs Ruby
Scala vs Ruby
 
R code for data manipulation
R code for data manipulationR code for data manipulation
R code for data manipulation
 
R code for data manipulation
R code for data manipulationR code for data manipulation
R code for data manipulation
 
Scilab for real dummies j.heikell - part 2
Scilab for real dummies j.heikell - part 2Scilab for real dummies j.heikell - part 2
Scilab for real dummies j.heikell - part 2
 
Scalable Applications with Scala
Scalable Applications with ScalaScalable Applications with Scala
Scalable Applications with Scala
 
Meta-objective Lisp @名古屋 Reject 会議
Meta-objective Lisp @名古屋 Reject 会議Meta-objective Lisp @名古屋 Reject 会議
Meta-objective Lisp @名古屋 Reject 会議
 
Akka Cluster in Production
Akka Cluster in ProductionAkka Cluster in Production
Akka Cluster in Production
 
Spark手把手:[e2-spk-s01]
Spark手把手:[e2-spk-s01]Spark手把手:[e2-spk-s01]
Spark手把手:[e2-spk-s01]
 
Introduction to Scala
Introduction to ScalaIntroduction to Scala
Introduction to Scala
 
Polyglot Persistence in the Real World: Cassandra + S3 + MapReduce
Polyglot Persistence in the Real World: Cassandra + S3 + MapReducePolyglot Persistence in the Real World: Cassandra + S3 + MapReduce
Polyglot Persistence in the Real World: Cassandra + S3 + MapReduce
 

More from tmtm otm

More from tmtm otm (16)

テーブル・テキスト・画像の反実仮想説明
テーブル・テキスト・画像の反実仮想説明テーブル・テキスト・画像の反実仮想説明
テーブル・テキスト・画像の反実仮想説明
 
自然言語処理における深層学習を用いた予測の不確実性 - Predictive Uncertainty in NLP -
自然言語処理における深層学習を用いた予測の不確実性  - Predictive Uncertainty in NLP -自然言語処理における深層学習を用いた予測の不確実性  - Predictive Uncertainty in NLP -
自然言語処理における深層学習を用いた予測の不確実性 - Predictive Uncertainty in NLP -
 
予測の不確かさのユーザー調査
予測の不確かさのユーザー調査予測の不確かさのユーザー調査
予測の不確かさのユーザー調査
 
深層学習の不確実性 - Uncertainty in Deep Neural Networks -
深層学習の不確実性 - Uncertainty in Deep Neural Networks -深層学習の不確実性 - Uncertainty in Deep Neural Networks -
深層学習の不確実性 - Uncertainty in Deep Neural Networks -
 
[論文紹介] 機械学習システムの安全性における未解決な問題
[論文紹介] 機械学習システムの安全性における未解決な問題[論文紹介] 機械学習システムの安全性における未解決な問題
[論文紹介] 機械学習システムの安全性における未解決な問題
 
ICML 2021 Workshop 深層学習の不確実性について
ICML 2021 Workshop 深層学習の不確実性についてICML 2021 Workshop 深層学習の不確実性について
ICML 2021 Workshop 深層学習の不確実性について
 
Bayesian Neural Networks : Survey
Bayesian Neural Networks : SurveyBayesian Neural Networks : Survey
Bayesian Neural Networks : Survey
 
PRML学習者から入る深層生成モデル入門
PRML学習者から入る深層生成モデル入門PRML学習者から入る深層生成モデル入門
PRML学習者から入る深層生成モデル入門
 
PRML ベイズロジスティック回帰 4.5 4.5.2
PRML ベイズロジスティック回帰 4.5 4.5.2PRML ベイズロジスティック回帰 4.5 4.5.2
PRML ベイズロジスティック回帰 4.5 4.5.2
 
PRML 多項式曲線フィッティング 1.1
PRML 多項式曲線フィッティング 1.1PRML 多項式曲線フィッティング 1.1
PRML 多項式曲線フィッティング 1.1
 
PRML 2.3.7 2.3.9
PRML 2.3.7 2.3.9PRML 2.3.7 2.3.9
PRML 2.3.7 2.3.9
 
PRML エビデンス近似 3.5 3.6.1
PRML エビデンス近似  3.5 3.6.1PRML エビデンス近似  3.5 3.6.1
PRML エビデンス近似 3.5 3.6.1
 
PRML カーネルPCA 12.2.3 12.3
PRML カーネルPCA 12.2.3 12.3PRML カーネルPCA 12.2.3 12.3
PRML カーネルPCA 12.2.3 12.3
 
PRML BNN 5.7 5.7.3
PRML BNN 5.7 5.7.3 PRML BNN 5.7 5.7.3
PRML BNN 5.7 5.7.3
 
PRML RVM 7.2 7.2.3
PRML RVM 7.2 7.2.3PRML RVM 7.2 7.2.3
PRML RVM 7.2 7.2.3
 
PRML EP法 10.7 10.7.2
PRML EP法 10.7 10.7.2 PRML EP法 10.7 10.7.2
PRML EP法 10.7 10.7.2
 

Recently uploaded

Investigate & Recover / StarCompliance.io / Crypto_Crimes
Investigate & Recover / StarCompliance.io / Crypto_CrimesInvestigate & Recover / StarCompliance.io / Crypto_Crimes
Investigate & Recover / StarCompliance.io / Crypto_Crimes
StarCompliance.io
 
一比一原版(UPenn毕业证)宾夕法尼亚大学毕业证成绩单
一比一原版(UPenn毕业证)宾夕法尼亚大学毕业证成绩单一比一原版(UPenn毕业证)宾夕法尼亚大学毕业证成绩单
一比一原版(UPenn毕业证)宾夕法尼亚大学毕业证成绩单
ewymefz
 
Computer Presentation.pptx ecommerce advantage s
Computer Presentation.pptx ecommerce advantage sComputer Presentation.pptx ecommerce advantage s
Computer Presentation.pptx ecommerce advantage s
MAQIB18
 
一比一原版(CBU毕业证)卡普顿大学毕业证成绩单
一比一原版(CBU毕业证)卡普顿大学毕业证成绩单一比一原版(CBU毕业证)卡普顿大学毕业证成绩单
一比一原版(CBU毕业证)卡普顿大学毕业证成绩单
nscud
 
一比一原版(CBU毕业证)不列颠海角大学毕业证成绩单
一比一原版(CBU毕业证)不列颠海角大学毕业证成绩单一比一原版(CBU毕业证)不列颠海角大学毕业证成绩单
一比一原版(CBU毕业证)不列颠海角大学毕业证成绩单
nscud
 
一比一原版(UVic毕业证)维多利亚大学毕业证成绩单
一比一原版(UVic毕业证)维多利亚大学毕业证成绩单一比一原版(UVic毕业证)维多利亚大学毕业证成绩单
一比一原版(UVic毕业证)维多利亚大学毕业证成绩单
ukgaet
 
一比一原版(TWU毕业证)西三一大学毕业证成绩单
一比一原版(TWU毕业证)西三一大学毕业证成绩单一比一原版(TWU毕业证)西三一大学毕业证成绩单
一比一原版(TWU毕业证)西三一大学毕业证成绩单
ocavb
 
一比一原版(RUG毕业证)格罗宁根大学毕业证成绩单
一比一原版(RUG毕业证)格罗宁根大学毕业证成绩单一比一原版(RUG毕业证)格罗宁根大学毕业证成绩单
一比一原版(RUG毕业证)格罗宁根大学毕业证成绩单
vcaxypu
 
一比一原版(YU毕业证)约克大学毕业证成绩单
一比一原版(YU毕业证)约克大学毕业证成绩单一比一原版(YU毕业证)约克大学毕业证成绩单
一比一原版(YU毕业证)约克大学毕业证成绩单
enxupq
 
一比一原版(IIT毕业证)伊利诺伊理工大学毕业证成绩单
一比一原版(IIT毕业证)伊利诺伊理工大学毕业证成绩单一比一原版(IIT毕业证)伊利诺伊理工大学毕业证成绩单
一比一原版(IIT毕业证)伊利诺伊理工大学毕业证成绩单
ewymefz
 
一比一原版(NYU毕业证)纽约大学毕业证成绩单
一比一原版(NYU毕业证)纽约大学毕业证成绩单一比一原版(NYU毕业证)纽约大学毕业证成绩单
一比一原版(NYU毕业证)纽约大学毕业证成绩单
ewymefz
 

Recently uploaded (20)

Criminal IP - Threat Hunting Webinar.pdf
Criminal IP - Threat Hunting Webinar.pdfCriminal IP - Threat Hunting Webinar.pdf
Criminal IP - Threat Hunting Webinar.pdf
 
Investigate & Recover / StarCompliance.io / Crypto_Crimes
Investigate & Recover / StarCompliance.io / Crypto_CrimesInvestigate & Recover / StarCompliance.io / Crypto_Crimes
Investigate & Recover / StarCompliance.io / Crypto_Crimes
 
一比一原版(UPenn毕业证)宾夕法尼亚大学毕业证成绩单
一比一原版(UPenn毕业证)宾夕法尼亚大学毕业证成绩单一比一原版(UPenn毕业证)宾夕法尼亚大学毕业证成绩单
一比一原版(UPenn毕业证)宾夕法尼亚大学毕业证成绩单
 
Slip-and-fall Injuries: Top Workers' Comp Claims
Slip-and-fall Injuries: Top Workers' Comp ClaimsSlip-and-fall Injuries: Top Workers' Comp Claims
Slip-and-fall Injuries: Top Workers' Comp Claims
 
Webinar One View, Multiple Systems No-Code Integration of Salesforce and ERPs
Webinar One View, Multiple Systems No-Code Integration of Salesforce and ERPsWebinar One View, Multiple Systems No-Code Integration of Salesforce and ERPs
Webinar One View, Multiple Systems No-Code Integration of Salesforce and ERPs
 
Computer Presentation.pptx ecommerce advantage s
Computer Presentation.pptx ecommerce advantage sComputer Presentation.pptx ecommerce advantage s
Computer Presentation.pptx ecommerce advantage s
 
社内勉強会資料_LLM Agents                              .
社内勉強会資料_LLM Agents                              .社内勉強会資料_LLM Agents                              .
社内勉強会資料_LLM Agents                              .
 
一比一原版(CBU毕业证)卡普顿大学毕业证成绩单
一比一原版(CBU毕业证)卡普顿大学毕业证成绩单一比一原版(CBU毕业证)卡普顿大学毕业证成绩单
一比一原版(CBU毕业证)卡普顿大学毕业证成绩单
 
Jpolillo Amazon PPC - Bid Optimization Sample
Jpolillo Amazon PPC - Bid Optimization SampleJpolillo Amazon PPC - Bid Optimization Sample
Jpolillo Amazon PPC - Bid Optimization Sample
 
Uber Ride Supply Demand Gap Analysis Report
Uber Ride Supply Demand Gap Analysis ReportUber Ride Supply Demand Gap Analysis Report
Uber Ride Supply Demand Gap Analysis Report
 
一比一原版(CBU毕业证)不列颠海角大学毕业证成绩单
一比一原版(CBU毕业证)不列颠海角大学毕业证成绩单一比一原版(CBU毕业证)不列颠海角大学毕业证成绩单
一比一原版(CBU毕业证)不列颠海角大学毕业证成绩单
 
Supply chain analytics to combat the effects of Ukraine-Russia-conflict
Supply chain analytics to combat the effects of Ukraine-Russia-conflictSupply chain analytics to combat the effects of Ukraine-Russia-conflict
Supply chain analytics to combat the effects of Ukraine-Russia-conflict
 
一比一原版(UVic毕业证)维多利亚大学毕业证成绩单
一比一原版(UVic毕业证)维多利亚大学毕业证成绩单一比一原版(UVic毕业证)维多利亚大学毕业证成绩单
一比一原版(UVic毕业证)维多利亚大学毕业证成绩单
 
一比一原版(TWU毕业证)西三一大学毕业证成绩单
一比一原版(TWU毕业证)西三一大学毕业证成绩单一比一原版(TWU毕业证)西三一大学毕业证成绩单
一比一原版(TWU毕业证)西三一大学毕业证成绩单
 
一比一原版(RUG毕业证)格罗宁根大学毕业证成绩单
一比一原版(RUG毕业证)格罗宁根大学毕业证成绩单一比一原版(RUG毕业证)格罗宁根大学毕业证成绩单
一比一原版(RUG毕业证)格罗宁根大学毕业证成绩单
 
一比一原版(YU毕业证)约克大学毕业证成绩单
一比一原版(YU毕业证)约克大学毕业证成绩单一比一原版(YU毕业证)约克大学毕业证成绩单
一比一原版(YU毕业证)约克大学毕业证成绩单
 
Innovative Methods in Media and Communication Research by Sebastian Kubitschk...
Innovative Methods in Media and Communication Research by Sebastian Kubitschk...Innovative Methods in Media and Communication Research by Sebastian Kubitschk...
Innovative Methods in Media and Communication Research by Sebastian Kubitschk...
 
Business update Q1 2024 Lar España Real Estate SOCIMI
Business update Q1 2024 Lar España Real Estate SOCIMIBusiness update Q1 2024 Lar España Real Estate SOCIMI
Business update Q1 2024 Lar España Real Estate SOCIMI
 
一比一原版(IIT毕业证)伊利诺伊理工大学毕业证成绩单
一比一原版(IIT毕业证)伊利诺伊理工大学毕业证成绩单一比一原版(IIT毕业证)伊利诺伊理工大学毕业证成绩单
一比一原版(IIT毕业证)伊利诺伊理工大学毕业证成绩单
 
一比一原版(NYU毕业证)纽约大学毕业证成绩单
一比一原版(NYU毕业证)纽约大学毕业证成绩单一比一原版(NYU毕业证)纽约大学毕业证成绩单
一比一原版(NYU毕业证)纽约大学毕业证成绩单
 

PRML 条件付き混合モデル 14.5

  • 2. Conditional Mixture Models p14.5.1 Mixtures of linear regression models p14.5.2 Mixtures of logistic models p14.5.3 Mixtures of experts
  • 4. Decision trees Positive interpretability Negative hard assignment Splits: axis-aligned of the input space. Regression: piecewise-constant predictions discontinuities at the split boundaries. <latexit sha1_base64="(null)">(null)</latexit> <latexit sha1_base64="(null)">(null)</latexit>
  • 5. Conditional Mixture Models Positive soft probabilistic splits of the input space Splits : functions of all of the input variables Negative no interpretability <latexit sha1_base64="(null)">(null)</latexit> <latexit sha1_base64="(null)">(null)</latexit> <latexit sha1_base64="(null)">(null)</latexit> <latexit sha1_base64="(null)">(null)</latexit> <latexit sha1_base64="(null)">(null)</latexit> <latexit sha1_base64="(null)">(null)</latexit> Fig.14.9 Model A <latexit sha1_base64="(null)">(null)</latexit> <latexit sha1_base64="(null)">(null)</latexit> Model B Model
  • 6. Mixture of Experts Model (MoE) <latexit sha1_base64="(null)">(null)</latexit> n a fully probabilistic tree-based model Hierarchical Mixture of Experts (HME) <latexit sha1_base64="(null)">(null)</latexit> Ex)
  • 7. Mixtures of linear regression models
  • 8. Gaussian Mixture Models n Clustering (unsupervised Learning) Fig. 9.5 <latexit sha1_base64="(null)">(null)</latexit> Observed data If k = 3 k <latexit sha1_base64="(null)">(null)</latexit> <latexit sha1_base64="(null)">(null)</latexit> <latexit sha1_base64="(null)">(null)</latexit> <latexit sha1_base64="(null)">(null)</latexit>
  • 9. Mixtures of linear regression models <latexit sha1_base64="(null)">(null)</latexit> n A general of switching regression , 9 3 79 7 97 3 9 8 7 70 7 A7 1 7 3 3 7 )93 . 7 ) 1 ( Linear regression
  • 10. Learning the parameters EM algorithm <latexit sha1_base64="(null)">(null)</latexit> <latexit sha1_base64="(null)">(null)</latexit> <latexit sha1_base64="(null)">(null)</latexit> <latexit sha1_base64="(null)">(null)</latexit> <latexit sha1_base64="(null)">(null)</latexit> <latexit sha1_base64="(null)">(null)</latexit> <latexit sha1_base64="(null)">(null)</latexit> Parameters Data set <latexit sha1_base64="(null)">(null)</latexit> <latexit sha1_base64="(null)">(null)</latexit>
  • 11. E step responsibilities initialized <latexit sha1_base64="(null)">(null)</latexit> <latexit sha1_base64="(null)">(null)</latexit> <latexit sha1_base64="(null)">(null)</latexit> <latexit sha1_base64="(null)">(null)</latexit> <latexit sha1_base64="(null)">(null)</latexit> The expectation of the complete-data log likelihood
  • 12. M step Fixed <latexit sha1_base64="(null)">(null)</latexit> <latexit sha1_base64="(null)">(null)</latexit> <latexit sha1_base64="(null)">(null)</latexit> <latexit sha1_base64="(null)">(null)</latexit> <latexit sha1_base64="(null)">(null)</latexit> Maximize the function about the constraint 1) <latexit sha1_base64="(null)">(null)</latexit> the mixing coefficients Using a Lagrange multiplier method,
  • 13. M step 2) <latexit sha1_base64="(null)">(null)</latexit> the parameter of the k-th linear regression model <latexit sha1_base64="(null)">(null)</latexit> Linear regression model (3.12) <latexit sha1_base64="(null)">(null)</latexit> weighted least squares problem If k = 3 k <latexit sha1_base64="(null)">(null)</latexit> n = 1 <latexit sha1_base64="(null)">(null)</latexit> <latexit sha1_base64="(null)">(null)</latexit> <latexit sha1_base64="(null)">(null)</latexit>
  • 14. M step <latexit sha1_base64="(null)">(null)</latexit> <latexit sha1_base64="(null)">(null)</latexit> <latexit sha1_base64="(null)">(null)</latexit> <latexit sha1_base64="(null)">(null)</latexit> the parameter of the k-th linear regression model <latexit sha1_base64="(null)">(null)</latexit> matrix notation <latexit sha1_base64="(null)">(null)</latexit> n are learned the data of the high responsibility .<latexit sha1_base64="(null)">(null)</latexit> <latexit sha1_base64="(null)">(null)</latexit> <latexit sha1_base64="(null)">(null)</latexit>
  • 15. M step a precision parameter<latexit sha1_base64="(null)">(null)</latexit> <latexit sha1_base64="(null)">(null)</latexit> <latexit sha1_base64="(null)">(null)</latexit>
  • 16. Example Fig.14.8 <latexit sha1_base64="(null)">(null)</latexit> <latexit sha1_base64="(null)">(null)</latexit> <latexit sha1_base64="(null)">(null)</latexit> <latexit sha1_base64="(null)">(null)</latexit> <latexit sha1_base64="(null)">(null)</latexit> <latexit sha1_base64="(null)">(null)</latexit>
  • 17. The predictive conditional density <latexit sha1_base64="(null)">(null)</latexit> <latexit sha1_base64="(null)">(null)</latexit> <latexit sha1_base64="(null)">(null)</latexit> No independent of the input
  • 19. Mixtures of logistic models <latexit sha1_base64="(null)">(null)</latexit> <latexit sha1_base64="(null)">(null)</latexit> logistic regression Model
  • 20. Learning the parameters <latexit sha1_base64="(null)">(null)</latexit> <latexit sha1_base64="(null)">(null)</latexit> EM algorithm Parameters data set <latexit sha1_base64="(null)">(null)</latexit> <latexit sha1_base64="(null)">(null)</latexit> the complete-data log likelihood <latexit sha1_base64="(null)">(null)</latexit> <latexit sha1_base64="(null)">(null)</latexit> <latexit sha1_base64="(null)">(null)</latexit> <latexit sha1_base64="(null)">(null)</latexit> <latexit sha1_base64="(null)">(null)</latexit> <latexit sha1_base64="(null)">(null)</latexit> }
  • 21. E step <latexit sha1_base64="(null)">(null)</latexit> <latexit sha1_base64="(null)">(null)</latexit>
  • 22. M step <latexit sha1_base64="(null)">(null)</latexit> Fixed <latexit sha1_base64="(null)">(null)</latexit> <latexit sha1_base64="(null)">(null)</latexit> <latexit sha1_base64="(null)">(null)</latexit> Maximize the function about <latexit sha1_base64="(null)">(null)</latexit> the mixing coefficients }
  • 23. M step <latexit sha1_base64="(null)">(null)</latexit> the parameter of the k-th logistic regression model does not have a closed-form solution.<latexit sha1_base64="(null)">(null)</latexit> (IRLS) algorithm <latexit sha1_base64="(null)">(null)</latexit> n Need the gradient and Hessian <latexit sha1_base64="(null)">(null)</latexit>
  • 24. IRLS algorithm <latexit sha1_base64="(null)">(null)</latexit> <latexit sha1_base64="(null)">(null)</latexit> Gradient Hessian <latexit sha1_base64="(null)">(null)</latexit> While not convergence do <latexit sha1_base64="(null)">(null)</latexit>
  • 25. A mixture of logistic regression models true probability of the class label single logistic regression mixture of two logistic regression <latexit sha1_base64="(null)">(null)</latexit> Model A Model B
  • 27. Mixtures of experts <latexit sha1_base64="(null)">(null)</latexit> Gating function: determine which expert are dominant in which region. be represented by a linear softmax or sigmoid. n a mixture of experts model Expert: can model in different regions of input space. predict in their own region. <latexit sha1_base64="(null)">(null)</latexit> <latexit sha1_base64="(null)">(null)</latexit>
  • 28. Gating function <latexit sha1_base64="(null)">(null)</latexit> <latexit sha1_base64="(null)">(null)</latexit> <latexit sha1_base64="(null)">(null)</latexit> <latexit sha1_base64="(null)">(null)</latexit> <latexit sha1_base64="(null)">(null)</latexit> <latexit sha1_base64="(null)">(null)</latexit> <latexit sha1_base64="(null)">(null)</latexit> <latexit sha1_base64="(null)">(null)</latexit> <latexit sha1_base64="(null)">(null)</latexit> <latexit sha1_base64="(null)">(null)</latexit> softmaxsigmoid <latexit sha1_base64="(null)">(null)</latexit>
  • 29. MoE Linear regression model <latexit sha1_base64="(null)">(null)</latexit> <latexit sha1_base64="(null)">(null)</latexit> <latexit sha1_base64="(null)">(null)</latexit> <latexit sha1_base64="(null)">(null)</latexit> <latexit sha1_base64="(null)">(null)</latexit> sigmoid <latexit sha1_base64="(null)">(null)</latexit> <latexit sha1_base64="(null)">(null)</latexit> MoE <latexit sha1_base64="(null)">(null)</latexit> <latexit sha1_base64="(null)">(null)</latexit> Omg!
  • 30. Hierarchical Mixture of Experts <latexit sha1_base64="(null)">(null)</latexit> <latexit sha1_base64="(null)">(null)</latexit> <latexit sha1_base64="(null)">(null)</latexit> <latexit sha1_base64="(null)">(null)</latexit> <latexit sha1_base64="(null)">(null)</latexit> <latexit sha1_base64="(null)">(null)</latexit> <latexit sha1_base64="(null)">(null)</latexit> <latexit sha1_base64="(null)">(null)</latexit> <latexit sha1_base64="(null)">(null)</latexit> <latexit sha1_base64="(null)">(null)</latexit> HME Negative l large number of parameters =>Bayesian HMoE (2003)
  • 31. 5.6 mixture density network <latexit sha1_base64="(null)">(null)</latexit> <latexit sha1_base64="(null)">(null)</latexit> <latexit sha1_base64="(null)">(null)</latexit> <latexit sha1_base64="(null)">(null)</latexit> output <latexit sha1_base64="(null)">(null)</latexit> <latexit sha1_base64="(null)">(null)</latexit> <latexit sha1_base64="(null)">(null)</latexit> <latexit sha1_base64="(null)">(null)</latexit> n share the hidden units of the neural network. n the splits of the input space are relaxed, can be nolinear! <latexit sha1_base64="(null)">(null)</latexit> <latexit sha1_base64="(null)">(null)</latexit> <latexit sha1_base64="(null)">(null)</latexit>
  • 32. Bayesian Hierarchical Mixtures of Experts Bishop, C. M. and M. Svense ́n (2003). Bayesian hierarchical mixtures of experts. In U. Kjaerulff and C. Meek (Eds.), Proceedings Nineteenth Conference on Uncertainty in Artificial Intelligence, pp. 57–64. Morgan Kaufmann. Application: the kinematics of robot arms inverse problem two pattern Input: parameters and angles of the robot arm. Output : the end effector of the robot arm .
  • 33. Bayesian Hierarchical Mixtures of Experts <latexit sha1_base64="(null)">(null)</latexit> <latexit sha1_base64="(null)">(null)</latexit> Expert <latexit sha1_base64="(null)">(null)</latexit> <latexit sha1_base64="(null)">(null)</latexit> <latexit sha1_base64="(null)">(null)</latexit> <latexit sha1_base64="(null)">(null)</latexit> <latexit sha1_base64="(null)">(null)</latexit> <latexit sha1_base64="(null)">(null)</latexit> <latexit sha1_base64="(null)">(null)</latexit> <latexit sha1_base64="(null)">(null)</latexit> Graphical Model branch point
  • 34. Probabilistic tree-based model <latexit sha1_base64="(null)">(null)</latexit> <latexit sha1_base64="(null)">(null)</latexit> <latexit sha1_base64="(null)">(null)</latexit> <latexit sha1_base64="(null)">(null)</latexit> : The total number of experts <latexit sha1_base64="(null)">(null)</latexit> <latexit sha1_base64="(null)">(null)</latexit> <latexit sha1_base64="(null)">(null)</latexit> <latexit sha1_base64="(null)">(null)</latexit>
  • 35. HMoE Model <latexit sha1_base64="(null)">(null)</latexit> <latexit sha1_base64="(null)">(null)</latexit> branch pointExpert <latexit sha1_base64="(null)">(null)</latexit> <latexit sha1_base64="(null)">(null)</latexit> <latexit sha1_base64="(null)">(null)</latexit> Data set Likelihood
  • 36. Joint probability <latexit sha1_base64="(null)">(null)</latexit> <latexit sha1_base64="(null)">(null)</latexit> <latexit sha1_base64="(null)">(null)</latexit> <latexit sha1_base64="(null)">(null)</latexit> <latexit sha1_base64="(null)">(null)</latexit> <latexit sha1_base64="(null)">(null)</latexit> ⇨ Variational Inference
  • 37. fin.