SlideShare a Scribd company logo
1 of 34
Download to read offline
Structured Probabilistic
Models for Deep Learning
Lecture slides for Chapter 16 of Deep Learning
www.deeplearningbook.org
Ian Goodfellow
2016-10-04
(Goodfellow 2017)
Roadmap
• Challenges of Unstructured Modeling
• Using Graphs to Describe Model Structure
• Sampling from Graphical Models
• Advantages of Structured Modeling
• Structure Learning and Latent Variables
• Inference and Approximate Inference
• The Deep Learning Approach to Structured Probabilistic Modeling
(Goodfellow 2017)
Tasks for Generative Models
• Density estimation
• Denoising
• Sample generation
• Missing value imputation
• Conditional sample generation
• Conditional density estimation
(Goodfellow 2017)
Samples from a BEGAN
(Berthelot et al, 2017)
Images are 128 pixels wide, 128 pixels tall
R, G, and B pixel at each location.
(Goodfellow 2017)
Cost of Tabular Approach
g P(x) by storing
quires kn param
not feasible for s
Number of values per variable
Number of variables
For BEGAN faces: 256
For BEGAN faces:
128 ⇥ 128 = 16384
There are roughly ten to the power of forty thousand
times more points in the discretized domain of the BEGAN
face model than there are atoms in the universe.
(Goodfellow 2017)
Tabular Approach is Infeasible
• Memory: cannot store that many parameters
• Runtime: inference and sampling are both slow
• Statistical efficiency: extremely high number of
parameters requires extremely high number of
training examples
(Goodfellow 2017)
Roadmap
• Challenges of Unstructured Modeling
• Using Graphs to Describe Model Structure
• Sampling from Graphical Models
• Advantages of Structured Modeling
• Structure Learning and Latent Variables
• Inference and Approximate Inference
• The Deep Learning Approach to Structured Probabilistic Modeling
(Goodfellow 2017)
Insight of Model Structure
• Most variables influence each other
• Most variables do not influence each other directly
• Describe influence with a graph
• Edges represent direct influence
• Paths represent indirect influence
• Computational and statistical savings come from omissions
of edges
(Goodfellow 2017)
Directed Models
ED PROBABILISTIC MODELS FOR DEEP LEA
t0
t0 t1
t1 t2
t2
Alice Bob Carol
phical model depicting the relay race example. A
shing time t1, because Bob does not get to start
arol only gets to start running after Bob finis
fluences Carol’s finishing time t2.
Figure 16.2
elay race example from section 16.1, suppose we name
Bob’s finishing time t1, and Carol’s finishing time t2.
mate of t1 depends on t0. Our estimate of t2 depends
rectly on t0. We can draw this relationship in a directed
d in figure 16.2.
raphical model defined on variables x is defined by a
hose vertices are the random variables in the model, and
al probability distributions p(xi | PaG(xi)), where
of xi in G. The probability distribution over x is given
p(x) = ⇧ip(xi | PaG(xi)). (16.1)
le, this means that, using the graph drawn in figure 16.2,
, t1, t2) = p(t0)p(t1 | t0)p(t2 | t1). (16.2)
e t0, Bob’s finishing time t1, and Carol’s finishing time t2.
ur estimate of t1 depends on t0. Our estimate of t2 depends
y indirectly on t0. We can draw this relationship in a directed
strated in figure 16.2.
ted graphical model defined on variables x is defined by a
h G whose vertices are the random variables in the model, and
itional probability distributions p(xi | PaG(xi)), where
rents of xi in G. The probability distribution over x is given
p(x) = ⇧ip(xi | PaG(xi)). (16.1)
xample, this means that, using the graph drawn in figure 16.2,
p(t0, t1, t2) = p(t0)p(t1 | t0)p(t2 | t1). (16.2)
time seeing a structured probabilistic model in action. We
t of using it, to observe how structured modeling has many
Directed models work best when influence
clearly flows in one direction
(Goodfellow 2017)
Undirected Models
RUCTURED PROBABILISTIC MODELS FOR DEE
hr
hr hy
hy hc
hc
undirected graph representing how your roomma
r work colleague’s health hc affect each other. You
other with a cold, and you and your work colleagu
Undirected models work best when influence
has no clear direction or is best modeled as
flowing in both directions
Do you have a cold?
Does your
roommate have a
cold?
Does your work
colleague have a
cold?
(Goodfellow 2017)
Undirected Models
ph G. For each clique C in the graph,3 a factor (C)
al) measures the affinity of the variables in that clique
ssible joint states. The factors are constrained to be
efine an unnormalized probability distribution
p̃(x) = ⇧C2G (C). (16.3)
bility distribution is efficient to work with so long as
encodes the idea that states with higher affinity are
in a Bayesian network, there is little structure to the
here is nothing to guarantee that multiplying them
obability distribution. See figure 16.4 for an example
mation from an undirected graph.
unction
ability distribution is guaranteed to be nonnegative
teed to sum or integrate to 1. To obtain a valid
must use the corresponding normalized probability
p(x) =
1
Z
p̃(x), (16.4)
esults in the probability distribution summing or
Z =
Z
p̃(x)dx. (16.5)
unction
ability distribution is guaranteed to be nonnegative
teed to sum or integrate to 1. To obtain a valid
must use the corresponding normalized probability
p(x) =
1
Z
p̃(x), (16.4)
esults in the probability distribution summing or
Z =
Z
p̃(x)dx. (16.5)
Unnormalized probability
Partition function
(Goodfellow 2017)
Separation
R 16. STRUCTURED PROBABILISTIC MODELS FOR DEEP LEARN
a s b a s b
(a) (b)
6: (a) The path between random variable a and random variable b t
cause s is not observed. This means that a and b are not separated.
in, to indicate that it is observed. Because the only path between
and that path is inactive, we can conclude that a and b are separat
Separation and D-Separation
When s is not observed,
influence can flow from a
to b and vice versa through s.
When s is observed,
it blocks the flow of
influence between a
and b: they are
separated
(Goodfellow 2017)
Separation example
HAPTER 16. STRUCTURED PROBABILISTIC MODELS FOR DEEP LEARN
a
b c
d
igure 16.7: An example of reading separation properties from an undirected gr
is shaded to indicate that it is observed. Because observing b blocks the only
to c, we say that a and c are separated from each other given b. The observ
so blocks one path between a and d, but there is a second, active path betw
herefore, a and d are not separated given b.
The nodes a and c are separated
One path between a and d is still active,
though the other path is blocked, so these
two nodes are not separated.
(Goodfellow 2017)
d-separation
The flow of influence is more complicated for directed models
The path between a and b is active for all of these graphs:
CHAPTER 16. STRUCTURED PROBABILISTIC MODELS FOR DEEP LEARNING
a s b
a s b a s b
(a) (b)
a
s
b a s b
c
(c) (d)
CHAPTER 16. STRUCTURED PROBABILISTIC MODELS FOR DEEP LEARNING
a s b
a s b a s b
(a) (b)
a
s
b a s b
c
(c) (d)
Figure 16.8: All the kinds of active paths of length two that can exist between random
variables a and b. (a) Any path with arrows proceeding directly from a to b or vice versa.
This kind of path becomes blocked if s is observed. We have already seen this kind of
path in the relay race example. (b) Variables a and b are connected by a common cause s.
CHAPTER 16. STRUCTURED PROBABILISTIC MODELS FOR DEEP LEARNING
a s b
a s b a s b
(a) (b)
a
s
b a s b
c
CHAPTER 16. STRUCTURED PROBABILISTIC MODELS FOR DEEP LEARNING
a s b
a s b a s b
(a) (b)
a
s
b a s b
c
(c) (d)
Figure 16.8: All the kinds of active paths of length two that can exist between random
(Goodfellow 2017)
d-separation example
a b
c
d e
graph, we can read out several d-separation properties. Examples
parated given the empty set
parated given c
parated given c
some variables are no longer d-separated when we observe some
d
Figure 16.9: From this graph, we can read out sever
include:
• a and b are d-separated given the empty set
• a and e are d-separated given c
• d and e are d-separated given c
We can also see that some variables are no longer
variables:
• a and b are not d-separated given c
• a and b are not d-separated given d
c
d
Figure 16.9: From this graph, we can read ou
include:
• a and b are d-separated given the empt
• a and e are d-separated given c
• d and e are d-separated given c
We can also see that some variables are no
variables:
• a and b are not d-separated given c
• a and b are not d-separated given d
Observing variables can activate paths!
(Goodfellow 2017)
A complete graph can represent
any probability distribution
R 16. STRUCTURED PROBABILISTIC MODELS FOR DEEP LEARN
10: Examples of complete graphs, which can describe any probability d
how examples with four random variables. (Left) The complete undirec
directed case, the complete graph is unique. (Right) A complete direc
ected case, there is not a unique complete graph. We choose an orde
and draw an arc from each variable to every variable that comes afte
The benefits of graphical models come from omitting edges
(Goodfellow 2017)
Converting between graphs
• Any specific probability distribution can be
represented by either an undirected or a directed
graph
• Some probability distributions have conditional
independences that one kind of graph fails to imply
(the distribution is simpler than the graph
describes; need to know the conditional probability
distributions to see the independences)
(Goodfellow 2017)
Converting directed to
undirected
APTER 16. STRUCTURED PROBABILISTIC MODELS FOR DEEP LEARNING
h1
h1 h2
h2 h3
h3
v1
v1 v2
v2 v3
v3
a b
c
a
c
b
h1
h1 h2
h2 h3
h3
v1
v1 v2
v2 v3
v3
a b
c
a
c
b
ure 16.11: Examples of converting directed models (top row) to undirected models
Must add an edge between
unconnected coparents
(Goodfellow 2017)
Converting undirected to
directed
ER 16. STRUCTURED PROBABILISTIC MODELS FOR DEEP LEARNING
a b
d c
a b
d c
a b
d c
16.12: Converting an undirected model to a directed model. (Left) This und
cannot be converted to a directed model because it has a loop of length fo
ds. Specifically, the undirected model encodes two different independenc
cted model can capture simultaneously: a?c | {b, d} and b?d | {a, c}. (C
vert the undirected model to a directed model, we must triangulate the
uring that all loops of greater than length three have a chord. To do so,
No loops of
length
greater than
three allowed!
Add edges to
triangulate
long loops
Assign
directions to
edges. No
directed cycles
allowed.
(Goodfellow 2017)
Factor graphs are less
ambiguous
CHAPTER 16. STRUCTURED PROBABILISTIC MODELS FOR DEEP LEARNING
a b
c
a b
c
f1
f1
a b
c
f1
f1
f2
f2
f3
f3
Figure 16.13: An example of how a factor graph can resolve ambiguity in the interpretation
of undirected networks. (Left) An undirected network with a clique involving three
variables: a, b and c. (Center) A factor graph corresponding to the same undirected
model. This factor graph has one factor over all three variables. (Right) Another valid
actor graph for the same undirected model. This factor graph has three factors, each
over only two variables. Representation, inference, and learning are all asymptotically
heaper in this factor graph than in the factor graph depicted in the center, even though
Undirected graph: is
this three pairwise
potentials or one
potential over three
variables?
Factor graphs
disambiguate by
placing each potential
in the graph
(Goodfellow 2017)
Roadmap
• Challenges of Unstructured Modeling
• Using Graphs to Describe Model Structure
• Sampling from Graphical Models
• Advantages of Structured Modeling
• Structure Learning and Latent Variables
• Inference and Approximate Inference
• The Deep Learning Approach to Structured Probabilistic Modeling
(Goodfellow 2017)
Sampling from directed models
• Easy and fast to draw fair samples from the whole
model
• Ancestral sampling: pass through the graph in
topological order. Sample each node given its
parents.
• Harder to sample some nodes given other nodes,
unless the observed nodes are at the start of the
topology
(Goodfellow 2017)
Sampling from undirected
models
• Usually requires Markov chains
• Usually cannot be done exactly
• Usually requires multiple iterations even to
approximate
• Described in Chapter 17
(Goodfellow 2017)
Roadmap
• Challenges of Unstructured Modeling
• Using Graphs to Describe Model Structure
• Sampling from Graphical Models
• Advantages of Structured Modeling
• Structure Learning and Latent Variables
• Inference and Approximate Inference
• The Deep Learning Approach to Structured Probabilistic Modeling
(Goodfellow 2017)
Tabular Case
• Assume each node has a tabular distribution given its parents
• Memory, sampling, inference are now exponential in number of
variables in factor with largest scope
• For many interesting models, this is very small
• e.g., RBMs: all factor scopes are size 2 or 1
• Previously, these costs were exponential in total number of nodes
• Statistically, much easier to estimate this manageable number of
parameters
(Goodfellow 2017)
Roadmap
• Challenges of Unstructured Modeling
• Using Graphs to Describe Model Structure
• Sampling from Graphical Models
• Advantages of Structured Modeling
• Structure Learning and Latent Variables
• Inference and Approximate Inference
• The Deep Learning Approach to Structured Probabilistic Modeling
(Goodfellow 2017)
Learning about dependencies
• Suppose we have thousands of variables
• Maybe gene expression data
• Some interact
• Some do not
• We do not know which ahead of time
(Goodfellow 2017)
Structure learning strategy
• Try out several graphs
• See which graph does best job of some criterion
• Fitting training set with small model complexity
• Fitting validation set
• Iterative search, propose new graphs similar to best
graph so far (remove edge / add edge / flip edge)
(Goodfellow 2017)
Latent variable strategy
• Use one graph structure
• Many latent variables
• Dense connections of latent variables to observed variables
• Parameters learn that each latent variable interacts
strongly with only a small subset of observed variables
• Trainable just with gradient descent; no discrete search
over graphs
(Goodfellow 2017)
Roadmap
• Challenges of Unstructured Modeling
• Using Graphs to Describe Model Structure
• Sampling from Graphical Models
• Advantages of Structured Modeling
• Structure Learning and Latent Variables
• Inference and Approximate Inference
• The Deep Learning Approach to Structured Probabilistic Modeling
(Goodfellow 2017)
Inference and Approximate
Inference
• Inferring marginal distribution over some nodes or
conditional distribution of some nodes given other nodes
is #P hard
• NP-hardness describes decision problems. #P-
hardness describes counting problems, e.g., how many
solutions are there to a problem where finding one
solution is NP-hard
• We usually rely on approximate inference, described in
chapter 19
(Goodfellow 2017)
Roadmap
• Challenges of Unstructured Modeling
• Using Graphs to Describe Model Structure
• Sampling from Graphical Models
• Advantages of Structured Modeling
• Structure Learning and Latent Variables
• Inference and Approximate Inference
• The Deep Learning Approach to Structured Probabilistic Modeling
(Goodfellow 2017)
Deep Learning Stylistic
Tendencies
• Nodes organized into layers
• High amount of connectivity between layers
• Examples: RBMs, DBMs, GANs, VAEs
CHAPTER 16. STRUCTURED PROBABILISTIC MODELS FOR DEEP LEARNING
h1
h1 h2
h2 h3
h3
v1
v1 v2
v2 v3
v3
h4
h4
Figure 16.14: An RBM drawn as a Markov network.
(Goodfellow 2017)
For more information…

More Related Content

Similar to 16_graphical_models.pdf

Slides Chapter10.1 10.2
Slides Chapter10.1 10.2Slides Chapter10.1 10.2
Slides Chapter10.1 10.2
showslidedump
 
CombinatorialModels_Poster2016 (2)
CombinatorialModels_Poster2016 (2)CombinatorialModels_Poster2016 (2)
CombinatorialModels_Poster2016 (2)
Yicheng Pu
 
Analysis of the Boston Housing Data from the 1970 census
Analysis of the Boston Housing Data from the 1970 censusAnalysis of the Boston Housing Data from the 1970 census
Analysis of the Boston Housing Data from the 1970 census
Shuai Yuan
 
for sbi so Ds c c++ unix rdbms sql cn os
for sbi so   Ds c c++ unix rdbms sql cn osfor sbi so   Ds c c++ unix rdbms sql cn os
for sbi so Ds c c++ unix rdbms sql cn os
alisha230390
 
Essay On Linear Function
Essay On Linear FunctionEssay On Linear Function
Essay On Linear Function
Angie Lee
 

Similar to 16_graphical_models.pdf (20)

17_monte_carlo.pdf
17_monte_carlo.pdf17_monte_carlo.pdf
17_monte_carlo.pdf
 
Slides Chapter10.1 10.2
Slides Chapter10.1 10.2Slides Chapter10.1 10.2
Slides Chapter10.1 10.2
 
15_representation.pdf
15_representation.pdf15_representation.pdf
15_representation.pdf
 
IRJET- On the Generalization of Lami’s Theorem
IRJET- On the Generalization of Lami’s TheoremIRJET- On the Generalization of Lami’s Theorem
IRJET- On the Generalization of Lami’s Theorem
 
Map Coloring and Some of Its Applications
Map Coloring and Some of Its Applications Map Coloring and Some of Its Applications
Map Coloring and Some of Its Applications
 
The Power of Graphs in Immersive Communications
The Power of Graphs in Immersive CommunicationsThe Power of Graphs in Immersive Communications
The Power of Graphs in Immersive Communications
 
Chap 8 graph
Chap 8 graphChap 8 graph
Chap 8 graph
 
CombinatorialModels_Poster2016 (2)
CombinatorialModels_Poster2016 (2)CombinatorialModels_Poster2016 (2)
CombinatorialModels_Poster2016 (2)
 
Solution of the Special Case "CLP" of the Problem of Apollonius via Vector Ro...
Solution of the Special Case "CLP" of the Problem of Apollonius via Vector Ro...Solution of the Special Case "CLP" of the Problem of Apollonius via Vector Ro...
Solution of the Special Case "CLP" of the Problem of Apollonius via Vector Ro...
 
FREQUENT SUBGRAPH MINING ALGORITHMS - A SURVEY AND FRAMEWORK FOR CLASSIFICATION
FREQUENT SUBGRAPH MINING ALGORITHMS - A SURVEY AND FRAMEWORK FOR CLASSIFICATIONFREQUENT SUBGRAPH MINING ALGORITHMS - A SURVEY AND FRAMEWORK FOR CLASSIFICATION
FREQUENT SUBGRAPH MINING ALGORITHMS - A SURVEY AND FRAMEWORK FOR CLASSIFICATION
 
Introduction to Graphs
Introduction to GraphsIntroduction to Graphs
Introduction to Graphs
 
Tarea1
Tarea1Tarea1
Tarea1
 
50120130406025 2
50120130406025 250120130406025 2
50120130406025 2
 
Graphs data Structure
Graphs data StructureGraphs data Structure
Graphs data Structure
 
Analysis of the Boston Housing Data from the 1970 census
Analysis of the Boston Housing Data from the 1970 censusAnalysis of the Boston Housing Data from the 1970 census
Analysis of the Boston Housing Data from the 1970 census
 
Lecture 2.3.1 Graph.pptx
Lecture 2.3.1 Graph.pptxLecture 2.3.1 Graph.pptx
Lecture 2.3.1 Graph.pptx
 
Spanningtreesppt
SpanningtreespptSpanningtreesppt
Spanningtreesppt
 
for sbi so Ds c c++ unix rdbms sql cn os
for sbi so   Ds c c++ unix rdbms sql cn osfor sbi so   Ds c c++ unix rdbms sql cn os
for sbi so Ds c c++ unix rdbms sql cn os
 
Congruence between triangles
Congruence between trianglesCongruence between triangles
Congruence between triangles
 
Essay On Linear Function
Essay On Linear FunctionEssay On Linear Function
Essay On Linear Function
 

More from KSChidanandKumarJSSS (8)

12_applications.pdf
12_applications.pdf12_applications.pdf
12_applications.pdf
 
10_rnn.pdf
10_rnn.pdf10_rnn.pdf
10_rnn.pdf
 
14_autoencoders.pdf
14_autoencoders.pdf14_autoencoders.pdf
14_autoencoders.pdf
 
Adversarial ML - Part 2.pdf
Adversarial ML - Part 2.pdfAdversarial ML - Part 2.pdf
Adversarial ML - Part 2.pdf
 
Adversarial ML - Part 1.pdf
Adversarial ML - Part 1.pdfAdversarial ML - Part 1.pdf
Adversarial ML - Part 1.pdf
 
18_partition.pdf
18_partition.pdf18_partition.pdf
18_partition.pdf
 
8803-09-lec16.pdf
8803-09-lec16.pdf8803-09-lec16.pdf
8803-09-lec16.pdf
 
13_linear_factors.pdf
13_linear_factors.pdf13_linear_factors.pdf
13_linear_factors.pdf
 

Recently uploaded

obat aborsi pemalang wa 081336238223 jual obat aborsi cytotec asli di pemalan...
obat aborsi pemalang wa 081336238223 jual obat aborsi cytotec asli di pemalan...obat aborsi pemalang wa 081336238223 jual obat aborsi cytotec asli di pemalan...
obat aborsi pemalang wa 081336238223 jual obat aborsi cytotec asli di pemalan...
yulianti213969
 
obat aborsi Klaten wa 082135199655 jual obat aborsi cytotec asli di Klaten
obat aborsi Klaten wa 082135199655 jual obat aborsi cytotec asli di Klatenobat aborsi Klaten wa 082135199655 jual obat aborsi cytotec asli di Klaten
obat aborsi Klaten wa 082135199655 jual obat aborsi cytotec asli di Klaten
siskavia95
 
wepik-mastering-the-art-of-effective-communication-20240222122545I5QF.pptx
wepik-mastering-the-art-of-effective-communication-20240222122545I5QF.pptxwepik-mastering-the-art-of-effective-communication-20240222122545I5QF.pptx
wepik-mastering-the-art-of-effective-communication-20240222122545I5QF.pptx
BChella
 
一比一原版西悉尼大学毕业证(UWS毕业证)成绩单如可办理
一比一原版西悉尼大学毕业证(UWS毕业证)成绩单如可办理一比一原版西悉尼大学毕业证(UWS毕业证)成绩单如可办理
一比一原版西悉尼大学毕业证(UWS毕业证)成绩单如可办理
awuboo
 
Sun day thang 4 sun life team trung dai
Sun day thang 4 sun life team trung daiSun day thang 4 sun life team trung dai
Sun day thang 4 sun life team trung dai
GiangTra20
 
一比一定制英国赫特福德大学毕业证学位证书
一比一定制英国赫特福德大学毕业证学位证书一比一定制英国赫特福德大学毕业证学位证书
一比一定制英国赫特福德大学毕业证学位证书
AS
 
obat aborsi Cibitung wa 082223595321 jual obat aborsi cytotec asli di Cibitung
obat aborsi Cibitung wa 082223595321 jual obat aborsi cytotec asli di Cibitungobat aborsi Cibitung wa 082223595321 jual obat aborsi cytotec asli di Cibitung
obat aborsi Cibitung wa 082223595321 jual obat aborsi cytotec asli di Cibitung
siskavia916
 
prodtion diary updated.pptxrfhkfjgjggjkgjk
prodtion diary updated.pptxrfhkfjgjggjkgjkprodtion diary updated.pptxrfhkfjgjggjkgjk
prodtion diary updated.pptxrfhkfjgjggjkgjk
LeonBraley
 
prodtion diary updated.pptxyyghktyuitykiyu
prodtion diary updated.pptxyyghktyuitykiyuprodtion diary updated.pptxyyghktyuitykiyu
prodtion diary updated.pptxyyghktyuitykiyu
LeonBraley
 
Kala jadu MANPASAND shadi ka asli taweez dene wale amil baba in pakistan,kala...
Kala jadu MANPASAND shadi ka asli taweez dene wale amil baba in pakistan,kala...Kala jadu MANPASAND shadi ka asli taweez dene wale amil baba in pakistan,kala...
Kala jadu MANPASAND shadi ka asli taweez dene wale amil baba in pakistan,kala...
vishnuram11000
 
一比一原版(CCSF毕业证书)旧金山城市学院毕业证原件一模一样
一比一原版(CCSF毕业证书)旧金山城市学院毕业证原件一模一样一比一原版(CCSF毕业证书)旧金山城市学院毕业证原件一模一样
一比一原版(CCSF毕业证书)旧金山城市学院毕业证原件一模一样
basxuke
 
obat aborsi klaten wa 081336238223 jual obat aborsi cytotec asli di klaten54-...
obat aborsi klaten wa 081336238223 jual obat aborsi cytotec asli di klaten54-...obat aborsi klaten wa 081336238223 jual obat aborsi cytotec asli di klaten54-...
obat aborsi klaten wa 081336238223 jual obat aborsi cytotec asli di klaten54-...
yulianti213969
 

Recently uploaded (20)

obat aborsi pemalang wa 081336238223 jual obat aborsi cytotec asli di pemalan...
obat aborsi pemalang wa 081336238223 jual obat aborsi cytotec asli di pemalan...obat aborsi pemalang wa 081336238223 jual obat aborsi cytotec asli di pemalan...
obat aborsi pemalang wa 081336238223 jual obat aborsi cytotec asli di pemalan...
 
obat aborsi Klaten wa 082135199655 jual obat aborsi cytotec asli di Klaten
obat aborsi Klaten wa 082135199655 jual obat aborsi cytotec asli di Klatenobat aborsi Klaten wa 082135199655 jual obat aborsi cytotec asli di Klaten
obat aborsi Klaten wa 082135199655 jual obat aborsi cytotec asli di Klaten
 
Our great adventures in Warsaw - On arrival training
Our great adventures in Warsaw - On arrival trainingOur great adventures in Warsaw - On arrival training
Our great adventures in Warsaw - On arrival training
 
Reading 8 Artworks about books and readers
Reading 8 Artworks about books and readersReading 8 Artworks about books and readers
Reading 8 Artworks about books and readers
 
Russian ℂall Girls Vijay Nagar Hire Me Neha 96XXXXXXX Top Class ℂall Girl Ser...
Russian ℂall Girls Vijay Nagar Hire Me Neha 96XXXXXXX Top Class ℂall Girl Ser...Russian ℂall Girls Vijay Nagar Hire Me Neha 96XXXXXXX Top Class ℂall Girl Ser...
Russian ℂall Girls Vijay Nagar Hire Me Neha 96XXXXXXX Top Class ℂall Girl Ser...
 
Green Lantern the Animated Series Practice Boards by Phoebe Holmes.pdf
Green Lantern the Animated Series Practice Boards by Phoebe Holmes.pdfGreen Lantern the Animated Series Practice Boards by Phoebe Holmes.pdf
Green Lantern the Animated Series Practice Boards by Phoebe Holmes.pdf
 
wepik-mastering-the-art-of-effective-communication-20240222122545I5QF.pptx
wepik-mastering-the-art-of-effective-communication-20240222122545I5QF.pptxwepik-mastering-the-art-of-effective-communication-20240222122545I5QF.pptx
wepik-mastering-the-art-of-effective-communication-20240222122545I5QF.pptx
 
一比一原版西悉尼大学毕业证(UWS毕业证)成绩单如可办理
一比一原版西悉尼大学毕业证(UWS毕业证)成绩单如可办理一比一原版西悉尼大学毕业证(UWS毕业证)成绩单如可办理
一比一原版西悉尼大学毕业证(UWS毕业证)成绩单如可办理
 
Sun day thang 4 sun life team trung dai
Sun day thang 4 sun life team trung daiSun day thang 4 sun life team trung dai
Sun day thang 4 sun life team trung dai
 
一比一定制英国赫特福德大学毕业证学位证书
一比一定制英国赫特福德大学毕业证学位证书一比一定制英国赫特福德大学毕业证学位证书
一比一定制英国赫特福德大学毕业证学位证书
 
obat aborsi Cibitung wa 082223595321 jual obat aborsi cytotec asli di Cibitung
obat aborsi Cibitung wa 082223595321 jual obat aborsi cytotec asli di Cibitungobat aborsi Cibitung wa 082223595321 jual obat aborsi cytotec asli di Cibitung
obat aborsi Cibitung wa 082223595321 jual obat aborsi cytotec asli di Cibitung
 
prodtion diary updated.pptxrfhkfjgjggjkgjk
prodtion diary updated.pptxrfhkfjgjggjkgjkprodtion diary updated.pptxrfhkfjgjggjkgjk
prodtion diary updated.pptxrfhkfjgjggjkgjk
 
K_ E_ S_ Retail Store Scavenger Hunt.pptx
K_ E_ S_ Retail Store Scavenger Hunt.pptxK_ E_ S_ Retail Store Scavenger Hunt.pptx
K_ E_ S_ Retail Store Scavenger Hunt.pptx
 
prodtion diary updated.pptxyyghktyuitykiyu
prodtion diary updated.pptxyyghktyuitykiyuprodtion diary updated.pptxyyghktyuitykiyu
prodtion diary updated.pptxyyghktyuitykiyu
 
Kala jadu MANPASAND shadi ka asli taweez dene wale amil baba in pakistan,kala...
Kala jadu MANPASAND shadi ka asli taweez dene wale amil baba in pakistan,kala...Kala jadu MANPASAND shadi ka asli taweez dene wale amil baba in pakistan,kala...
Kala jadu MANPASAND shadi ka asli taweez dene wale amil baba in pakistan,kala...
 
Neighborhood Guide To Atlanta’s Awe-Inspiring Art Galleries
Neighborhood Guide To Atlanta’s Awe-Inspiring Art GalleriesNeighborhood Guide To Atlanta’s Awe-Inspiring Art Galleries
Neighborhood Guide To Atlanta’s Awe-Inspiring Art Galleries
 
Reading 1 Artworks about books and readers
Reading 1 Artworks about books and readersReading 1 Artworks about books and readers
Reading 1 Artworks about books and readers
 
一比一原版(CCSF毕业证书)旧金山城市学院毕业证原件一模一样
一比一原版(CCSF毕业证书)旧金山城市学院毕业证原件一模一样一比一原版(CCSF毕业证书)旧金山城市学院毕业证原件一模一样
一比一原版(CCSF毕业证书)旧金山城市学院毕业证原件一模一样
 
obat aborsi klaten wa 081336238223 jual obat aborsi cytotec asli di klaten54-...
obat aborsi klaten wa 081336238223 jual obat aborsi cytotec asli di klaten54-...obat aborsi klaten wa 081336238223 jual obat aborsi cytotec asli di klaten54-...
obat aborsi klaten wa 081336238223 jual obat aborsi cytotec asli di klaten54-...
 
DeFeliceKitley_Resume_BFAVCDGraduated2024
DeFeliceKitley_Resume_BFAVCDGraduated2024DeFeliceKitley_Resume_BFAVCDGraduated2024
DeFeliceKitley_Resume_BFAVCDGraduated2024
 

16_graphical_models.pdf

  • 1. Structured Probabilistic Models for Deep Learning Lecture slides for Chapter 16 of Deep Learning www.deeplearningbook.org Ian Goodfellow 2016-10-04
  • 2. (Goodfellow 2017) Roadmap • Challenges of Unstructured Modeling • Using Graphs to Describe Model Structure • Sampling from Graphical Models • Advantages of Structured Modeling • Structure Learning and Latent Variables • Inference and Approximate Inference • The Deep Learning Approach to Structured Probabilistic Modeling
  • 3. (Goodfellow 2017) Tasks for Generative Models • Density estimation • Denoising • Sample generation • Missing value imputation • Conditional sample generation • Conditional density estimation
  • 4. (Goodfellow 2017) Samples from a BEGAN (Berthelot et al, 2017) Images are 128 pixels wide, 128 pixels tall R, G, and B pixel at each location.
  • 5. (Goodfellow 2017) Cost of Tabular Approach g P(x) by storing quires kn param not feasible for s Number of values per variable Number of variables For BEGAN faces: 256 For BEGAN faces: 128 ⇥ 128 = 16384 There are roughly ten to the power of forty thousand times more points in the discretized domain of the BEGAN face model than there are atoms in the universe.
  • 6. (Goodfellow 2017) Tabular Approach is Infeasible • Memory: cannot store that many parameters • Runtime: inference and sampling are both slow • Statistical efficiency: extremely high number of parameters requires extremely high number of training examples
  • 7. (Goodfellow 2017) Roadmap • Challenges of Unstructured Modeling • Using Graphs to Describe Model Structure • Sampling from Graphical Models • Advantages of Structured Modeling • Structure Learning and Latent Variables • Inference and Approximate Inference • The Deep Learning Approach to Structured Probabilistic Modeling
  • 8. (Goodfellow 2017) Insight of Model Structure • Most variables influence each other • Most variables do not influence each other directly • Describe influence with a graph • Edges represent direct influence • Paths represent indirect influence • Computational and statistical savings come from omissions of edges
  • 9. (Goodfellow 2017) Directed Models ED PROBABILISTIC MODELS FOR DEEP LEA t0 t0 t1 t1 t2 t2 Alice Bob Carol phical model depicting the relay race example. A shing time t1, because Bob does not get to start arol only gets to start running after Bob finis fluences Carol’s finishing time t2. Figure 16.2 elay race example from section 16.1, suppose we name Bob’s finishing time t1, and Carol’s finishing time t2. mate of t1 depends on t0. Our estimate of t2 depends rectly on t0. We can draw this relationship in a directed d in figure 16.2. raphical model defined on variables x is defined by a hose vertices are the random variables in the model, and al probability distributions p(xi | PaG(xi)), where of xi in G. The probability distribution over x is given p(x) = ⇧ip(xi | PaG(xi)). (16.1) le, this means that, using the graph drawn in figure 16.2, , t1, t2) = p(t0)p(t1 | t0)p(t2 | t1). (16.2) e t0, Bob’s finishing time t1, and Carol’s finishing time t2. ur estimate of t1 depends on t0. Our estimate of t2 depends y indirectly on t0. We can draw this relationship in a directed strated in figure 16.2. ted graphical model defined on variables x is defined by a h G whose vertices are the random variables in the model, and itional probability distributions p(xi | PaG(xi)), where rents of xi in G. The probability distribution over x is given p(x) = ⇧ip(xi | PaG(xi)). (16.1) xample, this means that, using the graph drawn in figure 16.2, p(t0, t1, t2) = p(t0)p(t1 | t0)p(t2 | t1). (16.2) time seeing a structured probabilistic model in action. We t of using it, to observe how structured modeling has many Directed models work best when influence clearly flows in one direction
  • 10. (Goodfellow 2017) Undirected Models RUCTURED PROBABILISTIC MODELS FOR DEE hr hr hy hy hc hc undirected graph representing how your roomma r work colleague’s health hc affect each other. You other with a cold, and you and your work colleagu Undirected models work best when influence has no clear direction or is best modeled as flowing in both directions Do you have a cold? Does your roommate have a cold? Does your work colleague have a cold?
  • 11. (Goodfellow 2017) Undirected Models ph G. For each clique C in the graph,3 a factor (C) al) measures the affinity of the variables in that clique ssible joint states. The factors are constrained to be efine an unnormalized probability distribution p̃(x) = ⇧C2G (C). (16.3) bility distribution is efficient to work with so long as encodes the idea that states with higher affinity are in a Bayesian network, there is little structure to the here is nothing to guarantee that multiplying them obability distribution. See figure 16.4 for an example mation from an undirected graph. unction ability distribution is guaranteed to be nonnegative teed to sum or integrate to 1. To obtain a valid must use the corresponding normalized probability p(x) = 1 Z p̃(x), (16.4) esults in the probability distribution summing or Z = Z p̃(x)dx. (16.5) unction ability distribution is guaranteed to be nonnegative teed to sum or integrate to 1. To obtain a valid must use the corresponding normalized probability p(x) = 1 Z p̃(x), (16.4) esults in the probability distribution summing or Z = Z p̃(x)dx. (16.5) Unnormalized probability Partition function
  • 12. (Goodfellow 2017) Separation R 16. STRUCTURED PROBABILISTIC MODELS FOR DEEP LEARN a s b a s b (a) (b) 6: (a) The path between random variable a and random variable b t cause s is not observed. This means that a and b are not separated. in, to indicate that it is observed. Because the only path between and that path is inactive, we can conclude that a and b are separat Separation and D-Separation When s is not observed, influence can flow from a to b and vice versa through s. When s is observed, it blocks the flow of influence between a and b: they are separated
  • 13. (Goodfellow 2017) Separation example HAPTER 16. STRUCTURED PROBABILISTIC MODELS FOR DEEP LEARN a b c d igure 16.7: An example of reading separation properties from an undirected gr is shaded to indicate that it is observed. Because observing b blocks the only to c, we say that a and c are separated from each other given b. The observ so blocks one path between a and d, but there is a second, active path betw herefore, a and d are not separated given b. The nodes a and c are separated One path between a and d is still active, though the other path is blocked, so these two nodes are not separated.
  • 14. (Goodfellow 2017) d-separation The flow of influence is more complicated for directed models The path between a and b is active for all of these graphs: CHAPTER 16. STRUCTURED PROBABILISTIC MODELS FOR DEEP LEARNING a s b a s b a s b (a) (b) a s b a s b c (c) (d) CHAPTER 16. STRUCTURED PROBABILISTIC MODELS FOR DEEP LEARNING a s b a s b a s b (a) (b) a s b a s b c (c) (d) Figure 16.8: All the kinds of active paths of length two that can exist between random variables a and b. (a) Any path with arrows proceeding directly from a to b or vice versa. This kind of path becomes blocked if s is observed. We have already seen this kind of path in the relay race example. (b) Variables a and b are connected by a common cause s. CHAPTER 16. STRUCTURED PROBABILISTIC MODELS FOR DEEP LEARNING a s b a s b a s b (a) (b) a s b a s b c CHAPTER 16. STRUCTURED PROBABILISTIC MODELS FOR DEEP LEARNING a s b a s b a s b (a) (b) a s b a s b c (c) (d) Figure 16.8: All the kinds of active paths of length two that can exist between random
  • 15. (Goodfellow 2017) d-separation example a b c d e graph, we can read out several d-separation properties. Examples parated given the empty set parated given c parated given c some variables are no longer d-separated when we observe some d Figure 16.9: From this graph, we can read out sever include: • a and b are d-separated given the empty set • a and e are d-separated given c • d and e are d-separated given c We can also see that some variables are no longer variables: • a and b are not d-separated given c • a and b are not d-separated given d c d Figure 16.9: From this graph, we can read ou include: • a and b are d-separated given the empt • a and e are d-separated given c • d and e are d-separated given c We can also see that some variables are no variables: • a and b are not d-separated given c • a and b are not d-separated given d Observing variables can activate paths!
  • 16. (Goodfellow 2017) A complete graph can represent any probability distribution R 16. STRUCTURED PROBABILISTIC MODELS FOR DEEP LEARN 10: Examples of complete graphs, which can describe any probability d how examples with four random variables. (Left) The complete undirec directed case, the complete graph is unique. (Right) A complete direc ected case, there is not a unique complete graph. We choose an orde and draw an arc from each variable to every variable that comes afte The benefits of graphical models come from omitting edges
  • 17. (Goodfellow 2017) Converting between graphs • Any specific probability distribution can be represented by either an undirected or a directed graph • Some probability distributions have conditional independences that one kind of graph fails to imply (the distribution is simpler than the graph describes; need to know the conditional probability distributions to see the independences)
  • 18. (Goodfellow 2017) Converting directed to undirected APTER 16. STRUCTURED PROBABILISTIC MODELS FOR DEEP LEARNING h1 h1 h2 h2 h3 h3 v1 v1 v2 v2 v3 v3 a b c a c b h1 h1 h2 h2 h3 h3 v1 v1 v2 v2 v3 v3 a b c a c b ure 16.11: Examples of converting directed models (top row) to undirected models Must add an edge between unconnected coparents
  • 19. (Goodfellow 2017) Converting undirected to directed ER 16. STRUCTURED PROBABILISTIC MODELS FOR DEEP LEARNING a b d c a b d c a b d c 16.12: Converting an undirected model to a directed model. (Left) This und cannot be converted to a directed model because it has a loop of length fo ds. Specifically, the undirected model encodes two different independenc cted model can capture simultaneously: a?c | {b, d} and b?d | {a, c}. (C vert the undirected model to a directed model, we must triangulate the uring that all loops of greater than length three have a chord. To do so, No loops of length greater than three allowed! Add edges to triangulate long loops Assign directions to edges. No directed cycles allowed.
  • 20. (Goodfellow 2017) Factor graphs are less ambiguous CHAPTER 16. STRUCTURED PROBABILISTIC MODELS FOR DEEP LEARNING a b c a b c f1 f1 a b c f1 f1 f2 f2 f3 f3 Figure 16.13: An example of how a factor graph can resolve ambiguity in the interpretation of undirected networks. (Left) An undirected network with a clique involving three variables: a, b and c. (Center) A factor graph corresponding to the same undirected model. This factor graph has one factor over all three variables. (Right) Another valid actor graph for the same undirected model. This factor graph has three factors, each over only two variables. Representation, inference, and learning are all asymptotically heaper in this factor graph than in the factor graph depicted in the center, even though Undirected graph: is this three pairwise potentials or one potential over three variables? Factor graphs disambiguate by placing each potential in the graph
  • 21. (Goodfellow 2017) Roadmap • Challenges of Unstructured Modeling • Using Graphs to Describe Model Structure • Sampling from Graphical Models • Advantages of Structured Modeling • Structure Learning and Latent Variables • Inference and Approximate Inference • The Deep Learning Approach to Structured Probabilistic Modeling
  • 22. (Goodfellow 2017) Sampling from directed models • Easy and fast to draw fair samples from the whole model • Ancestral sampling: pass through the graph in topological order. Sample each node given its parents. • Harder to sample some nodes given other nodes, unless the observed nodes are at the start of the topology
  • 23. (Goodfellow 2017) Sampling from undirected models • Usually requires Markov chains • Usually cannot be done exactly • Usually requires multiple iterations even to approximate • Described in Chapter 17
  • 24. (Goodfellow 2017) Roadmap • Challenges of Unstructured Modeling • Using Graphs to Describe Model Structure • Sampling from Graphical Models • Advantages of Structured Modeling • Structure Learning and Latent Variables • Inference and Approximate Inference • The Deep Learning Approach to Structured Probabilistic Modeling
  • 25. (Goodfellow 2017) Tabular Case • Assume each node has a tabular distribution given its parents • Memory, sampling, inference are now exponential in number of variables in factor with largest scope • For many interesting models, this is very small • e.g., RBMs: all factor scopes are size 2 or 1 • Previously, these costs were exponential in total number of nodes • Statistically, much easier to estimate this manageable number of parameters
  • 26. (Goodfellow 2017) Roadmap • Challenges of Unstructured Modeling • Using Graphs to Describe Model Structure • Sampling from Graphical Models • Advantages of Structured Modeling • Structure Learning and Latent Variables • Inference and Approximate Inference • The Deep Learning Approach to Structured Probabilistic Modeling
  • 27. (Goodfellow 2017) Learning about dependencies • Suppose we have thousands of variables • Maybe gene expression data • Some interact • Some do not • We do not know which ahead of time
  • 28. (Goodfellow 2017) Structure learning strategy • Try out several graphs • See which graph does best job of some criterion • Fitting training set with small model complexity • Fitting validation set • Iterative search, propose new graphs similar to best graph so far (remove edge / add edge / flip edge)
  • 29. (Goodfellow 2017) Latent variable strategy • Use one graph structure • Many latent variables • Dense connections of latent variables to observed variables • Parameters learn that each latent variable interacts strongly with only a small subset of observed variables • Trainable just with gradient descent; no discrete search over graphs
  • 30. (Goodfellow 2017) Roadmap • Challenges of Unstructured Modeling • Using Graphs to Describe Model Structure • Sampling from Graphical Models • Advantages of Structured Modeling • Structure Learning and Latent Variables • Inference and Approximate Inference • The Deep Learning Approach to Structured Probabilistic Modeling
  • 31. (Goodfellow 2017) Inference and Approximate Inference • Inferring marginal distribution over some nodes or conditional distribution of some nodes given other nodes is #P hard • NP-hardness describes decision problems. #P- hardness describes counting problems, e.g., how many solutions are there to a problem where finding one solution is NP-hard • We usually rely on approximate inference, described in chapter 19
  • 32. (Goodfellow 2017) Roadmap • Challenges of Unstructured Modeling • Using Graphs to Describe Model Structure • Sampling from Graphical Models • Advantages of Structured Modeling • Structure Learning and Latent Variables • Inference and Approximate Inference • The Deep Learning Approach to Structured Probabilistic Modeling
  • 33. (Goodfellow 2017) Deep Learning Stylistic Tendencies • Nodes organized into layers • High amount of connectivity between layers • Examples: RBMs, DBMs, GANs, VAEs CHAPTER 16. STRUCTURED PROBABILISTIC MODELS FOR DEEP LEARNING h1 h1 h2 h2 h3 h3 v1 v1 v2 v2 v3 v3 h4 h4 Figure 16.14: An RBM drawn as a Markov network.
  • 34. (Goodfellow 2017) For more information…