
# 01 graphical models


1. **Part 2: Introduction to Graphical Models.** Sebastian Nowozin and Christoph H. Lampert, Colorado Springs, 25th June 2011. Outline: Graphical Models, Factor Graphs, Test-time Inference, Training.
2. **Introduction.** A model relates observations x to quantities of interest y through a mapping f : X → Y. Example 1: given an RGB image x, infer a depth y for each pixel. Example 2: given an RGB image x, infer the presence and positions y of all objects shown (X: images, Y: object annotations).
4. **Introduction (cont.).** General case: map x ∈ X to y ∈ Y; graphical models are a concise language for defining this mapping. The mapping can be ambiguous because of measurement noise or lack of well-posedness (e.g. occlusions), so instead of a deterministic f we use probabilistic graphical models, which define p(y|x) or p(x, y) for all y ∈ Y.
6. **Graphical Models.** A graphical model defines a family of probability distributions over a set of random variables by means of a graph, such that the random variables satisfy the conditional independence assumptions encoded in the graph. Popular classes: undirected graphical models (Markov random fields), directed graphical models (Bayesian networks), factor graphs; others include chain graphs and influence diagrams.
8. **Bayesian Networks.** Graph G = (V, E), E ⊂ V × V, directed and acyclic; variable domains Y_i. The joint factorizes over distributions obtained by conditioning each variable on its parent nodes:

   p(Y = y) = ∏_{i ∈ V} p(y_i | y_{pa_G(i)}).

   Example (a simple Bayes net with edges i → k, j → k, k → l):

   p(Y = y) = p(Y_l = y_l | Y_k = y_k) · p(Y_k = y_k | Y_i = y_i, Y_j = y_j) · p(Y_i = y_i) · p(Y_j = y_j).

   The graph defines a family of distributions.
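The Bayes-net factorization above can be spelled out directly in code. A minimal sketch for the example net with edges i → k, j → k, k → l; the variables are binary and the conditional probability tables are made-up, purely for illustration:

```python
# Joint probability for the example Bayes net with edges i->k, j->k, k->l.
# All variables are binary; the CPT values below are illustrative only.
p_i = {0: 0.6, 1: 0.4}
p_j = {0: 0.7, 1: 0.3}
p_k_given_ij = {(0, 0): {0: 0.9, 1: 0.1}, (0, 1): {0: 0.5, 1: 0.5},
                (1, 0): {0: 0.4, 1: 0.6}, (1, 1): {0: 0.2, 1: 0.8}}
p_l_given_k = {0: {0: 0.8, 1: 0.2}, 1: {0: 0.3, 1: 0.7}}

def joint(yi, yj, yk, yl):
    """p(Y = y) = p(y_l | y_k) p(y_k | y_i, y_j) p(y_i) p(y_j)."""
    return (p_l_given_k[yk][yl] * p_k_given_ij[(yi, yj)][yk]
            * p_i[yi] * p_j[yj])

# Any valid joint distribution sums to 1 over all 2^4 configurations.
total = sum(joint(a, b, c, d) for a in (0, 1) for b in (0, 1)
            for c in (0, 1) for d in (0, 1))
```

Because each conditional table is normalized, the product is automatically a normalized joint distribution; no partition function is needed.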
10. **Undirected Graphical Models** (= Markov random field (MRF) = Markov network). Graph G = (V, E), E ⊂ V × V, undirected, no self-edges; variable domains Y_i. The distribution factorizes over potentials ψ at the cliques of the graph:

    p(y) = (1/Z) ∏_{C ∈ C(G)} ψ_C(y_C), with constant Z = Σ_{y ∈ Y} ∏_{C ∈ C(G)} ψ_C(y_C).

    Example (a simple chain MRF Y_i – Y_j – Y_k):

    p(y) = (1/Z) ψ_i(y_i) ψ_j(y_j) ψ_k(y_k) ψ_{i,j}(y_i, y_j) ψ_{j,k}(y_j, y_k).
12. **Example 1.** Cliques C(G): the set of vertex sets V′ with V′ ⊆ V and E ∩ (V′ × V′) = V′ × V′, i.e. the fully connected subsets. For the chain Y_i – Y_j – Y_k, C(G) = {{i}, {i, j}, {j}, {j, k}, {k}}, hence

    p(y) = (1/Z) ψ_i(y_i) ψ_j(y_j) ψ_k(y_k) ψ_{i,j}(y_i, y_j) ψ_{j,k}(y_j, y_k).
13. **Example 2.** For the fully connected graph on {i, j, k, l}, every subset of V is a clique: C(G) = 2^V, so

    p(y) = (1/Z) ∏_{A ∈ 2^{{i,j,k,l}}} ψ_A(y_A).
14. **Factor Graphs.** Graph G = (V, F, E), E ⊆ V × F: variable nodes V, factor nodes F, and edges E only between variable and factor nodes. The scope of a factor is N(F) = {i ∈ V : (i, F) ∈ E}; variable domains Y_i. The distribution factorizes over potentials ψ at the factors:

    p(y) = (1/Z) ∏_{F ∈ F} ψ_F(y_{N(F)}), with constant Z = Σ_{y ∈ Y} ∏_{F ∈ F} ψ_F(y_{N(F)}).
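The factor-graph factorization is easy to evaluate by brute force when the model is tiny. A sketch, assuming binary variables and illustrative potential tables (none of the numbers come from the slides):

```python
import itertools

# A factor graph as a list of (scope, table) pairs over two binary variables.
# scope plays the role of N(F); the tables are illustrative.
factors = [
    ((0,), {(0,): 1.0, (1,): 2.0}),
    ((1,), {(0,): 2.0, (1,): 1.0}),
    ((0, 1), {(0, 0): 4.0, (0, 1): 1.0, (1, 0): 1.0, (1, 1): 4.0}),
]
num_vars = 2

def unnormalized(y):
    """Product over factors F of psi_F(y_{N(F)})."""
    prod = 1.0
    for scope, table in factors:
        prod *= table[tuple(y[i] for i in scope)]
    return prod

# Z sums the unnormalized product over all joint configurations.
Z = sum(unnormalized(y) for y in itertools.product((0, 1), repeat=num_vars))

def p(y):
    return unnormalized(y) / Z
```

The same enumeration works for the MRF examples above; it is exponential in the number of variables, which is why real inference algorithms exploit the graph structure instead.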
16. **Why factor graphs?** Shown side by side on the same four variables, an MRF, a Bayesian network, and a factor graph: factor graphs are explicit about the factorization and hence easier to work with, and they are universal (just like MRFs and Bayesian networks).
17. **Capacity.** A factor graph defines a family of distributions, and some families are larger than others: for example, a graph with only pairwise factors and a graph with a single joint factor over all four variables define different families.
18. **Four remaining pieces.** 1. Conditional distributions (CRFs); 2. Parameterization; 3. Test-time inference; 4. Learning the model from training data.
20. **Conditional Distributions.** We have discussed p(y); how do we define p(y|x)? The potentials become functions of x_{N(F)}, and the partition function depends on x. This gives conditional random fields (CRFs): x is not part of the probability model, i.e. it is not treated as a random variable. The factorization

    p(y) = (1/Z) ∏_{F ∈ F} ψ_F(y_{N(F)})

    becomes

    p(y|x) = (1/Z(x)) ∏_{F ∈ F} ψ_F(y_{N(F)}; x_{N(F)}).
23. **Potentials and Energy Functions.** For each factor F ∈ F, let Y_F = ×_{i ∈ N(F)} Y_i and E_F : Y_F → ℝ. Potentials and energies are related by (assuming ψ_F(y_F) > 0)

    ψ_F(y_F) = exp(−E_F(y_F)) and E_F(y_F) = −log ψ_F(y_F).

    Then p(y) can be written as

    p(Y = y) = (1/Z) ∏_{F ∈ F} ψ_F(y_F) = (1/Z) exp(−Σ_{F ∈ F} E_F(y_F)).

    Hence p(y) is completely determined by the energy function E(y) = Σ_{F ∈ F} E_F(y_F).
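The potential/energy correspondence can be checked numerically. A sketch on a made-up two-variable model:

```python
import itertools, math

# Factor energies for a tiny two-variable model; the numbers are illustrative.
E_unary = {0: 0.5, 1: 1.5}
E_pair = {(0, 0): 0.0, (0, 1): 1.0, (1, 0): 1.0, (1, 1): 0.0}

def energy(y):
    """E(y) = sum over factors of E_F(y_F)."""
    yi, yj = y
    return E_unary[yi] + E_unary[yj] + E_pair[(yi, yj)]

def potential_product(y):
    # psi_F(y_F) = exp(-E_F(y_F)), so the product of potentials is exp(-E(y)).
    yi, yj = y
    return (math.exp(-E_unary[yi]) * math.exp(-E_unary[yj])
            * math.exp(-E_pair[(yi, yj)]))

states = list(itertools.product((0, 1), repeat=2))
Z = sum(math.exp(-energy(y)) for y in states)
p = {y: math.exp(-energy(y)) / Z for y in states}
```

Working with summed energies instead of multiplied potentials is numerically safer (no underflow from long products) and is why the energy view dominates in practice.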
26. **Energy Minimization.**

    argmax_{y ∈ Y} p(Y = y) = argmax_{y ∈ Y} (1/Z) exp(−Σ_{F ∈ F} E_F(y_F))
                            = argmax_{y ∈ Y} exp(−Σ_{F ∈ F} E_F(y_F))
                            = argmax_{y ∈ Y} (−Σ_{F ∈ F} E_F(y_F))
                            = argmin_{y ∈ Y} Σ_{F ∈ F} E_F(y_F)
                            = argmin_{y ∈ Y} E(y).

    Energy minimization can be interpreted as solving for the most likely state of some factor graph model.
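The chain of equalities can be verified by brute force on a toy model; the energy table below is illustrative:

```python
import itertools, math

# Check on a toy energy that argmin_y E(y) equals argmax_y p(y).
def energy(y):
    yi, yj = y
    return 0.7 * yi + 0.3 * yj + (1.2 if yi != yj else 0.0)

states = list(itertools.product((0, 1), repeat=2))
Z = sum(math.exp(-energy(y)) for y in states)

# Minimizing energy and maximizing probability pick the same state,
# since y -> exp(-E(y))/Z is a monotone decreasing function of E(y).
map_by_energy = min(states, key=energy)
map_by_prob = max(states, key=lambda y: math.exp(-energy(y)) / Z)
```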
27. **Parameterization.** Factor graphs define a family of distributions; parameterization identifies individual members of the family by parameters w, so that each w indexes one distribution p_w in the family.
29. **Example: Parameterization.** Image segmentation model with a pairwise "Potts" energy function E_F(y_i, y_j; w_1), E_F : {0, 1} × {0, 1} × ℝ → ℝ:

    E_F(0, 0; w_1) = E_F(1, 1; w_1) = 0, E_F(0, 1; w_1) = E_F(1, 0; w_1) = w_1.
30. **Example: Parameterization (cont.).** Unary energy function E_F(y_i; x, w), E_F : {0, 1} × X × ℝ^({0,1}×D) → ℝ:

    E_F(0; x, w) = ⟨w(0), ψ_F(x)⟩, E_F(1; x, w) = ⟨w(1), ψ_F(x)⟩,

    with features ψ_F : X → ℝ^D, e.g. image filters.
31. **Example: Parameterization (cont.).** [Figure: grid model with unary energies ⟨w(0), ψ_F(x)⟩ and ⟨w(1), ψ_F(x)⟩ at each node and the pairwise Potts table with 0 on the diagonal and w_1 off it.] Total number of parameters: D + D + 1. Parameters are shared across factors, but the energies differ because the features ψ_F(x) differ. General form, linear in w:

    E_F(y_F; x_F, w) = ⟨w(y_F), ψ_F(x_F)⟩.
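The linear parameterization E_F(y_F; x_F, w) = ⟨w(y_F), ψ_F(x_F)⟩ can be sketched as follows; the weights, Potts parameter, and feature vectors are made-up:

```python
# Unary energies linear in w: E_F(y; x, w) = <w(y), psi_F(x)>.
# D = 3 features per factor; all values below are illustrative.
def dot(a, b):
    return sum(ai * bi for ai, bi in zip(a, b))

w = {0: [0.2, -0.1, 0.4], 1: [-0.3, 0.5, 0.1]}   # w(0), w(1): 2*D parameters
w1 = 0.8                                          # Potts weight: +1 parameter

def unary_energy(y, features):
    return dot(w[y], features)

def potts_energy(yi, yj):
    return 0.0 if yi == yj else w1

# Two factors share the same w but see different features psi_F(x),
# hence they have different energies.
phi_a = [1.0, 0.0, 2.0]
phi_b = [0.0, 1.0, 0.5]
```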
33. **Making Predictions.** Given x ∈ X, predict y ∈ Y (equivalently, choose a prediction function f : X → Y). How do we measure the quality of a prediction?
34. **Loss function.** Define a loss function Δ : Y × Y → ℝ_+, so that Δ(y, y*) measures the loss incurred by predicting y when y* is true. The loss function is application dependent.
35. **Test-time Inference.** Loss function Δ : Y × Y → ℝ, with Δ(y, f(x)) comparing the correct label y to the prediction f(x). Let d(X, Y) be the true joint distribution, d(y|x) the true conditional, and p(y|x) the model distribution. The expected loss measures the quality of a prediction:

    R^Δ_f(x) = E_{y ∼ d(y|x)}[Δ(y, f(x))] = Σ_{y ∈ Y} d(y|x) Δ(y, f(x)) ≈ E_{y ∼ p(y|x;w)}[Δ(y, f(x))],

    where the approximation assumes p(y|x; w) ≈ d(y|x).
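The expected loss under the model distribution can be computed by enumeration when Y is small. A sketch with a made-up conditional p(y|x) over two binary variables and the 0/1 loss:

```python
# Expected loss of a prediction under the model: E_{y ~ p(y|x)}[Delta(y, yhat)].
# The toy conditional p below is illustrative.
p = {(0, 0): 0.4, (0, 1): 0.3, (1, 0): 0.2, (1, 1): 0.1}

def delta01(y, yhat):
    return 0.0 if y == yhat else 1.0

def expected_loss(yhat, loss):
    return sum(prob * loss(y, yhat) for y, prob in p.items())

# Under 0/1 loss, expected loss is 1 - p(yhat), so the best
# prediction is the mode of the distribution.
best = min(p, key=lambda yhat: expected_loss(yhat, delta01))
```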
38. **Example 1: 0/1 loss.** Loss 0 iff perfectly predicted, 1 otherwise:

    Δ_{0/1}(y, y*) = I(y ≠ y*) = 0 if y = y*, 1 otherwise.

    Plugging it in,

    y* := argmin_{y ∈ Y} E_{y′ ∼ p(y′|x)}[Δ_{0/1}(y′, y)] = argmax_{y ∈ Y} p(y|x) = argmin_{y ∈ Y} E(y, x).

    Minimizing the expected 0/1 loss → MAP prediction (energy minimization).
40. **Example 2: Hamming loss.** Count the fraction of mislabeled variables:

    Δ_H(y, y*) = (1/|V|) Σ_{i ∈ V} I(y_i ≠ y_i*).

    Plugging it in,

    y* := argmin_{y ∈ Y} E_{y′ ∼ p(y′|x)}[Δ_H(y′, y)], i.e. y_i* = argmax_{y_i ∈ Y_i} p(y_i|x) for each i ∈ V.

    Minimizing the expected Hamming loss → maximum posterior marginal (MPM, max-marginal) prediction.
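MAP and MPM predictions can genuinely differ. A sketch on a made-up distribution over two binary variables whose mode disagrees with the per-variable marginal argmaxes:

```python
# MAP vs. MPM on a toy conditional p(y|x) over two binary variables.
# The distribution is illustrative and chosen so the two predictors disagree.
p = {(0, 0): 0.4, (0, 1): 0.0, (1, 0): 0.3, (1, 1): 0.3}

# MAP: the single most probable joint configuration.
y_map = max(p, key=p.get)

# MPM: argmax of each variable's marginal, taken independently.
marg0 = {v: sum(prob for y, prob in p.items() if y[0] == v) for v in (0, 1)}
marg1 = {v: sum(prob for y, prob in p.items() if y[1] == v) for v in (0, 1)}
y_mpm = (max(marg0, key=marg0.get), max(marg1, key=marg1.get))
```

Here the joint mode is (0, 0), but the first variable is 1 with marginal probability 0.6, so the MPM prediction (1, 0) is a configuration that is not the mode; which predictor is right depends on the loss you care about.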
42. **Example 3: Squared error.** Assume a vector space on each Y_i (pixel intensities, optical flow vectors, etc.) and sum the squared errors:

    Δ_Q(y, y*) = (1/|V|) Σ_{i ∈ V} ‖y_i − y_i*‖².

    Plugging it in,

    y* := argmin_{y ∈ Y} E_{y′ ∼ p(y′|x)}[Δ_Q(y′, y)], i.e. y_i* = Σ_{y_i ∈ Y_i} p(y_i|x) y_i for each i ∈ V.

    Minimizing the expected squared error → minimum mean squared error (MMSE) prediction.
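For the squared-error loss, the minimizer is the posterior mean. A sketch on a made-up scalar marginal, with a grid search confirming that no candidate beats the mean:

```python
# MMSE prediction: the per-variable posterior mean minimizes expected
# squared error. Toy marginal over a scalar label; values are illustrative.
marginal = {0.0: 0.2, 1.0: 0.5, 2.0: 0.3}

y_mmse = sum(prob * y for y, prob in marginal.items())   # posterior mean

def expected_sq_error(yhat):
    return sum(prob * (y - yhat) ** 2 for y, prob in marginal.items())

# E[(y - yhat)^2] = Var(y) + (E[y] - yhat)^2, so a grid search over
# candidate predictions should bottom out at the posterior mean.
candidates = [i / 100 for i in range(0, 201)]
best = min(candidates, key=expected_sq_error)
```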
44. **Inference Task: Maximum A Posteriori (MAP) Inference.** Definition (MAP inference). Given a factor graph, a parameterization, a weight vector w, and an observation x, find

    y* = argmax_{y ∈ Y} p(Y = y | x, w) = argmin_{y ∈ Y} E(y; x, w).
45. **Inference Task: Probabilistic Inference.** Definition (probabilistic inference). Given a factor graph, a parameterization, a weight vector w, and an observation x, find

    log Z(x, w) = log Σ_{y ∈ Y} exp(−E(y; x, w)),
    μ_F(y_F) = p(Y_F = y_F | x, w), ∀F ∈ F, ∀y_F ∈ Y_F.

    This typically includes the variable marginals μ_i(y_i) = p(y_i | x, w).
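Both quantities of probabilistic inference (log Z and marginals) can be computed by exhaustive enumeration on a tiny model; the energies below are illustrative:

```python
import itertools, math

# Probabilistic inference by exhaustive enumeration on a tiny two-variable
# model: the log partition function and a variable marginal.
def energy(y):
    yi, yj = y
    unary = 0.5 * yi + 0.2 * yj
    pair = 0.9 if yi != yj else 0.0
    return unary + pair

states = list(itertools.product((0, 1), repeat=2))
Z = sum(math.exp(-energy(y)) for y in states)
log_Z = math.log(Z)

# Variable marginal mu_0(v) = p(y_0 = v), obtained by summing out y_1.
mu0 = {v: sum(math.exp(-energy(y)) for y in states if y[0] == v) / Z
       for v in (0, 1)}
```

Real models make this enumeration infeasible, which is what motivates the message-passing and sampling algorithms discussed later in the tutorial.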
46. **Example: Man-made structure detection.** [Figure — left: input image x; middle: ground-truth labeling on 16-by-16-pixel blocks; right: factor graph model with unary factors ψ¹_i, observation factors ψ²_i linking X_i to Y_i, and pairwise factors ψ³_{i,k} between Y_i and Y_k.] Features: gradient and color histograms. Model parameters are estimated from ≈ 60 training images.
47. **Example: Man-made structure detection (cont.).** [Figure — left: input image x; middle (probabilistic inference): visualization of the variable marginals p(y_i = "man-made" | x, w); right (MAP inference): joint MAP labeling y* = argmax_{y ∈ Y} p(y|x, w).]
48. **Training the Model.** What can be learned? The model structure: the factors; the model variables: observed variables are fixed, but we can add unobserved variables; the factor energies: the parameters.
50. **Training: Overview.** Assume a fully observed, independent and identically distributed (iid) sample set {(xⁿ, yⁿ)}_{n=1,…,N} with (xⁿ, yⁿ) ∼ d(X, Y). Goal: predict well. Alternative goal: first model d(y|x) well by p(y|x, w), then predict by minimizing the expected loss.
51. **Probabilistic Learning.** Problem (Probabilistic Parameter Learning). Let d(y|x) be the (unknown) conditional distribution of labels for a problem to be solved. For a parameterized conditional distribution p(y|x, w) with parameters w ∈ ℝ^D, probabilistic parameter learning is the task of finding a point estimate of the parameter w* that makes p(y|x, w*) closest to d(y|x). We will discuss probabilistic parameter learning in detail.
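One standard instantiation of this problem is conditional maximum likelihood. A brute-force sketch on a single binary variable with energy E(y; x, w) = w·x·y; the data and the grid search are made-up for illustration:

```python
import math

# Conditional maximum likelihood for a single binary variable with energy
# E(y; x, w) = w * x * y, fit by a coarse grid search over w.
data = [(1.0, 0), (1.0, 0), (1.0, 1), (2.0, 0)]   # illustrative (x, y) pairs

def log_p(y, x, w):
    """log p(y|x, w) = -E(y; x, w) - log Z(x, w) with Y = {0, 1}."""
    z = math.exp(-w * x * 0.0) + math.exp(-w * x * 1.0)
    return -w * x * y - math.log(z)

def neg_log_lik(w):
    return -sum(log_p(y, x, w) for x, y in data)

# Grid search over w in [-5, 5]; real trainers use gradient methods instead.
candidates = [i / 100 - 5.0 for i in range(1001)]
w_star = min(candidates, key=neg_log_lik)
```

Since most training labels are y = 0, the fitted w* is positive, raising the energy (lowering the probability) of y = 1, which is exactly the "make p close to d" behaviour the definition asks for.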
53. **Loss-Minimizing Parameter Learning.** Problem (Loss-Minimizing Parameter Learning). Let d(x, y) be the unknown distribution of data and labels, and let Δ : Y × Y → ℝ be a loss function. Loss-minimizing parameter learning is the task of finding a parameter value w* such that the expected prediction risk

    E_{(x,y) ∼ d(x,y)}[Δ(y, f_p(x))]

    is as small as possible, where f_p(x) = argmax_{y ∈ Y} p(y|x, w*). This requires the loss function at training time and directly learns a prediction function f_p(x).