01 Graphical Models

License: CC Attribution-ShareAlike

Presentation Transcript

  • Part 2: Introduction to Graphical Models. Sebastian Nowozin and Christoph H. Lampert. Colorado Springs, 25th June 2011.
    Outline: Graphical Models, Factor Graphs, Test-time Inference, Training.
  • Introduction. Model: relating observations x to quantities of interest y via a mapping $f : \mathcal{X} \to \mathcal{Y}$.
    Example 1: given an RGB image x, infer a depth y for each pixel.
    Example 2: given an RGB image x, infer the presence and positions y of all objects shown. Here $\mathcal{X}$ is the space of images and $\mathcal{Y}$ the space of object annotations.
  • Introduction (cont). General case: mapping $x \in \mathcal{X}$ to $y \in \mathcal{Y}$. Graphical models are a concise language to define this mapping.
    The mapping can be ambiguous: measurement noise, lack of well-posedness (e.g. occlusions).
    Probabilistic graphical models therefore define a distribution $p(y \mid x)$ or $p(x, y)$ for all $y \in \mathcal{Y}$, i.e. the full conditional $p(Y \mid X = x)$ rather than a single value $f(x)$.
  • Graphical Models. A graphical model defines a family of probability distributions over a set of random variables, by means of a graph, so that the random variables satisfy the conditional independence assumptions encoded in the graph.
    Popular classes of graphical models: undirected graphical models (Markov random fields), directed graphical models (Bayesian networks), factor graphs, and others (chain graphs, influence diagrams, etc.).
  • Bayesian Networks. Graph $G = (V, E)$, $E \subset V \times V$, directed and acyclic. Variable domains $\mathcal{Y}_i$. Factorization over distributions, by conditioning on parent nodes:
    $p(Y = y) = \prod_{i \in V} p(y_i \mid y_{\mathrm{pa}_G(i)})$.
    Example (a simple Bayes net with edges $Y_i \to Y_k$, $Y_j \to Y_k$, $Y_k \to Y_l$):
    $p(Y = y) = p(y_l \mid y_k)\, p(y_k \mid y_i, y_j)\, p(y_i)\, p(y_j)$.
    The graph defines a family of distributions.
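The factorization can be evaluated directly by multiplying local conditional probability tables. A minimal sketch in Python for the example network above; the CPT entries are made-up numbers for illustration, not from the slides:

    # Bayes net Y_i -> Y_k, Y_j -> Y_k, Y_k -> Y_l over binary variables.
    # p(y) = p(y_i) p(y_j) p(y_k | y_i, y_j) p(y_l | y_k).
    p_i = {0: 0.6, 1: 0.4}                      # p(Y_i)
    p_j = {0: 0.7, 1: 0.3}                      # p(Y_j)
    p_k = {(0, 0): {0: 0.9, 1: 0.1},            # p(Y_k | Y_i, Y_j)
           (0, 1): {0: 0.5, 1: 0.5},
           (1, 0): {0: 0.4, 1: 0.6},
           (1, 1): {0: 0.1, 1: 0.9}}
    p_l = {0: {0: 0.8, 1: 0.2},                 # p(Y_l | Y_k)
           1: {0: 0.3, 1: 0.7}}

    def joint(yi, yj, yk, yl):
        """p(Y = y) as the product of the local conditional distributions."""
        return p_i[yi] * p_j[yj] * p_k[(yi, yj)][yk] * p_l[yk][yl]

    # The factorization is automatically normalized: it sums to 1.
    total = sum(joint(a, b, c, d) for a in (0, 1) for b in (0, 1)
                for c in (0, 1) for d in (0, 1))
    assert abs(total - 1.0) < 1e-12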
  • Undirected Graphical Models (= Markov random field (MRF) = Markov network). Graph $G = (V, E)$, $E \subset V \times V$, undirected, no self-edges. Variable domains $\mathcal{Y}_i$. Factorization over potentials $\psi$ at the cliques:
    $p(y) = \frac{1}{Z} \prod_{C \in \mathcal{C}(G)} \psi_C(y_C)$, with constant $Z = \sum_{y \in \mathcal{Y}} \prod_{C \in \mathcal{C}(G)} \psi_C(y_C)$.
    Example (a simple chain MRF $Y_i - Y_j - Y_k$):
    $p(y) = \frac{1}{Z}\, \psi_i(y_i)\, \psi_j(y_j)\, \psi_k(y_k)\, \psi_{i,j}(y_i, y_j)\, \psi_{j,k}(y_j, y_k)$.
  • Example 1. Chain $Y_i - Y_j - Y_k$. The cliques $\mathcal{C}(G)$ are the vertex sets $V' \subseteq V$ in which every pair of distinct vertices is connected by an edge. Here $\mathcal{C}(G) = \{\{i\}, \{j\}, \{k\}, \{i,j\}, \{j,k\}\}$, giving
    $p(y) = \frac{1}{Z}\, \psi_i(y_i)\, \psi_j(y_j)\, \psi_k(y_k)\, \psi_{i,j}(y_i, y_j)\, \psi_{j,k}(y_j, y_k)$.
  • Example 2. Fully connected graph on $\{Y_i, Y_j, Y_k, Y_l\}$. Here $\mathcal{C}(G) = 2^V$: all subsets of V are cliques, so
    $p(y) = \frac{1}{Z} \prod_{A \in 2^{\{i,j,k,l\}}} \psi_A(y_A)$.
  • Factor Graphs. Graph $G = (V, \mathcal{F}, E)$, $E \subseteq V \times \mathcal{F}$: variable nodes V, factor nodes $\mathcal{F}$, and edges E between variable and factor nodes. Scope of a factor: $N(F) = \{i \in V : (i, F) \in E\}$. Variable domains $\mathcal{Y}_i$. Factorization over potentials $\psi$ at the factors:
    $p(y) = \frac{1}{Z} \prod_{F \in \mathcal{F}} \psi_F(y_{N(F)})$, with constant $Z = \sum_{y \in \mathcal{Y}} \prod_{F \in \mathcal{F}} \psi_F(y_{N(F)})$.
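For very small models the factor-graph distribution can be computed exactly by enumeration. A minimal Python sketch of the definition above; the variable names, scopes, and potential tables are made up for illustration:

    import itertools

    # Tiny factor graph over four binary variables with
    # p(y) = (1/Z) * prod_F psi_F(y_{N(F)}).
    variables = ["i", "j", "k", "l"]
    domain = (0, 1)

    # Each factor: (scope N(F), table mapping y_{N(F)} -> positive potential).
    factors = [
        (("i", "j"), {(0, 0): 2.0, (0, 1): 0.5, (1, 0): 0.5, (1, 1): 2.0}),
        (("j", "k"), {(0, 0): 1.0, (0, 1): 3.0, (1, 0): 3.0, (1, 1): 1.0}),
        (("k", "l"), {(0, 0): 2.0, (0, 1): 1.0, (1, 0): 1.0, (1, 1): 2.0}),
    ]

    def unnormalized(y):
        """prod_F psi_F(y_{N(F)}) for a full assignment y (dict var -> value)."""
        prod = 1.0
        for scope, table in factors:
            prod *= table[tuple(y[v] for v in scope)]
        return prod

    # Partition function by brute-force enumeration (exponential in |V|,
    # fine for four binary variables).
    Z = sum(unnormalized(dict(zip(variables, vals)))
            for vals in itertools.product(domain, repeat=len(variables)))

    def p(y):
        return unnormalized(y) / Z

    print(p({"i": 0, "j": 0, "k": 1, "l": 1}))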
  • Why factor graphs? The same set of variables can carry different factorizations (e.g. pairwise factors only versus one factor over all variables), and the undirected graph alone does not distinguish between them. Factor graphs are explicit about the factorization, hence easier to work with. They are universal (just like MRFs and Bayesian networks).
  • Capacity. A factor graph defines a family of distributions, and some families are larger than others: for instance, a single factor over all of $Y_i, Y_j, Y_k, Y_l$ can represent every distribution that pairwise factors can, but not vice versa.
  • Four remaining pieces: 1. Conditional distributions (CRFs). 2. Parameterization. 3. Test-time inference. 4. Learning the model from training data.
  • Conditional Distributions. We have discussed p(y); how do we define $p(y \mid x)$? The potentials become functions of $x_{N(F)}$, and the partition function depends on x. This yields conditional random fields (CRFs): x is not part of the probability model, i.e. it is not treated as a random variable, and the model is a conditional distribution:
    $p(y) = \frac{1}{Z} \prod_{F \in \mathcal{F}} \psi_F(y_{N(F)})$ becomes $p(y \mid x) = \frac{1}{Z(x)} \prod_{F \in \mathcal{F}} \psi_F(y_{N(F)}; x_{N(F)})$.
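The key operational difference is that the partition function Z(x) must be recomputed for every observation. A minimal CRF sketch (illustrative, not the slides' model): one pairwise factor over two binary variables, with a made-up data-dependent potential:

    import itertools, math

    def psi(yi, yj, x):
        """psi_F(y_{N(F)}; x): favors agreeing labels, with strength set by x."""
        return math.exp(x if yi == yj else -x)

    def p_cond(yi, yj, x):
        # Z(x) depends on the observation, so it is recomputed per input.
        Zx = sum(psi(a, b, x) for a, b in itertools.product((0, 1), repeat=2))
        return psi(yi, yj, x) / Zx

    print(p_cond(0, 0, x=0.5), p_cond(0, 1, x=0.5))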
  • Potentials and Energy Functions. For each factor $F \in \mathcal{F}$, let $\mathcal{Y}_F = \times_{i \in N(F)} \mathcal{Y}_i$ and $E_F : \mathcal{Y}_{N(F)} \to \mathbb{R}$. Potentials and energies are related by (assuming $\psi_F(y_F) > 0$):
    $\psi_F(y_F) = \exp(-E_F(y_F))$ and $E_F(y_F) = -\log \psi_F(y_F)$.
    Then p(y) can be written as
    $p(Y = y) = \frac{1}{Z} \prod_{F \in \mathcal{F}} \psi_F(y_F) = \frac{1}{Z} \exp\big(-\sum_{F \in \mathcal{F}} E_F(y_F)\big)$.
    Hence p(y) is completely determined by the energy function $E(y) = \sum_{F \in \mathcal{F}} E_F(y_F)$.
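A quick sanity check of the potential/energy correspondence; the potential value 2.5 is arbitrary:

    import math

    # psi_F = exp(-E_F) and E_F = -log(psi_F) are inverse transformations,
    # valid whenever psi_F > 0.
    psi = 2.5
    E = -math.log(psi)                        # energy of this configuration
    assert abs(math.exp(-E) - psi) < 1e-12    # round-trips exactly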
  • Energy Minimization.
    $\operatorname{argmax}_{y \in \mathcal{Y}} p(Y = y) = \operatorname{argmax}_{y \in \mathcal{Y}} \frac{1}{Z} \exp\big(-\sum_{F \in \mathcal{F}} E_F(y_F)\big)$
    $= \operatorname{argmax}_{y \in \mathcal{Y}} \exp\big(-\sum_{F \in \mathcal{F}} E_F(y_F)\big)$ (Z does not depend on y)
    $= \operatorname{argmax}_{y \in \mathcal{Y}} -\sum_{F \in \mathcal{F}} E_F(y_F)$ (exp is monotone)
    $= \operatorname{argmin}_{y \in \mathcal{Y}} \sum_{F \in \mathcal{F}} E_F(y_F) = \operatorname{argmin}_{y \in \mathcal{Y}} E(y)$.
    Energy minimization can be interpreted as solving for the most likely state of some factor graph model.
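Since Z cancels out of the argmax, the most likely state can be found without ever normalizing. A brute-force sketch with made-up energy tables (real models use dedicated solvers; enumeration is exponential in the number of variables):

    import itertools

    # E(y) = sum_F E_F(y_F); argmax_y p(y) = argmin_y E(y), Z never needed.
    factors = [
        (("i", "j"), {(0, 0): 0.0, (0, 1): 1.2, (1, 0): 1.2, (1, 1): 0.0}),
        (("j", "k"), {(0, 0): 0.5, (0, 1): 0.0, (1, 0): 0.0, (1, 1): 0.5}),
    ]
    variables = ["i", "j", "k"]

    def energy(y):
        return sum(t[tuple(y[v] for v in s)] for s, t in factors)

    best = min((dict(zip(variables, vals))
                for vals in itertools.product((0, 1), repeat=len(variables))),
               key=energy)
    print(best, energy(best))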
  • Parameterization. Factor graphs define a family of distributions. Parameterization means identifying individual members of the family by parameters w: each parameter vector $w_1, w_2, \dots$ indexes one distribution $p_{w_1}, p_{w_2}, \dots$ in the family.
  • Example: Parameterization. Image segmentation model. Pairwise "Potts" energy function $E_F(y_i, y_j; w_1)$, $E_F : \{0,1\} \times \{0,1\} \times \mathbb{R} \to \mathbb{R}$:
    $E_F(0, 0; w_1) = E_F(1, 1; w_1) = 0$,
    $E_F(0, 1; w_1) = E_F(1, 0; w_1) = w_1$.
  • Example: Parameterization (cont). Unary energy function $E_F(y_i; x, w)$, $E_F : \{0,1\} \times \mathcal{X} \times \mathbb{R}^{\{0,1\} \times D} \to \mathbb{R}$:
    $E_F(0; x, w) = \langle w(0), \psi_F(x) \rangle$, $E_F(1; x, w) = \langle w(1), \psi_F(x) \rangle$,
    with features $\psi_F : \mathcal{X} \to \mathbb{R}^D$, e.g. image filters.
  • Example: Parameterization (cont). (Figure: per-pixel unary energy tables with entries $\langle w(0), \psi_F(x) \rangle$ and $\langle w(1), \psi_F(x) \rangle$, and the pairwise Potts table with entries 0, $w_1$; $w_1$, 0.)
    Total number of parameters: D + D + 1. The parameters are shared across factors, but the energies differ because of the different features $\psi_F(x)$. General form, linear in w:
    $E_F(y_F; x_F, w) = \langle w(y_F), \psi_F(x_F) \rangle$.
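A sketch of this linear parameterization for the segmentation example; the feature dimension, weights, and feature vector are made-up stand-ins for real image features:

    import numpy as np

    # Unary: E_F(y; x, w) = <w(y), psi_F(x)>; pairwise: Potts with weight w1.
    D = 4
    rng = np.random.default_rng(0)
    w = rng.normal(size=(2, D))        # w(0), w(1): one weight vector per label
    w1 = 0.8                           # Potts smoothing weight

    def unary_energy(y, feat):
        """E_F(y; x, w) = <w(y), psi_F(x)> for label y in {0, 1}."""
        return float(w[y] @ feat)

    def potts_energy(yi, yj):
        """0 if the labels agree, w1 otherwise."""
        return 0.0 if yi == yj else w1

    feat = rng.normal(size=D)          # stand-in for image features psi_F(x)
    print(unary_energy(0, feat), unary_energy(1, feat), potts_energy(0, 1))

The parameters w and w1 are shared by every factor in the graph (D + D + 1 numbers in total); only the features change from factor to factor.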
  • Making Predictions. Given $x \in \mathcal{X}$, predict $y \in \mathcal{Y}$ (or a prediction function $f : \mathcal{X} \to \mathcal{Y}$). How do we measure the quality of a prediction?
  • Loss Function. Define a loss function $\Delta : \mathcal{Y} \times \mathcal{Y} \to \mathbb{R}_+$, so that $\Delta(y, y^*)$ measures the loss incurred by predicting y when $y^*$ is true. The loss function is application dependent.
  • Test-time Inference. Loss function $\Delta(y, f(x))$: correct label y, prediction f(x), $\Delta : \mathcal{Y} \times \mathcal{Y} \to \mathbb{R}$. True joint distribution d(x, y) with true conditional $d(y \mid x)$; model distribution $p(y \mid x)$. Expected loss as the quality of a prediction:
    $R_\Delta^f(x) = \mathbb{E}_{y \sim d(y|x)}[\Delta(y, f(x))] = \sum_{y \in \mathcal{Y}} d(y \mid x)\, \Delta(y, f(x)) \approx \mathbb{E}_{y \sim p(y|x;w)}[\Delta(y, f(x))]$,
    assuming that $p(y \mid x; w) \approx d(y \mid x)$.
  • Example 1: 0/1 Loss. Loss 0 iff perfectly predicted, 1 otherwise:
    $\Delta_{0/1}(y, y^*) = I(y \neq y^*)$, i.e. 0 if $y = y^*$ and 1 otherwise.
    Plugging it in:
    $y^* := \operatorname{argmin}_{\bar{y} \in \mathcal{Y}} \mathbb{E}_{y \sim p(y|x)}[\Delta_{0/1}(y, \bar{y})] = \operatorname{argmax}_{\bar{y} \in \mathcal{Y}} p(\bar{y} \mid x) = \operatorname{argmin}_{\bar{y} \in \mathcal{Y}} E(\bar{y}, x)$.
    Minimizing the expected 0/1 loss → MAP prediction (energy minimization).
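A tiny illustration of the Bayes-optimal predictor under 0/1 loss; the posterior table over two binary variables is made up:

    # Under 0/1 loss the optimal prediction is the single most probable
    # joint state (MAP).
    p = {(0, 0): 0.40, (0, 1): 0.00, (1, 0): 0.30, (1, 1): 0.30}
    y_map = max(p, key=p.get)
    print(y_map)   # (0, 0)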
  • Example 2: Hamming Loss. Count the fraction of mislabeled variables:
    $\Delta_H(y, y^*) = \frac{1}{|V|} \sum_{i \in V} I(y_i \neq y_i^*)$.
    Plugging it in:
    $y_i^* := \operatorname{argmax}_{y_i \in \mathcal{Y}_i} p(y_i \mid x)$ for each $i \in V$.
    Minimizing the expected Hamming loss → maximum posterior marginal (MPM) prediction.
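Under Hamming loss each variable is predicted from its own posterior marginal, which can disagree with the joint MAP state. Continuing the made-up posterior from the previous sketch:

    # MPM: maximize each variable's posterior marginal independently.
    p = {(0, 0): 0.40, (0, 1): 0.00, (1, 0): 0.30, (1, 1): 0.30}

    def marginal(i, v):
        """p(y_i = v | x) by summing the joint over the other variable."""
        return sum(pr for y, pr in p.items() if y[i] == v)

    y_mpm = tuple(max((0, 1), key=lambda v, i=i: marginal(i, v))
                  for i in range(2))
    print(y_mpm)   # (1, 0): differs from the joint MAP state (0, 0)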
  • Example 3: Squared Error. Assume a vector space on $\mathcal{Y}_i$ (pixel intensities, optical flow vectors, etc.). Sum of squared errors:
    $\Delta_Q(y, y^*) = \frac{1}{|V|} \sum_{i \in V} \| y_i - y_i^* \|^2$.
    Plugging it in gives the posterior mean:
    $y_i^* := \mathbb{E}_{y \sim p(y|x)}[y_i] = \sum_{y_i \in \mathcal{Y}_i} p(y_i \mid x)\, y_i$ for each $i \in V$.
    Minimizing the expected squared error → minimum mean squared error (MMSE) prediction.
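Under squared error the optimal prediction is the posterior mean of each variable, which need not be a valid discrete label. Again with the same made-up posterior, treating the labels as scalars:

    # MMSE: per-variable posterior mean.
    p = {(0, 0): 0.40, (0, 1): 0.00, (1, 0): 0.30, (1, 1): 0.30}
    y_mmse = tuple(sum(pr * y[i] for y, pr in p.items()) for i in range(2))
    print(y_mmse)  # (0.6, 0.3): posterior means, not necessarily valid labels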
  • Inference Task: Maximum A Posteriori (MAP) Inference. Definition (Maximum A Posteriori (MAP) Inference). Given a factor graph, a parameterization, a weight vector w, and an observation x, find
    $y^* = \operatorname{argmax}_{y \in \mathcal{Y}} p(Y = y \mid x, w) = \operatorname{argmin}_{y \in \mathcal{Y}} E(y; x, w)$.
  • Inference Task: Probabilistic Inference. Definition (Probabilistic Inference). Given a factor graph, a parameterization, a weight vector w, and an observation x, find
    $\log Z(x, w) = \log \sum_{y \in \mathcal{Y}} \exp(-E(y; x, w))$,
    $\mu_F(y_F) = p(Y_F = y_F \mid x, w)$ for all $F \in \mathcal{F}$ and $y_F \in \mathcal{Y}_F$.
    This typically includes the variable marginals $\mu_i(y_i) = p(y_i \mid x, w)$.
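A brute-force sketch of probabilistic inference on a tiny chain with made-up energies. Real models use message passing or other dedicated algorithms; enumeration is exponential in the number of variables:

    import itertools, math

    variables = ["i", "j", "k"]
    factors = [
        (("i", "j"), {(0, 0): 0.0, (0, 1): 1.0, (1, 0): 1.0, (1, 1): 0.0}),
        (("j", "k"), {(0, 0): 0.0, (0, 1): 1.0, (1, 0): 1.0, (1, 1): 0.0}),
    ]

    def energy(y):
        return sum(t[tuple(y[v] for v in s)] for s, t in factors)

    states = [dict(zip(variables, vals))
              for vals in itertools.product((0, 1), repeat=len(variables))]
    Z = sum(math.exp(-energy(y)) for y in states)
    log_Z = math.log(Z)

    # Variable marginal mu_i(y_i) by summing out all other variables.
    mu_i = {v: sum(math.exp(-energy(y)) for y in states if y["i"] == v) / Z
            for v in (0, 1)}
    print(log_Z, mu_i)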
  • Example: Man-made Structure Detection. Left: input image x. Middle: ground truth labeling on 16-by-16 pixel blocks. Right: factor graph model with unary factors $\psi_i^1$, observation factors $\psi_i^2$ linking $X_i$ to $Y_i$, and pairwise factors $\psi_{i,k}^3$ linking $Y_i$ and $Y_k$. Features: gradient and color histograms. The model parameters are estimated from ≈ 60 training images.
  • Example: Man-made Structure Detection (cont). Left: input image x. Middle (probabilistic inference): visualization of the variable marginals $p(y_i = \text{"man-made"} \mid x, w)$. Right (MAP inference): joint MAP labeling $y^* = \operatorname{argmax}_{y \in \mathcal{Y}} p(y \mid x, w)$.
  • Training the Model. What can be learned? Model structure: the factors. Model variables: the observed variables are fixed, but we can add unobserved variables. Factor energies: the parameters.
  • Training: Overview. Assume a fully observed, independent and identically distributed (iid) sample set $\{(x^n, y^n)\}_{n=1,\dots,N}$, $(x^n, y^n) \sim d(x, y)$. Goal: predict well. Alternative goal: first model $d(y \mid x)$ well by $p(y \mid x, w)$, then predict by minimizing the expected loss.
  • Probabilistic Learning. Problem (Probabilistic Parameter Learning). Let $d(y \mid x)$ be the (unknown) conditional distribution of labels for the problem to be solved. For a parameterized conditional distribution $p(y \mid x, w)$ with parameters $w \in \mathbb{R}^D$, probabilistic parameter learning is the task of finding a point estimate $w^*$ of the parameters that makes $p(y \mid x, w^*)$ closest to $d(y \mid x)$. We will discuss probabilistic parameter learning in detail.
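A common instance of probabilistic parameter learning is conditional maximum likelihood: minimize the negative log-likelihood $-\sum_n \log p(y^n \mid x^n, w)$ over the training set. A minimal sketch with a toy logistic model standing in for a full CRF (the data and the plain gradient descent loop are illustrative assumptions, not the slides' method):

    import numpy as np

    def nll(w, X, Y):
        """Negative conditional log-likelihood of binary labels Y given rows X."""
        logits = X @ w
        log_p1 = -np.logaddexp(0.0, -logits)   # log p(y=1 | x, w)
        log_p0 = -np.logaddexp(0.0, logits)    # log p(y=0 | x, w)
        return -np.sum(np.where(Y == 1, log_p1, log_p0))

    rng = np.random.default_rng(1)
    X = rng.normal(size=(50, 3))
    Y = (X @ np.array([1.0, -2.0, 0.5])
         + 0.1 * rng.normal(size=50) > 0).astype(int)

    # Crude gradient descent on the NLL (for illustration only).
    w = np.zeros(3)
    for _ in range(500):
        p1 = 1.0 / (1.0 + np.exp(-(X @ w)))
        w -= 0.1 * (X.T @ (p1 - Y)) / len(Y)
    print(w, nll(w, X, Y))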
  • Loss-Minimizing Parameter Learning. Problem (Loss-Minimizing Parameter Learning). Let d(x, y) be the unknown distribution of data and labels, and let $\Delta : \mathcal{Y} \times \mathcal{Y} \to \mathbb{R}$ be a loss function. Loss-minimizing parameter learning is the task of finding a parameter value $w^*$ such that the expected prediction risk
    $\mathbb{E}_{(x,y) \sim d(x,y)}[\Delta(y, f_p(x))]$
    is as small as possible, where $f_p(x) = \operatorname{argmax}_{y \in \mathcal{Y}} p(y \mid x, w^*)$.
    This requires the loss function at training time, and it directly learns a prediction function $f_p(x)$.