Computing Marginal in CCMRFs - NIPS 2010

Continuous Markov random fields are a general formalism to model joint probability distributions over events with continuous outcomes. We prove that marginal computation for constrained continuous MRFs is #P-hard in general and present a polynomial-time approximation scheme under mild assumptions on the structure of the random field. Moreover, we introduce a sampling algorithm to compute marginal distributions and develop novel techniques to increase its efficiency. Continuous MRFs are a general-purpose probabilistic modeling tool, and we demonstrate how they can be applied to statistical relational learning. On the problem of collective classification, we evaluate our algorithm and show that the standard deviation of marginals serves as a useful measure of confidence.

Computing Marginal Distributions over Continuous Markov Networks for Statistical Relational Learning
Matthias Bröcheler and Lise Getoor
Supported by NSF Grant No. 0937094

Problem?
Computing marginal distributions in constrained continuous MRFs (CCMRFs).

Motivation?
CCMRFs have many applications, probabilistic soft logic being one of them.

Contributions?
Analysis of the theoretical and practical aspects of computing marginals in CCMRFs.

What's a CCMRF?
A constrained continuous Markov random field is defined by:
random variables $X = \{X_1, \dots, X_n\}$ with domains $D_i \subset \mathbb{R}$ and $D = \times_{i=1}^{n} D_i$;
potential functions $\phi = \{\phi_1, \dots, \phi_m\}$ with $\phi_j : D \to [0, M]$;
weights $\Lambda = \{\lambda_1, \dots, \lambda_m\}$;
equality constraints $A : D \to \mathbb{R}^{k_A}$, $a \in \mathbb{R}^{k_A}$;
inequality constraints $B : D \to \mathbb{R}^{k_B}$, $b \in \mathbb{R}^{k_B}$.
The feasible domain is $\tilde{D} = D \cap \{x \mid A(x) = a \wedge B(x) \le b\}$, and the probability measure $P$ over $X$ is defined through the density
$$f(x) = \frac{1}{Z(\Lambda)} \exp\Big[-\sum_{j=1}^{m} \lambda_j \phi_j(x)\Big], \qquad Z(\Lambda) = \int_{\tilde{D}} \exp\Big(-\sum_{j=1}^{m} \lambda_j \phi_j(x)\Big)\, dx,$$
with $f(x) = 0$ for all $x \notin \tilde{D}$.

Why CCMRF?
Probabilistic soft logic (PSL) is a declarative language for collective probabilistic reasoning about similarity or uncertainty in relational domains. PSL focuses on statistical relational learning problems with continuous RVs and supports sets and aggregation. PSL programs get grounded into CCMRFs for inference. Example rules:
w1: class(B,C) ∧ A.text ≈ B.text ⇒ class(A,C)
w2: class(B,C) ∧ link(A,B) ⇒ class(A,C)
Constraint: functional(class)

What does it look like?
X = {X1, X2, X3}, Λ = {1, 2, 1}
φ1(x) = x1, φ2(x) = max(0, x1 − x2), φ3(x) = max(0, x2 − x3)
Constraint: x1 + x3 ≤ 1
[Figure: density f over X1, X2, X3 on [0, 1], marking the highest-probability region and P(0.4 ≤ X2 ≤ 0.6)]

In Theory…
Computing the marginal probability density function
$$f_{X'}(x') = \int_{y \in \times_{i \text{ s.t. } X_i \notin X'} D_i} f(x', y)\, dy$$
for a subset $X' \subset X$ under the probability measure defined by a CCMRF is #P-hard in the worst case.

Let's approximate!
Lovász & Vempala '04: The complexity of computing an approximate distribution $\sigma^*$ using hit-and-run sampling such that the total variation distance of $\sigma^*$ and $P$ is less than $\epsilon$ is
$$O^*\big(\tilde{n}^3 (k_B + \tilde{n} + m)\big), \qquad \tilde{n} = n - k_A,$$
under the assumption that we start from an initial distribution $\sigma$ such that the density function $d\sigma/dP$ is bounded by $M$ except on a set $S$ with $\sigma(S) \le \epsilon/s$.

Hit-and-Run Sampling
In Theory…
1. Sample a random direction d.
2. Compute the line segment.
3. Induce the density on the line.
4. Sample from the induced density p.

In Practice… (Algorithm)
1. Start = MAP state.
2. Dimensionality reduction and LA.
3. How do we get out of corners? Corner heuristic: $d_{i+1} = d_i + 2\,\frac{z_k - W_k^{T} d_i}{\lVert W_k \rVert^2}\, W_k$
4. Induce f efficiently.

Experimental Results
Setup: collective classification of 1717 Wikipedia articles with 20% seed documents, using tf/idf-weighted cosine similarity as the baseline and comparing it against a PSL program with learned weights over K-fold cross-validation.

Std. Deviation Indicator of Confidence:
$$\Delta(\sigma) = 2\,\frac{\sigma_- - \sigma_+}{\sigma_+ + \sigma_-}$$

Folds | Improvement over baseline | P(Null Hypothesis) | Relative Difference Δ(σ)
20 | 41.4% | 1.95E-09 | 38.3%
25 | 31.7% | 2.40E-13 | 41.2%
30 | 39.1% | 1.00E-16 | 43.5%
35 | 46.1% | 4.54E-08 | 39.0%

Convergence Analysis
[Figure: KL divergence (axis from 0.05 to 5) versus number of samples (30,000; 300,000; 3,000,000) for the average, lowest-quartile (322-413 RV) and highest-quartile (174-224 RV) KL divergence]
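The four hit-and-run steps can be sketched in Python on the poster's three-variable example (φ1 = x1, φ2 = max(0, x1 − x2), φ3 = max(0, x2 − x3), Λ = {1, 2, 1}, x1 + x3 ≤ 1). This is a minimal illustration under assumptions that are not on the poster. Each X_i is taken to range over [0, 1]. The chain starts from a hand-picked interior point rather than the MAP state. Step 4 uses plain rejection sampling, which is exact here only because all λ_j and φ_j are non-negative, so the unnormalized density never exceeds 1. The sketch omits the dimensionality reduction, corner heuristic, and efficient density induction of the poster's practical algorithm, and all function names are hypothetical.

```python
import math
import random

# Toy CCMRF from "What does it look like?".
# ASSUMPTION: every X_i takes values in [0, 1], as the plot axes suggest.
LAMBDAS = (1.0, 2.0, 1.0)  # Lambda = {1, 2, 1}

# Feasible region as linear inequalities (a, b) meaning a . x <= b:
# the assumed box 0 <= x_i <= 1 plus the poster's constraint x1 + x3 <= 1.
CONSTRAINTS = [
    ((-1.0, 0.0, 0.0), 0.0), ((1.0, 0.0, 0.0), 1.0),
    ((0.0, -1.0, 0.0), 0.0), ((0.0, 1.0, 0.0), 1.0),
    ((0.0, 0.0, -1.0), 0.0), ((0.0, 0.0, 1.0), 1.0),
    ((1.0, 0.0, 1.0), 1.0),
]


def dot(u, v):
    return sum(a * b for a, b in zip(u, v))


def potentials(x):
    """phi_1(x) = x1, phi_2(x) = max(0, x1 - x2), phi_3(x) = max(0, x2 - x3)."""
    x1, x2, x3 = x
    return (x1, max(0.0, x1 - x2), max(0.0, x2 - x3))


def unnormalized_density(x):
    """exp(-sum_j lambda_j phi_j(x)); at most 1 because all lambdas and phis are >= 0."""
    return math.exp(-dot(LAMBDAS, potentials(x)))


def random_direction(rng, n):
    """Step 1: a normalized Gaussian vector is uniform on the unit sphere."""
    g = [rng.gauss(0.0, 1.0) for _ in range(n)]
    norm = math.sqrt(dot(g, g))
    return [gi / norm for gi in g]


def line_segment(x, d):
    """Step 2: the interval [t_lo, t_hi] where x + t*d satisfies every constraint."""
    t_lo, t_hi = -math.inf, math.inf
    for a, b in CONSTRAINTS:
        ad = dot(a, d)
        slack = b - dot(a, x)
        if ad > 1e-12:
            t_hi = min(t_hi, slack / ad)
        elif ad < -1e-12:
            t_lo = max(t_lo, slack / ad)
    return t_lo, t_hi


def hit_and_run(x0, num_samples, seed=0):
    rng = random.Random(seed)
    x = list(x0)
    samples = []
    for _ in range(num_samples):
        d = random_direction(rng, len(x))
        t_lo, t_hi = line_segment(x, d)
        # Steps 3 + 4: the induced density on the segment is proportional to
        # unnormalized_density(x + t*d). Since it is bounded by 1, drawing t
        # uniformly and accepting with that probability samples it exactly.
        while True:
            t = rng.uniform(t_lo, t_hi)
            y = [xi + t * di for xi, di in zip(x, d)]
            if rng.random() < unnormalized_density(y):
                break
        x = y
        samples.append(tuple(x))
    return samples


if __name__ == "__main__":
    chain = hit_and_run((0.25, 0.5, 0.5), 20000, seed=1)
    kept = chain[1000:]  # discard a short burn-in
    p = sum(0.4 <= s[1] <= 0.6 for s in kept) / len(kept)
    print(f"P(0.4 <= X2 <= 0.6) is approximately {p:.3f}")
```

For three variables, the Monte Carlo estimate of P(0.4 ≤ X2 ≤ 0.6) can be cross-checked against brute-force grid integration of the same density. That check scales exponentially with the number of variables, which is why sampling is needed for networks with hundreds of variables like those in the convergence analysis.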
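Read literally, the corner-heuristic update $d_{i+1} = d_i + 2\,\frac{z_k - W_k^{T} d_i}{\lVert W_k \rVert^2}\, W_k$ mirrors $d_i$ across the hyperplane $W_k^{T} x = z_k$. The sketch below implements only that formula and checks its geometry. The poster does not say how the hyperplane $(W_k, z_k)$ is chosen, so the one used here (the boundary x1 + x3 = 1 of the toy example) and the helper names are hypothetical.

```python
def dot(u, v):
    return sum(a * b for a, b in zip(u, v))


def reflect(d, w, z):
    """Corner-heuristic update:
    d_{i+1} = d_i + 2 (z_k - W_k^T d_i) / ||W_k||^2 * W_k,
    i.e. the mirror image of d_i across the hyperplane W_k^T x = z_k.
    """
    scale = 2.0 * (z - dot(w, d)) / dot(w, w)
    return [di + scale * wi for di, wi in zip(d, w)]


# Hypothetical usage: a point that overshoots x1 + x3 <= 1 (here x1 + x3 = 1.3)
overshoot = [0.9, 0.5, 0.4]
inside = reflect(overshoot, [1.0, 0.0, 1.0], 1.0)
print(inside)  # approximately [0.6, 0.5, 0.1] up to rounding; x1 + x3 = 0.7
```

Because the update is a reflection, applying it twice with the same hyperplane returns the original point, and the midpoint of a point and its image always lies on the hyperplane.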
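The confidence indicator Δ(σ) = 2(σ− − σ+)/(σ+ + σ−) is a symmetric relative difference. Writing r = σ−/σ+ gives Δ = 2(r − 1)/(r + 1), which inverts to r = (2 + Δ)/(2 − Δ). The sketch below computes both directions. The poster does not spell out which marginals σ+ and σ− are computed over, and the function names are hypothetical.

```python
def relative_std_difference(sigma_minus, sigma_plus):
    """Delta(sigma) = 2 (sigma_- - sigma_+) / (sigma_+ + sigma_-)."""
    return 2.0 * (sigma_minus - sigma_plus) / (sigma_plus + sigma_minus)


def ratio_from_delta(delta):
    """Invert the formula: sigma_- / sigma_+ = (2 + Delta) / (2 - Delta)."""
    return (2.0 + delta) / (2.0 - delta)


# The Delta(sigma) column of the results table, read as fractions.
for delta in (0.383, 0.412, 0.435, 0.390):
    print(f"Delta = {delta:.3f} -> sigma_- / sigma_+ = {ratio_from_delta(delta):.2f}")
```

Reading the table's percentages as Δ(σ) × 100, the reported values of 38.3% to 43.5% correspond to σ− being roughly 1.47 to 1.56 times σ+.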