Learning the Structure of Dynamic Probabilistic Networks Matt Hink March 27, 2012
Overview • Definitions and Introduction to DPNs • Learning from complete data • Experimental Results • Applications
Regular probabilistic networks (Bayesian networks) are well established for representing probabilistic relationships among many random variables. Dynamic Probabilistic Networks (DPNs) extend this representation to model the stochastic evolution of a set of random variables over time. (Think “probabilistic state machines.”)
Notation • Capital letters (X, Y, Z) - sets of variables • Xi - a random variable in the set X • Val(Xi) - the finite set of values of Xi • |Xi| - the size of Val(Xi) • Lowercase italic (x, y, z) - instantiations of those sets
DPNs extend the common Bayesian network representation to the case where the probability distribution changes over time according to some stochastic process. Assume that X is a set of variables in a PN whose values vary over time. Then Xi[t] is the value of the attribute Xi at time t, and X[t] is the collection of such variables.
For simplicity’s sake, we assume that the stochastic process governing transitions is Markovian: P(X[t+1] | X[0...t]) = P(X[t+1] | X[t]). That is, the probability of a given instantiation depends only on its immediate predecessor.
We also assume the process is stationary, i.e., P(X[t+1] | X[t]) is independent of t.
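The two assumptions can be illustrated with a minimal Python sketch. The single binary variable and the numbers in `TRANSITION` are hypothetical, chosen only for illustration. Each step samples the next state from the current state alone (the Markov assumption), and the same table is reused at every time step (stationarity).

```python
import random

# Hypothetical table for P(X[t+1] | X[t]) over one binary variable.
# Row key: current state; row values: probabilities of next state 0 and 1.
TRANSITION = {0: [0.9, 0.1], 1: [0.4, 0.6]}


def simulate(initial_state, steps, rng):
    """Sample a trajectory x[0...steps] from the chain."""
    trajectory = [initial_state]
    for _ in range(steps):
        current = trajectory[-1]  # Markov: only the latest state is consulted
        # Stationary: the same TRANSITION table is used regardless of t.
        next_state = rng.choices([0, 1], weights=TRANSITION[current])[0]
        trajectory.append(next_state)
    return trajectory


trajectory = simulate(initial_state=0, steps=10, rng=random.Random(0))
```

Passing a seeded `random.Random` instance keeps the sketch reproducible without touching the global random state.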
Given these two assumptions, we can describe a DPN representing the joint distribution over all possible trajectories of a process using two parts: a prior network B0 that specifies a distribution over the initial states X[0]; and a transition network B-> over the variables X[0] ∪ X[1] which specifies the transition probability P(X[t+1] | X[t]) for all t.
A prior network (left) and transition network (right) for a dynamic probabilistic network
In light of this structure, the joint distribution over the entire history of the DPN up to time T is given as PB(x[0...T]) = PB0(x[0]) ∏(t=0...T-1) PB->(x[t+1] | x[t]). In other words, it is the probability of the initial state times the product of the transition probabilities at every step.
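As a rough illustration of this factorization, the sketch below evaluates the probability of one short trajectory. The two binary variables A and B, their parent sets, and every number in the tables are hypothetical; the transition network here only uses parents from the previous time slice, which is a simplifying assumption.

```python
import math

# Hypothetical prior network B0: A[0] has no parents, B[0] depends on A[0].
PRIOR_A = {0: 0.6, 1: 0.4}
PRIOR_B_GIVEN_A = {0: {0: 0.9, 1: 0.1}, 1: {0: 0.3, 1: 0.7}}

# Hypothetical transition network B->:
# A[t+1] depends on A[t]; B[t+1] depends on (A[t], B[t]).
TRANS_A = {0: {0: 0.7, 1: 0.3}, 1: {0: 0.2, 1: 0.8}}
TRANS_B = {
    (0, 0): {0: 0.8, 1: 0.2},
    (0, 1): {0: 0.4, 1: 0.6},
    (1, 0): {0: 0.5, 1: 0.5},
    (1, 1): {0: 0.1, 1: 0.9},
}


def prior_prob(state):
    """P_B0(x[0]) for a state (a, b)."""
    a, b = state
    return PRIOR_A[a] * PRIOR_B_GIVEN_A[a][b]


def transition_prob(prev_state, next_state):
    """P_B->(x[t+1] | x[t]), factored over the two variables."""
    prev_a, prev_b = prev_state
    next_a, next_b = next_state
    return TRANS_A[prev_a][next_a] * TRANS_B[(prev_a, prev_b)][next_b]


def trajectory_log_prob(trajectory):
    """log P_B(x[0...T]): log prior plus the sum of log transition terms."""
    log_prob = math.log(prior_prob(trajectory[0]))
    for prev_state, next_state in zip(trajectory, trajectory[1:]):
        log_prob += math.log(transition_prob(prev_state, next_state))
    return log_prob


trajectory = [(0, 0), (0, 1), (1, 1)]
probability = math.exp(trajectory_log_prob(trajectory))
```

Working in log space avoids numerical underflow when trajectories are long, since the product of many small probabilities quickly approaches zero.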
Learning from Complete Data
Common traditional methods: search algorithms using scoring methods (BIC, BDe), given a dataset D. DPN methods: search algorithms using scoring methods! (Given a dataset D consisting of Nseq observed sequences.)
So each entry in our dataset consists of an observation of a set of variables over time. The mth such sequence has length Nm and specifies values for the variable set Xm[0...Nm]. We then have Nseq instances of the initial state, and N = ∑m Nm instances of transitions. We can use these to learn the structure of the prior network and the transition network, respectively.
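The split of the dataset into initial-state instances and transition instances can be sketched as follows. The toy `sequences` and the function name `split_dataset` are hypothetical; each state is a tuple of values for two binary variables.

```python
from collections import Counter


def split_dataset(sequences):
    """Separate a list of sequences into initial states and transitions."""
    # One initial-state instance per sequence: used for the prior network.
    initial_states = [seq[0] for seq in sequences]
    # One instance per consecutive pair: used for the transition network.
    transitions = [
        (seq[t], seq[t + 1])
        for seq in sequences
        for t in range(len(seq) - 1)
    ]
    return initial_states, transitions


# Hypothetical toy dataset: three sequences with 2, 1, and 3 transitions.
sequences = [
    [(0, 0), (0, 1), (1, 1)],
    [(1, 0), (1, 1)],
    [(0, 0), (0, 0), (0, 1), (1, 1)],
]

initial_states, transitions = split_dataset(sequences)
n_seq = len(initial_states)          # Nseq: number of sequences
n_transitions = len(transitions)     # N: sum of Nm over all sequences
transition_counts = Counter(transitions)  # counts per (previous, next) pair
```

Because the process is assumed stationary, transitions from every time step and every sequence are pooled into one collection of counts.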
BIC scores for DPNs • Let
BIC scores for DPNs • We find the log-likelihood using
BIC scores for DPNs • And can then find the BIC score using
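A minimal sketch of a BIC computation, assuming the standard form of the score (maximized log-likelihood minus (log N / 2) times the number of free parameters) applied to a single variable and its parents. The function name `family_bic`, its arguments, and the toy samples are hypothetical and are not taken from the slides.

```python
import math
from collections import defaultdict


def family_bic(samples, num_child_values, num_parent_configs, n_instances):
    """BIC contribution of one variable given its parents.

    samples: iterable of (parent_config, child_value) pairs.
    n_instances: number of data instances used in the penalty term.
    """
    # Count how often each child value occurs under each parent configuration.
    counts = defaultdict(lambda: defaultdict(int))
    for parent_config, child_value in samples:
        counts[parent_config][child_value] += 1

    # Log-likelihood at the maximum-likelihood parameters count / total.
    log_likelihood = 0.0
    for child_counts in counts.values():
        total = sum(child_counts.values())
        for count in child_counts.values():
            log_likelihood += count * math.log(count / total)

    # Free parameters: (|Val(child)| - 1) per parent configuration.
    num_parameters = num_parent_configs * (num_child_values - 1)
    penalty = 0.5 * math.log(n_instances) * num_parameters
    return log_likelihood - penalty


# Hypothetical samples: one binary parent, one binary child, six transitions.
samples = [(0, 0), (0, 0), (0, 1), (1, 1), (1, 1), (1, 1)]
score = family_bic(samples, num_child_values=2, num_parent_configs=2,
                   n_instances=len(samples))
```

Under this assumption, a structure search would sum such terms over all variables and compare candidate parent sets; for the prior network the instance count would be Nseq, and for the transition network it would be N.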