1. Bayesian Networks
Unit 8: Probabilistic Inference over Time
Wang, Yuan-Kai, 王元凱
ykwang@mails.fju.edu.tw
http://www.ykwang.tw
Department of Electrical Engineering, Fu Jen Univ.
2006~2011
Reference this document as:
Wang, Yuan-Kai, "Probabilistic Inference over Time," Lecture Notes of Wang, Yuan-Kai, Fu Jen University, Taiwan, 2011.
Fu Jen University Department of Electrical Engineering Wang, Yuan-Kai Copyright
2. Bayesian Networks Unit - Probabilistic Inference over Time p. 2
Goal of This Unit
• Know the uncertainty concept in temporal
models
• Learn four inference types in temporal
models
– Filtering, Prediction, Smoothing,
Most Likely Explanation
• See some temporal models
– HMM, Kalman/Particle filtering
– Dynamic Bayesian networks
Related Units
• Background
– Probabilistic graphical model
– Exact inference in BN
– Approximate inference in BN
• Next units
– HMM
– Kalman filter
– Particle filter
– DBN
Self-Study Reference
• Chapter 15, Sections 15.1-15.2, Artificial Intelligence: A Modern Approach, 2nd ed., by S. Russell & P. Norvig, Prentice Hall, 2003.
Structure of Related Lecture Notes
Problem Structure + Data → Learning → PGM

[Diagram: a burglary-alarm Bayesian network with nodes B, E, A, J, M and tables P(B), P(E), P(A|B,E), P(J|A), P(M|A), annotated with the lecture-note units:]

• Representation
– Unit 5: BN
– Unit 9: Hybrid BN
– Units 10~15: Naïve Bayes, MRF, HMM, DBN, Kalman filter
• Parameter Learning
– Units 16~: MLE, EM
• Inference (Query)
– Unit 6: Exact inference
– Unit 7: Approximate inference
– Unit 8: Temporal inference
Contents
1. Time and Uncertainty (p. 7)
2. Inference in Temporal Models (p. 46)
3. Various Models (p. 90)
4. References (p. 96)
1. Time and Uncertainty
• What is probabilistic reasoning over time?
– There is a lot of time-series data
• Ex: stock data, weather data, radar signals, ...
– We want to
• Predict the next data
• Recover correct values of the current data
• Recover correct values of previous data
Example – Stock Data
Example 2 - Visual Tracking
• What is visual tracking
– Continuously detect objects in video
– Time series data
• What kind of objects
– Face,
– Facial features (eye, eyebrow, ...)
– Human body
– Hand
– ...
Why Visual Tracking (1/2)
• A simple idea to detect objects in all frames of a video
– Detect the object at every frame with the same detection method
• Disadvantage
– Detection in a single frame may be slow
– Detection at all frames becomes very slow
• So, if you have a very quick detection method, is the simple method OK?
Why Visual Tracking (2/2)
• A better approach to detect objects in all frames of a video
– Detect objects in the first frame
– Find objects in succeeding frames with a quick method (→ tracking)
• Goal of visual tracking
– Fast and accurate detection of objects
Front-View Face Tracking
Single frame detector
Temporal detector
Side-View Face Tracking
without temporal continuity
Two Kinds of Approaches
• Neighborhood-based
– Search the neighborhood of the object's location in the previous frame
• Prediction-based
– Search the neighborhood of the predicted location in the current frame
Basic Algorithm
• Basic idea of both approaches
1. Read first frame
2. Detect moving object O
   Obtain its Region of Interest (ROI), usually a rectangle or ellipse
3. Read next frame
4. For all possible ROI candidates Oc
   a) Compare the similarity between O and Oc
   b) If similarity is high, tracking succeeds; break
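The loop above can be sketched in Python. This is a hand-made illustration, not the course's code: the synthetic 1-D "frames", the ROI length, and the similarity threshold are all hypothetical, and a real tracker would compare 2-D image patches.

```python
# Minimal sketch of the basic tracking loop on synthetic 1-D "frames"
# (hypothetical data: each frame is a list of pixel values).

def similarity(a, b):
    """Negative sum of squared differences: higher means more similar."""
    return -sum((x - y) ** 2 for x, y in zip(a, b))

def track(frames, roi_start, roi_len, threshold=-5.0):
    """Return the ROI start position found in each frame."""
    template = frames[0][roi_start:roi_start + roi_len]  # step 2: detect object O
    positions = [roi_start]
    for frame in frames[1:]:                             # step 3: read next frame
        best_pos, best_sim = None, float("-inf")
        for c in range(len(frame) - roi_len + 1):        # step 4: all candidates Oc
            s = similarity(template, frame[c:c + roi_len])
            if s > best_sim:
                best_pos, best_sim = c, s
        # step 4b: accept the candidate only if similarity is high enough
        positions.append(best_pos if best_sim >= threshold else None)
    return positions

# The object (pattern [9, 9, 9]) drifts one position per frame.
frames = [
    [0, 9, 9, 9, 0, 0],
    [0, 0, 9, 9, 9, 0],
    [0, 0, 0, 9, 9, 9],
]
print(track(frames, roi_start=1, roi_len=3))  # → [1, 2, 3]
```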
Neighborhood-search Tracking
• Basic idea
1. Read first frame
2. Detect face O
   Obtain its Region of Interest (ROI), usually a rectangle or ellipse
3. Read next frame
4. For all possible ROI candidates Oc
   a) Compare the similarity between O and Oc
   b) If similarity is high, break
Basic Ideas
[Diagram: Face Detection finds the face O in the first frame; Face Tracking then compares candidates Oc inside a search region around O in the next frame]
Prediction-based Tracking
• Three steps
– Predict the next position of the moving object with a probabilistic model (its parameters)
– Detect the new position around the predicted position
• This yields a prediction error
– Update
• The corrected position
• The probabilistic model, using the prediction error
Predict Next Position
[Diagram: in the previous frames, the real position xt and the detected position zt are related by the sensor model P(zt | xt); the transition model P(xt+1 | xt) predicts the position x̄t+1 in the current frame]
Detect New Position by LSE
[Diagram: a search region around the predicted position; each candidate's squared error is computed (SE = 1032, 2560, 1968, 104, 2223, ...) and the least squared error (LSE = 104) determines the detected position zt+1]
Update
[Diagram: the prediction error x̄t+1 − zt+1 between the predicted position x̄t+1 and the detection zt+1 gives the corrected position xt+1 and the corrected probabilistic models P′(zt | xt) and P′(xt | xt−1)]
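The predict-detect-update cycle can be sketched with a 1-D constant-velocity model. This is a minimal illustration, not the slides' exact probabilistic model: the blend gain and the detection values are hypothetical.

```python
# Sketch of prediction-based tracking in 1-D (hypothetical gain and data).
# predict: x̄_{t+1} = x_t + v_t                          (constant-velocity model)
# update:  x_{t+1} = x̄_{t+1} + g * (z_{t+1} - x̄_{t+1})  (blend in the detection)

def predict(x, v):
    return x + v                      # predicted position x̄_{t+1}

def update(x_pred, z, v, gain=0.8):
    error = z - x_pred                # prediction error
    x_new = x_pred + gain * error     # corrected position
    v_new = v + gain * error          # corrected model parameter (velocity)
    return x_new, v_new

x, v = 0.0, 1.0                       # initial position and velocity estimate
for z in [1.2, 2.1, 3.3, 4.0]:        # z_t from a (hypothetical) detector
    x_pred = predict(x, v)
    x, v = update(x_pred, z, v)
    print(f"predicted {x_pred:.2f}, detected {z:.2f}, corrected {x:.2f}")
```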
Accurate Tracking = Smoothing
Example 3 - Robot Localization
• Localization of AIBO robot in
RoboCup
• The robot has to
– See landmark
• Object detection & object recognition
– Analyze the landmark
• Calculate distance & angle between the
robot and the landmark
– Estimate its location
RoboCup Field
[Field diagram: a landmark observed at range and bearing (r, θ)]
Tracking of Robot
Temporal Patterns
• Deterministic patterns :
– Traffic light
– FSM
(Finite State Machine)
–…
• Non-Deterministic patterns :
– Weather
– Speech
– Tracking
– …
How to Do It?
• What do we want?
– Prediction: Predict its next data
– Filtering: Recover correct values of its
current data
– Smoothing: Recover correct values of
its previous data
• How to achieve it?
– Statistically model the data
Statistically Modeling
[Diagram: a set of time-related data (x, y) is fitted by a statistical model, e.g. y = 1.3x + 96; the model is then used to predict, filter, and smooth]
State
• There is a set of time-related data

Time t =    0           1           2           3         ...
State s = (50, 100)   (49, 98)    (50, 96)    (48, 94)
          (50, 180)   (50, 178)   (50, 176)   (47, 173)
          (50, 160)   (49, 158)   (50, 156)   (48, 154)

• We call each data item
– A state of the system, or
– A state of the object
Observable vs. Unobservable States
• Observable state
– Measurable values
• Sensor values, feature values
– Ex : Localization/Visual Tracking
• Measured position, Measured speed
– Ex : Facial Expression Recognition
• Eyebrow up, eyebrow down, ...
• Unobservable state
– Real state of the system/object
– Ex : Localization/Visual Tracking
• Real position, real speed
– Ex : Facial Expression Recognition
• Smile, Cry, Anger, ...
Observable vs. Unobservable States (Math)
• Let
– Xt = set of unobservable state variables at
time t
– Et = set of observable state variables at
time t
• Usually we observe
– E1, ..., Et : time-related data
• But we want to derive
– X0, X1, ..., Xt
• Notation: Xa:b = Xa, Xa+1, ..., Xb
Markov Chain
• A Markov chain embodies an assumption
– A state depends only on its previous state(s)
– Xt depends on X0:t−1 only through the most recent states
– The future Xt+1 does not influence Xt
• Markov process
– If we assume that a set of data obeys the Markov assumption,
– We say the data form a Markov process
Markov Process
• First-order Markov process
– P(Xt |X0:t-1)=P(Xt | Xt-1 )
• Second-order Markov process
– P(Xt |X0:t-1)=P(Xt | Xt-2 , Xt-1 )
• Higher-order Markov processes ...
– Complicated, seldom used
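A first-order Markov process is easy to simulate, since each new state is drawn only from P(Xt | Xt−1). A minimal sketch with a hypothetical two-state weather chain:

```python
import random

# Hypothetical two-state chain: P(X_t | X_{t-1}) for states "rain"/"sun".
transition = {
    "rain": {"rain": 0.7, "sun": 0.3},
    "sun":  {"rain": 0.3, "sun": 0.7},
}

def sample_chain(start, steps, rng):
    """First-order Markov: each new state depends only on the previous one."""
    states = [start]
    for _ in range(steps):
        p_rain = transition[states[-1]]["rain"]
        states.append("rain" if rng.random() < p_rain else "sun")
    return states

print(sample_chain("sun", 5, random.Random(0)))
```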
Transition Model & Sensor Model
• Transition model
– P(Xt | Xt-1 )
– P(Xt | Xt-2 , Xt-1 )
• Sensor model
– We usually assume the evidence
variables (sensor values) at time t, Et,
depend only on the current state Xt
– P(Et|X0:t, E0:t-1) = P(Et|Xt)
– It is also called the observation model
Diagram of Transition & Sensor
Models for 1st Order Markov
• P(Xt | Xt−1): transition of unobservable states (Xt−1 → Xt)
• P(Et | Xt): causal relationship between unobservable & observable states (Xt → Et)

[Diagram: the chain Xt−1 → Xt → Xt+1 → Xt+2, each Xi emitting its evidence Ei: a special Bayesian network]
An "Umbrella World" Example (1/2)
• A security guard is always in a secret underground room, never going out
• He wants to know whether it is raining today
• But he cannot observe the outside world
• He can only see, each morning, the director coming in with or without an umbrella
• Rain is the unobservable state
• Umbrella is the observable state (sensor value)
An "Umbrella World" Example (2/2)
• For each day t, the set Et contains a single evidence variable Ut (whether the umbrella appears)
• The set Xt contains a single state variable Rt (whether it is raining)
Stationary Process
• The transition model P(Xt | Xt−1) and the sensor model P(Et | Xt) are the same for all time t
• Stationary process assumption
– Reduces the complexity of the inference algorithm
Inference for the Markov Process (1/2)
X0 X1 X2 Xt
E1 E2 Et
• A Bayesian net with two kinds of random variables
– X: X0, X1, ..., Xt
– E: E1, ..., Et
• We know that P(X0, X1, ..., Xt, E1, ..., Et), the full joint distribution (FJD), can answer any query
– And it can be reduced to a product:

P(X0, X1, ..., Xt, E1, ..., Et) = P(X0) ∏_{i=1}^{t} P(Xi | Xi−1) P(Ei | Xi)
Inference for the Markov Process (2/2)
• We need three PDFs
– P(X0), P(Xi|Xi-1), P(Ei|Xi)
• For discrete R.V., we need
– 1 prior probability table P(X0)
– 2 CPTs
• CPT for transition model: P(Xi|Xi-1)
• CPT for sensor model: P(Ei|Xi)
• For continuous R.V., we need
– Gaussian pdf, Gaussian Mixture, ...
• Here we consider only discrete R.V.
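For discrete random variables, the three distributions are just lookup tables. A sketch with hypothetical numbers (states S1/S2, observations v1/v2), including the full-joint factorization from the previous slide:

```python
# The three tables of a discrete temporal model (hypothetical values).
prior = {"S1": 0.6, "S2": 0.4}                     # P(X0)

transition = {                                     # P(X_i | X_{i-1}), row = previous state
    "S1": {"S1": 0.7, "S2": 0.3},
    "S2": {"S1": 0.4, "S2": 0.6},
}
sensor = {                                         # P(E_i | X_i), row = hidden state
    "S1": {"v1": 0.9, "v2": 0.1},
    "S2": {"v1": 0.2, "v2": 0.8},
}

def joint(states, observations):
    """Full joint P(X0..Xt, E1..Et) = P(X0) * prod_i P(Xi|Xi-1) P(Ei|Xi)."""
    p = prior[states[0]]
    for i in range(1, len(states)):
        p *= transition[states[i - 1]][states[i]]
        p *= sensor[states[i]][observations[i - 1]]
    return p

# 0.6 * 0.7 * 0.9 * 0.3 * 0.8
print(joint(["S1", "S1", "S2"], ["v1", "v2"]))
```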
Sequence Diagram (1/4)
[Chain: X0 → X1 → X2 → ... → Xt, with evidence E1, E2, ..., Et]

What is the P(X0) probability table?

X     P(X0)
S1    0.2
S2    0.1
...   ...
Si    0.3

• Suppose the unobservable variable X is a discrete R.V.
• X = S1, S2, S3, ..., Si, ...
• P(X0) is the probability of X = Si at t = 0
Sequence Diagram (2/4)
[Chain: X0 → X1 → X2 → ... → Xt, with evidence E1, E2, ..., Et]

What is the P(Xt | Xt−1) conditional probability table?

Transition probability P(Xt+1 = column | Xt = row):

Xt \ Xt+1   S1     S2     ...   Si
S1          0.1    0.2    ...   0.05
S2          0.2    0.15   ...   0.18
...         ...    ...    ...   ...
Si          0.31   0.03   ...   0.22
Sequence Diagram (3/4)
[Chain: X0 → X1 → X2 → ... → Xt, with evidence E1, E2, ..., Et]

What is the P(Et | Xt) conditional probability table?

Observation probability P(Et = column | Xt = row):

Xt \ Et   v1     v2     ...   vj
S1        0.1    0.2    ...   0.05
S2        0.2    0.15   ...   0.18
...       ...    ...    ...   ...
Si        0.31   0.03   ...   0.22
Sequence Diagram (4/4)
A sample state/observation sequence:

t  =  1   2   3   4   5   6   7
Xt =  S3  S3  S1  S1  S3  S2  S3
Et =  v2  v4  v1  v1  v2  v3  v4

[Trellis: at each time step the state can be any of S1, S2, S3]
Short Summary
X0 X1 X2 Xt
E1 E2 Et
• If we have the three PDFs/Tables
– P(X0), P(Xi|Xi-1), P(Ei|Xi)
• We can answer any query
– P(X1, X3 | E2, E4), P(X1, E5 | X2, X4), ...
• Do we need to ask many kinds of queries?
• Or do we have some frequently asked queries?
2. Inference in Temporal Models
• Four common query tasks in temporal inference/reasoning
– Filtering: P(Xt | e1:t) = P(Xt | E1:t = e1:t)
• Estimate correct current states
– Prediction: P(Xt+k | e1:t) for k > 0
• Predict possible next states
– Smoothing: P(Xk | e1:t) for 1 ≤ k < t
• Better estimates of past states
– Most likely explanation: arg maxX1:t P(X1:t | e1:t)
Subsections
• 2.1 Graphical models of the 4
inferences
• 2.2 Mathematical formula of the 4
inferences
2.1 Graphical Models
of the 4 Inferences
• Use sequence diagram to illustrate
what are
– Filtering
– Prediction
– Smoothing
– Most likely explanation
Graphical Models - Filtering
• P(Xt | e1:t)

[Chain: X0 → X1 → X2 → ... → Xt with evidence E1, E2, ..., Et; a filtering example for the 2-D position of a robot/WLAN card]
Graphical Models - Prediction
• P(Xt+k | e1:t) for k > 0
For k=1
X0 X1 X2 Xt Xt+1
E1 E2 Et
Graphical Models – Smoothing (1/3)
• P(Xk | e1:t) for 1 ≤ k < t
X0 X1 X2 Xk Xt
E1 E2 Ek Et
Graphical Models – Smoothing (2/3)
Graphical Models – Smoothing (3/3)
Smoothing vs. Filtering
Graphical Models
- Most Likely Explanation (1/2)
• arg maxX1:t P(X1:t | e1:t)
X0 X1 X2 Xt
E1 E2 Et
Graphical Models
- Most Likely Explanation (2/2)
t  =   1      2      3      4      5      6      7
E:   E1=v2  E2=v4  E3=v1  E4=v1  E5=v2  E6=v3  E7=v4

[Trellis: candidate states S1, S2, S3 at each time step; the most likely explanation picks one state per step]
2.2 Mathematical Formula
of the 4 Inferences
• Derive mathematical formula of
– Prediction
– Filtering
– Smoothing
– Most likely explanation
Prediction (1/3)
• P(Xt+1 | e1:t): one-step prediction as example

[Chain: X0 → X1 → ... → Xt → Xt+1 with evidence E1, ..., Et]

P(Xt+1 | e1:t) = α Σ_{X0,...,Xt} P(X0) P(Xt+1 | Xt) ∏_{i=1}^{t} P(Xi | Xi−1) P(ei | Xi)

But a more efficient formula can be derived.
Prediction (2/3)
• A new formula for P(Xt+1 | e1:t)
– Xt+1 has no direct dependence on e1, e2, ..., et
– But both are related through xt
– If X is a Boolean R.V., P(Xt+1 | e1:t) = <P(xt+1 = true | e1:t), P(xt+1 = false | e1:t)>
• P(Xt+1 | e1:t)
– = Σxt P(Xt+1 | xt, e1:t) P(xt | e1:t)    (conditioning on xt)
– = Σxt P(Xt+1 | xt) P(xt | e1:t)    (by the transition model)
– The first factor is the CPT of the transition model; the second is the filtering result
Prediction (3/3)
P(Xt+1 | e1:t) = Σxt P(Xt+1 | xt) P(xt | e1:t)

For example, with evidence e1:t = (v2, v4, v1, ..., v3):

P(Xt+1 = S2 | e1:t)
= Σxt P(Xt+1 = S2 | xt = Si) P(xt = Si | e1:t)
= Σi P(S2 | Si) P(Si | e1:t)
= P(S2 | S1) P(S1 | e1:t) + P(S2 | S2) P(S2 | e1:t) + P(S2 | S3) P(S3 | e1:t) + ... + P(S2 | SN) P(SN | e1:t)

[Trellis: each state Si at time t, weighted by P(Si | e1:t), contributes through its transition probability P(S2 | Si) to state S2 at time t+1]
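The one-step prediction sum can be written directly in code. A sketch using a hypothetical transition CPT and a hypothetical filtered distribution P(Xt | e1:t):

```python
# One-step prediction: P(X_{t+1} | e_1:t) = sum_x P(X_{t+1} | x) P(x | e_1:t).
transition = {                        # hypothetical transition CPT
    "S1": {"S1": 0.7, "S2": 0.3},
    "S2": {"S1": 0.4, "S2": 0.6},
}
filtered = {"S1": 0.8, "S2": 0.2}     # hypothetical filtering result P(X_t | e_1:t)

def predict_one_step(filtered, transition):
    states = list(transition)
    return {s_next: sum(transition[s][s_next] * filtered[s] for s in states)
            for s_next in states}

# S1: 0.7*0.8 + 0.4*0.2 = 0.64 ;  S2: 0.3*0.8 + 0.6*0.2 = 0.36
print(predict_one_step(filtered, transition))
```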
Filtering (1/3)
• P(Xt+1 | e1:t+1) (or P(Xt | e1:t))

[Chain: X0 → X1 → ... → Xt+1 with evidence E1, ..., Et+1]

P(Xt+1 | e1:t+1) = α Σ_{X0,...,Xt} P(X0) ∏_{i=1}^{t+1} P(Xi | Xi−1) P(ei | Xi)

But a more efficient formula can be derived.
Filtering (2/3)
• P(Xt+1 | e1:t+1) = P(Xt+1 | e1:t, et+1)
– = α P(et+1 | Xt+1, e1:t) P(Xt+1 | e1:t)    (by Bayes' rule)
– = α P(et+1 | Xt+1) P(Xt+1 | e1:t)    (by the sensor model)
– = α P(et+1 | Xt+1) Σxt P(Xt+1 | xt, e1:t) P(xt | e1:t)
– = α P(et+1 | Xt+1) Σxt P(Xt+1 | xt) P(xt | e1:t)
– The first factor is the CPT of the sensor model; the sum is a prediction by the transition model
• We derive a recursive algorithm
– P(Xt+1 | e1:t+1) can be calculated from P(Xt | e1:t)
– There is a function f such that P(Xt+1 | e1:t+1) = f(et+1, P(Xt | e1:t))
Filtering (3/3)
P(Xt+1 | e1:t+1) = α P(et+1 | Xt+1) P(Xt+1 | e1:t)

For example, with evidence e1:t = (v2, v4, ..., v3) and et+1 = v4:

P(Xt+1 = S2 | e1:t+1)
= α P(et+1 = v4 | Xt+1 = S2) P(Xt+1 = S2 | e1:t)
= α P(v4 | S2) P(S2 | e1:t)

[Trellis: the prediction P(Xt+1 | e1:t) obtained through the transition probabilities is weighted by the sensor probability P(et+1 | Xt+1) at time t+1]
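One complete filtering update (predict with the transition model, weight by the sensor model, normalize) can be sketched as follows; the CPTs, initial belief, and evidence are hypothetical:

```python
# One filtering update: f_{1:t+1} = a P(e_{t+1} | X_{t+1}) sum_x P(X_{t+1} | x) f_{1:t}(x).
# The CPTs, initial belief, and evidence values are hypothetical.

transition = {"S1": {"S1": 0.7, "S2": 0.3},
              "S2": {"S1": 0.4, "S2": 0.6}}
sensor = {"S1": {"v1": 0.9, "v2": 0.1},
          "S2": {"v1": 0.2, "v2": 0.8}}

def filter_step(belief, evidence, transition, sensor):
    """Predict with the transition model, weight by the sensor model, normalize."""
    states = list(transition)
    predicted = {s1: sum(transition[s][s1] * belief[s] for s in states)
                 for s1 in states}                    # P(X_{t+1} | e_1:t)
    unnorm = {s: sensor[s][evidence] * predicted[s] for s in states}
    z = sum(unnorm.values())                          # normalization constant
    return {s: p / z for s, p in unnorm.items()}

belief = {"S1": 0.5, "S2": 0.5}
for e in ["v1", "v1", "v2"]:
    belief = filter_step(belief, e, transition, sensor)
    print(e, belief)
```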
Forward Variable
• P(Xt+1 | e1:t) = Σxt P(Xt+1 | xt) P(xt | e1:t)
• P(Xt+1 | e1:t+1) = α P(et+1 | Xt+1) P(Xt+1 | e1:t)
• They form a kind of recursive function
• Interesting points
– Both the prediction & the filtering of Xt+1 need P(Xt | e1:t)
– We define P(Xt | e1:t) as a forward variable f1:t
– i.e., f1:t = P(Xt | e1:t), f1:t(Si) = P(Xt = Si | e1:t)
Forward Procedure
• P(Xt+1 | e1:t+1) = α P(et+1 | Xt+1) P(Xt+1 | e1:t)
  = α P(et+1 | Xt+1) Σxt P(Xt+1 | xt) P(xt | e1:t)
• The filtering process is rewritten as f1:t+1 = Forward(f1:t, et+1)
– A forward procedure (algorithm):
Forward(f1:t, et+1) = α P(et+1 | Xt+1) Σxt P(Xt+1 | xt) P(xt | e1:t)
Filtering Example (1/4)
• For the umbrella example

Transition model P(Rt | Rt−1): P(rt | rt−1) = 0.7, P(rt | ¬rt−1) = 0.3

Sensor model P(Ut | Rt):
P(ut | rt) = 0.9, P(¬ut | rt) = 0.1
P(ut | ¬rt) = 0.2, P(¬ut | ¬rt) = 0.8

P(Ut | rt) = <P(ut | rt), P(¬ut | rt)> = <0.9, 0.1>
P(Ut | ¬rt) = <P(ut | ¬rt), P(¬ut | ¬rt)> = <0.2, 0.8>
What is P(ut | Rt)?
Filtering Example (2/4)
• Assume the man believes that P(R0) = <0.5, 0.5> = <P(r0), P(¬r0)>
– The rain probability before the observation sequence begins
• Now we have the observation sequence: umbrella1 = true, umbrella2 = true
• We will use the filtering process to find the rain probabilities P(R1 | U1) and P(R2 | U1, U2)

[Network: Rain1 → Rain2, with Umbrella1 = true and Umbrella2 = true observed]
Filtering Example (3/4)
[Network: Rain0 → Rain1 → Rain2, with Umbrella1 = true, Umbrella2 = true; CPTs: P(rt | Rt−1): t 0.7, f 0.3; P(ut | Rt): t 0.9, f 0.2]

P(R1)
= Σr0 P(R1 | r0) P(r0)
= <0.7, 0.3> × 0.5 + <0.3, 0.7> × 0.5 = <0.5, 0.5>

P(R1 | u1) = α P(u1 | R1) P(R1)
= α <0.9, 0.2><0.5, 0.5> = α <0.45, 0.1>
≈ <0.818, 0.182>
Filtering Example (4/4)
[Network: Rain0 → Rain1 → Rain2, with Umbrella1 = true, Umbrella2 = true; CPTs: P(rt | Rt−1): t 0.7, f 0.3; P(ut | Rt): t 0.9, f 0.2]

P(R2 | u1)
= Σr1 P(R2 | r1) P(r1 | u1)
= <0.7, 0.3> × 0.818 + <0.3, 0.7> × 0.182 = <0.627, 0.373>

P(R2 | u1, u2) = α P(u2 | R2) P(R2 | u1)
= α <0.9, 0.2><0.627, 0.373> = α <0.565, 0.075>
≈ <0.883, 0.117>
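The two numeric steps above can be checked in code. This uses the slides' umbrella model (transition 0.7/0.3, sensor 0.9/0.2), with explicit normalization in place of α:

```python
# Umbrella-world filtering step (the slides' model: transition 0.7/0.3, sensor 0.9/0.2).
def forward(belief, umbrella):
    """belief = (P(rain), P(not rain)); returns the normalized updated belief."""
    p_rain = 0.7 * belief[0] + 0.3 * belief[1]    # prediction P(R_{t+1} | e_1:t)
    p_dry = 0.3 * belief[0] + 0.7 * belief[1]
    if umbrella:                                  # weight by the sensor model
        p_rain, p_dry = 0.9 * p_rain, 0.2 * p_dry
    else:
        p_rain, p_dry = 0.1 * p_rain, 0.8 * p_dry
    z = p_rain + p_dry                            # normalize (the alpha step)
    return (p_rain / z, p_dry / z)

b = (0.5, 0.5)                                    # P(R0)
b = forward(b, True)
print([round(p, 3) for p in b])                   # day 1: [0.818, 0.182]
b = forward(b, True)
print([round(p, 3) for p in b])                   # day 2: [0.883, 0.117]
```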
Smoothing (1/2)
• P(Xk | e1:t) for 1 ≤ k < t
– Divide e1:t into e1:k and ek+1:t
– P(Xk | e1:t) = P(Xk | e1:k, ek+1:t)
– = α P(Xk | e1:k) P(ek+1:t | Xk, e1:k)
– = α P(Xk | e1:k) P(ek+1:t | Xk)
Smoothing (2/2)
P(Xk = S2 | e1:t), with evidence e1:t = (v2, ..., v4, v3, v1, ..., v4)

[Trellis: the smoothed estimate at time k combines the forward evidence e1:k with the backward evidence ek+1:t, propagated through the transition probabilities P(xj | x2) out of state S2]
Backward Variable
• P(ek+1:t | Xk)
– = Σxk+1 P(ek+1:t | Xk, xk+1) P(xk+1 | Xk)
– = Σxk+1 P(ek+1:t | xk+1) P(xk+1 | Xk)
– = Σxk+1 P(ek+1, ek+2:t | xk+1) P(xk+1 | Xk)
– = Σxk+1 P(ek+1 | xk+1) P(ek+2:t | xk+1) P(xk+1 | Xk)
• This is also a recursive formula
• We define a backward variable bk+1:t
– bk+1:t = P(ek+1:t | Xk)
Backward Procedure (1/2)
• P(ek+1:t | Xk) = Σxk+1 P(ek+1 | xk+1) P(ek+2:t | xk+1) P(xk+1 | Xk)
• The formula is rewritten as bk+1:t = Backward(bk+2:t, ek+1)
Backward Procedure (2/2)
P(ek+1:t | Xk) = Σxk+1 P(ek+1 | xk+1) P(ek+2:t | xk+1) P(xk+1 | Xk)

For example, with ek+1 = v1:

P(ek+1:t | xk = S2)
= Σxk+1 P(v1 | xk+1) P(ek+2:t | xk+1) P(xk+1 | S2)
= P(v1 | S1) P(ek+2:t | S1) P(S1 | S2)
+ P(v1 | S2) P(ek+2:t | S2) P(S2 | S2)
+ ...
+ P(v1 | SN) P(ek+2:t | SN) P(SN | S2)

[Trellis: each state at time k+1 contributes its sensor probability, its own backward message, and the transition probability from S2]
The Smoothing Formula
• P(Xk | e1:t) = P(Xk | e1:k, ek+1:t)
– = α P(Xk | e1:k) P(ek+1:t | Xk, e1:k)
– = α P(Xk | e1:k) P(ek+1:t | Xk)
– = α f1:k × bk+1:t
• Time complexity
– Both the forward and backward recursions take constant time per step
– The complexity of smoothing P(Xk | e1:t) with evidence e1:t is O(t)
Smoothing Example (1/3)
• For the umbrella example
• P(R1 | u1, u2)
– Compute the smoothed estimate for the probability of rain at t = 1,
– Given the umbrella observations on days 1 & 2

[Network: Rain0 → Rain1 → Rain2, with Umbrella1 = true, Umbrella2 = true; CPTs: P(rt | Rt−1): t 0.7, f 0.3; P(ut | Rt): t 0.9, f 0.2]
Smoothing Example (2/3)
• P(R1 | u1, u2) = α P(R1 | u1) P(u2 | R1)
– P(R1 | u1) = <0.818, 0.182>
– P(u2 | R1) = Σr2 P(u2 | r2) P(r2 | R1)
  = (0.9 × 1 × <0.7, 0.3>) + (0.2 × 1 × <0.3, 0.7>)
  = <0.69, 0.41>
• P(R1 | u1, u2) = α <0.818, 0.182><0.69, 0.41> ≈ <0.883, 0.117>
• Note: filtering alone gave P(R1 | u1) = <0.818, 0.182>
• With one more observation u2, the estimated probability of r1 increases: smoothing
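The smoothed estimate P(R1 | u1, u2) = α f1:1 × b2:2 can be verified numerically with the same umbrella model:

```python
# Verify P(R1 | u1, u2) = a * f_{1:1} * b_{2:2} for the umbrella model.
def backward(b_next, umbrella):
    """One backward step: sums over the next state, as in the backward procedure."""
    p_u = (0.9, 0.2) if umbrella else (0.1, 0.8)  # P(u_{k+1} | rain), P(u_{k+1} | not rain)
    b_rain = p_u[0] * b_next[0] * 0.7 + p_u[1] * b_next[1] * 0.3
    b_dry = p_u[0] * b_next[0] * 0.3 + p_u[1] * b_next[1] * 0.7
    return (b_rain, b_dry)

f = (0.818, 0.182)                  # f_{1:1} = P(R1 | u1), from the filtering example
b = backward((1.0, 1.0), True)      # b_{2:2} = P(u2 | R1) = <0.69, 0.41>
unnorm = (f[0] * b[0], f[1] * b[1])
z = sum(unnorm)
print([round(p / z, 3) for p in unnorm])   # P(R1 | u1, u2) ≈ [0.883, 0.117]
```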
Smoothing Example (3/3)
Most Likely Explanation (1/2)
• Smoothing P(Xk | e1:t) considers only one past state at time step k
• Most likely explanation, arg maxX1:t P(X1:t | e1:t)
– Considers all past states, and
– Chooses the best state sequence

[Chain: X0 → X1 → X2 → ... → Xt with evidence E1, ..., Et]
Most Likely Explanation (2/2)
• We will discuss 3 algorithms
– Algorithm 1:
• Very simple, directly using smoothing
• Time complexity O(t²)
– Algorithm 2 (forward-backward algorithm):
• Improved usage of smoothing
• Time complexity O(t)
• But the result may not be the best state sequence
– Algorithm 3 (Viterbi algorithm):
• Time complexity O(t)
Algorithm 1
• The simplest idea for this problem
– Call smoothing t times, smoothing one state each time
– For (i = 0; i < t; i++): compute P(Xi | e1:t)
• Drawback
– Time complexity O(t²): too slow
• Improvement
– Apply dynamic programming to reduce the complexity to O(t)
Algorithm 2 (1/2)
• Forward-backward algorithm
– First, record the results of forward filtering over the whole sequence, from 1 to t
– Then, run the backward recursion from t down to 1, and
• Compute the smoothed estimate at each time step k from bk+1:t and the stored f1:k
Algorithm 2 (2/2)
fv[i] = f1:i = P(Xi | e1:i)
forward procedure: f1:t+1 = Forward(f1:t, et+1)
Smoothing: P(Xk | e1:t) = α f1:k × bk+1:t
backward procedure: bk+1:t = Backward(bk+2:t, ek+1)
(procedures defined in previous slides)
However (1/2)
• For the umbrella example, suppose there is an observation sequence e1:t = [true, true, false, true, true] for the umbrella's appearance
• What is the weather sequence most likely to explain this?
– Does the absence of the umbrella on day 3 mean that
• Day 3 wasn't raining, or
• The director forgot to bring it?
• If day 3 wasn't raining, day 4 may not be raining either, but the director brought the umbrella just in case
However (2/2)
• The forward-backward algorithm uses smoothing for each single time step
• But to find the most likely sequence, we must consider joint probabilities over all time steps
• To consider the joint probabilities of a sequence, we need to consider paths
Path
• A path is a possible state sequence
– For the 5-day umbrella example there are 2⁵ paths
– Each path (sequence) has a probability
– Only one path has the maximum probability
Probability of Path
P(X1:t | e1:t) = α ∏_{i=1}^{t} P(Xi | Xi−1) P(ei | Xi)

• We want arg maxX1:t P(X1:t | e1:t)
Recursive View
• An important idea for finding arg maxX1:t P(X1:t | e1:t)
– The optimal path in maxX1:t P(X1:t | e1:t) must extend an optimal path from maxX1:t−1 P(X1:t−1 | e1:t−1) (optimal substructure)
The Viterbi Example
Algorithm 3
• Viterbi algorithm

max_{x1...xt} P(x1, ..., xt, Xt+1 | e1:t+1)
= α P(et+1 | Xt+1) max_{xt} [ P(Xt+1 | xt) max_{x1...xt−1} P(x1, ..., xt−1, xt | e1:t) ]

• It is similar to the filtering algorithm:
P(Xt+1 | e1:t+1) = α P(et+1 | Xt+1) Σxt P(Xt+1 | xt) P(xt | e1:t)
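The Viterbi recursion can be sketched for the umbrella model and the sequence [true, true, false, true, true] discussed earlier:

```python
# Viterbi for the umbrella world; states are booleans (True = rain).
def viterbi(evidence):
    trans = {True: {True: 0.7, False: 0.3},      # P(R_t | R_{t-1})
             False: {True: 0.3, False: 0.7}}
    sensor = {True: {True: 0.9, False: 0.1},     # P(U_t | R_t)
              False: {True: 0.2, False: 0.8}}
    # m_1(x1) = P(u1 | x1) P(x1), with P(R1) = <0.5, 0.5>
    m = {x: sensor[x][evidence[0]] * 0.5 for x in (True, False)}
    back = []                                    # backpointers, one dict per step
    for u in evidence[1:]:
        ptr, m_new = {}, {}
        for x in (True, False):
            prev = max((True, False), key=lambda p: trans[p][x] * m[p])
            ptr[x] = prev
            m_new[x] = sensor[x][u] * trans[prev][x] * m[prev]
        back.append(ptr)
        m = m_new
    # Backtrack from the most probable final state.
    best = max((True, False), key=lambda x: m[x])
    path = [best]
    for ptr in reversed(back):
        path.append(ptr[path[-1]])
    return list(reversed(path))

print(viterbi([True, True, False, True, True]))
# → [True, True, False, True, True]: rain on days 1, 2, 4, 5; dry on day 3
```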
3. Various Models
• Hidden Markov Models
• Kalman Filter
• Particle Filter
• Dynamic Bayesian Networks
Hidden Markov Model (1/2)
Hidden states (e.g. real location): X1 → X2 → X3
Observations (e.g. detected location): Y1, Y2, Y3

P(x1, x2, ..., xn) = ∏_{i=1}^{n} P(xi | pa(xi))
Hidden Markov Model (2/2)
[Diagram: the chain X1 → X2 → X3 with emissions Y1, Y2, Y3; the transition matrix A and observation matrix B are shared across all time steps (parameter tying)]

• A: transition matrix
• B: observation matrix
• π: initial state distribution
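With parameter tying, an HMM is fully specified by the three arrays above. A sketch with hypothetical numbers for a 2-state, 2-symbol model:

```python
# HMM parameters shared across all time steps (parameter tying).
# Hypothetical 2-state, 2-symbol model; states and symbols are indexed 0/1.
A = [[0.7, 0.3],          # transition matrix: A[i][j] = P(X_{t+1}=j | X_t=i)
     [0.4, 0.6]]
B = [[0.9, 0.1],          # observation matrix: B[i][k] = P(Y_t=k | X_t=i)
     [0.2, 0.8]]
pi = [0.5, 0.5]           # initial state distribution P(X1)

for row in A + B + [pi]:  # each row is a distribution, so it must sum to 1
    assert abs(sum(row) - 1.0) < 1e-12

def path_prob(xs, ys):
    """P(x1..xn, y1..yn) = pi[x1] B[x1][y1] * prod A[x_{i-1}][x_i] B[x_i][y_i]."""
    p = pi[xs[0]] * B[xs[0]][ys[0]]
    for i in range(1, len(xs)):
        p *= A[xs[i - 1]][xs[i]] * B[xs[i]][ys[i]]
    return p

# 0.5 * 0.9 * 0.7 * 0.1 * 0.3 * 0.8
print(path_prob([0, 0, 1], [0, 1, 1]))
```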
Kalman Filtering
[Diagram: the chain X1 → X2 → X3 with observations Y1, Y2, Y3]

• The same graphical structure as the HMM
• But
– In an HMM, Xi and Yi are discrete (CPTs)
– In a Kalman filter, Xi and Yi are continuous
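A minimal 1-D Kalman filter shows the continuous counterpart: the belief over Xi is a Gaussian (mean, variance) rather than a CPT. The noise variances and measurements here are hypothetical:

```python
# Minimal 1-D Kalman filter (hypothetical noise variances and measurements).
def kalman_step(mean, var, z, q=1.0, r=2.0):
    """Random-walk transition X_{t+1} = X_t + noise(q); sensor noise variance r."""
    mean_pred, var_pred = mean, var + q          # predict: variance grows by q
    k = var_pred / (var_pred + r)                # Kalman gain
    mean_new = mean_pred + k * (z - mean_pred)   # update: blend in measurement z
    var_new = (1 - k) * var_pred                 # update: variance shrinks
    return mean_new, var_new

mean, var = 0.0, 10.0                            # broad initial Gaussian belief
for z in [1.0, 1.2, 0.9, 1.1]:                   # noisy measurements of ~1.0
    mean, var = kalman_step(mean, var, z)
    print(f"mean={mean:.3f}, var={var:.3f}")
```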
Particle Filtering
• TBU
Dynamic Bayesian Network (DBN)
• TBU
4. References
• Chapter 15, Sections 15.1-15.2, Artificial Intelligence: A Modern Approach, 2nd ed., by S. Russell & P. Norvig, Prentice Hall, 2003.