2. Syllabus
• Probability and axioms - Bayes' rule - Bayesian networks -
inferences - temporal models - hidden Markov models - fuzzy
reasoning - certainty factors - Dempster-Shafer theory
• Case study on each algorithm
4. 1. Probability theory
1.1 Uncertain knowledge
∀p symptom(p, Toothache) → disease(p, cavity)
∀p symptom(p, Toothache) →
disease(p, cavity) ∨ disease(p, gum_disease) ∨ …
• Why pure logic (PL) fails here:
- laziness
- theoretical ignorance
- practical ignorance
• Probability theory: degree of belief or
plausibility of a statement – a numerical
measure in [0,1]
• Degree of truth (fuzzy logic) ≠ degree of belief
5. 1.2 Definitions
• Unconditional or prior probability of A – the degree of
belief in A in the absence of any other information – P(A)
• A – random variable
• Probability distribution – P(A), P(A,B)
Example
P(Weather = Sunny) = 0.1
P(Weather = Rain) = 0.7
P(Weather = Snow) = 0.2
Weather – random variable
• P(Weather) = (0.1, 0.7, 0.2) – probability distribution
• Conditional probability – posterior – once the agent
has obtained some evidence B for A – P(A|B)
• P(Cavity | Toothache) = 0.8
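The prior distribution above can be sketched in code. A minimal sketch (the numbers come from the slide; the dictionary representation is mine):

```python
# Prior distribution P(Weather) as a dictionary, with a normalization check.
weather = {"Sunny": 0.1, "Rain": 0.7, "Snow": 0.2}

total = sum(weather.values())
assert abs(total - 1.0) < 1e-9   # a valid probability distribution sums to 1

print(weather["Rain"])           # prior P(Weather = Rain) -> 0.7
```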
6. Definitions - cont
• Axioms of probability
• The measure of the occurrence of an event
(random variable) A – a function P : S → R
satisfying the axioms:
• 0 ≤ P(A) ≤ 1
• P(S) = 1 (or P(true) = 1 and P(false) = 0)
• P(A ∨ B) = P(A) + P(B) - P(A ∧ B)
P(A ∨ ~A) = P(A) + P(~A) - P(false) = P(true)
P(~A) = 1 - P(A)
7. Definitions - cont
A and B mutually exclusive P(A B) = P(A) +
P(B)
P(e1 e2 e3 … en) = P(e1) + P(e2) + P(e3) + …
+ P(en)
The probability of a proposition a is equal to the
sum of the probabilities of the atomic events in
which a holds
e(a) – the set of atomic events in which a holds
7
8. 1.3 Product rule
Conditional probabilities can be defined in terms of
unconditional probabilities.
The conditional probability of the occurrence of
A if event B occurs:
– P(A|B) = P(A ∧ B) / P(B)
This can also be written as:
– P(A ∧ B) = P(A|B) * P(B)
For probability distributions:
– P(A=a1 ∧ B=b1) = P(A=a1|B=b1) * P(B=b1)
– P(A=a1 ∧ B=b2) = P(A=a1|B=b2) * P(B=b2)
…
– P(X,Y) = P(X|Y) * P(Y)
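The product rule can be checked numerically. A minimal sketch (the joint-distribution numbers are made up, chosen so that P(Cavity | Toothache) = 0.8 as on the previous slide):

```python
# Joint P(Cavity, Toothache) as a dict keyed by (cavity, toothache).
joint = {
    (True, True): 0.04, (True, False): 0.06,
    (False, True): 0.01, (False, False): 0.89,
}

p_toothache = joint[(True, True)] + joint[(False, True)]        # P(B)
p_cavity_given_toothache = joint[(True, True)] / p_toothache    # P(A|B)

# The product rule P(A ∧ B) = P(A|B) * P(B) recovers the joint entry:
recovered = p_cavity_given_toothache * p_toothache
assert abs(recovered - joint[(True, True)]) < 1e-12
print(p_cavity_given_toothache)   # 0.04 / 0.05 = 0.8
```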
9. 1.4 Bayes' rule and its use
P(A ∧ B) = P(A|B) * P(B)
P(A ∧ B) = P(B|A) * P(A)
Bayes' rule (theorem)
• P(B|A) = P(A|B) * P(B) / P(A)
10. Bayes Theorem
hi – hypotheses (i=1,k);
e1,…,en - evidence
P(hi)
P(hi | e1,…,en)
P(e1,…,en| hi)
10
P(h |e ,e ,...,e ) =
P(e ,e ,...,e |h ) P(h )
P(e ,e ,...,e |h ) P(h )
, i = 1,k
i 1 2 n
1 2 n i i
1 2 n j j
j 1
k
11. Bayes' Theorem - cont
If e1,…,en are pieces of evidence that are independent
given each hypothesis, then

P(e1,e2,…,en | hj) = P(e1|hj) * P(e2|hj) * … * P(en|hj),  j = 1,k

Used in PROSPECTOR
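Bayes' theorem with the conditional-independence assumption above can be sketched directly. A minimal sketch with made-up priors and likelihoods (two hypotheses, two pieces of evidence):

```python
priors = {"h1": 0.6, "h2": 0.4}      # P(hj)
likelihoods = {                      # P(ei | hj), conditionally independent
    "h1": [0.9, 0.7],                # P(e1|h1), P(e2|h1)
    "h2": [0.2, 0.5],
}

# Numerator for each hypothesis: P(e1,…,en | hj) * P(hj)
nums = {}
for h, p_h in priors.items():
    p_e_given_h = 1.0
    for p in likelihoods[h]:
        p_e_given_h *= p
    nums[h] = p_e_given_h * p_h

z = sum(nums.values())                           # normalizing denominator
posterior = {h: v / z for h, v in nums.items()}  # P(hj | e1,…,en)
assert abs(sum(posterior.values()) - 1.0) < 1e-12
print(posterior)
```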
14. 2 Bayesian networks
• Represent dependencies among random variables
• Give a short specification of the conditional
probability distribution
• Many random variables are conditionally independent
• Simplifies computations
• Graphical representation
• DAG – causal relationships among random variables
15. 2.1 Definition of Bayesian networks
A BN is a DAG in which each node is annotated
with quantitative probability information, namely:
• Nodes represent random variables (discrete or
continuous)
• Directed links X → Y: X has a direct influence on
Y; X is said to be a parent of Y
• Each node Xi has an associated conditional
probability table, P(Xi | Parents(Xi)), that quantifies
the effects of the parents on the node
Example: Weather, Cavity, Toothache, Catch
• Weather is independent; Cavity → Toothache, Cavity → Catch
16. Bayesian network - example
Structure: Burglary → Alarm, Earthquake → Alarm,
Alarm → JohnCalls, Alarm → MaryCalls

P(B) = 0.001, P(E) = 0.002

Conditional probability table for Alarm:
B E | P(A|B,E)  P(~A|B,E)
T T | 0.95      0.05
T F | 0.94      0.06
F T | 0.29      0.71
F F | 0.001     0.999

A | P(J|A)        A | P(M|A)
T | 0.9           T | 0.7
F | 0.05          F | 0.01
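The CPTs above suffice to compute any full-joint entry as a product of local probabilities. A minimal sketch (the CPT values are from the slide; the dictionary layout is mine), computing P(J ∧ M ∧ A ∧ ~B ∧ ~E) = P(J|A) P(M|A) P(A|~B,~E) P(~B) P(~E):

```python
p_b, p_e = 0.001, 0.002
p_a = {(True, True): 0.95, (True, False): 0.94,
       (False, True): 0.29, (False, False): 0.001}   # P(A=T | B, E)
p_j = {True: 0.9, False: 0.05}                        # P(J=T | A)
p_m = {True: 0.7, False: 0.01}                        # P(M=T | A)

joint = (p_j[True] * p_m[True] * p_a[(False, False)]
         * (1 - p_b) * (1 - p_e))
print(joint)   # ≈ 0.000628
```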
17. 2.2 Bayesian network semantics
A) Represent a probability distribution
B) Specify conditional independence – build the network
A) Each value of the probability distribution can be
computed as:
P(X1=x1 ∧ … ∧ Xn=xn) = P(x1,…,xn) =
Πi=1,n P(xi | Parents(Xi))
18. 2.3 Building the network
P(X1=x1 ∧ … ∧ Xn=xn) = P(x1,…,xn) =
P(xn | xn-1,…,x1) * P(xn-1,…,x1) = … =
P(xn | xn-1,…,x1) * P(xn-1 | xn-2,…,x1) * … * P(x2|x1) * P(x1) =
Πi=1,n P(xi | xi-1,…,x1)
• We can see that P(Xi | Xi-1,…,X1) = P(Xi | Parents(Xi)) if
Parents(Xi) ⊆ {Xi-1,…,X1}
• The condition may be satisfied by labeling the nodes in
an order consistent with a DAG
• Intuitively, the parents of a node Xi must be all the nodes
Xi-1,…,X1 which have a direct influence on Xi.
19. Building the network - cont
• Pick a set of random variables that describe the problem
• Pick an ordering of those variables
• while there are still variables repeat
(a) choose a variable Xi and add a node associated to Xi
(b) assign Parents(Xi) a minimal set of nodes that
already exist in the network such that the conditional
independence property is satisfied
(c) define the conditional probability table for Xi
• Because each node is linked only to previous nodes → DAG
• P(MaryCalls | JohnCalls, Alarm, Burglary, Earthquake) =
P(MaryCalls | Alarm)
20. Compactness and node ordering
• Far more compact than a full joint probability distribution
• Example of a locally structured (or sparse) system:
each component interacts directly only
with a limited number of other components
• Usually associated with linear growth in
complexity rather than exponential growth
• The order of adding the nodes is important
• The correct order in which to add nodes is to add
the "root causes" first, then the variables they
influence, and so on, until we reach the leaves
21. 2.4 Probabilistic inferences
Three basic structures over nodes A, V, B:
• Chain A → V → B:  P(A ∧ V ∧ B) = P(A) * P(V|A) * P(B|V)
• Common cause A ← V → B:  P(A ∧ V ∧ B) = P(V) * P(A|V) * P(B|V)
• Common effect A → V ← B:  P(A ∧ V ∧ B) = P(A) * P(B) * P(V|A,B)
25. 3. Certainty factors
• The MYCIN model
• Certainty factors / confidence coefficients (CF)
• Heuristic model of uncertain knowledge
• In MYCIN – two probabilistic functions to model
the degree of belief and the degree of disbelief in
a hypothesis
– a function to measure the degree of belief – MB
– a function to measure the degree of disbelief – MD
• MB[h,e] – how much the belief in h increases
based on evidence e
• MD[h,e] – how much the disbelief in h increases
based on evidence e
27. Belief functions - features
• Value range:
0 ≤ MB[h,e] ≤ 1
0 ≤ MD[h,e] ≤ 1
-1 ≤ CF[h,e] ≤ 1
• If h is sure, i.e. P(h|e) = 1, then
MB[h,e] = (1 - P(h)) / (1 - P(h)) = 1
MD[h,e] = 0
CF[h,e] = 1
• If the negation of h is sure, i.e. P(h|e) = 0, then
MB[h,e] = 0
MD[h,e] = (P(h) - 0) / (P(h) - 0) = 1
CF[h,e] = -1
28. Example in MYCIN
• if (1) the type of the organism is gram-positive, and
• (2) the morphology of the organism is coccus, and
• (3) the growth of the organism is chain
• then there is a strong evidence (0.7) that the identity of
the organism is streptococcus
Example of facts in MYCIN :
• (identity organism-1 pseudomonas 0.8)
• (identity organism-2 e.coli 0.15)
• (morphology organism-2 coccus 1.0)
28
29. 3.2 Combining belief functions
(1) Incremental gathering of evidence
• The same attribute value, h, is obtained by two separate
paths of inference, with two separate CFs: CF[h,s1] and CF[h,s2]
• The two different paths, corresponding to hypotheses s1
and s2, may be different branches of the search tree
• CF[h, s1&s2] = CF[h,s1] + CF[h,s2] - CF[h,s1]*CF[h,s2]
• Example: (identity organism-1 pseudomonas 0.8)
30. Combining belief functions
(2) Conjunction of hypotheses
• Applied for computing the CF associated to the
premises of a rule which has several conditions
if A = a1 and B = b1 then …
WM: (A a1 h1 cf1) (B b1 h2 cf2)
• CF[h1&h2, s] = min(CF[h1,s], CF[h2,s])
31. Combining belief functions
(3) Combining beliefs
• An uncertain value is deduced by a rule whose
input conditions are themselves based on uncertain
values (obtained, for example, by applying other rules)
• Allows the computation of the CF of the fact
deduced by the rule based on the rule's CF and
the CF of the hypotheses
• CF[s,e] – belief in a hypothesis s based on
previous evidence e
• CF[h,s] – CF in h if s is sure
• CF'[h,s] = CF[h,s] * CF[s,e]
32. Combining belief functions
(3) Combining beliefs – cont
if A = a1 and B = b1 then C = c1 0.7
WM: (A a1 0.9) (B b1 0.6)
CF(premises) = min(0.9, 0.6) = 0.6
CF(conclusion) = CF(premises) * CF(rule) = 0.6 * 0.7 = 0.42
WM: (C c1 0.42)
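The three combination rules above can be sketched as small functions. A minimal sketch (the function names are mine, not MYCIN's; the parallel-combination rule assumes both CFs are positive, as in the slides):

```python
def cf_parallel(cf1, cf2):
    """Incremental evidence: same hypothesis from two inference paths."""
    return cf1 + cf2 - cf1 * cf2

def cf_premises(*cfs):
    """Conjunction of the conditions in a rule premise."""
    return min(cfs)

def cf_chain(cf_rule, cf_evidence):
    """Propagate an uncertain premise through a rule."""
    return cf_rule * cf_evidence

# The slide's worked example:
premises = cf_premises(0.9, 0.6)        # 0.6
conclusion = cf_chain(0.7, premises)    # 0.6 * 0.7 = 0.42
print(conclusion)
```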
33. 3.3 Limits of CF
• The CF model of MYCIN assumes that the hypotheses are
supported by independent evidence
• An example shows what happens if this condition is violated:
A: The sprinkler functioned last night
U: The grass is wet in the morning
P: Last night it rained
34. Limits of CF - cont
R1: if the sprinkler functioned last night
then there is strong evidence (0.9) that the grass is wet in the
morning
R2: if the grass is wet in the morning
then there is strong evidence (0.8) that it rained last night
• CF[U,A] = 0.9
• therefore the evidence sprinkler supports the hypothesis wet
grass with CF = 0.9
• CF[P,U] = 0.8
• therefore the evidence wet grass supports the hypothesis rain
with CF = 0.8
• CF[P,A] = 0.8 * 0.9 = 0.72
• therefore the evidence sprinkler supports the hypothesis rain
with CF = 0.72 – an absurd conclusion, since the sprinkler already
explains the wet grass
35. Artificial Intelligence 35
Traditional Logic
• Based on predicate logic
• Three important assumptions:
– Predicate descriptions are sufficient w.r.t.
the domain
– Information is consistent
– Knowledge base grows monotonically
36. Non-monotonic Logic
• Addresses the three assumptions of traditional logic:
– Knowledge is incomplete
• No knowledge about p: true or false?
• Prolog – closed world assumption
– Knowledge is inconsistent
• Based on how the world usually works
• Most birds fly, but an ostrich doesn't
– Knowledge base grows non-monotonically
• A new observation may contradict the existing knowledge, so
existing knowledge may need removal
• Inferences are based on assumptions; what happens if the
assumptions are later shown to be incorrect?
• Three modal operators are introduced
37. Unless Operator
• New information may invalidate previous results
• Implemented in TMS – Truth Maintenance Systems –
to keep track of the reasoning steps and preserve
KB consistency
• Introduce the unless operator
– Supports inferences based on the belief that its argument is
not true
– Consider
• p(X) unless q(X) → r(X)
If p(X) is true and q(X) is not believed true, then conclude r(X)
• p(Z)
• r(W) → s(W)
From the above, conclude s(X).
Later, if the belief changes or q(X) is found true, what happens?
Retract r(X) and s(X)
– Unless deals with belief, not truth
• Either unknown or believed false
• Believed or known true
– Non-monotonicity
38. Is-consistent-with Operator M
• When reasoning, make sure the premises are consistent
• Format: M p – p is consistent with the KB
• Consider
– ∀X good_student(X) ∧ M study_hard(X) →
graduates(X)
– For all X who are good students, if the fact that X
studies hard is consistent with the KB, then X will
graduate
– It is not necessary to prove that X studies hard.
• How to decide that p is consistent with the KB:
– Negation as failure
– Heuristic-based and limited search
39. Default Logic
• Introduces a new format of inference rules:
– A(Z) : B(Z) / C(Z)
– If A(Z) is provable, and it is consistent with what we
know to assume B(Z), then conclude C(Z)
• Compare with the is-consistent-with operator M
– Similar
– The difference is the reasoning method
• In default logic, new rules are used to infer sets of plausible
extensions
– Example:
∀X good_student(X) : study_hard(X) → graduates(X)
∀Y party(Y) : not(study_hard(Y)) → not(graduates(Y))
40. Fuzzy Sets
• Classic sets
– Completeness: x is in either A or ¬A
– Exclusivity: x cannot be in both A and ¬A
• Fuzzy sets
– Violate these two assumptions
– Possibility theory – a measure of confidence or belief
– Probability theory – a measure of randomness
– Process imprecision
– Introduce a membership function
– Believe x ∈ A to some degree between 0 and 1, inclusive
43. Fuzzy Set Operations
• Fuzzy set operations are defined as operations
on membership functions
• Complement: ¬A = C
– mC = 1 - mA
• Union: A ∪ B = C
– mC = max(mA, mB)
• Intersection: A ∩ B = C
– mC = min(mA, mB)
• Difference: A - B = C
– mC = max(0, mA - mB)
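The four operations above can be sketched over a small discrete universe. A minimal sketch (the universe and membership values are made up):

```python
universe = ["cold", "warm", "hot"]
mA = {"cold": 0.9, "warm": 0.4, "hot": 0.1}
mB = {"cold": 0.2, "warm": 0.8, "hot": 0.5}

complement_A = {x: 1 - mA[x] for x in universe}            # ¬A
union        = {x: max(mA[x], mB[x]) for x in universe}    # A ∪ B
intersection = {x: min(mA[x], mB[x]) for x in universe}    # A ∩ B
difference   = {x: max(0, mA[x] - mB[x]) for x in universe}  # A - B

print(union)   # {'cold': 0.9, 'warm': 0.8, 'hot': 0.5}
```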
44. Fuzzy Inference Rules
• Rule format and computation
– If x is A and y is B then z is C
mC(z) = min(mA(x), mB(y))
– If x is A or y is B then z is C
mC(z) = max(mA(x), mB(y))
– If x is not A then z is C
mC(z) = 1 – mA(x)
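Evaluating the three rule forms above reduces to min, max, and complement of the premise memberships. A minimal sketch (the input membership values are made up):

```python
m_A_x = 0.7   # degree to which x is A
m_B_y = 0.4   # degree to which y is B

and_rule = min(m_A_x, m_B_y)   # if x is A and y is B then z is C -> 0.4
or_rule  = max(m_A_x, m_B_y)   # if x is A or y is B then z is C  -> 0.7
not_rule = 1 - m_A_x           # if x is not A then z is C        -> 0.3

print(and_rule, or_rule, not_rule)
```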
45. The fuzzy regions for the input values θ (a) and dθ/dt (b).
N – Negative, Z – Zero, P – Positive
46. The fuzzy regions of the output value u, indicating the
movement of the pendulum base: Negative Big,
Negative, Zero, Positive, Positive Big.
48. The Fuzzy Associative
Matrix (FAM) for the
pendulum problem. The
input values are on the
left and top.
Fuzzy Rules:
49. The fuzzy consequents (a) and their union (b). The
centroid of the union (-2) is the crisp output.
50. Dempster-Shafer Theory
• Probability theory limitations
– Assigns a single number to measure any situation, no matter how
complex it is
– Cannot deal with missing evidence, heuristics, and limited knowledge
• Dempster-Shafer theory
– Extends probability theory
– Considers a set of propositions as a whole
– Assigns a set of propositions an interval [belief, plausibility] to constrain
the degree of belief in each individual proposition in the set
– The belief measure bel is in [0,1]
• 0 – no supporting evidence for a set of propositions
• 1 – full supporting evidence for a set of propositions
– The plausibility of p:
• pl(p) = 1 - bel(not(p))
• Reflects how evidence for not(p) relates to the possibility of belief in p
• bel(not(p)) = 1: full support for not(p), no possibility for p
• bel(not(p)) = 0: no support for not(p), full possibility for p
• The range of pl is also [0,1]
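The [bel, pl] interval can be sketched in a few lines. A minimal sketch (the belief numbers are made up):

```python
def plausibility(bel_not_p):
    """pl(p) = 1 - bel(not(p))"""
    return 1 - bel_not_p

bel_p = 0.3        # evidence directly supporting p
bel_not_p = 0.2    # evidence directly supporting not(p)

# Lower end: belief committed to p; upper end: belief not ruled out.
interval = (bel_p, plausibility(bel_not_p))
print(interval)    # (0.3, 0.8)
```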
51. Properties of Dempster-Shafer
• Initially, there is no supporting evidence for either of two competing
hypotheses, say h1 and h2
– Dempster-Shafer: [bel, pl] = [0, 1]
– Probability theory: p(h1) = p(h2) = 0.5
• Dempster-Shafer belief functions satisfy weaker
axioms than probability functions
• Two fundamental ideas:
– Obtaining belief degrees for one question from
subjective probabilities for related questions
– Using Dempster's rule to combine these belief degrees
when they are based on independent evidence
52. An Example
• Two persons, M and B, with known reliabilities inspect a computer and
report the result independently. How much should you believe their claims?
• Question (Q): the detection claim
• Related question (RQ): the detectors' reliability
• Dempster-Shafer approach
– Obtain belief degrees for Q from subjective (prior) probabilities for RQ
for each person
– Combine the belief degrees from the two persons
• Person M:
– reliability 0.9, unreliability 0.1
– Claims h1
– Belief degree of h1 is bel(h1) = 0.9
– Belief degree of not(h1) is bel(not(h1)) = 0.0 – different from probability
theory, since there is no evidence supporting not(h1)
– pl(h1) = 1 - bel(not(h1)) = 1 - 0 = 1
– Thus the belief measure for M's claim h1 is [0.9, 1]
• Person B:
– Reliability 0.8, unreliability 0.2
– Claims h2
– bel(h2) = 0.8, bel(not(h2)) = 0, pl(h2) = 1 - bel(not(h2)) = 1 - 0 = 1
– Thus the belief measure for B's claim h2 is [0.8, 1]
53. Combining Belief Measures
• Set of propositions: M claims h1 and B claims h2
– Case 1: h1 = h2
• Both M and B reliable: 0.9 x 0.8 = 0.72
• Both M and B unreliable: 0.1 x 0.2 = 0.02
• The probability that at least one of the two is reliable: 1 - 0.02 = 0.98
• Belief measure for h1 = h2 is [0.98, 1]
– Case 2: h1 = not(h2)
• They cannot both be correct and reliable
• At least one is unreliable:
– Reliable M and unreliable B: 0.9 x (1 - 0.8) = 0.18
– Reliable B and unreliable M: 0.8 x (1 - 0.9) = 0.08
– Unreliable M and B: (1 - 0.9) x (1 - 0.8) = 0.02
– At least one is unreliable: 0.18 + 0.08 + 0.02 = 0.28
• Given that at least one is unreliable, the posterior probabilities are:
– Reliable M and unreliable B: 0.18/0.28 = 0.643
– Reliable B and unreliable M: 0.08/0.28 = 0.286
• Belief measure for h1
– bel(h1) = 0.643, bel(not(h1)) = bel(h2) = 0.286
– pl(h1) = 1 - bel(not(h1)) = 1 - 0.286 = 0.714
– Belief measure: [0.643, 0.714]
• Belief measure for h2
– bel(h2) = 0.286, bel(not(h2)) = bel(h1) = 0.643
– pl(h2) = 1 - bel(not(h2)) = 1 - 0.643 = 0.357
– Belief measure: [0.286, 0.357]
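Case 2 above (conflicting claims, h1 = not(h2)) can be reproduced in a few lines. A minimal sketch using the slide's reliabilities as the evidence masses:

```python
rel_m, rel_b = 0.9, 0.8

# Joint possibilities, excluding the impossible "both reliable" case:
m_only  = rel_m * (1 - rel_b)        # 0.18 -> supports h1
b_only  = rel_b * (1 - rel_m)        # 0.08 -> supports h2
neither = (1 - rel_m) * (1 - rel_b)  # 0.02 -> supports neither

norm = m_only + b_only + neither     # 0.28, renormalization constant

bel_h1 = m_only / norm               # ≈ 0.643
bel_h2 = b_only / norm               # ≈ 0.286
pl_h1 = 1 - bel_h2                   # ≈ 0.714
print(round(bel_h1, 3), round(bel_h2, 3), round(pl_h1, 3))
```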
54. Dempster's Rule
• Assumptions:
– The pieces of evidence are independent a priori
– As new evidence is collected and conflicts arise, the independence
may disappear
• Two steps
1. Sort the uncertainties into a priori independent pieces of evidence
2. Carry out Dempster's rule
• Consider the previous example
– After M and B made their claims, a repair person is called to check the
computer, and both M and B witnessed this
– Three independent items of evidence must now be combined
• Not all evidence directly supports individual
elements of a set of hypotheses; it often supports
different subsets of hypotheses, in favor of some and
against others
55. General Dempster's Rule
• Q – an exhaustive set of mutually exclusive
hypotheses
• Z – a subset of Q
• M – a probability density function that assigns a belief
measure to each Z
• Mn(Z) – the belief degree assigned to Z, where n is the number of
sources of evidence
56. Discrete Markov Process
• Finite state machine
– A graphical representation
– State transitions depend on the input stream
– States and transitions reflect properties of a formal
language
• Probabilistic finite state machine
– A finite state machine whose
transition function is represented by a probability
distribution on the current state
• Discrete Markov process (chain, machine)
– A specialization of the probabilistic finite state machine
that ignores its input values
57. A Markov state machine (Markov chain) with four states, s1, …, s4
At any time the system is in one of its distinct states
The system undergoes a state change (or remains in the same state)
Divide time into discrete intervals: t1, t2, …, tn
The state changes according to the probability distribution of
each state
S(t) – the actual state at time t
p(S(t)) = p(S(t) | S(t-1), S(t-2), S(t-3), …)
First-order Markov chain:
– The state depends only on the direct predecessor state
– p(S(t)) = p(S(t) | S(t-1))
58. Observable Markov Model
• Assume p(S(t)|S(t-1)) is time invariant, that is, transitions between
specific states retain the same probabilistic relationship over time
• State transition probability aij between si and sj:
– aij = p(S(t)=si | S(t-1)=sj), 1 <= i,j <= N
– If i = j, there is no transition (the system remains in the same state)
– Properties: aij >= 0, Σi aij = 1
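A transition matrix and a one-step distribution update can be sketched as follows. A minimal sketch (the matrix values are made up; note that here rows index the *current* state, so each row sums to 1):

```python
states = ["sun", "cloudy", "rain"]
A = {                                   # A[i][j] = p(next=j | current=i)
    "sun":    {"sun": 0.6, "cloudy": 0.3, "rain": 0.1},
    "cloudy": {"sun": 0.3, "cloudy": 0.4, "rain": 0.3},
    "rain":   {"sun": 0.2, "cloudy": 0.5, "rain": 0.3},
}
for i in states:                        # each row must sum to 1
    assert abs(sum(A[i].values()) - 1.0) < 1e-9

p = {"sun": 1.0, "cloudy": 0.0, "rain": 0.0}   # today is surely sunny
p_next = {j: sum(p[i] * A[i][j] for i in states) for j in states}
print(p_next)   # {'sun': 0.6, 'cloudy': 0.3, 'rain': 0.1}
```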
59. Weather example
S1 – sun
S2 – cloudy
S3 – fog
S4 – precipitation
Time intervals: noon to noon
Question: suppose that today is sunny; what is the
probability of the next five days being sunny, sunny,
cloudy, cloudy, precipitation?
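For an observable Markov chain, the answer is just the product of transition probabilities along the path. A minimal sketch; the transition matrix below is hypothetical (the slide's actual figure did not survive extraction):

```python
A = {   # A[i][j] = p(next=j | current=i); made-up values, rows sum to 1
    "sun":    {"sun": 0.5, "cloudy": 0.25, "fog": 0.15, "precip": 0.1},
    "cloudy": {"sun": 0.3, "cloudy": 0.4,  "fog": 0.2,  "precip": 0.1},
    "fog":    {"sun": 0.2, "cloudy": 0.3,  "fog": 0.3,  "precip": 0.2},
    "precip": {"sun": 0.1, "cloudy": 0.4,  "fog": 0.2,  "precip": 0.3},
}

def path_probability(start, path):
    """p(path | start) for an observable Markov chain."""
    p, current = 1.0, start
    for nxt in path:
        p *= A[current][nxt]
        current = nxt
    return p

print(path_probability("sun", ["sun", "sun", "cloudy", "cloudy", "precip"]))
```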
60. Restrictiveness of Markov models
• Are past and future really independent given the current state?
• E.g., suppose that when it rains, it rains for at most 2 days
S1 → S2 → S3 → S4 → …
• This requires a second-order Markov process
• Workaround: change the meaning of "state" to the events of
the last 2 days:
(S1,S2) → (S2,S3) → (S3,S4) → (S4,S5) → …
• Another approach: add more information to the state
• E.g., the full state of the world would include whether the
sky is full of water
– The additional information may not be observable
– Blowup of the number of states…
61. Hidden Markov models (HMMs)
• Same as a Markov model, except we cannot see the
state
• Instead, we only see an observation each period,
which depends on the current state
S1 → S2 → S3 → … → St → … (hidden states)
O1, O2, O3, …, Ot, … (observations; Ot depends on St)
• Still need a transition model: P(St+1 = j | St = i) = aij
• Also need an observation model: P(Ot = k | St = i) = bik
62. Weather example extended to HMM
• Transition probabilities: [figure: transition diagram over the
states s, c, r with arc probabilities .1, .2, .6, .3, .4, .3, .3, .5, .3]
• Observation: labmate wet or dry
• bsw = .1, bcw = .3, brw = .8
63. HMM weather example: a question
• You have been stuck in the lab for three days (!)
• On those days, your labmate was dry, wet, wet,
respectively
• What is the probability that it is now raining outside?
• P(S2 = r | O0 = d, O1 = w, O2 = w)
• By Bayes’ rule, really want to know P(S2, O0 = d, O1 = w, O2 = w)
64. Solving the question
• Computationally efficient approach: first compute
P(S1 = i, O0 = d, O1 = w) for all states i
• General case: solve for P(St, O0 = o0, O1 = o1, …, Ot = ot)
for t = 1, then t = 2, … This is called monitoring
• P(St, O0 = o0, O1 = o1, …, Ot = ot) =
Σst-1 P(St-1 = st-1, O0 = o0, O1 = o1, …, Ot-1 = ot-1) P(St | St-1 = st-1) P(Ot = ot | St)
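The monitoring recursion above can be sketched as a forward pass. A minimal sketch: the transition values are assumed (the slide's diagram did not survive extraction), while bsw = .1, bcw = .3, brw = .8 are the slide's wet-observation probabilities:

```python
states = ["s", "c", "r"]
A = {"s": {"s": 0.6, "c": 0.3, "r": 0.1},   # assumed; row = current state
     "c": {"s": 0.3, "c": 0.4, "r": 0.3},
     "r": {"s": 0.2, "c": 0.5, "r": 0.3}}
b_wet = {"s": 0.1, "c": 0.3, "r": 0.8}      # P(labmate wet | state)

def obs_prob(state, obs):
    return b_wet[state] if obs == "w" else 1 - b_wet[state]

def monitor(prior, observations):
    """Return alpha[i] = P(St = i, O0..Ot), given P(S0) and O0..Ot."""
    alpha = {i: prior[i] * obs_prob(i, observations[0]) for i in states}
    for obs in observations[1:]:
        alpha = {j: sum(alpha[i] * A[i][j] for i in states)
                    * obs_prob(j, obs) for j in states}
    return alpha

alpha = monitor({"s": 1/3, "c": 1/3, "r": 1/3}, ["d", "w", "w"])
z = sum(alpha.values())
print({i: alpha[i] / z for i in states})   # P(S2 | O0=d, O1=w, O2=w)
```

Normalizing the final alphas by their sum turns the joint values into the posterior over the current state, which is exactly the quantity the question asks for.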
65. Predicting further out
• You have been stuck in the lab for three days
• On those days, your labmate was dry, wet, wet,
respectively
• What is the probability that two days from now it
will be raining outside?
• P(S4 = r | O0 = d, O1 = w, O2 = w)
66. Predicting further out, continued…
• Want to know: P(S4 = r | O0 = d, O1 = w, O2 = w)
• Already know how to get: P(S2 | O0 = d, O1 = w, O2 = w)
• P(S3 = r | O0 = d, O1 = w, O2 = w) =
Σs2 P(S3 = r, S2 = s2 | O0 = d, O1 = w, O2 = w) =
Σs2 P(S3 = r | S2 = s2) P(S2 = s2 | O0 = d, O1 = w, O2 = w)
• Etc. for S4
• So: monitoring first, then straightforward Markov process
updates
67. Integrating newer information
• You have been stuck in the lab for four days (!)
• On those days, your labmate was dry, wet, wet, dry,
respectively
• What is the probability that two days ago it was
raining outside? P(S1 = r | O0 = d, O1 = w, O2 = w, O3
= d)
– Smoothing or hindsight problem
68. Hindsight problem continued…
• Want: P(S1 = r | O0 = d, O1 = w, O2 = w, O3 = d)
• “Partial” application of Bayes’ rule:
P(S1 = r | O0 = d, O1 = w, O2 = w, O3 = d) =
P(S1 = r, O2 = w, O3 = d | O0 = d, O1 = w) /
P(O2 = w, O3 = d | O0 = d, O1 = w)
• So really want to know P(S1, O2 = w, O3 = d | O0 = d, O1 = w)
69. Hindsight problem continued…
• Want to know P(S1 = r, O2 = w, O3 = d | O0 = d, O1 = w)
• P(S1 = r, O2 = w, O3 = d | O0 = d, O1 = w) =
P(S1 = r | O0 = d, O1 = w) P(O2 = w, O3 = d | S1 = r)
• Already know how to compute P(S1 = r | O0 = d, O1 = w)
• Just need to compute P(O2 = w, O3 = d | S1 = r)
70. Hindsight problem continued…
• Just need to compute P(O2 = w, O3 = d | S1 = r)
• P(O2 = w, O3 = d | S1 = r) =
Σs2 P(S2 = s2, O2 = w, O3 = d | S1 = r) =
Σs2 P(S2 = s2 | S1 = r) P(O2 = w | S2 = s2) P(O3 = d | S2 = s2)
• The first two factors are directly in the model; the last factor is a
"smaller" problem of the same kind
• Use dynamic programming, backwards from the future
– Similar to the forwards approach from the past
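The backwards dynamic program above can be sketched as follows. A minimal sketch: the transition values are assumed (as in the earlier sketch), and b_wet holds the slide's bsw = .1, bcw = .3, brw = .8:

```python
states = ["s", "c", "r"]
A = {"s": {"s": 0.6, "c": 0.3, "r": 0.1},   # assumed; row = current state
     "c": {"s": 0.3, "c": 0.4, "r": 0.3},
     "r": {"s": 0.2, "c": 0.5, "r": 0.3}}
b_wet = {"s": 0.1, "c": 0.3, "r": 0.8}      # P(labmate wet | state)

def obs_prob(state, obs):
    return b_wet[state] if obs == "w" else 1 - b_wet[state]

def backward(future_obs):
    """beta[i] = P(future observations | current state = i),
    computed backwards from the last observation."""
    beta = {i: 1.0 for i in states}
    for obs in reversed(future_obs):
        beta = {i: sum(A[i][j] * obs_prob(j, obs) * beta[j]
                       for j in states) for i in states}
    return beta

print(backward(["w", "d"]))   # P(O2 = w, O3 = d | S1 = i) for each state i
```

Multiplying these backward values by the monitoring (forward) values and renormalizing gives the smoothed posterior P(S1 | all observations).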