2. Syllabus
• Probability and axioms - Bayes' rule - Bayesian networks -
inferences - temporal models - hidden Markov models - fuzzy
reasoning - certainty factors - Dempster-Shafer theory
• Case study on each algorithm
4. 1. Probability theory
1.1 Uncertain knowledge
∀p symptom(p, Toothache) → disease(p, cavity)
∀p symptom(p, Toothache) →
disease(p, cavity) ∨ disease(p, gum_disease) ∨ …
• Why pure logic (PL) fails here:
- laziness
- theoretical ignorance
- practical ignorance
• Probability theory: degree of belief or
plausibility of a statement – a numerical
measure in [0,1]
• Degree of truth (fuzzy logic) ≠ degree of belief
5. 1.2 Definitions
• Unconditional or prior probability of A – the degree of
belief in A in the absence of any other information – P(A)
• A – random variable
• Probability distribution – P(A), P(A,B)
Example
P(Weather = Sunny) = 0.1
P(Weather = Rain) = 0.7
P(Weather = Snow) = 0.2
Weather – random variable
• P(Weather) = (0.1, 0.7, 0.2) – probability distribution
• Conditional probability – posterior – once the agent
has obtained some evidence B for A – P(A|B)
• P(Cavity | Toothache) = 0.8
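The prior distribution above can be sketched in code. A minimal sketch (the numbers come from the slide; the dictionary representation is mine):

```python
# Prior distribution P(Weather) as a dictionary, with a normalization check.
weather = {"Sunny": 0.1, "Rain": 0.7, "Snow": 0.2}

total = sum(weather.values())
assert abs(total - 1.0) < 1e-9   # a valid probability distribution sums to 1

print(weather["Rain"])           # prior P(Weather = Rain) -> 0.7
```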
6. Definitions - cont
• Axioms of probability
• The measure of the occurrence of an event
(random variable) A – a function P : S → R
satisfying the axioms:
• 0 ≤ P(A) ≤ 1
• P(S) = 1 (or P(true) = 1 and P(false) = 0)
• P(A ∨ B) = P(A) + P(B) - P(A ∧ B)
P(A ∨ ~A) = P(A) + P(~A) - P(false) = P(true)
P(~A) = 1 - P(A)
7. Definitions - cont
A and B mutually exclusive P(A B) = P(A) +
P(B)
P(e1 e2 e3 … en) = P(e1) + P(e2) + P(e3) + …
+ P(en)
The probability of a proposition a is equal to the
sum of the probabilities of the atomic events in
which a holds
e(a) – the set of atomic events in which a holds
7
8. 1.3 Product rule
Conditional probabilities can be defined in terms of
unconditional probabilities.
The conditional probability of the occurrence of
A if event B occurs:
– P(A|B) = P(A ∧ B) / P(B)
This can also be written as:
– P(A ∧ B) = P(A|B) * P(B)
For probability distributions:
– P(A=a1 ∧ B=b1) = P(A=a1|B=b1) * P(B=b1)
– P(A=a1 ∧ B=b2) = P(A=a1|B=b2) * P(B=b2)
…
– P(X,Y) = P(X|Y) * P(Y)
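The product rule can be checked numerically. A minimal sketch (the joint-distribution numbers are made up, chosen so that P(Cavity | Toothache) = 0.8 as on the previous slide):

```python
# Joint P(Cavity, Toothache) as a dict keyed by (cavity, toothache).
joint = {
    (True, True): 0.04, (True, False): 0.06,
    (False, True): 0.01, (False, False): 0.89,
}

p_toothache = joint[(True, True)] + joint[(False, True)]        # P(B)
p_cavity_given_toothache = joint[(True, True)] / p_toothache    # P(A|B)

# The product rule P(A ∧ B) = P(A|B) * P(B) recovers the joint entry:
recovered = p_cavity_given_toothache * p_toothache
assert abs(recovered - joint[(True, True)]) < 1e-12
print(p_cavity_given_toothache)   # 0.04 / 0.05 = 0.8
```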
9. 1.4 Bayes' rule and its use
P(A ∧ B) = P(A|B) * P(B)
P(A ∧ B) = P(B|A) * P(A)
Bayes' rule (theorem)
• P(B|A) = P(A|B) * P(B) / P(A)
10. Bayes Theorem
hi – hypotheses (i=1,k);
e1,…,en - evidence
P(hi)
P(hi | e1,…,en)
P(e1,…,en| hi)
10
P(h |e ,e ,...,e ) =
P(e ,e ,...,e |h ) P(h )
P(e ,e ,...,e |h ) P(h )
, i = 1,k
i 1 2 n
1 2 n i i
1 2 n j j
j 1
k
11. Bayes' Theorem - cont
If e1,…,en are pieces of evidence that are independent
given each hypothesis, then

P(e1,e2,…,en | hj) = P(e1|hj) * P(e2|hj) * … * P(en|hj),  j = 1,k

Used in PROSPECTOR
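Bayes' theorem with the conditional-independence assumption above can be sketched directly. A minimal sketch with made-up priors and likelihoods (two hypotheses, two pieces of evidence):

```python
priors = {"h1": 0.6, "h2": 0.4}      # P(hj)
likelihoods = {                      # P(ei | hj), conditionally independent
    "h1": [0.9, 0.7],                # P(e1|h1), P(e2|h1)
    "h2": [0.2, 0.5],
}

# Numerator for each hypothesis: P(e1,…,en | hj) * P(hj)
nums = {}
for h, p_h in priors.items():
    p_e_given_h = 1.0
    for p in likelihoods[h]:
        p_e_given_h *= p
    nums[h] = p_e_given_h * p_h

z = sum(nums.values())                           # normalizing denominator
posterior = {h: v / z for h, v in nums.items()}  # P(hj | e1,…,en)
assert abs(sum(posterior.values()) - 1.0) < 1e-12
print(posterior)
```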
14. 2 Bayesian networks
• Represent dependencies among random variables
• Give a short specification of the conditional
probability distribution
• Many random variables are conditionally independent
• Simplifies computations
• Graphical representation
• DAG – causal relationships among random variables
15. 2.1 Definition of Bayesian networks
A BN is a DAG in which each node is annotated
with quantitative probability information, namely:
• Nodes represent random variables (discrete or
continuous)
• Directed links X → Y: X has a direct influence on
Y; X is said to be a parent of Y
• Each node Xi has an associated conditional
probability table, P(Xi | Parents(Xi)), that quantifies
the effects of the parents on the node
Example: Weather, Cavity, Toothache, Catch
• Weather is independent; Cavity → Toothache, Cavity → Catch
16. Bayesian network - example
Structure: Burglary → Alarm, Earthquake → Alarm,
Alarm → JohnCalls, Alarm → MaryCalls

P(B) = 0.001, P(E) = 0.002

Conditional probability table for Alarm:
B E | P(A|B,E)  P(~A|B,E)
T T | 0.95      0.05
T F | 0.94      0.06
F T | 0.29      0.71
F F | 0.001     0.999

A | P(J|A)        A | P(M|A)
T | 0.9           T | 0.7
F | 0.05          F | 0.01
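The CPTs above suffice to compute any full-joint entry as a product of local probabilities. A minimal sketch (the CPT values are from the slide; the dictionary layout is mine), computing P(J ∧ M ∧ A ∧ ~B ∧ ~E) = P(J|A) P(M|A) P(A|~B,~E) P(~B) P(~E):

```python
p_b, p_e = 0.001, 0.002
p_a = {(True, True): 0.95, (True, False): 0.94,
       (False, True): 0.29, (False, False): 0.001}   # P(A=T | B, E)
p_j = {True: 0.9, False: 0.05}                        # P(J=T | A)
p_m = {True: 0.7, False: 0.01}                        # P(M=T | A)

joint = (p_j[True] * p_m[True] * p_a[(False, False)]
         * (1 - p_b) * (1 - p_e))
print(joint)   # ≈ 0.000628
```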
17. 2.2 Bayesian network semantics
A) Represent a probability distribution
B) Specify conditional independence – build the network
A) Each value of the probability distribution can be
computed as:
P(X1=x1 ∧ … ∧ Xn=xn) = P(x1,…,xn) =
Πi=1,n P(xi | Parents(Xi))
18. 2.3 Building the network
P(X1=x1 ∧ … ∧ Xn=xn) = P(x1,…,xn) =
P(xn | xn-1,…,x1) * P(xn-1,…,x1) = … =
P(xn | xn-1,…,x1) * P(xn-1 | xn-2,…,x1) * … * P(x2|x1) * P(x1) =
Πi=1,n P(xi | xi-1,…,x1)
• We can see that P(Xi | Xi-1,…,X1) = P(Xi | Parents(Xi)) if
Parents(Xi) ⊆ {Xi-1,…,X1}
• The condition may be satisfied by labeling the nodes in
an order consistent with a DAG
• Intuitively, the parents of a node Xi must be all the nodes
Xi-1,…,X1 which have a direct influence on Xi.
19. Building the network - cont
• Pick a set of random variables that describe the problem
• Pick an ordering of those variables
• while there are still variables repeat
(a) choose a variable Xi and add a node associated to Xi
(b) assign Parents(Xi) a minimal set of nodes that
already exist in the network such that the conditional
independence property is satisfied
(c) define the conditional probability table for Xi
• Because each node is linked only to previous nodes → DAG
• P(MaryCalls | JohnCalls, Alarm, Burglary, Earthquake) =
P(MaryCalls | Alarm)
20. Compactness and node ordering
• Far more compact than a full joint probability distribution
• Example of a locally structured (or sparse) system:
each component interacts directly only
with a limited number of other components
• Usually associated with linear growth in
complexity rather than exponential growth
• The order of adding the nodes is important
• The correct order in which to add nodes is to add
the "root causes" first, then the variables they
influence, and so on, until we reach the leaves
21. 2.4 Probabilistic inferences
Three basic structures over nodes A, V, B:
• Chain A → V → B:  P(A ∧ V ∧ B) = P(A) * P(V|A) * P(B|V)
• Common cause A ← V → B:  P(A ∧ V ∧ B) = P(V) * P(A|V) * P(B|V)
• Common effect A → V ← B:  P(A ∧ V ∧ B) = P(A) * P(B) * P(V|A,B)
25. 3. Certainty factors
• The MYCIN model
• Certainty factors / confidence coefficients (CF)
• Heuristic model of uncertain knowledge
• In MYCIN – two probabilistic functions to model
the degree of belief and the degree of disbelief in
a hypothesis
– a function to measure the degree of belief – MB
– a function to measure the degree of disbelief – MD
• MB[h,e] – how much the belief in h increases
based on evidence e
• MD[h,e] – how much the disbelief in h increases
based on evidence e
27. Belief functions - features
• Value range:
0 ≤ MB[h,e] ≤ 1
0 ≤ MD[h,e] ≤ 1
-1 ≤ CF[h,e] ≤ 1
• If h is sure, i.e. P(h|e) = 1, then
MB[h,e] = (1 - P(h)) / (1 - P(h)) = 1
MD[h,e] = 0
CF[h,e] = 1
• If the negation of h is sure, i.e. P(h|e) = 0, then
MB[h,e] = 0
MD[h,e] = (P(h) - 0) / (P(h) - 0) = 1
CF[h,e] = -1
28. Example in MYCIN
• if (1) the type of the organism is gram-positive, and
• (2) the morphology of the organism is coccus, and
• (3) the growth of the organism is chain
• then there is a strong evidence (0.7) that the identity of
the organism is streptococcus
Example of facts in MYCIN :
• (identity organism-1 pseudomonas 0.8)
• (identity organism-2 e.coli 0.15)
• (morphology organism-2 coccus 1.0)
28
29. 3.2 Combining belief functions
(1) Incremental gathering of evidence
• The same attribute value, h, is obtained by two separate
paths of inference, with two separate CFs: CF[h,s1] and CF[h,s2]
• The two different paths, corresponding to hypotheses s1
and s2, may be different branches of the search tree
• CF[h, s1&s2] = CF[h,s1] + CF[h,s2] - CF[h,s1]*CF[h,s2]
• Example: (identity organism-1 pseudomonas 0.8)
30. Combining belief functions
(2) Conjunction of hypotheses
• Applied for computing the CF associated to the
premises of a rule which has several conditions
if A = a1 and B = b1 then …
WM: (A a1 h1 cf1) (B b1 h2 cf2)
• CF[h1&h2, s] = min(CF[h1,s], CF[h2,s])
31. Combining belief functions
(3) Combining beliefs
• An uncertain value is deduced by a rule whose
input conditions are themselves based on uncertain
values (obtained, for example, by applying other rules)
• Allows the computation of the CF of the fact
deduced by the rule based on the rule's CF and
the CF of the hypotheses
• CF[s,e] – belief in a hypothesis s based on
previous evidence e
• CF[h,s] – CF in h if s is sure
• CF'[h,s] = CF[h,s] * CF[s,e]
32. Combining belief functions
(3) Combining beliefs – cont
if A = a1 and B = b1 then C = c1 0.7
WM: (A a1 0.9) (B b1 0.6)
CF(premises) = min(0.9, 0.6) = 0.6
CF(conclusion) = CF(premises) * CF(rule) = 0.6 * 0.7 = 0.42
WM: (C c1 0.42)
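The three combination rules above can be sketched as small functions. A minimal sketch (the function names are mine, not MYCIN's; the parallel-combination rule assumes both CFs are positive, as in the slides):

```python
def cf_parallel(cf1, cf2):
    """Incremental evidence: same hypothesis from two inference paths."""
    return cf1 + cf2 - cf1 * cf2

def cf_premises(*cfs):
    """Conjunction of the conditions in a rule premise."""
    return min(cfs)

def cf_chain(cf_rule, cf_evidence):
    """Propagate an uncertain premise through a rule."""
    return cf_rule * cf_evidence

# The slide's worked example:
premises = cf_premises(0.9, 0.6)        # 0.6
conclusion = cf_chain(0.7, premises)    # 0.6 * 0.7 = 0.42
print(conclusion)
```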
33. 3.3 Limits of CF
• The CF model of MYCIN assumes that the hypotheses are
supported by independent evidence
• An example shows what happens if this condition is violated:
A: The sprinkler functioned last night
U: The grass is wet in the morning
P: Last night it rained
34. Limits of CF - cont
R1: if the sprinkler functioned last night
then there is strong evidence (0.9) that the grass is wet in the
morning
R2: if the grass is wet in the morning
then there is strong evidence (0.8) that it rained last night
• CF[U,A] = 0.9
• therefore the evidence sprinkler supports the hypothesis wet
grass with CF = 0.9
• CF[P,U] = 0.8
• therefore the evidence wet grass supports the hypothesis rain
with CF = 0.8
• CF[P,A] = 0.8 * 0.9 = 0.72
• therefore the evidence sprinkler supports the hypothesis rain
with CF = 0.72 – an absurd conclusion, since the sprinkler already
explains the wet grass
35. Artificial Intelligence 35
Traditional Logic
• Based on predicate logic
• Three important assumptions:
– Predicate descriptions are sufficient w.r.t.
the domain
– Information is consistent
– Knowledge base grows monotonically
36. Non-monotonic Logic
• Addresses the three assumptions of traditional logic:
– Knowledge is incomplete
• No knowledge about p: true or false?
• Prolog – closed world assumption
– Knowledge is inconsistent
• Based on how the world usually works
• Most birds fly, but an ostrich doesn't
– Knowledge base grows non-monotonically
• A new observation may contradict the existing knowledge, so
existing knowledge may need removal
• Inferences are based on assumptions; what happens if the
assumptions are later shown to be incorrect?
• Three modal operators are introduced
37. Unless Operator
• New information may invalidate previous results
• Implemented in TMS – Truth Maintenance Systems –
to keep track of the reasoning steps and preserve
KB consistency
• Introduce the unless operator
– Supports inferences based on the belief that its argument is
not true
– Consider
• p(X) unless q(X) → r(X)
If p(X) is true and q(X) is not believed true, then conclude r(X)
• p(Z)
• r(W) → s(W)
From the above, conclude s(X).
Later, if the belief changes or q(X) is found true, what happens?
Retract r(X) and s(X)
– Unless deals with belief, not truth
• Either unknown or believed false
• Believed or known true
– Non-monotonicity
38. Is-consistent-with Operator M
• When reasoning, make sure the premises are consistent
• Format: M p – p is consistent with the KB
• Consider
– ∀X good_student(X) ∧ M study_hard(X) →
graduates(X)
– For all X who are good students, if the fact that X
studies hard is consistent with the KB, then X will
graduate
– It is not necessary to prove that X studies hard.
• How to decide that p is consistent with the KB:
– Negation as failure
– Heuristic-based and limited search
39. Default Logic
• Introduces a new format of inference rules:
– A(Z) : B(Z) / C(Z)
– If A(Z) is provable, and it is consistent with what we
know to assume B(Z), then conclude C(Z)
• Compare with the is-consistent-with operator M
– Similar
– The difference is the reasoning method
• In default logic, new rules are used to infer sets of plausible
extensions
– Example:
∀X good_student(X) : study_hard(X) → graduates(X)
∀Y party(Y) : not(study_hard(Y)) → not(graduates(Y))
40. Fuzzy Sets
• Classic sets
– Completeness: x is in either A or ¬A
– Exclusivity: x cannot be in both A and ¬A
• Fuzzy sets
– Violate these two assumptions
– Possibility theory – a measure of confidence or belief
– Probability theory – a measure of randomness
– Process imprecision
– Introduce a membership function
– Believe x ∈ A to some degree between 0 and 1, inclusive
43. Fuzzy Set Operations
• Fuzzy set operations are defined as operations
on membership functions
• Complement: ¬A = C
– mC = 1 - mA
• Union: A ∪ B = C
– mC = max(mA, mB)
• Intersection: A ∩ B = C
– mC = min(mA, mB)
• Difference: A - B = C
– mC = max(0, mA - mB)
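The four operations above can be sketched over a small discrete universe. A minimal sketch (the universe and membership values are made up):

```python
universe = ["cold", "warm", "hot"]
mA = {"cold": 0.9, "warm": 0.4, "hot": 0.1}
mB = {"cold": 0.2, "warm": 0.8, "hot": 0.5}

complement_A = {x: 1 - mA[x] for x in universe}            # ¬A
union        = {x: max(mA[x], mB[x]) for x in universe}    # A ∪ B
intersection = {x: min(mA[x], mB[x]) for x in universe}    # A ∩ B
difference   = {x: max(0, mA[x] - mB[x]) for x in universe}  # A - B

print(union)   # {'cold': 0.9, 'warm': 0.8, 'hot': 0.5}
```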
44. Fuzzy Inference Rules
• Rule format and computation
– If x is A and y is B then z is C
mC(z) = min(mA(x), mB(y))
– If x is A or y is B then z is C
mC(z) = max(mA(x), mB(y))
– If x is not A then z is C
mC(z) = 1 – mA(x)
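Evaluating the three rule forms above reduces to min, max, and complement of the premise memberships. A minimal sketch (the input membership values are made up):

```python
m_A_x = 0.7   # degree to which x is A
m_B_y = 0.4   # degree to which y is B

and_rule = min(m_A_x, m_B_y)   # if x is A and y is B then z is C -> 0.4
or_rule  = max(m_A_x, m_B_y)   # if x is A or y is B then z is C  -> 0.7
not_rule = 1 - m_A_x           # if x is not A then z is C        -> 0.3

print(and_rule, or_rule, not_rule)
```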
45. The fuzzy regions for the input values θ (a) and dθ/dt (b).
N – Negative, Z – Zero, P – Positive
46. The fuzzy regions of the output value u, indicating the
movement of the pendulum base: Negative Big,
Negative, Zero, Positive, Positive Big.
48. The Fuzzy Associative
Matrix (FAM) for the
pendulum problem. The
input values are on the
left and top.
Fuzzy Rules:
49. The fuzzy consequents (a) and their union (b). The
centroid of the union (-2) is the crisp output.
50. Dempster-Shafer Theory
• Probability theory limitations
– Assigns a single number to measure any situation, no matter how
complex it is
– Cannot deal with missing evidence, heuristics, and limited knowledge
• Dempster-Shafer theory
– Extends probability theory
– Considers a set of propositions as a whole
– Assigns a set of propositions an interval [belief, plausibility] to constrain
the degree of belief in each individual proposition in the set
– The belief measure bel is in [0,1]
• 0 – no supporting evidence for a set of propositions
• 1 – full supporting evidence for a set of propositions
– The plausibility of p:
• pl(p) = 1 - bel(not(p))
• Reflects how evidence for not(p) relates to the possibility of belief in p
• bel(not(p)) = 1: full support for not(p), no possibility for p
• bel(not(p)) = 0: no support for not(p), full possibility for p
• The range of pl is also [0,1]
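The [bel, pl] interval can be sketched in a few lines. A minimal sketch (the belief numbers are made up):

```python
def plausibility(bel_not_p):
    """pl(p) = 1 - bel(not(p))"""
    return 1 - bel_not_p

bel_p = 0.3        # evidence directly supporting p
bel_not_p = 0.2    # evidence directly supporting not(p)

# Lower end: belief committed to p; upper end: belief not ruled out.
interval = (bel_p, plausibility(bel_not_p))
print(interval)    # (0.3, 0.8)
```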
51. Properties of Dempster-Shafer
• Initially, there is no supporting evidence for either of two competing
hypotheses, say h1 and h2
– Dempster-Shafer: [bel, pl] = [0, 1]
– Probability theory: p(h1) = p(h2) = 0.5
• Dempster-Shafer belief functions satisfy weaker
axioms than probability functions
• Two fundamental ideas:
– Obtaining belief degrees for one question from
subjective probabilities for related questions
– Using Dempster's rule to combine these belief degrees
when they are based on independent evidence
52. An Example
• Two persons, M and B, with known reliabilities inspect a computer and
report the result independently. How much should you believe their claims?
• Question (Q): the detection claim
• Related question (RQ): the detectors' reliability
• Dempster-Shafer approach
– Obtain belief degrees for Q from subjective (prior) probabilities for RQ
for each person
– Combine the belief degrees from the two persons
• Person M:
– reliability 0.9, unreliability 0.1
– Claims h1
– Belief degree of h1 is bel(h1) = 0.9
– Belief degree of not(h1) is bel(not(h1)) = 0.0 – different from probability
theory, since there is no evidence supporting not(h1)
– pl(h1) = 1 - bel(not(h1)) = 1 - 0 = 1
– Thus the belief measure for M's claim h1 is [0.9, 1]
• Person B:
– Reliability 0.8, unreliability 0.2
– Claims h2
– bel(h2) = 0.8, bel(not(h2)) = 0, pl(h2) = 1 - bel(not(h2)) = 1 - 0 = 1
– Thus the belief measure for B's claim h2 is [0.8, 1]
53. Combining Belief Measures
• Set of propositions: M claims h1 and B claims h2
– Case 1: h1 = h2
• Both M and B reliable: 0.9 x 0.8 = 0.72
• Both M and B unreliable: 0.1 x 0.2 = 0.02
• The probability that at least one of the two is reliable: 1 - 0.02 = 0.98
• Belief measure for h1 = h2 is [0.98, 1]
– Case 2: h1 = not(h2)
• They cannot both be correct and reliable
• At least one is unreliable:
– Reliable M and unreliable B: 0.9 x (1 - 0.8) = 0.18
– Reliable B and unreliable M: 0.8 x (1 - 0.9) = 0.08
– Unreliable M and B: (1 - 0.9) x (1 - 0.8) = 0.02
– At least one is unreliable: 0.18 + 0.08 + 0.02 = 0.28
• Given that at least one is unreliable, the posterior probabilities are:
– Reliable M and unreliable B: 0.18/0.28 = 0.643
– Reliable B and unreliable M: 0.08/0.28 = 0.286
• Belief measure for h1
– bel(h1) = 0.643, bel(not(h1)) = bel(h2) = 0.286
– pl(h1) = 1 - bel(not(h1)) = 1 - 0.286 = 0.714
– Belief measure: [0.643, 0.714]
• Belief measure for h2
– bel(h2) = 0.286, bel(not(h2)) = bel(h1) = 0.643
– pl(h2) = 1 - bel(not(h2)) = 1 - 0.643 = 0.357
– Belief measure: [0.286, 0.357]
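Case 2 above (conflicting claims, h1 = not(h2)) can be reproduced in a few lines. A minimal sketch using the slide's reliabilities as the evidence masses:

```python
rel_m, rel_b = 0.9, 0.8

# Joint possibilities, excluding the impossible "both reliable" case:
m_only  = rel_m * (1 - rel_b)        # 0.18 -> supports h1
b_only  = rel_b * (1 - rel_m)        # 0.08 -> supports h2
neither = (1 - rel_m) * (1 - rel_b)  # 0.02 -> supports neither

norm = m_only + b_only + neither     # 0.28, renormalization constant

bel_h1 = m_only / norm               # ≈ 0.643
bel_h2 = b_only / norm               # ≈ 0.286
pl_h1 = 1 - bel_h2                   # ≈ 0.714
print(round(bel_h1, 3), round(bel_h2, 3), round(pl_h1, 3))
```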
54. Dempster's Rule
• Assumptions:
– The pieces of evidence are independent a priori
– As new evidence is collected and conflicts arise, the independence
may disappear
• Two steps
1. Sort the uncertainties into a priori independent pieces of evidence
2. Carry out Dempster's rule
• Consider the previous example
– After M and B made their claims, a repair person is called to check the
computer, and both M and B witnessed this
– Three independent items of evidence must now be combined
• Not all evidence directly supports individual
elements of a set of hypotheses; it often supports
different subsets of hypotheses, in favor of some and
against others
55. General Dempster's Rule
• Q – an exhaustive set of mutually exclusive
hypotheses
• Z – a subset of Q
• M – a probability density function that assigns a belief
measure to each Z
• Mn(Z) – the belief degree assigned to Z, where n is the number of
sources of evidence
56. Discrete Markov Process
• Finite state machine
– A graphical representation
– State transitions depend on the input stream
– States and transitions reflect properties of a formal
language
• Probabilistic finite state machine
– A finite state machine whose
transition function is represented by a probability
distribution on the current state
• Discrete Markov process (chain, machine)
– A specialization of the probabilistic finite state machine
that ignores its input values
57. A Markov state machine (Markov chain) with four states, s1, …, s4
At any time the system is in one of its distinct states
The system undergoes a state change (or remains in the same state)
Divide time into discrete intervals: t1, t2, …, tn
The state changes according to the probability distribution of
each state
S(t) – the actual state at time t
p(S(t)) = p(S(t) | S(t-1), S(t-2), S(t-3), …)
First-order Markov chain:
– The state depends only on the direct predecessor state
– p(S(t)) = p(S(t) | S(t-1))
58. Observable Markov Model
• Assume p(S(t)|S(t-1)) is time invariant, that is, transitions between
specific states retain the same probabilistic relationship over time
• State transition probability aij between si and sj:
– aij = p(S(t)=si | S(t-1)=sj), 1 <= i,j <= N
– If i = j, there is no transition (the system remains in the same state)
– Properties: aij >= 0, Σi aij = 1
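A transition matrix and a one-step distribution update can be sketched as follows. A minimal sketch (the matrix values are made up; note that here rows index the *current* state, so each row sums to 1):

```python
states = ["sun", "cloudy", "rain"]
A = {                                   # A[i][j] = p(next=j | current=i)
    "sun":    {"sun": 0.6, "cloudy": 0.3, "rain": 0.1},
    "cloudy": {"sun": 0.3, "cloudy": 0.4, "rain": 0.3},
    "rain":   {"sun": 0.2, "cloudy": 0.5, "rain": 0.3},
}
for i in states:                        # each row must sum to 1
    assert abs(sum(A[i].values()) - 1.0) < 1e-9

p = {"sun": 1.0, "cloudy": 0.0, "rain": 0.0}   # today is surely sunny
p_next = {j: sum(p[i] * A[i][j] for i in states) for j in states}
print(p_next)   # {'sun': 0.6, 'cloudy': 0.3, 'rain': 0.1}
```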
59. Weather example
S1 – sun
S2 – cloudy
S3 – fog
S4 – precipitation
Time intervals: noon to noon
Question: suppose that today is sunny; what is the
probability of the next five days being sunny, sunny,
cloudy, cloudy, precipitation?
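For an observable Markov chain, the answer is just the product of transition probabilities along the path. A minimal sketch; the transition matrix below is hypothetical (the slide's actual figure did not survive extraction):

```python
A = {   # A[i][j] = p(next=j | current=i); made-up values, rows sum to 1
    "sun":    {"sun": 0.5, "cloudy": 0.25, "fog": 0.15, "precip": 0.1},
    "cloudy": {"sun": 0.3, "cloudy": 0.4,  "fog": 0.2,  "precip": 0.1},
    "fog":    {"sun": 0.2, "cloudy": 0.3,  "fog": 0.3,  "precip": 0.2},
    "precip": {"sun": 0.1, "cloudy": 0.4,  "fog": 0.2,  "precip": 0.3},
}

def path_probability(start, path):
    """p(path | start) for an observable Markov chain."""
    p, current = 1.0, start
    for nxt in path:
        p *= A[current][nxt]
        current = nxt
    return p

print(path_probability("sun", ["sun", "sun", "cloudy", "cloudy", "precip"]))
```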
60. Restrictiveness of Markov models
• Are past and future really independent given the current state?
• E.g., suppose that when it rains, it rains for at most 2 days
S1 → S2 → S3 → S4 → …
• This requires a second-order Markov process
• Workaround: change the meaning of "state" to the events of
the last 2 days:
(S1,S2) → (S2,S3) → (S3,S4) → (S4,S5) → …
• Another approach: add more information to the state
• E.g., the full state of the world would include whether the
sky is full of water
– The additional information may not be observable
– Blowup of the number of states…
61. Hidden Markov models (HMMs)
• Same as a Markov model, except we cannot see the
state
• Instead, we only see an observation each period,
which depends on the current state
S1 → S2 → S3 → … → St → … (hidden states)
O1, O2, O3, …, Ot, … (observations; Ot depends on St)
• Still need a transition model: P(St+1 = j | St = i) = aij
• Also need an observation model: P(Ot = k | St = i) = bik
62. Weather example extended to HMM
• Transition probabilities: [figure: transition diagram over the
states s, c, r with arc probabilities .1, .2, .6, .3, .4, .3, .3, .5, .3]
• Observation: labmate wet or dry
• bsw = .1, bcw = .3, brw = .8
63. HMM weather example: a question
• You have been stuck in the lab for three days (!)
• On those days, your labmate was dry, wet, wet,
respectively
• What is the probability that it is now raining outside?
• P(S2 = r | O0 = d, O1 = w, O2 = w)
• By Bayes’ rule, really want to know P(S2, O0 = d, O1 = w, O2 = w)
64. Solving the question
• Computationally efficient approach: first compute
P(S1 = i, O0 = d, O1 = w) for all states i
• General case: solve for P(St, O0 = o0, O1 = o1, …, Ot = ot)
for t = 1, then t = 2, … This is called monitoring
• P(St, O0 = o0, O1 = o1, …, Ot = ot) =
Σst-1 P(St-1 = st-1, O0 = o0, O1 = o1, …, Ot-1 = ot-1) P(St | St-1 = st-1) P(Ot = ot | St)
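The monitoring recursion above can be sketched as a forward pass. A minimal sketch: the transition values are assumed (the slide's diagram did not survive extraction), while bsw = .1, bcw = .3, brw = .8 are the slide's wet-observation probabilities:

```python
states = ["s", "c", "r"]
A = {"s": {"s": 0.6, "c": 0.3, "r": 0.1},   # assumed; row = current state
     "c": {"s": 0.3, "c": 0.4, "r": 0.3},
     "r": {"s": 0.2, "c": 0.5, "r": 0.3}}
b_wet = {"s": 0.1, "c": 0.3, "r": 0.8}      # P(labmate wet | state)

def obs_prob(state, obs):
    return b_wet[state] if obs == "w" else 1 - b_wet[state]

def monitor(prior, observations):
    """Return alpha[i] = P(St = i, O0..Ot), given P(S0) and O0..Ot."""
    alpha = {i: prior[i] * obs_prob(i, observations[0]) for i in states}
    for obs in observations[1:]:
        alpha = {j: sum(alpha[i] * A[i][j] for i in states)
                    * obs_prob(j, obs) for j in states}
    return alpha

alpha = monitor({"s": 1/3, "c": 1/3, "r": 1/3}, ["d", "w", "w"])
z = sum(alpha.values())
print({i: alpha[i] / z for i in states})   # P(S2 | O0=d, O1=w, O2=w)
```

Normalizing the final alphas by their sum turns the joint values into the posterior over the current state, which is exactly the quantity the question asks for.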
65. Predicting further out
• You have been stuck in the lab for three days
• On those days, your labmate was dry, wet, wet,
respectively
• What is the probability that two days from now it
will be raining outside?
• P(S4 = r | O0 = d, O1 = w, O2 = w)
66. Predicting further out, continued…
• Want to know: P(S4 = r | O0 = d, O1 = w, O2 = w)
• Already know how to get: P(S2 | O0 = d, O1 = w, O2 = w)
• P(S3 = r | O0 = d, O1 = w, O2 = w) =
Σs2 P(S3 = r, S2 = s2 | O0 = d, O1 = w, O2 = w) =
Σs2 P(S3 = r | S2 = s2) P(S2 = s2 | O0 = d, O1 = w, O2 = w)
• Etc. for S4
• So: monitoring first, then straightforward Markov process
updates
67. Integrating newer information
• You have been stuck in the lab for four days (!)
• On those days, your labmate was dry, wet, wet, dry,
respectively
• What is the probability that two days ago it was
raining outside? P(S1 = r | O0 = d, O1 = w, O2 = w, O3
= d)
– Smoothing or hindsight problem
68. Hindsight problem continued…
• Want: P(S1 = r | O0 = d, O1 = w, O2 = w, O3 = d)
• “Partial” application of Bayes’ rule:
P(S1 = r | O0 = d, O1 = w, O2 = w, O3 = d) =
P(S1 = r, O2 = w, O3 = d | O0 = d, O1 = w) /
P(O2 = w, O3 = d | O0 = d, O1 = w)
• So really want to know P(S1, O2 = w, O3 = d | O0 = d, O1 = w)
69. Hindsight problem continued…
• Want to know P(S1 = r, O2 = w, O3 = d | O0 = d, O1 = w)
• P(S1 = r, O2 = w, O3 = d | O0 = d, O1 = w) =
P(S1 = r | O0 = d, O1 = w) P(O2 = w, O3 = d | S1 = r)
• Already know how to compute P(S1 = r | O0 = d, O1 = w)
• Just need to compute P(O2 = w, O3 = d | S1 = r)
70. Hindsight problem continued…
• Just need to compute P(O2 = w, O3 = d | S1 = r)
• P(O2 = w, O3 = d | S1 = r) =
Σs2 P(S2 = s2, O2 = w, O3 = d | S1 = r) =
Σs2 P(S2 = s2 | S1 = r) P(O2 = w | S2 = s2) P(O3 = d | S2 = s2)
• The first two factors are directly in the model; the last factor is a
"smaller" problem of the same kind
• Use dynamic programming, backwards from the future
– Similar to the forwards approach from the past
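The backwards dynamic program above can be sketched as follows. A minimal sketch: the transition values are assumed (as in the earlier sketch), and b_wet holds the slide's bsw = .1, bcw = .3, brw = .8:

```python
states = ["s", "c", "r"]
A = {"s": {"s": 0.6, "c": 0.3, "r": 0.1},   # assumed; row = current state
     "c": {"s": 0.3, "c": 0.4, "r": 0.3},
     "r": {"s": 0.2, "c": 0.5, "r": 0.3}}
b_wet = {"s": 0.1, "c": 0.3, "r": 0.8}      # P(labmate wet | state)

def obs_prob(state, obs):
    return b_wet[state] if obs == "w" else 1 - b_wet[state]

def backward(future_obs):
    """beta[i] = P(future observations | current state = i),
    computed backwards from the last observation."""
    beta = {i: 1.0 for i in states}
    for obs in reversed(future_obs):
        beta = {i: sum(A[i][j] * obs_prob(j, obs) * beta[j]
                       for j in states) for i in states}
    return beta

print(backward(["w", "d"]))   # P(O2 = w, O3 = d | S1 = i) for each state i
```

Multiplying these backward values by the monitoring (forward) values and renormalizing gives the smoothed posterior P(S1 | all observations).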