This document summarizes different reasoning methods for dealing with uncertainty, including:
1. Non-monotonic logic and reasoning methods like default logic that allow knowledge to be added or removed over time as new information is obtained.
2. Probabilistic methods like certainty factors, fuzzy logic, Bayesian belief networks, and Markov models that allow assigning degrees of belief or probability to hypotheses.
3. Evidence theory like Dempster-Shafer theory that extends probability theory by assigning belief intervals rather than single probabilities and combining independent evidence.
1. Chapter 9: Reasoning in Uncertain Situations
Contents
Uncertain situations
Non-monotonic logic and reasoning
Certainty Factor algebra
Fuzzy logic and reasoning
Dempster-Shafer theory of evidence
Bayesian belief network
Markov models
2. Traditional Logic
Based on predicate logic
Three important assumptions:
– Predicate descriptions are sufficient with respect to the domain
– Information is consistent
– Knowledge base grows monotonically
3. Non-monotonic Logic
Addresses the three assumptions of traditional logic
– Knowledge is incomplete
No knowledge about p: true or false?
Prolog – the closed world assumption
– Knowledge is inconsistent
Based on how the world usually works
Most birds fly, but an ostrich doesn't
– The knowledge base grows non-monotonically
A new observation may contradict existing knowledge, so existing knowledge may need to be removed.
Inference is based on assumptions; what happens if the assumptions are later shown to be incorrect?
Three modal operators are introduced
4. Unless Operator
New information may invalidate previous results
Implemented in TMS – Truth Maintenance Systems – to keep track of the reasoning steps and preserve KB consistency
Introduce the unless operator
– Supports inferences based on the belief that its argument is not true
– Consider
p(X) unless q(X) → r(X)
If p(X) is true and q(X) is not believed true, then r(X)
p(Z)
r(W) → s(W)
From the above, conclude s(Z).
Later, if belief changes or q(Z) is found to be true, what happens?
Retract r(Z) and s(Z)
– Unless deals with belief, not truth
Its argument is either unknown or believed false, as opposed to believed or known true
– This is what makes the reasoning non-monotonic
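To make the retraction behavior concrete, here is a minimal Python sketch (mine, not the slides') of TMS-style bookkeeping for the example above; the fact strings and function names are illustrative:

```python
# Toy sketch of TMS-style justification tracking for the "unless" example.
# All names are illustrative, not from the original slides.

facts = {"p(Z)"}          # known facts
disbelieved = {"q(Z)"}    # unless-arguments currently not believed
justifications = {}       # derived fact -> set of facts it depends on

def derive():
    # p(X) unless q(X) -> r(X)
    if "p(Z)" in facts and "q(Z)" in disbelieved:
        justifications["r(Z)"] = {"p(Z)"}
    # r(W) -> s(W)
    if "r(Z)" in justifications:
        justifications["s(Z)"] = {"r(Z)"}

def retract(fact):
    # Remove a derived fact and, recursively, everything resting on it.
    justifications.pop(fact, None)
    for dependent, support in list(justifications.items()):
        if fact in support:
            retract(dependent)

derive()
print(sorted(justifications))    # ['r(Z)', 's(Z)']

# Later q(Z) turns out to be true: the unless-based inference is invalidated.
disbelieved.discard("q(Z)")
facts.add("q(Z)")
retract("r(Z)")                  # s(Z) is retracted with it
print(sorted(justifications))    # []
```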
5. Is-consistent-with Operator M
When reasoning, make sure the premises are consistent
Format: M p – p is consistent with the KB
Consider
– ∀X good_student(X) ∧ M study_hard(X) → graduates(X)
– For all X who are good students, if the fact that X studies hard is consistent with the KB, then X will graduate
– It is not necessary to prove that X studies hard.
How to decide whether p is consistent with the KB
– Negation as failure
– Heuristic-based and limited search
6. Default Logic
Introduce a new format of inference rules:
– A(Z) : B(Z) / C(Z)
– If A(Z) is provable, and it is consistent with what we know to assume B(Z), then conclude C(Z)
Compare with the is-consistent-with operator
– Similar
– The difference is the reasoning method
In default logic, new rules are used to infer sets of plausible extensions
– Example:
∀X good_student(X) : study_hard(X) / graduates(X)
∀Y party(Y) : not(study_hard(Y)) / not(graduates(Y))
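A small sketch (hypothetical names, deliberately simplified semantics) of how the two default rules above generate different plausible extensions for an individual who is both a good student and a party-goer:

```python
# Default-logic sketch: (prerequisite, justification, conclusion) triples.
# Applying the defaults in different orders yields different extensions.

facts = frozenset({"good_student(john)", "party(john)"})

defaults = [
    ("good_student(john)", "study_hard(john)", "graduates(john)"),
    ("party(john)", "not(study_hard(john))", "not(graduates(john))"),
]

def negate(p):
    return p[4:-1] if p.startswith("not(") else "not(" + p + ")"

def extension(order):
    ext = set(facts)
    for pre, just, concl in order:
        # Apply a default only if its prerequisite holds and both its
        # justification and its conclusion are consistent with ext so far.
        if pre in ext and negate(just) not in ext and negate(concl) not in ext:
            ext.add(concl)
    return ext

print(extension(defaults))        # contains graduates(john)
print(extension(defaults[::-1]))  # contains not(graduates(john))
```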
7. Stanford Certainty Factor Algebra
A measure of confidence or belief
The measures need not sum to 1
Simple case:
– Confidence for: MB(H|E)
– Confidence against: MD(H|E)
– Properties:
1 > MB(H|E) > 0 while MD(H|E) = 0, or
1 > MD(H|E) > 0 while MB(H|E) = 0
– Put together:
CF(H|E) = MB(H|E) – MD(H|E)
1 > CF(H|E) > -1
8. CF Combination
Premise combination
– CF(P and Q) = min(CF(P), CF(Q))
– CF(P or Q) = max(CF(P), CF(Q))
Rule CF: each rule has a confidence measure
CF propagation
– Rule R: P → Q with CF = CF(R)
– CF(Q) = CF(P) × CF(R)
Rule combination
– Rule R1: P1 → Q: CF1(Q) = CF(P1) × CF(R1)
– Rule R2: P2 → Q: CF2(Q) = CF(P2) × CF(R2)
– CF(Q) =
CF1 + CF2 – (CF1 × CF2) if both positive
CF1 + CF2 + (CF1 × CF2) if both negative
(CF1 + CF2) / (1 – min(|CF1|, |CF2|)) otherwise
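These formulas transcribe directly into code; here is a sketch in Python (function names are mine):

```python
# Stanford certainty-factor algebra, following the slide above.

def cf_and(cf_p, cf_q):                 # premise conjunction
    return min(cf_p, cf_q)

def cf_or(cf_p, cf_q):                  # premise disjunction
    return max(cf_p, cf_q)

def cf_propagate(cf_premise, cf_rule):  # rule R: P -> Q with CF(R)
    return cf_premise * cf_rule

def cf_combine(cf1, cf2):               # two rules concluding the same Q
    if cf1 > 0 and cf2 > 0:
        return cf1 + cf2 - cf1 * cf2
    if cf1 < 0 and cf2 < 0:
        return cf1 + cf2 + cf1 * cf2
    return (cf1 + cf2) / (1 - min(abs(cf1), abs(cf2)))

# Example: R1 yields CF 0.9 * 0.5 = 0.45, R2 yields 0.8 * 0.4 = 0.32;
# combined: 0.45 + 0.32 - 0.45 * 0.32 = 0.626
print(cf_combine(cf_propagate(0.9, 0.5), cf_propagate(0.8, 0.4)))
```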
9. Fuzzy Sets
Classic sets
– Completeness: x is in either A or ¬A
– Exclusiveness: x cannot be in both A and ¬A
Fuzzy sets
– Violate the two assumptions
– Possibility theory – a measure of confidence or belief
– Probability theory – randomness
– Process imprecision
– Introduce a membership function
– Believe x ∈ A to some degree between 0 and 1, inclusive
12. Fuzzy Set Operations
Fuzzy set operations are defined as operations on membership functions
Complement: ¬A = C
– mC = 1 – mA
Union: A ∪ B = C
– mC = max(mA, mB)
Intersection: A ∩ B = C
– mC = min(mA, mB)
Difference: A – B = C
– mC = max(0, mA – mB)
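As code, the operations act pointwise on membership degrees; a short sketch (membership values invented for illustration):

```python
# Fuzzy set operations on membership degrees, following the slide above.

def f_not(m_a):             # complement: mC = 1 - mA
    return 1 - m_a

def f_union(m_a, m_b):      # union: mC = max(mA, mB)
    return max(m_a, m_b)

def f_intersect(m_a, m_b):  # intersection: mC = min(mA, mB)
    return min(m_a, m_b)

def f_diff(m_a, m_b):       # difference: mC = max(0, mA - mB)
    return max(0, m_a - m_b)

m_tall, m_heavy = 0.7, 0.4  # degrees to which some x is "tall" and "heavy"
print(f_union(m_tall, m_heavy))      # 0.7
print(f_intersect(m_tall, m_heavy))  # 0.4
print(f_diff(m_tall, m_heavy))       # 0.3 (up to float rounding)
```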
13. Fuzzy Inference Rules
Rule format and computation
– If x is A and y is B then z is C
mC(z) = min(mA(x), mB(y))
– If x is A or y is B then z is C
mC(z) = max(mA(x), mB(y))
– If x is not A then z is C
mC(z) = 1 – mA(x)
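Evaluating the three rule forms is equally direct; a sketch with made-up input degrees:

```python
# Fuzzy rule evaluation, given membership degrees mA(x) and mB(y).
m_a_x, m_b_y = 0.6, 0.8      # illustrative values

m_c_and = min(m_a_x, m_b_y)  # if x is A and y is B then z is C -> 0.6
m_c_or  = max(m_a_x, m_b_y)  # if x is A or y is B then z is C  -> 0.8
m_c_not = 1 - m_a_x          # if x is not A then z is C        -> 0.4
```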
15. The fuzzy regions for the input values θ (a) and dθ/dt (b). N – Negative, Z – Zero, P – Positive.
16. The fuzzy regions of the output value u, indicating the movement of the pendulum base: Negative Big, Negative, Zero, Positive, Positive Big.
18. The Fuzzy Associative Matrix (FAM) for the pendulum problem. The input values are on the left and top. The fuzzy rules themselves are given in the figure.
19. The fuzzy consequents (a) and their union (b). The centroid of the union (-2) is the crisp output.
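The centroid computation can be sketched as follows; the membership shapes below are invented stand-ins for the figure's consequents (with the slides' actual shapes, the centroid comes out at -2):

```python
# Centroid defuzzification: the crisp output is the weighted average of the
# output domain under the union of the (clipped) fuzzy consequents.
import numpy as np

u = np.linspace(-8, 8, 161)                      # output domain
tri_neg  = np.maximum(0, 1 - np.abs(u + 3) / 3)  # "Negative" region
tri_zero = np.maximum(0, 1 - np.abs(u) / 3)      # "Zero" region

# Clip each consequent to its rule's firing strength (illustrative values).
c1 = np.minimum(0.8, tri_neg)
c2 = np.minimum(0.4, tri_zero)

union = np.maximum(c1, c2)                       # union = pointwise max
centroid = np.sum(u * union) / np.sum(union)     # crisp output
print(round(centroid, 2))
```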
20. Dempster-Shafer Theory
Limitations of probability theory
– It assigns a single number to measure any situation, no matter how complex
– It cannot deal with missing evidence, heuristics, and limited knowledge
Dempster-Shafer theory
– Extends probability theory
– Considers a set of propositions as a whole
– Assigns a set of propositions an interval [belief, plausibility] that constrains the degree of belief in each individual proposition in the set
– The belief measure bel is in [0, 1]
0 – no supporting evidence for a set of propositions
1 – full supporting evidence for a set of propositions
– The plausibility of p:
pl(p) = 1 – bel(not(p))
This reflects how evidence for not(p) relates to the possibility of belief in p
bel(not(p)) = 1: full support for not(p), no possibility for p
bel(not(p)) = 0: no support for not(p), full possibility for p
The range is also [0, 1]
21. Properties of Dempster-Shafer
Initially, there is no supporting evidence for either of two competing hypotheses, say h1 and h2
– Dempster-Shafer: [bel, pl] = [0, 1]
– Probability theory: p(h1) = p(h2) = 0.5
Dempster-Shafer belief functions satisfy weaker axioms than probability functions
Two fundamental ideas:
– Obtaining belief degrees for one question from subjective probabilities for related questions
– Using Dempster's rule to combine these belief degrees when they are based on independent evidence
22. An Example
Two persons, M and B, with known reliabilities examine a computer and report their claims independently. How much should we believe their claims?
Question (Q): the detection claim
Related question (RQ): the detectors' reliability
Dempster-Shafer approach
– Obtain belief degrees for Q from subjective (prior) probabilities for RQ for each person
– Combine the belief degrees from the two persons
Person M:
– Reliability 0.9, unreliability 0.1
– Claims h1
– The belief degree in h1 is bel(h1) = 0.9
– The belief degree in not(h1) is bel(not(h1)) = 0.0 – different from probability theory, since there is no evidence supporting not(h1)
– pl(h1) = 1 – bel(not(h1)) = 1 – 0 = 1
– Thus the belief measure for M's claim h1 is [0.9, 1]
Person B:
– Reliability 0.8, unreliability 0.2
– Claims h2
– bel(h2) = 0.8, bel(not(h2)) = 0, pl(h2) = 1 – bel(not(h2)) = 1 – 0 = 1
– The belief measure for B's claim h2 is [0.8, 1]
23. Combining Belief Measures
Set of propositions: M claims h1 and B claims h2
– Case 1: h1 = h2
Both M and B reliable: 0.9 × 0.8 = 0.72
Both M and B unreliable: 0.1 × 0.2 = 0.02
The probability that at least one of the two is reliable: 1 – 0.02 = 0.98
The belief measure for h1 = h2 is [0.98, 1]
– Case 2: h1 = not(h2)
M and B cannot both be correct and reliable
At least one is unreliable
– M reliable, B unreliable: 0.9 × (1 – 0.8) = 0.18
– B reliable, M unreliable: 0.8 × (1 – 0.9) = 0.08
– Both unreliable: (1 – 0.9) × (1 – 0.8) = 0.02
– At least one unreliable: 0.18 + 0.08 + 0.02 = 0.28
Given that at least one is unreliable, the posterior probabilities are
– M reliable, B unreliable: 0.18/0.28 = 0.643
– B reliable, M unreliable: 0.08/0.28 = 0.286
Belief measure for h1
– bel(h1) = 0.643, bel(not(h1)) = bel(h2) = 0.286
– pl(h1) = 1 – bel(not(h1)) = 1 – 0.286 = 0.714
– Belief measure: [0.643, 0.714]
Belief measure for h2
– bel(h2) = 0.286, bel(not(h2)) = bel(h1) = 0.643
– pl(h2) = 1 – bel(not(h2)) = 1 – 0.643 = 0.357
– Belief measure: [0.286, 0.357]
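The case 2 numbers can be checked mechanically; a short script reproducing them:

```python
# Dempster-Shafer combination for case 2 (h1 = not(h2)) of the example.
r_m, r_b = 0.9, 0.8                       # reliabilities of M and B

m_only  = r_m * (1 - r_b)                 # M reliable, B unreliable: 0.18
b_only  = (1 - r_m) * r_b                 # B reliable, M unreliable: 0.08
neither = (1 - r_m) * (1 - r_b)           # both unreliable:          0.02
norm = m_only + b_only + neither          # at least one unreliable:  0.28

bel_h1 = m_only / norm                    # 0.643
bel_h2 = b_only / norm                    # 0.286
pl_h1, pl_h2 = 1 - bel_h2, 1 - bel_h1     # 0.714, 0.357

print([round(bel_h1, 3), round(pl_h1, 3)])  # [0.643, 0.714]
print([round(bel_h2, 3), round(pl_h2, 3)])  # [0.286, 0.357]
```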
24. Dempster's Rule
Assumptions:
– The questions are independent a priori
– As new evidence is collected and conflicts arise, the independence may disappear
Two steps:
1. Sort the uncertainties into a priori independent pieces of evidence
2. Apply Dempster's rule
Consider the previous example
– After M and B made their claims, a repair person is called to check the computer, and both M and B witnessed this.
– Three independent items of evidence must be combined
Not all evidence directly supports individual elements of a set of hypotheses; it often supports different subsets of hypotheses, in favor of some and against others
25. General Dempster's Rule
Q – an exhaustive set of mutually exclusive hypotheses
Z – a subset of Q
M – a probability density function assigning a belief measure to Z
Mn(Z) – the belief degree given to Z, where n is the number of sources of evidence
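The combination formula itself appears to have been lost from the slide in extraction; for reference, the standard form of Dempster's rule for combining two sources m1 and m2 over subsets X, Y, Z of Q is:

```latex
(m_1 \oplus m_2)(Z) \;=\;
\frac{\sum_{X \cap Y = Z} m_1(X)\, m_2(Y)}
     {1 - \sum_{X \cap Y = \emptyset} m_1(X)\, m_2(Y)}
```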
26. Bayesian Belief Network
A computational model for reasoning to the best explanation of a data set under uncertainty
Motivation
– Reduce the number of parameters of the full Bayesian model
– Show how the data can partition and focus reasoning
– Avoid using a large joint probability table to compute probabilities for all possible event combinations
Assumption
– Events are either conditionally independent or their correlations are so small that they can be ignored
Directed graphical model
– The events and (cause-effect) relationships form a directed graph, where events are vertices and relationships are links
27. The Traffic Problem
(Figures: the Bayesian representation of the traffic problem with potential explanations, and the joint probability distribution for the traffic and construction variables.)
Given bad traffic, what is the probability of road construction?
p(C|T) = p(C=t, T=t) / (p(C=t, T=t) + p(C=f, T=t)) = 0.3/(0.3 + 0.1) = 0.75
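The same computation in code, using only the two joint-table entries the slide quotes (the full table is in the figure):

```python
p_ct_tt = 0.3   # p(C = t, T = t), from the joint table
p_cf_tt = 0.1   # p(C = f, T = t), from the joint table

p_c_given_t = p_ct_tt / (p_ct_tt + p_cf_tt)
print(p_c_given_t)   # 0.75
```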
28. An Example
Traffic problem
– Events:
Road construction C
Accident A
Orange barrels B
Bad traffic T
Flashing lights L
– Joint probability
P(C,A,B,T,L) = p(C) × p(A|C) × p(B|C,A) × p(T|C,A,B) × p(L|C,A,B,T)
Number of parameters: 2^5 = 32
– Reduction
Assumption: each variable depends only on its parents
Calculation of the joint probability
– P(C,A,B,T,L) = p(C) × p(A) × p(B|C) × p(T|C,A) × p(L|A)
– Number of parameters: 2 + 2 + 4 + 8 + 4 = 20
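A sketch of the factored joint in code; all CPT numbers below are placeholders (the slides give only the structure), and the parameter counts match the slide:

```python
# Factored joint for the traffic network: p(C) p(A) p(B|C) p(T|C,A) p(L|A).
from itertools import product

p_c = 0.5                                   # p(C = t), placeholder
p_a = 0.25                                  # p(A = t), placeholder
p_b_c = {True: 0.8, False: 0.1}             # p(B = t | C)
p_t_ca = {(c, a): 0.9 if (c or a) else 0.1
          for c in (True, False) for a in (True, False)}   # p(T = t | C, A)
p_l_a = {True: 0.7, False: 0.05}            # p(L = t | A)

def bern(p_true, v):                        # probability of a boolean outcome
    return p_true if v else 1 - p_true

def joint(c, a, b, t, l):
    return (bern(p_c, c) * bern(p_a, a) * bern(p_b_c[c], b)
            * bern(p_t_ca[(c, a)], t) * bern(p_l_a[a], l))

# Sanity check: the factored joint sums to 1 over all 2**5 assignments,
# yet it is specified by 2 + 2 + 4 + 8 + 4 = 20 numbers instead of 32.
print(sum(joint(*v) for v in product((True, False), repeat=5)))  # 1.0
```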
29. BBN Definition
Links represent conditional probabilities for causal influence
These influences are directed: the presence of some event causes other events
These influences are not circular
Thus a BBN is a DAG: a Directed Acyclic Graph
30. Discrete Markov Process
Finite state machine
– A graphical representation
– State transitions depend on the input stream
– States and transitions reflect properties of a formal language
Probabilistic finite state machine
– A finite state machine whose transition function is represented by a probability distribution on the current state
Discrete Markov process (chain, machine)
– A specialization of the probabilistic finite state machine
– Ignores its input values
31. (Figure: a Markov state machine, or Markov chain, with four states s1, ..., s4.)
At any time the system is in one of the distinct states
At each step the system either changes state or remains in the same state
Divide time into discrete intervals: t1, t2, …, tn
The state changes according to the probability distribution of each state
S(t) – the actual state at time t
p(S(t)) = p(S(t) | S(t-1), S(t-2), S(t-3), …)
First-order Markov chain
– Depends only on the immediately preceding state
– p(S(t)) = p(S(t) | S(t-1))
32. Observable Markov Model
Assume p(S(t)|S(t-1)) is time-invariant, that is, transitions between specific states retain the same probabilistic relationships
State transition probability aij between si and sj:
– aij = p(S(t) = si | S(t-1) = sj), 1 <= i, j <= N
– If i = j, there is no transition (the system remains in the same state)
– Properties: aij >= 0, Σi aij = 1
33. S1 – sun, S2 – cloudy, S3 – fog, S4 – precipitation
Time intervals: noon to noon
Question: suppose that today is sunny, what is the probability of the next five days being sunny, sunny, cloudy, cloudy, precipitation?
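Under the first-order assumption, the answer is just a product of one-step transition probabilities. A sketch follows; the matrix values are invented placeholders, since the slides' actual transition matrix is in the figure:

```python
# First-order Markov chain: probability of a state sequence given today.
a = {   # a[prev][nxt] = p(next state | previous state); placeholder values
    "sun":    {"sun": 0.6, "cloudy": 0.2, "fog": 0.15, "precip": 0.05},
    "cloudy": {"sun": 0.3, "cloudy": 0.4, "fog": 0.2,  "precip": 0.1},
    "fog":    {"sun": 0.2, "cloudy": 0.3, "fog": 0.3,  "precip": 0.2},
    "precip": {"sun": 0.1, "cloudy": 0.4, "fog": 0.2,  "precip": 0.3},
}

def sequence_prob(today, days):
    p, prev = 1.0, today
    for s in days:          # chain rule under the first-order assumption
        p *= a[prev][s]
        prev = s
    return p

# Today is sunny; next five days: sunny, sunny, cloudy, cloudy, precipitation.
print(sequence_prob("sun", ["sun", "sun", "cloudy", "cloudy", "precip"]))
# 0.6 * 0.6 * 0.2 * 0.4 * 0.1 = 0.00288
```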