19CS308T: Artificial Intelligence
UNIT-III
UNCERTAINTY AND STATISTICAL
REASONING
Faculty:Mr.K.Sundar
Syllabus
• Probability and Axioms-Bayes Rule-
Bayesian Networks-Inferences-Temporal
Models- Hidden Markov models-Fuzzy
reasoning-Certainty factors-Bayesian
Theory-Bayesian Network-Dempster
Shafer theory.
• Case study on each algorithm
• Probability theory
• Bayesian networks
• Certainty factors
1. Probability theory
1.1 Uncertain knowledge
∀p symptom(p, Toothache) ⇒ disease(p, cavity)
∀p symptom(p, Toothache) ⇒
disease(p, cavity) ∨ disease(p, gum_disease) ∨ …
• Predicate logic (PL) fails in such domains because of:
- laziness
- theoretical ignorance
- practical ignorance
• Probability theory → degree of belief or
plausibility of a statement – a numerical
measure in [0,1]
• Degree of truth – fuzzy logic ≠ degree of belief
1.2 Definitions
• Unconditional or prior probability of A – the degree of
belief in A in the absence of any other information – P(A)
• A – random variable
• Probability distribution – P(A), P(A,B)
Example
P(Weather = Sunny) = 0.1
P(Weather = Rain) = 0.7
P(Weather = Snow) = 0.2
Weather – random variable
• P(Weather) = (0.1, 0.7, 0.2) – probability distribution
• Conditional probability – posterior – once the agent
has obtained some evidence B for A - P(A|B)
• P(Cavity | Toothache) = 0.8
Definitions - cont
• Axioms of probability
• The measure of the occurrence of an event
(random variable) A – a function P: S → R
satisfying the axioms:
• 0 ≤ P(A) ≤ 1
• P(S) = 1 (or P(true) = 1 and P(false) = 0)
• P(A ∨ B) = P(A) + P(B) - P(A ∧ B)
P(A ∨ ~A) = P(A) + P(~A) - P(false) = P(true)
P(~A) = 1 - P(A)
Definitions - cont
A and B mutually exclusive ⇒ P(A ∨ B) = P(A) + P(B)
P(e1 ∨ e2 ∨ e3 ∨ … ∨ en) = P(e1) + P(e2) + P(e3) + … + P(en)
The probability of a proposition a is equal to the
sum of the probabilities of the atomic events in
which a holds
e(a) – the set of atomic events in which a holds
P(a) = Σ P(ei), summed over all ei in e(a)
1.3 Product rule
Conditional probabilities can be defined in terms of
unconditional probabilities
The conditional probability of the occurrence of
A if event B occurs
– P(A|B) = P(A ∧ B) / P(B)
This can also be written as:
– P(A ∧ B) = P(A|B) * P(B)
For probability distributions
– P(A=a1 ∧ B=b1) = P(A=a1|B=b1) * P(B=b1)
– P(A=a1 ∧ B=b2) = P(A=a1|B=b2) * P(B=b2)
….
– P(X,Y) = P(X|Y) * P(Y)
1.4 Bayes’ rule and its use
P(A ∧ B) = P(A|B) * P(B)
P(A ∧ B) = P(B|A) * P(A)
Bayes’ rule (theorem)
• P(B|A) = P(A|B) * P(B) / P(A)
Bayes Theorem
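hi – hypotheses (i = 1,…,k)
e1,…,en – evidence
P(hi) – prior probability of hi
P(hi | e1,…,en) – posterior probability of hi given the evidence
P(e1,…,en | hi) – probability of the evidence given hi
$$P(h_i \mid e_1, e_2, \ldots, e_n) = \frac{P(e_1, e_2, \ldots, e_n \mid h_i)\, P(h_i)}{\sum_{j=1}^{k} P(e_1, e_2, \ldots, e_n \mid h_j)\, P(h_j)}, \quad i = 1, \ldots, k$$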
Bayes’ Theorem - cont
If e1,…,en are conditionally independent pieces of evidence given each hypothesis hj,
then
$$P(e_1, e_2, \ldots, e_n \mid h_j) = P(e_1 \mid h_j)\, P(e_2 \mid h_j) \cdots P(e_n \mid h_j), \quad j = 1, \ldots, k$$
Example system: PROSPECTOR
1.5 Inferences
Probability distribution P(Cavity, Tooth)
          Tooth   ~Tooth
Cavity    0.04    0.06
~Cavity   0.01    0.89
P(Cavity) = 0.04 + 0.06 = 0.1
P(Cavity ∨ Tooth) = 0.04 + 0.01 + 0.06 = 0.11
P(Cavity | Tooth) = P(Cavity ∧ Tooth) / P(Tooth) = 0.04 / 0.05 = 0.8
Inferences
Probability distribution P(Cavity, Tooth, Catch)
              Tooth              ~Tooth
          Catch   ~Catch     Catch   ~Catch
Cavity    0.108   0.012      0.072   0.008
~Cavity   0.016   0.064      0.144   0.576
P(Cavity) = 0.108 + 0.012 + 0.072 + 0.008 = 0.2
P(Cavity ∨ Tooth) = 0.108 + 0.012 + 0.072 + 0.008 + 0.016
+ 0.064 = 0.28
P(Cavity | Tooth) = P(Cavity ∧ Tooth) / P(Tooth) =
[P(Cavity ∧ Tooth ∧ Catch) + P(Cavity ∧ Tooth ∧ ~Catch)] /
P(Tooth) = (0.108 + 0.012) / 0.2 = 0.6
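The inferences on this slide all reduce to summing entries of the joint table. A minimal Python sketch, assuming the table is stored as a dictionary keyed by (cavity, tooth, catch) truth values; the helper names `prob` and `cond_prob` are illustrative, not from the slides:

```python
# Joint distribution P(Cavity, Tooth, Catch) from the table.
# Keys are (cavity, tooth, catch) truth values.
joint = {
    (True, True, True): 0.108,
    (True, True, False): 0.012,
    (True, False, True): 0.072,
    (True, False, False): 0.008,
    (False, True, True): 0.016,
    (False, True, False): 0.064,
    (False, False, True): 0.144,
    (False, False, False): 0.576,
}


def prob(event):
    """Sum the probabilities of the atomic events in which `event` holds."""
    return sum(p for world, p in joint.items() if event(*world))


def cond_prob(event, given):
    """P(event | given) = P(event and given) / P(given)."""
    both = prob(lambda c, t, k: event(c, t, k) and given(c, t, k))
    return both / prob(given)


p_cavity = prob(lambda c, t, k: c)
p_cavity_or_tooth = prob(lambda c, t, k: c or t)
p_cavity_given_tooth = cond_prob(lambda c, t, k: c, lambda c, t, k: t)
```

The three results correspond to P(Cavity), P(Cavity ∨ Tooth) and P(Cavity | Tooth) computed by hand on the slide.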
2 Bayesian networks
• Represent dependencies among random
variables
• Give a short specification of conditional
probability distribution
• Many random variables are conditionally
independent
• Simplifies computations
• Graphical representation
• DAG – causal relationships among random variables
2.1 Definition of Bayesian networks
A BN is a DAG in which each node is annotated
with quantitative probability information, namely:
• Nodes represent random variables (discrete or
continuous)
• Directed links X → Y: X has a direct influence on
Y, X is said to be a parent of Y
• Each node Xi has an associated conditional
probability table, P(Xi | Parents(Xi)), that quantifies
the effects of the parents on the node
Example: Weather, Cavity, Toothache, Catch
• Weather, Cavity → Toothache, Cavity → Catch
Bayesian network - example
Network structure: Burglary → Alarm ← Earthquake, Alarm → JohnCalls, Alarm → MaryCalls
P(B) = 0.001
P(E) = 0.002
Conditional probability table for Alarm:
B  E  P(A=T | B, E)  P(A=F | B, E)
T  T  0.95           0.05
T  F  0.94           0.06
F  T  0.29           0.71
F  F  0.001          0.999
Conditional probability table for JohnCalls:
A  P(J=T | A)
T  0.9
F  0.05
Conditional probability table for MaryCalls:
A  P(M=T | A)
T  0.7
F  0.01
2.2 Bayesian network semantics
A) Represent a probability distribution
B) Specify conditional independence – build the
network
A) each value of the probability distribution can be
computed as:
P(X1=x1 ∧ … ∧ Xn=xn) = P(x1,…, xn) =
∏i=1,n P(xi | Parents(Xi))
2.3 Building the network
P(X1=x1 ∧ … ∧ Xn=xn) = P(x1,…, xn) =
P(xn | xn-1,…, x1) * P(xn-1,…, x1) = … =
P(xn | xn-1,…, x1) * P(xn-1 | xn-2,…, x1) * … * P(x2|x1) * P(x1) =
∏i=1,n P(xi | xi-1,…, x1)
• We can see that P(Xi | Xi-1,…, X1) = P(Xi | Parents(Xi)) if
Parents(Xi) ⊆ {Xi-1,…, X1}
• The condition may be satisfied by labeling the nodes in
an order consistent with a DAG
• Intuitively, the parents of a node Xi must be all the nodes
Xi-1,…, X1 which have a direct influence on Xi.
Building the network - cont
• Pick a set of random variables that describe the problem
• Pick an ordering of those variables
• while there are still variables repeat
(a) choose a variable Xi and add a node associated to Xi
(b) assign Parents(Xi) ← a minimal set of nodes that
already exist in the network such that the conditional
independence property is satisfied
(c) define the conditional probability table for Xi
• Because each node is linked only to previous nodes ⇒
DAG
• P(MaryCalls | JohnCalls, Alarm, Burglary, Earthquake) =
P(MaryCalls | Alarm)
Compactness of node ordering
• Far more compact than a probability distribution
• Example of locally structured system (or
sparse): each component interacts directly only
with a limited number of other components
• Associated usually with a linear growth in
complexity rather than with an exponential one
• The order of adding the nodes is important
• The correct order in which to add nodes is to add
the “root causes” first, then the variables they
influence, and so on, until we reach the leaves
2.4 Probabilistic inferences
Chain A → V → B:
P(A ∧ V ∧ B) = P(A) * P(V|A) * P(B|V)
Common cause A ← V → B:
P(A ∧ V ∧ B) = P(V) * P(A|V) * P(B|V)
Common effect A → V ← B:
P(A ∧ V ∧ B) = P(A) * P(B) * P(V|A,B)
Probabilistic inferences
P(J ∧ M ∧ A ∧ ~B ∧ ~E) =
P(J|A) * P(M|A) * P(A|~B,~E) * P(~B) * P(~E) =
0.9 * 0.7 * 0.001 * 0.999 * 0.998 ≈ 0.000628
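The product above is the network semantics applied to one full assignment. A short Python sketch of the same computation, using the CPT values from the burglary network; the function names `bernoulli` and `joint` are illustrative:

```python
# CPTs of the burglary network; each entry is the probability of "true".
P_B = 0.001
P_E = 0.002
P_A = {(True, True): 0.95, (True, False): 0.94,
       (False, True): 0.29, (False, False): 0.001}  # keyed by (b, e)
P_J = {True: 0.9, False: 0.05}   # keyed by alarm value
P_M = {True: 0.7, False: 0.01}   # keyed by alarm value


def bernoulli(p_true, value):
    """Probability that a Boolean variable takes `value`."""
    return p_true if value else 1.0 - p_true


def joint(b, e, a, j, m):
    """P(b, e, a, j, m) as the product of P(x_i | Parents(X_i))."""
    return (bernoulli(P_B, b) * bernoulli(P_E, e)
            * bernoulli(P_A[(b, e)], a)
            * bernoulli(P_J[a], j) * bernoulli(P_M[a], m))


# Both neighbours call, the alarm rang, no burglary and no earthquake.
p = joint(b=False, e=False, a=True, j=True, m=True)
```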
Probabilistic inferences
P(A|B) = P(A|B,E) * P(E|B) + P(A|B,~E) * P(~E|B)
= P(A|B,E) * P(E) + P(A|B,~E) * P(~E)
= 0.95 * 0.002 + 0.94 * 0.998 = 0.94002
2.5 Different types of inferences
Diagnostic inferences (effect → cause)
P(Burglary | JohnCalls)
Causal inferences (cause → effect)
P(JohnCalls | Burglary), P(MaryCalls | Burglary)
Intercausal inferences (between causes of a common effect)
P(Burglary | Alarm ∧ Earthquake)
Mixed inferences
P(Alarm | JohnCalls ∧ ~Earthquake) → diagnostic + causal
P(Burglary | JohnCalls ∧ ~Earthquake) → diagnostic + intercausal
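Each of these query types can be answered by brute-force enumeration over the joint distribution. The sketch below assumes the CPT values of the burglary network shown earlier; `query` is a hypothetical helper, and enumeration is only one possible inference method (it is exponential in the number of variables).

```python
from itertools import product

# CPTs of the burglary network; each entry is the probability of "true".
P_B, P_E = 0.001, 0.002
P_A = {(True, True): 0.95, (True, False): 0.94,
       (False, True): 0.29, (False, False): 0.001}
P_J = {True: 0.9, False: 0.05}
P_M = {True: 0.7, False: 0.01}
VARS = ("b", "e", "a", "j", "m")


def bernoulli(p_true, value):
    return p_true if value else 1.0 - p_true


def joint(b, e, a, j, m):
    """Full joint as the product of the local conditional probabilities."""
    return (bernoulli(P_B, b) * bernoulli(P_E, e)
            * bernoulli(P_A[(b, e)], a)
            * bernoulli(P_J[a], j) * bernoulli(P_M[a], m))


def query(target, evidence):
    """P(target = True | evidence), summing out all hidden variables."""
    totals = {True: 0.0, False: 0.0}
    for values in product([True, False], repeat=len(VARS)):
        world = dict(zip(VARS, values))
        if all(world[name] == val for name, val in evidence.items()):
            totals[world[target]] += joint(**world)
    return totals[True] / (totals[True] + totals[False])


diagnostic = query("b", {"j": True})              # P(Burglary | JohnCalls)
causal = query("j", {"b": True})                  # P(JohnCalls | Burglary)
intercausal = query("b", {"a": True, "e": True})  # P(Burglary | Alarm, Earthquake)
alarm_only = query("b", {"a": True})              # P(Burglary | Alarm)
```

Comparing `intercausal` with `alarm_only` shows the intercausal effect: once the earthquake is known, it accounts for the alarm and the belief in a burglary drops.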
3. Certainty factors
• The MYCIN model
• Certainty factors / Confidence coefficients (CF)
• Heuristic model of uncertain knowledge
• In MYCIN – two probabilistic functions to model
the degree of belief and the degree of disbelief in
a hypothesis
– function to measure the degree of belief - MB
– function to measure the degree of disbelief -
MD
• MB[h,e] – how much the belief in h increases
based on evidence e
• MD[h,e] - how much the disbelief in h increases
based on evidence e
3.1 Belief functions
• Certainty factor
$$MB[h,e] = \begin{cases} 1 & \text{if } P(h) = 1 \\ \dfrac{\max(P(h \mid e), P(h)) - P(h)}{\max(0,1) - P(h)} & \text{otherwise} \end{cases}$$
$$MD[h,e] = \begin{cases} 1 & \text{if } P(h) = 0 \\ \dfrac{\min(P(h \mid e), P(h)) - P(h)}{\min(0,1) - P(h)} & \text{otherwise} \end{cases}$$
$$CF[h,e] = MB[h,e] - MD[h,e]$$
Belief functions - features
• Value range
0 ≤ MB[h,e] ≤ 1, 0 ≤ MD[h,e] ≤ 1, -1 ≤ CF[h,e] ≤ 1
• If h is sure, i.e. P(h|e) = 1, then
MB[h,e] = (1 - P(h)) / (1 - P(h)) = 1
MD[h,e] = 0
CF[h,e] = 1
• If the negation of h is sure, i.e. P(h|e) = 0, then
MB[h,e] = 0
MD[h,e] = (0 - P(h)) / (0 - P(h)) = 1
CF[h,e] = -1
Example in MYCIN
• if (1) the type of the organism is gram-positive, and
• (2) the morphology of the organism is coccus, and
• (3) the growth of the organism is chain
• then there is a strong evidence (0.7) that the identity of
the organism is streptococcus
Example of facts in MYCIN :
• (identity organism-1 pseudomonas 0.8)
• (identity organism-2 e.coli 0.15)
• (morphology organism-2 coccus 1.0)
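The belief functions of section 3.1 can be sketched in Python as follows. This assumes the standard MYCIN definitions, with max(0,1) = 1 and min(0,1) = 0 substituted in; the function names `mb`, `md` and `cf` and the sample probabilities are illustrative.

```python
def mb(p_h, p_h_given_e):
    """Measure of belief: relative increase of belief in h given e."""
    if p_h == 1.0:
        return 1.0
    return (max(p_h_given_e, p_h) - p_h) / (1.0 - p_h)


def md(p_h, p_h_given_e):
    """Measure of disbelief: relative decrease of belief in h given e."""
    if p_h == 0.0:
        return 1.0
    # (min(P(h|e), P(h)) - P(h)) / (0 - P(h)), written without the double sign flip
    return (p_h - min(p_h_given_e, p_h)) / p_h


def cf(p_h, p_h_given_e):
    """Certainty factor CF[h,e] = MB[h,e] - MD[h,e], in [-1, 1]."""
    return mb(p_h, p_h_given_e) - md(p_h, p_h_given_e)


supporting = cf(0.2, 0.6)   # evidence raises the probability of h
opposing = cf(0.5, 0.25)    # evidence lowers the probability of h
```

With these hypothetical inputs, supporting evidence yields a positive CF and opposing evidence a negative one; the boundary cases P(h|e) = 1 and P(h|e) = 0 give +1 and -1.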
3.2 Combining belief functions
(1) Incremental gathering of evidence
• The same attribute value, h, is obtained by two separate
paths of inference, with two separate CFs: CF[h,s1] and
CF[h,s2]
• The two different paths, corresponding to hypotheses s1
and s2, may be different branches of the search tree.
• CF[h, s1&s2] = CF[h,s1] + CF[h,s2] – CF[h,s1]*CF[h,s2]
• (identity organism-1 pseudomonas 0.8)
Combining belief functions
(2) Conjunction of hypotheses
• Applied for computing the CF associated to the
premises of a rule which has several conditions
if A = a1 and B = b1 then …
WM: (A a1 h1 cf1)(B b1 h2 cf2)
• CF[h1&h2, s] = min(CF[h1,s], CF[h2,s])
Combining belief functions
(3) Combining beliefs
• An uncertain value is deduced based on a rule
which has as input conditions based on uncertain
values (may be obtained by applying other rules
for example).
• Allows the computation of the CF of the fact
deduced by the rule based on the rule’s CF and
the CF of the hypotheses
• CF[s,e] – belief in a hypothesis s based on
previous evidence e
• CF[h,s] - CF in h if s is sure
• CF’[h,s] = CF[h,s] * CF [s,e]
Combining belief functions
(3) Combining beliefs – cont
if A = a1 and B = b1 then C = c1 0.7
WM: (A a1 0.9) (B b1 0.6)
CF(premises) = min(0.9, 0.6) = 0.6
CF(conclusion) = CF(premises) * CF(rule) = 0.6 * 0.7 = 0.42
WM: (C c1 0.42)
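The three combination rules of section 3.2 fit in a few lines of Python. This is a sketch that assumes all CFs are non-negative, since the slides give the incremental-evidence formula only in that form; the function names are illustrative.

```python
def combine_incremental(cf1, cf2):
    """(1) Same conclusion h reached on two separate inference paths."""
    return cf1 + cf2 - cf1 * cf2


def combine_conjunction(*cfs):
    """(2) CF of a conjunctive premise is the minimum of its conditions."""
    return min(cfs)


def propagate(cf_rule, cf_premise):
    """(3) CF of a conclusion: rule CF scaled by the CF of its premise."""
    return cf_rule * cf_premise


# Worked example from the slide: rule CF 0.7, facts with CF 0.9 and 0.6.
cf_premises = combine_conjunction(0.9, 0.6)
cf_conclusion = propagate(0.7, cf_premises)

# Hypothetical second path that also concludes C = c1, with CF 0.8.
cf_both_paths = combine_incremental(cf_conclusion, 0.8)
```

Incremental combination never lowers the certainty: for non-negative inputs the result is at least as large as either input and never exceeds 1.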
3.3 Limits of CF
• The CF model of MYCIN assumes that the hypotheses are
sustained by independent evidence
• An example shows what happens if this condition is
violated
A: The sprinkler functioned last night
U: The grass is wet in the morning
P: Last night it rained
R1: if the sprinkler functioned last night
then there is strong evidence (0.9) that the grass is wet in the
morning
R2: if the grass is wet in the morning
then there is strong evidence (0.8) that it rained last night
• CF[U,A] = 0.9
• therefore the evidence sprinkler sustains the hypothesis wet
grass with CF = 0.9
• CF[P,U] = 0.8
• therefore the evidence wet grass sustains the hypothesis rain
with CF = 0.8
• CF[P,A] = 0.8 * 0.9 = 0.72
• therefore the evidence sprinkler sustains the hypothesis rain
with CF = 0.72
• This conclusion is wrong: knowing that the sprinkler ran explains the wet
grass and should not increase the belief in rain
Artificial Intelligence 35
Traditional Logic
• Based on predicate logic
• Three important assumptions:
– Predicate descriptions are sufficient w.r.t. the
domain
– Information is consistent
– Knowledge base grows monotonically
Non-monotonic Logic
• Addresses the three assumptions of traditional
logic
– Knowledge is incomplete
• No knowledge about p: true or false?
• Prolog – closed world assumption
– Knowledge is inconsistent
• Based on how the world usually works
• Most birds fly, but ostriches do not
– Knowledge base grows non-monotonically
• New observations may contradict existing knowledge, so
existing knowledge may need to be removed.
• Inference is based on assumptions; what happens if the
assumptions are later shown to be incorrect?
• Three modal operators are introduced
Unless Operator
• New information may invalidate previous results
• Implemented in TMS – Truth Maintenance Systems
to keep track of the reasoning steps and preserve the
KB consistency
• Introduce Unless operator
– Support inferences based on the belief that its argument is
not true
– Consider
• p(X) unless q(X) → r(X)
If p(X) is true and q(X) is not believed to be true, then r(X)
• p(Z)
• r(W) → s(W)
From the above, conclude s(X).
If the belief later changes, or q(X) is found to be true, what happens?
Retract r(X) and s(X)
– Unless deals with belief, not truth
• Either unknown or believed false
• Believed or known true
– Monotonicity
Is-consistent-with Operator M
• When reasoning, make sure the premises are
consistent
• Format: M p – p is consistent with KB
• Consider
– ∀X good_student(X) ∧ M study_hard(X) →
graduates(X)
– For every X who is a good student, if the fact that X
studies hard is consistent with KB, then X will
graduate
– Not necessary to prove that X studies hard.
• How to decide p is consistent with KB
– Negation as failure
– Heuristic-based and limited search
Default Logic
• Introduce a new format of inference rules:
– A(Z) ∧ :B(Z) → C(Z)
– If A(Z) is provable, and it is consistent with what we
know to assume B(Z), then conclude C(Z)
• Compare with is-consistent-with operator
– Similar
– Difference is the reasoning method
• In default logic, new rules are used to infer sets of plausible
extensions
– Example:
∀X good_student(X) ∧ :study_hard(X) → graduates(X)
∀Y party(Y) ∧ :not(study_hard(Y)) → not(graduates(Y))
Fuzzy Sets
• Classic sets
– Completeness: x in either A or ¬A
– Exclusive: can not be in both A and ¬A
• Fuzzy sets
– Violate the two assumptions
– Possibility theory – measure of confidence or belief
– Probability theory – randomness
– Process imprecision
– Introduce membership function
– Belief that x ∈ A holds to some degree between 0 and 1,
inclusive
The fuzzy set representation for “small integers.”
A fuzzy set representation for the sets short, medium,
and tall males.
Fuzzy Set Operations
• Fuzzy set operations are defined as the
operations of membership functions
• Complement: ¬A = C
– mC = 1 – mA
• Union: A ∪ B = C
– mC = max(mA, mB)
• Intersection: A ∩ B = C
– mC = min(mA, mB)
• Difference: A – B = C
– mC = max(0, mA-mB)
Fuzzy Inference Rules
• Rule format and computation
– If x is A and y is B then z is C
mC(z) = min(mA(x), mB(y))
– If x is A or y is B then z is C
mC(z) = max(mA(x), mB(y))
– If x is not A then z is C
mC(z) = 1 – mA(x)
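The set operations and rule computations above act pointwise on membership degrees. A minimal Python sketch; the function names and the sample membership values are illustrative assumptions, not taken from the slides:

```python
def fuzzy_not(m_a):
    """Complement: mC = 1 - mA."""
    return 1.0 - m_a


def fuzzy_or(m_a, m_b):
    """Union / 'or' rule: mC = max(mA, mB)."""
    return max(m_a, m_b)


def fuzzy_and(m_a, m_b):
    """Intersection / 'and' rule: mC = min(mA, mB)."""
    return min(m_a, m_b)


def fuzzy_diff(m_a, m_b):
    """Difference: mC = max(0, mA - mB)."""
    return max(0.0, m_a - m_b)


# Hypothetical membership degrees of x in A and y in B.
m_a, m_b = 0.75, 0.25

strength_and = fuzzy_and(m_a, m_b)   # "if x is A and y is B then z is C"
strength_or = fuzzy_or(m_a, m_b)     # "if x is A or y is B then z is C"
strength_not = fuzzy_not(m_a)        # "if x is not A then z is C"
```

Unlike classic sets, `fuzzy_and(m, fuzzy_not(m))` is generally not 0, which is the violation of exclusivity mentioned under Fuzzy Sets.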
The fuzzy regions for the input values θ (a) and dθ/dt (b).
N – Negative, Z – Zero, P – Positive
The fuzzy regions of the output value u, indicating the
movement of the pendulum base: Negative Big,
Negative, Zero, Positive, Positive Big.
The fuzzification of the input measures
X1 = 1: mZ(X1) = mP(X1) = 0.5, mN(X1) = 0
X2 = -4: mZ(X2) = 0.2, mN(X2) = 0.8, mP(X2) = 0
The Fuzzy Associative Matrix (FAM) for the pendulum problem. The
input values are on the left and top.
Fuzzy Rules:
The fuzzy consequents (a) and their union (b). The
centroid of the union (-2) is the crisp output.
Dempster-Shafer Theory
• Probability theory limitation
– Assigns a single number to measure any situation, no matter how
complex it is
– Cannot deal with missing evidence, heuristics, and limited knowledge
• Dempster-Shafer theory
– Extends probability theory
– Considers a set of propositions as a whole
– Assigns a set of propositions an interval [belief, plausibility] to constrain
the degree of belief for each individual proposition in the set
– The belief measure bel is in [0,1]
• 0 – no support evidence for a set of propositions
• 1 – full support evidence for a set of propositions
– The plausibility of p,
• pl(p) = 1 – bel(not(p))
• Reflect how evidence of not(p) relates to the possibility for belief in p
• Bel(not(p))=1: full support for not(p), no possibility for p
• Bel(not(p))=0: no support for not(p), full possibility for p
• Range is also in [0,1]
Properties of Dempster-Shafer
• Initially, no support evidence for either competing
hypotheses, say h1 and h2
– Dempster-Shafer: [bel, pl] = [0, 1]
– Probability theory: p(h1)=p(h2)=0.5
• Dempster-Shafer belief functions satisfy weaker
axioms than probability function
• Two fundamental ideas:
– Obtaining belief degrees for one question from
subjective probabilities for related questions
– Using Dempster rule to combine these belief degrees
when they are based on independent evidence
An Example
• Two people, M and B, with known reliabilities, each examine a computer and report
their findings independently. How much should you believe their claims?
• Question (Q): detection claim
• Related question (RQ): detectors’ reliability
• Dempster-Shafer approach
– Obtain belief degrees for Q from subjective (prior) probabilities for RQ
for each person
– Combine belief degrees from two persons
• Person M:
– reliability 0.9, unreliability 0.1
– Claim h1
– Belief degree of h1 is bel(h1)=0.9
– Belief degree of not(h1) is bel(not(h1))=0.0, different from probability
theory, since no evidence supporting not(h1)
– pl(h1) = 1 – bel(not(h1)) = 1-0 =1
– Thus belief measure for M claim h1 is [0.9, 1]
• Person B:
– Reliability 0.8, unreliability 0.2
– Claim h2
– bel(h2) = 0.8, bel(not(h2)) = 0, pl(h2) = 1 - bel(not(h2)) = 1 - 0 = 1
– Thus belief measure for B claim h2 is [0.8, 1]
Combining Belief Measure
• Set of propositions: M claim h1 and B claim h2
– Case 1: h1 = h2
• Reliability M and B: 0.9x0.8=0.72
• Unreliability M and B: 0.1x0.2=0.02
• The probability that at least one of two is reliable: 1-0.02=0.98
• Belief measure for h1=h2 is [0.98,1]
– Case 2: h1 = not(h2)
• Cannot be both correct and reliable
• At least one is unreliable
– Reliable M and unreliable B: 0.9x(1-0.8)=0.18
– Reliable B and unreliable M: 0.8x(1-0.9)=0.08
– Unreliable M and B: (1-0.9)x(1-0.8)=0.02
– At least one is unreliable: 0.18+0.08+0.02=0.28
• Given at least one is unreliable, posterior probabilities
– Reliable M and unreliable B: 0.18/0.28=0.643
– Reliable B and unreliable M: 0.08/0.28=0.286
• Belief measure for h1
– Bel(h1)=0.643, bel(not(h1))=bel(h2)=0.286
– Pl(h1)=1-bel(not(h1))=1-0.286=0.714
– Belief measure: [0.643, 0.714]
• Belief measure for h2
– Bel(h2)=0.286, bel(not(h2))=bel(h1)=0.643
– Pl(h2)=1-bel(not(h2))=1-0.643=0.357
– Belief measure: [0.286, 0.357]
Dempster’s Rule
• Assumption:
– probable questions are independent a priori
– As new evidence is collected and conflicts arise, independence may
disappear
• Two steps
1. Sort the uncertainties into a priori independent pieces of evidence
2. Carry out Dempster rule
• Consider the previous example
– After M and B claimed, a repair person is called to check the
computer, and both M and B witnessed this.
– Three independent items of evidence must be combined
• Not all evidence is directly supportive of individual
elements of a set of hypotheses, but often supports
different subsets of hypotheses, in favor of some and
against others
General Dempster’s Rule
• Q – an exhaustive set of mutually exclusive
hypotheses
• Z – a subset of Q
• M – probability density function to assign a belief
measure to Z
• Mn(Z) – belief degree to Z, where n is the number of
sources of evidences
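The combination formula itself did not survive on this slide. The sketch below assumes the standard form of Dempster's rule: the combined mass of Z is the sum of products m1(X)·m2(Y) over all X ∩ Y = Z, renormalised by one minus the mass that falls on empty intersections. The names `combine`, `bel` and `pl` are illustrative. Applied to the M and B example, it reproduces the intervals computed by hand.

```python
from itertools import product


def combine(m1, m2):
    """Dempster's rule for two mass functions given as {frozenset: mass}."""
    raw, conflict = {}, 0.0
    for (x, mass_x), (y, mass_y) in product(m1.items(), m2.items()):
        z = x & y
        if z:
            raw[z] = raw.get(z, 0.0) + mass_x * mass_y
        else:
            conflict += mass_x * mass_y   # mass on contradictory pairs
    if conflict >= 1.0:
        raise ValueError("the two sources are in total conflict")
    return {z: mass / (1.0 - conflict) for z, mass in raw.items()}


def bel(m, hypothesis):
    """Belief: total mass of the subsets of `hypothesis`."""
    return sum(mass for z, mass in m.items() if z <= hypothesis)


def pl(m, hypothesis):
    """Plausibility: total mass of the sets that intersect `hypothesis`."""
    return sum(mass for z, mass in m.items() if z & hypothesis)


H1 = frozenset({"h1"})
NOT_H1 = frozenset({"not_h1"})
Q = H1 | NOT_H1                      # the whole frame of discernment

m_M = {H1: 0.9, Q: 0.1}              # M is reliable with probability 0.9
m_B_same = {H1: 0.8, Q: 0.2}         # case 1: B also claims h1
m_B_opposite = {NOT_H1: 0.8, Q: 0.2} # case 2: B claims not(h1)

case1 = combine(m_M, m_B_same)
case2 = combine(m_M, m_B_opposite)
interval_case1 = (bel(case1, H1), pl(case1, H1))
interval_case2 = (bel(case2, H1), pl(case2, H1))
```

Mass assigned to the whole frame Q represents "the witness is unreliable, so nothing is learned", which is how the example encodes unreliability.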
Discrete Markov Process
• Finite state machine
– A graphical representation
– State transition depends on input stream
– States and transitions reflect properties of a formal
language
• Probabilistic finite state machine
– A finite state machine
– Transition function represented by a probability
distribution on the current state
• Discrete Markov process (chain, machine)
– A specialization of probabilistic finite state machine
– Ignores its input values
A Markov state machine or Markov chain with four states, s1,
..., s4
At any time the system is in one of N distinct states
At each time step the system changes state or remains in the same state
Divide time into discrete intervals: t1, t2, …, tn
Change state according to the probability distribution of
each state
S(t) – the actual state at time t
p(S(t)) = p(S(t) | S(t-1), S(t-2), S(t-3), …)
First-order Markov chain
– Only depends on the direct predecessor state
– p(S(t)) = p(S(t) | S(t-1))
Observable Markov Model
• Assume p(S(t)|S(t-1)) is time invariant, that is, transition between
specific states retains the same probabilistic relationship
• State transition probability aij between si and sj:
– aij=p(S(t)=si|S(t-1)=sj), 1<=i,j<=N
– If i=j, no transition (remain the same state)
– Properties: aij >= 0, Σi aij = 1
S1 – sun
S2 – cloudy
S3 – fog
S4 – precipitation
Time intervals: noon to noon
Question: suppose that today is sunny, what is the probability of the
next five days being sunny, sunny, cloudy, cloudy, precipitation?
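In a first-order Markov chain the probability of a state sequence is the product of the one-step transition probabilities along it. The transition diagram for this example is not preserved in the text, so the matrix below contains placeholder values chosen only for illustration; `trans[prev][next]` and `sequence_probability` are hypothetical names.

```python
# HYPOTHETICAL transition probabilities trans[previous][next];
# the values from the original figure are not available.
trans = {
    "sun":           {"sun": 0.4, "cloudy": 0.3, "fog": 0.2, "precipitation": 0.1},
    "cloudy":        {"sun": 0.2, "cloudy": 0.3, "fog": 0.2, "precipitation": 0.3},
    "fog":           {"sun": 0.1, "cloudy": 0.3, "fog": 0.3, "precipitation": 0.3},
    "precipitation": {"sun": 0.2, "cloudy": 0.3, "fog": 0.2, "precipitation": 0.3},
}


def sequence_probability(start, sequence, trans):
    """P(sequence | start) for a time-invariant first-order Markov chain."""
    p, prev = 1.0, start
    for state in sequence:
        p *= trans[prev][state]   # only the direct predecessor matters
        prev = state
    return p


next_five_days = ["sun", "sun", "cloudy", "cloudy", "precipitation"]
p = sequence_probability("sun", next_five_days, trans)
```

With the placeholder matrix the product is 0.4 · 0.4 · 0.3 · 0.3 · 0.3; with the real figure values only the numbers change, not the method.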
Restrictiveness of Markov models
• Are past and future really independent given current state?
• E.g., suppose that when it rains, it rains for at most 2 days
S1 → S2 → S3 → S4 → …
• Second-order Markov process
• Workaround: change meaning of “state” to events of last 2 days
(S1, S2) → (S2, S3) → (S3, S4) → (S4, S5) → …
• Another approach: add more information to the state
• E.g., the full state of the world would include whether the
sky is full of water
– Additional information may not be observable
– Blowup of number of states…
Hidden Markov models (HMMs)
• Same as Markov model, except we cannot see the
state
• Instead, we only see an observation each period,
which depends on the current state
S1 S2 S3 … St …
• Still need a transition model: P(St+1 = j | St = i) = aij
• Also need an observation model: P(Ot = k | St = i) = bik
O1 O2 O3 … Ot …
Weather example extended to HMM
• Transition probabilities:
[Figure: state-transition diagram over the weather states s, c, r; edge labels .1, .2, .6, .3, .4, .3, .3, .5, .3]
• Observation: labmate wet or dry
• bsw = .1, bcw = .3, brw = .8
HMM weather example: a question
• You have been stuck in the lab for three days (!)
• On those days, your labmate was dry, wet, wet,
respectively
• What is the probability that it is now raining outside?
• P(S2 = r | O0 = d, O1 = w, O2 = w)
• By Bayes’ rule, really want to know P(S2, O0 = d, O1 = w, O2 = w)
Solving the question
• Computationally efficient approach: first compute
P(S1 = i, O0 = d, O1 = w) for all states i
• General case: solve for P(St, O0 = o0, O1 = o1, …, Ot = ot)
for t=1, then t=2, … This is called monitoring
• P(St, O0 = o0, O1 = o1, …, Ot = ot) =
Σst-1 P(St-1 = st-1, O0 = o0, O1 = o1, …, Ot-1 = ot-1) P(St | St-1 = st-1) P(Ot = ot | St)
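The monitoring recursion can be sketched in Python as below. Two things are assumptions, because the text does not state them: the assignment of the figure's nine edge labels to specific transitions (the matrix `TRANS` is one assignment whose rows sum to 1) and the uniform prior over the state on day 0. Only the wet-observation probabilities come directly from the slides.

```python
STATES = ("s", "c", "r")

# ASSUMED assignment of the figure's edge labels: TRANS[i][j] = P(S_t+1 = j | S_t = i).
TRANS = {
    "s": {"s": 0.6, "c": 0.3, "r": 0.1},
    "c": {"s": 0.4, "c": 0.3, "r": 0.3},
    "r": {"s": 0.2, "c": 0.3, "r": 0.5},
}
P_WET = {"s": 0.1, "c": 0.3, "r": 0.8}           # bsw, bcw, brw from the slides
PRIOR = {state: 1.0 / 3.0 for state in STATES}   # ASSUMED uniform prior on S0


def obs_prob(state, obs):
    """P(O_t = obs | S_t = state), with obs 'w' (wet) or 'd' (dry)."""
    return P_WET[state] if obs == "w" else 1.0 - P_WET[state]


def forward(observations):
    """Return P(S_t = i, O_0..O_t) for the last time step t, for every i."""
    alpha = {i: PRIOR[i] * obs_prob(i, observations[0]) for i in STATES}
    for obs in observations[1:]:
        alpha = {
            j: obs_prob(j, obs) * sum(alpha[i] * TRANS[i][j] for i in STATES)
            for j in STATES
        }
    return alpha


def monitor(observations):
    """Normalise the joint to get P(S_t | O_0..O_t)."""
    alpha = forward(observations)
    total = sum(alpha.values())
    return {i: value / total for i, value in alpha.items()}


posterior = monitor(["d", "w", "w"])   # dry, wet, wet
p_rain_now = posterior["r"]
```

Each step reuses the previous step's table, so the cost grows linearly with the number of days rather than exponentially with the number of state sequences.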
Predicting further out
• You have been stuck in the lab for three days
• On those days, your labmate was dry, wet, wet,
respectively
• What is the probability that two days from now it
will be raining outside?
• P(S4 = r | O0 = d, O1 = w, O2 = w)
Predicting further out, continued…
• Want to know: P(S4 = r | O0 = d, O1 = w, O2 = w)
• Already know how to get: P(S2 | O0 = d, O1 = w, O2 = w)
• P(S3 = r | O0 = d, O1 = w, O2 = w) =
Σs2 P(S3 = r, S2 = s2 | O0 = d, O1 = w, O2 = w) =
Σs2 P(S3 = r | S2 = s2) P(S2 = s2 | O0 = d, O1 = w, O2 = w)
• Etc. for S4
• So: monitoring first, then straightforward Markov process
updates
Integrating newer information
• You have been stuck in the lab for four days (!)
• On those days, your labmate was dry, wet, wet, dry
respectively
• What is the probability that two days ago it was
raining outside? P(S1 = r | O0 = d, O1 = w, O2 = w, O3
= d)
– Smoothing or hindsight problem
Hindsight problem continued…
• Want: P(S1 = r | O0 = d, O1 = w, O2 = w, O3 = d)
• “Partial” application of Bayes’ rule:
P(S1 = r | O0 = d, O1 = w, O2 = w, O3 = d) =
P(S1 = r, O2 = w, O3 = d | O0 = d, O1 = w) /
P(O2 = w, O3 = d | O0 = d, O1 = w)
• So really want to know P(S1, O2 = w, O3 = d | O0 = d, O1 = w)
Hindsight problem continued…
• Want to know P(S1 = r, O2 = w, O3 = d | O0 = d, O1 = w)
• P(S1 = r, O2 = w, O3 = d | O0 = d, O1 = w) =
P(S1 = r | O0 = d, O1 = w) P(O2 = w, O3 = d | S1 = r)
• Already know how to compute P(S1 = r | O0 = d, O1 = w)
• Just need to compute P(O2 = w, O3 = d | S1 = r)
Hindsight problem continued…
• Just need to compute P(O2 = w, O3 = d | S1 = r)
• P(O2 = w, O3 = d | S1 = r) =
Σs2 P(S2 = s2, O2 = w, O3 = d | S1 = r) =
Σs2 P(S2 = s2 | S1 = r) P(O2 = w | S2 = s2) P(O3 = d | S2 = s2)
• First two factors directly in the model; last factor is a
“smaller” problem of the same kind
• Use dynamic programming, backwards from the future
– Similar to forwards approach from the past
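The hindsight (smoothing) computation combines a forward pass up to the day of interest with a backward pass over the later observations. The sketch below makes the same assumptions as any code for this example must: the transition matrix `TRANS` is one assumed assignment of the figure's edge labels, and the prior on day 0 is assumed uniform. The function names are illustrative.

```python
STATES = ("s", "c", "r")
TRANS = {  # ASSUMED: TRANS[i][j] = P(S_t+1 = j | S_t = i)
    "s": {"s": 0.6, "c": 0.3, "r": 0.1},
    "c": {"s": 0.4, "c": 0.3, "r": 0.3},
    "r": {"s": 0.2, "c": 0.3, "r": 0.5},
}
P_WET = {"s": 0.1, "c": 0.3, "r": 0.8}           # bsw, bcw, brw from the slides
PRIOR = {state: 1.0 / 3.0 for state in STATES}   # ASSUMED uniform prior on S0


def obs_prob(state, obs):
    return P_WET[state] if obs == "w" else 1.0 - P_WET[state]


def forward(observations):
    """P(S_k = i, O_0..O_k) for the last index k of `observations`."""
    alpha = {i: PRIOR[i] * obs_prob(i, observations[0]) for i in STATES}
    for obs in observations[1:]:
        alpha = {j: obs_prob(j, obs) * sum(alpha[i] * TRANS[i][j] for i in STATES)
                 for j in STATES}
    return alpha


def backward(future_observations):
    """P(future observations | current state = i), built from the end."""
    beta = {i: 1.0 for i in STATES}
    for obs in reversed(future_observations):
        beta = {i: sum(TRANS[i][j] * obs_prob(j, obs) * beta[j] for j in STATES)
                for i in STATES}
    return beta


def smooth(observations, k):
    """P(S_k | all observations), proportional to forward times backward."""
    alpha = forward(observations[: k + 1])
    beta = backward(observations[k + 1:])
    unnorm = {i: alpha[i] * beta[i] for i in STATES}
    total = sum(unnorm.values())
    return {i: value / total for i, value in unnorm.items()}


hindsight = smooth(["d", "w", "w", "d"], k=1)   # P(S1 | O0..O3)
```

The backward pass is the dynamic program described above: each step handles one later observation and hands a smaller problem of the same kind to the next step.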

More Related Content

What's hot

Kevin Knight, Elaine Rich, B. Nair - Artificial Intelligence (2010, Tata McGr...
Kevin Knight, Elaine Rich, B. Nair - Artificial Intelligence (2010, Tata McGr...Kevin Knight, Elaine Rich, B. Nair - Artificial Intelligence (2010, Tata McGr...
Kevin Knight, Elaine Rich, B. Nair - Artificial Intelligence (2010, Tata McGr...
JayaramB11
 
Production System in AI
Production System in AIProduction System in AI
Production System in AI
Bharat Bhushan
 
Machine Learning 3 - Decision Tree Learning
Machine Learning 3 - Decision Tree LearningMachine Learning 3 - Decision Tree Learning
Machine Learning 3 - Decision Tree Learning
butest
 
Lecture 18: Gaussian Mixture Models and Expectation Maximization
Lecture 18: Gaussian Mixture Models and Expectation MaximizationLecture 18: Gaussian Mixture Models and Expectation Maximization
Lecture 18: Gaussian Mixture Models and Expectation Maximization
butest
 

What's hot (20)

lazy learners and other classication methods
lazy learners and other classication methodslazy learners and other classication methods
lazy learners and other classication methods
 
Unit I & II in Principles of Soft computing
Unit I & II in Principles of Soft computing Unit I & II in Principles of Soft computing
Unit I & II in Principles of Soft computing
 
Kevin Knight, Elaine Rich, B. Nair - Artificial Intelligence (2010, Tata McGr...
Kevin Knight, Elaine Rich, B. Nair - Artificial Intelligence (2010, Tata McGr...Kevin Knight, Elaine Rich, B. Nair - Artificial Intelligence (2010, Tata McGr...
Kevin Knight, Elaine Rich, B. Nair - Artificial Intelligence (2010, Tata McGr...
 
02 Machine Learning - Introduction probability
02 Machine Learning - Introduction probability02 Machine Learning - Introduction probability
02 Machine Learning - Introduction probability
 
Uncertain Knowledge and Reasoning in Artificial Intelligence
Uncertain Knowledge and Reasoning in Artificial IntelligenceUncertain Knowledge and Reasoning in Artificial Intelligence
Uncertain Knowledge and Reasoning in Artificial Intelligence
 
Introduction to artificial neural network
Introduction to artificial neural networkIntroduction to artificial neural network
Introduction to artificial neural network
 
Artificial Intelligence Notes Unit 1
Artificial Intelligence Notes Unit 1 Artificial Intelligence Notes Unit 1
Artificial Intelligence Notes Unit 1
 
Semantic nets in artificial intelligence
Semantic nets in artificial intelligenceSemantic nets in artificial intelligence
Semantic nets in artificial intelligence
 
Production System in AI
Production System in AIProduction System in AI
Production System in AI
 
Semantic Networks
Semantic NetworksSemantic Networks
Semantic Networks
 
Markov Chain Monte Carlo Methods
Markov Chain Monte Carlo MethodsMarkov Chain Monte Carlo Methods
Markov Chain Monte Carlo Methods
 
Introduction to Graph Neural Networks: Basics and Applications - Katsuhiko Is...
Introduction to Graph Neural Networks: Basics and Applications - Katsuhiko Is...Introduction to Graph Neural Networks: Basics and Applications - Katsuhiko Is...
Introduction to Graph Neural Networks: Basics and Applications - Katsuhiko Is...
 
Machine Learning 3 - Decision Tree Learning
Machine Learning 3 - Decision Tree LearningMachine Learning 3 - Decision Tree Learning
Machine Learning 3 - Decision Tree Learning
 
Introduction to Clustering algorithm
Introduction to Clustering algorithmIntroduction to Clustering algorithm
Introduction to Clustering algorithm
 
L2. Evaluating Machine Learning Algorithms I
L2. Evaluating Machine Learning Algorithms IL2. Evaluating Machine Learning Algorithms I
L2. Evaluating Machine Learning Algorithms I
 
Computational intelligence an introduction
Computational intelligence an introductionComputational intelligence an introduction
Computational intelligence an introduction
 
Unit 2 ai
Unit 2 aiUnit 2 ai
Unit 2 ai
 
Lecture 18: Gaussian Mixture Models and Expectation Maximization
Lecture 18: Gaussian Mixture Models and Expectation MaximizationLecture 18: Gaussian Mixture Models and Expectation Maximization
Lecture 18: Gaussian Mixture Models and Expectation Maximization
 
Bayesian networks
Bayesian networksBayesian networks
Bayesian networks
 
Big Data: Social Network Analysis
Big Data: Social Network AnalysisBig Data: Social Network Analysis
Big Data: Social Network Analysis
 

Similar to Unit IV UNCERTAINITY AND STATISTICAL REASONING in AI K.Sundar,AP/CSE,VEC

Equational axioms for probability calculus and modelling of Likelihood ratio ...
Equational axioms for probability calculus and modelling of Likelihood ratio ...Equational axioms for probability calculus and modelling of Likelihood ratio ...
Equational axioms for probability calculus and modelling of Likelihood ratio ...
Advanced-Concepts-Team
 
Discrete probability
Discrete probabilityDiscrete probability
Discrete probability
Ranjan Kumar
 




Unit IV UNCERTAINITY AND STATISTICAL REASONING in AI K.Sundar,AP/CSE,VEC

  • 1. 19CS308T: Artificial Intelligence UNIT-III UNCERTAINITY AND STATISTICAL REASONING Faculty:Mr.K.Sundar
  • 2. Syllabus • Probability and Axioms-Bayes Rule- Bayesian Networks-Inferences-Temporal Models- Hidden Markov models-Fuzzy reasoning-Certainty factors-Bayesian Theory-Bayesian Network-Dempster Shafer theory. • Case study on each algorithm
  • 3. • Probability theory • Bayesian networks • Certainty factors
  • 4. 1. Probability theory 1.1 Uncertain knowledge ∀p symptom(p, Toothache) ⇒ disease(p, cavity) ∀p symptom(p, Toothache) ⇒ disease(p, cavity) ∨ disease(p, gum_disease) ∨ … • Predicate logic (PL) fails here because of: - laziness - theoretical ignorance - practical ignorance • Probability theory ⇒ degree of belief or plausibility of a statement: a numerical measure in [0,1] • Degree of truth (fuzzy logic) ≠ degree of belief
  • 5. 1.2 Definitions • Unconditional or prior probability of A: the degree of belief in A in the absence of any other information, written P(A) • A: random variable • Probability distribution: P(A), P(A,B) Example: P(Weather = Sunny) = 0.1, P(Weather = Rain) = 0.7, P(Weather = Snow) = 0.2, where Weather is a random variable • P(Weather) = (0.1, 0.7, 0.2) is a probability distribution • Conditional (posterior) probability: the degree of belief in A once the agent has obtained some evidence B, written P(A|B) • P(Cavity | Toothache) = 0.8
  • 6. Definitions - cont • Axioms of probability • The measure of the occurrence of an event (random variable) A is a function P: S → R satisfying the axioms: • 0 ≤ P(A) ≤ 1 • P(S) = 1 (or P(true) = 1 and P(false) = 0) • P(A ∨ B) = P(A) + P(B) - P(A ∧ B) Consequence: P(A ∨ ~A) = P(A) + P(~A) - P(false) = P(true), hence P(~A) = 1 - P(A)
  • 7. Definitions - cont A and B mutually exclusive ⇒ P(A ∨ B) = P(A) + P(B) For mutually exclusive events e1, …, en: P(e1 ∨ e2 ∨ e3 ∨ … ∨ en) = P(e1) + P(e2) + P(e3) + … + P(en) The probability of a proposition a is equal to the sum of the probabilities of the atomic events in which a holds: $P(a) = \sum_{e_i \in e(a)} P(e_i)$, where e(a) is the set of atomic events in which a holds
  • 8. 1.3 Product rule Conditional probabilities can be defined in terms of unconditional probabilities. The conditional probability of the occurrence of A if event B occurs (for P(B) > 0): – P(A|B) = P(A ∧ B) / P(B) This can also be written as: – P(A ∧ B) = P(A|B) * P(B) For probability distributions: – P(A=a1 ∧ B=b1) = P(A=a1|B=b1) * P(B=b1) – P(A=a1 ∧ B=b2) = P(A=a1|B=b2) * P(B=b2) … – P(X,Y) = P(X|Y) * P(Y)
  • 9. 1.4 Bayes’ rule and its use P(A ∧ B) = P(A|B) * P(B) P(A ∧ B) = P(B|A) * P(A) Equating the two right-hand sides gives Bayes’ rule (theorem): • P(B|A) = P(A|B) * P(B) / P(A)
  • 10. Bayes Theorem hi: hypotheses (i = 1, …, k); e1, …, en: evidence. Quantities involved: P(hi), P(hi | e1,…,en), P(e1,…,en | hi). $P(h_i \mid e_1, e_2, \ldots, e_n) = \dfrac{P(e_1, e_2, \ldots, e_n \mid h_i)\, P(h_i)}{\sum_{j=1}^{k} P(e_1, e_2, \ldots, e_n \mid h_j)\, P(h_j)}, \quad i = 1, \ldots, k$
  • 11. Bayes’ Theorem - cont If e1, …, en are pieces of evidence that are independent given each hypothesis hj, then $P(e_1, e_2, \ldots, e_n \mid h_j) = P(e_1 \mid h_j)\, P(e_2 \mid h_j) \cdots P(e_n \mid h_j), \quad j = 1, \ldots, k$ Example system: PROSPECTOR
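The two slides above combine into a short computation. The following is a minimal sketch, assuming evidence that is independent given each hypothesis; the priors and likelihoods are hypothetical numbers chosen only for illustration and do not come from the slides.

```python
from math import prod

def posterior(priors, likelihoods):
    """Return P(h_i | e_1..e_n) for every hypothesis h_i.

    priors[i]         = P(h_i)
    likelihoods[i][m] = P(e_m | h_i), evidence assumed independent given h_i
    """
    # Numerator of Bayes' theorem: P(h_i) * product over m of P(e_m | h_i)
    scores = [p * prod(ls) for p, ls in zip(priors, likelihoods)]
    # Denominator: sum of the numerators over all hypotheses
    total = sum(scores)
    return [s / total for s in scores]

# Hypothetical values: three hypotheses, two pieces of evidence
priors = [0.5, 0.3, 0.2]
likelihoods = [[0.9, 0.2], [0.5, 0.5], [0.1, 0.8]]
post = posterior(priors, likelihoods)
```

The posteriors always sum to 1 because the denominator is the sum of the numerators.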
  • 12. 1.5 Inferences Probability distribution P(Cavity, Tooth):
                  Tooth    ¬Tooth
        Cavity    0.04     0.06
        ¬Cavity   0.01     0.89
    P(Cavity) = 0.04 + 0.06 = 0.1
    P(Cavity ∨ Tooth) = 0.04 + 0.01 + 0.06 = 0.11
    P(Cavity | Tooth) = P(Cavity ∧ Tooth) / P(Tooth) = 0.04 / 0.05 = 0.8
  • 13. Inferences Probability distribution P(Cavity, Tooth, Catch):
                       Tooth              ~Tooth
                  Catch   ~Catch     Catch   ~Catch
        Cavity    0.108   0.012      0.072   0.008
        ~Cavity   0.016   0.064      0.144   0.576
    P(Cavity) = 0.108 + 0.012 + 0.072 + 0.008 = 0.2
    P(Cavity ∨ Tooth) = 0.108 + 0.012 + 0.072 + 0.008 + 0.016 + 0.064 = 0.28
    P(Cavity | Tooth) = P(Cavity ∧ Tooth) / P(Tooth) = [P(Cavity ∧ Tooth ∧ Catch) + P(Cavity ∧ Tooth ∧ ~Catch)] / P(Tooth) = (0.108 + 0.012) / 0.2 = 0.6
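Inference from a full joint distribution amounts to summing the atomic events in which a proposition holds. A minimal sketch using the eight table entries from the slide above; the helper name `prob` is an illustrative choice.

```python
# Full joint distribution P(Cavity, Tooth, Catch) from the slide,
# keyed by (cavity, tooth, catch) truth values.
joint = {
    (True,  True,  True):  0.108, (True,  True,  False): 0.012,
    (True,  False, True):  0.072, (True,  False, False): 0.008,
    (False, True,  True):  0.016, (False, True,  False): 0.064,
    (False, False, True):  0.144, (False, False, False): 0.576,
}

def prob(event):
    """Sum the probabilities of the atomic events in which `event` holds."""
    return sum(p for world, p in joint.items() if event(*world))

p_cavity = prob(lambda cavity, tooth, catch: cavity)
p_cavity_or_tooth = prob(lambda cavity, tooth, catch: cavity or tooth)
# Conditional probability via the product rule: P(C | T) = P(C and T) / P(T)
p_cavity_given_tooth = (prob(lambda cavity, tooth, catch: cavity and tooth)
                        / prob(lambda cavity, tooth, catch: tooth))
```

Any other query over these three variables can be answered the same way by changing the predicate.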
  • 14. 2 Bayesian networks • Represent dependencies among random variables • Give a short specification of conditional probability distributions • Many random variables are conditionally independent • Simplifies computations • Graphical representation • DAG: causal relationships among random variables
  • 15. 2.1 Definition of Bayesian networks A BN is a DAG in which each node is annotated with quantitative probability information, namely: • Nodes represent random variables (discrete or continuous) • Directed links X → Y: X has a direct influence on Y; X is said to be a parent of Y • Each node Xi has an associated conditional probability table, P(Xi | Parents(Xi)), that quantifies the effects of the parents on the node Example: Weather, Cavity, Toothache, Catch • Weather, Cavity → Toothache, Cavity → Catch
  • 16. Bayesian network - example Structure: Burglary → Alarm ← Earthquake, Alarm → JohnCalls, Alarm → MaryCalls
    P(B) = 0.001    P(E) = 0.002
    Conditional probability table for Alarm:
        B   E   P(A=T | B,E)   P(A=F | B,E)
        T   T   0.95           0.05
        T   F   0.94           0.06
        F   T   0.29           0.71
        F   F   0.001          0.999
    JohnCalls:   A   P(J | A)        MaryCalls:   A   P(M | A)
                 T   0.9                          T   0.7
                 F   0.05                         F   0.01
  • 17. 2.2 Bayesian network semantics A) Represent a probability distribution B) Specify conditional independence: build the network A) Each value of the probability distribution can be computed as: $P(X_1 = x_1 \wedge \ldots \wedge X_n = x_n) = P(x_1, \ldots, x_n) = \prod_{i=1}^{n} P(x_i \mid \mathrm{Parents}(x_i))$
  • 18. 2.3 Building the network $P(X_1 = x_1 \wedge \ldots \wedge X_n = x_n) = P(x_1, \ldots, x_n) = P(x_n \mid x_{n-1}, \ldots, x_1)\, P(x_{n-1}, \ldots, x_1) = \ldots = P(x_n \mid x_{n-1}, \ldots, x_1)\, P(x_{n-1} \mid x_{n-2}, \ldots, x_1) \cdots P(x_2 \mid x_1)\, P(x_1) = \prod_{i=1}^{n} P(x_i \mid x_{i-1}, \ldots, x_1)$ • We can see that P(Xi | Xi-1,…, X1) = P(Xi | Parents(Xi)) if Parents(Xi) ⊆ {Xi-1,…, X1} • The condition may be satisfied by labeling the nodes in an order consistent with a DAG • Intuitively, the parents of a node Xi must be all the nodes among Xi-1,…, X1 which have a direct influence on Xi.
  • 19. Building the network - cont • Pick a set of random variables that describe the problem • Pick an ordering of those variables • While there are still variables, repeat: (a) choose a variable Xi and add a node associated with Xi (b) assign Parents(Xi) ← a minimal set of nodes already in the network such that the conditional independence property is satisfied (c) define the conditional probability table for Xi • Because each node is linked only to previous nodes ⇒ DAG • P(MaryCalls | JohnCalls, Alarm, Burglary, Earthquake) = P(MaryCalls | Alarm)
  • 20. Compactness and node ordering • Far more compact than the full joint probability distribution • Example of a locally structured (or sparse) system: each component interacts directly only with a limited number of other components • Usually associated with a linear rather than exponential growth in complexity • The order of adding the nodes is important • The correct order in which to add nodes is to add the “root causes” first, then the variables they influence, and so on, until we reach the leaves
  • 21. 2.4 Probabilistic inferences • Chain A → V → B: P(A ∧ V ∧ B) = P(A) * P(V|A) * P(B|V) • Common cause A ← V → B: P(A ∧ V ∧ B) = P(V) * P(A|V) * P(B|V) • Common effect A → V ← B: P(A ∧ V ∧ B) = P(A) * P(B) * P(V|A,B)
  • 22. Probabilistic inferences (burglary network and tables as on slide 16: P(B) = 0.001, P(E) = 0.002, P(A | ¬B, ¬E) = 0.001, P(J | A) = 0.9, P(M | A) = 0.7) P(J ∧ M ∧ A ∧ ¬B ∧ ¬E) = P(J|A) * P(M|A) * P(A|¬B ∧ ¬E) * P(¬B) * P(¬E) = 0.9 * 0.7 * 0.001 * 0.999 * 0.998 ≈ 0.000628
  • 23. Probabilistic inferences (burglary network and tables as on slide 16) P(A|B) = P(A|B,E) * P(E|B) + P(A|B,¬E) * P(¬E|B) = P(A|B,E) * P(E) + P(A|B,¬E) * P(¬E) = 0.95 * 0.002 + 0.94 * 0.998 = 0.94002
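Both computations above follow directly from the network's factorisation. A minimal sketch using the conditional probability tables of the burglary network from the slides; the helper names `bern` and `joint` are illustrative.

```python
# Conditional probability tables of the burglary network (from the slides).
P_B = 0.001                                   # P(Burglary)
P_E = 0.002                                   # P(Earthquake)
P_A = {(True, True): 0.95, (True, False): 0.94,
       (False, True): 0.29, (False, False): 0.001}   # P(Alarm | B, E)
P_J = {True: 0.9, False: 0.05}                # P(JohnCalls | Alarm)
P_M = {True: 0.7, False: 0.01}                # P(MaryCalls | Alarm)

def bern(p_true, value):
    """Probability that a Boolean variable takes `value`, given P(True)."""
    return p_true if value else 1.0 - p_true

def joint(b, e, a, j, m):
    """P(B=b, E=e, A=a, J=j, M=m): product of one CPT entry per node."""
    return (bern(P_B, b) * bern(P_E, e) * bern(P_A[(b, e)], a)
            * bern(P_J[a], j) * bern(P_M[a], m))

# P(J and M and A and not B and not E)
p_calls_no_cause = joint(False, False, True, True, True)

# P(A | B): sum out Earthquake, which is independent of Burglary
p_alarm_given_burglary = sum(bern(P_E, e) * P_A[(True, e)]
                             for e in (True, False))
```

Summing `joint` over all 32 assignments gives 1, which is a quick sanity check on the tables.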
  • 24. 2.5 Different types of inferences (burglary network: Burglary → Alarm ← Earthquake, Alarm → JohnCalls, Alarm → MaryCalls) • Diagnostic inferences (effect → cause): P(Burglary | JohnCalls) • Causal inferences (cause → effect): P(JohnCalls | Burglary), P(MaryCalls | Burglary) • Intercausal inferences (between causes of a common effect): P(Burglary | Alarm ∧ Earthquake) • Mixed inferences: P(Alarm | JohnCalls ∧ Earthquake) ⇒ diagnostic + causal; P(Burglary | JohnCalls ∧ ¬Earthquake) ⇒ diagnostic + intercausal
  • 25. 3. Certainty factors • The MYCIN model • Certainty factors / Confidence coefficients (CF) • Heuristic model of uncertain knowledge • In MYCIN: two probabilistic functions to model the degree of belief and the degree of disbelief in a hypothesis – function to measure the degree of belief: MB – function to measure the degree of disbelief: MD • MB[h,e]: how much the belief in h increases based on evidence e • MD[h,e]: how much the disbelief in h increases based on evidence e
  • 26. 3.1 Belief functions • Certainty factor $MB[h,e] = \begin{cases} 1 & \text{if } P(h) = 1 \\ \dfrac{\max(P(h \mid e), P(h)) - P(h)}{\max(0,1) - P(h)} & \text{otherwise} \end{cases}$ $MD[h,e] = \begin{cases} 1 & \text{if } P(h) = 0 \\ \dfrac{\min(P(h \mid e), P(h)) - P(h)}{\min(0,1) - P(h)} & \text{otherwise} \end{cases}$ $CF[h,e] = MB[h,e] - MD[h,e]$
  • 27. Belief functions - features • Value range: $0 \le MB[h,e] \le 1$, $0 \le MD[h,e] \le 1$, $-1 \le CF[h,e] \le 1$ • If h is sure, i.e. P(h|e) = 1, then $MB[h,e] = \dfrac{1 - P(h)}{1 - P(h)} = 1$, $MD[h,e] = 0$, $CF[h,e] = 1$ • If the negation of h is sure, i.e. P(h|e) = 0, then $MB[h,e] = 0$, $MD[h,e] = \dfrac{0 - P(h)}{0 - P(h)} = 1$, $CF[h,e] = -1$
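The MB, MD and CF definitions translate directly into code. A minimal sketch, with max(0,1) and min(0,1) written out as 1 and 0; the function names are illustrative.

```python
def mb(p_h, p_h_given_e):
    """Measure of increased belief in h given evidence e."""
    if p_h == 1.0:
        return 1.0
    return (max(p_h_given_e, p_h) - p_h) / (1.0 - p_h)

def md(p_h, p_h_given_e):
    """Measure of increased disbelief in h given evidence e."""
    if p_h == 0.0:
        return 1.0
    return (min(p_h_given_e, p_h) - p_h) / (0.0 - p_h)

def cf(p_h, p_h_given_e):
    """Certainty factor: belief minus disbelief, in [-1, 1]."""
    return mb(p_h, p_h_given_e) - md(p_h, p_h_given_e)

# Boundary cases from the slide, with a hypothetical prior P(h) = 0.3
cf_h_sure = cf(0.3, 1.0)       # h is sure given e
cf_not_h_sure = cf(0.3, 0.0)   # the negation of h is sure given e
```

Evidence that leaves P(h|e) equal to P(h) gives MB = MD = 0 and therefore CF = 0.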
  • 28. Example in MYCIN • if (1) the type of the organism is gram-positive, and • (2) the morphology of the organism is coccus, and • (3) the growth of the organism is chain • then there is strong evidence (0.7) that the identity of the organism is streptococcus Example of facts in MYCIN: • (identity organism-1 pseudomonas 0.8) • (identity organism-2 e.coli 0.15) • (morphology organism-2 coccus 1.0)
  • 29. 3.2 Combining belief functions (1) Incremental gathering of evidence • The same attribute value, h, is obtained by two separate paths of inference, with two separate CFs: CF[h,s1] and CF[h,s2] • The two different paths, corresponding to hypotheses s1 and s2, may be different branches of the search tree. • CF[h, s1&s2] = CF[h,s1] + CF[h,s2] - CF[h,s1]*CF[h,s2] • (identity organism-1 pseudomonas 0.8)
  • 30. Combining belief functions (2) Conjunction of hypotheses • Applied for computing the CF associated with the premises of a rule which has several conditions: if A = a1 and B = b1 then … WM: (A a1 h1 cf1) (B b1 h2 cf2) • CF[h1&h2, s] = min(CF[h1,s], CF[h2,s])
  • 31. Combining belief functions (3) Combining beliefs • An uncertain value is deduced by a rule whose input conditions are themselves based on uncertain values (obtained, for example, by applying other rules). • Allows the computation of the CF of the fact deduced by the rule based on the rule’s CF and the CF of the hypotheses • CF[s,e]: belief in a hypothesis s based on previous evidence e • CF[h,s]: CF in h if s is sure • CF’[h,s] = CF[h,s] * CF[s,e]
  • 32. Combining belief functions (3) Combining beliefs - cont if A = a1 and B = b1 then C = c1 0.7 WM: (A a1 0.9) (B b1 0.6) CF(premises) = min(0.9, 0.6) = 0.6 CF(conclusion) = CF(premises) * CF(rule) = 0.6 * 0.7 = 0.42 WM: (C c1 0.42)
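The three combination rules can be collected into three small functions. This is a sketch of the formulas as given on the slides (the incremental rule is shown only in the form the slide states, which covers positive CFs); the function names and the final `combined` example are illustrative assumptions.

```python
def cf_incremental(cf1, cf2):
    """(1) Same conclusion reached by two separate inference paths."""
    return cf1 + cf2 - cf1 * cf2

def cf_conjunction(*cfs):
    """(2) CF of a rule premise made of several conditions."""
    return min(cfs)

def cf_rule(cf_of_rule, cf_of_premise):
    """(3) CF of a conclusion deduced from uncertain premises."""
    return cf_of_rule * cf_of_premise

# Worked example from the slide: (A a1 0.9), (B b1 0.6), rule CF 0.7
premise = cf_conjunction(0.9, 0.6)
conclusion = cf_rule(0.7, premise)

# Hypothetical: the same fact also derived elsewhere with CF 0.8
combined = cf_incremental(0.8, conclusion)
```

Incremental combination never lowers a positive CF: each new supporting path moves the result closer to 1.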
  • 33. 3.3 Limits of CF • The CF model of MYCIN assumes that the hypotheses are supported by independent evidence • An example shows what happens if this condition is violated A: The sprinkler functioned last night U: The grass is wet in the morning P: Last night it rained
  • 34. R1: if the sprinkler functioned last night then there is strong evidence (0.9) that the grass is wet in the morning R2: if the grass is wet in the morning then there is strong evidence (0.8) that it rained last night • CF[U,A] = 0.9 • therefore the evidence sprinkler sustains the hypothesis wet grass with CF = 0.9 • CF[P,U] = 0.8 • therefore the evidence wet grass sustains the hypothesis rain with CF = 0.8 • CF[P,A] = 0.8 * 0.9 = 0.72 • therefore the evidence sprinkler sustains the hypothesis rain with CF = 0.72, which is counterintuitive: a working sprinkler already explains the wet grass and should not raise the belief in rain
  • 35. Traditional Logic • Based on predicate logic • Three important assumptions: – Predicate descriptions are sufficient with respect to the domain – Information is consistent – Knowledge base grows monotonically
  • 36. Non-monotonic Logic • Addresses the three assumptions of traditional logic – Knowledge is incomplete • No knowledge about p: true or false? • Prolog: closed world assumption – Knowledge is inconsistent • Based on how the world usually works • Most birds fly, but the ostrich doesn’t – Knowledge base grows non-monotonically • A new observation may contradict the existing knowledge, so existing knowledge may need to be removed. • Inference is based on assumptions; what happens if the assumptions are later shown to be incorrect? • Three modal operators are introduced
  • 37. Unless Operator • New information may invalidate previous results • Implemented in TMS (Truth Maintenance Systems) to keep track of the reasoning steps and preserve KB consistency • Introduce the unless operator – Supports inferences based on the belief that its argument is not true – Consider • p(X) unless q(X) ⇒ r(X): if p(X) is true and q(X) is not believed true, then r(X) • p(Z) • r(W) ⇒ s(W) From the above, conclude s(X). If the belief later changes, or q(X) is found true, what happens? Retract r(X) and s(X) – Unless deals with belief, not truth • Either unknown or believed false • Believed or known true – Monotonicity
  • 38. Is-consistent-with Operator M • When reasoning, make sure the premises are consistent • Format: M p, meaning p is consistent with the KB • Consider – ∀X good_student(X) ∧ M study_hard(X) ⇒ graduates(X) – For all X who is a good student, if the fact that X studies hard is consistent with the KB, then X will graduate – Not necessary to prove that X studies hard • How to decide whether p is consistent with the KB – Negation as failure – Heuristic-based and limited search
  • 39. Default Logic • Introduce a new format of inference rules: – A(Z) ∧ :B(Z) ⇒ C(Z) – If A(Z) is provable, and it is consistent with what we know to assume B(Z), then conclude C(Z) • Compare with the is-consistent-with operator – Similar – The difference is the reasoning method • In default logic, new rules are used to infer sets of plausible extensions – Example: ∀X good_student(X) ∧ :study_hard(X) ⇒ graduates(X) ∀Y party(Y) ∧ :not(study_hard(Y)) ⇒ not(graduates(Y))
  • 40. Fuzzy Sets • Classic sets – Completeness: x is in either A or ¬A – Exclusive: x cannot be in both A and ¬A • Fuzzy sets – Violate the two assumptions – Possibility theory: measure of confidence or belief – Probability theory: randomness – Process imprecision – Introduce a membership function – Believe x ∈ A to some degree between 0 and 1, inclusive
  • 41. The fuzzy set representation for “small integers.”
  • 42. A fuzzy set representation for the sets short, medium, and tall males.
  • 43. Fuzzy Set Operations • Fuzzy set operations are defined as operations on membership functions • Complement: ¬A = C, with mC = 1 - mA • Union: A ∪ B = C, with mC = max(mA, mB) • Intersection: A ∩ B = C, with mC = min(mA, mB) • Difference: A - B = C, with mC = max(0, mA - mB)
  • 44. Fuzzy Inference Rules • Rule format and computation – If x is A and y is B then z is C: mC(z) = min(mA(x), mB(y)) – If x is A or y is B then z is C: mC(z) = max(mA(x), mB(y)) – If x is not A then z is C: mC(z) = 1 - mA(x)
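The set operations and rule computations above act pointwise on membership values. A minimal sketch, representing a fuzzy set as a dict from element to membership degree; the two sets `small` and `medium` are hypothetical and not taken from the figures.

```python
def f_complement(m_a):
    return {x: 1.0 - m for x, m in m_a.items()}

def f_union(m_a, m_b):
    return {x: max(m_a[x], m_b[x]) for x in m_a}

def f_intersection(m_a, m_b):
    return {x: min(m_a[x], m_b[x]) for x in m_a}

def f_difference(m_a, m_b):
    return {x: max(0.0, m_a[x] - m_b[x]) for x in m_a}

# Hypothetical membership functions over the universe {1, 2, 3, 4}
small = {1: 1.0, 2: 0.75, 3: 0.5, 4: 0.25}
medium = {1: 0.0, 2: 0.5, 3: 1.0, 4: 0.5}

either = f_union(small, medium)        # "x is small or x is medium"
both = f_intersection(small, medium)   # "x is small and x is medium"
not_small = f_complement(small)        # "x is not small"
only_small = f_difference(small, medium)
```

A rule "if x is A and y is B then z is C" fires with strength min(mA(x), mB(y)), which is the same min used by `f_intersection`.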
  • 45. The fuzzy regions for the input values θ (a) and dθ/dt (b). N: Negative, Z: Zero, P: Positive
  • 46. The fuzzy regions of the output value u, indicating the movement of the pendulum base: Negative Big, Negative, Zero, Positive, Positive Big.
  • 47. The fuzzification of the input measures X1 = 1: mZ(X1) = mP(X1) = 0.5, mN(X1) = 0; X2 = -4: mZ(X2) = 0.2, mN(X2) = 0.8, mP(X2) = 0
  • 48. The Fuzzy Associative Matrix (FAM) for the pendulum problem. The input values are on the left and top. Fuzzy Rules:
  • 49. The fuzzy consequents (a) and their union (b). The centroid of the union (-2) is the crisp output.
  • 50. Dempster-Shafer Theory • Probability theory limitation – Assigns a single number to measure any situation, no matter how complex it is – Cannot deal with missing evidence, heuristics, and limited knowledge • Dempster-Shafer theory – Extends probability theory – Considers a set of propositions as a whole – Assigns a set of propositions an interval [belief, plausibility] to constrain the degree of belief for each individual proposition in the set – The belief measure bel is in [0,1] • 0: no supporting evidence for a set of propositions • 1: full supporting evidence for a set of propositions – The plausibility of p, • pl(p) = 1 - bel(not(p)) • Reflects how evidence for not(p) relates to the possibility of belief in p • bel(not(p)) = 1: full support for not(p), no possibility for p • bel(not(p)) = 0: no support for not(p), full possibility for p • Range is also in [0,1]
  • 51. Properties of Dempster-Shafer • Initially, no supporting evidence for either of two competing hypotheses, say h1 and h2 – Dempster-Shafer: [bel, pl] = [0, 1] – Probability theory: p(h1) = p(h2) = 0.5 • Dempster-Shafer belief functions satisfy weaker axioms than probability functions • Two fundamental ideas: – Obtaining belief degrees for one question from subjective probabilities for related questions – Using Dempster's rule to combine these belief degrees when they are based on independent evidence
  • 52. An Example • Two persons, M and B, each with a known reliability, inspect a computer and report the result independently. How much should you believe their claims? • Question (Q): detection claim • Related question (RQ): detectors’ reliability • Dempster-Shafer approach – Obtain belief degrees for Q from subjective (prior) probabilities for RQ for each person – Combine the belief degrees from the two persons • Person M: – Reliability 0.9, unreliability 0.1 – Claims h1 – Belief degree of h1 is bel(h1) = 0.9 – Belief degree of not(h1) is bel(not(h1)) = 0.0, different from probability theory, since no evidence supports not(h1) – pl(h1) = 1 - bel(not(h1)) = 1 - 0 = 1 – Thus the belief measure for M's claim h1 is [0.9, 1] • Person B: – Reliability 0.8, unreliability 0.2 – Claims h2 – bel(h2) = 0.8, bel(not(h2)) = 0, pl(h2) = 1 - bel(not(h2)) = 1 - 0 = 1 – Thus the belief measure for B's claim h2 is [0.8, 1]
  • 53. Combining Belief Measure • Set of propositions: M claims h1 and B claims h2 – Case 1: h1 = h2 • Both M and B reliable: 0.9 x 0.8 = 0.72 • Both M and B unreliable: 0.1 x 0.2 = 0.02 • The probability that at least one of the two is reliable: 1 - 0.02 = 0.98 • Belief measure for h1 = h2 is [0.98, 1] – Case 2: h1 = not(h2) • They cannot both be correct and reliable • At least one is unreliable – Reliable M and unreliable B: 0.9 x (1 - 0.8) = 0.18 – Reliable B and unreliable M: 0.8 x (1 - 0.9) = 0.08 – Unreliable M and B: (1 - 0.9) x (1 - 0.8) = 0.02 – At least one is unreliable: 0.18 + 0.08 + 0.02 = 0.28 • Given at least one is unreliable, posterior probabilities – Reliable M and unreliable B: 0.18 / 0.28 = 0.643 – Reliable B and unreliable M: 0.08 / 0.28 = 0.286 • Belief measure for h1 – bel(h1) = 0.643, bel(not(h1)) = bel(h2) = 0.286 – pl(h1) = 1 - bel(not(h1)) = 1 - 0.286 = 0.714 – Belief measure: [0.643, 0.714] • Belief measure for h2 – bel(h2) = 0.286, bel(not(h2)) = bel(h1) = 0.643 – pl(h2) = 1 - bel(not(h2)) = 1 - 0.643 = 0.357 – Belief measure: [0.286, 0.357]
  • 54. Dempster’s Rule • Assumption: – The questions for which we have probabilities are independent a priori – As new evidence is collected and conflicts arise, independence may disappear • Two steps 1. Sort the uncertainties into a priori independent pieces of evidence 2. Carry out Dempster's rule • Consider the previous example – After M and B made their claims, a repair person is called to check the computer, and both M and B witnessed this. – Three independent items of evidence must be combined • Not all evidence is directly supportive of individual elements of a set of hypotheses; it often supports different subsets of hypotheses, in favor of some and against others
  • 55. General Dempster’s Rule • Q: an exhaustive set of mutually exclusive hypotheses • Z: a subset of Q • M: probability density function that assigns a belief measure to Z • Mn(Z): belief degree assigned to Z, where n is the number of sources of evidence
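The slide's own formula did not survive extraction. The sketch below uses the standard form of Dempster's rule of combination, which is an assumption about what the slide showed: masses of intersecting subsets are multiplied and summed, mass falling on the empty set is treated as conflict, and the result is renormalised. It reproduces the [0.98, 1] and [0.643, 0.714] intervals of the two-witness example; the frame {h1, h2} with h2 = not(h1) and all names are illustrative choices.

```python
def combine(m1, m2):
    """Dempster's rule for two mass functions (frozenset -> mass)."""
    raw, conflict = {}, 0.0
    for x, mx in m1.items():
        for y, my in m2.items():
            z = x & y
            if z:
                raw[z] = raw.get(z, 0.0) + mx * my
            else:
                conflict += mx * my   # mass that would fall on the empty set
    if conflict >= 1.0:
        raise ValueError("totally conflicting evidence cannot be combined")
    return {z: v / (1.0 - conflict) for z, v in raw.items()}

def bel(m, hyp):
    """Belief: total mass committed to subsets of `hyp`."""
    return sum(v for z, v in m.items() if z <= hyp)

def pl(m, hyp):
    """Plausibility: total mass of sets compatible with `hyp`."""
    return sum(v for z, v in m.items() if z & hyp)

H1, H2 = frozenset({"h1"}), frozenset({"h2"})
THETA = H1 | H2                      # the whole frame: "don't know"

m_M = {H1: 0.9, THETA: 0.1}          # M is reliable with probability 0.9
agree = combine(m_M, {H1: 0.8, THETA: 0.2})      # case 1: B also claims h1
conflict = combine(m_M, {H2: 0.8, THETA: 0.2})   # case 2: B claims not(h1)

interval_case1 = (bel(agree, H1), pl(agree, H1))
interval_case2 = (bel(conflict, H1), pl(conflict, H1))
```

A third independent source, such as the repair person mentioned earlier, could be folded in by calling `combine` again on the result.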
  • 56. Discrete Markov Process • Finite state machine – A graphical representation – State transition depends on the input stream – States and transitions reflect properties of a formal language • Probabilistic finite state machine – A finite state machine – Transition function represented by a probability distribution on the current state • Discrete Markov process (chain, machine) – A specialization of the probabilistic finite state machine – Ignores its input values
  • 57. A Markov state machine or Markov chain with four states, s1, ..., s4 • At any time the system is in one of a set of distinct states • The system undergoes a state change or remains in the same state • Divide time into discrete intervals: t1, t2, …, tn • Change state according to the probability distribution of each state • S(t): the actual state at time t • p(S(t)) = p(S(t) | S(t-1), S(t-2), S(t-3), …) • First-order Markov chain – Only depends on the direct predecessor state – p(S(t)) = p(S(t) | S(t-1))
  • 58. Observable Markov Model • Assume p(S(t) | S(t-1)) is time invariant, that is, the transition between specific states retains the same probabilistic relationship • State transition probability aij between si and sj: – aij = p(S(t) = si | S(t-1) = sj), 1 ≤ i, j ≤ N – If i = j, no transition (remain in the same state) – Properties: aij ≥ 0, Σi aij = 1
  • 59. S1: sun, S2: cloudy, S3: fog, S4: precipitation. Time intervals: noon to noon. Question: suppose that today is sunny, what is the probability of the next five days being sunny, sunny, cloudy, cloudy, precipitation?
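In a first-order chain the answer is the product of the one-step transition probabilities along the sequence. The transition matrix for this example did not survive extraction, so the matrix below is purely hypothetical; it is stored as `A[current][next]`, a layout chosen for this sketch.

```python
STATES = ["sun", "cloudy", "fog", "precipitation"]

# HYPOTHETICAL transition matrix, A[current][next]; every row sums to 1.
A = {
    "sun":           {"sun": 0.4, "cloudy": 0.3, "fog": 0.2, "precipitation": 0.1},
    "cloudy":        {"sun": 0.2, "cloudy": 0.3, "fog": 0.2, "precipitation": 0.3},
    "fog":           {"sun": 0.1, "cloudy": 0.3, "fog": 0.3, "precipitation": 0.3},
    "precipitation": {"sun": 0.2, "cloudy": 0.3, "fog": 0.3, "precipitation": 0.2},
}

def sequence_probability(today, future):
    """P(future states | today) for a first-order Markov chain."""
    p, current = 1.0, today
    for nxt in future:
        p *= A[current][nxt]   # only the direct predecessor matters
        current = nxt
    return p

p_week = sequence_probability(
    "sun", ["sun", "sun", "cloudy", "cloudy", "precipitation"])
```

With these assumed numbers the product is 0.4 * 0.4 * 0.3 * 0.3 * 0.3; with the original matrix only the factors would change.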
  • 60. Restrictiveness of Markov models • Are past and future really independent given the current state? • E.g., suppose that when it rains, it rains for at most 2 days • Second-order Markov process: in the chain S1 → S2 → S3 → S4 → …, each state also depends on the state two steps back • Workaround: change the meaning of “state” to the events of the last 2 days: (S1, S2) → (S2, S3) → (S3, S4) → (S4, S5) → … • Another approach: add more information to the state • E.g., the full state of the world would include whether the sky is full of water – Additional information may not be observable – Blowup of the number of states…
  • 61. Hidden Markov models (HMMs) • Same as a Markov model, except we cannot see the state • Instead, we only see an observation each period, which depends on the current state (hidden chain S1 → S2 → S3 → … → St → …, with each St emitting an observation Ot) • Still need a transition model: P(St+1 = j | St = i) = aij • Also need an observation model: P(Ot = k | St = i) = bik
  • 62. Weather example extended to HMM • Transition probabilities: given by a state diagram over the states s, c, r (edge labels as extracted: .1 .2 .6 .3 .4 .3 .3 .5 .3) • Observation: labmate wet or dry • bsw = .1, bcw = .3, brw = .8
  • 63. HMM weather example: a question • You have been stuck in the lab for three days (!) • On those days, your labmate was dry, wet, wet, respectively • What is the probability that it is now raining outside? • P(S2 = r | O0 = d, O1 = w, O2 = w) • By Bayes’ rule, really want to know P(S2, O0 = d, O1 = w, O2 = w)
  • 64. Solving the question • Computationally efficient approach: first compute P(S1 = i, O0 = d, O1 = w) for all states i • General case: solve for P(St, O0 = o0, O1 = o1, …, Ot = ot) for t = 1, then t = 2, … This is called monitoring • $P(S_t, O_0 = o_0, \ldots, O_t = o_t) = \sum_{s_{t-1}} P(S_{t-1} = s_{t-1}, O_0 = o_0, \ldots, O_{t-1} = o_{t-1})\, P(S_t \mid S_{t-1} = s_{t-1})\, P(O_t = o_t \mid S_t)$
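The monitoring recursion can be sketched as a forward pass. Two inputs are assumptions: the transition table `TRANS` is one assignment of the extracted edge labels to arrows (the original diagram is not recoverable), and the initial state distribution `PRIOR` is taken to be uniform because the slides do not give one. The observation probabilities bsw, bcw, brw are from the slides.

```python
STATES = ("s", "c", "r")

# ASSUMPTION: TRANS[i][j] = P(S_t+1 = j | S_t = i); one possible reading
# of the state diagram, not confirmed by the source.
TRANS = {"s": {"s": 0.6, "c": 0.3, "r": 0.1},
         "c": {"s": 0.4, "c": 0.3, "r": 0.3},
         "r": {"s": 0.2, "c": 0.3, "r": 0.5}}
P_WET = {"s": 0.1, "c": 0.3, "r": 0.8}      # bsw, bcw, brw from the slides
PRIOR = {i: 1.0 / 3.0 for i in STATES}      # ASSUMPTION: uniform start

def obs_prob(state, obs):
    """P(O = obs | S = state) for obs in {'w', 'd'}."""
    return P_WET[state] if obs == "w" else 1.0 - P_WET[state]

def forward(observations):
    """alpha[i] = P(S_t = i, O_0..O_t) for the last time step t."""
    alpha = {i: PRIOR[i] * obs_prob(i, observations[0]) for i in STATES}
    for obs in observations[1:]:
        alpha = {j: obs_prob(j, obs)
                    * sum(alpha[i] * TRANS[i][j] for i in STATES)
                 for j in STATES}
    return alpha

def monitor(observations):
    """P(S_t | O_0..O_t): normalise the joint values from `forward`."""
    alpha = forward(observations)
    total = sum(alpha.values())
    return {i: a / total for i, a in alpha.items()}

now = monitor(["d", "w", "w"])   # distribution over S2
```

Each step costs a number of operations proportional to the square of the number of states, instead of enumerating every state sequence.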
  • 65. Predicting further out • You have been stuck in the lab for three days • On those days, your labmate was dry, wet, wet, respectively • What is the probability that two days from now it will be raining outside? • P(S4 = r | O0 = d, O1 = w, O2 = w)
  • 66. Predicting further out, continued… • Want to know: P(S4 = r | O0 = d, O1 = w, O2 = w) • Already know how to get: P(S2 | O0 = d, O1 = w, O2 = w) • P(S3 = r | O0 = d, O1 = w, O2 = w) = Σs2 P(S3 = r, S2 = s2 | O0 = d, O1 = w, O2 = w) = Σs2 P(S3 = r | S2 = s2) P(S2 = s2 | O0 = d, O1 = w, O2 = w) • Etc. for S4 • So: monitoring first, then straightforward Markov process updates
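Once monitoring has produced a distribution over the current state, prediction needs only the transition model. A minimal sketch; both the transition table (one assumed reading of the extracted edge labels) and the starting distribution `current` are hypothetical stand-ins.

```python
# ASSUMPTION: TRANS[i][j] = P(S_t+1 = j | S_t = i), one possible reading
# of the state diagram.
TRANS = {"s": {"s": 0.6, "c": 0.3, "r": 0.1},
         "c": {"s": 0.4, "c": 0.3, "r": 0.3},
         "r": {"s": 0.2, "c": 0.3, "r": 0.5}}

def predict(dist, steps):
    """Push a state distribution `steps` transitions into the future."""
    for _ in range(steps):
        dist = {j: sum(dist[i] * TRANS[i][j] for i in dist) for j in TRANS}
    return dist

# HYPOTHETICAL output of monitoring: P(S2 | O0, O1, O2)
current = {"s": 0.1, "c": 0.2, "r": 0.7}
in_two_days = predict(current, 2)   # P(S4 | O0, O1, O2)
```

No observation model appears here because no new observations arrive between now and the predicted day.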
  • 67. Integrating newer information • You have been stuck in the lab for four days (!) • On those days, your labmate was dry, wet, wet, dry, respectively • What is the probability that two days ago it was raining outside? P(S1 = r | O0 = d, O1 = w, O2 = w, O3 = d) – Smoothing or hindsight problem
  • 68. Hindsight problem continued… • Want: P(S1 = r | O0 = d, O1 = w, O2 = w, O3 = d) • “Partial” application of Bayes’ rule: P(S1 = r | O0 = d, O1 = w, O2 = w, O3 = d) = P(S1 = r, O2 = w, O3 = d | O0 = d, O1 = w) / P(O2 = w, O3 = d | O0 = d, O1 = w) • So really want to know P(S1, O2 = w, O3 = d | O0 = d, O1 = w)
  • 69. Hindsight problem continued… • Want to know P(S1 = r, O2 = w, O3 = d | O0 = d, O1 = w) • P(S1 = r, O2 = w, O3 = d | O0 = d, O1 = w) = P(S1 = r | O0 = d, O1 = w) P(O2 = w, O3 = d | S1 = r) • Already know how to compute P(S1 = r | O0 = d, O1 = w) • Just need to compute P(O2 = w, O3 = d | S1 = r)
  • 70. Hindsight problem continued… • Just need to compute P(O2 = w, O3 = d | S1 = r) • P(O2 = w, O3 = d | S1 = r) = Σs2 P(S2 = s2, O2 = w, O3 = d | S1 = r) = Σs2 P(S2 = s2 | S1 = r) P(O2 = w | S2 = s2) P(O3 = d | S2 = s2) • First two factors directly in the model; last factor is a “smaller” problem of the same kind • Use dynamic programming, backwards from the future – Similar to the forwards approach from the past
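The hindsight computation combines the forward pass with a backward pass over the later observations. As in any code for this example, the transition table (one assumed reading of the extracted edge labels) and the uniform initial distribution are assumptions; only bsw, bcw, brw come from the slides.

```python
STATES = ("s", "c", "r")
# ASSUMPTION: TRANS[i][j] = P(S_t+1 = j | S_t = i)
TRANS = {"s": {"s": 0.6, "c": 0.3, "r": 0.1},
         "c": {"s": 0.4, "c": 0.3, "r": 0.3},
         "r": {"s": 0.2, "c": 0.3, "r": 0.5}}
P_WET = {"s": 0.1, "c": 0.3, "r": 0.8}      # bsw, bcw, brw from the slides
PRIOR = {i: 1.0 / 3.0 for i in STATES}      # ASSUMPTION: uniform start

def obs_prob(state, obs):
    return P_WET[state] if obs == "w" else 1.0 - P_WET[state]

def forward(observations):
    """alpha[i] = P(S_t = i, O_0..O_t) for the last time step t."""
    alpha = {i: PRIOR[i] * obs_prob(i, observations[0]) for i in STATES}
    for obs in observations[1:]:
        alpha = {j: obs_prob(j, obs)
                    * sum(alpha[i] * TRANS[i][j] for i in STATES)
                 for j in STATES}
    return alpha

def backward(observations, t):
    """beta[i] = P(O_t+1..O_T | S_t = i), computed from the future back."""
    beta = {i: 1.0 for i in STATES}
    for obs in reversed(observations[t + 1:]):
        beta = {i: sum(TRANS[i][j] * obs_prob(j, obs) * beta[j]
                       for j in STATES)
                for i in STATES}
    return beta

def smooth(observations, t):
    """P(S_t | O_0..O_T): forward value times backward value, normalised."""
    alpha = forward(observations[:t + 1])
    beta = backward(observations, t)
    unnorm = {i: alpha[i] * beta[i] for i in STATES}
    total = sum(unnorm.values())
    return {i: u / total for i, u in unnorm.items()}

two_days_ago = smooth(["d", "w", "w", "d"], 1)   # distribution over S1
```

The backward pass is the dynamic program mentioned above: each step reuses the result for the next later time step.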