19CS308T: Artificial Intelligence
UNIT-III
UNCERTAINTY AND STATISTICAL REASONING
Faculty: Mr. K. Sundar
Syllabus
• Probability and Axioms-Bayes Rule-
Bayesian Networks-Inferences-Temporal
Models- Hidden Markov models-Fuzzy
reasoning-Certainty factors-Bayesian
Theory-Bayesian Network-Dempster
Shafer theory.
• Case study on each algorithm
• Probability theory
• Bayesian networks
• Certainty factors
1. Probability theory
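1.1 Uncertain knowledge
∀p symptom(p, Toothache) ⇒ disease(p,cavity)
∀p symptom(p, Toothache) ⇒
disease(p,cavity) ∨ disease(p,gum_disease) ∨ …
• Predicate logic (PL) fails in such domains because of:
- laziness (listing every antecedent and consequent is too much work)
- theoretical ignorance (the domain has no complete theory)
- practical ignorance (not every test can be run for a given case)
• Probability theory ⇒ degree of belief or plausibility of a statement: a numerical measure in [0,1]
• Degree of truth (fuzzy logic) ≠ degree of belief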
1.2 Definitions
• Unconditional or prior probability of A – the degree of
belief in A in the absence of any other information – P(A)
• A – random variable
• Probability distribution – P(A), P(A,B)
Example
P(Weather = Sunny) = 0.1
P(Weather = Rain) = 0.7
P(Weather = Snow) = 0.2
Weather – random variable
• P(Weather) = (0.1, 0.7, 0.2) – probability distribution
• Conditional (posterior) probability – the degree of belief in A once the agent has obtained some evidence B: P(A|B)
• P(Cavity | Toothache) = 0.8
Definitions - cont
• Axioms of probability
• The measure of the occurrence of an event (random variable) A – a function P: S → R satisfying the axioms:
• 0 ≤ P(A) ≤ 1
• P(S) = 1 (or P(true) = 1 and P(false) = 0)
• P(A ∨ B) = P(A) + P(B) – P(A ∧ B)
Consequences:
P(A ∨ ~A) = P(A) + P(~A) – P(false) = P(true)
P(~A) = 1 – P(A)
Definitions - cont
A and B mutually exclusive ⇒ P(A ∨ B) = P(A) + P(B)
P(e1 ∨ e2 ∨ e3 ∨ … ∨ en) = P(e1) + P(e2) + P(e3) + … + P(en)
The probability of a proposition a is equal to the sum of the probabilities of the atomic events in which a holds
e(a) – the set of atomic events in which a holds
1.3 Product rule
Conditional probabilities can be defined in terms of
unconditional probabilities
The conditional probability of the occurrence of A if event B occurs
– P(A|B) = P(A ∧ B) / P(B), defined when P(B) > 0
This can be written also as:
– P(A ∧ B) = P(A|B) * P(B)
For probability distributions
– P(A=a1 ∧ B=b1) = P(A=a1|B=b1) * P(B=b1)
– P(A=a1 ∧ B=b2) = P(A=a1|B=b2) * P(B=b2)
….
– P(X,Y) = P(X|Y)*P(Y)
1.4 Bayes’ rule and its use
P(A  B) = P(A|B) *P(B)
P(A  B) = P(B|A) *P(A)
Bays’ rule (theorem)
• P(B|A) = P(A | B) * P(B) / P(A)
• P(B|A) = P(A | B) * P(B) / P(A)
Bayes Theorem
hi – hypotheses (i=1,k);
e1,…,en - evidence
P(hi)
P(hi | e1,…,en)
P(e1,…,en| hi)
$$P(h_i \mid e_1, e_2, \dots, e_n) = \frac{P(e_1, e_2, \dots, e_n \mid h_i)\, P(h_i)}{\sum_{j=1}^{k} P(e_1, e_2, \dots, e_n \mid h_j)\, P(h_j)}, \quad i = 1, \dots, k$$
Bayes’ Theorem - cont
If e1,…,en are conditionally independent pieces of evidence given each hypothesis hj, then
$$P(e_1, e_2, \dots, e_n \mid h_j) = P(e_1 \mid h_j)\, P(e_2 \mid h_j) \cdots P(e_n \mid h_j), \quad j = 1, \dots, k$$
Used in PROSPECTOR
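Bayes' theorem with conditionally independent evidence can be sketched in a few lines of Python. The hypotheses, priors and likelihoods below are made-up illustrative numbers, not values from the slides; the function name `posterior` is also only an assumption for this sketch.

```python
from math import prod

def posterior(priors, likelihoods):
    """Return P(h | e1..en) for every hypothesis h.

    priors[h]      = P(h)
    likelihoods[h] = [P(e1 | h), ..., P(en | h)], assumed conditionally
                     independent given h.
    """
    # Numerator of Bayes' theorem for each hypothesis.
    joint = {h: priors[h] * prod(likelihoods[h]) for h in priors}
    # Denominator: sum of the numerators over all hypotheses.
    total = sum(joint.values())
    return {h: value / total for h, value in joint.items()}

# Hypothetical numbers, chosen only for illustration.
priors = {"cavity": 0.10, "gum_disease": 0.05, "healthy": 0.85}
likelihoods = {
    "cavity": [0.8, 0.9],       # P(toothache | h), P(catch | h)
    "gum_disease": [0.6, 0.2],
    "healthy": [0.05, 0.1],
}
post = posterior(priors, likelihoods)
print(post)
```

The hypotheses must be mutually exclusive and exhaustive for the denominator to be the correct normalizer.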
1.5 Inferences
Probability distribution P(Cavity, Tooth)
            Tooth    ¬Tooth
Cavity      0.04     0.06
¬Cavity     0.01     0.89
P(Cavity) = 0.04 + 0.06 = 0.1
P(Cavity ∨ Tooth) = 0.04 + 0.01 + 0.06 = 0.11
P(Cavity | Tooth) = P(Cavity ∧ Tooth) / P(Tooth) = 0.04 / 0.05 = 0.8
Inferences
Probability distributions P(Cavity, Tooth, Catch)
                 Tooth               ~Tooth
            Catch    ~Catch     Catch    ~Catch
Cavity      0.108    0.012      0.072    0.008
~Cavity     0.016    0.064      0.144    0.576
P(Cavity) = 0.108 + 0.012 + 0.072 + 0.008 = 0.2
P(Cavity ∨ Tooth) = 0.108 + 0.012 + 0.072 + 0.008 + 0.016 + 0.064 = 0.28
P(Cavity | Tooth) = P(Cavity ∧ Tooth) / P(Tooth) =
[P(Cavity ∧ Tooth ∧ Catch) + P(Cavity ∧ Tooth ∧ ~Catch)] / P(Tooth) = (0.108 + 0.012) / 0.2 = 0.6
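Inference from a full joint distribution is just summation over the matching atomic events. A minimal sketch using the eight entries of the P(Cavity, Tooth, Catch) table on this slide (the helper name `prob` is an assumption of this sketch):

```python
# Full joint distribution, keyed by (cavity, tooth, catch).
joint = {
    (True, True, True): 0.108, (True, True, False): 0.012,
    (True, False, True): 0.072, (True, False, False): 0.008,
    (False, True, True): 0.016, (False, True, False): 0.064,
    (False, False, True): 0.144, (False, False, False): 0.576,
}

def prob(event):
    """Sum the probabilities of the atomic events in which `event` holds."""
    return sum(p for world, p in joint.items() if event(*world))

p_cavity = prob(lambda cavity, tooth, catch: cavity)
p_cavity_or_tooth = prob(lambda cavity, tooth, catch: cavity or tooth)
p_tooth = prob(lambda cavity, tooth, catch: tooth)
# Conditional probability via the product rule: P(C | T) = P(C and T) / P(T).
p_cavity_given_tooth = prob(lambda cavity, tooth, catch: cavity and tooth) / p_tooth

print(p_cavity, p_cavity_or_tooth, p_cavity_given_tooth)
```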
2 Bayesian networks
• Represent dependencies among random
variables
• Give a short specification of conditional
probability distribution
• Many random variables are conditionally
independent
• Simplifies computations
• Graphical representation
• DAG – causal relationships among random variables
2.1 Definition of Bayesian
networks
A BN is a DAG in which each node is annotated
with quantitative probability information, namely:
• Nodes represent random variables (discrete or
continuous)
• Directed links XY: X has a direct influence on
Y, X is said to be a parent of Y
• each node X has an associated conditional
probability table, P(Xi | Parents(Xi)) that quantify
the effects of the parents on the node
Example: Weather, Cavity, Toothache, Catch
• Weather, Cavity  Toothache, Cavity  Catch
15
Bayesian network - example
Structure: Burglary → Alarm ← Earthquake, Alarm → JohnCalls, Alarm → MaryCalls
P(B) = 0.001
P(E) = 0.002
Conditional probability table for Alarm:
B   E   P(A=T | B, E)   P(A=F | B, E)
T   T   0.95            0.05
T   F   0.94            0.06
F   T   0.29            0.71
F   F   0.001           0.999
A   P(J=T | A)
T   0.9
F   0.05
A   P(M=T | A)
T   0.7
F   0.01
2.2 Bayesian network semantics
A) Represent a probability distribution
B) Specify conditional independence – build the
network
A) Each value of the probability distribution can be computed as:
P(X1=x1 ∧ … ∧ Xn=xn) = P(x1,…, xn) = Π i=1,n P(xi | Parents(Xi))
2.3 Building the network
P(X1=x1  … Xn=xn) = P(x1,…, xn) =
P(xn | xn-1,…, x1) * P(xn-1,…, x1) = … =
P(xn | xn-1,…, x1) * P(xn-1 | xn-2,…, x1)* … P(x2|x1) * P(x1) =
i=1,n P(xi | xi-1,…, x1)
• We can see that P(Xi | Xi-1,…, X1) = P(xi | Parents(Xi)) if
Parents(Xi)  { Xi-1,…, X1}
• The condition may be satisfied by labeling the nodes in
an order consistent with a DAG
• Intuitively, the parents of a node Xi must be all the nodes
Xi-1,…, X1 which have a direct influence on Xi.
18
Building the network - cont
• Pick a set of random variables that describe the problem
• Pick an ordering of those variables
• while there are still variables repeat
(a) choose a variable Xi and add a node associated to Xi
(b) assign Parents(Xi) ← a minimal set of nodes that already exist in the network such that the conditional independence property is satisfied
(c) define the conditional probability table for Xi
• Because each node is linked only to previous nodes ⇒ DAG
• P(MaryCalls | JohnCalls, Alarm, Burglary, Earthquake) = P(MaryCalls | Alarm)
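Compactness and node ordering
• Far more compact than the full joint probability distribution: with n Boolean variables and at most k parents per node, the network needs at most n·2^k numbers instead of 2^n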
• Example of locally structured system (or
sparse): each component interacts directly only
with a limited number of other components
• Associated usually with a linear growth in
complexity rather than with an exponential one
• The order of adding the nodes is important
• The correct order in which to add nodes is to add
the “root causes” first, then the variables they
influence, and so on, until we reach the leaves
2.4 Probabilistic inferences
• Chain A → V → B:
P(A ∧ V ∧ B) = P(A) * P(V|A) * P(B|V)
• Common cause A ← V → B:
P(A ∧ V ∧ B) = P(V) * P(A|V) * P(B|V)
• Common effect A → V ← B:
P(A ∧ V ∧ B) = P(A) * P(B) * P(V|A,B)
Probabilistic inferences
P(J  M  A B E ) =
P(J|A)* P(M|A)*P(A|B E )*P(B) P(E)=
0.9 * 0.7 * 0.001 * 0.999 * 0.998 = 0.00062
Probabilistic inferences
P(A|B) = P(A|B,E) * P(E|B) + P(A|B,¬E) * P(¬E|B)
= P(A|B,E) * P(E) + P(A|B,¬E) * P(¬E)   (B and E are independent)
= 0.95 * 0.002 + 0.94 * 0.998 = 0.94002
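Both calculations on the burglary network can be checked with a short script. This is only a sketch: the tables are copied from the slides, while the helper names `bern` and `joint` are assumptions introduced here.

```python
# Conditional probability tables of the burglary network (values from the slides).
P_B, P_E = 0.001, 0.002
P_A = {(True, True): 0.95, (True, False): 0.94,
       (False, True): 0.29, (False, False): 0.001}   # P(A=T | B, E)
P_J = {True: 0.90, False: 0.05}                      # P(J=T | A)
P_M = {True: 0.70, False: 0.01}                      # P(M=T | A)

def bern(p_true, value):
    """Probability that a Boolean variable takes `value`, given P(var=True)."""
    return p_true if value else 1.0 - p_true

def joint(b, e, a, j, m):
    """P(B=b, E=e, A=a, J=j, M=m) as the product of the local CPT entries."""
    return (bern(P_B, b) * bern(P_E, e) * bern(P_A[(b, e)], a)
            * bern(P_J[a], j) * bern(P_M[a], m))

# P(J and M and A and not B and not E)
p_example = joint(b=False, e=False, a=True, j=True, m=True)

# P(A | B): sum out E, using the independence of B and E.
p_alarm_given_burglary = sum(bern(P_E, e) * P_A[(True, e)] for e in (True, False))

print(round(p_example, 5), round(p_alarm_given_burglary, 5))
```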
2.5 Different types of inferences
• Diagnostic inferences (effect → cause)
P(Burglary | JohnCalls)
• Causal inferences (cause → effect)
P(JohnCalls | Burglary), P(MaryCalls | Burglary)
• Intercausal inferences (between causes of a common effect)
P(Burglary | Alarm ∧ Earthquake)
• Mixed inferences
P(Alarm | JohnCalls ∧ Earthquake) ⇒ diag + causal
P(Burglary | JohnCalls ∧ ¬Earthquake) ⇒ diag + intercausal
3. Certainty factors
• The MYCIN model
• Certainty factors / Confidence coefficients (CF)
• Heuristic model of uncertain knowledge
• In MYCIN – two probabilistic functions to model
the degree of belief and the degree of disbelief in
a hypothesis
– function to measure the degree of belief - MB
– function to measure the degree of disbelief -
MD
• MB[h,e] – how much the belief in h increases
based on evidence e
• MD[h,e] – how much the disbelief in h increases
based on evidence e
3.1 Belief functions
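• Measure of belief
$$MB[h,e] = \begin{cases} 1 & \text{if } P(h) = 1 \\ \dfrac{\max(P(h \mid e),\, P(h)) - P(h)}{\max(0,1) - P(h)} & \text{otherwise} \end{cases}$$
• Measure of disbelief
$$MD[h,e] = \begin{cases} 1 & \text{if } P(h) = 0 \\ \dfrac{\min(P(h \mid e),\, P(h)) - P(h)}{\min(0,1) - P(h)} & \text{otherwise} \end{cases}$$
• Certainty factor
$$CF[h,e] = MB[h,e] - MD[h,e]$$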
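The belief functions translate directly into code. A minimal sketch, assuming the prior P(h) and the posterior P(h|e) are available as plain numbers; the function names are illustrative, and the prior of 0.1 in the usage line is a made-up value (only P(Cavity | Toothache) = 0.8 appears in the slides).

```python
def mb(p_h, p_h_given_e):
    """Measure of belief MB[h, e]."""
    if p_h == 1.0:
        return 1.0
    return (max(p_h_given_e, p_h) - p_h) / (1.0 - p_h)   # max(0, 1) = 1

def md(p_h, p_h_given_e):
    """Measure of disbelief MD[h, e]."""
    if p_h == 0.0:
        return 1.0
    return (min(p_h_given_e, p_h) - p_h) / (0.0 - p_h)   # min(0, 1) = 0

def cf(p_h, p_h_given_e):
    """Certainty factor CF[h, e] = MB[h, e] - MD[h, e], always in [-1, 1]."""
    return mb(p_h, p_h_given_e) - md(p_h, p_h_given_e)

# Hypothetical prior P(h) = 0.1 with posterior P(h | e) = 0.8.
print(cf(0.1, 0.8))
```

Evidence that raises the probability of h gives a positive CF, evidence that lowers it gives a negative CF, and irrelevant evidence gives 0.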
Belief functions - features
• Value range
0 ≤ MB[h,e] ≤ 1,  0 ≤ MD[h,e] ≤ 1,  –1 ≤ CF[h,e] ≤ 1
• If h is sure, i.e. P(h|e) = 1, then
MB[h,e] = (1 – P(h)) / (1 – P(h)) = 1
MD[h,e] = 0
CF[h,e] = 1
• If the negation of h is sure, i.e. P(h|e) = 0, then
MB[h,e] = 0
MD[h,e] = (0 – P(h)) / (0 – P(h)) = 1
CF[h,e] = –1
Example in MYCIN
• if (1) the type of the organism is gram-positive, and
• (2) the morphology of the organism is coccus, and
• (3) the growth of the organism is chain
• then there is a strong evidence (0.7) that the identity of
the organism is streptococcus
Example of facts in MYCIN :
• (identity organism-1 pseudomonas 0.8)
• (identity organism-2 e.coli 0.15)
• (morphology organism-2 coccus 1.0)
3.2 Combining belief functions
(1) Incremental gathering of evidence
• The same attribute value, h, is obtained by two separate paths of inference, with two separate CFs: CF[h,s1] and CF[h,s2]
• The two different paths, corresponding to hypotheses s1 and s2, may be different branches of the search tree.
• CF[h, s1&s2] = CF[h,s1] + CF[h,s2] – CF[h,s1]*CF[h,s2] (for two positive CFs)
• (identity organism-1 pseudomonas 0.8)
Combining belief functions
(2) Conjunction of hypotheses
• Applied for computing the CF associated with the premises of a rule which has several conditions
if A = a1 and B = b1 then …
WM: (A a1 h1 cf1)(B b1 h2 cf2)
• CF[h1&h2, s] = min(CF[h1,s], CF[h2,s])
Combining belief functions
(3) Combining beliefs
• An uncertain value is deduced based on a rule
which has as input conditions based on uncertain
values (may be obtained by applying other rules
for example).
• Allows the computation of the CF of the fact
deduced by the rule based on the rule’s CF and
the CF of the hypotheses
• CF[s,e] – belief in a hypothesis s based on
previous evidence e
• CF[h,s] - CF in h if s is sure
• CF’[h,s] = CF[h,s] * CF [s,e]
Combining belief functions
(3) Combining beliefs – cont
if A = a1 and B = b1 then C = c1 0.7
WM: (A a1 0.9) (B b1 0.6)
CF(premises) = min(0.9, 0.6) = 0.6
CF(conclusion) = CF(premises) * CF(rule) = 0.6 * 0.7 = 0.42
WM: (C c1 0.42)
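The three combination rules are small enough to sketch as functions. The function names are assumptions of this sketch, and `cf_parallel` covers only the case of two positive CFs, as in the formula given for incremental evidence.

```python
def cf_parallel(cf1, cf2):
    """(1) Same conclusion reached by two separate paths (both CFs positive)."""
    return cf1 + cf2 - cf1 * cf2

def cf_conjunction(*cfs):
    """(2) CF of a premise that is a conjunction of several conditions."""
    return min(cfs)

def cf_chain(cf_rule, cf_premise):
    """(3) CF of a conclusion: CF of the rule scaled by the CF of its premise."""
    return cf_rule * cf_premise

# The worked example: if A = a1 and B = b1 then C = c1 (0.7)
premise = cf_conjunction(0.9, 0.6)      # min(0.9, 0.6)
conclusion = cf_chain(0.7, premise)     # 0.6 * 0.7
print(premise, round(conclusion, 2))
```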
3.3 Limits of CF
• The CF model of MYCIN assumes that the hypotheses are supported by independent evidence
• An example shows what happens if this condition is violated
A: The sprinkler functioned last night
U: The grass is wet in the morning
P: Last night it rained
R1: if the sprinkler functioned last night
then there is strong evidence (0.9) that the grass is wet in the morning
R2: if the grass is wet in the morning
then there is strong evidence (0.8) that it rained last night
• CF[U,A] = 0.9
• therefore the evidence sprinkler supports the hypothesis wet grass with CF = 0.9
• CF[P,U] = 0.8
• therefore the evidence wet grass supports the hypothesis rain with CF = 0.8
• CF[P,A] = 0.8 * 0.9 = 0.72
• therefore the evidence sprinkler supports the hypothesis rain with CF = 0.72
• This conclusion is counterintuitive: the sprinkler already explains the wet grass, so knowing that it ran should not raise the belief in rain
Artificial Intelligence 35
Traditional Logic
• Based on predicate logic
• Three important assumptions:
– Predicate descriptions are sufficient w.r.t. the domain
– Information is consistent
– Knowledge base grows monotonically
Non-monotonic Logic
• Addresses the three assumptions of traditional
logic
– Knowledge is incomplete
• No knowledge about p: true or false?
• Prolog – closed world assumption
– Knowledge is inconsistent
• Based on how the world usually works
• Most birds fly, but ostriches don’t
– Knowledge base grows non-monotonically
• A new observation may contradict existing knowledge, so existing knowledge may need to be removed
• Inference is based on assumptions; what happens if the assumptions are later shown to be incorrect?
• Three modal operators are introduced
Unless Operator
• New information may invalidate previous results
• Implemented in TMS – Truth Maintenance Systems
to keep track of the reasoning steps and preserve the
KB consistency
• Introduce Unless operator
– Supports inferences based on the belief that its argument is not true
– Consider
• p(X) unless q(X) → r(X)
If p(X) is true and q(X) is not believed true, then infer r(X)
• p(Z)
• r(W) → s(W)
From the above, conclude s(X).
If the belief later changes, or q(X) is found to be true, what happens?
Retract r(X) and s(X)
– Unless deals with belief, not truth
• q(X) either unknown or believed false: the inference goes through
• q(X) believed or known true: the inference is blocked
– Monotonicity is lost: a new fact can force earlier conclusions to be retracted
Is-consistent-with Operator M
• When reasoning, make sure the premises are consistent
• Format: M p – p is consistent with KB
• Consider
– ∀X good_student(X) ∧ M study_hard(X) → graduates(X)
– For every X who is a good student, if the fact that X studies hard is consistent with KB, then X will graduate
– It is not necessary to prove that X studies hard
• How to decide whether p is consistent with KB
– Negation as failure
– Heuristic-based and limited search
Default Logic
• Introduce a new format of inference rules:
– A(Z) ∧ :B(Z) → C(Z)
– If A(Z) is provable, and it is consistent with what we
know to assume B(Z), then conclude C(Z)
• Compare with is-consistent-with operator
– Similar
– Difference is the reasoning method
• In default logic, new rules are used to infer sets of plausible
extensions
– Example:
∀X good_student(X) ∧ :study_hard(X) → graduates(X)
∀Y party(Y) ∧ :not(study_hard(Y)) → not(graduates(Y))
Fuzzy Sets
• Classic sets
– Completeness: x in either A or ¬A
– Exclusive: can not be in both A and ¬A
• Fuzzy sets
– Violate the two assumptions
– Possibility theory – measure of confidence or belief
– Probability theory – randomness
– Process imprecision
– Introduce membership function
– Believe x ∈ A to some degree between 0 and 1, inclusive
[Figure: the fuzzy set representation for “small integers.”]
[Figure: a fuzzy set representation for the sets short, medium, and tall males.]
Fuzzy Set Operations
• Fuzzy set operations are defined as the
operations of membership functions
• Complement: ¬A = C
– mC = 1 – mA
• Union: A ∪ B = C
– mC = max(mA, mB)
• Intersection: A ∩ B = C
– mC = min(mA, mB)
• Difference: A – B = C
– mC = max(0, mA-mB)
Fuzzy Inference Rules
• Rule format and computation
– If x is A and y is B then z is C
mC(z) = min(mA(x), mB(y))
– If x is A or y is B then z is C
mC(z) = max(mA(x), mB(y))
– If x is not A then z is C
mC(z) = 1 – mA(x)
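The set operations and rule computations above all act pointwise on membership degrees, so they can be sketched as ordinary functions on numbers in [0, 1]. The function names and the membership degrees in the usage lines are illustrative assumptions.

```python
def fuzzy_not(m_a):
    """Complement: mC = 1 - mA."""
    return 1.0 - m_a

def fuzzy_or(m_a, m_b):
    """Union, and the degree of 'x is A or y is B': max of the memberships."""
    return max(m_a, m_b)

def fuzzy_and(m_a, m_b):
    """Intersection, and the degree of 'x is A and y is B': min of the memberships."""
    return min(m_a, m_b)

def fuzzy_diff(m_a, m_b):
    """Difference: mC = max(0, mA - mB)."""
    return max(0.0, m_a - m_b)

# Illustrative degrees: x belongs to A with degree 0.5, y belongs to B with degree 0.2.
m_a, m_b = 0.5, 0.2
print(fuzzy_and(m_a, m_b), fuzzy_or(m_a, m_b), fuzzy_not(m_a), fuzzy_diff(m_a, m_b))
```

Note that, unlike classic sets, `fuzzy_and(m, fuzzy_not(m))` can be greater than 0 and `fuzzy_or(m, fuzzy_not(m))` can be less than 1.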
The fuzzy regions for the input values θ (a) and dθ/dt (b).
N – Negative, Z – Zero, P – Positive
The fuzzy regions of the output value u, indicating the movement of the pendulum base: Negative Big, Negative, Zero, Positive, Positive Big.
The fuzzification of the input measures
X1 = 1: mZ(X1) = mP(X1) = 0.5, mN(X1) = 0
X2 = -4: mZ(X2) = 0.2, mN(X2) = 0.8, mP(X2) = 0
The Fuzzy Associative Matrix (FAM) for the pendulum problem. The input values are on the left and top.
Fuzzy Rules:
The fuzzy consequents (a) and their union (b). The centroid of the union (-2) is the crisp output.
Dempster-Shafer Theory
• Probability theory limitation
– Assigns a single number to measure any situation, no matter how complex it is
– Cannot deal with missing evidence, heuristics, and limited knowledge
• Dempster-Shafer theory
– Extends probability theory
– Considers a set of propositions as a whole
– Assigns a set of propositions an interval [belief, plausibility] that constrains the degree of belief for each individual proposition in the set
– The belief measure bel is in [0,1]
• 0 – no support evidence for a set of propositions
• 1 – full support evidence for a set of propositions
– The plausibility of p,
• pl(p) = 1 – bel(not(p))
• Reflect how evidence of not(p) relates to the possibility for belief in p
• Bel(not(p))=1: full support for not(p), no possibility for p
• Bel(not(p))=0: no support for not(p), full possibility for p
• Range is also in [0,1]
Properties of Dempster-Shafer
• Initially, there is no supporting evidence for either of two competing hypotheses, say h1 and h2
– Dempster-Shafer: [bel, pl] = [0, 1]
– Probability theory: p(h1)=p(h2)=0.5
• Dempster-Shafer belief functions satisfy weaker
axioms than probability function
• Two fundamental ideas:
– Obtaining belief degrees for one question from
subjective probabilities for related questions
– Using Dempster rule to combine these belief degrees
when they are based on independent evidence
An Example
• Two persons, M and B, with known reliabilities independently test a computer and report the result. How much should we believe their claims?
• Question (Q): detection claim
• Related question (RQ): detectors’ reliability
• Dempster-Shafer approach
– Obtain belief degrees for Q from subjective (prior) probabilities for RQ
for each person
– Combine belief degrees from two persons
• Person M:
– reliability 0.9, unreliability 0.1
– Claim h1
– Belief degree of h1 is bel(h1)=0.9
– Belief degree of not(h1) is bel(not(h1))=0.0, different from probability
theory, since no evidence supporting not(h1)
– pl(h1) = 1 – bel(not(h1)) = 1-0 =1
– Thus the belief measure for M's claim h1 is [0.9, 1]
• Person B:
– Reliability 0.8, unreliability 0.2
– Claim h2
– bel(h2)=0.8, bel(not(h2))=0, pl(h2)=1-bel(not(h2))=1-0=1
– Thus the belief measure for B's claim h2 is [0.8, 1]
Combining Belief Measure
• Set of propositions: M claims h1 and B claims h2
– Case 1: h1 = h2
• Both M and B reliable: 0.9x0.8=0.72
• Both M and B unreliable: 0.1x0.2=0.02
• The probability that at least one of the two is reliable: 1-0.02=0.98
• Belief measure for h1=h2 is [0.98,1]
– Case 2: h1 = not(h2)
• They cannot both be correct and reliable
• At least one is unreliable
– Reliable M and unreliable B: 0.9x(1-0.8)=0.18
– Reliable B and unreliable M: 0.8x(1-0.9)=0.08
– Unreliable M and B: (1-0.9)x(1-0.8)=0.02
– At least one is unreliable: 0.18+0.08+0.02=0.28
• Given at least one is unreliable, posterior probabilities
– Reliable M and unreliable B: 0.18/0.28=0.643
– Reliable B and unreliable M: 0.08/0.28=0.286
• Belief measure for h1
– Bel(h1)=0.643, bel(not(h1))=bel(h2)=0.286
– Pl(h1)=1-bel(not(h1))=1-0.286=0.714
– Belief measure: [0.643, 0.714]
• Belief measure for h2
– Bel(h2)=0.286, bel(not(h2))=bel(h1)=0.643
– Pl(h2)=1-bel(not(h2))=1-0.643=0.357
– Belief measure: [0.286, 0.357]
Dempster’s Rule
• Assumption:
– The questions being assessed are probabilistically independent a priori
– As new evidence is collected and conflicts arise, independence may disappear
• Two steps
1. Sort the uncertainties into a priori independent pieces of evidence
2. Carry out Dempster rule
• Consider the previous example
– After M and B claimed, a repair person is called to check the
computer, and both M and B witnessed this.
– Three independent items of evidence must be combined
• Not all evidence is directly supportive of individual
elements of a set of hypotheses, but often supports
different subsets of hypotheses, in favor of some and
against others
General Dempster’s Rule
• Q – an exhaustive set of mutually exclusive
hypotheses
• Z – a subset of Q
• M – probability density function to assign a belief
measure to Z
• Mn(Z) – belief degree to Z, where n is the number of sources of evidence
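The rule itself does not appear in the text of this slide. In its commonly stated form, two mass assignments m1 and m2 over subsets of Q are combined as m3(Z) = Σ_{X∩Y=Z} m1(X)·m2(Y) / (1 − Σ_{X∩Y=∅} m1(X)·m2(Y)). The sketch below implements that form; the function names and the dictionary-of-frozensets representation are assumptions of this sketch. Applied to the M and B example with conflicting claims, it should give the same numbers as the case analysis earlier.

```python
from itertools import product

def combine(m1, m2):
    """Dempster's rule for two mass assignments (dicts: frozenset -> mass)."""
    raw, conflict = {}, 0.0
    for (x, mass_x), (y, mass_y) in product(m1.items(), m2.items()):
        z = x & y
        if z:
            raw[z] = raw.get(z, 0.0) + mass_x * mass_y
        else:
            conflict += mass_x * mass_y      # mass falling on the empty set
    if conflict >= 1.0:
        raise ValueError("the two sources are in total conflict")
    return {z: mass / (1.0 - conflict) for z, mass in raw.items()}

def bel(m, a):
    """Belief: total mass of the subsets contained in a."""
    return sum(mass for z, mass in m.items() if z <= a)

def pl(m, a):
    """Plausibility: total mass of the subsets that intersect a."""
    return sum(mass for z, mass in m.items() if z & a)

Q = frozenset({"h1", "h2"})                      # case 2: h2 = not(h1)
H1, H2 = frozenset({"h1"}), frozenset({"h2"})
m_M = {H1: 0.9, Q: 0.1}                          # M is reliable with 0.9
m_B = {H2: 0.8, Q: 0.2}                          # B is reliable with 0.8
m = combine(m_M, m_B)
print(round(bel(m, H1), 3), round(pl(m, H1), 3))
print(round(bel(m, H2), 3), round(pl(m, H2), 3))
```

Mass assigned to the whole set Q represents "no information", which is how an unreliable witness is modeled here.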
Discrete Markov Process
• Finite state machine
– A graphical representation
– State transition depends on input stream
– States and transitions reflect properties of a formal
language
• Probabilistic finite state machine
– A finite state machine
– Transition function represented by a probability
distribution on the current state
• Discrete Markov process (chain, machine)
– A specialization of probabilistic finite state machine
– Ignores its input values
A Markov state machine or Markov chain with four states, s1, ..., s4
• At any time the system is in one of N distinct states
• At each step the system changes state or remains in the same state
• Time is divided into discrete intervals: t1, t2, …, tn
• State changes follow the probability distribution associated with the current state
• S(t) – the actual state at time t
• In general: p(S(t)) = p(S(t) | S(t-1), S(t-2), S(t-3), …)
• First-order Markov chain
– Only depends on the direct predecessor state
– p(S(t)) = p(S(t) | S(t-1))
Observable Markov Model
• Assume p(S(t)|S(t-1)) is time invariant, that is, transition between
specific states retains the same probabilistic relationship
• State transition probability aij between si and sj:
– aij = p(S(t)=si | S(t-1)=sj), 1 ≤ i,j ≤ N
– If i=j, no transition (remain the same state)
– Properties: aij ≥ 0, Σi aij = 1
S1 – sun
S2 – cloudy
S3 – fog
S4 – precipitation
Time intervals:
noon to noon
Question: suppose that
today is sunny, what is
the probability of the
next five days being
sunny, sunny, cloudy,
cloudy, precipitation?
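Under the first-order Markov assumption, the probability of a state sequence given today's state is the product of the transition probabilities along the sequence. The transition diagram for this example is not reproduced in the text, so the table below uses made-up values purely for illustration; `trans[prev][nxt]` is the probability of moving from `prev` to `nxt`, so each row sums to 1.

```python
# ASSUMPTION: illustrative transition probabilities, not taken from the slides.
trans = {
    "sun":    {"sun": 0.4, "cloudy": 0.3, "fog": 0.2, "precip": 0.1},
    "cloudy": {"sun": 0.2, "cloudy": 0.3, "fog": 0.2, "precip": 0.3},
    "fog":    {"sun": 0.1, "cloudy": 0.3, "fog": 0.3, "precip": 0.3},
    "precip": {"sun": 0.2, "cloudy": 0.3, "fog": 0.3, "precip": 0.2},
}

def sequence_probability(start, sequence):
    """P(sequence | today = start) for a first-order Markov chain."""
    p, current = 1.0, start
    for nxt in sequence:
        p *= trans[current][nxt]   # only the direct predecessor matters
        current = nxt
    return p

p = sequence_probability("sun", ["sun", "sun", "cloudy", "cloudy", "precip"])
print(p)
```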
Restrictiveness of Markov models
• Are past and future really independent given current state?
• E.g., suppose that when it rains, it rains for at most 2 days
S1 → S2 → S3 → S4 → …
• Second-order Markov process
• Workaround: change meaning of “state” to events of last 2 days
(S1, S2) → (S2, S3) → (S3, S4) → (S4, S5) → …
• Another approach: add more information to the state
• E.g., the full state of the world would include whether the
sky is full of water
– Additional information may not be observable
– Blowup of number of states…
Hidden Markov models (HMMs)
• Same as Markov model, except we cannot see the
state
• Instead, we only see an observation each period,
which depends on the current state
S1 → S2 → S3 → … → St → …
• Still need a transition model: P(St+1 = j | St = i) = aij
• Also need an observation model: P(Ot = k | St = i) = bik
O1 O2 O3 … Ot … (each Ot depends only on St)
Weather example extended to HMM
• Transition probabilities: [Figure: transition diagram over the states s (sunny), c (cloudy), r (rainy), with edge labels .1, .2, .6, .3, .4, .3, .3, .5, .3]
• Observation: labmate wet or dry
• bsw = .1, bcw = .3, brw = .8
HMM weather example: a question
• You have been stuck in the lab for three days (!)
• On those days, your labmate was dry, wet, wet,
respectively
• What is the probability that it is now raining outside?
• P(S2 = r | O0 = d, O1 = w, O2 = w)
• By Bayes’ rule, really want to know P(S2, O0 = d, O1 = w, O2 = w)
Solving the question
• Computationally efficient approach: first compute
P(S1 = i, O0 = d, O1 = w) for all states i
• General case: solve for P(St, O0 = o0, O1 = o1, …, Ot = ot) for t=1, then t=2, … This is called monitoring
• P(St, O0 = o0, O1 = o1, …, Ot = ot) = Σst-1 P(St-1 = st-1, O0 = o0, O1 = o1, …, Ot-1 = ot-1) P(St | St-1 = st-1) P(Ot = ot | St)
Predicting further out
• You have been stuck in the lab for three days
• On those days, your labmate was dry, wet, wet,
respectively
• What is the probability that two days from now it
will be raining outside?
• P(S4 = r | O0 = d, O1 = w, O2 = w)
Predicting further out, continued…
• Want to know: P(S4 = r | O0 = d, O1 = w, O2 = w)
• Already know how to get: P(S2 | O0 = d, O1 = w, O2 = w)
• P(S3 = r | O0 = d, O1 = w, O2 = w)
= Σs2 P(S3 = r, S2 = s2 | O0 = d, O1 = w, O2 = w)
= Σs2 P(S3 = r | S2 = s2) P(S2 = s2 | O0 = d, O1 = w, O2 = w)
• Etc. for S4
• So: monitoring first, then straightforward Markov process
updates
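Monitoring followed by prediction can be sketched as below. Two things are assumptions of this sketch and not stated in the text: the assignment of the nine diagram labels to specific transitions (the table `T` is one assignment in which every row sums to 1), and the uniform prior over the state on day 0. The wet-labmate probabilities are the bsw, bcw, brw values from the slides.

```python
STATES = ("s", "c", "r")
# ASSUMPTION: T[i][j] = P(S_{t+1} = j | S_t = i); one plausible reading of the diagram.
T = {"s": {"s": 0.6, "c": 0.3, "r": 0.1},
     "c": {"s": 0.4, "c": 0.3, "r": 0.3},
     "r": {"s": 0.2, "c": 0.3, "r": 0.5}}
P_WET = {"s": 0.1, "c": 0.3, "r": 0.8}        # bsw, bcw, brw
PRIOR = {i: 1.0 / 3.0 for i in STATES}        # ASSUMPTION: uniform P(S_0)

def obs_prob(state, obs):
    """P(O_t = obs | S_t = state), with obs 'w' (wet) or 'd' (dry)."""
    return P_WET[state] if obs == "w" else 1.0 - P_WET[state]

def forward(observations):
    """Monitoring: f[t][i] = P(S_t = i, O_0 = o_0, ..., O_t = o_t)."""
    f = [{i: PRIOR[i] * obs_prob(i, observations[0]) for i in STATES}]
    for obs in observations[1:]:
        prev = f[-1]
        f.append({j: sum(prev[i] * T[i][j] for i in STATES) * obs_prob(j, obs)
                  for j in STATES})
    return f

def normalize(dist):
    total = sum(dist.values())
    return {k: v / total for k, v in dist.items()}

def predict(belief, steps):
    """Plain Markov updates with no new observations."""
    for _ in range(steps):
        belief = {j: sum(belief[i] * T[i][j] for i in STATES) for j in STATES}
    return belief

f = forward(["d", "w", "w"])
filtered = normalize(f[-1])          # P(S2 | O0 = d, O1 = w, O2 = w)
two_days = predict(filtered, 2)      # P(S4 | O0 = d, O1 = w, O2 = w)
print(filtered["r"], two_days["r"])
```

Each forward step costs a number of operations quadratic in the number of states, instead of enumerating every state sequence.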
Integrating newer information
• You have been stuck in the lab for four days (!)
• On those days, your labmate was dry, wet, wet, dry
respectively
• What is the probability that two days ago it was
raining outside? P(S1 = r | O0 = d, O1 = w, O2 = w, O3
= d)
– Smoothing or hindsight problem
Hindsight problem continued…
• Want: P(S1 = r | O0 = d, O1 = w, O2 = w, O3 = d)
• “Partial” application of Bayes’ rule:
P(S1 = r | O0 = d, O1 = w, O2 = w, O3 = d) =
P(S1 = r, O2 = w, O3 = d | O0 = d, O1 = w) /
P(O2 = w, O3 = d | O0 = d, O1 = w)
• So really want to know P(S1, O2 = w, O3 = d | O0 = d, O1 = w)
Hindsight problem continued…
• Want to know P(S1 = r, O2 = w, O3 = d | O0 = d, O1 = w)
• P(S1 = r, O2 = w, O3 = d | O0 = d, O1 = w) =
P(S1 = r | O0 = d, O1 = w) P(O2 = w, O3 = d | S1 = r)
• Already know how to compute P(S1 = r | O0 = d, O1 = w)
• Just need to compute P(O2 = w, O3 = d | S1 = r)
Hindsight problem continued…
• Just need to compute P(O2 = w, O3 = d | S1 = r)
• P(O2 = w, O3 = d | S1 = r)
= Σs2 P(S2 = s2, O2 = w, O3 = d | S1 = r)
= Σs2 P(S2 = s2 | S1 = r) P(O2 = w | S2 = s2) P(O3 = d | S2 = s2)
• First two factors directly in the model; last factor is a
“smaller” problem of the same kind
• Use dynamic programming, backwards from the future
– Similar to forwards approach from the past
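The hindsight (smoothing) computation combines a forward pass up to the day of interest with a backward pass from the last observation. As with any concrete run of this example, the transition table and the uniform prior below are assumptions of this sketch (the diagram labels are assigned so that each row sums to 1); only the observation probabilities come from the slides.

```python
STATES = ("s", "c", "r")
# ASSUMPTION: T[i][j] = P(S_{t+1} = j | S_t = i); one plausible reading of the diagram.
T = {"s": {"s": 0.6, "c": 0.3, "r": 0.1},
     "c": {"s": 0.4, "c": 0.3, "r": 0.3},
     "r": {"s": 0.2, "c": 0.3, "r": 0.5}}
P_WET = {"s": 0.1, "c": 0.3, "r": 0.8}        # bsw, bcw, brw
PRIOR = {i: 1.0 / 3.0 for i in STATES}        # ASSUMPTION: uniform P(S_0)

def obs_prob(state, obs):
    """P(O_t = obs | S_t = state), with obs 'w' (wet) or 'd' (dry)."""
    return P_WET[state] if obs == "w" else 1.0 - P_WET[state]

def smooth(observations, k):
    """Return P(S_k | O_0, ..., O_n-1) for 0 <= k < n."""
    n = len(observations)
    # Forward pass: f[i] = P(S_k = i, O_0..O_k)
    f = {i: PRIOR[i] * obs_prob(i, observations[0]) for i in STATES}
    for t in range(1, k + 1):
        f = {j: sum(f[i] * T[i][j] for i in STATES) * obs_prob(j, observations[t])
             for j in STATES}
    # Backward pass (dynamic programming from the future):
    # b[i] = P(O_{k+1}..O_{n-1} | S_k = i)
    b = {i: 1.0 for i in STATES}
    for t in range(n - 1, k, -1):
        b = {i: sum(T[i][j] * obs_prob(j, observations[t]) * b[j] for j in STATES)
             for i in STATES}
    unnormalized = {i: f[i] * b[i] for i in STATES}
    total = sum(unnormalized.values())
    return {i: v / total for i, v in unnormalized.items()}

smoothed = smooth(["d", "w", "w", "d"], k=1)   # P(S1 | O0 = d, O1 = w, O2 = w, O3 = d)
print(smoothed["r"])
```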
References
• http://www.cs.duke.edu/courses/fall08/cps270/
• https://csc.csudh.edu
• https://inuresearch.tripod.com › ai

More Related Content

What's hot

K - Nearest neighbor ( KNN )
K - Nearest neighbor  ( KNN )K - Nearest neighbor  ( KNN )
K - Nearest neighbor ( KNN )
Mohammad Junaid Khan
 
Dempster shafer theory
Dempster shafer theoryDempster shafer theory
Dempster shafer theory
Dr. C.V. Suresh Babu
 
Forward and Backward chaining in AI
Forward and Backward chaining in AIForward and Backward chaining in AI
Forward and Backward chaining in AI
Megha Sharma
 
Lecture 14 Heuristic Search-A star algorithm
Lecture 14 Heuristic Search-A star algorithmLecture 14 Heuristic Search-A star algorithm
Lecture 14 Heuristic Search-A star algorithm
Hema Kashyap
 
Genetic algorithms vs Traditional algorithms
Genetic algorithms vs Traditional algorithmsGenetic algorithms vs Traditional algorithms
Genetic algorithms vs Traditional algorithms
Dr. C.V. Suresh Babu
 
AI Lecture 3 (solving problems by searching)
AI Lecture 3 (solving problems by searching)AI Lecture 3 (solving problems by searching)
AI Lecture 3 (solving problems by searching)
Tajim Md. Niamat Ullah Akhund
 
Artificial Intelligence (AI) | Prepositional logic (PL)and first order predic...
Artificial Intelligence (AI) | Prepositional logic (PL)and first order predic...Artificial Intelligence (AI) | Prepositional logic (PL)and first order predic...
Artificial Intelligence (AI) | Prepositional logic (PL)and first order predic...
Ashish Duggal
 
Gradient descent method
Gradient descent methodGradient descent method
Gradient descent method
Sanghyuk Chun
 
Ai 8 puzzle problem
Ai 8 puzzle problemAi 8 puzzle problem
Ai 8 puzzle problem
Sanad Bhowmik
 
Genetic algorithm raktim
Genetic algorithm raktimGenetic algorithm raktim
Genetic algorithm raktim
Raktim Halder
 
Uncertainty in AI
Uncertainty in AIUncertainty in AI
Uncertainty in AI
Amruth Veerabhadraiah
 
Genetic Algorithms - Artificial Intelligence
Genetic Algorithms - Artificial IntelligenceGenetic Algorithms - Artificial Intelligence
Genetic Algorithms - Artificial Intelligence
Sahil Kumar
 
Heuristic search
Heuristic searchHeuristic search
Heuristic search
Soheil Khodayari
 
Ensemble Learning and Random Forests
Ensemble Learning and Random ForestsEnsemble Learning and Random Forests
Ensemble Learning and Random Forests
CloudxLab
 
First order logic
First order logicFirst order logic
First order logic
Chinmay Patel
 
Uncertain knowledge and reasoning
Uncertain knowledge and reasoningUncertain knowledge and reasoning
Uncertain knowledge and reasoning
Shiwani Gupta
 
knowledge representation using rules
knowledge representation using rulesknowledge representation using rules
knowledge representation using rules
Harini Balamurugan
 
I. Alpha-Beta Pruning in ai
I. Alpha-Beta Pruning in aiI. Alpha-Beta Pruning in ai
I. Alpha-Beta Pruning in ai
vikas dhakane
 
Inference in First-Order Logic
Inference in First-Order Logic Inference in First-Order Logic
Inference in First-Order Logic
Junya Tanaka
 
Artificial Intelligence- TicTacToe game
Artificial Intelligence- TicTacToe gameArtificial Intelligence- TicTacToe game
Artificial Intelligence- TicTacToe game
manika kumari
 

What's hot (20)

K - Nearest neighbor ( KNN )
K - Nearest neighbor  ( KNN )K - Nearest neighbor  ( KNN )
K - Nearest neighbor ( KNN )
 
Dempster shafer theory
Dempster shafer theoryDempster shafer theory
Dempster shafer theory
 
Forward and Backward chaining in AI
Forward and Backward chaining in AIForward and Backward chaining in AI
Forward and Backward chaining in AI
 
Lecture 14 Heuristic Search-A star algorithm
Lecture 14 Heuristic Search-A star algorithmLecture 14 Heuristic Search-A star algorithm
Lecture 14 Heuristic Search-A star algorithm
 
Genetic algorithms vs Traditional algorithms
Genetic algorithms vs Traditional algorithmsGenetic algorithms vs Traditional algorithms
Genetic algorithms vs Traditional algorithms
 
AI Lecture 3 (solving problems by searching)
AI Lecture 3 (solving problems by searching)AI Lecture 3 (solving problems by searching)
AI Lecture 3 (solving problems by searching)
 
Artificial Intelligence (AI) | Prepositional logic (PL)and first order predic...
Artificial Intelligence (AI) | Prepositional logic (PL)and first order predic...Artificial Intelligence (AI) | Prepositional logic (PL)and first order predic...
Artificial Intelligence (AI) | Prepositional logic (PL)and first order predic...
 
Gradient descent method
Gradient descent methodGradient descent method
Gradient descent method
 
Ai 8 puzzle problem
Ai 8 puzzle problemAi 8 puzzle problem
Ai 8 puzzle problem
 
Genetic algorithm raktim
Genetic algorithm raktimGenetic algorithm raktim
Genetic algorithm raktim
 
Uncertainty in AI
Uncertainty in AIUncertainty in AI
Uncertainty in AI
 
Genetic Algorithms - Artificial Intelligence
Genetic Algorithms - Artificial IntelligenceGenetic Algorithms - Artificial Intelligence
Genetic Algorithms - Artificial Intelligence
 
Heuristic search
Heuristic searchHeuristic search
Heuristic search
 
Ensemble Learning and Random Forests
Ensemble Learning and Random ForestsEnsemble Learning and Random Forests
Ensemble Learning and Random Forests
 
First order logic
First order logicFirst order logic
First order logic
 
Uncertain knowledge and reasoning
Uncertain knowledge and reasoningUncertain knowledge and reasoning
Uncertain knowledge and reasoning
 
knowledge representation using rules
knowledge representation using rulesknowledge representation using rules
knowledge representation using rules
 
I. Alpha-Beta Pruning in ai
I. Alpha-Beta Pruning in aiI. Alpha-Beta Pruning in ai
I. Alpha-Beta Pruning in ai
 
Inference in First-Order Logic
Inference in First-Order Logic Inference in First-Order Logic
Inference in First-Order Logic
 
Artificial Intelligence- TicTacToe game
Artificial Intelligence- TicTacToe gameArtificial Intelligence- TicTacToe game
Artificial Intelligence- TicTacToe game
 

Unit IV UNCERTAINITY AND STATISTICAL REASONING in AI K.Sundar,AP/CSE,VEC

  • 1. 19CS308T: Artificial Intelligence. UNIT-III: UNCERTAINTY AND STATISTICAL REASONING. Faculty: Mr. K. Sundar
  • 2. Syllabus
      • Probability and axioms, Bayes' rule, Bayesian networks, inferences, temporal models, hidden Markov models, fuzzy reasoning, certainty factors, Bayesian theory, Bayesian network, Dempster-Shafer theory
      • Case study on each algorithm
  • 3. • Probability theory • Bayesian networks • Certainty factors
  • 4. 1. Probability theory
      1.1 Uncertain knowledge
      ∀p symptom(p, Toothache) ⇒ disease(p, cavity)
      ∀p symptom(p, Toothache) ⇒ disease(p, cavity) ∨ disease(p, gum_disease) ∨ …
      • Predicate logic (PL) fails in such domains because of:
        – laziness
        – theoretical ignorance
        – practical ignorance
      • Probability theory → degree of belief or plausibility of a statement: a numerical measure in [0,1]
      • Degree of truth (fuzzy logic) ≠ degree of belief
  • 5. 1.2 Definitions
      • Unconditional or prior probability of A: the degree of belief in A in the absence of any other information, written P(A)
      • A is a random variable
      • Probability distribution: P(A), P(A,B)
      Example
        P(Weather = Sunny) = 0.1
        P(Weather = Rain) = 0.7
        P(Weather = Snow) = 0.2
        Weather is a random variable
      • P(Weather) = (0.1, 0.7, 0.2) is a probability distribution
      • Conditional (posterior) probability: the degree of belief in A once the agent has obtained some evidence B, written P(A|B)
      • P(Cavity | Toothache) = 0.8
  • 6. Definitions - cont
      • Axioms of probability
      • The measure of the occurrence of an event (random variable) A is a function P: S → R satisfying the axioms:
        – 0 ≤ P(A) ≤ 1
        – P(S) = 1 (or P(true) = 1 and P(false) = 0)
        – P(A ∨ B) = P(A) + P(B) - P(A ∧ B)
      • Consequence: P(A ∨ ~A) = P(A) + P(~A) - P(false) = P(true), hence P(~A) = 1 - P(A)
  • 7. Definitions - cont
      • A and B mutually exclusive ⇒ P(A ∨ B) = P(A) + P(B)
      • For mutually exclusive e1, …, en: P(e1 ∨ e2 ∨ e3 ∨ … ∨ en) = P(e1) + P(e2) + P(e3) + … + P(en)
      • The probability of a proposition a is equal to the sum of the probabilities of the atomic events in which a holds
      • e(a) is the set of atomic events in which a holds, so P(a) = Σ P(ei) over all ei in e(a)
  • 8. 1.3 Product rule
      • Conditional probabilities can be defined in terms of unconditional probabilities
      • The conditional probability of the occurrence of A given that event B occurs:
        – P(A|B) = P(A ∧ B) / P(B), defined when P(B) > 0
      • This can also be written as:
        – P(A ∧ B) = P(A|B) * P(B)
      • For probability distributions:
        – P(A=a1 ∧ B=b1) = P(A=a1|B=b1) * P(B=b1)
        – P(A=a1 ∧ B=b2) = P(A=a1|B=b2) * P(B=b2)
        – …
        – P(X,Y) = P(X|Y) * P(Y)
  • 9. 1.4 Bayes' rule and its use
      P(A ∧ B) = P(A|B) * P(B)
      P(A ∧ B) = P(B|A) * P(A)
      Bayes' rule (theorem), obtained by equating the two right-hand sides:
      • P(B|A) = P(A|B) * P(B) / P(A)
  • 10. Bayes Theorem
      • hi: hypotheses (i = 1,…,k); e1,…,en: evidence
      • Quantities involved: P(hi), P(hi | e1,…,en), P(e1,…,en | hi)
      • P(hi | e1,…,en) = P(e1,…,en | hi) * P(hi) / Σ(j=1..k) [P(e1,…,en | hj) * P(hj)], i = 1,…,k
  • 11. Bayes' Theorem - cont
      • If e1,…,en are independent pieces of evidence (given each hypothesis hj), then
        P(e1, e2, …, en | hj) = P(e1 | hj) * P(e2 | hj) * … * P(en | hj), j = 1,…,k
      • Example system: PROSPECTOR
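The two Bayes' theorem slides above can be combined into a short computation. The following is a minimal sketch, assuming the evidence is conditionally independent given each hypothesis; the function name `posterior` and all numbers are hypothetical and chosen only for illustration.

```python
from math import prod

def posterior(priors, likelihoods):
    """Return P(h | e1..en) for every hypothesis h.

    priors:      {hypothesis: P(h)}
    likelihoods: {hypothesis: [P(e1 | h), ..., P(en | h)]}
    Assumes the evidence is conditionally independent given each hypothesis.
    """
    # Numerator of Bayes' theorem: P(e1..en | h) * P(h)
    joint = {h: prod(likelihoods[h]) * priors[h] for h in priors}
    # Denominator: the same quantity summed over all hypotheses
    total = sum(joint.values())
    return {h: joint[h] / total for h in joint}

# Hypothetical priors and likelihoods (not taken from the slides)
priors = {"cavity": 0.1, "gum_disease": 0.05, "healthy": 0.85}
likelihoods = {
    "cavity": [0.8, 0.9],
    "gum_disease": [0.6, 0.2],
    "healthy": [0.05, 0.1],
}
post = posterior(priors, likelihoods)
for hypothesis, p in post.items():
    print(f"P({hypothesis} | e1, e2) = {p:.4f}")
```

Because the denominator is just the sum of the numerators, the posteriors always add up to 1, whatever the priors are.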
  • 12. 1.5 Inferences
      Probability distribution P(Cavity, Tooth):
                     Tooth    ~Tooth
        Cavity       0.04     0.06
        ~Cavity      0.01     0.89
      P(Cavity) = 0.04 + 0.06 = 0.1
      P(Cavity ∨ Tooth) = 0.04 + 0.01 + 0.06 = 0.11
      P(Cavity | Tooth) = P(Cavity ∧ Tooth) / P(Tooth) = 0.04 / 0.05 = 0.8
  • 13. Inferences
      Probability distribution P(Cavity, Tooth, Catch):
                          Tooth               ~Tooth
                     Catch    ~Catch     Catch    ~Catch
        Cavity       0.108    0.012      0.072    0.008
        ~Cavity      0.016    0.064      0.144    0.576
      P(Cavity) = 0.108 + 0.012 + 0.072 + 0.008 = 0.2
      P(Cavity ∨ Tooth) = 0.108 + 0.012 + 0.072 + 0.008 + 0.016 + 0.064 = 0.28
      P(Cavity | Tooth) = P(Cavity ∧ Tooth) / P(Tooth) = [P(Cavity ∧ Tooth ∧ Catch) + P(Cavity ∧ Tooth ∧ ~Catch)] / P(Tooth) = (0.108 + 0.012) / 0.2 = 0.6
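The three-variable inference slide can be checked mechanically by summing atomic events. This is a small sketch that stores the P(Cavity, Tooth, Catch) table as a dictionary; the helper names `prob` and `cond_prob` are illustrative, not part of the slides.

```python
# Full joint distribution P(Cavity, Tooth, Catch) from the slide,
# keyed by (cavity, tooth, catch) truth values.
JOINT = {
    (True, True, True): 0.108, (True, True, False): 0.012,
    (True, False, True): 0.072, (True, False, False): 0.008,
    (False, True, True): 0.016, (False, True, False): 0.064,
    (False, False, True): 0.144, (False, False, False): 0.576,
}

def prob(event):
    """Sum the probabilities of the atomic events in which `event` holds."""
    return sum(p for world, p in JOINT.items() if event(*world))

def cond_prob(event, given):
    """P(event | given) = P(event and given) / P(given)."""
    return prob(lambda *w: event(*w) and given(*w)) / prob(given)

p_cavity = prob(lambda cavity, tooth, catch: cavity)
p_cavity_or_tooth = prob(lambda cavity, tooth, catch: cavity or tooth)
p_cavity_given_tooth = cond_prob(
    lambda cavity, tooth, catch: cavity,
    lambda cavity, tooth, catch: tooth,
)

print(round(p_cavity, 3), round(p_cavity_or_tooth, 3), round(p_cavity_given_tooth, 3))
```

Every query is answered from the same table, which is why the full joint distribution is sufficient for inference but grows exponentially with the number of variables.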
  • 14. 2 Bayesian networks
      • Represent dependencies among random variables
      • Give a short specification of conditional probability distributions
      • Many random variables are conditionally independent
      • Simplifies computations
      • Graphical representation
      • DAG: causal relationships among random variables
  • 15. 2.1 Definition of Bayesian networks
      A BN is a DAG in which each node is annotated with quantitative probability information, namely:
      • Nodes represent random variables (discrete or continuous)
      • Directed links X → Y: X has a direct influence on Y; X is said to be a parent of Y
      • Each node Xi has an associated conditional probability table, P(Xi | Parents(Xi)), that quantifies the effects of the parents on the node
      Example: Weather, Cavity, Toothache, Catch
      • Weather; Cavity → Toothache; Cavity → Catch
  • 16. Bayesian network - example
      Nodes: Burglary, Earthquake, Alarm, JohnCalls, MaryCalls
      Links: Burglary → Alarm ← Earthquake; Alarm → JohnCalls; Alarm → MaryCalls
      P(B) = 0.001      P(E) = 0.002
      Conditional probability table for Alarm:
        B   E    P(A=T | B,E)   P(A=F | B,E)
        T   T    0.95           0.05
        T   F    0.94           0.06
        F   T    0.29           0.71
        F   F    0.001          0.999
      Conditional probability tables for JohnCalls and MaryCalls:
        A    P(J | A)    P(M | A)
        T    0.9         0.7
        F    0.05        0.01
  • 17. 2.2 Bayesian network semantics
      A) Represent a probability distribution
      B) Specify conditional independence (used to build the network)
      A) Each value of the probability distribution can be computed as:
         P(X1=x1 ∧ … ∧ Xn=xn) = P(x1,…, xn) = Π(i=1..n) P(xi | Parents(xi))
  • 18. 2.3 Building the network
      P(X1=x1 ∧ … ∧ Xn=xn) = P(x1,…, xn)
        = P(xn | xn-1,…, x1) * P(xn-1,…, x1) = …
        = P(xn | xn-1,…, x1) * P(xn-1 | xn-2,…, x1) * … * P(x2 | x1) * P(x1)
        = Π(i=1..n) P(xi | xi-1,…, x1)
      • We can see that P(Xi | Xi-1,…, X1) = P(Xi | Parents(Xi)) if Parents(Xi) ⊆ {Xi-1,…, X1}
      • The condition may be satisfied by labeling the nodes in an order consistent with a DAG
      • Intuitively, the parents of a node Xi must be all the nodes among Xi-1,…, X1 which have a direct influence on Xi
  • 19. Building the network - cont
      • Pick a set of random variables that describe the problem
      • Pick an ordering of those variables
      • While there are still variables, repeat:
        (a) choose a variable Xi and add a node associated with Xi
        (b) assign Parents(Xi) ← a minimal set of nodes that already exist in the network such that the conditional independence property is satisfied
        (c) define the conditional probability table for Xi
      • Because each node is linked only to previous nodes ⇒ the result is a DAG
      • Example: P(MaryCalls | JohnCalls, Alarm, Burglary, Earthquake) = P(MaryCalls | Alarm)
  • 20. Compactness and node ordering
      • A Bayesian network is far more compact than the full joint probability distribution
      • It is an example of a locally structured (or sparse) system: each component interacts directly only with a limited number of other components
      • Local structure is usually associated with linear rather than exponential growth in complexity
      • The order in which nodes are added is important
      • The correct order is to add the "root causes" first, then the variables they influence, and so on, until we reach the leaves
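The compactness claim can be made concrete by counting table entries. The sketch below assumes Boolean variables and the burglary network structure from the example slide; the function names are illustrative.

```python
def full_joint_size(n_vars):
    """Independent entries in a full joint table over n Boolean variables."""
    return 2 ** n_vars - 1

def network_size(parents):
    """Independent CPT entries: one per combination of parent values, per node."""
    return sum(2 ** len(ps) for ps in parents.values())

# Parent sets of the burglary network (structure as in the example slide)
BURGLARY_NET = {
    "Burglary": [],
    "Earthquake": [],
    "Alarm": ["Burglary", "Earthquake"],
    "JohnCalls": ["Alarm"],
    "MaryCalls": ["Alarm"],
}

joint_entries = full_joint_size(len(BURGLARY_NET))
network_entries = network_size(BURGLARY_NET)
print(joint_entries, network_entries)
```

With five Boolean variables the full joint table needs 31 independent numbers, while the network needs 1 + 1 + 4 + 2 + 2 = 10. A poor node ordering adds parents and pushes the second count back toward the first.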
  • 21. 2.4 Probabilistic inferences
      • Chain A → V → B: P(A ∧ V ∧ B) = P(A) * P(V|A) * P(B|V)
      • Common cause A ← V → B: P(A ∧ V ∧ B) = P(V) * P(A|V) * P(B|V)
      • Common effect A → V ← B: P(A ∧ V ∧ B) = P(A) * P(B) * P(V|A,B)
  • 22. Probabilistic inferences
      (Burglary network with the same probability tables as in the Bayesian network example)
      P(J ∧ M ∧ A ∧ ~B ∧ ~E) = P(J|A) * P(M|A) * P(A | ~B ∧ ~E) * P(~B) * P(~E) = 0.9 * 0.7 * 0.001 * 0.999 * 0.998 ≈ 0.000628
  • 23. Probabilistic inferences
      (Burglary network with the same probability tables as in the Bayesian network example)
      P(A|B) = P(A|B,E) * P(E|B) + P(A|B,~E) * P(~E|B)
             = P(A|B,E) * P(E) + P(A|B,~E) * P(~E)
             = 0.95 * 0.002 + 0.94 * 0.998 = 0.94002
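Both inference slides above follow from the factorization P(x1,…,xn) = Π P(xi | Parents(xi)). The following sketch encodes the burglary network tables and answers the two queries by enumeration; the function names are illustrative, and enumeration is only one (inefficient) way to do inference.

```python
from itertools import product

# Tables from the burglary network example
P_B, P_E = 0.001, 0.002
P_A = {(True, True): 0.95, (True, False): 0.94,
       (False, True): 0.29, (False, False): 0.001}
P_J = {True: 0.9, False: 0.05}   # P(JohnCalls = T | Alarm)
P_M = {True: 0.7, False: 0.01}   # P(MaryCalls = T | Alarm)

def p(value, p_true):
    """Probability that a Boolean variable takes `value`, given P(var = T)."""
    return p_true if value else 1.0 - p_true

def joint(b, e, a, j, m):
    """P(B=b, E=e, A=a, J=j, M=m) as a product of the local CPT entries."""
    return (p(b, P_B) * p(e, P_E) * p(a, P_A[(b, e)])
            * p(j, P_J[a]) * p(m, P_M[a]))

def alarm_given_burglary():
    """P(A | B) by summing the joint over the unobserved variables."""
    tf = [True, False]
    num = sum(joint(True, e, True, j, m) for e, j, m in product(tf, repeat=3))
    den = sum(joint(True, e, a, j, m) for e, a, j, m in product(tf, repeat=4))
    return num / den

p_calls_no_cause = joint(b=False, e=False, a=True, j=True, m=True)
p_alarm_given_burglary = alarm_given_burglary()
print(f"{p_calls_no_cause:.6f} {p_alarm_given_burglary:.5f}")
```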
  • 24. 2.5 Different types of inferences
      (Burglary network: Burglary, Earthquake, Alarm, JohnCalls, MaryCalls)
      • Diagnostic inferences (effect → cause): P(Burglary | JohnCalls)
      • Causal inferences (cause → effect): P(JohnCalls | Burglary), P(MaryCalls | Burglary)
      • Intercausal inferences (between causes of a common effect): P(Burglary | Alarm ∧ Earthquake)
      • Mixed inferences:
        – P(Alarm | JohnCalls ∧ ~Earthquake) → diagnostic + causal
        – P(Burglary | JohnCalls ∧ ~Earthquake) → diagnostic + intercausal
  • 25. 3. Certainty factors
      • The MYCIN model
      • Certainty factors / confidence coefficients (CF)
      • Heuristic model of uncertain knowledge
      • In MYCIN, two probabilistic functions model the degree of belief and the degree of disbelief in a hypothesis
        – MB: function to measure the degree of belief
        – MD: function to measure the degree of disbelief
      • MB[h,e]: how much the belief in h increases based on evidence e
      • MD[h,e]: how much the disbelief in h increases based on evidence e
  • 26. 3.1 Belief functions
      • Measure of belief:
        MB[h,e] = 1 if P(h) = 1
        MB[h,e] = (max(P(h|e), P(h)) - P(h)) / (max(0,1) - P(h)) otherwise
      • Measure of disbelief:
        MD[h,e] = 1 if P(h) = 0
        MD[h,e] = (min(P(h|e), P(h)) - P(h)) / (min(0,1) - P(h)) otherwise
      • Certainty factor: CF[h,e] = MB[h,e] - MD[h,e]
  • 27. Belief functions - features
      • Value range: 0 ≤ MB[h,e] ≤ 1, 0 ≤ MD[h,e] ≤ 1, -1 ≤ CF[h,e] ≤ 1
      • If h is sure, i.e. P(h|e) = 1, then
        MB[h,e] = (1 - P(h)) / (1 - P(h)) = 1, MD[h,e] = 0, CF[h,e] = 1
      • If the negation of h is sure, i.e. P(h|e) = 0, then
        MB[h,e] = 0, MD[h,e] = (0 - P(h)) / (0 - P(h)) = 1, CF[h,e] = -1
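The definitions of MB, MD and CF, and the boundary cases listed above, can be checked with a few lines of code. This is a sketch of the formulas as stated on the slides, not of MYCIN's actual implementation; the prior 0.3 and posterior 0.65 are hypothetical values.

```python
def mb(p_h, p_h_given_e):
    """Measure of belief MB[h,e]: relative increase of belief in h."""
    if p_h == 1.0:
        return 1.0
    return (max(p_h_given_e, p_h) - p_h) / (1.0 - p_h)

def md(p_h, p_h_given_e):
    """Measure of disbelief MD[h,e]: relative decrease of belief in h."""
    if p_h == 0.0:
        return 1.0
    return (min(p_h_given_e, p_h) - p_h) / (0.0 - p_h)

def cf(p_h, p_h_given_e):
    """Certainty factor CF[h,e] = MB[h,e] - MD[h,e], in [-1, 1]."""
    return mb(p_h, p_h_given_e) - md(p_h, p_h_given_e)

# Hypothetical prior P(h) = 0.3
print(cf(0.3, 1.0))    # h is sure
print(cf(0.3, 0.0))    # the negation of h is sure
print(cf(0.3, 0.3))    # the evidence changes nothing
print(cf(0.3, 0.65))   # the evidence partly supports h
```

Evidence that raises P(h|e) above the prior contributes only to MB, and evidence that lowers it contributes only to MD, so at most one of the two is non-zero for a single piece of evidence.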
  • 28. Example in MYCIN
      • if (1) the type of the organism is gram-positive, and
           (2) the morphology of the organism is coccus, and
           (3) the growth of the organism is chain
        then there is strong evidence (0.7) that the identity of the organism is streptococcus
      Example of facts in MYCIN:
      • (identity organism-1 pseudomonas 0.8)
      • (identity organism-2 e.coli 0.15)
      • (morphology organism-2 coccus 1.0)
  • 29. 3.2 Combining belief functions
      (1) Incremental gathering of evidence
      • The same attribute value, h, is obtained by two separate paths of inference, with two separate CFs: CF[h,s1] and CF[h,s2]
      • The two paths, corresponding to hypotheses s1 and s2, may be different branches of the search tree
      • CF[h, s1&s2] = CF[h,s1] + CF[h,s2] - CF[h,s1] * CF[h,s2]
      • (identity organism-1 pseudomonas 0.8)
  • 30. Combining belief functions
      (2) Conjunction of hypotheses
      • Applied when computing the CF associated with the premises of a rule that has several conditions
        if A = a1 and B = b1 then …
        WM: (A a1 h1 cf1) (B b1 h2 cf2)
      • CF[h1&h2, s] = min(CF[h1,s], CF[h2,s])
  • 31. Combining belief functions
      (3) Combining beliefs
      • An uncertain value is deduced by a rule whose input conditions are themselves uncertain values (obtained, for example, by applying other rules)
      • Allows the CF of the fact deduced by the rule to be computed from the rule's CF and the CF of the hypotheses
      • CF[s,e]: belief in a hypothesis s based on previous evidence e
      • CF[h,s]: CF in h if s is sure
      • CF'[h,s] = CF[h,s] * CF[s,e]
  • 32. Combining belief functions
      (3) Combining beliefs - cont
      if A = a1 and B = b1 then C = c1 (rule CF 0.7)
      WM: (A a1 0.9) (B b1 0.6)
      CF(premises) = min(0.9, 0.6) = 0.6
      CF(conclusion) = CF(premises) * CF(rule) = 0.6 * 0.7 = 0.42
      WM: (C c1 0.42)
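The three combination schemes above reduce to three one-line functions. This is a minimal sketch using only the formulas given on the slides (the incremental formula is applied here to non-negative CFs, which is the only case the slides show); the function names are illustrative, and the second CF passed to `cf_incremental` is a hypothetical value.

```python
def cf_incremental(cf1, cf2):
    """Same conclusion reached by two separate inference paths."""
    return cf1 + cf2 - cf1 * cf2

def cf_conjunction(*cfs):
    """CF of a rule premise made of several conditions joined by 'and'."""
    return min(cfs)

def cf_rule(cf_premises, cf_of_rule):
    """CF of a rule's conclusion, given uncertain premises."""
    return cf_premises * cf_of_rule

# Worked example from the slide: premises (A a1 0.9), (B b1 0.6), rule CF 0.7
cf_premises = cf_conjunction(0.9, 0.6)
cf_conclusion = cf_rule(cf_premises, 0.7)
print(cf_premises, round(cf_conclusion, 2))

# Two independent paths to the same conclusion (0.5 is a hypothetical second CF)
print(round(cf_incremental(0.8, 0.5), 2))
```

Note that `cf_incremental` never exceeds 1 for inputs in [0, 1], and that adding a second supporting path can only raise the combined CF.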
  • 33. 3.3 Limits of CF
      • The CF model of MYCIN assumes that the hypotheses are sustained by independent evidence
      • An example shows what happens if this condition is violated:
        A: The sprinkler functioned last night
        U: The grass is wet in the morning
        P: Last night it rained
  • 34. Limits of CF - cont
      R1: if the sprinkler functioned last night then there is strong evidence (0.9) that the grass is wet in the morning
      R2: if the grass is wet in the morning then there is strong evidence (0.8) that it rained last night
      • CF[U,A] = 0.9
        – therefore the evidence sprinkler sustains the hypothesis wet grass with CF = 0.9
      • CF[P,U] = 0.8
        – therefore the evidence wet grass sustains the hypothesis rain with CF = 0.8
      • CF[P,A] = 0.8 * 0.9 = 0.72
        – therefore the evidence sprinkler sustains the hypothesis rain with CF = 0.72
      • This conclusion is counterintuitive: the sprinkler already explains the wet grass, so it should not increase the belief that it rained
  • 35. Traditional Logic
      • Based on predicate logic
      • Three important assumptions:
        – Predicate descriptions are sufficient with respect to the domain
        – Information is consistent
        – Knowledge base grows monotonically
  • 36. Non-monotonic Logic
      • Addresses the three assumptions of traditional logic
        – Knowledge is incomplete
          • No knowledge about p: is it true or false?
          • Prolog: closed world assumption
        – Knowledge is inconsistent
          • Based on how the world usually works
          • Most birds fly, but the ostrich does not
        – Knowledge base grows non-monotonically
          • A new observation may contradict the existing knowledge, so some existing knowledge may need to be removed
          • Inference is based on assumptions; what happens if the assumptions are later shown to be incorrect?
      • Three modal operators are introduced
  • 37. Unless Operator
      • New information may invalidate previous results
      • Implemented in TMS (Truth Maintenance Systems) to keep track of the reasoning steps and preserve KB consistency
      • Introduce the unless operator
        – Supports inferences based on the belief that its argument is not true
        – Consider:
          • p(X) unless q(X) → r(X): if p(X) is true and q(X) is not believed to be true, then r(X)
          • p(Z)
          • r(W) → s(W)
          From the above, conclude s(X). If the belief later changes, or q(X) is found to be true, r(X) and s(X) must be retracted.
        – Unless deals with belief, not truth
          • Either unknown or believed false
          • Believed or known true
        – Monotonicity
  • 38. Is-consistent-with Operator M
      • When reasoning, make sure the premises are consistent
      • Format: M p, meaning p is consistent with the KB
      • Consider
        – ∀X good_student(X) ∧ M study_hard(X) → graduates(X)
        – For every X who is a good student, if the fact that X studies hard is consistent with the KB, then X will graduate
        – It is not necessary to prove that X studies hard
      • How to decide whether p is consistent with the KB
        – Negation as failure
        – Heuristic-based and limited search
  • 39. Default Logic
      • Introduces a new format of inference rules:
        – A(Z) ∧ :B(Z) → C(Z)
        – If A(Z) is provable, and it is consistent with what we know to assume B(Z), then conclude C(Z)
      • Compared with the is-consistent-with operator
        – Similar
        – The difference is the reasoning method
          • In default logic, the new rules are used to infer sets of plausible extensions
        – Example:
          ∀X good_student(X) ∧ :study_hard(X) → graduates(X)
          ∀Y party(Y) ∧ :not(study_hard(Y)) → not(graduates(Y))
  • 40. Fuzzy Sets
      • Classic sets
        – Completeness: x is in either A or ¬A
        – Exclusiveness: x cannot be in both A and ¬A
      • Fuzzy sets
        – Violate the two assumptions
        – Possibility theory: a measure of confidence or belief
        – Probability theory: randomness
        – Process imprecision
        – Introduce a membership function
        – Believe x ∈ A to some degree between 0 and 1, inclusive
  • 41. Figure: the fuzzy set representation for “small integers.”
  • 42. Figure: a fuzzy set representation for the sets short, medium, and tall males.
  • 43. Fuzzy Set Operations
      • Fuzzy set operations are defined as operations on membership functions
      • Complement: ¬A = C
        – mC = 1 - mA
      • Union: A ∪ B = C
        – mC = max(mA, mB)
      • Intersection: A ∩ B = C
        – mC = min(mA, mB)
      • Difference: A - B = C
        – mC = max(0, mA - mB)
  • 44. Fuzzy Inference Rules
      • Rule format and computation
        – If x is A and y is B then z is C: mC(z) = min(mA(x), mB(y))
        – If x is A or y is B then z is C: mC(z) = max(mA(x), mB(y))
        – If x is not A then z is C: mC(z) = 1 - mA(x)
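The set operations and rule computations on the two slides above all reduce to `min`, `max` and `1 - m` applied to membership degrees. The sketch below represents a discrete fuzzy set as a dictionary from element to membership degree; the sets `small` and `medium` and their degrees are hypothetical.

```python
def complement(a):
    """mC = 1 - mA (over the elements listed in a)."""
    return {x: 1.0 - m for x, m in a.items()}

def union(a, b):
    """mC = max(mA, mB); a missing element has membership 0."""
    return {x: max(a.get(x, 0.0), b.get(x, 0.0)) for x in a.keys() | b.keys()}

def intersection(a, b):
    """mC = min(mA, mB); a missing element has membership 0."""
    return {x: min(a.get(x, 0.0), b.get(x, 0.0)) for x in a.keys() | b.keys()}

def difference(a, b):
    """mC = max(0, mA - mB)."""
    return {x: max(0.0, a.get(x, 0.0) - b.get(x, 0.0)) for x in a.keys() | b.keys()}

# Hypothetical fuzzy sets over small integers
small = {1: 1.0, 2: 0.9, 3: 0.6, 4: 0.3}
medium = {3: 0.4, 4: 0.8, 5: 1.0}

print(union(small, medium)[3])         # max(0.6, 0.4)
print(intersection(small, medium)[3])  # min(0.6, 0.4)
print(difference(small, medium)[4])    # max(0, 0.3 - 0.8)
```

Unlike classic sets, an element and its complement can both have non-zero membership here: 3 belongs to `small` with degree 0.6 and to its complement with degree 0.4.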
  • 45. Figure: the fuzzy regions for the input values θ (a) and dθ/dt (b). N: Negative, Z: Zero, P: Positive
  • 46. Figure: the fuzzy regions of the output value u, indicating the movement of the pendulum base: Negative Big, Negative, Zero, Positive, Positive Big.
  • 47. Figure: the fuzzification of the input measures
      X1 = 1: mZ(X1) = mP(X1) = 0.5, mN(X1) = 0
      X2 = -4: mZ(X2) = 0.2, mN(X2) = 0.8, mP(X2) = 0
  • 48. Figure: the Fuzzy Associative Matrix (FAM) for the pendulum problem. The input values are on the left and top. Fuzzy rules: (shown in the figure)
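Given the fuzzified inputs quoted on the slides (X1 = 1 and X2 = -4), the strength with which each "X1 is A and X2 is B" rule fires follows from the `min` rule. The sketch below computes only these firing strengths; which output region each rule selects is defined by the FAM figure and is not reproduced here.

```python
from itertools import product

# Fuzzified inputs from the slides: memberships in Negative, Zero, Positive
X1 = {"N": 0.0, "Z": 0.5, "P": 0.5}   # X1 = 1
X2 = {"N": 0.8, "Z": 0.2, "P": 0.0}   # X2 = -4

def firing_strengths(m_x1, m_x2):
    """Strength of every rule 'if X1 is a and X2 is b': min of the two memberships."""
    return {(a, b): min(m_x1[a], m_x2[b]) for a, b in product(m_x1, m_x2)}

strengths = firing_strengths(X1, X2)
# Only rules with a non-zero strength contribute to the output
active = {rule: s for rule, s in strengths.items() if s > 0}
for (a, b), s in sorted(active.items()):
    print(f"X1 is {a} and X2 is {b}: {s}")
```

Four of the nine possible rules fire for these inputs. Their consequents, clipped at these strengths, are what get combined and defuzzified in the following slide.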
  • 49. Figure: the fuzzy consequents (a) and their union (b). The centroid of the union (-2) is the crisp output.
  • 50. Dempster-Shafer Theory
      • Limitations of probability theory
        – Assigns a single number to measure any situation, no matter how complex it is
        – Cannot deal with missing evidence, heuristics, and limited knowledge
      • Dempster-Shafer theory
        – Extends probability theory
        – Considers a set of propositions as a whole
        – Assigns a set of propositions an interval [belief, plausibility] that constrains the degree of belief for each individual proposition in the set
        – The belief measure bel is in [0,1]
          • 0: no supporting evidence for a set of propositions
          • 1: full supporting evidence for a set of propositions
        – The plausibility of p
          • pl(p) = 1 - bel(not(p))
          • Reflects how evidence for not(p) relates to the possibility of belief in p
          • bel(not(p)) = 1: full support for not(p), no possibility for p
          • bel(not(p)) = 0: no support for not(p), full possibility for p
          • The range is also [0,1]
  • 51. Properties of Dempster-Shafer
      • Initially, there is no supporting evidence for either of two competing hypotheses, say h1 and h2
        – Dempster-Shafer: [bel, pl] = [0, 1]
        – Probability theory: p(h1) = p(h2) = 0.5
      • Dempster-Shafer belief functions satisfy weaker axioms than probability functions
      • Two fundamental ideas:
        – Obtaining belief degrees for one question from subjective probabilities for related questions
        – Using Dempster's rule to combine these belief degrees when they are based on independent evidence
  • 52. An Example
      • Two persons, M and B, each with a known reliability, inspect a computer and report their findings independently. How much should you believe their claims?
      • Question (Q): the detection claim
      • Related question (RQ): the detectors' reliability
      • Dempster-Shafer approach
        – Obtain belief degrees for Q from subjective (prior) probabilities for RQ for each person
        – Combine the belief degrees from the two persons
      • Person M:
        – Reliability 0.9, unreliability 0.1
        – Claims h1
        – Belief degree of h1 is bel(h1) = 0.9
        – Belief degree of not(h1) is bel(not(h1)) = 0.0; this differs from probability theory, since there is no evidence supporting not(h1)
        – pl(h1) = 1 - bel(not(h1)) = 1 - 0 = 1
        – Thus the belief measure for M's claim h1 is [0.9, 1]
      • Person B:
        – Reliability 0.8, unreliability 0.2
        – Claims h2
        – bel(h2) = 0.8, bel(not(h2)) = 0, pl(h2) = 1 - bel(not(h2)) = 1 - 0 = 1
        – Thus the belief measure for B's claim h2 is [0.8, 1]
  • 53. Combining Belief Measure
      • Set of propositions: M claims h1 and B claims h2
        – Case 1: h1 = h2
          • Both M and B reliable: 0.9 × 0.8 = 0.72
          • Both M and B unreliable: 0.1 × 0.2 = 0.02
          • Probability that at least one of the two is reliable: 1 - 0.02 = 0.98
          • Belief measure for h1 = h2 is [0.98, 1]
        – Case 2: h1 = not(h2)
          • They cannot both be correct and reliable
          • At least one is unreliable
            – Reliable M and unreliable B: 0.9 × (1 - 0.8) = 0.18
            – Reliable B and unreliable M: 0.8 × (1 - 0.9) = 0.08
            – Unreliable M and B: (1 - 0.9) × (1 - 0.8) = 0.02
            – At least one is unreliable: 0.18 + 0.08 + 0.02 = 0.28
          • Given that at least one is unreliable, the posterior probabilities are
            – Reliable M and unreliable B: 0.18 / 0.28 = 0.643
            – Reliable B and unreliable M: 0.08 / 0.28 = 0.286
          • Belief measure for h1
            – bel(h1) = 0.643, bel(not(h1)) = bel(h2) = 0.286
            – pl(h1) = 1 - bel(not(h1)) = 1 - 0.286 = 0.714
            – Belief measure: [0.643, 0.714]
          • Belief measure for h2
            – bel(h2) = 0.286, bel(not(h2)) = bel(h1) = 0.643
            – pl(h2) = 1 - bel(not(h2)) = 1 - 0.643 = 0.357
            – Belief measure: [0.286, 0.357]
  • 54. Dempster's Rule
      • Assumption:
        – The probable questions are independent a priori
        – As new evidence is collected and conflicts arise, the independence may disappear
      • Two steps
        1. Sort the uncertainties into a priori independent pieces of evidence
        2. Carry out Dempster's rule
      • Consider the previous example
        – After M and B made their claims, a repair person is called to check the computer, and both M and B witness this
        – Three independent items of evidence must be combined
      • Not all evidence directly supports individual elements of a set of hypotheses; it often supports different subsets of hypotheses, in favor of some and against others
  • 55. General Dempster's Rule
    • Q – an exhaustive set of mutually exclusive hypotheses
    • Z – a subset of Q
    • M – probability density function that assigns a belief measure to Z
    • M_n(Z) – belief degree assigned to Z, where n is the number of sources of evidence
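The combination step can be sketched in Python using the standard form of Dempster's rule of combination: multiply the masses of every pair of subsets, credit the product to their intersection, discard products whose intersection is empty, and renormalize by one minus the discarded (conflicting) mass. This is a minimal sketch and not necessarily the exact formula shown on the slide; the names `combine`, `bel`, and `pl` and the dictionary-of-frozensets representation are assumptions made for illustration.

```python
from itertools import product


def combine(m1, m2):
    """Combine two mass functions given as {frozenset_of_hypotheses: mass}."""
    raw = {}
    conflict = 0.0
    for (x, mass_x), (y, mass_y) in product(m1.items(), m2.items()):
        z = x & y
        if z:
            raw[z] = raw.get(z, 0.0) + mass_x * mass_y
        else:
            conflict += mass_x * mass_y  # contradictory pair of subsets
    if conflict >= 1.0:
        raise ValueError("the two sources are in total conflict")
    return {z: mass / (1.0 - conflict) for z, mass in raw.items()}


def bel(m, hyp):
    """Belief: total mass of subsets contained in hyp."""
    return sum(mass for z, mass in m.items() if z <= hyp)


def pl(m, hyp):
    """Plausibility: total mass of subsets that intersect hyp."""
    return sum(mass for z, mass in m.items() if z & hyp)


# Case 2 of the M and B example: h1 and h2 are mutually exclusive.
Q = frozenset({"h1", "h2"})
m_M = {frozenset({"h1"}): 0.9, Q: 0.1}  # M's unreliability mass goes to all of Q
m_B = {frozenset({"h2"}): 0.8, Q: 0.2}
m = combine(m_M, m_B)

h1 = frozenset({"h1"})
print(bel(m, h1), pl(m, h1))
```

With the reliabilities from the example, the combined interval for h1 agrees with the case 2 figures computed by hand (bel about 0.643, pl about 0.714).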
  • 56. Discrete Markov Process
    • Finite state machine
      – A graphical representation
      – State transition depends on the input stream
      – States and transitions reflect properties of a formal language
    • Probabilistic finite state machine
      – A finite state machine
      – Transition function represented by a probability distribution on the current state
    • Discrete Markov process (chain, machine)
      – A specialization of the probabilistic finite state machine
      – Ignores its input values
  • 57. A Markov state machine or Markov chain with four states, s1, ..., s4
    • At any time the system is in one of a set of distinct states
    • At each step the system changes state or remains in the same state
    • Divide time into discrete intervals: t1, t2, …, tn
    • The state changes according to the probability distribution of the current state
    • S(t) – the actual state at time t
    • In general the next state may depend on the whole history: p(S(t) | S(t-1), S(t-2), S(t-3), …)
    • First-order Markov chain
      – Depends only on the direct predecessor state
      – p(S(t) | S(t-1), S(t-2), S(t-3), …) = p(S(t) | S(t-1))
  • 58. Observable Markov Model
    • Assume p(S(t) | S(t-1)) is time invariant, that is, transitions between specific states keep the same probabilistic relationship
    • State transition probability a_ij between s_i and s_j:
      – a_ij = p(S(t) = s_i | S(t-1) = s_j), 1 <= i, j <= N
      – If i = j, no transition (remain in the same state)
      – Properties: a_ij >= 0, Σ_i a_ij = 1
  • 59. Weather example
    • S1 – sun
    • S2 – cloudy
    • S3 – fog
    • S4 – precipitation
    • Time intervals: noon to noon
    • Question: suppose that today is sunny. What is the probability that the next five days are sunny, sunny, cloudy, cloudy, precipitation?
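Under the first-order assumption, the probability of a given state sequence is the product of the transition probabilities along it. The sketch below shows the computation in Python. The transition matrix for this example is not reproduced in the transcript, so the values in `A` are placeholders chosen only so that each row sums to 1; they are not the slide's numbers. The sketch also indexes rows by the current state, so `A[prev][nxt]` is the probability of moving from `prev` to `nxt`.

```python
# States: 0 = sun, 1 = cloudy, 2 = fog, 3 = precipitation.
# PLACEHOLDER transition matrix (assumption): A[prev][nxt] = P(next = nxt | current = prev).
A = [
    [0.4, 0.3, 0.2, 0.1],
    [0.2, 0.3, 0.2, 0.3],
    [0.1, 0.3, 0.3, 0.3],
    [0.2, 0.3, 0.3, 0.2],
]


def sequence_probability(A, start, sequence):
    """Probability of visiting `sequence` in order, given the chain starts in `start`."""
    prob = 1.0
    prev = start
    for nxt in sequence:
        prob *= A[prev][nxt]  # one transition per day
        prev = nxt
    return prob


# Today is sunny; next five days: sunny, sunny, cloudy, cloudy, precipitation.
p = sequence_probability(A, start=0, sequence=[0, 0, 1, 1, 3])
print(p)
```

The answer is a product of five factors, one per day-to-day transition; today's state is given, so it contributes no factor of its own.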
  • 60. Restrictiveness of Markov models
    • Are past and future really independent given the current state?
    • E.g., suppose that when it rains, it rains for at most 2 days
      S1 S2 S3 S4 …
    • Second-order Markov process
    • Workaround: change the meaning of "state" to the events of the last 2 days
      (S1, S2) (S2, S3) (S3, S4) (S4, S5) …
    • Another approach: add more information to the state
    • E.g., the full state of the world would include whether the sky is full of water
      – Additional information may not be observable
      – Blowup of the number of states…
  • 61. Hidden Markov models (HMMs)
    • Same as a Markov model, except that we cannot see the state
    • Instead, we only see an observation each period, which depends on the current state
      – Hidden states: S1, S2, S3, …, St, …
      – Observations: O1, O2, O3, …, Ot, …
    • Still need a transition model: P(S_{t+1} = j | S_t = i) = a_ij
    • Also need an observation model: P(O_t = k | S_t = i) = b_ik
  • 62. Weather example extended to HMM
    • States: s, c, r
    • Transition probabilities (edge labels of the state diagram, in extraction order): .1 .2 .6 .3 .4 .3 .3 .5 .3
    • Observation: labmate wet or dry
    • b_sw = .1, b_cw = .3, b_rw = .8
  • 63. HMM weather example: a question
    • You have been stuck in the lab for three days (!)
    • On those days, your labmate was dry, wet, wet, respectively
    • What is the probability that it is now raining outside?
    • P(S2 = r | O0 = d, O1 = w, O2 = w)
    • By Bayes' rule, we really want to know P(S2, O0 = d, O1 = w, O2 = w)
  • 64. Solving the question
    • Computationally efficient approach: first compute P(S1 = i, O0 = d, O1 = w) for all states i
    • General case: solve for P(S_t, O_0 = o_0, O_1 = o_1, …, O_t = o_t) for t = 1, then t = 2, … This is called monitoring
    • P(S_t, O_0 = o_0, O_1 = o_1, …, O_t = o_t) = Σ_{s_{t-1}} P(S_{t-1} = s_{t-1}, O_0 = o_0, O_1 = o_1, …, O_{t-1} = o_{t-1}) P(S_t | S_{t-1} = s_{t-1}) P(O_t = o_t | S_t)
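The monitoring recursion can be sketched in Python as follows. Several things here are assumptions rather than facts from the slides: the assignment of the nine edge labels to specific transitions in `TRANS` (one reading in which every row sums to 1), the uniform prior `PRIOR` over the state on day 0 (the slides do not give one), and the reading of s, c, r as sunny, cloudy, rainy. The wet probabilities are the slide's b_sw, b_cw, b_rw, and dry is taken as the complement of wet.

```python
STATES = ("s", "c", "r")

# ASSUMED assignment of the diagram's edge labels: TRANS[i][j] = P(S_{t+1} = j | S_t = i).
TRANS = {
    "s": {"s": 0.6, "c": 0.3, "r": 0.1},
    "c": {"s": 0.4, "c": 0.3, "r": 0.3},
    "r": {"s": 0.2, "c": 0.3, "r": 0.5},
}
P_WET = {"s": 0.1, "c": 0.3, "r": 0.8}      # b_sw, b_cw, b_rw from the slide
PRIOR = {i: 1.0 / len(STATES) for i in STATES}  # ASSUMED uniform prior on day 0


def emit(state, obs):
    """P(O_t = obs | S_t = state) for obs in {"w", "d"}."""
    return P_WET[state] if obs == "w" else 1.0 - P_WET[state]


def forward(observations):
    """Return alpha[i] = P(S_t = i, O_0 = o_0, ..., O_t = o_t) for the last t."""
    alpha = {i: PRIOR[i] * emit(i, observations[0]) for i in STATES}
    for obs in observations[1:]:
        alpha = {
            j: emit(j, obs) * sum(alpha[i] * TRANS[i][j] for i in STATES)
            for j in STATES
        }
    return alpha


def normalize(dist):
    """Turn joint probabilities into a conditional distribution."""
    total = sum(dist.values())
    return {k: v / total for k, v in dist.items()}


# Labmate was dry, wet, wet: distribution over today's weather.
posterior = normalize(forward(["d", "w", "w"]))
print(posterior)
```

Each step reuses the previous joint probabilities, so the cost grows linearly with the number of days rather than exponentially with the number of possible state sequences.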
  • 65. Predicting further out
    • You have been stuck in the lab for three days
    • On those days, your labmate was dry, wet, wet, respectively
    • What is the probability that two days from now it will be raining outside?
    • P(S4 = r | O0 = d, O1 = w, O2 = w)
  • 66. Predicting further out, continued…
    • Want to know: P(S4 = r | O0 = d, O1 = w, O2 = w)
    • Already know how to get: P(S2 | O0 = d, O1 = w, O2 = w)
    • P(S3 = r | O0 = d, O1 = w, O2 = w)
      = Σ_{s2} P(S3 = r, S2 = s2 | O0 = d, O1 = w, O2 = w)
      = Σ_{s2} P(S3 = r | S2 = s2) P(S2 = s2 | O0 = d, O1 = w, O2 = w)
    • Etc. for S4
    • So: monitoring first, then straightforward Markov process updates
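The prediction step is a plain Markov update applied to the monitored distribution, once per future day. A minimal Python sketch follows. The transition table `TRANS` is an assumed reading of the diagram's edge labels, and `filtered` holds illustrative placeholder numbers standing in for P(S2 | O0 = d, O1 = w, O2 = w), not values from the slides.

```python
STATES = ("s", "c", "r")

# ASSUMED assignment of the diagram's edge labels: TRANS[i][j] = P(S_{t+1} = j | S_t = i).
TRANS = {
    "s": {"s": 0.6, "c": 0.3, "r": 0.1},
    "c": {"s": 0.4, "c": 0.3, "r": 0.3},
    "r": {"s": 0.2, "c": 0.3, "r": 0.5},
}


def predict(dist, steps):
    """Push a distribution over today's state `steps` days into the future."""
    for _ in range(steps):
        dist = {
            j: sum(dist[i] * TRANS[i][j] for i in STATES)
            for j in STATES
        }
    return dist


# PLACEHOLDER monitored distribution for S2 (assumption, for illustration only).
filtered = {"s": 0.1, "c": 0.2, "r": 0.7}

two_days_out = predict(filtered, steps=2)  # distribution over S4
print(two_days_out["r"])
```

No observation factor appears in the update because no observations are available for the future days.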
  • 67. Integrating newer information
    • You have been stuck in the lab for four days (!)
    • On those days, your labmate was dry, wet, wet, dry, respectively
    • What is the probability that two days ago it was raining outside?
    • P(S1 = r | O0 = d, O1 = w, O2 = w, O3 = d)
      – Smoothing or hindsight problem
  • 68. Hindsight problem continued…
    • Want: P(S1 = r | O0 = d, O1 = w, O2 = w, O3 = d)
    • "Partial" application of Bayes' rule:
      P(S1 = r | O0 = d, O1 = w, O2 = w, O3 = d) = P(S1 = r, O2 = w, O3 = d | O0 = d, O1 = w) / P(O2 = w, O3 = d | O0 = d, O1 = w)
    • So we really want to know P(S1, O2 = w, O3 = d | O0 = d, O1 = w)
  • 69. Hindsight problem continued…
    • Want to know P(S1 = r, O2 = w, O3 = d | O0 = d, O1 = w)
    • P(S1 = r, O2 = w, O3 = d | O0 = d, O1 = w) = P(S1 = r | O0 = d, O1 = w) P(O2 = w, O3 = d | S1 = r)
    • Already know how to compute P(S1 = r | O0 = d, O1 = w)
    • Just need to compute P(O2 = w, O3 = d | S1 = r)
  • 70. Hindsight problem continued…
    • Just need to compute P(O2 = w, O3 = d | S1 = r)
    • P(O2 = w, O3 = d | S1 = r)
      = Σ_{s2} P(S2 = s2, O2 = w, O3 = d | S1 = r)
      = Σ_{s2} P(S2 = s2 | S1 = r) P(O2 = w | S2 = s2) P(O3 = d | S2 = s2)
    • The first two factors are directly in the model; the last factor is a "smaller" problem of the same kind
    • Use dynamic programming, backwards from the future
      – Similar to the forwards approach from the past
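The hindsight computation can be sketched in Python by pairing a forward pass over the observations up to the day of interest with a backward pass over the later ones. As before, the transition table `TRANS` is an assumed reading of the diagram's edge labels and the uniform `PRIOR` on day 0 is an assumption; the function names are illustrative.

```python
STATES = ("s", "c", "r")

# ASSUMED assignment of the diagram's edge labels: TRANS[i][j] = P(S_{t+1} = j | S_t = i).
TRANS = {
    "s": {"s": 0.6, "c": 0.3, "r": 0.1},
    "c": {"s": 0.4, "c": 0.3, "r": 0.3},
    "r": {"s": 0.2, "c": 0.3, "r": 0.5},
}
P_WET = {"s": 0.1, "c": 0.3, "r": 0.8}      # b_sw, b_cw, b_rw from the slide
PRIOR = {i: 1.0 / len(STATES) for i in STATES}  # ASSUMED uniform prior on day 0


def emit(state, obs):
    """P(O_t = obs | S_t = state) for obs in {"w", "d"}."""
    return P_WET[state] if obs == "w" else 1.0 - P_WET[state]


def forward(observations):
    """alpha[i] = P(S_t = i, O_0..O_t) for the last observed day t."""
    alpha = {i: PRIOR[i] * emit(i, observations[0]) for i in STATES}
    for obs in observations[1:]:
        alpha = {
            j: emit(j, obs) * sum(alpha[i] * TRANS[i][j] for i in STATES)
            for j in STATES
        }
    return alpha


def backward(future_obs):
    """beta[i] = P(future_obs | current state = i), built from the last day back."""
    beta = {i: 1.0 for i in STATES}
    for obs in reversed(future_obs):
        beta = {
            i: sum(TRANS[i][j] * emit(j, obs) * beta[j] for j in STATES)
            for i in STATES
        }
    return beta


def smooth(observations, k):
    """P(S_k | all observations), combining past (alpha) and future (beta)."""
    alpha = forward(observations[: k + 1])
    beta = backward(observations[k + 1:])
    joint = {i: alpha[i] * beta[i] for i in STATES}
    total = sum(joint.values())
    return {i: joint[i] / total for i in STATES}


obs = ["d", "w", "w", "d"]
hindsight = smooth(obs, k=1)  # distribution over S1 given all four days
print(hindsight["r"])
```

A useful sanity check is that the sum over states of alpha times beta gives the same total likelihood of the observations no matter which day k is chosen.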