1. Quantifying Uncertainty
Team Members:
1. Ahmed Talaat
2. Eman Mostafa
3. Haron Shihab
4. Hesham Hamdy
Cairo University
Faculty of Engineering
4th year Computer Department
2. Acting under uncertainty
Basic probability notation
Inference using the full joint distribution
Independence
Bayes' rule
The Wumpus world
4. Why handle uncertainty?
In logical agents, we have to consider all possible explanations for our observations.
This leads to very large and complex belief-state representations.
Probability theory provides a solution.
5. Uncertainty arises because of laziness and ignorance.
Our main tool for dealing with degrees of belief is probability theory.
Probability statements are made with respect to a knowledge state, not with respect to the real world.
6. For the agent to make good choices, it must have preferences between the different possible outcomes.
Such choices are based on the agent's utilities for those outcomes.
7. The combination of probability theory and utility theory is called decision theory:
Decision theory = probability + utility
9. Here is some probability notation that we are going to use:
Sample space: the set of all possible worlds.
The uppercase Greek letter Ω represents the sample space.
The lowercase ω represents an element of the sample space (a possible world).
10. Unconditional probabilities (priors)
When rolling two dice, assume each die is fair and the rolls don't interfere with each other.
The set of possible worlds is
(1,1), (1,2), (1,3), …, (2,1), (2,2), …, (6,5), (6,6)
and each possible world ω has P(ω) = 1/36.
11. Conditional probability
The probability of an event occurring, given that another event (called the evidence) has been observed.
For example, the first die may already be showing 5 while we wait for the second die to settle.
In that case we are interested in the probability of the roll given that the first die shows 5:
P(Dice | Die1 = 5)
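As a quick worked example (a minimal sketch; the helper names `p` and `p_given` are ours, not from the slides), we can enumerate the 36 possible worlds for two dice and compute both a prior and a conditional probability:

```python
# Enumerate the sample space for two fair dice: 36 equally likely worlds.
from fractions import Fraction

omega = [(d1, d2) for d1 in range(1, 7) for d2 in range(1, 7)]

def p(event):
    # P(event) = sum of P(w) over the worlds where the event holds;
    # here every world has P(w) = 1/36.
    return Fraction(sum(1 for w in omega if event(w)), len(omega))

def p_given(event, evidence):
    # P(event | evidence) = P(event AND evidence) / P(evidence)
    return p(lambda w: event(w) and evidence(w)) / p(evidence)

print(p(lambda w: w == (5, 5)))                # 1/36 (a single world)
print(p_given(lambda w: w[0] == w[1],          # doubles...
              lambda w: w[0] == 5))            # ...given Die1 = 5 -> 1/6
```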
12. Semantics of a proposition
The probability model is completely determined by the joint distribution for all the random variables: the full joint probability distribution.
For the Cavity, Toothache, Weather domain, the notation is
P(Cavity, Toothache, Weather)
which can be represented as a 2 × 2 × 4 table.
Given the definition of the probability of a proposition as a sum over possible worlds, the full joint distribution allows us to calculate the probability of any proposition over its variables by summing entries in the FJD.
13. The basic axioms of probability:
1. 0 ≤ P(ω) ≤ 1 for every possible world ω
2. Σ_{ω∈Ω} P(ω) = 1
From these we can prove that P(¬a) = 1 − P(a):
P(¬a) = Σ_{ω∈¬a} P(ω)
      = Σ_{ω∈¬a} P(ω) + Σ_{ω∈a} P(ω) − Σ_{ω∈a} P(ω)
      = Σ_{ω∈Ω} P(ω) − Σ_{ω∈a} P(ω)
      = 1 − P(a)
14. Inclusion–exclusion principle:
P(a ∨ b) = P(a) + P(b) − P(a ∧ b)
Kolmogorov's axioms: Andrei Kolmogorov showed how to build up the rest of probability theory from such basic axioms.
For example, by these rules a logical agent cannot simultaneously believe a, b, and ¬(a ∧ b), because there is no possible world in which all three are true.
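The inclusion–exclusion principle is easy to check numerically; here is a small sketch using the two-dice sample space from before (the event definitions are ours):

```python
# Verify P(a v b) = P(a) + P(b) - P(a ^ b) on the two-dice sample space.
from fractions import Fraction

omega = [(d1, d2) for d1 in range(1, 7) for d2 in range(1, 7)]

def p(event):
    return Fraction(sum(1 for w in omega if event(w)), len(omega))

a = lambda w: w[0] == 5            # Die1 shows 5
b = lambda w: w[1] == 5            # Die2 shows 5
lhs = p(lambda w: a(w) or b(w))
rhs = p(a) + p(b) - p(lambda w: a(w) and b(w))
print(lhs == rhs, lhs)             # True 11/36
```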
17. Agent 2 is always the winner
If Agent 1 holds the inconsistent beliefs P(a) = 0.4, P(b) = 0.3, P(a∨b) = 0.8 (the axioms would require P(a∧b) = −0.1), Agent 2 can choose a set of bets that Agent 1's beliefs commit it to accept, yet which guarantee Agent 1 a loss in every possible world (de Finetti's argument):

Agent 1           Agent 2             Payoffs to Agent 1
Prop.   Belief    Bet       Stakes    a,b    a,¬b   ¬a,b   ¬a,¬b
a       0.4       a         4 to 6    -6     -6     4      4
b       0.3       b         3 to 7    -7     3      -7     3
a∨b     0.8       ¬(a∨b)    2 to 8    2      2      2      -8
Total                                 -11    -1     -1     -1
19. A full joint distribution for the Toothache, Cavity, Catch world:

              toothache              ¬toothache
           catch     ¬catch       catch     ¬catch
cavity     0.108     0.012        0.072     0.008
¬cavity    0.016     0.064        0.144     0.576

The probabilities in the joint distribution sum to 1.
P(cavity ∨ toothache) = 0.108 + 0.012 + 0.072 + 0.008 + 0.016 + 0.064 = 0.28
P(cavity) = 0.108 + 0.012 + 0.072 + 0.008 = 0.2
20. Using the full joint distribution for the Toothache, Cavity, Catch world above:
P(cavity) = 0.108 + 0.012 + 0.072 + 0.008 = 0.2
This is called marginalization or summing out: we sum up the probabilities for each possible value of the other variables.
21. Marginalization rule: for any sets of variables Y and Z,
P(Y) = Σ_{z∈Z} P(Y, z)
where Σ_{z∈Z} means summing over all possible combinations of values of the set of variables Z.
We often abbreviate this as Σ_z, leaving Z implicit. For example:
P(Cavity) = Σ_{z∈{Catch, Toothache}} P(Cavity, z)
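A minimal Python sketch of marginalization over this table (the tuple ordering Cavity, Toothache, Catch is our own encoding of the slide's table):

```python
# Full joint distribution, keyed by (cavity, toothache, catch).
joint = {
    (True,  True,  True):  0.108, (True,  True,  False): 0.012,
    (True,  False, True):  0.072, (True,  False, False): 0.008,
    (False, True,  True):  0.016, (False, True,  False): 0.064,
    (False, False, True):  0.144, (False, False, False): 0.576,
}

def p_cavity(cavity):
    # Sum out Toothache and Catch: P(Cavity) = sum_z P(Cavity, z)
    return sum(pr for (cav, _, _), pr in joint.items() if cav == cavity)

print(p_cavity(True))   # 0.2
```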
22. A variant of this rule, called conditioning, involves conditional probabilities instead of joint probabilities, using the product rule:
P(Y) = Σ_z P(Y | z) P(z)
23. From the full joint distribution for the Toothache, Cavity, Catch world above:
P(cavity | toothache) = P(cavity ∧ toothache) / P(toothache)
= (0.108 + 0.012) / (0.108 + 0.012 + 0.016 + 0.064)
= 0.6
24. Similarly:
P(¬cavity | toothache) = P(¬cavity ∧ toothache) / P(toothache)
= (0.016 + 0.064) / (0.108 + 0.012 + 0.016 + 0.064)
= 0.4
26. Using the normalization constant α with the same full joint distribution:
P(Cavity | toothache) = α P(Cavity, toothache)
= α [P(Cavity, toothache, catch) + P(Cavity, toothache, ¬catch)]
= α [⟨0.108, 0.016⟩ + ⟨0.012, 0.064⟩]
= α ⟨0.12, 0.08⟩ = ⟨0.6, 0.4⟩
27. We begin with the case in which the query involves a single variable X (Cavity). Let E be the list of evidence variables (just Toothache), let e be the list of observed values for them, and let Y be the list of remaining unobserved variables (just Catch). The query is P(X | e) and can be evaluated as:
P(X | e) = α P(X, e) = α Σ_y P(X, e, y)
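This procedure fits in a few lines of Python; a sketch (the function name `query_cavity` is ours) that reproduces the ⟨0.6, 0.4⟩ result of the previous slide:

```python
# P(Cavity | toothache) = alpha * sum_catch P(Cavity, toothache, catch)
joint = {
    (True,  True,  True):  0.108, (True,  True,  False): 0.012,
    (True,  False, True):  0.072, (True,  False, False): 0.008,
    (False, True,  True):  0.016, (False, True,  False): 0.064,
    (False, False, True):  0.144, (False, False, False): 0.576,
}

def query_cavity(toothache):
    # Unnormalized distribution: sum over the hidden variable Catch
    dist = {cavity: sum(joint[(cavity, toothache, catch)]
                        for catch in (True, False))
            for cavity in (True, False)}
    alpha = 1 / sum(dist.values())       # normalization constant
    return {v: alpha * p for v, p in dist.items()}

print(query_cavity(True))   # {True: 0.6, False: 0.4}, up to float rounding
```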
28. The full joint distribution requires an input table of size O(2^n) and takes O(2^n) time to process.
In a realistic problem we could easily have n > 100, making O(2^n) impractical.
30. Add a new variable, Weather, to the Toothache, Catch, Cavity problem.
Now P(Toothache, Catch, Cavity, Weather) has 2 × 2 × 2 × 4 = 32 entries if Weather has 4 values.
Is P(Toothache, Catch, Cavity, Weather = cloudy) related to P(Toothache, Catch, Cavity)?
31. Weather has nothing to do with one's dental problems, so:
P(Toothache, Catch, Cavity, Weather) = P(Weather) P(Toothache, Catch, Cavity)
The same holds for coin flips: each flip is independent of the next.
Events a and b are independent iff
P(a | b) = P(a), or equivalently P(b | a) = P(b)
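A small sketch of why this factorization helps (note: the Weather prior values below are invented for illustration; only the dental table comes from the slides): the 32-entry joint can be assembled from an 8-entry table plus a 4-entry table.

```python
# Invented Weather prior, for illustration only.
weather_prior = {"sunny": 0.6, "rain": 0.1, "cloudy": 0.29, "snow": 0.01}

# 8-entry P(Cavity, Toothache, Catch) from the earlier slides.
dental = {
    (True,  True,  True):  0.108, (True,  True,  False): 0.012,
    (True,  False, True):  0.072, (True,  False, False): 0.008,
    (False, True,  True):  0.016, (False, True,  False): 0.064,
    (False, False, True):  0.144, (False, False, False): 0.576,
}

# Independence: P(..., Weather) = P(Weather) * P(Cavity, Toothache, Catch)
full = {key + (w,): pk * pw
        for key, pk in dental.items()
        for w, pw in weather_prior.items()}

print(len(full))                              # 32 entries
print(abs(sum(full.values()) - 1.0) < 1e-9)   # still a valid distribution
```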
35. We use this when we know the effect of some unknown cause and would like to determine the cause. Bayes' rule becomes:
P(cause | effect) = P(effect | cause) P(cause) / P(effect)
• P(effect | cause) describes the causal relationship
• P(cause | effect) describes the diagnostic relationship
36. In medical diagnosis:
The doctor knows P(symptoms | disease) and wants to derive a diagnosis, P(disease | symptoms):
P(disease | symptoms) = P(symptoms | disease) P(disease) / P(symptoms)
37. The patient has a stiff neck (the symptom); the doctor tries to relate it to meningitis:
P(stiff neck | meningitis) = 0.7
P(meningitis) = 1/50000
P(stiff neck) = 0.01
38. P(meningitis | stiff neck) = P(stiff neck | meningitis) P(meningitis) / P(stiff neck)
= (0.7 × 1/50000) / 0.01 = 0.0014
So with the given data we expect about 1 in 700 patients with neck stiffness alone to have meningitis.
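Plugging in the numbers (a one-line check of the arithmetic):

```python
# Bayes' rule with the slide's numbers
p_stiff_given_men = 0.7
p_men = 1 / 50000
p_stiff = 0.01
print(p_stiff_given_men * p_men / p_stiff)   # 0.0014, i.e. about 1 in 700
```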
39. P(x | y) = P(y | x) P(x) / P(y) = α P(y | x) P(x)
where α is the normalization constant that makes the entries of the resulting distribution sum to 1.
40. What happens when the dentist's probe catches in the aching tooth of a patient?
P(cavity | toothache ∧ catch) = α P(toothache ∧ catch | cavity) P(cavity)
This might be feasible for just 2 pieces of evidence, but with n evidence variables there are 2^n possible combinations of evidence.
41. When the probe catches in the tooth, the patient probably has a cavity, and the cavity is what causes the toothache.
So Catch and Toothache are not absolutely independent, but they are independent given the presence or absence of a cavity:
P(toothache ∧ catch | Cavity) = P(toothache | Cavity) P(catch | Cavity)
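We can verify this conditional independence directly from the full joint table (a quick numerical check of ours, not from the slides):

```python
# Check P(toothache, catch | cavity) == P(toothache | cavity) * P(catch | cavity)
joint = {
    (True,  True,  True):  0.108, (True,  True,  False): 0.012,
    (True,  False, True):  0.072, (True,  False, False): 0.008,
    (False, True,  True):  0.016, (False, True,  False): 0.064,
    (False, False, True):  0.144, (False, False, False): 0.576,
}
p_cav = sum(pr for (c, _, _), pr in joint.items() if c)           # P(cavity) = 0.2
p_t_c = joint[(True, True, True)] / p_cav                         # P(t, c | cavity)
p_t   = sum(joint[(True, True, ca)] for ca in (True, False)) / p_cav
p_c   = sum(joint[(True, to, True)] for to in (True, False)) / p_cav
print(p_t_c, p_t * p_c)   # both 0.54 (up to float rounding)
```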
46. The Wumpus world is uncertain because the
agent’s sensors give only partial information
about the world
47. How would the logical agent tackle the Wumpus world?
48. The logical agent gets stuck after finding a breeze in both [1,2] and [2,1].
There is no safe place left to explore, so it has to choose randomly!
49. A probabilistic agent instead chooses to explore the square with the highest likelihood of being safe.
We will soon see that a probabilistic agent can do much better than a logical agent.
50. Let's define some random variables:
Pi,j : true iff square [i,j] contains a pit
Bi,j : true iff square [i,j] is breezy
Facts observed so far:
b = ¬b1,1 ∧ b1,2 ∧ b2,1
known = ¬p1,1 ∧ ¬p1,2 ∧ ¬p2,1
Our goal is to answer queries like P(P1,3 | known, b):
how likely is it that [1,3] contains a pit, given the observations so far?
51. The next step is to specify the full joint probability distribution,
P(P1,1, …, P4,4, B1,1, B1,2, B2,1)
Applying the product rule, we have
P(P1,1, …, P4,4, B1,1, B1,2, B2,1) = P(B1,1, B1,2, B2,1 | P1,1, …, P4,4) P(P1,1, …, P4,4)
where P(B1,1, B1,2, B2,1 | P1,1, …, P4,4) is 1 if the breeze observations match the pit configuration (a square is breezy iff it is adjacent to a pit) and 0 otherwise.
52. Assume that each square contains a pit with probability p = 0.2, independently of the other squares. Then
P(P1,1, …, P4,4) = Π_{i,j} P(Pi,j)
and a particular configuration with exactly n pits has probability
0.2^n × 0.8^(16−n)
53. P(P1,3 | known, b) = α Σ_unknown P(P1,3, known, unknown, b)
We seem to have reached our goal, but there is a big problem:
there are 12 unknown squares, so the summation contains 2^12 = 4096 terms.
In general the summation grows exponentially with the number of squares.
54. Intuition: other squares are irrelevant.
Whether [4,4] has a pit does NOT affect whether [1,3] has one.
This intuition helps reduce the number of summation terms.
Remember that the frontier is the set of pit variables, other than the query variable, that are adjacent to visited squares.
55. P(P1,3 | known, b)
= α Σ_unknown P(P1,3, known, unknown, b)
= α Σ_unknown P(b | P1,3, known, unknown) P(P1,3, known, unknown)   (by the product rule)
= α Σ_frontier Σ_other P(b | P1,3, known, frontier, other) P(P1,3, known, frontier, other)
Since b is independent of other given known, P1,3, and frontier:
= α Σ_frontier Σ_other P(b | P1,3, known, frontier) P(P1,3, known, frontier, other)
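As a sanity check on the numbers quoted on the next slide, here is a brute-force Python sketch (our own code, not the frontier-based computation above): it enumerates all pit configurations over the 13 non-visited squares and sums the prior probabilities of those consistent with the breeze observations:

```python
# Brute-force P(query square has a pit | known, b) in the 4x4 Wumpus world.
from itertools import product

SQUARES = [(x, y) for x in range(1, 5) for y in range(1, 5)]
VISITED = [(1, 1), (1, 2), (2, 1)]                   # known to be pit-free
UNKNOWN = [s for s in SQUARES if s not in VISITED]
BREEZE = {(1, 1): False, (1, 2): True, (2, 1): True}  # the observations b

def adjacent(a, b):
    return abs(a[0] - b[0]) + abs(a[1] - b[1]) == 1

def consistent(pits):
    # P(b | pits) is 1 iff each visited square is breezy exactly when
    # it has an adjacent pit, and 0 otherwise.
    return all(any(adjacent(sq, p) for p in pits) == breezy
               for sq, breezy in BREEZE.items())

def p_pit(query):
    # Unnormalized weights for query-has-pit vs. not; the constant factor
    # 0.8**3 for the pit-free visited squares cancels under normalization.
    weight = {True: 0.0, False: 0.0}
    for bits in product([False, True], repeat=len(UNKNOWN)):
        pits = {s for s, bit in zip(UNKNOWN, bits) if bit}
        if consistent(pits):
            weight[query in pits] += 0.2 ** len(pits) * 0.8 ** (16 - len(pits))
    return weight[True] / (weight[True] + weight[False])

print(round(p_pit((1, 3)), 2))   # 0.31
print(round(p_pit((2, 2)), 2))   # 0.86
```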
58. That is, [1,3] contains a pit with roughly 31% probability.
Similarly, [2,2] contains a pit with roughly 86% probability, so the agent should definitely avoid [2,2].
This is why a probabilistic agent does much better than a logical agent in the Wumpus world.