1. Quantifying Uncertainty
Team Members:
1. Ahmed Talaat
2. Eman Mostafa
3. Haron Shihab
4. Hesham Hamdy
Cairo University
Faculty of Engineering
4th year Computer Department
2. Acting under uncertainty
Basic probability notation
Inference using the full joint distribution
Independence
Bayes' rule
The Wumpus world
4. Why handle uncertainty?
In logical agents, we have to consider all possible explanations for our observations.
This leads to very large and complex belief-state representations.
Probability theory provides a solution.
5. Uncertainty arises because of laziness and ignorance.
Our main tool for dealing with degrees of belief is probability theory.
Probability statements are made with respect to a knowledge state, not with respect to the real world.
6. For the agent to make good choices, it must have preferences between the different possible outcomes.
Such choices are based on the agent's utilities for those outcomes.
7. The combination of probability theory and utility theory is called decision theory:
Decision theory = probability + utility
9. Here is some probability notation that we are going to use:
Sample space: the set of all possible worlds.
The uppercase Greek letter Ω represents the sample space.
The lowercase ω represents an element of the sample space (a possible world).
10. Unconditional probabilities (priors)
When rolling two dice, assume each die is fair and the rolls don't interfere with each other.
The set of possible worlds is
(1,1), (1,2), (1,3), …, (2,1), (2,2), …, (6,5), (6,6)
and each possible world ω has P(ω) = 1/36.
11. Conditional probability
The probability of an event occurring, given that another event (called the evidence) has been observed.
For example, the first die may already be showing 5 while we wait for the second die to settle.
In that case we are interested in the probability of the roll given that the first die shows 5:
P(Dice | Die1 = 5)
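As a quick worked example (a minimal sketch; the helper names `p` and `p_given` are ours, not from the slides), we can enumerate the 36 possible worlds for two dice and compute both a prior and a conditional probability:

```python
# Enumerate the sample space for two fair dice: 36 equally likely worlds.
from fractions import Fraction

omega = [(d1, d2) for d1 in range(1, 7) for d2 in range(1, 7)]

def p(event):
    # P(event) = sum of P(w) over the worlds where the event holds;
    # here every world has P(w) = 1/36.
    return Fraction(sum(1 for w in omega if event(w)), len(omega))

def p_given(event, evidence):
    # P(event | evidence) = P(event AND evidence) / P(evidence)
    return p(lambda w: event(w) and evidence(w)) / p(evidence)

print(p(lambda w: w == (5, 5)))                # 1/36 (a single world)
print(p_given(lambda w: w[0] == w[1],          # doubles...
              lambda w: w[0] == 5))            # ...given Die1 = 5 -> 1/6
```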
12. Semantics of a proposition
The probability model is completely determined by the joint distribution for all the random variables: the full joint probability distribution.
For the Cavity, Toothache, Weather domain, the notation is
P(Cavity, Toothache, Weather)
which can be represented as a 2 × 2 × 4 table.
Given the definition of the probability of a proposition as a sum over possible worlds, the full joint distribution allows us to calculate the probability of any proposition over its variables by summing entries in the FJD.
13. The basic axioms of probability:
1. 0 ≤ P(ω) ≤ 1 for every possible world ω
2. Σ_{ω∈Ω} P(ω) = 1
From these we can prove that P(¬a) = 1 − P(a):
P(¬a) = Σ_{ω∈¬a} P(ω)
      = Σ_{ω∈¬a} P(ω) + Σ_{ω∈a} P(ω) − Σ_{ω∈a} P(ω)
      = Σ_{ω∈Ω} P(ω) − Σ_{ω∈a} P(ω)
      = 1 − P(a)
14. Inclusion–exclusion principle:
P(a ∨ b) = P(a) + P(b) − P(a ∧ b)
Kolmogorov's axioms: Andrei Kolmogorov showed how to build up the rest of probability theory from such basic axioms.
For example, by these rules a logical agent cannot simultaneously believe a, b, and ¬(a ∧ b), because there is no possible world in which all three are true.
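The inclusion–exclusion principle is easy to check numerically; here is a small sketch using the two-dice sample space from before (the event definitions are ours):

```python
# Verify P(a v b) = P(a) + P(b) - P(a ^ b) on the two-dice sample space.
from fractions import Fraction

omega = [(d1, d2) for d1 in range(1, 7) for d2 in range(1, 7)]

def p(event):
    return Fraction(sum(1 for w in omega if event(w)), len(omega))

a = lambda w: w[0] == 5            # Die1 shows 5
b = lambda w: w[1] == 5            # Die2 shows 5
lhs = p(lambda w: a(w) or b(w))
rhs = p(a) + p(b) - p(lambda w: a(w) and b(w))
print(lhs == rhs, lhs)             # True 11/36
```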
17. Agent 2 is always the winner
If Agent 1 holds the inconsistent beliefs P(a) = 0.4, P(b) = 0.3, P(a∨b) = 0.8 (the axioms would require P(a∧b) = −0.1), Agent 2 can choose a set of bets that Agent 1's beliefs commit it to accept, yet which guarantee Agent 1 a loss in every possible world (de Finetti's argument):

Agent 1           Agent 2             Payoffs to Agent 1
Prop.   Belief    Bet       Stakes    a,b    a,¬b   ¬a,b   ¬a,¬b
a       0.4       a         4 to 6    -6     -6     4      4
b       0.3       b         3 to 7    -7     3      -7     3
a∨b     0.8       ¬(a∨b)    2 to 8    2      2      2      -8
Total                                 -11    -1     -1     -1
19. A full joint distribution for the Toothache, Cavity, Catch world:

              toothache              ¬toothache
           catch     ¬catch       catch     ¬catch
cavity     0.108     0.012        0.072     0.008
¬cavity    0.016     0.064        0.144     0.576

The probabilities in the joint distribution sum to 1.
P(cavity ∨ toothache) = 0.108 + 0.012 + 0.072 + 0.008 + 0.016 + 0.064 = 0.28
P(cavity) = 0.108 + 0.012 + 0.072 + 0.008 = 0.2
20. Using the full joint distribution for the Toothache, Cavity, Catch world above:
P(cavity) = 0.108 + 0.012 + 0.072 + 0.008 = 0.2
This is called marginalization or summing out: we sum up the probabilities for each possible value of the other variables.
21. Marginalization rule: for any sets of variables Y and Z,
P(Y) = Σ_{z∈Z} P(Y, z)
where Σ_{z∈Z} means summing over all possible combinations of values of the set of variables Z.
We often abbreviate this as Σ_z, leaving Z implicit. For example:
P(Cavity) = Σ_{z∈{Catch, Toothache}} P(Cavity, z)
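A minimal Python sketch of marginalization over this table (the tuple ordering Cavity, Toothache, Catch is our own encoding of the slide's table):

```python
# Full joint distribution, keyed by (cavity, toothache, catch).
joint = {
    (True,  True,  True):  0.108, (True,  True,  False): 0.012,
    (True,  False, True):  0.072, (True,  False, False): 0.008,
    (False, True,  True):  0.016, (False, True,  False): 0.064,
    (False, False, True):  0.144, (False, False, False): 0.576,
}

def p_cavity(cavity):
    # Sum out Toothache and Catch: P(Cavity) = sum_z P(Cavity, z)
    return sum(pr for (cav, _, _), pr in joint.items() if cav == cavity)

print(p_cavity(True))   # 0.2
```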
22. A variant of this rule, called conditioning, involves conditional probabilities instead of joint probabilities, using the product rule:
P(Y) = Σ_z P(Y | z) P(z)
23. From the full joint distribution for the Toothache, Cavity, Catch world above:
P(cavity | toothache) = P(cavity ∧ toothache) / P(toothache)
= (0.108 + 0.012) / (0.108 + 0.012 + 0.016 + 0.064)
= 0.6
24. Similarly:
P(¬cavity | toothache) = P(¬cavity ∧ toothache) / P(toothache)
= (0.016 + 0.064) / (0.108 + 0.012 + 0.016 + 0.064)
= 0.4
26. Using the normalization constant α with the same full joint distribution:
P(Cavity | toothache) = α P(Cavity, toothache)
= α [P(Cavity, toothache, catch) + P(Cavity, toothache, ¬catch)]
= α [⟨0.108, 0.016⟩ + ⟨0.012, 0.064⟩]
= α ⟨0.12, 0.08⟩ = ⟨0.6, 0.4⟩
27. We begin with the case in which the query involves a single variable X (Cavity). Let E be the list of evidence variables (just Toothache), let e be the list of observed values for them, and let Y be the list of remaining unobserved variables (just Catch). The query is P(X | e) and can be evaluated as:
P(X | e) = α P(X, e) = α Σ_y P(X, e, y)
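This procedure fits in a few lines of Python; a sketch (the function name `query_cavity` is ours) that reproduces the ⟨0.6, 0.4⟩ result of the previous slide:

```python
# P(Cavity | toothache) = alpha * sum_catch P(Cavity, toothache, catch)
joint = {
    (True,  True,  True):  0.108, (True,  True,  False): 0.012,
    (True,  False, True):  0.072, (True,  False, False): 0.008,
    (False, True,  True):  0.016, (False, True,  False): 0.064,
    (False, False, True):  0.144, (False, False, False): 0.576,
}

def query_cavity(toothache):
    # Unnormalized distribution: sum over the hidden variable Catch
    dist = {cavity: sum(joint[(cavity, toothache, catch)]
                        for catch in (True, False))
            for cavity in (True, False)}
    alpha = 1 / sum(dist.values())       # normalization constant
    return {v: alpha * p for v, p in dist.items()}

print(query_cavity(True))   # {True: 0.6, False: 0.4}, up to float rounding
```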
28. The full joint distribution requires an input table of size O(2^n) and takes O(2^n) time to process.
In a realistic problem we could easily have n > 100, making O(2^n) impractical.
30. Add a new variable, Weather, to the Toothache, Catch, Cavity problem.
Now P(Toothache, Catch, Cavity, Weather) has 2 × 2 × 2 × 4 = 32 entries if Weather has 4 values.
Is P(Toothache, Catch, Cavity, Weather = cloudy) related to P(Toothache, Catch, Cavity)?
31. Weather has nothing to do with one's dental problems, so:
P(Toothache, Catch, Cavity, Weather) = P(Weather) P(Toothache, Catch, Cavity)
The same holds for coin flips: each flip is independent of the next.
Events a and b are independent iff
P(a | b) = P(a), or equivalently P(b | a) = P(b)
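A small sketch of why this factorization helps (note: the Weather prior values below are invented for illustration; only the dental table comes from the slides): the 32-entry joint can be assembled from an 8-entry table plus a 4-entry table.

```python
# Invented Weather prior, for illustration only.
weather_prior = {"sunny": 0.6, "rain": 0.1, "cloudy": 0.29, "snow": 0.01}

# 8-entry P(Cavity, Toothache, Catch) from the earlier slides.
dental = {
    (True,  True,  True):  0.108, (True,  True,  False): 0.012,
    (True,  False, True):  0.072, (True,  False, False): 0.008,
    (False, True,  True):  0.016, (False, True,  False): 0.064,
    (False, False, True):  0.144, (False, False, False): 0.576,
}

# Independence: P(..., Weather) = P(Weather) * P(Cavity, Toothache, Catch)
full = {key + (w,): pk * pw
        for key, pk in dental.items()
        for w, pw in weather_prior.items()}

print(len(full))                              # 32 entries
print(abs(sum(full.values()) - 1.0) < 1e-9)   # still a valid distribution
```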
35. We use this when we know the effect of some unknown cause and would like to determine the cause. Bayes' rule becomes:
P(cause | effect) = P(effect | cause) P(cause) / P(effect)
• P(effect | cause) describes the causal relationship
• P(cause | effect) describes the diagnostic relationship
36. In medical diagnosis:
The doctor knows P(symptoms | disease) and wants to derive a diagnosis, P(disease | symptoms):
P(disease | symptoms) = P(symptoms | disease) P(disease) / P(symptoms)
37. The patient has a stiff neck (the symptom); the doctor tries to relate it to meningitis:
P(stiff neck | meningitis) = 0.7
P(meningitis) = 1/50000
P(stiff neck) = 0.01
38. P(meningitis | stiff neck) = P(stiff neck | meningitis) P(meningitis) / P(stiff neck)
= (0.7 × 1/50000) / 0.01 = 0.0014
So with the given data we expect about 1 in 700 patients with neck stiffness alone to have meningitis.
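Plugging in the numbers (a one-line check of the arithmetic):

```python
# Bayes' rule with the slide's numbers
p_stiff_given_men = 0.7
p_men = 1 / 50000
p_stiff = 0.01
print(p_stiff_given_men * p_men / p_stiff)   # 0.0014, i.e. about 1 in 700
```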
39. P(x | y) = P(y | x) P(x) / P(y) = α P(y | x) P(x)
where α is the normalization constant that makes the entries of the resulting distribution sum to 1.
40. What happens when the dentist's probe catches in the aching tooth of a patient?
P(cavity | toothache ∧ catch) = α P(toothache ∧ catch | cavity) P(cavity)
This might be feasible for just 2 pieces of evidence, but with n evidence variables there are 2^n possible combinations of evidence.
41. When the probe catches in the tooth, the patient probably has a cavity, and the cavity is what causes the toothache.
So Catch and Toothache are not absolutely independent, but they are independent given the presence or absence of a cavity:
P(toothache ∧ catch | Cavity) = P(toothache | Cavity) P(catch | Cavity)
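We can verify this conditional independence directly from the full joint table (a quick numerical check of ours, not from the slides):

```python
# Check P(toothache, catch | cavity) == P(toothache | cavity) * P(catch | cavity)
joint = {
    (True,  True,  True):  0.108, (True,  True,  False): 0.012,
    (True,  False, True):  0.072, (True,  False, False): 0.008,
    (False, True,  True):  0.016, (False, True,  False): 0.064,
    (False, False, True):  0.144, (False, False, False): 0.576,
}
p_cav = sum(pr for (c, _, _), pr in joint.items() if c)           # P(cavity) = 0.2
p_t_c = joint[(True, True, True)] / p_cav                         # P(t, c | cavity)
p_t   = sum(joint[(True, True, ca)] for ca in (True, False)) / p_cav
p_c   = sum(joint[(True, to, True)] for to in (True, False)) / p_cav
print(p_t_c, p_t * p_c)   # both 0.54 (up to float rounding)
```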
46. The Wumpus world is uncertain because the
agent’s sensors give only partial information
about the world
47. How would the logical agent tackle the Wumpus world?
48. The logical agent gets stuck after finding a breeze in both [1,2] and [2,1].
There is no safe place left to explore, so it has to choose randomly!
49. A probabilistic agent instead chooses to explore the square with the highest likelihood of being safe.
We will soon see that a probabilistic agent can do much better than a logical agent.
50. Let's define some random variables:
Pi,j : true iff square [i,j] contains a pit
Bi,j : true iff square [i,j] is breezy
Facts observed so far:
b = ¬b1,1 ∧ b1,2 ∧ b2,1
known = ¬p1,1 ∧ ¬p1,2 ∧ ¬p2,1
Our goal is to answer queries like P(P1,3 | known, b):
how likely is it that [1,3] contains a pit, given the observations so far?
51. The next step is to specify the full joint probability distribution,
P(P1,1, …, P4,4, B1,1, B1,2, B2,1)
Applying the product rule, we have
P(P1,1, …, P4,4, B1,1, B1,2, B2,1) = P(B1,1, B1,2, B2,1 | P1,1, …, P4,4) P(P1,1, …, P4,4)
where P(B1,1, B1,2, B2,1 | P1,1, …, P4,4) is 1 if the breeze observations match the pit configuration (a square is breezy iff it is adjacent to a pit) and 0 otherwise.
52. Assume that each square contains a pit with probability p = 0.2, independently of the other squares. Then
P(P1,1, …, P4,4) = Π_{i,j} P(Pi,j)
and a particular configuration with exactly n pits has probability
0.2^n × 0.8^(16−n)
53. P(P1,3 | known, b) = α Σ_unknown P(P1,3, known, unknown, b)
We seem to have reached our goal, but there is a big problem:
there are 12 unknown squares, so the summation contains 2^12 = 4096 terms.
In general the summation grows exponentially with the number of squares.
54. Intuition: other squares are irrelevant.
Whether [4,4] has a pit does NOT affect whether [1,3] has one.
This intuition helps reduce the number of summation terms.
Remember that the frontier is the set of pit variables, other than the query variable, that are adjacent to visited squares.
55. P(P1,3 | known, b)
= α Σ_unknown P(P1,3, known, unknown, b)
= α Σ_unknown P(b | P1,3, known, unknown) P(P1,3, known, unknown)   (by the product rule)
= α Σ_frontier Σ_other P(b | P1,3, known, frontier, other) P(P1,3, known, frontier, other)
Since b is independent of other given known, P1,3, and frontier:
= α Σ_frontier Σ_other P(b | P1,3, known, frontier) P(P1,3, known, frontier, other)
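As a sanity check on the numbers quoted on the next slide, here is a brute-force Python sketch (our own code, not the frontier-based computation above): it enumerates all pit configurations over the 13 non-visited squares and sums the prior probabilities of those consistent with the breeze observations:

```python
# Brute-force P(query square has a pit | known, b) in the 4x4 Wumpus world.
from itertools import product

SQUARES = [(x, y) for x in range(1, 5) for y in range(1, 5)]
VISITED = [(1, 1), (1, 2), (2, 1)]                   # known to be pit-free
UNKNOWN = [s for s in SQUARES if s not in VISITED]
BREEZE = {(1, 1): False, (1, 2): True, (2, 1): True}  # the observations b

def adjacent(a, b):
    return abs(a[0] - b[0]) + abs(a[1] - b[1]) == 1

def consistent(pits):
    # P(b | pits) is 1 iff each visited square is breezy exactly when
    # it has an adjacent pit, and 0 otherwise.
    return all(any(adjacent(sq, p) for p in pits) == breezy
               for sq, breezy in BREEZE.items())

def p_pit(query):
    # Unnormalized weights for query-has-pit vs. not; the constant factor
    # 0.8**3 for the pit-free visited squares cancels under normalization.
    weight = {True: 0.0, False: 0.0}
    for bits in product([False, True], repeat=len(UNKNOWN)):
        pits = {s for s, bit in zip(UNKNOWN, bits) if bit}
        if consistent(pits):
            weight[query in pits] += 0.2 ** len(pits) * 0.8 ** (16 - len(pits))
    return weight[True] / (weight[True] + weight[False])

print(round(p_pit((1, 3)), 2))   # 0.31
print(round(p_pit((2, 2)), 2))   # 0.86
```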
58. That is, [1,3] contains a pit with roughly 31% probability.
Similarly, [2,2] contains a pit with roughly 86% probability, so the agent should definitely avoid [2,2].
This is why a probabilistic agent does much better than a logical agent in the Wumpus world.