CS:4420 Artificial Intelligence
Spring 2019
Uncertainty
Cesare Tinelli
The University of Iowa
Copyright 2004–19, Cesare Tinelli and Stuart Russell a
a These notes were originally developed by Stuart Russell and are used with permission. They are
copyrighted material and may not be used in other course settings outside of the University of Iowa in their
current or modified form without the express written consent of the copyright holders.
CS:4420 Spring 2019 – p.1/33
Readings
• Chap. 13 of [Russell and Norvig, 3rd Edition]
CS:4420 Spring 2019 – p.2/33
Logic and Uncertainty
Major problem with logical-agent approaches:
Agents almost never have access to the whole truth about
their environments
• Very often, even in simple worlds, there are important questions
for which there is no yes/no answer
• In that case, an agent must reason under uncertainty
• Uncertainty also arises because of an agent’s incomplete or
incorrect understanding of its environment
CS:4420 Spring 2019 – p.3/33
Uncertainty
Let action At =“leave for airport t minutes before flight”
Will At get me there on time?
Problems
• partial observability (road state, other drivers’ plans, etc.)
• noisy sensors (unreliable traffic reports)
• uncertainty in action outcomes (flat tire, etc.)
• immense complexity of modeling and predicting traffic
CS:4420 Spring 2019 – p.4/33
Uncertainty
Let action At =“leave for airport t minutes before flight”
Will At get me there on time?
A purely logical approach either
1. risks falsehood (“A25 will get me there on time”), or
2. leads to conclusions that are too weak for decision making
(“A25 will get me there on time if there’s no accident on the way,
it doesn’t rain, my tires remain intact, . . . ”)
(A1440 might reasonably be said to get me there on time
but I’d have to stay overnight in the airport . . .)
CS:4420 Spring 2019 – p.4/33
Reasoning under Uncertainty
A rational agent is one that makes rational decisions in order to
maximize its performance measure
A rational decision depends on
• the relative importance of various goals
• the likelihood they will be achieved
• the degree to which they will be achieved
CS:4420 Spring 2019 – p.5/33
Handling Uncertain Knowledge
Reasons why FOL-based approaches fail to cope with domains such as
medical diagnosis:
• Laziness: too much work to write complete axioms, or too hard
to work with the enormous sentences that result
• Theoretical Ignorance: The available knowledge of the domain is
incomplete
• Practical Ignorance: The theoretical knowledge of the domain is
complete but some evidential facts are missing
CS:4420 Spring 2019 – p.6/33
Degrees of Belief
In several real-world domains the agent’s knowledge can only provide a
degree of belief in the relevant sentences
The agent cannot say whether a sentence is true, but only that it is
true x% of the time
The main tool for handling degrees of belief is Probability Theory
The use of probability summarizes the uncertainty that stems from our
laziness or ignorance about the domain
CS:4420 Spring 2019 – p.7/33
Probability Theory
Probability Theory makes the same ontological commitments as
First-order Logic:
Every sentence ϕ is either true or false
The degree of belief that ϕ is true is a number P between 0 and 1
P(ϕ) = 1 −→ ϕ is certainly true
P(ϕ) = 0 −→ ϕ is certainly not true
P(ϕ) = 0.65 −→ ϕ is true with a 65% chance
CS:4420 Spring 2019 – p.8/33
Probability of Facts
Let A be a propositional variable, a symbol denoting a proposition
that is either true or false
P(A) denotes the probability that A is true in the absence of any
other information
Similarly,
P(¬A) = probability that A is false
P(A ∧ B) = probability that both A and B are true
P(A ∨ B) = probability that either A or B (or both) are true
Examples:
P(¬Blonde) P(Blonde ∧ BlueEyed) P(Blonde ∨ BlueEyed)
where Blonde/BlueEyed denotes that a given person is
blonde/blue-eyed
CS:4420 Spring 2019 – p.9/33
Subjective/Bayesian Probability
Probabilities relate propositions to one’s own state of knowledge
E.g., P(A25 | no reported accidents) = 0.06
Probabilities of propositions change with new evidence:
E.g., P(A25 | no reported accidents, 5 a.m.) = 0.15
Note: This is analogous to logical entailment status KB |= α
(which changes with more knowledge), not truth
CS:4420 Spring 2019 – p.10/33
Conditional/Unconditional Probability
P(A) is the unconditional (or prior) probability of fact A
An agent can use the unconditional probability of A to reason about
A in the absence of further information
If further evidence B becomes available, the agent must use the
conditional (or posterior) probability:
P(A | B)
the probability of A given that all the agent knows is B
Note: P(A) can be thought of as the conditional probability of A with
respect to the empty evidence: P(A) = P(A | )
CS:4420 Spring 2019 – p.11/33
Conditional Probabilities
The probability of a fact may change as the agent acquires more, or
different, information:
1. P(Blonde) 2. P(Blonde | Swedish)
3. P(Blonde | Kenyan) 4. P(Blonde | Kenyan ∧ ¬EuroDescent)
1. If we know nothing about a person, the probability that s/he is
blonde equals a certain value, say 0.2
2. If we know that a person is Swedish the probability that s/he is
blonde is much higher, say 0.9
3. If we know that the person is Kenyan, the probability s/he is
blonde is much lower, say 0.00003
4. If we know that the person is Kenyan and not of European
descent, the probability s/he is blonde is 0
CS:4420 Spring 2019 – p.12/33
The Axioms of Probability
Probability Theory is governed by the following axioms:
1. All probabilities are real values between 0 and 1:
for all ϕ, 0 ≤ P(ϕ) ≤ 1
2. Valid propositions have probability 1
P(True) = P(α ∨ ¬α) = 1
3. The probability of disjunction is defined as follows:
P(α ∨ β) = P(α) + P(β) − P(α ∧ β)
CS:4420 Spring 2019 – p.13/33
Understanding Axiom 3
[Venn diagram: A and B drawn as overlapping regions within the space of
all possible worlds (True)]
P(A ∨ B) = P(A) + P(B) − P(A ∧ B)
CS:4420 Spring 2019 – p.14/33
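The identity can also be checked mechanically on any finite model. Below is a minimal Python sketch (not part of the original slides) that verifies inclusion-exclusion over eight equally likely truth assignments to three propositions:

```python
# Minimal sketch: verify P(A ∨ B) = P(A) + P(B) − P(A ∧ B) on a toy model
# with 8 equally likely worlds (truth assignments to propositions A, B, C).
from itertools import product

worlds = list(product([True, False], repeat=3))  # all (A, B, C) assignments
p = 1 / len(worlds)                              # uniform probability per world

def prob(event):
    """Probability of an event, given as a predicate over worlds."""
    return sum(p for w in worlds if event(w))

P_A       = prob(lambda w: w[0])
P_B       = prob(lambda w: w[1])
P_A_and_B = prob(lambda w: w[0] and w[1])
P_A_or_B  = prob(lambda w: w[0] or w[1])

assert abs(P_A_or_B - (P_A + P_B - P_A_and_B)) < 1e-12
print(P_A, P_B, P_A_and_B, P_A_or_B)   # 0.5 0.5 0.25 0.75
```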
Conditional Probabilities
Conditional probabilities are defined in terms of unconditional ones
Whenever P(B) > 0,
P(A | B) = P(A ∧ B) / P(B)
This can be equivalently expressed as the product rule:
P(A ∧ B) = P(A | B) P(B)
= P(B | A) P(A)
A and B are independent iff P(A | B) = P(A)
iff P(B | A) = P(B)
iff P(A ∧ B) = P(A)P(B)
CS:4420 Spring 2019 – p.15/33
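As a small illustration (with made-up numbers, not from the slides), the definition, the product rule, and the independence test in Python:

```python
# Minimal sketch with made-up numbers: conditional probability, the product
# rule, and the independence test.
P_B       = 0.4    # P(B), assumed
P_A_and_B = 0.1    # P(A ∧ B), assumed

P_A_given_B = P_A_and_B / P_B                 # definition, requires P(B) > 0
print(P_A_given_B)                            # 0.25

# Product rule: P(A ∧ B) = P(A | B) P(B)
assert abs(P_A_given_B * P_B - P_A_and_B) < 1e-12

# Independence test: A and B are independent iff P(A ∧ B) = P(A) P(B)
P_A = 0.25                                    # assumed
print(abs(P_A_and_B - P_A * P_B) < 1e-12)     # True: independent with these numbers
```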
Random Variable
A random variable is a variable ranging over a certain domain of
values
It is
• discrete if it ranges over a discrete (that is, countable) domain
• continuous if it ranges over the real numbers
We will only consider discrete random variables with finite domains
Note: Propositional variables can be seen as random variables over
the Boolean domain
CS:4420 Spring 2019 – p.16/33
Random Variables
Variable Domain
Age {1, 2, . . . , 120}
Weather {sunny, dry, cloudy, rain, snow}
Size {small, medium, large}
Blonde {true, false}
The probability that a random variable X has value val is written as
P(X = val)
Note 1: P(A = true) is abbreviated as P(a), while P(A = false)
is written as P(¬a)
Note 2: Traditionally, in Probability Theory variables are capitalized
and constant values are not
CS:4420 Spring 2019 – p.17/33
Probability Distribution
If X is a random variable, we use the boldface P(X) to denote the
vector of probabilities of the individual values that X can take.
Example:
P(Weather = sunny) = 0.6
P(Weather = rain) = 0.2
P(Weather = cloudy) = 0.18
P(Weather = snow) = 0.02
Then P(Weather) = ⟨0.6, 0.2, 0.18, 0.02⟩
(the value order “sunny”, “rain”, “cloudy”, “snow” is assumed)
P(Weather) is called a probability distribution for the random
variable Weather
CS:4420 Spring 2019 – p.18/33
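In code, such a distribution can be held as a mapping from values to probabilities; a minimal Python sketch using the numbers above:

```python
# Minimal sketch: the distribution P(Weather) above as a dictionary.
P_Weather = {"sunny": 0.6, "rain": 0.2, "cloudy": 0.18, "snow": 0.02}

assert abs(sum(P_Weather.values()) - 1.0) < 1e-9   # probabilities sum to 1
print(P_Weather["rain"])                            # P(Weather = rain) = 0.2
```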
Joint Probability Distribution
If X1, . . . , Xn are random variables,
P(X1, . . . , Xn)
denotes their joint probability distribution (JPD), an n-dimensional
matrix specifying the probability of every possible combination of
values for X1, . . . , Xn
Example
Sky : {sunny, cloudy, rain, snow}
Wind : {true, false}
P(Wind, Sky) =
sunny cloudy rain snow
true 0.30 0.15 0.17 0.01
false 0.30 0.05 0.01 0.01
CS:4420 Spring 2019 – p.19/33
Joint Probability Distribution
All relevant probabilities about a vector ⟨X1, . . . , Xn⟩ of random
variables can be computed from P(X1, . . . , Xn)
S = sunny S = cloudy S = rain S = snow P(W)
W 0.30 0.15 0.17 0.01 0.63
¬W 0.30 0.05 0.01 0.01 0.37
P(S) 0.60 0.20 0.18 0.02 1.00
P(S = rain ∧ W) = 0.17
P(S = rain) = P(S = rain ∧ W) + P(S = rain ∧ ¬W)
= 0.17 + 0.01 = 0.18
P(W) = 0.30 + 0.15 + 0.17 + 0.01 = 0.63
P(S = rain | W) = P(S = rain ∧ W)/P(W)
= 0.17/0.63 ≈ 0.27
CS:4420 Spring 2019 – p.20/33
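The same computations can be carried out directly over the table; a minimal Python sketch with the numbers above:

```python
# Minimal sketch: the joint distribution P(Wind, Sky) above, and the marginal
# and conditional probabilities obtained from it by summation.
joint = {  # joint[(wind, sky)] = P(Wind = wind ∧ Sky = sky)
    (True,  "sunny"): 0.30, (True,  "cloudy"): 0.15,
    (True,  "rain"):  0.17, (True,  "snow"):   0.01,
    (False, "sunny"): 0.30, (False, "cloudy"): 0.05,
    (False, "rain"):  0.01, (False, "snow"):   0.01,
}

P_W    = sum(p for (w, s), p in joint.items() if w)            # 0.63
P_rain = sum(p for (w, s), p in joint.items() if s == "rain")  # 0.18
P_rain_given_W = joint[(True, "rain")] / P_W                   # ≈ 0.27

print(P_W, P_rain, round(P_rain_given_W, 2))
```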
Joint Probability Distribution
A joint probability distribution P(X1, . . . , Xn) provides complete
information about the probabilities of its random variables.
However, JPD’s are often hard to create (again because of incomplete
knowledge of the domain).
Even when available, JPD tables are very expensive, or impossible, to
store because of their size.
A JPD table for n random variables, each ranging over k distinct
values, has k^n entries!
A better approach is to come up with conditional probabilities as
needed and compute the others from them.
CS:4420 Spring 2019 – p.21/33
An Alternative to JPD: The Bayes Rule
Recall that for any facts A and B,
P(A ∧ B) = P(A | B)P(B) = P(B | A)P(A)
From this we obtain the Bayes Rule:
P(B | A) = P(A | B) P(B) / P(A)
The rule is useful in practice because it is often easier to
compute/estimate P(A | B), P(B), and P(A) than to
compute/estimate P(B | A) directly
CS:4420 Spring 2019 – p.22/33
Applying the Bayes Rule
What is the probability that a patient has meningitis (M) given that
he has a stiff neck (S)?
P(m | s) = P(s | m) P(m) / P(s)
• P(s | m) is easier to estimate than P(m | s) because it refers to
causal knowledge: meningitis typically causes stiff neck
• P(s | m) can be estimated from past medical cases and the
knowledge about how meningitis works
• Similarly, P(m) and P(s) can be estimated from statistical
information
CS:4420 Spring 2019 – p.23/33
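A minimal Python sketch of the computation (the figures below are illustrative assumptions, not values given in the slides):

```python
# Minimal sketch: Bayes' rule P(B | A) = P(A | B) P(B) / P(A), applied to
# made-up meningitis/stiff-neck figures (assumptions for illustration only).
def bayes(P_A_given_B, P_B, P_A):
    """Return P(B | A) = P(A | B) P(B) / P(A)."""
    return P_A_given_B * P_B / P_A

P_s_given_m = 0.7        # P(stiff neck | meningitis), assumed
P_m         = 1 / 50000  # prior probability of meningitis, assumed
P_s         = 0.01       # prior probability of a stiff neck, assumed

print(bayes(P_s_given_m, P_m, P_s))   # P(m | s) ≈ 0.0014
```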
Applying the Bayes Rule
The Bayes rule is helpful even in the absence of (immediate) causal
relationships
What is the probability that a blonde person (B) is Swedish (S)?
P(s | b) = P(b | s) P(s) / P(b)
All P(b | s), P(s), P(b) are more easily estimated from statistical
information
P(b | s) ≈ (# of blonde Swedes) / |Swedish population| = 9/10
P(s) ≈ |Swedish population| / |world population| = . . .
P(b) ≈ (# of blondes) / |world population| = . . .
CS:4420 Spring 2019 – p.24/33
Conditional Independence
In terms of exponential explosion, conditional probabilities do not
seem any better than JPD’s for computing the probability of a fact,
given n > 1 pieces of evidence
P(meningitis | stiffNeck ∧ nausea ∧ · · · ∧ doubleVision)
However, facts do not always depend on all the evidence.
Example:
P(meningitis | stiffNeck ∧ astigmatic) = P(meningitis | stiffNeck)
Meningitis and Astigmatic are conditionally independent, given
knowledge of StiffNeck
CS:4420 Spring 2019 – p.25/33
Conditional Probability Notation
Recall:
P(A ∧ B) = P(A | B) P(B) = P(B | A) P(A)
A general version holds for whole joint probability distributions, e.g.,
P(Sky, Wind) = P(Sky | Wind) P(Wind)
stands for (with S for Sky and W for Wind):
P(S = sunny, W = true) = P(S = sunny | W = true) P(W = true)
P(S = sunny, W = false) = P(S = sunny | W = false) P(W = false)
. . .
P(S = snow, W = false) = P(S = snow | W = false) P(W = false)
I.e., a 4 × 2 set of equations, not matrix multiplication
CS:4420 Spring 2019 – p.26/33
Conditional Probability Notation
The chain rule is derived by successive application of the product rule:
P(X1, . . . , Xn)
= P(X1, . . . , Xn−1) P(Xn|X1, . . . , Xn−1)
= P(X1, . . . , Xn−2) P(Xn−1|X1, . . . , Xn−2) P(Xn|X1, . . . , Xn−1)
. . .
= Π^n_{i=1} P(Xi | X1, . . . , Xi−1)
CS:4420 Spring 2019 – p.27/33
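A minimal Python sketch checking the two-variable case, P(W, S) = P(W) P(S | W), on the Wind/Sky table from the earlier slides:

```python
# Minimal sketch: the product-rule factorization P(W, S) = P(W) P(S | W)
# recovers every entry of the joint table used earlier.
joint = {
    (True,  "sunny"): 0.30, (True,  "cloudy"): 0.15,
    (True,  "rain"):  0.17, (True,  "snow"):   0.01,
    (False, "sunny"): 0.30, (False, "cloudy"): 0.05,
    (False, "rain"):  0.01, (False, "snow"):   0.01,
}

P_W = {w: sum(p for (w2, s), p in joint.items() if w2 == w) for w in (True, False)}

for (w, s), p in joint.items():
    P_s_given_w = joint[(w, s)] / P_W[w]           # conditional from the joint
    assert abs(P_W[w] * P_s_given_w - p) < 1e-12   # product rule gives back the entry
```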
Inference by enumeration
Let X be all the variables. Typically, we want
the posterior joint distribution of the query variables Y
given specific values e for the evidence variables E
Let the hidden variables be H = X − Y − E. Then,
P(Y | E = e) = α P(Y, E = e) = α Σ_h P(Y, E = e, H = h)
where α = 1/P(E = e)
Problems:
1. Worst-case time complexity O(d^n), where d is the largest arity
2. Space complexity O(d^n) to store the joint distribution
3. How to find the numbers for O(d^n) entries?
CS:4420 Spring 2019 – p.28/33
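A minimal Python sketch of this scheme, run on the Wind/Sky joint from the earlier slides (the function and variable names are illustrative, not from the slides):

```python
# Minimal sketch of inference by enumeration: sum the matching rows of the
# full joint distribution, then normalize by α = 1 / P(E = e).
def enumerate_query(joint, variables, query_var, evidence):
    """Return the normalized distribution P(query_var | evidence)."""
    dist = {}
    for assignment, p in joint.items():
        a = dict(zip(variables, assignment))
        if all(a[v] == val for v, val in evidence.items()):       # rows consistent with e
            dist[a[query_var]] = dist.get(a[query_var], 0.0) + p  # sum out hidden vars
    alpha = 1.0 / sum(dist.values())                              # normalization constant
    return {val: alpha * p for val, p in dist.items()}

variables = ("Wind", "Sky")
joint = {(True, "sunny"): 0.30, (True, "cloudy"): 0.15, (True, "rain"): 0.17,
         (True, "snow"): 0.01, (False, "sunny"): 0.30, (False, "cloudy"): 0.05,
         (False, "rain"): 0.01, (False, "snow"): 0.01}

print(enumerate_query(joint, variables, "Sky", {"Wind": True}))
# {'sunny': 0.476, 'cloudy': 0.238, 'rain': 0.270, 'snow': 0.016} (approximately)
```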
Independence
A and B are independent iff
P(A | B) = P(A) iff P(B | A) = P(B) iff P(A, B) = P(A) P(B)
Example:
P(Toothache, Catch, Cavity, Weather)
= P(Toothache, Catch, Cavity) P(Weather)
32 entries reduced to 12; for n independent biased coins, 2^n → n
Problem: Absolute independence is powerful but rare
For instance, dentistry is a large field with hundreds of variables, none
of which are independent.
What to do? Exploit conditional independence
CS:4420 Spring 2019 – p.29/33
Conditional independence
P(Toothache, Cavity, Catch) has 2^3 − 1 = 7 independent entries
If I have a cavity, the probability that the probe catches in it doesn’t
depend on whether I have a toothache or not:
P(catch | toothache, cavity) = P(catch | cavity)
The same independence holds if I have no cavity:
P(catch | toothache, ¬cavity) = P(catch | ¬cavity)
Catch is conditionally independent of Toothache given Cavity:
P(Catch | Toothache, Cavity) = P(Catch | Cavity)
Similarly:
P(Toothache | Catch, Cavity) = P(Toothache | Cavity)
CS:4420 Spring 2019 – p.30/33
Conditional independence contd.
Write out full joint distribution using chain rule:
P(Toothache, Catch, Cavity)
= P(Toothache | Catch, Cavity) P(Catch, Cavity)
= P(Toothache | Catch, Cavity) P(Catch | Cavity) P(Cavity)
= P(Toothache | Cavity) P(Catch | Cavity) P(Cavity)
Now we have only 2 + 2 + 1 = 5 independent entries
In most cases, conditional independence reduces the size of the
representation of the JPD from exponential to linear in n
Conditional independence is our most basic and robust form of
knowledge about uncertain environments
CS:4420 Spring 2019 – p.31/33
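A minimal Python sketch of the factored representation, with five made-up parameters (assumptions, not the slides' numbers), showing that they determine all eight entries of the joint:

```python
# Minimal sketch: reconstruct P(Toothache, Catch, Cavity) from the factored
# form P(T | Cav) P(C | Cav) P(Cav).  All numbers are illustrative assumptions.
from itertools import product

P_cavity = 0.2                                         # 1 parameter
P_catch_given_cav     = {True: 0.9, False: 0.2}        # 2 parameters (indexed by Cavity)
P_toothache_given_cav = {True: 0.6, False: 0.1}        # 2 parameters (indexed by Cavity)

def joint(toothache, catch, cavity):
    """P(toothache, catch, cavity) via the conditional-independence factorization."""
    p_cav = P_cavity if cavity else 1 - P_cavity
    p_t = P_toothache_given_cav[cavity] if toothache else 1 - P_toothache_given_cav[cavity]
    p_c = P_catch_given_cav[cavity] if catch else 1 - P_catch_given_cav[cavity]
    return p_t * p_c * p_cav

total = sum(joint(t, c, cav) for t, c, cav in product([True, False], repeat=3))
print(round(total, 10))   # 1.0: the 5 parameters define a proper 8-entry joint
```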
Bayes’ Rule and cond. independence
P(Cavity | toothache ∧ catch)
= α P(toothache ∧ catch | Cavity) P(Cavity)
= α P(toothache | Cavity) P(catch | Cavity) P(Cavity)
This is an example of a naive Bayes model:
P(Cause, Effect1, . . . , Effectn) = Π^n_{i=1} P(Effecti | Cause) P(Cause)
[Network diagrams: Cavity with children Toothache and Catch; Cause with
children Effect1, . . . , Effectn]
Total number of parameters is linear in n
CS:4420 Spring 2019 – p.32/33
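A minimal Python sketch of the naive Bayes computation, with made-up conditional probabilities (assumptions, not from the slides):

```python
# Minimal sketch: P(Cavity | toothache ∧ catch) =
# α P(toothache | Cavity) P(catch | Cavity) P(Cavity), with assumed numbers.
P_cause           = {True: 0.2, False: 0.8}   # P(Cavity)
P_toothache_given = {True: 0.6, False: 0.1}   # P(toothache | Cavity)
P_catch_given     = {True: 0.9, False: 0.2}   # P(catch | Cavity)

# Unnormalized posterior for each value of the cause, given both effects observed
unnorm = {c: P_cause[c] * P_toothache_given[c] * P_catch_given[c] for c in (True, False)}
alpha = 1.0 / sum(unnorm.values())            # normalization constant
posterior = {c: alpha * p for c, p in unnorm.items()}

print(posterior)   # {True: 0.871, False: 0.129} (approximately)
```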
Bayesian Networks
Exploiting conditional independence information is crucial in making
(automated) probabilistic reasoning feasible
Bayesian Networks are a successful example of probabilistic systems
that exploit conditional independence to reason efficiently under
uncertainty
CS:4420 Spring 2019 – p.33/33