Bayesian Learning and Reasoning
Mohammad Reza Samsami
1
Sharif University of Technology
Fall 2019
Outline
• A brief history: from Symbolic to Connectionist AI
• Motivations
• Introduction to Bayesian Inference
• Bayesian vs Frequentist
• Bayesian method
• Point estimation
• Meaning of probability
• Bayesian linear regression
• Bayesian model comparison and averaging
• New approaches
• Troubles with Bayesianism
2
A brief history: from Symbolic to Connectionist AI
General Problem Solver
3
Content Technique
Back to 1959
A brief history: from Symbolic to Connectionist AI
4
{𝑋 ∨ 𝑌, ¬𝑌, 𝑋 → 𝑍} ⊢ 𝑍
Logic
A brief history: from Symbolic to Connectionist AI
5
Logic
Tree Number Pizza
A brief history: from Symbolic to Connectionist AI
6
Logic
Tree Number Pizza
Symbols
A brief history: from Symbolic to Connectionist AI
7
Tree Number Pizza
Symbols
Symbolic AI
A brief history: from Symbolic to Connectionist AI
9
Can machines think?
“Thinking is manipulation of symbols and Reasoning is computation.”
Thomas Hobbes
A brief history: from Symbolic to Connectionist AI
10
Logic is our problem-solving procedure, and symbols are how we
represent the world. Relations are verbs explaining how symbols
interact with each other, or adjectives describing symbols.
Show(MohammadReza, Slides)
A brief history: from Symbolic to Connectionist AI
11
The set of all true things about our universe is called a knowledge
base, and we can use logic to examine our knowledge bases to answer
questions and discover new things.
The process of coming up with new propositions and checking
whether they fit with the logic of a knowledge base is called inference.
A brief history: from Symbolic to Connectionist AI
12
Problems with Symbolic AI
Perception
The computer itself doesn’t know what the symbols mean, so the
symbols are not necessarily linked to any non-symbolic representation
of the world.
A brief history: from Symbolic to Connectionist AI
13
Problems with Symbolic AI
Monotonicity
Reasoning based on classical deductive logic is monotonic: new
knowledge cannot undo old knowledge.
If $\Gamma \vdash X$, then $\Gamma \cup A \vdash X$.
A brief history: from Symbolic to Connectionist AI
14
Problems with Symbolic AI
Uncertainty
A brief history: from Symbolic to Connectionist AI
15
Intelligence as
Reasoning
Learning
A brief history: from Symbolic to Connectionist AI
16
A computer program is said to learn from experience E with respect to some
class of tasks T and performance measure P if its performance at tasks in T, as
measured by P, improves with experience E. (Tom Mitchell, 1997)
A brief history: from Symbolic to Connectionist AI
17
Statistics
Optimization
Motivation
23
Introduction
26
Concept Learning
f(x) = 1 if x is an example of the concept C, and otherwise f(x) = 0
The goal is to learn the indicator function f, which just defines
which elements are in the set C.
Number Game
An arithmetical concept C is chosen over {1, 2, …, 100}. Given examples $\mathcal{D} = \{x_1, \dots, x_n\}$ drawn from C, does a new number x belong to C?
Introduction
27
𝒟 = {16}
$P(x \mid \mathcal{D})$: Posterior predictive distribution
Introduction
28
𝒟 = {16, 2, 64, 8}
Induction
Powers of two
Introduction
29
How can we explain this behavior and emulate it in a machine?
The classic approach to induction is to suppose we have a hypothesis
space of concepts, ℋ, such as: odd numbers, even numbers, all numbers
between 1 and 100, powers of two, all numbers ending in 8.
The subset of ℋ that is consistent with the data 𝒟 is called the version
space. As we see more examples, the version space shrinks and we
become increasingly certain about the concept.
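The version-space idea can be sketched in a few lines of Python. This is only an illustration: the hypothesis names and the helper `version_space` are assumptions made here, and the hypothesis space is limited to the five concepts listed on the slide.

```python
# Hypothesis space over the numbers 1..100 (the five concepts named on the slide).
UNIVERSE = range(1, 101)

HYPOTHESES = {
    "odd numbers": {x for x in UNIVERSE if x % 2 == 1},
    "even numbers": {x for x in UNIVERSE if x % 2 == 0},
    "all numbers 1..100": set(UNIVERSE),
    "powers of two": {2 ** k for k in range(1, 7)},  # 2, 4, ..., 64
    "numbers ending in 8": {x for x in UNIVERSE if x % 10 == 8},
}


def version_space(data):
    """Return the names of all hypotheses that contain every observed example."""
    return [name for name, members in HYPOTHESES.items() if set(data) <= members]


print(version_space([16]))
print(version_space([16, 2, 64, 8]))
print(version_space([16, 3]))
```

Note that consistency alone does not separate "even numbers" from "powers of two" for the data {16, 2, 64, 8}: both stay in the version space, which is why a likelihood is needed to rank them.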
Introduction
30
Why powers of 2 and not even numbers?
The key intuition is that we want to avoid suspicious coincidences. If the
true concept was even numbers, how come we only saw numbers that
happened to be powers of two?
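$$P(\mathcal{D} \mid C_{even}) = \left(\frac{1}{\mathrm{size}(C_{even})}\right)^{N} = \left(\frac{1}{50}\right)^{4} = 1.6 \times 10^{-7}$$
$$P(\mathcal{D} \mid C_{two}) = \left(\frac{1}{\mathrm{size}(C_{two})}\right)^{N} = \left(\frac{1}{6}\right)^{4} = 7.7 \times 10^{-4}$$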
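The two likelihoods can be checked numerically. The sketch below assumes, as the formulas on the slide do, that examples are drawn uniformly and independently from the concept's members; the function name `likelihood` is an illustrative choice.

```python
def likelihood(data, concept):
    """P(D | C), assuming uniform independent sampling from the concept's members."""
    if not all(x in concept for x in data):
        return 0.0  # the concept cannot generate an example outside itself
    return (1.0 / len(concept)) ** len(data)


even = {x for x in range(1, 101) if x % 2 == 0}   # 50 members
powers_of_two = {2 ** k for k in range(1, 7)}     # 6 members: 2, 4, ..., 64

data = [16, 2, 64, 8]
p_even = likelihood(data, even)
p_two = likelihood(data, powers_of_two)
print(f"P(D | even) = {p_even:.1e}")
print(f"P(D | powers of two) = {p_two:.1e}")
print(f"likelihood ratio = {p_two / p_even:.0f}")
```

The smaller concept wins by a factor of several thousand, which is the "suspicious coincidence" intuition expressed as a number.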
Introduction
31
$$P(\mathcal{D} \mid C'_{two}) = \left(\frac{1}{5}\right)^{4} = 1.6 \times 10^{-3}$$
$C'_{two}$: Powers of two except 32.
Introduction
32
$C'_{two}$ is conceptually unnatural!
We can capture such intuition by assigning low prior probability
to unnatural concepts.
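A minimal sketch of how a prior changes the ranking. The prior values below are invented purely for illustration (the slides do not give any); the only point is that a sufficiently low prior on the unnatural concept outweighs its higher likelihood.

```python
# Likelihoods of D = {16, 2, 64, 8} under each concept (size principle, N = 4).
likelihoods = {
    "even numbers": (1 / 50) ** 4,
    "powers of two": (1 / 6) ** 4,
    "powers of two except 32": (1 / 5) ** 4,
}

# Hypothetical priors: the unnatural concept gets very little mass.
priors = {
    "even numbers": 0.495,
    "powers of two": 0.495,
    "powers of two except 32": 0.01,
}

# Bayes' rule: posterior is proportional to likelihood times prior.
unnormalized = {h: likelihoods[h] * priors[h] for h in likelihoods}
evidence = sum(unnormalized.values())
posterior = {h: value / evidence for h, value in unnormalized.items()}

for h, p in sorted(posterior.items(), key=lambda item: -item[1]):
    print(f"{h:26s} {p:.4f}")
```

With these assumed priors, "powers of two" ends up with most of the posterior mass even though "powers of two except 32" has the larger likelihood.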
Bayesian Inference
33
Thomas Bayes
Bayes Rule:
Bayesian Inference
34
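$P(\text{heads}) = \theta$, $P(\text{tails}) = 1 - \theta$
$$P(\mathcal{D} \mid \theta) = P(n_h, n_t \mid \theta) = \theta^{n_h} (1 - \theta)^{n_t}$$
$$\theta_{MLE} = \arg\max_{\theta} P(\mathcal{D} \mid \theta)$$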
Bayesian Inference
35
$$\theta_{MLE} = \frac{n_h}{n_h + n_t}$$
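The step from the likelihood to this closed form is standard and short; it is written out here for completeness, using the same symbols $n_h$ and $n_t$ for the head and tail counts.

```latex
\begin{aligned}
\log P(\mathcal{D} \mid \theta) &= n_h \log \theta + n_t \log (1 - \theta) \\
\frac{\partial}{\partial \theta} \log P(\mathcal{D} \mid \theta)
  &= \frac{n_h}{\theta} - \frac{n_t}{1 - \theta} = 0 \\
\Rightarrow \quad n_h (1 - \theta) &= n_t \, \theta
\quad \Rightarrow \quad
\theta_{MLE} = \frac{n_h}{n_h + n_t}
\end{aligned}
```

The log-likelihood is concave in $\theta$ on $(0, 1)$ whenever both counts are positive, so this stationary point is the maximum.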
Bayesian Inference
36
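Bayesian Method: $P(\theta \mid D) = \dfrac{P(D \mid \theta)\, P(\theta)}{P(D)}$
Likelihood $P(D \mid \theta)$: Binomial
Prior $P(\theta)$: Beta distribution
$$\mathcal{B}(\theta; \alpha, \beta) = \frac{\theta^{\alpha - 1} (1 - \theta)^{\beta - 1}}{B(\alpha, \beta)}$$
$$B(\alpha, \beta) = \int_0^1 \theta^{\alpha - 1} (1 - \theta)^{\beta - 1} \, d\theta$$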
Bayesian Inference
37
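$$P(\theta \mid n_h, n_t) = \frac{P(n_h, n_t \mid \theta)\, P(\theta)}{\int_0^1 P(n_h, n_t \mid \vartheta)\, P(\vartheta)\, d\vartheta} = \mathcal{B}(\theta; \alpha + n_h, \beta + n_t)$$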
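Because the Beta prior is conjugate to the coin-flip likelihood, the posterior update is just addition of counts. A small sketch, with hypothetical prior parameters and data chosen only for illustration:

```python
def beta_posterior(alpha, beta, n_heads, n_tails):
    """Conjugate update: Beta(alpha, beta) prior -> Beta(alpha + n_h, beta + n_t) posterior."""
    return alpha + n_heads, beta + n_tails


def beta_mean(alpha, beta):
    """Mean of a Beta(alpha, beta) distribution."""
    return alpha / (alpha + beta)


# Assumed example: a Beta(2, 2) prior and 7 heads, 3 tails.
prior_alpha, prior_beta = 2, 2
n_heads, n_tails = 7, 3

post_alpha, post_beta = beta_posterior(prior_alpha, prior_beta, n_heads, n_tails)
theta_mle = n_heads / (n_heads + n_tails)

print("posterior parameters:", (post_alpha, post_beta))
print("posterior mean:", beta_mean(post_alpha, post_beta))
print("MLE:", theta_mle)
```

The posterior mean lies between the prior mean (0.5) and the MLE (0.7), and it moves toward the MLE as more flips are observed.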
Bayesian Inference
38
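$$P(\theta \in \mathcal{H} \mid D) = \int_{\mathcal{H}} P(\theta \mid D) \, d\theta$$
Hypothesis $\mathcal{H}$: The coin is fair
$$P(\theta = \tfrac{1}{2} \mid D) = \int_{1/2}^{1/2} P(\theta \mid D) \, d\theta = 0$$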
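One way to see why a point hypothesis gets zero posterior probability under a continuous posterior is to integrate over a shrinking interval around 1/2. The sketch below assumes a Beta(9, 5) posterior (a hypothetical choice) and uses a simple midpoint rule; the function names are illustrative.

```python
import math


def beta_pdf(theta, a, b):
    """Density of Beta(a, b) at theta in (0, 1), computed in log space."""
    log_norm = math.lgamma(a) + math.lgamma(b) - math.lgamma(a + b)
    log_density = (a - 1) * math.log(theta) + (b - 1) * math.log(1 - theta) - log_norm
    return math.exp(log_density)


def prob_interval(lo, hi, a, b, steps=10_000):
    """Midpoint-rule approximation of P(lo < theta < hi) under Beta(a, b)."""
    width = (hi - lo) / steps
    return sum(beta_pdf(lo + (i + 0.5) * width, a, b) for i in range(steps)) * width


a, b = 9, 5  # assumed posterior parameters
for eps in (0.1, 0.01, 0.001):
    print(eps, prob_interval(0.5 - eps, 0.5 + eps, a, b))
```

The probability shrinks roughly in proportion to the interval width, so in the limit of a single point it is zero. Asking whether a coin is "approximately fair" (an interval) is answerable; asking whether it is exactly fair needs a prior that puts positive mass on that point.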
Interpretations of Probability
39
Frequency Interpretation:
The dominant statistical practice for many years (known as the classical or
frequentist theory) defines probability in terms of the limit of conducting infinitely
many random experiments.
So it is impossible to consider the probability of a statement such as “at least 50%
of Iranians enjoy drinking Doogh.” This statement is either true or false, so its
frequentist probability is either zero or one (but we might not know which).
Interpretations of Probability
40
Subjective (or Bayesian) Interpretation:
“By degree of probability, we really mean, or ought to mean, degree of belief” (De Morgan)
According to the subjective interpretation, probabilities are degrees of
confidence, or credence, or partial beliefs of suitable agents. Thus, we really have
many interpretations of probability here, as many as there are suitable agents.
In the Bayesian interpretation, probabilities instead describe degrees
of belief in a proposition. In this way, we can treat everything as a random
variable and use the tools of probability to carry out all inference. That is, in
subjective probability, parameters, data, and hypotheses are all treated the same.
Interpretations of Probability
41
Subjective (or Bayesian) Interpretation:
The betting analysis
[Figure: a bet on event E with probability p and stake x, showing the payoffs −px and +px.]
Interpretations of Probability
42
Subjective (or Bayesian) Interpretation:
The betting analysis
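[Figure: the same bet on event E with probability p and stake x, showing the payoffs −px, +px, +x and −x, with the resulting constraints p ≤ 1 and p ≥ 0.]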
Point Estimation
Goal: Choose a single good value of θ given the data D
Typically the posterior mean or median is the most appropriate choice
for a real-valued quantity, and the vector of posterior marginals is the
best choice for a discrete quantity.
However, the posterior mode is the most popular choice because finding it
reduces to an optimization problem, for which efficient algorithms often
exist.
Point Estimation
44
Maximum a posteriori (MAP) estimation
$$\theta_{MAP} = \arg\max_{\theta} P(\theta \mid \mathcal{D}) = \arg\max_{\theta} P(\mathcal{D} \mid \theta)\, P(\theta)$$
We have avoided computing the normalization constant 𝑃(𝐷).
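A sketch of MAP estimation for the coin example, assuming a Beta(α, β) prior as in the earlier slides (the specific numbers are hypothetical). The grid search only ever evaluates likelihood times prior, never P(D), and it agrees with the known closed-form mode of the Beta posterior.

```python
import math


def log_unnormalized_posterior(theta, n_h, n_t, alpha, beta):
    """log[P(D | theta) P(theta)] up to an additive constant; P(D) is never computed."""
    return (n_h + alpha - 1) * math.log(theta) + (n_t + beta - 1) * math.log(1 - theta)


def map_closed_form(n_h, n_t, alpha, beta):
    """Mode of Beta(alpha + n_h, beta + n_t); valid when both parameters exceed 1."""
    return (n_h + alpha - 1) / (n_h + n_t + alpha + beta - 2)


n_h, n_t, alpha, beta = 7, 3, 2, 2  # assumed data and prior

# Brute-force maximization over a grid of theta values in (0, 1).
grid = [i / 1000 for i in range(1, 1000)]
theta_grid = max(grid, key=lambda t: log_unnormalized_posterior(t, n_h, n_t, alpha, beta))

theta_map = map_closed_form(n_h, n_t, alpha, beta)
theta_mle = n_h / (n_h + n_t)

print("MAP (grid):", theta_grid)
print("MAP (closed form):", theta_map)
print("MLE:", theta_mle)
```

The MAP estimate is pulled from the MLE toward the prior mode at 0.5, which is the shrinkage effect a prior provides.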
Point Estimation
45
Maximum a posteriori (MAP) estimation
Pros: Easy to compute
Interpretable
Avoids overfitting – Regularization, Shrinkage
Cons: No representation of uncertainty
Not invariant to reparameterization, unlike the MLE: for $\tau = f(\theta)$, $\tau_{MLE} = f(\theta_{MLE})$, but in general $\tau_{MAP} \neq f(\theta_{MAP})$
The mode is an atypical point
Linear Regression
46
Linear Regression
47
Frequentist approach:
Linear Regression
48
The Gaussian distribution
Linear Regression
49
The Gaussian distribution
Linear Regression
50
The Gaussian distribution
Marginalization:
Conditioning:
Linear Regression
51
The Gaussian distribution
Convolutions:
Affine transformations:
Linear Regression
52
Bayesian approach:
Objective Prior:
Vector 𝐲 is a sum of two independent multivariate Gaussian distributed vectors
Linear Regression
53
Bayesian approach:
Linear Regression
54
Bayesian approach:
Making predictions
Connection to ridge regression?
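The equations on these slides did not survive extraction, so the following is only a sketch of the usual connection, under stated assumptions: a one-dimensional model y = w·x + noise with Gaussian noise of variance `noise_var`, and a zero-mean Gaussian prior on w with variance `prior_var`. Under these assumptions the posterior mean (which is also the MAP estimate) coincides with the ridge solution for λ = noise_var / prior_var. All names and data here are illustrative.

```python
def posterior_mean_weight(xs, ys, noise_var, prior_var):
    """Posterior mean of w for y = w*x + noise, with prior w ~ N(0, prior_var)."""
    # Posterior precision = prior precision + data precision.
    precision = 1.0 / prior_var + sum(x * x for x in xs) / noise_var
    return (sum(x * y for x, y in zip(xs, ys)) / noise_var) / precision


def ridge_weight(xs, ys, lam):
    """Minimizer of sum((y - w*x)^2) + lam * w^2."""
    return sum(x * y for x, y in zip(xs, ys)) / (sum(x * x for x in xs) + lam)


xs = [0.0, 1.0, 2.0, 3.0]
ys = [0.1, 0.9, 2.2, 2.8]
noise_var, prior_var = 0.25, 1.0

w_bayes = posterior_mean_weight(xs, ys, noise_var, prior_var)
w_ridge = ridge_weight(xs, ys, lam=noise_var / prior_var)
w_ols = ridge_weight(xs, ys, lam=0.0)  # ordinary least squares as the lam = 0 case

print("Bayesian posterior mean:", w_bayes)
print("ridge estimate:", w_ridge)
print("least-squares estimate:", w_ols)
```

The Bayesian treatment additionally yields a posterior variance (the inverse of `precision`), which ridge regression by itself does not provide.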
Bayesian Model Comparison
55
Bayesian Model Comparison
56
Occam’s Razor
The simplest answer is often the right one.
Sequence: −1, 3, 7, 11
$$x \to x + 4$$
$$x \to -\frac{x^3}{11} + \frac{9x^2}{11} + \frac{23}{11}$$
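Both rules reproduce the observed sequence exactly, which can be verified with exact rational arithmetic. The function names below are illustrative; the rules themselves are the two from the slide.

```python
from fractions import Fraction


def linear_rule(x):
    """Simple rule: add 4 to the previous number."""
    return x + 4


def cubic_rule(x):
    """Complex rule: -x^3/11 + 9x^2/11 + 23/11, evaluated exactly."""
    return Fraction(-x ** 3 + 9 * x ** 2 + 23, 11)


sequence = [-1, 3, 7, 11]

# Check that each rule maps every term to the next one.
for rule in (linear_rule, cubic_rule):
    fits = all(rule(prev) == nxt for prev, nxt in zip(sequence, sequence[1:]))
    print(rule.__name__, "fits the data:", fits, "| next term:", rule(sequence[-1]))
```

The data alone cannot separate the two rules; they only disagree on the next term. Preferring the simpler rule is what Occam's razor, and its Bayesian counterpart in model comparison, formalizes.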
Bayesian Model Comparison
57
MAP
MLE
Bayesian Model Comparison
58
Bayes factor in favor of
Bayesian Model Comparison
59
Bayes factor in favor of is approximately 1.2
Bayesian Model Averaging
60
A fully Bayesian method would avoid model selection.
When making predictions, we should theoretically use the sum rule
to marginalize over the unknown model:
$$P(y \mid x, \mathcal{D}) = \sum_{m} P(y \mid x, m, \mathcal{D})\, P(m \mid \mathcal{D})$$
But model selection is still widely used in practice.
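A small sketch of model averaging for the coin example. The two candidate models and the equal model priors are assumptions made for illustration: a "fair" model with θ fixed at 1/2, and a "uniform" model with θ ~ Beta(1, 1). Both have closed-form evidence, so no approximation is needed.

```python
from math import factorial


def evidence_fair(n_h, n_t):
    """P(D | fair model): theta is fixed at 1/2."""
    return 0.5 ** (n_h + n_t)


def evidence_uniform(n_h, n_t):
    """P(D | uniform-prior model) = integral of theta^n_h (1 - theta)^n_t over [0, 1]."""
    return factorial(n_h) * factorial(n_t) / factorial(n_h + n_t + 1)


n_h, n_t = 7, 3  # assumed data

# Posterior over models, assuming equal prior probability for each model.
ev = {"fair": evidence_fair(n_h, n_t), "uniform": evidence_uniform(n_h, n_t)}
total = sum(ev.values())
model_posterior = {m: e / total for m, e in ev.items()}

# Each model's predictive probability that the next flip is heads.
predictive = {"fair": 0.5, "uniform": (n_h + 1) / (n_h + n_t + 2)}

# Sum rule: average the predictions, weighted by the model posterior.
p_heads_bma = sum(model_posterior[m] * predictive[m] for m in ev)

print("model posterior:", model_posterior)
print("P(next flip is heads), averaged over models:", p_heads_bma)
```

Model selection would commit to the single most probable model and use only its prediction; averaging keeps the contribution of the other model in proportion to how well it explains the data.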
Pros and Cons of Bayesian Inference
61
Pros: Directly answers the questions
Avoids overfitting
Model Selection (Occam’s Razor)
Cons: Must assume a prior
Intractable integrals
Limited to specific approximated distributions
Bayesian Deep Learning
62
Probabilistic (Bayesian) Programming
63
Deep Learning
Optimization, usually gradient-based
Automated differentiation tools
Bayesian Learning and Reasoning
Inference
Probabilistic (Bayesian) programming languages
Probabilistic (Bayesian) Program Induction
64
Troubles with Bayesianism
65
Eric Mandelbaum, 2019
Some Good Resources
66
Some Good Resources
67
Bayesian Machine Learning
metacademy.org/roadmaps/rgrosse/bayesian_machine_learning
Created by: Roger Grosse
mohammadrezasamsami76@gmail.com
mrsamsami.github.io
