Bayesian Learning
• Each observed training example can incrementally decrease or increase the estimated probability that a hypothesis is correct.
• This provides a more flexible approach to learning than algorithms that completely eliminate a hypothesis if it is found to be inconsistent with any single example.
• Prior knowledge can be combined with observed data to determine the final probability of a hypothesis. In Bayesian learning, prior knowledge is provided by asserting:
  – a prior probability for each candidate hypothesis, and
  – a probability distribution over the observed data for each possible hypothesis.
• Bayesian methods can accommodate hypotheses that make probabilistic predictions.
• New instances can be classified by combining the predictions of multiple hypotheses, weighted by their probabilities.
• Even in cases where Bayesian methods prove computationally intractable, they can provide a standard of optimal decision making against which other practical methods can be measured.
Bayesian Learning…
Practical difficulties with Bayesian methods:
• They require initial knowledge of many probabilities. When these probabilities are not known in advance, they are often estimated from background knowledge, previously available data, and assumptions about the form of the underlying distributions.
• Significant computational cost is required to determine the Bayes optimal hypothesis in the general case (linear in the number of candidate hypotheses). In certain specialized situations, this computational cost can be significantly reduced.
Bayes Theorem
• In machine learning, we try to determine the best hypothesis from some hypothesis space H, given the observed training data D.
• In Bayesian learning, the best hypothesis means the most probable hypothesis, given the data D plus any initial knowledge about the prior probabilities of the various hypotheses in H.
• Bayes theorem provides a way to calculate the probability of a hypothesis based on its prior probability, the probabilities of observing various data given the hypothesis, and the observed data itself.
Bayes Theorem…
• In ML problems, we are interested in the probability P(h|D) that h holds given the observed training data D.
• Bayes theorem provides a way to calculate the posterior probability P(h|D) from the prior probability P(h), together with P(D) and P(D|h).
• Bayes Theorem:

  P(h|D) = P(D|h) P(h) / P(D)

• P(h) is the prior probability of hypothesis h.
• P(D) is the prior probability of observing training data D.
• P(h|D) is the posterior probability of h given D.
• P(D|h) is the probability of observing D given that h holds (the likelihood).
• P(h|D) increases with P(h) and with P(D|h), according to Bayes theorem.
• P(h|D) decreases as P(D) increases, because the more probable it is that D will be observed independent of h, the less evidence D provides in support of h.
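To make the theorem concrete, here is a minimal sketch in Python with made-up numbers (the two hypotheses, their priors, and their likelihoods are hypothetical, not from the slides): P(D) is obtained by summing P(D|h)P(h) over all hypotheses, and each posterior then follows directly from the formula above.

```python
# Hypothetical two-hypothesis example of Bayes theorem.
priors = {"h1": 0.3, "h2": 0.7}        # P(h): prior belief in each hypothesis
likelihoods = {"h1": 0.8, "h2": 0.1}   # P(D|h): how well each h explains the data

# P(D) by total probability: sum over the (assumed exhaustive) hypotheses.
p_D = sum(likelihoods[h] * priors[h] for h in priors)

# Posterior P(h|D) = P(D|h) P(h) / P(D).
posteriors = {h: likelihoods[h] * priors[h] / p_D for h in priors}
print(posteriors)  # {'h1': 0.774..., 'h2': 0.225...}: the data favors h1
```

Note that the posterior reverses the prior ordering because P(D|h1) is much larger, exactly the "P(h|D) increases with P(D|h)" behavior described above.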
Maximum A Posteriori (MAP) Hypothesis, hMAP
• The learner seeks the most probable hypothesis h in H given the data D, called the maximum a posteriori (MAP) hypothesis:

  hMAP = argmax_{h∈H} P(h|D) = argmax_{h∈H} P(D|h) P(h) / P(D) = argmax_{h∈H} P(D|h) P(h)

• In the final step P(D) is dropped, because it is a constant independent of h.
Maximum Likelihood (ML) Hypothesis, hML
• If every hypothesis in H is assumed equally probable a priori, the P(h) term can be dropped as well, and any hypothesis that maximizes P(D|h) is a maximum likelihood (ML) hypothesis:

  hML = argmax_{h∈H} P(D|h)
Brute-Force Bayes Concept Learning
• Bayes theorem can be applied directly to concept learning: given a hypothesis space H and a sequence of training examples D, a learner can output the MAP hypothesis by evaluating the posterior probability of every candidate hypothesis, as in the algorithm below.
Brute-Force MAP Learning Algorithm
1. For each hypothesis h in H, calculate the posterior probability P(h|D) = P(D|h) P(h) / P(D).
2. Output the hypothesis hMAP with the highest posterior probability, hMAP = argmax_{h∈H} P(h|D).
• In the standard concept learning setting (noise-free training data, target concept contained in H, and a uniform prior P(h) = 1/|H|), P(D|h) is 1 if h is consistent with D and 0 otherwise, so P(h|D) = 1/|VS_{H,D}| for every consistent hypothesis (where VS_{H,D} is the version space) and 0 for every inconsistent one: every consistent hypothesis is a MAP hypothesis.
• The algorithm may be computationally infeasible for large hypothesis spaces, but it is a useful conceptual standard against which other concept learners can be judged.
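The algorithm is easy to state in code. Below is a minimal sketch on a hypothetical toy problem (thresholds on one real feature, with an assumed label-noise likelihood; none of these specifics come from the slides):

```python
# Brute-force MAP learning on a toy hypothesis space: h_t(x) = (x >= t).
thresholds = [0.0, 1.0, 2.0, 3.0, 4.0]            # hypothesis space H
data = [(0.5, False), (1.5, False), (2.5, True)]  # training data D

def likelihood(t, D, eps=0.1):
    """P(D|h): assume each observed label is flipped with probability eps."""
    p = 1.0
    for x, label in D:
        p *= (1 - eps) if (x >= t) == label else eps
    return p

prior = 1.0 / len(thresholds)  # uniform P(h)

# Step 1: score P(D|h) P(h) for every h in H (P(D) is constant, so skipped).
scores = {t: likelihood(t, data) * prior for t in thresholds}
# Step 2: output the hypothesis with the highest posterior.
h_map = max(scores, key=scores.get)
print(h_map)  # 2.0: the threshold consistent with all three examples
```

The loop over all of H is exactly why the general case is linear in the number of candidate hypotheses, as noted earlier.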
Maximum Likelihood and Least-Squared Error Hypotheses
• Many learning approaches, such as neural network learning, linear regression, and polynomial curve fitting, try to learn a continuous-valued target function.
• Under certain assumptions, any learning algorithm that minimizes the squared error between its output hypothesis predictions and the training data will output a MAXIMUM LIKELIHOOD HYPOTHESIS.
• The significance of this result is that it provides a Bayesian justification (under those assumptions) for the many neural network and other curve-fitting methods that minimize the sum of squared errors over the training data.
Maximum Likelihood and Least-Squared Error Hypotheses – Deriving hML
• Assume the training values are generated as di = f(xi) + ei, where f is the noise-free target function and each ei is drawn independently from a Normal distribution with zero mean and variance σ². Then, given h, each di is normally distributed with mean h(xi), and (assuming the m training examples are mutually independent given h):

  hML = argmax_{h∈H} p(D|h) = argmax_{h∈H} ∏_{i=1..m} (1/√(2πσ²)) exp(−(di − h(xi))² / (2σ²))

• Maximizing the (monotonic) natural logarithm of this expression and discarding terms that do not depend on h:

  hML = argmax_{h∈H} Σ_{i=1..m} −(di − h(xi))² / (2σ²) = argmin_{h∈H} Σ_{i=1..m} (di − h(xi))²

• Thus the maximum likelihood hypothesis is the one that minimizes the sum of squared errors.
Maximum Likelihood and Least-Squared Error Hypotheses
• The maximum likelihood hypothesis hML is the one that minimizes the sum of squared errors between the observed training values di and the hypothesis predictions h(xi).
• This holds under the assumption that the observed training values di are generated by adding random noise to the true target value, where this noise is drawn independently for each example from a Normal distribution with zero mean.
• Similar derivations can be performed starting with other assumed noise distributions, producing different results.
• Why is it reasonable to choose the Normal distribution to characterize noise?
  – One reason is that it allows for a mathematically straightforward analysis.
  – A second reason is that the smooth, bell-shaped distribution is a good approximation to many types of noise in physical systems.
• Minimizing the sum of squared errors is a common approach in many neural network, curve-fitting, and other approaches to approximating real-valued functions.
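The correspondence is easy to check numerically. A minimal sketch with made-up data (the particular line, noise level, and use of numpy.polyfit are illustrative assumptions, not from the slides):

```python
import numpy as np

# Simulate d_i = f(x_i) + e_i with f(x) = 2x + 1 and Gaussian noise e_i.
rng = np.random.default_rng(0)
x = np.linspace(0.0, 10.0, 50)
d = 2.0 * x + 1.0 + rng.normal(0.0, 1.0, size=x.shape)

# polyfit minimizes sum_i (d_i - h(x_i))^2 over lines h(x) = w*x + b,
# which the derivation above shows is the ML criterion for this noise model.
w, b = np.polyfit(x, d, deg=1)
print(w, b)  # close to the true parameters 2.0 and 1.0
```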
Minimum Description Length Principle
• Rewriting hMAP in terms of base-2 logarithms gives an equivalent form:

  hMAP = argmax_{h∈H} P(D|h) P(h) = argmax_{h∈H} [log₂ P(D|h) + log₂ P(h)] = argmin_{h∈H} [−log₂ P(D|h) − log₂ P(h)]

• This equation can be interpreted as a preference for short hypotheses, using a basic result from information theory, developed below.
• Consider the problem of designing a code to transmit messages drawn at random, where the probability of encountering message i is p_i.
• We are interested here in the most compact code; that is, the code that minimizes the expected number of bits we must transmit in order to encode a message drawn at random.
• Clearly, to minimize the expected code length we should assign shorter codes to messages that are more probable.
• We will refer to the number of bits required to encode message i using code C as the description length of message i with respect to C, denoted L_C(i). A basic result from information theory (due to Shannon) is that the optimal code assigns −log₂ p_i bits to message i.
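A quick numeric illustration (hypothetical message probabilities, chosen as powers of 1/2 so the optimal code lengths come out to whole bits):

```python
import math

# Hypothetical message distribution.
p = {"a": 0.5, "b": 0.25, "c": 0.125, "d": 0.125}

lengths = {m: -math.log2(pm) for m, pm in p.items()}  # L_C(i) for the optimal code C
print(lengths)  # {'a': 1.0, 'b': 2.0, 'c': 3.0, 'd': 3.0}: probable messages get short codes

# Expected code length: sum_i p_i * L_C(i), the entropy of the distribution.
print(sum(pm * lengths[m] for m, pm in p.items()))  # 1.75 bits
```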
Minimum Description Length Principle
• Interpreting the log₂ form of hMAP in coding terms: −log₂ P(h) is the description length of h under the optimal code for the hypothesis space, and −log₂ P(D|h) is the description length of the training data given h under its optimal code.
• Therefore the above equation can be rewritten to show that hMDL is the hypothesis h that minimizes the sum given by the description length of the hypothesis plus the description length of the data given the hypothesis:

  hMDL = argmin_{h∈H} [L_{C1}(h) + L_{C2}(D|h)]

  where C1 and C2 are codes for hypotheses and for data given a hypothesis; when C1 and C2 are the optimal codes, hMDL = hMAP.
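The principle trades hypothesis complexity against fit to the data. A toy sketch with invented bit costs (all numbers hypothetical):

```python
# Each candidate is scored by L_C1(h) + L_C2(D|h), both in bits.
candidates = {
    "simple_h":  {"L_h": 10.0, "L_D_given_h": 25.0},  # short hypothesis, residual errors to encode
    "complex_h": {"L_h": 40.0, "L_D_given_h": 0.0},   # long hypothesis, data fully explained
}

h_mdl = min(candidates, key=lambda h: candidates[h]["L_h"] + candidates[h]["L_D_given_h"])
print(h_mdl)  # 'simple_h' (35 bits < 40 bits): MDL prefers the shorter overall description
```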
Bayes Optimal Classifier
• The Bayes optimal classifier assigns the most probable classification to a new instance, given the training data, by combining the predictions of all hypotheses rather than relying on a single one.
• To develop some intuition, consider a hypothesis space containing three hypotheses, h1, h2, and h3.
• Suppose that the posterior probabilities of these hypotheses given the training data are .4, .3, and .3 respectively. Thus, h1 is the MAP hypothesis.
• Suppose a new instance x is encountered, which is classified positive by h1, but negative by h2 and h3.
• Taking all hypotheses into account, the probability that x is positive is .4 (the probability associated with h1), and the probability that it is negative is therefore .6.
• The most probable classification (negative) in this case is different from the classification generated by the MAP hypothesis.
• If the possible classification of the new example can take on any value vj from some set V, then the probability P(vj|D) that the correct classification for the new instance is vj is

  P(vj|D) = Σ_{hi∈H} P(vj|hi) P(hi|D)
Bayes Optimal Classifier
• The Bayes optimal classification of the new instance is the value vj for which P(vj|D) is maximum:

  argmax_{vj∈V} Σ_{hi∈H} P(vj|hi) P(hi|D)

• For the example above: Σ_i P(positive|hi) P(hi|D) = 1(.4) + 0(.3) + 0(.3) = .4, while Σ_i P(negative|hi) P(hi|D) = 0(.4) + 1(.3) + 1(.3) = .6, so the Bayes optimal classification is negative.
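The same computation in code, using the numbers from the example (only the dictionary encoding is an assumption):

```python
posterior = {"h1": 0.4, "h2": 0.3, "h3": 0.3}   # P(h|D) from the example
predicts = {"h1": "+", "h2": "-", "h3": "-"}    # each hypothesis's label for x

def p_label(v):
    """P(v|D) = sum_h P(v|h) P(h|D); here P(v|h) is 1 if h predicts v, else 0."""
    return sum(p for h, p in posterior.items() if predicts[h] == v)

best = max(["+", "-"], key=p_label)
print(best, p_label(best))  # '-' 0.6: negative, disagreeing with the MAP hypothesis h1
```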
Gibbs Algorithm
• Although the Bayes optimal classifier obtains the best performance that can be achieved from the given training data, it can be quite costly to apply: it computes the posterior probability of every hypothesis in H and combines the predictions of all of them to classify each new instance.
• An alternative, less optimal method is the Gibbs algorithm:
  1. Choose a hypothesis h from H at random, according to the posterior probability distribution P(h|D).
  2. Use h to predict the classification of the next instance x.
• Surprisingly, under certain conditions the expected misclassification error of the Gibbs algorithm is at most twice the expected error of the Bayes optimal classifier.
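A minimal sketch of the Gibbs algorithm on the same three-hypothesis example (the sampling call is standard-library Python; the numbers come from the example above):

```python
import random

posterior = {"h1": 0.4, "h2": 0.3, "h3": 0.3}   # P(h|D)
predicts = {"h1": "+", "h2": "-", "h3": "-"}

# Step 1: draw one hypothesis according to its posterior probability.
h = random.choices(list(posterior), weights=list(posterior.values()))[0]
# Step 2: classify the new instance using that single hypothesis.
print(h, predicts[h])  # repeated runs predict '-' about 60% of the time
```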