Bayesian Data Analysis
By Dr. Sarita Tripathy
Assistant Professor
School of Computer Engineering
KIIT Deemed to be University
Bayesian Data Analysis
• BDA deals with a set of practical methods for making inferences from
the available data.
• These methods use probability models to model the given data and
also predict future values.
• Bayesian analysis is a statistical paradigm that answers research
questions about unknown parameters using probability statements. For
example, what is the probability that the average male height is
between 70 and 80 inches or that the average female height is between
60 and 70 inches?
• Identify the data, define a descriptive model, specify a prior, compute the posterior distribution, interpret the posterior distribution, and check that the model is a reasonable description of the data.
Steps in BDA
BDA consists of three steps:
1. Setting up the prior distribution:
• Domain expertise and prior knowledge are used to develop a joint probability distribution for all parameters of the data under consideration and for the output data (which needs to be predicted). This is termed the prior distribution.
2. Setting up the posterior distribution:
• After taking the observed data (the given dataset) into account, the appropriate posterior distribution is calculated and interpreted. This amounts to estimating the conditional probability distribution of the unknown quantities given the observed data.
3. Evaluating the fit of the model
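As a minimal illustration of these three steps, here is a sketch in Python using a simple Beta-Binomial model (the coin-flip-style data and the Beta(2, 2) prior are hypothetical choices, not taken from the slides):

# Minimal sketch of the three BDA steps for a Beta-Binomial model.
import numpy as np

np.random.seed(0)

# Step 1: prior distribution for an unknown proportion p, here Beta(a, b).
a_prior, b_prior = 2, 2          # mild prior belief that p is near 0.5

# Observed data (the given dataset): 7 successes in 10 trials.
successes, trials = 7, 10

# Step 2: posterior distribution (the Beta prior is conjugate to the Binomial).
a_post = a_prior + successes
b_post = b_prior + (trials - successes)
print("Posterior: Beta(%d, %d), mean = %.3f" % (a_post, b_post, a_post / (a_post + b_post)))

# Step 3: evaluate the fit via a posterior predictive check:
# simulate replicated datasets and compare them with the observed count.
p_draws = np.random.beta(a_post, b_post, size=5000)
replicated = np.random.binomial(trials, p_draws)
print("P(replicated successes >= observed) =", np.mean(replicated >= successes))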
BAYESIAN INFERENCE
INTRODUCTION
• Bayesian inference is a method of statistical inference in which
Bayes' theorem is used to update the probability for a hypothesis as
more evidence or information becomes available.
• Bayesian inference is an important technique in statistics, and
especially in mathematical statistics.
• Bayesian inference has found application in a wide range of
activities, including science, engineering, philosophy, medicine,
sport, and law.
• In the philosophy of decision theory, Bayesian inference is closely related to subjective probability, often called "Bayesian probability".
• Bayes theorem adjusts probabilities given new evidence in the
following way:
P(H0|E)= P(E|H0) P(H0)/ P(E)
• where H0 represents the hypothesis (often called the null hypothesis) inferred before the new evidence E became available.
• P(H0) is called the prior probability of H0.
• P(E|H0) is called the conditional probability of seeing the evidence E given that the hypothesis H0 is true; it is also known as the likelihood.
• P(E) is called the marginal probability of E: the probability of witnessing the new evidence.
• P(H0|E) is called the posterior probability of H0 given E.
• The factor P(E|H0)/P(E) represents the impact that the evidence has on the belief in the hypothesis.
• Multiplying the prior probability P(H0) by the factor P(E|H0)/P(E) will never yield a probability greater than 1.
• Since P(E) is at least as great as P(E∩H0), which equals P(E|H0)·P(H0), replacing P(E) with P(E∩H0) in the factor P(E|H0)/P(E) yields a posterior probability of exactly 1.
• Therefore, the posterior probability could exceed 1 only if P(E) were less than P(E∩H0), which is never true.
LIKELIHOOD FUNCTION
• The probability of E given H0, P(E|H0), can be represented as a function of its second argument (the hypothesis) with its first argument (the evidence) held at a given value. Such a function is called the likelihood function; it is a function of H0 given E. The ratio of two likelihood functions is called a likelihood ratio,
Λ = L(H0|E)/L(not H0|E) = P(E|H0)/P(E|not H0)
• The marginal probability P(E) can also be represented as the sum, over mutually exclusive hypotheses, of each hypothesis's prior probability times the corresponding conditional probability:
P(E) = P(E|H0) P(H0) + P(E|not H0) P(not H0)
• As a result, we can rewrite Bayes' theorem as
P(H0|E) = P(E|H0) P(H0) / [P(E|H0) P(H0) + P(E|not H0) P(not H0)] = Λ P(H0) / [Λ P(H0) + P(not H0)]
• With two independent pieces of evidence E1 and E2, Bayesian
inference can be applied iteratively.
• We could use the first piece of evidence to calculate an initial
posterior probability, and then use that posterior probability as
a new prior probability to calculate a second posterior
probability given the second piece of evidence.
• Independence of evidence implies that,
P(E1, E2| H0)= P(E1| H0) * P(E2|H0)
P(E1, E2)= P(E1)* P(E2)
P(E1, E2| not H0)= P(E1| not H0) * P(E2| not H0)
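As a small sketch of this iterative updating, the posterior after E1 becomes the prior before E2. The likelihood values below are hypothetical, chosen only for illustration:

# Iterative Bayesian updating with two independent pieces of evidence.
def update(prior_h, p_e_given_h, p_e_given_not_h):
    # Return P(H0|E) from the prior and the two conditional probabilities.
    p_e = p_e_given_h * prior_h + p_e_given_not_h * (1 - prior_h)  # marginal P(E)
    return p_e_given_h * prior_h / p_e

prior = 0.5                        # P(H0) before any evidence
post1 = update(prior, 0.8, 0.3)    # after evidence E1
post2 = update(post1, 0.7, 0.4)    # E2: the posterior from E1 acts as the new prior
print(round(post1, 3), round(post2, 3))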
From which bowl is the cookie?
• Suppose there are two full bowls of cookies. Bowl #1 has 10
chocolate chip and 30 plain cookies, while bowl #2 has 20 of each.
Our friend Hardika picks a bowl at random, and then picks a cookie
at random. We may assume there is no reason to believe Hardika
treats one bowl differently from another, likewise for the cookies.
The cookie turns out to be a plain one. How probable is it that
Hardika picked it out of bowl #1?
(Figure: Bowl #1 and Bowl #2)
• Intuitively, it seems clear that the answer should be more than
a half, since there are more plain cookies in bowl #1.
• The precise answer is given by Bayes' theorem. Let H1
correspond to bowl #1, and H2 to bowl #2.
• It is given that the bowls are identical from Hardika’s point of
view, thus P(H1) = P(H2) and the two must add up to 1, so
both are equal to 0.5.
• Let D denote the observation of a plain cookie.
• From the contents of the bowls, we know that
P(D| H1)= 30/40= 0.75 and P(D| H2)= 20/40= 0.5
• Bayes formula then yields,
P(H1|D) = P(H1)·P(D|H1) / [P(H1)·P(D|H1) + P(H2)·P(D|H2)]
= (0.5 × 0.75) / (0.5 × 0.75 + 0.5 × 0.5)
= 0.6
• Before observing the cookie, the probability that Hardika chose bowl #1 is the prior probability P(H1), which is 0.5. After observing the cookie, we revise this probability to 0.6.
• It is worth noting that observing the plain cookie updates the prior probability P(H1) into the posterior probability P(H1|D), increasing it from 0.5 to 0.6.
• This reflects our intuition that the cookie is more likely from
the bowl#1, since it has a higher ratio of plain to chocolate
cookies than the other.
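The arithmetic above can be checked with a few lines of Python (a direct transcription of the example; nothing new is assumed):

# Cookie example: posterior probability that the plain cookie came from bowl #1.
p_h1, p_h2 = 0.5, 0.5            # prior: the bowls are chosen at random
p_d_h1 = 30 / 40                 # P(plain | bowl 1) = 0.75
p_d_h2 = 20 / 40                 # P(plain | bowl 2) = 0.5

p_d = p_d_h1 * p_h1 + p_d_h2 * p_h2     # marginal probability of drawing a plain cookie
p_h1_d = p_d_h1 * p_h1 / p_d            # Bayes' theorem
print(p_h1_d)                           # 0.6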
PRIOR PROBABILITY DISTRIBUTION
• In Bayesian statistical inference, a prior probability distribution, often simply called the prior, of an uncertain quantity p (for example, the proportion of voters who will vote for Mr. Narendra Modi in a future election) is the probability distribution that expresses one's uncertainty about p before the data (for example, an election poll) are taken into account.
• It is meant to attribute uncertainty rather than randomness to
the uncertain quantity.
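For instance, uncertainty about a proportion such as p is often expressed with a Beta distribution; a hypothetical sketch (the Beta(4, 6) prior below is an illustrative choice, not from the slides):

# A hypothetical prior over a vote share p, expressed as a Beta(4, 6) distribution.
import numpy as np

rng = np.random.default_rng(1)
p_draws = rng.beta(4, 6, size=100_000)         # draws from the prior for p
print(p_draws.mean())                          # close to the prior mean 4/(4+6) = 0.4
print(np.quantile(p_draws, [0.025, 0.975]))    # a rough 95% prior interval for p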
INTRODUCTION TO NAIVE BAYES
• Suppose your data consist of fruits, described by their color
and shape.
• Bayesian classifiers operate by saying: "If you see a fruit that is red and round, which type of fruit is it most likely to be, based on the observed data sample? In future, classify red and round fruit as that type of fruit."
• A difficulty arises when you have more than a few variables
and classes- you would require an enormous number of
observations to estimate these probabilities.
• The Naïve Bayes classifier assumes that the effect of a variable's value on a given class is independent of the values of the other variables.
• This assumption is called class conditional independence.
• It is made to simplify the computation, and in this sense the classifier is considered "naïve".
APPLICATIONS
1. Computer applications
• Bayesian inference has applications in artificial
intelligence and expert systems.
• Bayesian inference techniques have been a fundamental
part of computerized pattern recognition techniques since
the late 1950s.
• Recently, Bayesian inference has gained popularity in the phylogenetics community because a number of applications allow many demographic and evolutionary parameters to be estimated simultaneously.
2. Bioinformatics applications
• Bayesian inference has been applied in various bioinformatics applications, including differential gene expression analysis, single-cell classification, and cancer subtyping.
ADVANTAGES
• Including good information should improve prediction.
• Including structure can allow the method to incorporate more data (for example, hierarchical modeling allows partial pooling so that external data can be included in a model even if these external data share only some characteristics with the current data being modeled).
DISADVANTAGES
• If the prior information is wrong, it can send
inferences in the wrong direction.
• Bayesian inference combines different sources of information; thus it is no longer an encapsulation of a particular dataset alone (which is sometimes desired, for reasons that go beyond immediate predictive accuracy and instead touch on issues of statistical communication).
Naïve Bayesian Classifier
The Naïve Bayesian classifier calculates this posterior probability using Bayes' theorem, as follows.
From Bayes' theorem on conditional probability, we have

P(Y|X) = P(X|Y)·P(Y) / P(X)
       = P(X|Y)·P(Y) / [P(X|Y = y1)·P(Y = y1) + … + P(X|Y = yk)·P(Y = yk)]

where
P(X) = Σ (i = 1..k) P(X|Y = yi)·P(Y = yi)

Note:
• P(X) is called the evidence (also the total probability) and, for a given instance, it is a constant.
• The posterior probability P(Y|X) is therefore proportional to the class-conditional probability P(X|Y) multiplied by the prior P(Y).
• Thus, P(Y|X) can be taken as a measure of the plausibility of Y given X:
P(Y|X) ∝ P(X|Y)·P(Y)
Naïve Bayesian Classifier
Suppose, for a given instance of X, say x = (X1 = x1, …, Xn = xn), we consider any two posterior probabilities P(Y = yi | X = x) and P(Y = yj | X = x).
If P(Y = yi | X = x) > P(Y = yj | X = x), then we say that yi is stronger than yj for the instance X = x.
The strongest yi is the classification for the instance X = x.
Naive Bayesian Classifier
Example: With reference to the Air Traffic Dataset mentioned earlier, let us tabulate all the posterior and prior probabilities as shown below (the columns are the class values).

Attribute        On Time        Late         Very Late     Cancelled
Day
  Weekday        9/14 = 0.64    1/2 = 0.5    3/3 = 1       0/1 = 0
  Saturday       2/14 = 0.14    1/2 = 0.5    0/3 = 0       1/1 = 1
  Sunday         1/14 = 0.07    0/2 = 0      0/3 = 0       0/1 = 0
  Holiday        2/14 = 0.14    0/2 = 0      0/3 = 0       0/1 = 0
Season
  Spring         4/14 = 0.29    0/2 = 0      0/3 = 0       0/1 = 0
  Summer         6/14 = 0.43    0/2 = 0      0/3 = 0       0/1 = 0
  Autumn         2/14 = 0.14    0/2 = 0      1/3 = 0.33    0/1 = 0
  Winter         2/14 = 0.14    2/2 = 1      2/3 = 0.67    0/1 = 0
Naïve Bayesian Classifier
Instance to classify: Weekday, Winter, High, Heavy → Class = ???
Case 1: Class = On Time   : 0.70 × 0.64 × 0.14 × 0.29 × 0.07 = 0.0013
Case 2: Class = Late      : 0.10 × 0.50 × 1.0 × 0.50 × 0.50 = 0.0125
Case 3: Class = Very Late : 0.15 × 1.0 × 0.67 × 0.33 × 0.67 = 0.0222
Case 4: Class = Cancelled : 0.05 × 0.0 × 0.0 × 1.0 × 1.0 = 0.0000
Case 3 gives the largest value; hence the classification is Very Late.
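A few lines of Python reproduce this calculation (the probabilities are read off the table above; the dictionary below is my own arrangement):

# Naive Bayes scores for the instance (Weekday, Winter, High, Heavy),
# using the prior and conditional probabilities quoted above.
scores = {
    "On Time":   0.70 * 0.64 * 0.14 * 0.29 * 0.07,
    "Late":      0.10 * 0.50 * 1.00 * 0.50 * 0.50,
    "Very Late": 0.15 * 1.00 * 0.67 * 0.33 * 0.67,
    "Cancelled": 0.05 * 0.00 * 0.00 * 1.00 * 1.00,
}
for cls, score in scores.items():
    print("%-10s %.4f" % (cls, score))
print("Predicted class:", max(scores, key=scores.get))   # Very Late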
Naïve Bayesian Classifier
Algorithm: Naïve Bayesian Classification
Input: A set of k mutually exclusive and exhaustive classes C = {c1, c2, …, ck}, which have prior probabilities P(C1), P(C2), …, P(Ck), and an n-attribute set A = {A1, A2, …, An}, which for a given instance has the values A1 = a1, A2 = a2, …, An = an.
Step: For each ci ∈ C, calculate the class scores, for i = 1, 2, …, k:
pi = P(Ci) × Π (j = 1..n) P(Aj = aj | Ci)
px = max(p1, p2, …, pk)
Output: Cx is the classification.
Note: Σ pi ≠ 1, because the pi are not probabilities but values proportional to the posterior probabilities.
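A compact sketch of this algorithm in Python (categorical attributes only; training by simple frequency counting is my assumption, consistent with the tabulated example but not spelled out on the slide):

from collections import Counter, defaultdict

# Minimal categorical Naive Bayes, following the Input/Step/Output above.
class NaiveBayes:
    def fit(self, X, y):
        n = len(y)
        self.priors = {c: cnt / n for c, cnt in Counter(y).items()}      # P(Ci)
        self.class_sizes = Counter(y)
        # counts[c][j][value] = number of training rows of class c with Aj = value
        self.counts = defaultdict(lambda: defaultdict(Counter))
        for row, c in zip(X, y):
            for j, value in enumerate(row):
                self.counts[c][j][value] += 1
        return self

    def predict(self, row):
        scores = {}
        for c, prior in self.priors.items():
            p = prior
            for j, value in enumerate(row):
                p *= self.counts[c][j][value] / self.class_sizes[c]      # P(Aj = aj | Ci)
            scores[c] = p
        return max(scores, key=scores.get)       # the class Cx with the largest pi

# Tiny illustrative usage with made-up (Day, Season) rows:
X = [("Weekday", "Winter"), ("Weekday", "Summer"), ("Saturday", "Winter")]
y = ["Late", "On Time", "On Time"]
print(NaiveBayes().fit(X, y).predict(("Weekday", "Winter")))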
Naïve Bayesian Classifier
Pros and Cons
The Naïve Bayes approach is very popular and often works well.
However, it has a number of potential problems:
• It relies on all attributes being categorical.
• If there is little data, the probability estimates are poor.
Naïve Bayesian Classifier
Approach to overcome the limitations of Naïve Bayesian classification
Estimating the posterior probabilities for continuous attributes
In real-life situations, not all attributes are necessarily categorical; in fact, there is usually a mix of categorical and continuous attributes.
In the following, we discuss two schemes for dealing with continuous attributes in a Bayesian classifier.
1. We can discretize each continuous attribute and then replace the continuous values with their corresponding discrete intervals.
2. We can assume a certain form of probability distribution for the continuous variable and estimate the parameters of the distribution from the training data. A Gaussian distribution is usually chosen to represent the posterior probabilities for continuous attributes. The general form of the Gaussian distribution is

P(x; μ, σ²) = (1 / (σ·√(2π))) · e^(−(x − μ)² / (2σ²))

where μ and σ² denote the mean and variance, respectively.
Naïve Bayesian Classifier
For each class Ci, the posterior probability for attribute Aj (a numeric attribute) can be calculated from the Gaussian (normal) distribution as follows:

P(Aj = aj | Ci) = (1 / (σij·√(2π))) · e^(−(aj − μij)² / (2σij²))

Here, the parameter μij can be calculated as the sample mean of the values of attribute Aj over the training records that belong to class Ci.
Similarly, σij² can be estimated as the variance of those training records.
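A short sketch of these two estimates and the resulting class-conditional density (the attribute values for class Ci below are invented for illustration):

import math

# Gaussian class-conditional density for one numeric attribute Aj.
def gaussian_pdf(x, mu, var):
    return math.exp(-(x - mu) ** 2 / (2 * var)) / math.sqrt(2 * math.pi * var)

# Invented training values of attribute Aj for the records of one class Ci:
values_ci = [2.1, 2.4, 1.9, 2.6, 2.0]

mu_ij = sum(values_ci) / len(values_ci)                                   # sample mean
var_ij = sum((v - mu_ij) ** 2 for v in values_ci) / (len(values_ci) - 1)  # sample variance

# P(Aj = 2.3 | Ci), as used inside the Naive Bayes product:
print(gaussian_pdf(2.3, mu_ij, var_ij))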
THANK YOU.