1. An Introduction to Bayesian Belief Networks and Naïve Bayesian Classification
Adnan Masood
scis.nova.edu/~adnan
adnan@nova.edu
Belief Networks &
Bayesian Classification
2. Overview
Probability and Uncertainty
Bayesian Statistics
Notation of Probability
Axioms of Probability
Probability Table
Bayesian Belief Network
Joint Probability Table
Probability of Disjunctions
Conditional Probability
Conditional Independence
Bayes' Rule
Classification with Bayes' Rule
Bayesian Classification
Conclusion & Further Reading
3. Probability and Uncertainty
Probability provides a way of summarizing uncertainty.
60% chance of rain today
85% chance of the alarm sounding in case of a burglary
Probability is calculated from observed frequencies (past data) or
from a degree of belief.
4. Bayesian Statistics
Three approaches to Probability
Axiomatic
Probability by definition and properties
Relative Frequency
Repeated trials
Degree of belief (subjective)
Personal measure of uncertainty
Examples
The chance that a meteor strikes earth is 1%
The probability of rain today is 30%
The chance of getting an A on the exam is 50%
8. Probability Table
P(Weather=sunny) = P(sunny) = 5/14
P(Weather) = {5/14, 4/14, 5/14}
Calculate probabilities from data

Outlook:   sunny 5/14   overcast 4/14   rainy 5/14
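The table above is produced by simple counting. A minimal sketch, assuming only the marginal counts the table shows (5 sunny, 4 overcast, 5 rainy out of 14), not the full weather dataset:

```python
from collections import Counter

# The 14 Outlook values implied by the table's counts
# (5 sunny, 4 overcast, 5 rainy), not the full weather dataset.
outlook = ["sunny"] * 5 + ["overcast"] * 4 + ["rainy"] * 5

counts = Counter(outlook)
total = len(outlook)
probs = {value: n / total for value, n in counts.items()}

for value in ("sunny", "overcast", "rainy"):
    print(value, probs[value])
```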
9. An expert-built belief network using the weather
dataset (Mitchell; Witten & Frank)
Bayesian inference can help answer questions such as the
probability of play if
a. Outlook=sunny, Temperature=cool, Humidity=high,
Wind=strong
b. Outlook=overcast, Temperature=cool, Humidity=high,
Wind=strong
10. Bayesian Belief Network
A Bayesian belief network allows a subset of the
variables to be conditionally independent
A graphical model of causal relationships
Several cases of learning Bayesian belief networks
• Given both network structure and all the variables: easy
• Given network structure but only some variables
• When the network structure is not known in advance
11. Bayesian Belief Network
Nodes: Family History, Smoker, Lung Cancer, Emphysema,
Positive X-Ray, Dyspnea

The conditional probability table for the variable Lung Cancer,
whose parents are Family History (FH) and Smoker (S):

        (FH, S)   (FH, ~S)   (~FH, S)   (~FH, ~S)
LC        0.8       0.5        0.7        0.1
~LC       0.2       0.5        0.3        0.9
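Given this CPT, the marginal P(LC) follows by summing over the parents. A minimal sketch; the priors P(FH) = 0.1 and P(S) = 0.3 are hypothetical (not on the slide), chosen only to make the marginalization concrete:

```python
# CPT for Lung Cancer given its parents (values from the slide).
p_lc_given = {
    (True, True): 0.8,    # (FH, S)
    (True, False): 0.5,   # (FH, ~S)
    (False, True): 0.7,   # (~FH, S)
    (False, False): 0.1,  # (~FH, ~S)
}
p_fh, p_s = 0.1, 0.3  # assumed priors, not given on the slide

# P(LC) = sum over FH, S of P(LC | FH, S) * P(FH) * P(S)
p_lc = sum(
    p_lc_given[(fh, s)]
    * (p_fh if fh else 1 - p_fh)
    * (p_s if s else 1 - p_s)
    for fh in (True, False)
    for s in (True, False)
)
print(round(p_lc, 3))  # 0.311 under these assumed priors
```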
16. Conditional Probability
Probabilities discussed so far are called prior probabilities
or unconditional probabilities
They depend only on the data, not on any other variable
But what if you have some evidence or knowledge about the
situation? You now have a toothache. Now what is the
probability of having a cavity?
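The update is just the definition of conditional probability. A tiny sketch with hypothetical joint probabilities (the numbers below are illustrative, not from the slides):

```python
# Hypothetical joint probabilities, for illustration only.
p_cavity_and_toothache = 0.12
p_toothache = 0.20

# Definition: P(cavity | toothache) = P(cavity and toothache) / P(toothache)
p_cavity_given_toothache = p_cavity_and_toothache / p_toothache
print(round(p_cavity_given_toothache, 3))  # 0.6
```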
20. The independence hypothesis…
… makes computation possible
… yields optimal classifiers when satisfied
… but is seldom satisfied in practice, as attributes
(variables) are often correlated.
Attempts to overcome this limitation:
• Bayesian networks, which combine Bayesian reasoning with
causal relationships between attributes
• Decision trees, which reason on one attribute at a time,
considering the most important attributes first
26. Bayesian Classification: Why?
Probabilistic learning: computation of explicit
probabilities for hypotheses; among the most practical
approaches to certain types of learning problems
Incremental: Each training example can incrementally
increase/decrease the probability that a hypothesis is
correct. Prior knowledge can be combined with
observed data.
Probabilistic prediction: Predict multiple
hypotheses, weighted by their probabilities
Benchmark: Even if Bayesian methods are
computationally intractable, they can provide a
benchmark for other algorithms
27. Classification with Bayes Rule
Courtesy, Simafore - http://www.simafore.com/blog/bid/100934/Beware-of-2-facts-when-using-Naive-Bayes-classification-for-analytics
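The linked figure is not reproduced here; the rule it illustrates is the standard one: score each class C by Bayes' rule and predict the most probable class.

```latex
P(C \mid x) = \frac{P(x \mid C)\, P(C)}{P(x)},
\qquad
\hat{C} = \operatorname*{arg\,max}_{C} \; P(x \mid C)\, P(C)
```

Since P(x) is the same for every class, it can be dropped from the argmax.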
28. Issues with naïve Bayes
Change in classifier data (on the fly, during classification)
The conditional independence assumption is violated
Consider the task of classifying whether or not a certain word is a
corporation name
E.g. "Google," "Microsoft," "IBM," and "ACME"
Two useful features we might want to use are capitalized and
all-capitals
Naïve Bayes will assume that these two features are independent
given the class, but this clearly isn't the case (things that are all-caps
must also be capitalized)!
However, naïve Bayes seems to work well in practice even
when this assumption is violated
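A tiny numeric sketch of the double-counting effect; all likelihoods below are hypothetical, chosen only to show how treating correlated features as independent inflates the posterior:

```python
# All likelihoods are hypothetical, chosen only to show the effect.
# For an all-caps token such as "IBM", all-caps = True forces
# capitalized = True, so the two features carry the same evidence.
p_capitalized = {"corp": 0.9, "other": 0.1}
p_all_caps = {"corp": 0.9, "other": 0.1}  # perfectly correlated here
prior = {"corp": 0.5, "other": 0.5}

# Naive Bayes multiplies both likelihoods as if they were independent
score = {c: prior[c] * p_capitalized[c] * p_all_caps[c] for c in prior}
posterior_corp = score["corp"] / (score["corp"] + score["other"])

# One feature alone would give 0.9; double-counting inflates it
print(round(posterior_corp, 3))  # ~0.988
```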
30. Naive Bayesian Classifier
Given a training set, we can compute the probabilities

Outlook        P     N
  sunny       2/9   3/5
  overcast    4/9    0
  rain        3/9   2/5
Temperature    P     N
  hot         2/9   2/5
  mild        4/9   2/5
  cool        3/9   1/5
Humidity       P     N
  high        3/9   4/5
  normal      6/9   1/5
Windy          P     N
  true        3/9   3/5
  false       6/9   2/5
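These probabilities are enough to classify a new day. A minimal sketch; the class priors P(P) = 9/14 and P(N) = 5/14 come from the same 14-instance weather dataset, and mapping Wind=strong to windy=true is an assumption about the query from the earlier slide:

```python
# Likelihoods copied from the table above.
likelihood = {
    "P": {"sunny": 2/9, "cool": 3/9, "high": 3/9, "true": 3/9},
    "N": {"sunny": 3/5, "cool": 1/5, "high": 4/5, "true": 3/5},
}
prior = {"P": 9/14, "N": 5/14}

# Query (a) from the earlier slide: sunny, cool, high humidity, windy
instance = ["sunny", "cool", "high", "true"]

# Naive Bayes score: prior times the product of per-attribute likelihoods
score = {}
for cls in ("P", "N"):
    s = prior[cls]
    for value in instance:
        s *= likelihood[cls][value]
    score[cls] = s

total = score["P"] + score["N"]
print({cls: round(s / total, 3) for cls, s in score.items()})  # N (no play) wins
```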
37. References
J. Han and M. Kamber, Data Mining: Concepts and Techniques. San Francisco,
CA: Morgan Kaufmann.
E. Charniak, "Bayesian Networks without Tears," AI Magazine.
http://www.aaai.org/ojs/index.php/aimagazine/article/view/918
A. Darwiche, Bayesian Networks. Automated Reasoning Group, UCLA.