A short presentation on Bayesian Classifiers. The slides contain a worked example of the Naive Bayes Classifier along with a short introduction to Bayesian Belief Networks.
2. Bayesian Classifiers
They are statistical classifiers.
Based primarily on the Bayes’ Theorem.
In Bayesian terms, every tuple X is called the evidence.
Let H be some hypothesis, such as "X belongs to a specified class C".
For classification problems, we want to determine P(H|X), the probability that the hypothesis H holds given X.
Simply put, we are looking for the probability that X belongs to class C, given that we know the attribute description of X; that is, we are computing P(C|X).
After computing P(Ci|X) for all classes Ci, i = 1..n, we simply assign X to the class with the highest value of P(Ci|X).
Amit Praseed Bayesian Classifiers October 10, 2019 2 / 14
3. Bayes’ Theorem
P(C|X) = P(X, C)/P(X) = P(X|C)P(C)/P(X)
We need to find the class Ci that maximizes P(Ci|X); this is the class assigned to X.
Since P(X) is constant across all classes, we can reduce the problem as follows:
maximize P(X|Ci)P(Ci)
However, computing P(X|Ci ) is incredibly complex for large datasets
involving a large number of attributes or dimensions.
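Since P(X) rescales every class's score by the same constant, dropping it cannot change which class wins. A quick numeric sketch of this point, with hypothetical likelihoods and priors (the numbers are assumed for illustration, not taken from any dataset):

```python
# Hypothetical values: P(X|Ci) for two classes, and their priors P(Ci).
likelihood = {"yes": 0.040, "no": 0.007}   # assumed P(X|Ci)
prior      = {"yes": 9/14,  "no": 5/14}    # assumed P(Ci)

# Full posterior: divide by the evidence P(X) = sum over i of P(X|Ci)P(Ci).
evidence  = sum(likelihood[c] * prior[c] for c in prior)
posterior = {c: likelihood[c] * prior[c] / evidence for c in prior}

# Unnormalised score: skip the division entirely.
score = {c: likelihood[c] * prior[c] for c in prior}

# Both pick the same class.
print(max(posterior, key=posterior.get), max(score, key=score.get))
```

The argmax is identical either way, which is why the slides work with P(X|Ci)P(Ci) alone.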
4. The Independence Assumption and Naive Bayes Classifier
Class Conditional Independence
For simplicity, it can be assumed that the effect of an attribute value in X
on a given class C is independent of the values of the other attributes. With
this assumption,
P(X|C) = P(x1|C)P(x2|C)...P(xn|C)
The Bayesian classifier that makes use of the class conditional independence assumption is called the Naive Bayes Classifier.
These classifiers are extremely simple and effective in a number of
situations.
However, they do fail in situations where class conditional independence
cannot be assumed.
5. AllElectronics Customer Database
RID Age Income Student Credit Rating Class:Buy?
1 youth high no fair no
2 youth high no excellent no
3 middle aged high no fair yes
4 senior medium no fair yes
5 senior low yes fair yes
6 senior low yes excellent no
7 middle aged low yes excellent yes
8 youth medium no fair no
9 youth low yes fair yes
10 senior medium yes fair yes
11 youth medium yes excellent yes
12 middle aged medium no excellent yes
13 middle aged high yes fair yes
14 senior medium no excellent no
Let X = (age = youth, income = medium, student = yes, credit = fair)
7. Computing the Posterior Probabilities
P(age = youth|Cyes) = 2/9
P(income = medium|Cyes) = 4/9
P(student = yes|Cyes) = 6/9
P(credit = fair|Cyes) = 6/9
P(Cyes) = 9/14
8. Computing the Posterior Probabilities
P(age = youth|Cno) = 3/5
P(income = medium|Cno) = 2/5
P(student = yes|Cno) = 1/5
P(credit = fair|Cno) = 2/5
P(Cno) = 5/14
9. Assigning the Class
Let X = (age = youth, income = medium, student = yes, credit = fair)
P(Cyes|X) ∝ P(X|Cyes)P(Cyes)
= P(age = youth|Cyes) P(income = medium|Cyes) P(student = yes|Cyes) P(credit = fair|Cyes) P(Cyes)
= (2/9)(4/9)(6/9)(6/9)(9/14) ≈ 0.028
P(Cno|X) ∝ P(X|Cno)P(Cno)
= P(age = youth|Cno) P(income = medium|Cno) P(student = yes|Cno) P(credit = fair|Cno) P(Cno)
= (3/5)(2/5)(1/5)(2/5)(5/14) ≈ 0.0069
Since P(Cyes |X) > P(Cno|X), X is assigned to Class YES.
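The whole worked example can be reproduced by counting directly over the fourteen rows. A minimal sketch, with the tuple layout mirroring the table (no library assumed):

```python
# The fourteen training rows from the slides: (age, income, student, credit, buy).
data = [
    ("youth", "high", "no", "fair", "no"),
    ("youth", "high", "no", "excellent", "no"),
    ("middle aged", "high", "no", "fair", "yes"),
    ("senior", "medium", "no", "fair", "yes"),
    ("senior", "low", "yes", "fair", "yes"),
    ("senior", "low", "yes", "excellent", "no"),
    ("middle aged", "low", "yes", "excellent", "yes"),
    ("youth", "medium", "no", "fair", "no"),
    ("youth", "low", "yes", "fair", "yes"),
    ("senior", "medium", "yes", "fair", "yes"),
    ("youth", "medium", "yes", "excellent", "yes"),
    ("middle aged", "medium", "no", "excellent", "yes"),
    ("middle aged", "high", "yes", "fair", "yes"),
    ("senior", "medium", "no", "excellent", "no"),
]

def score(x, c):
    """Unnormalised Naive Bayes score P(X|C)P(C) from raw counts."""
    rows = [r for r in data if r[-1] == c]
    p = len(rows) / len(data)                      # prior P(C)
    for i, v in enumerate(x):                      # product of P(xi|C)
        p *= sum(1 for r in rows if r[i] == v) / len(rows)
    return p

x = ("youth", "medium", "yes", "fair")
p_yes, p_no = score(x, "yes"), score(x, "no")
print(p_yes, p_no)   # p_yes ≈ 0.028, p_no ≈ 0.0069, so X is assigned to YES
```

The two printed scores match the slide's hand computation.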
10. Removing the Independence Assumption
In a number of real-world applications, a subset of attributes will be dependent on each other, which leads the Naive Bayes classifier to give inferior results.
However, the original version of the Bayes Equation can still be used
to compute the probabilities.
P(C|X) = P(X, C)/P(X)
P(A1, A2, ..., An) can be computed using the chain rule:
P(A1, A2, ..., An) = P(A1|A2, A3, ..., An) P(A2|A3, ..., An) ... P(An−1|An) P(An)
The main issue that arises here is computing P(X, C), which can easily swell up to a large number of terms even for a moderately sized dataset.
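To see the blow-up concretely, assume binary attributes and a binary class (an assumption made only for this count): a full joint table for P(X, C) needs 2^(n+1) − 1 free entries, while the naive factorisation needs only 2n + 1.

```python
# Free parameters needed for n binary attributes plus a binary class.
def full_joint_params(n):
    # One probability per outcome, minus the sum-to-one constraint.
    return 2 ** (n + 1) - 1

def naive_bayes_params(n):
    # P(xi = 1 | C) for each attribute and each class value, plus the prior.
    return 2 * n + 1

for n in (4, 10, 20):
    print(n, full_joint_params(n), naive_bayes_params(n))
```

At n = 20 the full joint already needs over two million entries versus 41 for Naive Bayes, which is the gap the independence assumption buys.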
11. Bayesian Belief Networks
Bayesian Belief Networks (BBN) are probabilistic graphical models used
to represent a set of attributes and their dependencies using a Directed
Acyclic Graph (DAG).
12. Conditional Independence
A node in a BBN is said to
be conditionally independent
of its non-descendents given its
parents.
Lung Tumour is conditionally
independent of Exposure to
Toxins and Smoking, given
Cancer.
Exposure to Toxins and
Smoking are conditionally
independent of each other,
in the absence of cancer.
13. Prediction in Bayesian Belief Networks
We wish to find the probability of Cancer given that the person has
Lung Tumour and a history of smoking i.e. X = {LT = T, S = T}.
In the naive computation of the Bayesian probability, we will encounter the term P(LT, C, S, ET), which requires 2^4 = 16 combinations to compute completely.
The presence of conditional independence simplifies the computation
to the following:
P(LT, C, S, ET) = P(LT|C) P(C|ET, S) P(ET) P(S)
Summing out ET:
P(LT, C, S) = P(LT|C) P(S) Σ_ET P(C|ET, S) P(ET)
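Under this factorisation, the query P(C | LT = T, S = T) reduces to summing out the hidden node ET and normalising. The CPT numbers below are made up for illustration; the slides do not give the actual values:

```python
# Hypothetical CPTs for the network ET -> C <- S, C -> LT (all nodes binary).
p_et = {True: 0.1, False: 0.9}                       # assumed P(ET)
p_s  = {True: 0.3, False: 0.7}                       # assumed P(S)
p_c  = {(True, True): 0.8, (True, False): 0.5,       # assumed P(C = T | ET, S)
        (False, True): 0.4, (False, False): 0.05}
p_lt = {True: 0.7, False: 0.02}                      # assumed P(LT = T | C)

def joint(lt, c, s, et):
    """P(LT, C, S, ET) = P(LT|C) P(C|ET, S) P(S) P(ET)."""
    pc  = p_c[(et, s)] if c else 1 - p_c[(et, s)]
    plt = p_lt[c] if lt else 1 - p_lt[c]
    return plt * pc * p_s[s] * p_et[et]

# P(C = T | LT = T, S = T): sum out ET, then normalise over C.
num = sum(joint(True, True, True, et) for et in (True, False))
den = sum(joint(True, c, True, et) for c in (True, False) for et in (True, False))
print(round(num / den, 3))   # ≈ 0.965 with these assumed CPTs
```

Only four products per hidden-variable assignment are needed, instead of enumerating all sixteen joint entries.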
14. The Most Famous Application
The Microsoft Office Assistant, nicknamed "Clippy", was a prominent feature in MS Office '97-'03.
It was implemented partly using Bayesian Belief Networks.