A short presentation on Bayesian Classifiers. The slides contain a worked example of the Naive Bayes Classifier along with a short introduction to Bayesian Belief Networks.
2. Bayesian Classifiers
They are statistical classifiers.
Based primarily on the Bayes’ Theorem.
In Bayesian terms, every tuple X is called the evidence.
Let H be some hypothesis, such as "X belongs to a specified class C".
For classification problems, we want to determine P(H|X), the probability that the hypothesis H holds given X.
Simply put, we are looking for the probability that X belongs to class C, given that we know the attribute description of X; that is, we are computing P(C|X).
After computing P(Ci|X) for all classes Ci, i = 1..n, we simply assign X to the class with the highest value of P(Ci|X).
Amit Praseed Bayesian Classifiers October 10, 2019 2 / 14
3. Bayes’ Theorem
P(C|X) = P(X, C)/P(X) = P(X|C)P(C)/P(X)
We need to find the class Ci that maximizes P(Ci|X); this is the class assigned to X.
Since P(X) is constant across all classes, we can reduce the problem as follows:
maximize P(X|Ci)P(Ci)
However, computing P(X|Ci ) is incredibly complex for large datasets
involving a large number of attributes or dimensions.
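Since P(X) rescales every class's score by the same constant, dropping it cannot change which class wins. A quick numeric sketch of this point, with hypothetical likelihoods and priors (the numbers are assumed for illustration, not taken from any dataset):

```python
# Hypothetical values: P(X|Ci) for two classes, and their priors P(Ci).
likelihood = {"yes": 0.040, "no": 0.007}   # assumed P(X|Ci)
prior      = {"yes": 9/14,  "no": 5/14}    # assumed P(Ci)

# Full posterior: divide by the evidence P(X) = sum over i of P(X|Ci)P(Ci).
evidence  = sum(likelihood[c] * prior[c] for c in prior)
posterior = {c: likelihood[c] * prior[c] / evidence for c in prior}

# Unnormalised score: skip the division entirely.
score = {c: likelihood[c] * prior[c] for c in prior}

# Both pick the same class.
print(max(posterior, key=posterior.get), max(score, key=score.get))
```

The argmax is identical either way, which is why the slides work with P(X|Ci)P(Ci) alone.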
4. The Independence Assumption and Naive Bayes Classifier
Class Conditional Independence
For simplicity, it can be assumed that the effect of an attribute value in X
on a given class C is independent of the values of the other attributes. With
this assumption,
P(X|C) = P(x1|C)P(x2|C)...P(xn|C)
The Bayesian classifier that makes use of the class conditional independence assumption is called the Naive Bayes Classifier.
These classifiers are extremely simple and effective in a number of
situations.
However, they do fail in situations where class conditional independence
cannot be assumed.
5. AllElectronics Customer Database
RID Age Income Student Credit Rating Class:Buy?
1 youth high no fair no
2 youth high no excellent no
3 middle aged high no fair yes
4 senior medium no fair yes
5 senior low yes fair yes
6 senior low yes excellent no
7 middle aged low yes excellent yes
8 youth medium no fair no
9 youth low yes fair yes
10 senior medium yes fair yes
11 youth medium yes excellent yes
12 middle aged medium no excellent yes
13 middle aged high yes fair yes
14 senior medium no excellent no
Let X = (age = youth, income = medium, student = yes, credit = fair)
7. Computing the Posterior Probabilities
P(age = youth|Cyes) = 2/9
P(income = medium|Cyes) = 4/9
P(student = yes|Cyes) = 6/9
P(credit = fair|Cyes) = 6/9
P(Cyes) = 9/14
8. Computing the Posterior Probabilities
P(age = youth|Cno) = 3/5
P(income = medium|Cno) = 2/5
P(student = yes|Cno) = 1/5
P(credit = fair|Cno) = 2/5
P(Cno) = 5/14
9. Assigning the Class
Let X = (age = youth, income = medium, student = yes, credit = fair)
P(Cyes|X) ∝ P(X|Cyes)P(Cyes)
= P(age = youth|Cyes) P(income = medium|Cyes) P(student = yes|Cyes) P(credit = fair|Cyes) P(Cyes)
= (2/9)(4/9)(6/9)(6/9)(9/14) ≈ 0.028
P(Cno|X) ∝ P(X|Cno)P(Cno)
= P(age = youth|Cno) P(income = medium|Cno) P(student = yes|Cno) P(credit = fair|Cno) P(Cno)
= (3/5)(2/5)(1/5)(2/5)(5/14) ≈ 0.0069
Since P(Cyes |X) > P(Cno|X), X is assigned to Class YES.
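The whole worked example can be reproduced by counting directly over the fourteen rows. A minimal sketch, with the tuple layout mirroring the table (no library assumed):

```python
# The fourteen training rows from the slides: (age, income, student, credit, buy).
data = [
    ("youth", "high", "no", "fair", "no"),
    ("youth", "high", "no", "excellent", "no"),
    ("middle aged", "high", "no", "fair", "yes"),
    ("senior", "medium", "no", "fair", "yes"),
    ("senior", "low", "yes", "fair", "yes"),
    ("senior", "low", "yes", "excellent", "no"),
    ("middle aged", "low", "yes", "excellent", "yes"),
    ("youth", "medium", "no", "fair", "no"),
    ("youth", "low", "yes", "fair", "yes"),
    ("senior", "medium", "yes", "fair", "yes"),
    ("youth", "medium", "yes", "excellent", "yes"),
    ("middle aged", "medium", "no", "excellent", "yes"),
    ("middle aged", "high", "yes", "fair", "yes"),
    ("senior", "medium", "no", "excellent", "no"),
]

def score(x, c):
    """Unnormalised Naive Bayes score P(X|C)P(C) from raw counts."""
    rows = [r for r in data if r[-1] == c]
    p = len(rows) / len(data)                      # prior P(C)
    for i, v in enumerate(x):                      # product of P(xi|C)
        p *= sum(1 for r in rows if r[i] == v) / len(rows)
    return p

x = ("youth", "medium", "yes", "fair")
p_yes, p_no = score(x, "yes"), score(x, "no")
print(p_yes, p_no)   # p_yes ≈ 0.028, p_no ≈ 0.0069, so X is assigned to YES
```

The two printed scores match the slide's hand computation.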
10. Removing the Independence Assumption
In a number of real-world applications, a subset of attributes will be dependent on each other, which leads the Naive Bayes classifier to give inferior results.
However, the original version of the Bayes Equation can still be used
to compute the probabilities.
P(C|X) = P(X, C)/P(X)
P(A1, A2, ..., An) can be computed using the chain rule:
P(A1, A2, ..., An) = P(A1|A2, A3, ..., An) P(A2|A3, ..., An) ... P(An−1|An) P(An)
The main issue that arises here is computing P(X, C), which can easily swell up to a large number of terms even for a moderately sized dataset.
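To see the blow-up concretely, assume binary attributes and a binary class (an assumption made only for this count): a full joint table for P(X, C) needs 2^(n+1) − 1 free entries, while the naive factorisation needs only 2n + 1.

```python
# Free parameters needed for n binary attributes plus a binary class.
def full_joint_params(n):
    # One probability per outcome, minus the sum-to-one constraint.
    return 2 ** (n + 1) - 1

def naive_bayes_params(n):
    # P(xi = 1 | C) for each attribute and each class value, plus the prior.
    return 2 * n + 1

for n in (4, 10, 20):
    print(n, full_joint_params(n), naive_bayes_params(n))
```

At n = 20 the full joint already needs over two million entries versus 41 for Naive Bayes, which is the gap the independence assumption buys.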
11. Bayesian Belief Networks
Bayesian Belief Networks (BBN) are probabilistic graphical models used
to represent a set of attributes and their dependencies using a Directed
Acyclic Graph (DAG).
12. Conditional Independence
A node in a BBN is said to
be conditionally independent
of its non-descendents given its
parents.
Lung Tumour is conditionally
independent of Exposure to
Toxins and Smoking, given
Cancer.
Exposure to Toxins and
Smoking are conditionally
independent of each other,
in the absence of cancer.
13. Prediction in Bayesian Belief Networks
We wish to find the probability of Cancer given that the person has
Lung Tumour and a history of smoking i.e. X = {LT = T, S = T}.
In the naive computation of the Bayesian probability, we will encounter the term P(LT, C, S, ET), which requires 2^4 = 16 combinations to compute completely.
The presence of conditional independence simplifies the computation
to the following:
P(LT, C, S, ET) = P(LT|C) P(C|ET, S) P(ET) P(S)
Summing out ET:
P(LT, C, S) = P(LT|C) P(S) Σ_ET P(C|ET, S) P(ET)
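Under this factorisation, the query P(C | LT = T, S = T) reduces to summing out the hidden node ET and normalising. The CPT numbers below are made up for illustration; the slides do not give the actual values:

```python
# Hypothetical CPTs for the network ET -> C <- S, C -> LT (all nodes binary).
p_et = {True: 0.1, False: 0.9}                       # assumed P(ET)
p_s  = {True: 0.3, False: 0.7}                       # assumed P(S)
p_c  = {(True, True): 0.8, (True, False): 0.5,       # assumed P(C = T | ET, S)
        (False, True): 0.4, (False, False): 0.05}
p_lt = {True: 0.7, False: 0.02}                      # assumed P(LT = T | C)

def joint(lt, c, s, et):
    """P(LT, C, S, ET) = P(LT|C) P(C|ET, S) P(S) P(ET)."""
    pc  = p_c[(et, s)] if c else 1 - p_c[(et, s)]
    plt = p_lt[c] if lt else 1 - p_lt[c]
    return plt * pc * p_s[s] * p_et[et]

# P(C = T | LT = T, S = T): sum out ET, then normalise over C.
num = sum(joint(True, True, True, et) for et in (True, False))
den = sum(joint(True, c, True, et) for c in (True, False) for et in (True, False))
print(round(num / den, 3))   # ≈ 0.965 with these assumed CPTs
```

Only four products per hidden-variable assignment are needed, instead of enumerating all sixteen joint entries.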
14. The Most Famous Application
The Microsoft Office Assistant, nicknamed "Clippy", was a prominent feature in MS Office '97-'03.
It was implemented partly using Bayesian Belief Networks.