2. “Goal - Become a Data Scientist”
“A Dream becomes a Goal when action is taken towards its achievement” - Bo Bennett
“The Plan”
“A Goal without a Plan is just a wish”
3. Agenda
● Introduction to Probability
● Conditional Probability
● Independent Events
● Bayes’ Theorem
● Estimation - MLE, MAP
● Joint Probability
● Naive Bayes
● Gaussian Naive Bayes
4. Probability - The Chance
● How likely something is to happen
● Quantified as a number between 0 and 1
● What is the probability of rolling a 6 with a die?
○ A fair die is not biased: every face has an equal chance
○ The probability of getting any particular number is 1/6
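A minimal sketch, assuming plain Python with the standard library, that checks this empirically by simulating die rolls:

import random

# Simulate many fair die rolls and count how often a 6 comes up.
rolls = 100_000
sixes = sum(1 for _ in range(rolls) if random.randint(1, 6) == 6)
print(f"Estimated P(6) = {sixes / rolls:.3f}")  # converges to 1/6, about 0.167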
5. Conditional Probability
● Captures the dependence of event A on event B
● P(A and B) is the joint probability of A and B
● P(A | B) is the probability of A given that event B has happened:
P(A | B) = P(A and B) / P(B)
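A quick worked example with one fair die roll: let A = “roll a 2” and B = “roll an even number”. Then P(A and B) = 1/6 and P(B) = 1/2, so P(A | B) = (1/6) / (1/2) = 1/3.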
6. Independent Events
● The occurrence of event A does not depend on event B
● So the joint probability is the product of the individual probabilities:
P(A and B) = P(A) × P(B)
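For example, two fair dice rolled together are independent, so P(both show 6) = (1/6) × (1/6) = 1/36.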
7. Bayes’ Theorem
● Describes the probability of an event based on prior knowledge:
P(A | B) = P(B | A) P(A) / P(B)
● P(A | B) - conditional probability; the posterior
● P(B | A) - conditional probability; the likelihood
● P(A) and P(B) - marginal probabilities; P(A) is the prior and P(B) is the evidence
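A small sketch, with hypothetical spam-filter numbers (all values assumed for illustration), showing the posterior computed from the formula:

# A = "email is spam", B = "email contains the word 'free'" (hypothetical numbers).
p_a = 0.30          # prior P(A): fraction of all email that is spam
p_b_given_a = 0.60  # likelihood P(B | A): spam emails containing 'free'
p_b = 0.25          # evidence P(B): all emails containing 'free'

# Bayes' theorem: P(A | B) = P(B | A) * P(A) / P(B)
p_a_given_b = p_b_given_a * p_a / p_b
print(f"P(spam | 'free') = {p_a_given_b:.2f}")  # 0.72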
8. Joint Probability Distribution
Gender   Hours_Worked   Wealth   Probability
Female   <40.5          poor     0.253122
Female   <40.5          rich     0.0245895
Female   >40.5          poor     0.0421768
Female   >40.5          rich     0.0116293
Male     <40.5          poor     0.331313
Male     <40.5          rich     0.0971295
Male     >40.5          poor     0.134106
Male     >40.5          rich     0.105933
Total probability: 0.9999991 (≈ 1; the gap is rounding error)
Gender   Hours_Worked   P(rich | G,HW)   P(poor | G,HW)
F        <40.5          0.09             0.91
F        >40.5          0.21             0.79
M        <40.5          0.23             0.77
M        >40.5          0.38             0.62
● To learn P(Y | X1, X2) directly, we need one estimate per combination of feature values - on the order of 2^n estimates for n binary features
● How are the P(Y | X1, X2) values calculated? Each entry comes from the joint table:
P(rich | G, HW) = P(G, HW, rich) / [P(G, HW, rich) + P(G, HW, poor)]
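A minimal sketch, assuming plain Python, of how such a conditional table is derived from a joint distribution (numbers taken from the joint table above):

# Joint distribution P(Gender, Hours_Worked, Wealth) from the table above.
joint = {
    ("F", "<40.5", "poor"): 0.253122,  ("F", "<40.5", "rich"): 0.0245895,
    ("F", ">40.5", "poor"): 0.0421768, ("F", ">40.5", "rich"): 0.0116293,
    ("M", "<40.5", "poor"): 0.331313,  ("M", "<40.5", "rich"): 0.0971295,
    ("M", ">40.5", "poor"): 0.134106,  ("M", ">40.5", "rich"): 0.105933,
}
for g in ("F", "M"):
    for hw in ("<40.5", ">40.5"):
        p_g_hw = joint[(g, hw, "poor")] + joint[(g, hw, "rich")]  # P(G, HW)
        p_rich = joint[(g, hw, "rich")] / p_g_hw                  # P(rich | G, HW)
        print(f"{g} {hw}: P(rich)={p_rich:.2f}, P(poor)={1 - p_rich:.2f}")

The printed values may differ slightly from the conditional table above, which was presumably computed from the original raw data rather than from these rounded joint probabilities.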
9. Maximum Likelihood Estimation
● Data: an observed set D of h heads and t tails
P(D | 𝜽) = P(h, t | 𝜽) = 𝜽^h (1 - 𝜽)^t
● Optimization problem: learning 𝜽
● Objective function:
○ MLE: choose the 𝜽 that maximizes the probability of the observed data
𝜽̂ = arg max_𝜽 P(D | 𝜽)
𝜽̂ = h / (h + t)
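A minimal sketch, with hypothetical coin-flip counts, checking numerically that h / (h + t) maximizes the likelihood:

h, t = 7, 3  # hypothetical counts: 7 heads, 3 tails

def likelihood(theta: float) -> float:
    # P(D | theta) = theta^h * (1 - theta)^t
    return theta**h * (1 - theta) ** t

# Grid search over candidate values of theta.
best = max((i / 1000 for i in range(1, 1000)), key=likelihood)
print(best, h / (h + t))  # both ~0.7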
10. Maximum A Posteriori (MAP)
● MLE is not a good estimate when there is little data
● Prior information about the parameter gives a better estimate
● P(𝜽) is the prior information
● The prior is assumed to be a Beta distribution
P(𝜽 | D) ∝ P(D | 𝜽) P(𝜽)
𝜽̂ = (h + 𝛃1) / [(h + 𝛃1) + (t + 𝛃2)]
𝛃1 = prior information (pseudo-counts) for heads
𝛃2 = prior information (pseudo-counts) for tails
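A minimal sketch, with assumed counts and prior pseudo-counts, contrasting MLE and MAP on a tiny sample:

h, t = 2, 0      # tiny dataset: two heads, no tails
b1, b2 = 5, 5    # assumed Beta prior pseudo-counts for heads and tails

theta_mle = h / (h + t)                       # 1.0 - overconfident on so little data
theta_map = (h + b1) / ((h + b1) + (t + b2))  # pulled toward the prior mean of 0.5
print(theta_mle, theta_map)  # 1.0 vs ~0.583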
11. Naive Bayes - The Hero
● How does it get away with fewer estimates?
● The assumption of conditional independence: conditioned on Y, the features X1 to Xn are independent
P(X1, ..., Xn | Y) = 𝚷 P(Xi | Y)
P(X1, ..., Xn | Y) = P(X1 | Y) P(X2 | Y) P(X3 | Y) ... P(Xn | Y)
● If each Xi is a binary feature, only 2n + 1 parameters need to be estimated, instead of the 2^n needed without the assumption
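A minimal sketch, with hypothetical probabilities for two binary features, showing the factorized posterior in action:

# Hypothetical model: binary class Y, binary features X1, X2.
p_y1 = 0.5                      # P(Y = 1)
p_xi_given_y = {1: [0.8, 0.3],  # P(Xi = 1 | Y = 1) for i = 1, 2
                0: [0.2, 0.6]}  # P(Xi = 1 | Y = 0) for i = 1, 2
x = [1, 0]                      # observed feature vector

def score(y: int) -> float:
    # P(Y = y) * prod_i P(Xi = xi | Y = y): the naive Bayes factorization
    p = p_y1 if y == 1 else 1 - p_y1
    for xi, pi in zip(x, p_xi_given_y[y]):
        p *= pi if xi == 1 else 1 - pi
    return p

posterior_1 = score(1) / (score(0) + score(1))  # normalize via Bayes' theorem
print(f"P(Y=1 | X) = {posterior_1:.2f}")  # 0.88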
12. Pros of Naive Bayes
● In spite of its over-simplified independence assumption, naive Bayes classifiers have worked quite well in practice
● Widely used for document classification and spam filtering
● Requires only a small amount of training data to estimate the necessary parameters
● Extremely fast compared to more sophisticated methods
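The agenda also lists Gaussian Naive Bayes, which handles continuous features by modelling each P(Xi | Y) as a normal distribution. A minimal sketch using scikit-learn’s GaussianNB (assuming scikit-learn is installed; the iris dataset is just a stand-in):

from sklearn.datasets import load_iris
from sklearn.model_selection import train_test_split
from sklearn.naive_bayes import GaussianNB

# Each continuous feature is modelled as a per-class Gaussian.
X, y = load_iris(return_X_y=True)
X_train, X_test, y_train, y_test = train_test_split(X, y, random_state=0)

clf = GaussianNB().fit(X_train, y_train)
print(f"Test accuracy: {clf.score(X_test, y_test):.2f}")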