zekeLabs
Naive Bayes
“Goal - Become a Data Scientist”
“A Dream becomes a Goal when action is taken towards its achievement” - Bo Bennett
“The Plan”
“A Goal without a Plan is just a wish”
Agenda
● Introduction to Probability
● Conditional Probability
● Independent Events
● Bayes’ Theorem
● Estimations - MLE, MAP
● Joint Probability
● Naive Bayes
● Gaussian NB
Probability - The chance
● How likely something is to happen
● Probability is quantified as a number between 0 (impossible) and 1 (certain)
● What is the probability of rolling a 6 with a die?
○ If the die is fair (unbiased), every face has an equal chance
○ So the probability of getting any particular number, including 6, is 1/6
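The die example can be checked by counting equally likely outcomes. This is a minimal sketch that assumes every outcome is equally likely (the "fair die" case above); the `probability` helper is a hypothetical name, and `Fraction` keeps the answer exact instead of a rounded float.

```python
from fractions import Fraction

def probability(event, outcomes):
    """Classical probability: favourable / total, assuming equally likely outcomes."""
    outcomes = list(outcomes)
    favourable = sum(1 for o in outcomes if event(o))
    return Fraction(favourable, len(outcomes))

die = range(1, 7)  # faces of a fair six-sided die
print(probability(lambda face: face == 6, die))       # 1/6
print(probability(lambda face: face in (5, 6), die))  # 1/3
```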
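Conditional Probability
● Dependence of event A on event B
● P(A and B) is the joint probability: the probability that both A and B happen
● P(A | B) is the probability of A given that event B has happened:
  P(A | B) = P(A and B) / P(B), provided P(B) > 0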
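A small worked example of the definition P(A | B) = P(A and B) / P(B), again on a fair die. The events "roll a 6" and "roll is even" are illustrative choices, not taken from the slide, and the helper names are hypothetical. It also shows that P(A | B) and P(B | A) are generally different.

```python
from fractions import Fraction

OUTCOMES = range(1, 7)  # a fair die: every face equally likely

def prob(event):
    """P(event) = favourable outcomes / total outcomes."""
    return Fraction(sum(1 for o in OUTCOMES if event(o)), len(OUTCOMES))

def cond_prob(a, b):
    """P(A | B) = P(A and B) / P(B), defined only when P(B) > 0."""
    p_b = prob(b)
    if p_b == 0:
        raise ValueError("P(B) must be positive")
    return prob(lambda o: a(o) and b(o)) / p_b

def is_six(o):
    return o == 6

def is_even(o):
    return o % 2 == 0

print(cond_prob(is_six, is_even))  # 1/3
print(cond_prob(is_even, is_six))  # 1
```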
Independent Events
● Events A and B are independent if the occurrence of B does not change the probability of A: P(A | B) = P(A)
● So, the joint probability is the product of the individual probabilities: P(A and B) = P(A) P(B)
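The product rule can be checked by enumerating all 36 outcomes of two fair dice. This sketch assumes both dice are fair and rolled separately; the event names are illustrative. "First die is 6" and "second die is even" satisfy P(A and B) = P(A) P(B), while "first die is 6" and "total is 12" do not, because knowing the total changes the odds for the first die.

```python
from fractions import Fraction
from itertools import product

# All 36 equally likely outcomes of rolling two fair dice
OUTCOMES = list(product(range(1, 7), repeat=2))

def p(event):
    return Fraction(sum(1 for o in OUTCOMES if event(o)), len(OUTCOMES))

def first_is_six(o):
    return o[0] == 6

def second_is_even(o):
    return o[1] % 2 == 0

def total_is_twelve(o):
    return sum(o) == 12

# Independent: joint probability equals the product
print(p(lambda o: first_is_six(o) and second_is_even(o)),
      p(first_is_six) * p(second_is_even))   # 1/12 1/12

# Dependent: joint probability differs from the product
print(p(lambda o: first_is_six(o) and total_is_twelve(o)),
      p(first_is_six) * p(total_is_twelve))  # 1/36 1/216
```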
Bayes’ Theorem
● Describes the probability of an event based on prior knowledge of conditions related to it
  P(A | B) = P(B | A) P(A) / P(B)
● P(A | B) - Conditional probability; the posterior
● P(B | A) - Conditional probability; the likelihood
● P(A) and P(B) - Marginal probabilities; P(A) is the prior and P(B) the evidence
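A worked instance of Bayes’ theorem on a fair die, with A = "roll a 6" and B = "roll is even" (an illustrative choice). The evidence P(B) is computed with the law of total probability, P(B) = P(B | A) P(A) + P(B | not A) P(not A), which is how the denominator is usually obtained when it is not known directly. The `bayes` helper is a hypothetical name.

```python
from fractions import Fraction

def bayes(likelihood, prior, evidence):
    """Posterior P(A | B) = P(B | A) * P(A) / P(B)."""
    return likelihood * prior / evidence

# Fair die: A = "roll a 6", B = "roll is even"
p_a = Fraction(1, 6)              # prior P(A)
p_b_given_a = Fraction(1)         # likelihood P(B | A): a 6 is always even
p_b_given_not_a = Fraction(2, 5)  # P(B | not A): faces 2 and 4 out of 1..5

# Evidence P(B) via the law of total probability
p_b = p_b_given_a * p_a + p_b_given_not_a * (1 - p_a)

posterior = bayes(p_b_given_a, p_a, p_b)
print(p_b, posterior)  # 1/2 1/3
```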
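Joint Probability Distribution
Gender   Hours_Worked   Wealth   Probability
Female   <40.5          poor     0.253122
Female   <40.5          rich     0.0245895
Female   >40.5          poor     0.0421768
Female   >40.5          rich     0.0116293
Male     <40.5          poor     0.331313
Male     <40.5          rich     0.0971295
Male     >40.5          poor     0.134106
Male     >40.5          rich     0.105933
Total probability                0.9999991 (1 up to rounding)

Gender   Hours_Worked   P(rich | G, HW)   P(poor | G, HW)
F        <40.5          0.09              0.91
F        >40.5          0.22              0.78
M        <40.5          0.23              0.77
M        >40.5          0.44              0.56

● Each row follows from the joint table: P(rich | G, HW) = P(G, HW, rich) / [P(G, HW, rich) + P(G, HW, poor)]
● To learn P(Y | X1, …, Xn) directly for n binary features, we need 2^n estimates, one per combination of feature values (here n = 2, giving the 4 rows above)
● How are these P(Y | X1, X2) values calculated from data?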
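Each conditional probability can be recovered from the joint table with the definition of conditional probability. The sketch below uses only the eight joint probabilities listed on the slide; the dictionary layout and the `p_rich_given` helper are illustrative assumptions. Rounded to two decimals it gives P(rich | G, HW) of about 0.09, 0.22, 0.23 and 0.44 for the four (Gender, Hours_Worked) rows.

```python
# Joint distribution P(Gender, Hours_Worked, Wealth), copied from the table above
JOINT = {
    ("F", "<40.5", "poor"): 0.253122,
    ("F", "<40.5", "rich"): 0.0245895,
    ("F", ">40.5", "poor"): 0.0421768,
    ("F", ">40.5", "rich"): 0.0116293,
    ("M", "<40.5", "poor"): 0.331313,
    ("M", "<40.5", "rich"): 0.0971295,
    ("M", ">40.5", "poor"): 0.134106,
    ("M", ">40.5", "rich"): 0.105933,
}

def p_rich_given(joint, gender, hours):
    """P(rich | G, HW) = P(G, HW, rich) / [P(G, HW, rich) + P(G, HW, poor)]."""
    rich = joint[(gender, hours, "rich")]
    return rich / (rich + joint[(gender, hours, "poor")])

print(f"total = {sum(JOINT.values()):.7f}")
for gender in ("F", "M"):
    for hours in ("<40.5", ">40.5"):
        p = p_rich_given(JOINT, gender, hours)
        print(f"{gender} {hours}: P(rich)={p:.2f}  P(poor)={1 - p:.2f}")
```

With n binary features this table-lookup approach needs one entry per feature combination, which is the 2^n growth noted above.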
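Maximum Likelihood Estimation
● Data: an observed set D of coin flips with h heads and t tails, where 𝜽 = P(heads)
  P(D | 𝜽) = P(h, t | 𝜽) = 𝜽^h (1 − 𝜽)^t
● Optimization problem: learning 𝜽
● Objective function:
○ MLE: choose the 𝜽 that maximizes the probability of the observed data
  𝜽̂ = arg max_𝜽 P(D | 𝜽)
  Setting the derivative of log P(D | 𝜽) = h log 𝜽 + t log(1 − 𝜽) to zero gives
  𝜽̂ = h / (h + t)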
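The closed form h / (h + t) can be checked against a brute-force search over the log-likelihood h log 𝜽 + t log(1 − 𝜽). This is a sketch on a hypothetical sequence of 10 flips; the function names and the grid resolution of 0.001 are assumptions made for illustration.

```python
import math

def mle_bernoulli(h: int, t: int) -> float:
    """Closed-form MLE of theta = P(heads): h / (h + t)."""
    if h + t == 0:
        raise ValueError("need at least one observation")
    return h / (h + t)

def log_likelihood(theta: float, h: int, t: int) -> float:
    """log P(D | theta) = h*log(theta) + t*log(1 - theta), for 0 < theta < 1."""
    return h * math.log(theta) + t * math.log(1 - theta)

flips = "HHTHHHTHHT"  # hypothetical sample: 7 heads, 3 tails
h, t = flips.count("H"), flips.count("T")

# Brute-force check: the grid maximum lands on the closed-form value
grid = [i / 1000 for i in range(1, 1000)]
best = max(grid, key=lambda th: log_likelihood(th, h, t))
print(mle_bernoulli(h, t), best)  # 0.7 0.7
```

Working with the log-likelihood rather than 𝜽^h (1 − 𝜽)^t avoids numeric underflow for long sequences and has the same maximizer.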
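Maximum A Posteriori (MAP) Estimation
● MLE can be a poor estimate when there is little data (e.g. 3 flips that all land heads give 𝜽̂ = 1)
● Using prior information about the parameter can give a better estimate
● P(𝜽) is the prior information
● Prior is assumed to be a Beta distribution, here Beta(𝛃1 + 1, 𝛃2 + 1), so P(𝜽) ∝ 𝜽^𝛃1 (1 − 𝜽)^𝛃2
  P(𝜽 | D) ∝ P(D | 𝜽) P(𝜽) ∝ 𝜽^(h + 𝛃1) (1 − 𝜽)^(t + 𝛃2)
  𝜽̂ = arg max_𝜽 P(𝜽 | D) = (h + 𝛃1) / [(h + 𝛃1) + (t + 𝛃2)]
  𝛃1 = prior information about heads (number of imaginary heads)
  𝛃2 = prior information about tails (number of imaginary tails)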
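A sketch of how the prior pulls the estimate away from the extreme MLE when data is scarce. It treats 𝛃1 and 𝛃2 as imaginary heads and tails, matching the formula above (equivalently a Beta(𝛃1 + 1, 𝛃2 + 1) prior). The sample of three heads and the choice 𝛃1 = 𝛃2 = 1 are hypothetical, and the grid search is only a sanity check of the closed form.

```python
import math

def map_bernoulli(h, t, beta1, beta2):
    """MAP estimate of theta = P(heads), where beta1 and beta2 are imaginary
    heads and tails (equivalently, a Beta(beta1 + 1, beta2 + 1) prior)."""
    return (h + beta1) / ((h + beta1) + (t + beta2))

def log_posterior(theta, h, t, beta1, beta2):
    """log P(D | theta) + log P(theta), up to a constant not depending on theta."""
    return (h + beta1) * math.log(theta) + (t + beta2) * math.log(1 - theta)

h, t = 3, 0          # hypothetical: three flips, all heads
beta1, beta2 = 1, 1  # assumed prior: one imaginary head and one imaginary tail

print(h / (h + t))                        # MLE: 1.0
print(map_bernoulli(h, t, beta1, beta2))  # MAP: 0.8

# Brute-force check that the closed form maximises the posterior
grid = [i / 1000 for i in range(1, 1000)]
best = max(grid, key=lambda th: log_posterior(th, h, t, beta1, beta2))
print(best)  # 0.8
```

As h and t grow, the imaginary counts matter less and the MAP estimate approaches the MLE.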
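Naive Bayes - The Hero
● How can we get away with fewer parameters to estimate?
● Assumption of conditional independence: given Y, the features X1 to Xn are independent
  P(X1, …, Xn | Y) = 𝚷ᵢ P(Xi | Y) = P(X1 | Y) P(X2 | Y) P(X3 | Y) … P(Xn | Y)
● If each Xi is a binary feature (and Y is binary), only 2n + 1 parameters need to be estimated (1 for P(Y) plus n for each of the two classes), instead of the 2^n needed for the full table of P(Y | X1, …, Xn)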
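A from-scratch sketch of a naive Bayes classifier with binary features: estimate P(Y) and each P(Xi = 1 | Y) by counting, then predict the class that maximizes P(Y) 𝚷ᵢ P(Xi | Y), computed as a sum of logs. The toy spam data, the feature meanings and all function names are hypothetical. The `alpha` smoothing is an added assumption; it is the imaginary-count idea from the MAP slide with 𝛃1 = 𝛃2 = alpha, so an unseen feature value never gets probability 0.

```python
import math

def train_bernoulli_nb(X, y, alpha=1.0):
    """Estimate P(Y = c) and P(Xi = 1 | Y = c) for binary features.

    alpha adds imaginary counts of 1s and 0s per feature (Laplace smoothing,
    a MAP-style estimate), so no probability is ever exactly 0 or 1.
    """
    n_features = len(X[0])
    prior, cond = {}, {}
    for c in sorted(set(y)):
        rows = [x for x, label in zip(X, y) if label == c]
        prior[c] = len(rows) / len(X)
        cond[c] = [
            (sum(r[i] for r in rows) + alpha) / (len(rows) + 2 * alpha)
            for i in range(n_features)
        ]
    return prior, cond

def predict(prior, cond, x):
    """Return the class maximising log P(Y = c) + sum_i log P(Xi = xi | Y = c)."""
    def score(c):
        s = math.log(prior[c])
        for xi, p in zip(x, cond[c]):
            s += math.log(p if xi == 1 else 1 - p)
        return s
    return max(prior, key=score)

# Hypothetical toy data: features = [contains "free", contains "meeting"]
X = [[1, 0], [1, 0], [1, 1], [0, 1], [0, 1], [0, 0]]
y = ["spam", "spam", "spam", "ham", "ham", "ham"]

prior, cond = train_bernoulli_nb(X, y)
n = len(X[0])
print(predict(prior, cond, [1, 0]), predict(prior, cond, [0, 1]))  # spam ham
print("parameters:", 1 + 2 * n)  # P(Y = spam) plus n per class -> 5
```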
Pros of Naive Bayes
● Despite their over-simplified assumptions, naive Bayes classifiers often work quite well in practice
● Widely used for document classification and spam filtering
● Requires only a small amount of training data to estimate the necessary parameters
● Extremely fast compared with more sophisticated methods
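Gaussian NB
● Used for continuous-valued features
● The conditional probability P(Xi | Y = y) is often modeled with a Gaussian distribution, with mean μ_iy and variance σ²_iy estimated for each feature i and class y:
  P(Xi = x | Y = y) = 1 / √(2π σ²_iy) · exp(−(x − μ_iy)² / (2 σ²_iy))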
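A minimal Gaussian NB sketch using only the standard library: for each class it estimates the prior and, per feature, the maximum-likelihood mean and variance, then scores a new point with the log of P(Y) 𝚷ᵢ N(xi; μ_iy, σ²_iy). The toy data and function names are hypothetical, and `var_floor` is an assumed small constant that keeps a zero-variance feature from causing a division by zero.

```python
import math
from statistics import mean, pvariance

def fit_gaussian_nb(X, y, var_floor=1e-9):
    """Per class c: prior P(Y = c) and, per feature, the MLE mean and variance."""
    params = {}
    for c in sorted(set(y)):
        rows = [x for x, label in zip(X, y) if label == c]
        columns = list(zip(*rows))
        params[c] = (
            len(rows) / len(X),
            [mean(col) for col in columns],
            # pvariance divides by N (the MLE); var_floor guards against zero variance
            [pvariance(col) + var_floor for col in columns],
        )
    return params

def log_gaussian(x, mu, var):
    """Log of the Gaussian density N(x; mu, var)."""
    return -0.5 * math.log(2 * math.pi * var) - (x - mu) ** 2 / (2 * var)

def predict(params, x):
    """Return the class maximising log P(Y = c) + sum_i log N(xi; mu_ic, var_ic)."""
    def score(c):
        prior, mus, variances = params[c]
        return math.log(prior) + sum(
            log_gaussian(xi, m, v) for xi, m, v in zip(x, mus, variances)
        )
    return max(params, key=score)

# Hypothetical toy data with two continuous features and two classes
X = [[1.0, 2.0], [1.2, 1.8], [0.8, 2.2], [3.0, 4.0], [3.2, 3.8], [2.8, 4.2]]
y = ["a", "a", "a", "b", "b", "b"]

params = fit_gaussian_nb(X, y)
print(predict(params, [1.1, 2.1]), predict(params, [2.9, 3.9]))  # a b
```

Some library implementations add a variance floor in a similar spirit; the exact value used here is only illustrative.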
