Naïve Bayes’ Classifier
Python Session
Dr. Mostafa A. Elhosseini
Agenda
≡ Bayes’ Theorem
▪ Example: testing for a disease
≡ Python session
▪ Example
▪ Categorical features
▪ Continuous variables (non-categorical attributes)
Mostafa A. Elhosseini https://youtube.com/c/mostafaelhosseini 2
Example: Bayes’s Theorem
≡ Suppose a certain disease has an incidence rate of 0.1% (that is, it
afflicts 0.1% of the population). A test has been devised to detect this
disease. The test does not produce false negatives (that is, anyone
who has the disease will test positive for it), but the false positive
rate is 5% (that is, 5% of people who do not have the disease will
nevertheless test positive).
≡ Suppose a randomly selected person takes the test and tests positive.
What is the probability that this person actually has the disease?
Example: Bayes’s Theorem
≡ Since the disease has an incidence rate of 0.1%, P(disease) = 0.001.
≡ Everyone who has the disease will test positive, or alternatively
everyone who tests negative does not have the disease. (We could
also say P(positive | disease) = 1.)
≡ 5% of people who do not have the disease still test positive:
P(positive | no disease) = 0.05.
≡ We want to compute P(disease | positive).
Example: Bayes’s Theorem
≡ First, suppose we randomly select 1000 people and administer the test
≡ Only 1 of 1000 test subjects actually has the disease; the other 999 do not.
≡ We also know that 5% of all people who do not have the disease will test
positive. There are 999 disease-free people, so we would expect
(0.05)(999) = 49.95 (so, about 50) people to test positive who do not have
the disease.
≡ There are about 51 people who test positive in our example (the one
person who actually has the disease, plus the roughly 50 people who
test positive but do not have it). Only one of these people has the disease, so:
≡ P(disease | positive) ≈ 1/51 ≈ 0.0196, or less than 2%.
≡ This means that of all people who test positive, over 98% do not have the
disease.
Example: Bayes’s Theorem
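≡ $$P(\text{disease} \mid \text{positive}) = \frac{P(\text{positive} \mid \text{disease})\,P(\text{disease})}{P(\text{positive} \mid \text{disease})\,P(\text{disease}) + P(\text{positive} \mid \text{no disease})\,P(\text{no disease})}$$
≡ $$P(\text{disease} \mid \text{positive}) = \frac{1 \times 0.001}{1 \times 0.001 + 0.05 \times 0.999} \approx 0.0196$$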
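The same arithmetic can be checked in a few lines of Python. This is a minimal sketch; the function name `posterior` and its parameter names are illustrative and not taken from the original Python session.

```python
def posterior(prior, sensitivity, false_positive_rate):
    """Return P(disease | positive) using Bayes' theorem.

    prior:               P(disease)
    sensitivity:         P(positive | disease)
    false_positive_rate: P(positive | no disease)
    """
    # Law of total probability: every positive comes from a sick or a healthy person
    p_positive = sensitivity * prior + false_positive_rate * (1 - prior)
    return sensitivity * prior / p_positive


p = posterior(prior=0.001, sensitivity=1.0, false_positive_rate=0.05)
print(f"P(disease | positive) = {p:.4f}")  # prints 0.0196
```

Writing the denominator out explicitly makes it clear why the result is so small: the 0.05 × 0.999 term from healthy people dominates the 1 × 0.001 term from sick people.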
Example
Example – One feature
Handling Text and Categorical Attributes
Ꚛ Most Machine Learning algorithms prefer to work with numbers, so let’s
convert these text labels to numbers.
Ꚛ Scikit-Learn provides a transformer for this task called LabelEncoder, which
maps each category to an integer.
Ꚛ One issue with this representation is that ML algorithms will assume that two
nearby values are more similar than two distant values, even though the
categories have no such order.
Ꚛ To fix this issue, a common solution is to create one binary attribute per
category: the attribute equals 1 when the instance belongs to that category
(and 0 otherwise)
▪ This is called one-hot encoding
Ꚛ Scikit-Learn provides a OneHotEncoder class to convert integer categorical
values into one-hot vectors
Ꚛ We can apply both transformations (from text categories to integer categories,
then from integer categories to one-hot vectors) in one shot using the
LabelBinarizer class
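A minimal sketch of the three transformers applied to a few Outlook values, assuming a recent scikit-learn release; the variable names are illustrative. Recent versions of OneHotEncoder also accept text categories directly, and LabelBinarizer is documented for encoding target labels rather than input features.

```python
from sklearn.preprocessing import LabelBinarizer, LabelEncoder, OneHotEncoder

outlook = ["Sunny", "Overcast", "Rainy", "Sunny"]

# LabelEncoder: text -> integers (classes are sorted alphabetically)
label_enc = LabelEncoder()
outlook_int = label_enc.fit_transform(outlook)
print(label_enc.classes_)  # ['Overcast' 'Rainy' 'Sunny']
print(outlook_int)         # [2 0 1 2]

# OneHotEncoder: integers -> one-hot vectors.
# It expects a 2-D input (n_samples, n_features) and returns a sparse
# matrix by default, so reshape the input and densify the output.
onehot_enc = OneHotEncoder()
outlook_onehot = onehot_enc.fit_transform(outlook_int.reshape(-1, 1)).toarray()
print(outlook_onehot)

# LabelBinarizer: text -> one-hot vectors in a single step
binarizer = LabelBinarizer()
outlook_bin = binarizer.fit_transform(outlook)
print(outlook_bin)
```

Both one-hot results have one column per category (Overcast, Rainy, Sunny), so "Sunny" is no longer numerically "closer" to "Rainy" than to "Overcast".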
Custom Transformers
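Ꚛ Although Scikit-Learn provides many useful transformers, you will sometimes
need to write your own for tasks such as custom cleanup operations or
combining specific attributes.
Ꚛ You will want your transformer to work seamlessly with Scikit-Learn
functionality such as pipelines: implement fit() (returning self) and
transform(); inheriting from TransformerMixin provides fit_transform(), and
inheriting from BaseEstimator provides get_params() and set_params() (as long
as the constructor takes no *args or **kwargs).
Ꚛ Giving the transformer a hyperparameter (for example, a flag that turns a
combined attribute on or off) allows you to easily find out whether adding
this attribute helps the Machine Learning algorithm or not.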
Ꚛ More generally, you can add a hyperparameter to gate any data
preparation step that you are not 100% sure about.
Ꚛ The more you automate these data preparation steps, the more
combinations you can automatically try out, making it much more likely
that you will find a great combination (and saving you a lot of time).
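A minimal sketch of such a transformer, assuming scikit-learn is installed. `RatioAdder` and its `add_ratio` hyperparameter are hypothetical names chosen for illustration: the transformer optionally appends the ratio of the first two columns (say, temperature over humidity) and can be toggled from a pipeline with `set_params`.

```python
import numpy as np
from sklearn.base import BaseEstimator, TransformerMixin
from sklearn.pipeline import Pipeline
from sklearn.preprocessing import StandardScaler


class RatioAdder(TransformerMixin, BaseEstimator):
    """Hypothetical transformer: optionally appends column 0 / column 1.

    add_ratio is a hyperparameter, so it can be switched with set_params()
    or searched over (e.g. with GridSearchCV) like any other parameter.
    """

    def __init__(self, add_ratio=True):
        self.add_ratio = add_ratio

    def fit(self, X, y=None):
        return self  # nothing to learn from the data

    def transform(self, X):
        X = np.asarray(X, dtype=float)
        if not self.add_ratio:
            return X
        ratio = X[:, 0] / X[:, 1]
        return np.column_stack([X, ratio])


# Two illustrative rows: (temperature, humidity)
X = np.array([[70.0, 80.0], [85.0, 85.0]])
print(RatioAdder().fit_transform(X))  # third column holds the ratio

pipe = Pipeline([("ratio", RatioAdder()), ("scale", StandardScaler())])
pipe.set_params(ratio__add_ratio=False)  # gate the extra attribute off
print(pipe.fit_transform(X).shape)  # (2, 2)
```

Listing the mixin before BaseEstimator follows the ordering recommended in current scikit-learn guidance; fit_transform() comes for free from TransformerMixin.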
Python
https://colab.research.google.com/drive/1tu3_CWRnl9aylppme0s4cN9-M3w6nj7F#scrollTo=il4fDyb7vcwr
▪ Given the dataset below, what does a Naïve Bayes classifier output for
this instance?
Outlook Temp Humidity Windy Play
Rainy Hot Normal True ?
Outlook Temp Humidity Windy Play
Sunny Hot High False No
Sunny Hot High True No
Overcast Hot High False Yes
Rainy Mild High False Yes
Rainy Cool Normal False Yes
Rainy Cool Normal True No
Overcast Cool Normal True Yes
Sunny Mild High False No
Sunny Cool Normal False Yes
Sunny Mild Normal False Yes
Sunny Mild Normal True Yes
Overcast Mild High True Yes
Overcast Hot Normal False Yes
Rainy Mild High True No
Frequency table (counts and conditional probabilities per class of Play)
Feature       Value      Yes   P(value | Yes)   No   P(value | No)
Outlook       Sunny      3     3/9              3    3/5
              Overcast   4     4/9              0    0/5
              Rainy      2     2/9              2    2/5
Temperature   Hot        2     2/9              2    2/5
              Mild       4     4/9              2    2/5
              Cool       3     3/9              1    1/5
Humidity      High       3     3/9              4    4/5
              Normal     6     6/9              1    1/5
Windy         True       3     3/9              3    3/5
              False      6     6/9              2    2/5
Prior P(Play)            9/14                   5/14
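The frequency table can be rebuilt directly from the 14 training rows by counting. This is a minimal sketch using only the standard library; `ROWS`, `FEATURES` and `likelihood` are illustrative names. `Fraction` prints values in lowest terms, so 3/9 appears as 1/3.

```python
from collections import Counter
from fractions import Fraction

FEATURES = ["Outlook", "Temp", "Humidity", "Windy"]

# The 14 training rows of the weather table: (Outlook, Temp, Humidity, Windy, Play)
ROWS = [
    ("Sunny", "Hot", "High", "False", "No"),
    ("Sunny", "Hot", "High", "True", "No"),
    ("Overcast", "Hot", "High", "False", "Yes"),
    ("Rainy", "Mild", "High", "False", "Yes"),
    ("Rainy", "Cool", "Normal", "False", "Yes"),
    ("Rainy", "Cool", "Normal", "True", "No"),
    ("Overcast", "Cool", "Normal", "True", "Yes"),
    ("Sunny", "Mild", "High", "False", "No"),
    ("Sunny", "Cool", "Normal", "False", "Yes"),
    ("Sunny", "Mild", "Normal", "False", "Yes"),
    ("Sunny", "Mild", "Normal", "True", "Yes"),
    ("Overcast", "Mild", "High", "True", "Yes"),
    ("Overcast", "Hot", "Normal", "False", "Yes"),
    ("Rainy", "Mild", "High", "True", "No"),
]

# Rows per class: these are the denominators (9 and 5) of the table
class_counts = Counter(row[-1] for row in ROWS)


def likelihood(feature, value, label):
    """Relative frequency P(feature = value | Play = label) as an exact fraction."""
    i = FEATURES.index(feature)
    matches = sum(1 for row in ROWS if row[i] == value and row[-1] == label)
    return Fraction(matches, class_counts[label])


for i, feature in enumerate(FEATURES):
    for value in sorted({row[i] for row in ROWS}):
        print(feature, value,
              likelihood(feature, value, "Yes"), likelihood(feature, value, "No"))
```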
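▪ To avoid ambiguity
≡ The denominator P(X) is the same in both cases (Play = Yes and Play = No)
≡ So, to simplify, we cancel the denominator and compare only
P(X | class) × P(class); dividing each score by the sum of both scores
recovers the actual probabilities if they are needed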
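For the query instance (Rainy, Hot, Normal, True), the comparison with the denominator cancelled can be sketched as follows. The probabilities are read off the frequency table; the dictionary layout and variable names are illustrative.

```python
from fractions import Fraction as F

# Priors and the likelihoods of the query values, taken from the frequency table:
# Outlook = Rainy, Temp = Hot, Humidity = Normal, Windy = True
prior = {"Yes": F(9, 14), "No": F(5, 14)}
likelihoods = {
    "Yes": [F(2, 9), F(2, 9), F(6, 9), F(3, 9)],
    "No": [F(2, 5), F(2, 5), F(1, 5), F(3, 5)],
}

# Unnormalized score: P(class) * product of P(x_i | class); P(X) is dropped
scores = {}
for label in prior:
    score = prior[label]
    for p in likelihoods[label]:
        score *= p
    scores[label] = score

prediction = max(scores, key=scores.get)

# Optional: divide by the sum of the scores to get P(class | X)
total = sum(scores.values())
posteriors = {label: s / total for label, s in scores.items()}
print(prediction, {k: round(float(v), 4) for k, v in posteriors.items()})
# Yes {'Yes': 0.5071, 'No': 0.4929}
```

The two scores are close (about 0.00705 for Yes versus 0.00686 for No), so the classifier outputs Play = Yes, but only narrowly.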
Python
https://colab.research.google.com/drive/1nVJYqUvwXVuXfmFZiAFJYTUTebXgQ17G
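A possible scikit-learn version of the same categorical example, assuming scikit-learn 1.x; it is a sketch and may differ from the linked notebook. `CategoricalNB` expects integer-coded features, so `OrdinalEncoder` is used first. Its `alpha` parameter is additive (Laplace) smoothing: the default `alpha=1.0` changes the probabilities, so a tiny value is used here to approximately reproduce the unsmoothed hand calculation.

```python
import numpy as np
from sklearn.naive_bayes import CategoricalNB
from sklearn.preprocessing import OrdinalEncoder

# Weather training data: (Outlook, Temp, Humidity, Windy, Play)
rows = [
    ("Sunny", "Hot", "High", "False", "No"),
    ("Sunny", "Hot", "High", "True", "No"),
    ("Overcast", "Hot", "High", "False", "Yes"),
    ("Rainy", "Mild", "High", "False", "Yes"),
    ("Rainy", "Cool", "Normal", "False", "Yes"),
    ("Rainy", "Cool", "Normal", "True", "No"),
    ("Overcast", "Cool", "Normal", "True", "Yes"),
    ("Sunny", "Mild", "High", "False", "No"),
    ("Sunny", "Cool", "Normal", "False", "Yes"),
    ("Sunny", "Mild", "Normal", "False", "Yes"),
    ("Sunny", "Mild", "Normal", "True", "Yes"),
    ("Overcast", "Mild", "High", "True", "Yes"),
    ("Overcast", "Hot", "Normal", "False", "Yes"),
    ("Rainy", "Mild", "High", "True", "No"),
]
X_text = np.array([row[:4] for row in rows])
y = np.array([row[4] for row in rows])

# Encode each feature's categories as integers 0..k-1
encoder = OrdinalEncoder()
X = encoder.fit_transform(X_text)

# Tiny smoothing so the estimates stay close to the raw relative frequencies
model = CategoricalNB(alpha=1e-10)
model.fit(X, y)

query = encoder.transform(np.array([["Rainy", "Hot", "Normal", "True"]]))
print(model.predict(query))
print(dict(zip(model.classes_, model.predict_proba(query)[0].round(4))))
```

In practice some smoothing is usually kept, because an unseen category (such as Overcast for Play = No) would otherwise drive a class score to zero.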
▪ What is the probability that an instance with these attributes is
classified as Play = Yes?
Outlook Temp Humidity Windy Play
Sunny 66 99 True ?
Gaussian Naïve Bayes
≡ So far we have seen the computations when the X’s are categorical, but
how do we compute the probabilities when X is a continuous variable?
≡ If we assume that x follows a particular distribution, we can plug in
the probability density function of that distribution to compute the
likelihoods
≡ If we assume that each X follows a normal (Gaussian) distribution
within each class, which is a fairly common choice, we substitute the
corresponding normal probability density and call the result
Gaussian Naïve Bayes, where m and σ are the mean and standard deviation
of the attribute within the class:
$$f(x) = \frac{1}{\sqrt{2\pi}\,\sigma} \, e^{-\frac{(x - m)^2}{2\sigma^2}}$$
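A minimal sketch of the normal density as a plain function; the name `gaussian_pdf` is illustrative. The example evaluates the likelihood of Temperature = 66 for Play = Yes, using the class parameters m = 68.6 and σ = 2.65 from the worked example.

```python
import math


def gaussian_pdf(x, m, sigma):
    """Normal density with mean m and standard deviation sigma, evaluated at x."""
    coeff = 1.0 / (math.sqrt(2.0 * math.pi) * sigma)
    return coeff * math.exp(-((x - m) ** 2) / (2.0 * sigma ** 2))


# Likelihood of Temperature = 66 given Play = Yes (m = 68.6, sigma = 2.65)
print(round(gaussian_pdf(66, 68.6, 2.65), 4))
```

This is a density, not a probability, so it can exceed 1 for small σ; Naïve Bayes only needs the relative size of the class scores, so densities can be multiplied in exactly where the categorical likelihoods were. Python 3.8+ also offers `statistics.NormalDist(m, sigma).pdf(x)`, which computes the same value.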
Categorical vs. numerical attributes
≡ In the previous example, all attributes were categorical
▪ Sunny / Rainy / Overcast / True / False / Hot / Mild
▪ A numerical value such as Temperature = 68 has no row in a
frequency table, so it must be converted to a likelihood using a
density function
≡ For each numerical attribute and each class of the target attribute
(Play = Yes / No), calculate the average (m) of its values
≡ Calculate the standard deviation (σ) of the same values
≡ Evaluate the probability density function (f) at the new value, using
that class's m and σ
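The first two steps (per-class mean and standard deviation) can be sketched with the standard `statistics` module. The value lists below are the five values per class listed in the worked example; `VALUES` and `gaussian_params` are illustrative names. `pstdev` divides by n (population standard deviation), which matches the σ values used in the example; for Humidity given Yes it gives √26 ≈ 5.10. `statistics.stdev` would divide by n − 1 instead.

```python
import statistics

# Numerical values per (attribute, class), five values each
VALUES = {
    ("Temp", "Yes"): [64, 68, 69, 70, 72],
    ("Temp", "No"): [65, 71, 72, 80, 85],
    ("Humidity", "Yes"): [65, 70, 70, 75, 80],
    ("Humidity", "No"): [70, 85, 90, 91, 95],
}


def gaussian_params(xs):
    """Return (mean, population standard deviation) of a list of numbers."""
    return statistics.mean(xs), statistics.pstdev(xs)


for (attribute, label), xs in VALUES.items():
    m, sigma = gaussian_params(xs)
    print(f"{attribute} | Play = {label}: m = {m:.1f}, sigma = {sigma:.2f}")
```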
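Training data summary (Play = Yes / Play = No)
▪ Outlook counts: Sunny 2 / 3, Rainy 3 / 2, Overcast 4 / 0
▪ Temp values: Yes 64, 68, 69, 70, 72; No 65, 71, 72, 80, 85
▪ Humidity values: Yes 65, 70, 70, 75, 80; No 70, 85, 90, 91, 95
▪ Windy counts: False 6 / 2, True 3 / 3
Model parameters (Play = Yes / Play = No)
▪ Outlook: Sunny 2/9 vs. 3/5, Rainy 3/9 vs. 2/5, Overcast 4/9 vs. 0/5
▪ Temp: Yes m = 68.6, σ = 2.65; No m = 74.6, σ = 7.06
▪ Humidity: Yes m = 72, σ = 5.10; No m = 86.2, σ = 8.7
▪ Windy: False 6/9 vs. 2/5, True 3/9 vs. 3/5
(σ here is the population standard deviation, computed with n = 5 in the denominator.)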
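Putting the pieces together for the query instance (Sunny, Temp = 66, Humidity = 99, Windy = True), a minimal sketch of the Gaussian Naïve Bayes score. The parameters are taken from the model-parameter table, with σ for Humidity given Yes taken as the population standard deviation √26 ≈ 5.10; the dictionary layout is illustrative.

```python
import math


def gaussian_pdf(x, m, sigma):
    """Normal density with mean m and standard deviation sigma, evaluated at x."""
    return math.exp(-((x - m) ** 2) / (2 * sigma ** 2)) / (math.sqrt(2 * math.pi) * sigma)


# Categorical likelihoods as relative frequencies,
# numerical attributes as (mean, population std) per class
params = {
    "Yes": {"prior": 9 / 14, "Sunny": 2 / 9, "Windy=True": 3 / 9,
            "Temp": (68.6, 2.65), "Humidity": (72.0, 5.10)},
    "No": {"prior": 5 / 14, "Sunny": 3 / 5, "Windy=True": 3 / 5,
           "Temp": (74.6, 7.06), "Humidity": (86.2, 8.70)},
}

# Query instance: Outlook = Sunny, Temp = 66, Humidity = 99, Windy = True
scores = {}
for label, p in params.items():
    scores[label] = (p["prior"] * p["Sunny"] * p["Windy=True"]
                     * gaussian_pdf(66, *p["Temp"])
                     * gaussian_pdf(99, *p["Humidity"]))

total = sum(scores.values())
for label, s in scores.items():
    print(f"{label}: score = {s:.3e}, P({label} | x) = {s / total:.4f}")
print("Prediction:", max(scores, key=scores.get))
```

The Humidity = 99 term dominates: 99 lies more than five standard deviations above the Yes mean of 72, so the Yes score collapses and the instance is classified as Play = No.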
Example: only continuous variables
Python session
https://colab.research.google.com/drive/1FaBhZXjvu9rFhv2_sZm0lgauXa4TnSsi
Pima Indian Dataset
≡ This dataset is originally from the National Institute of Diabetes and
Digestive and Kidney Diseases.
≡ The objective of the dataset is to diagnostically predict whether or not a
patient has diabetes, based on certain diagnostic measurements included
in the dataset.
≡ Several constraints were placed on the selection of these instances from a
larger database. In particular, all patients here are females of Pima Indian
heritage who are at least 21 years old.
≡ The dataset consists of several medical predictor variables and one target
variable, Outcome. Predictor variables include the number of
pregnancies the patient has had, their BMI, insulin level, age, and so on.
≡ BloodPressure: Diastolic blood pressure (mm Hg)
≡ SkinThickness: Triceps skin fold thickness (mm)
≡ Insulin: 2-Hour serum insulin (mu U/ml)
≡ BMI: Body mass index (weight in kg/(height in m)^2)
≡ DiabetesPedigreeFunction: Diabetes pedigree function
≡ Age: Age (years)
≡ Outcome: Class variable (0 or 1); 268 of 768 are 1, the others are 0
≡ https://www.kaggle.com/uciml/pima-indians-diabetes-database
Python session - Pima Indian Dataset
https://colab.research.google.com/drive/1AE4N6OH95Gp235V_7qiSh2Oad79g0Ixo
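A minimal sketch of a Gaussian Naïve Bayes evaluation on this dataset, assuming scikit-learn and pandas are installed and that every column except `Outcome` is a numeric predictor. `evaluate_gaussian_nb` is a hypothetical helper, not necessarily what the linked notebook does; the split size and random seed are arbitrary choices.

```python
import pandas as pd
from sklearn.metrics import accuracy_score
from sklearn.model_selection import train_test_split
from sklearn.naive_bayes import GaussianNB


def evaluate_gaussian_nb(df: pd.DataFrame, target: str = "Outcome",
                         test_size: float = 0.25, seed: int = 42) -> float:
    """Fit Gaussian Naive Bayes on df and return accuracy on a held-out split.

    Assumes every column other than `target` is a numeric predictor.
    """
    X = df.drop(columns=[target])
    y = df[target]
    # Stratify so both splits keep roughly the same share of Outcome = 1
    X_train, X_test, y_train, y_test = train_test_split(
        X, y, test_size=test_size, stratify=y, random_state=seed)
    # GaussianNB estimates a mean and variance per feature and per class,
    # i.e. the m and sigma of the hand-worked weather example
    model = GaussianNB()
    model.fit(X_train, y_train)
    return accuracy_score(y_test, model.predict(X_test))
```

Usage, assuming the Kaggle CSV has been downloaded locally as `diabetes.csv` (the file name is an assumption): `evaluate_gaussian_nb(pd.read_csv("diabetes.csv"))`. Because 500 of the 768 patients have Outcome = 0, accuracy should be compared against the roughly 65% obtained by always predicting 0.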