This lecture covers Bayes's theorem, the Naïve Bayes classifier, and how to implement them in Python: categorical features, continuous features, examples ranging from one feature to four features, and finally the well-known Pima Indians Diabetes dataset.
2. Agenda
≡ Naïve Bayes's Theorem
▪ Example: testing for a disease
≡ Python session
▪ Example
▪ Categorical features
▪ Continuous variables (non-categorical attributes)
3. Example: Bayes’s Theorem
≡ Suppose a certain disease has an incidence rate of 0.1% (that is, it
afflicts 0.1% of the population). A test has been devised to detect this
disease. The test does not produce false negatives (that is, anyone
who has the disease will test positive for it), but the false positive
rate is 5% (that is, about 5% of people who take the test will test
positive, even though they do not have the disease).
≡ Suppose a randomly selected person takes the test and tests positive.
What is the probability that this person actually has the disease?
4. Example: Bayes’s Theorem
≡ The disease has an incidence rate of 0.1%, so we write P(disease) = 0.001
≡ Everyone who has the disease will test positive; equivalently, everyone who tests negative does not have the disease. (We could also say P(positive | disease) = 1.)
≡ About 5% of people who do not have the disease will still test positive, so P(positive | no disease) = 0.05
≡ Here we want to compute P(disease|positive)
5. Example: Bayes’s Theorem
≡ First, suppose we randomly select 1000 people and administer the test
≡ Only 1 of 1000 test subjects actually has the disease; the other 999 do not.
≡ We also know that 5% of all people who do not have the disease will test
positive. There are 999 disease-free people, so we would expect
(0.05)(999) = 49.95 (so, about 50) people to test positive who do not have
the disease.
≡ There are about 51 people who test positive in our example (the one unfortunate person who actually has the disease, plus the roughly 50 people who tested positive but do not have it). Only one of these people has the disease, so
≡ P(disease | positive) ≈ 1/51 ≈ 0.0196, or less than 2%.
≡ This means that of all people who test positive, over 98% do not have the
disease.
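≡ The same result follows directly from Bayes's theorem; a minimal sketch of the computation in Python, using the numbers above:
# Bayes's theorem applied to the disease-testing example
p_disease = 0.001              # P(disease): incidence rate of 0.1%
p_pos_given_disease = 1.0      # P(positive | disease): no false negatives
p_pos_given_healthy = 0.05     # P(positive | no disease): false-positive rate

# Total probability of testing positive
p_positive = (p_pos_given_disease * p_disease
              + p_pos_given_healthy * (1 - p_disease))

# Posterior probability of having the disease given a positive test
p_disease_given_pos = p_pos_given_disease * p_disease / p_positive
print(p_disease_given_pos)     # ~0.0196, i.e. less than 2%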
11. Handling Text and Categorical Attributes
Ꚛ Most Machine Learning algorithms prefer to work with numbers anyway, so let’s
convert these text labels to numbers.
Ꚛ Scikit-Learn provides a transformer for this task called LabelEncoder
Ꚛ One issue with this representation is that ML algorithms will assume that two
nearby values are more similar than two distant values
Ꚛ To fix this issue, a common solution is to create one binary attribute per category: the attribute equals 1 when the instance belongs to that category (and 0 otherwise)
▪ This is called one-hot encoding
Ꚛ Scikit-Learn provides a OneHotEncoder transformer to convert integer categorical values into one-hot vectors (see the sketch below)
Ꚛ We can apply both transformations (from text categories to integer categories,
then from integer categories to one-hot vectors) in one shot using the
LabelBinarizer class
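Ꚛ A minimal sketch of these transformations in Python, on a hypothetical Outlook column (the category values are assumptions for illustration):
import numpy as np
from sklearn.preprocessing import LabelBinarizer, LabelEncoder, OneHotEncoder

outlook = np.array(['Sunny', 'Overcast', 'Rainy', 'Sunny'])

# Step 1: text categories -> integer categories
outlook_int = LabelEncoder().fit_transform(outlook)        # e.g. [2, 0, 1, 2]

# Step 2: integer categories -> one-hot vectors
outlook_1hot = OneHotEncoder().fit_transform(outlook_int.reshape(-1, 1)).toarray()

# Both steps in one shot
outlook_1hot_direct = LabelBinarizer().fit_transform(outlook)
print(outlook_1hot)
print(outlook_1hot_direct)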
12. Custom Transformers
Ꚛ Although Scikit-Learn provides many useful transformers, you will need to
write your own for tasks such as custom cleanup operations or combining
specific attributes.
Ꚛ You will want your transformer to work seamlessly with Scikit-Learn
functionalities (such as pipelines)
Ꚛ Adding a hyperparameter to the transformer will allow you to easily find out whether adding a derived attribute helps the Machine Learning algorithm or not.
Ꚛ More generally, you can add a hyperparameter to gate any data
preparation step that you are not 100% sure about.
Ꚛ The more you automate these data preparation steps, the more
combinations you can automatically try out, making it much more likely
that you will find a great combination (and saving you a lot of time).
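Ꚛ A minimal sketch of such a transformer, with a hypothetical derived attribute (a ratio of two columns) gated by a boolean hyperparameter:
import numpy as np
from sklearn.base import BaseEstimator, TransformerMixin

class RatioAdder(BaseEstimator, TransformerMixin):
    """Hypothetical transformer: optionally appends the ratio of two columns."""
    def __init__(self, add_ratio=True, num_col=0, den_col=1):
        self.add_ratio = add_ratio   # hyperparameter gating the extra attribute
        self.num_col = num_col
        self.den_col = den_col

    def fit(self, X, y=None):
        return self                  # nothing to learn

    def transform(self, X):
        X = np.asarray(X, dtype=float)
        if not self.add_ratio:
            return X
        ratio = X[:, self.num_col] / X[:, self.den_col]
        return np.c_[X, ratio]

# Works inside a Pipeline, and add_ratio can be searched like any hyperparameter
X = np.array([[2.0, 4.0], [3.0, 6.0]])
print(RatioAdder(add_ratio=True).fit_transform(X))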
14. ▪ Given the dataset below, using a Naïve Bayes classifier, what is the classifier output for this instance?
Instance to classify:
Outlook Temp Humidity Windy Play
Rainy Hot Normal True ?
Training data:
Outlook Temp Humidity Windy Play
Sunny Hot High False No
Sunny Hot High True No
Overcast Hot High False Yes
Rainy Mild High False Yes
Rainy Cool Normal False Yes
Rainy Cool Normal True No
Overcast Cool Normal True Yes
Sunny Mild High False No
Sunny Cool Normal False Yes
Sunny Mild Normal False Yes
Sunny Mild Normal True Yes
Overcast Mild High True Yes
Overcast Hot Normal False Yes
Rainy Mild High True No
15. Feature counts and conditional probabilities (Play = Yes / No)
Feature       Yes  P(·|Yes)   No  P(·|No)
Outlook
  Sunny        3    3/9        3   3/5
  Overcast     4    4/9        0   0/5
  Rainy        2    2/9        2   2/5
Temperature
  Hot          2    2/9        2   2/5
  Mild         4    4/9        2   2/5
  Cool         3    3/9        1   1/5
Humidity
  High         3    3/9        4   4/5
  Normal       6    6/9        1   1/5
Windy
  True         3    3/9        3   3/5
  False        6    6/9        2   2/5
Prior               9/14            5/14
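≡ The counts and probabilities above can be reproduced in Python; a minimal sketch with pandas, building a DataFrame from the 14 training rows:
import pandas as pd

# The 14 training rows shown on the previous slide
df = pd.DataFrame({
    'Outlook':  ['Sunny','Sunny','Overcast','Rainy','Rainy','Rainy','Overcast',
                 'Sunny','Sunny','Sunny','Sunny','Overcast','Overcast','Rainy'],
    'Temp':     ['Hot','Hot','Hot','Mild','Cool','Cool','Cool',
                 'Mild','Cool','Mild','Mild','Mild','Hot','Mild'],
    'Humidity': ['High','High','High','High','Normal','Normal','Normal',
                 'High','Normal','Normal','Normal','High','Normal','High'],
    'Windy':    [False, True, False, False, False, True, True,
                 False, False, False, True, True, False, True],
    'Play':     ['No','No','Yes','Yes','Yes','No','Yes',
                 'No','Yes','Yes','Yes','Yes','Yes','No'],
})

# Conditional probabilities P(Outlook | Play), one column per class
print(pd.crosstab(df['Outlook'], df['Play'], normalize='columns'))
# Class priors P(Play)
print(df['Play'].value_counts(normalize=True))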
16. ▪ To avoid ambiguity
≡ The denominator (the evidence P(x)) is the same in both cases, Play or Not
≡ So to simplify, we cancel the denominator and compare the numerators directly
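≡ For the query instance (Rainy, Hot, Normal, True), a minimal sketch of the comparison in Python, using the probabilities from the table above:
# Unnormalized scores (denominator dropped) for the instance (Rainy, Hot, Normal, True)
# score = P(class) * P(Rainy|class) * P(Hot|class) * P(Normal|class) * P(True|class)
score_yes = (9/14) * (2/9) * (2/9) * (6/9) * (3/9)
score_no  = (5/14) * (2/5) * (2/5) * (1/5) * (3/5)

print(score_yes, score_no)            # ~0.00705 vs ~0.00686
print('Play =', 'Yes' if score_yes > score_no else 'No')   # Yes, by a narrow margin
# Normalizing: P(Yes | x) = score_yes / (score_yes + score_no) ≈ 0.51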
18. ▪ What is the probability that an instance with these attributes is classified as Play?
Outlook Temp Humidity Windy Play
Sunny 66 99 True ?
19. Gaussian Naïve Bayes
≡ So far we have seen the computations when the X's are categorical, but how do we compute the probabilities when X is a continuous variable?
≡ If we assume that x follows a particular distribution, then we can plug in the probability density function of that distribution to compute the likelihoods
≡ If we assume the X's follow a normal (aka Gaussian) distribution, which is fairly common, we substitute the corresponding normal probability density and call the result Gaussian Naïve Bayes
f(x) = \frac{1}{\sqrt{2\pi}\,\sigma} \, e^{-\frac{(x - m)^2}{2\sigma^2}}
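≡ A minimal sketch of this density in Python:
import math

def gaussian_pdf(x, m, sigma):
    """Normal probability density with mean m and standard deviation sigma."""
    return math.exp(-(x - m) ** 2 / (2 * sigma ** 2)) / (math.sqrt(2 * math.pi) * sigma)

print(gaussian_pdf(66, 68.6, 2.65))   # ≈ 0.093 (m and σ for Temp given Yes, from the table on slide 21)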
20. Categorical dataset vs. numerical dataset
≡ In the previous example, the attributes were categorical
▪ Sunny / Rainy / Overcast / True / False / Hot / Mild
▪ What about Temperature = 68? → it must be converted to a probability density
≡ Calculate the average (m) of the numerical values of each attribute, grouped by the target attribute (Play = Yes / No)
≡ Calculate the standard deviation (σ)
≡ Plug both into the probability density function (f), as in the sketch below
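≡ A minimal sketch in pandas, assuming the per-class numeric values from the next slide are arranged into a small DataFrame (the row pairing is hypothetical; only the per-class statistics matter here):
import pandas as pd

df_num = pd.DataFrame({
    'Temp':     [64, 68, 69, 70, 72,  65, 71, 72, 80, 85],
    'Humidity': [65, 70, 70, 75, 80,  70, 85, 90, 91, 95],
    'Play':     ['Yes'] * 5 + ['No'] * 5,
})

# Mean (m) and population standard deviation (sigma) per class
print(df_num.groupby('Play').mean())
print(df_num.groupby('Play').std(ddof=0))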
21. Counts and raw values per class (Play = Yes / No)
Outlook:  Sunny 2 / 3   Rainy 3 / 2   Overcast 4 / 0
Temp:     Yes: 64, 68, 69, 70, 72    No: 65, 71, 72, 80, 85
Humidity: Yes: 65, 70, 70, 75, 80    No: 70, 85, 90, 91, 95
Windy:    False 6 / 2   True 3 / 3
Probabilities and Gaussian parameters per class (Play = Yes / No)
Outlook:  Sunny 2/9 / 3/5   Rainy 3/9 / 2/5   Overcast 4/9 / 0/5
Temp:     Yes: m = 68.6, σ = 2.65   No: m = 74.6, σ = 7.06
Humidity: Yes: m = 72, σ = 5.06     No: m = 86.2, σ = 8.7
Windy:    False 6/9 / 2/5   True 3/9 / 3/5
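≡ With these statistics, the instance from slide 18 (Outlook = Sunny, Temp = 66, Humidity = 99, Windy = True) can be scored; a minimal sketch in Python:
from scipy.stats import norm   # norm.pdf is the Gaussian density f(x) from slide 19

# Unnormalized scores, mixing categorical probabilities and Gaussian densities
score_yes = (9/14) * (2/9) * norm.pdf(66, loc=68.6, scale=2.65) \
                   * norm.pdf(99, loc=72.0, scale=5.06) * (3/9)
score_no  = (5/14) * (3/5) * norm.pdf(66, loc=74.6, scale=7.06) \
                   * norm.pdf(99, loc=86.2, scale=8.7)  * (3/5)

print(score_yes, score_no)    # Humidity = 99 is very unlikely under the Yes class
print('Play =', 'Yes' if score_yes > score_no else 'No')   # -> No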
23. Example: only continuous variables
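≡ A minimal sketch of the same idea with scikit-learn's GaussianNB, on a small numeric dataset built by pairing the per-class Temp and Humidity values from slide 21 (the pairing is hypothetical, used only for illustration):
import numpy as np
from sklearn.naive_bayes import GaussianNB

# Columns: Temp, Humidity; first five rows are Play = Yes, last five are Play = No
X = np.array([[64, 65], [68, 70], [69, 70], [70, 75], [72, 80],
              [65, 70], [71, 85], [72, 90], [80, 91], [85, 95]], dtype=float)
y = np.array(['Yes'] * 5 + ['No'] * 5)

model = GaussianNB().fit(X, y)

print(model.predict([[66, 99]]))         # predicted class for Temp = 66, Humidity = 99
print(model.predict_proba([[66, 99]]))   # class probabilities, ordered as model.classes_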
28. Pima Indian Dataset
≡ This dataset is originally from the National Institute of Diabetes and
Digestive and Kidney Diseases.
≡ The objective of the dataset is to diagnostically predict whether or not a
patient has diabetes, based on certain diagnostic measurements included
in the dataset.
≡ Several constraints were placed on the selection of these instances from a
larger database. In particular, all patients here are females at least 21 years
old of Pima Indian heritage.
≡ The dataset consists of several medical predictor variables and one target variable, Outcome. Predictor variables include the number of pregnancies the patient has had, their BMI, insulin level, age, and so on.
29. Pima Indian Dataset
≡ This dataset is originally from the National Institute of Diabetes and
Digestive and Kidney Diseases.
≡ Pregnancies: Number of times pregnant
≡ Glucose: Plasma glucose concentration (2-hour oral glucose tolerance test)
≡ BloodPressure: Diastolic blood pressure (mm Hg)
≡ SkinThickness: Triceps skin fold thickness (mm)
≡ Insulin: 2-Hour serum insulin (mu U/ml)
≡ BMI: Body mass index (weight in kg / (height in m)^2)
≡ DiabetesPedigreeFunction: Diabetes pedigree function
≡ Age: Age (years)
≡ Outcome: Class variable (0 or 1); 268 of the 768 instances are 1, the others are 0
≡ https://www.kaggle.com/uciml/pima-indians-diabetes-database
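≡ A minimal end-to-end sketch with scikit-learn's GaussianNB, assuming the CSV from the Kaggle link above has been downloaded as diabetes.csv:
import pandas as pd
from sklearn.model_selection import train_test_split
from sklearn.naive_bayes import GaussianNB
from sklearn.metrics import accuracy_score

# Load the Pima Indians Diabetes data (file name assumed from the Kaggle download)
df = pd.read_csv('diabetes.csv')
X = df.drop(columns='Outcome')
y = df['Outcome']

X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.2, random_state=42, stratify=y)

model = GaussianNB()
model.fit(X_train, y_train)

y_pred = model.predict(X_test)
print('Accuracy:', accuracy_score(y_test, y_pred))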