This document provides a summary of Lecture 10 on Bayesian decision theory and Naive Bayes machine learning algorithms. It begins with a recap of Lecture 9 on using probability to classify patterns into categories. It then discusses how to apply these probabilistic concepts to both nominal and continuous variables. A medical example is presented to illustrate Bayesian classification. The document concludes by explaining the Naive Bayes algorithm for classification tasks and providing a worked example of how it is trained and makes predictions.
1. Introduction to Machine Learning
Lecture 10: Bayesian decision theory – Naïve Bayes
Albert Orriols i Puig
aorriols@salle.url.edu
Artificial Intelligence – Machine Learning
Enginyeria i Arquitectura La Salle
Universitat Ramon Llull
2. Recap of Lecture 9
Bayesian learning outputs the most probable hypothesis h ∈ H, given the data D plus knowledge about the prior probabilities of the hypotheses in H.
Terminology:
P(h|D): probability that h holds given data D. The posterior probability of h: our confidence that h holds after seeing D.
P(h): prior probability of h (the background knowledge we have about h being a correct hypothesis).
P(D): prior probability that the training data D will be observed.
P(D|h): probability of observing D given that h holds.
These are related by Bayes' rule:

$$P(h \mid D) = \frac{P(D \mid h)\,P(h)}{P(D)}$$
3. Bayes' Theorem
Given H, the space of possible hypotheses, the most probable hypothesis is the one that maximizes P(h|D):

$$h_{MAP} \equiv \arg\max_{h \in H} P(h \mid D) = \arg\max_{h \in H} \frac{P(D \mid h)\,P(h)}{P(D)} = \arg\max_{h \in H} P(D \mid h)\,P(h)$$
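A minimal sketch of MAP hypothesis selection as described on these two slides. The hypothesis space, priors P(h), and likelihoods P(D|h) below are hypothetical values chosen only for illustration:

```python
# MAP hypothesis selection (slides 2-3) over a toy hypothesis space.
priors = {"h1": 0.7, "h2": 0.3}        # P(h), hypothetical values
likelihoods = {"h1": 0.2, "h2": 0.9}   # P(D|h), hypothetical values

# P(D) is the same for every h, so it cancels out of the arg max:
# we only need to compare P(D|h) * P(h).
h_map = max(priors, key=lambda h: likelihoods[h] * priors[h])
print(h_map)  # h2, since 0.9 * 0.3 = 0.27 > 0.2 * 0.7 = 0.14
```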
4. Today's Agenda
Bayesian Decision Theory
Nominal Variables
Continuous Variables
A Medical Example
Naïve Bayes
5. Bayesian Decision Theory
A statistical approach to pattern classification: forget about rule-based and tree-based models; we will express the problem in probabilistic terms.
Goal: classify a pattern x into one of the two classes w1 or w2 so as to minimize the probability of misclassification, P(error).
Prior probability: P(wk) is the fraction of times that a pattern belongs to class wk.
Without more information, how should we classify a new example x'?

$$\text{class of } x' = \begin{cases} w_1 & \text{if } P(w_1) > P(w_2) \\ w_2 & \text{otherwise} \end{cases}$$

This is the best option if we know nothing else about the domain!
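The prior-only rule above as a one-liner; the prior values passed in are hypothetical placeholders:

```python
# Prior-only decision rule from slide 5: with no features measured,
# assign the class with the larger prior probability.
def classify_by_prior(p_w1: float, p_w2: float) -> str:
    return "w1" if p_w1 > p_w2 else "w2"

print(classify_by_prior(0.7, 0.3))  # w1
```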
6. Bayesian Decision Theory
Now, we measure a feature x1 of each example x. How should we classify these data?
[Figure: class-conditional distributions of feature x1, with a threshold θ marking a candidate class boundary]
As the classes overlap, x1 cannot perfectly discriminate between them. In the end, we want the algorithm to place a threshold that defines the class boundary.
7. Bayesian Decision Theory
Let’s dd
L t’ add a second feature
df t
How we should classify these data?
An oblique line will be a good discriminant
So the problem turns out to be: How can we build or simulate
this oblique line?
8. Bayesian Decision Theory
Assume the xi are nominal variables with possible values {xi1, xi2, …, xin}. Let's build a table of the number of occurrences:

       xi1   xi2   xin   Total
  w1    1     3     0      4
  w2    0     2     2      4

For example: P(w1, xi1) = 1/8, P(w1) = 4/8, P(xi1|w1) = 1/4.

Joint probability P(wk, xij): probability of a pattern having value xij for variable xi and belonging to class wk. That is, the value of each cell divided by the total number of examples.
Priors P(wk): the marginal sum of each row divided by the total number of examples.
Conditional P(xij|wk): probability that a pattern has value xij given that it belongs to class wk. That is, each cell divided by the sum of its row.
9. Bayesian Decision Theory
Recall that P(A,B) = P(B|A)P(A) = P(A|B)P(B). Applied here:

$$P(w_k, x_{ij}) = P(x_{ij} \mid w_k)\,P(w_k) = P(w_k \mid x_{ij})\,P(x_{ij})$$

We have all these values from the table. Therefore:

$$P(w_k \mid x_{ij}) = \frac{P(x_{ij} \mid w_k)\,P(w_k)}{P(x_{ij})}$$

And the class:

$$\text{class of } x = \arg\max_{k=1,2} P(w_k \mid x_{ij})$$
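A sketch of the table-based estimates from slides 8 and 9, using the occurrence counts from the slide-8 table (8 training patterns in total):

```python
# Joint, prior, conditional, and posterior probabilities from a count table.
counts = {
    "w1": {"xi1": 1, "xi2": 3, "xin": 0},
    "w2": {"xi1": 0, "xi2": 2, "xin": 2},
}
total = sum(sum(row.values()) for row in counts.values())  # 8

def joint(wk, xij):        # P(wk, xij): cell / total
    return counts[wk][xij] / total

def prior(wk):             # P(wk): row sum / total
    return sum(counts[wk].values()) / total

def conditional(xij, wk):  # P(xij | wk): cell / row sum
    return counts[wk][xij] / sum(counts[wk].values())

def posterior(wk, xij):    # P(wk | xij) via Bayes' rule
    evidence = sum(joint(w, xij) for w in counts)  # P(xij)
    return joint(wk, xij) / evidence

print(joint("w1", "xi1"), prior("w1"), conditional("xi1", "w1"))  # 0.125 0.5 0.25
print(max(counts, key=lambda w: posterior(w, "xi1")))             # w1
```

The printed values match the slide's examples: 1/8, 4/8, and 1/4.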
10. Bayesian Decision Theory
From nominal to continuous attributes: from probability mass functions to probability density functions (PDFs).

$$P(x \in [a, b]) = \int_a^b p(x)\,dx \qquad \text{where} \qquad \int_X p(x)\,dx = 1$$

As well, we have class-conditional PDFs p(x|wk). If we have d random variables x = (x1, …, xd):

$$P(\mathbf{x} \in R) = \int_R p(\mathbf{x})\,d\mathbf{x}$$
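In the continuous case the counts are replaced by a density p(x). As an assumption for illustration (the slide fixes no density family), here is a Gaussian p(x) and a midpoint-rule approximation of P(x ∈ [a, b]):

```python
import math

def gaussian_pdf(x: float, mu: float = 0.0, sigma: float = 1.0) -> float:
    # Density of a Gaussian; integrates to 1 over the real line.
    return math.exp(-0.5 * ((x - mu) / sigma) ** 2) / (sigma * math.sqrt(2 * math.pi))

def prob_interval(a: float, b: float, steps: int = 10_000) -> float:
    # P(x in [a, b]) = integral of p(x) dx over [a, b], approximated numerically.
    dx = (b - a) / steps
    return sum(gaussian_pdf(a + (i + 0.5) * dx) for i in range(steps)) * dx

print(round(prob_interval(-1.0, 1.0), 3))  # ~0.683 for the standard Gaussian
```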
11. Naïve Bayes
But step back… I still need to learn the probabilities from data described by nominal attributes and continuous attributes.
That is, given a new instance with attributes (a1, a2, …, an), the Bayesian approach classifies it to the most probable value vMAP:

$$v_{MAP} = \arg\max_{v_j \in V} P(v_j \mid a_1, a_2, \ldots, a_n)$$

Using Bayes' theorem:

$$v_{MAP} = \arg\max_{v_j \in V} \frac{P(a_1, a_2, \ldots, a_n \mid v_j)\,P(v_j)}{P(a_1, a_2, \ldots, a_n)} = \arg\max_{v_j \in V} P(a_1, a_2, \ldots, a_n \mid v_j)\,P(v_j)$$

How do we compute P(vj) and P(a1, a2, …, an | vj)?
12. Naïve Bayes
How to compute P(vj)? By counting the frequency with which each target value vj occurs in the training data.
How to compute P(a1, a2, …, an | vj)? Estimating these terms directly would require a very large dataset: their number equals the number of possible instances times the number of possible target values (infeasible).
Simplifying assumption: the attribute values are conditionally independent given the target value, i.e., the probability of observing (a1, a2, …, an) is the product of the probabilities of the individual attributes:

$$P(a_1, a_2, \ldots, a_n \mid v_j) = \prod_i P(a_i \mid v_j)$$
13. Naïve Bayes
Prediction of the Naïve Bayes classifier:

$$v_{NB} = \arg\max_{v_j \in V} P(v_j) \prod_i P(a_i \mid v_j)$$

The learning algorithm:
Training: estimate the probabilities P(vj) and P(ai|vj) from their frequencies over the training data.
Output after training: the learned hypothesis consists of this set of estimates.
Test: use the formula above to classify new instances.
Observations:
The number of distinct P(ai|vj) terms equals the number of distinct attribute values times the number of distinct target values.
The algorithm does not perform an explicit search through the space of possible hypotheses (that space is the set of possible values that can be assigned to the various probabilities).
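A compact sketch of this training/test procedure: estimate P(vj) and P(ai|vj) by frequency counting, then predict with the arg max above. Dict-based, with no smoothing (a zero count zeroes out the whole product; slide 16 addresses this):

```python
from collections import Counter, defaultdict

def train(examples):
    """examples: list of (attribute_tuple, target_value) pairs."""
    class_counts = Counter(v for _, v in examples)
    cond_counts = defaultdict(Counter)  # (attr_index, class) -> value counts
    for attrs, v in examples:
        for i, a in enumerate(attrs):
            cond_counts[(i, v)][a] += 1
    priors = {v: c / len(examples) for v, c in class_counts.items()}
    return priors, cond_counts, class_counts

def predict(attrs, priors, cond_counts, class_counts):
    # arg max over vj of P(vj) * prod_i P(ai | vj)
    def score(v):
        s = priors[v]
        for i, a in enumerate(attrs):
            s *= cond_counts[(i, v)][a] / class_counts[v]
        return s
    return max(priors, key=score)
```

Trained on the slide-14 table below, this sketch reproduces the slide-15 decision.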
14. Example
Given the training examples:

  Day  Outlook   Temperature  Humidity  Wind    PlayTennis
  D1   Sunny     Hot          High      Weak    No
  D2   Sunny     Hot          High      Strong  No
  D3   Overcast  Hot          High      Weak    Yes
  D4   Rain      Mild         High      Weak    Yes
  D5   Rain      Cool         Normal    Weak    Yes
  D6   Rain      Cool         Normal    Strong  No
  D7   Overcast  Cool         Normal    Strong  Yes
  D8   Sunny     Mild         High      Weak    No
  D9   Sunny     Cool         Normal    Weak    Yes
  D10  Rain      Mild         Normal    Weak    Yes
  D11  Sunny     Mild         Normal    Strong  Yes
  D12  Overcast  Mild         High      Strong  Yes
  D13  Overcast  Hot          Normal    Weak    Yes
  D14  Rain      Mild         High      Strong  No

Classify the new instance:
(Outlook=Sunny, Temperature=Cool, Humidity=High, Wind=Strong)
15. Example
Naïve Bayes training — conditional probabilities estimated from the table:

  Outlook|Yes:      Sunny 2/9, Overcast 4/9, Rain 3/9
  Outlook|No:       Sunny 3/5, Overcast 0,   Rain 2/5
  Temperature|Yes:  Hot 2/9,   Mild 4/9,     Cool 3/9
  Temperature|No:   Hot 2/5,   Mild 2/5,     Cool 1/5
  Humidity|Yes:     High 3/9,  Normal 6/9
  Humidity|No:      High 4/5,  Normal 1/5
  Wind|Yes:         Weak 6/9,  Strong 3/9
  Wind|No:          Weak 2/5,  Strong 3/5

Priors: P(Yes) = 9/14, P(No) = 5/14.

Test: classify (Outlook=Sunny, Temperature=Cool, Humidity=High, Wind=Strong):

max { 9/14 · 2/9 · 3/9 · 3/9 · 3/9,  5/14 · 3/5 · 1/5 · 4/5 · 3/5 } = max { 0.0053, 0.0206 } = 0.0206

The maximum corresponds to No: do not play tennis!
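Reproducing the slide's computation directly from the estimated probabilities above:

```python
# P(Yes) * P(Sunny|Yes) * P(Cool|Yes) * P(High|Yes) * P(Strong|Yes), and same for No.
p_yes = 9/14 * 2/9 * 3/9 * 3/9 * 3/9
p_no  = 5/14 * 3/5 * 1/5 * 4/5 * 3/5
print(round(p_yes, 4), round(p_no, 4))  # 0.0053 0.0206
print("No" if p_no > p_yes else "Yes")  # No -> do not play tennis
```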
16. Estimation of Probabilities
The process explained so far can lead to poor estimates when the number of observations is small. E.g., P(Outlook=Overcast|No) = 0, estimated from only 5 examples; this zero wipes out the whole product.
Use the following m-estimate instead:

$$\frac{n_c + mp}{n + m}$$

where nc is the number of observations matching the condition out of n total, p is a prior estimate of the probability we wish to determine, and m is a constant called the equivalent sample size, which determines the weight given to the observed data relative to the prior.
Assuming a uniform distribution, p = 1/k, where k is the number of values of the attribute.
E.g., for P(Outlook=Overcast|No) with m = 2:

$$\frac{n_c + mp}{n + m} = \frac{0 + 2 \cdot \frac{1}{3}}{5 + 2} \approx 0.095$$
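The m-estimate as a function, checked against the slide's example (nc = 0, n = 5, m = 2, p = 1/3):

```python
# m-estimate: blends the observed frequency nc/n with the prior p,
# weighted as if the prior came from m extra "virtual" examples.
def m_estimate(nc: int, n: int, m: float, p: float) -> float:
    return (nc + m * p) / (n + m)

print(round(m_estimate(nc=0, n=5, m=2, p=1/3), 3))  # 0.095
```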
17. Next Class
Neural Networks and Support Vector Machines
18. Introduction to Machine Learning
Lecture 10: Bayesian decision theory – Naïve Bayes
Albert Orriols i Puig
aorriols@salle.url.edu
Artificial Intelligence – Machine Learning
Enginyeria i Arquitectura La Salle
Universitat Ramon Llull