Introduction to Machine Learning
Lecture 10
Bayesian decision theory – Naïve Bayes

Albert Orriols i Puig
aorriols@salle.url.edu

Artificial Intelligence – Machine Learning
Enginyeria i Arquitectura La Salle
Universitat Ramon Llull
Recap of Lecture 9
        Bayesian learning outputs the most probable hypothesis h∈H, given the data D and
        knowledge about the prior probabilities of the hypotheses in H
        Terminology:
                P(h|D): probability that h holds given data D. Posterior probability of h;
                confidence that h holds once D is observed.
                P(h): prior probability of h (background knowledge we have about h being a
                correct hypothesis)
                P(D): prior probability that training data D will be observed
                P(D|h): probability of observing D given that h holds



                              P(h|D) = P(D|h) P(h) / P(D)



Bayes’ Theorem

           Given H, the space of possible hypotheses
           The most probable hypothesis is the one that maximizes P(h|D):




   hMAP ≡ argmax_{h∈H} P(h|D) = argmax_{h∈H} P(D|h) P(h) / P(D) = argmax_{h∈H} P(D|h) P(h)

   (the last step holds because P(D) does not depend on h, so it can be dropped)
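
As a sketch (toy hypotheses and numbers, assumed for illustration — not from the lecture), selecting hMAP in Python:

```python
# Minimal sketch of picking h_MAP from tabulated probabilities.
# The hypotheses and numbers are toy values, purely illustrative.
priors = {"h1": 0.7, "h2": 0.3}        # P(h)
likelihoods = {"h1": 0.2, "h2": 0.9}   # P(D|h) for the observed data D

# h_MAP = argmax_h P(D|h) * P(h); P(D) is constant over h, so it is dropped.
h_map = max(priors, key=lambda h: likelihoods[h] * priors[h])
print(h_map)  # -> h2  (0.9 * 0.3 = 0.27 > 0.2 * 0.7 = 0.14)
```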




Today’s Agenda


        Bayesian Decision Theory
                Nominal Variables
                Continuous Variables
        A Medical Example
        Naïve Bayes



Bayesian Decision Theory
        Statistical approach to pattern classification
                Forget about rule-based and tree-based models
                We will express the problem in probabilistic terms
        Goal
                Classify a pattern x into one of two classes, w1 or w2, minimizing
                the probability of misclassification P(error)
                Prior probability
                           P(wk) = fraction of patterns that belong to class wk
        Without more information, we have to classify a new example
        x’. What should we do?

                class of x = w1   if P(w1) > P(w2)
                             w2   otherwise

                (The best option if we know nothing else about the domain!)
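
A minimal sketch of this prior-only rule, with assumed toy priors:

```python
# Prior-only decision rule (toy priors, purely illustrative).
p = {"w1": 0.6, "w2": 0.4}   # P(w1), P(w2) from class frequencies

def classify_by_prior():
    # Always predict the a-priori more probable class.
    return "w1" if p["w1"] > p["w2"] else "w2"

print(classify_by_prior())  # -> w1
```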

Bayesian Decision Theory
        Now, we measure a feature x1 of each example
        [Figure: class-conditional distributions of x1, with a threshold θ between the classes]
                How should we classify these data?
                           As the classes overlap, x1 cannot perfectly discriminate
                In the end, we want the algorithm to place a threshold that defines
                the class boundary

Bayesian Decision Theory
       Let’s add a second feature
       [Figure: two-class data in the (x1, x2) plane, separated by an oblique line]
                How should we classify these data?
                           An oblique line will be a good discriminant
                So the problem becomes: how can we build or simulate this
                oblique line?

Bayesian Decision Theory
        Assume that xi are nominal variables with possible values
        {xi1, xi2, …, xin}
        Let’s build a table of the number of occurrences:

                          xi1    xi2    …    xin    Total
                    w1     1      3     …     0       4
                    w2     0      2     …     2       4

        E.g., P(w1, xi1) = 1/8,   P(w1) = 4/8,   P(xi1|w1) = 1/4



        Joint probability P(wk, xij): probability of a pattern having value xij for
        variable xi and belonging to class wk. That is, the value of each cell divided
        by the total number of examples.
        Priors P(wk): marginal sums of each row, divided by the total number of examples.
        Conditional P(xij|wk): probability that a pattern has value xij given that it
        belongs to class wk. That is, each cell divided by the sum of its row.
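
A short sketch of these three estimates computed from the count table above (the ellipsis columns are dropped; only the values shown on the slide are used):

```python
# Estimates from the 2-class count table above (illustrative).
counts = {
    "w1": {"xi1": 1, "xi2": 3, "xin": 0},
    "w2": {"xi1": 0, "xi2": 2, "xin": 2},
}
total = sum(sum(row.values()) for row in counts.values())        # 8

joint = counts["w1"]["xi1"] / total                              # P(w1, xi1) = 1/8
prior = sum(counts["w1"].values()) / total                       # P(w1)      = 4/8
conditional = counts["w1"]["xi1"] / sum(counts["w1"].values())   # P(xi1|w1)  = 1/4
print(joint, prior, conditional)  # 0.125 0.5 0.25
```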



Bayesian Decision Theory
        Recall that P(A,B) = P(B|A)P(A) = P(A|B)P(B)

            P(wk, xij) = P(xij|wk) P(wk)

            P(wk, xij) = P(wk|xij) P(xij)        ← we have all these values

        Therefore:

            P(wk|xij) = P(xij|wk) P(wk) / P(xij)

        And the class:

                            class of x = argmax_{k=1,2} P(wk|xij)
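
A sketch of this decision rule on the toy counts from the table above; since P(xij) is common to both classes, it can be dropped from the comparison:

```python
# Class decision from the same toy count table (illustrative).
counts = {
    "w1": {"xi1": 1, "xi2": 3, "xin": 0},
    "w2": {"xi1": 0, "xi2": 2, "xin": 2},
}
total = 8

def classify(value):
    scores = {}
    for wk, row in counts.items():
        prior = sum(row.values()) / total        # P(wk)
        cond = row[value] / sum(row.values())    # P(xij|wk)
        scores[wk] = cond * prior                # proportional to P(wk|xij)
    return max(scores, key=scores.get)

print(classify("xi2"))  # -> w1  (0.75 * 0.5 = 0.375 vs 0.5 * 0.5 = 0.25)
```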
Bayesian Decision Theory
        From nominal to continuous attributes
                From probability mass functions to probability density functions (PDFs)

                P(x ∈ [a,b]) = ∫_a^b p(x) dx,   where ∫_X p(x) dx = 1



                We also have class-conditional PDFs p(x|wk)
                If we have d random variables x = (x1, …, xd):

                             P(x ∈ R) = ∫_R p(x) dx
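
A sketch of computing P(x ∈ [a,b]); the Gaussian density is an assumption for illustration — the slide only requires some valid PDF:

```python
import math

def normal_cdf(x, mu, sigma):
    # Gaussian CDF expressed through the error function.
    return 0.5 * (1.0 + math.erf((x - mu) / (sigma * math.sqrt(2.0))))

def prob_in_interval(a, b, mu, sigma):
    # P(x in [a, b]) = integral from a to b of p(x) dx, for a Gaussian p.
    return normal_cdf(b, mu, sigma) - normal_cdf(a, mu, sigma)

print(prob_in_interval(-1.0, 1.0, mu=0.0, sigma=1.0))  # ≈ 0.683
```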



Naïve Bayes
        But let’s step back… we still need to learn the probabilities from data
        described by
                Nominal attributes
                Continuous attributes
        That is,
                Given a new instance with attributes (a1,a2,…,an), the Bayesian
                approach classifies it to the most probable value vMAP

                               vMAP = argmax_{vj∈V} P(vj | a1, a2, …, an)
                Using Bayes’ theorem:
               vMAP = argmax_{vj∈V} P(a1, a2, …, an | vj) P(vj) / P(a1, a2, …, an)
                    = argmax_{vj∈V} P(a1, a2, …, an | vj) P(vj)

                How to compute P(vj) and P(a1, a2, …, an | vj)?

Naïve Bayes
        How to compute P(vj)?
                P(vj): by counting the frequency with which each target value vj occurs in
                the training data.


        How to compute P(a1,a2,…,an|vj) ?
                P(a1, a2, …, an | vj): estimating this directly would require a very large
                dataset; the number of such terms equals the number of possible instances
                times the number of possible target values (infeasible).
                Simplifying assumption: the attribute values are conditionally independent
                given the target value. I.e., the probability of observing (a1,a2,…,an) is the
                product of the probabilities for the individual attributes.




Naïve Bayes
        Prediction of Naïve Bayes classifier:
                                       vNB = argmax_{vj∈V} P(vj) ∏i P(ai | vj)
        The learning algorithm:
                Training:
                          Estimate the probabilities P(vj) and P(ai|vj) based on their frequencies over the
                          training data

                Output after training:
                           The learned hypothesis consists of the set of estimates

                Test:
                          Use formula above to classify new instances

        Observations:
                Number of distinct P(ai|vj) terms = number of distinct attribute values
                times the number of distinct target values
                The algorithm does not perform an explicit search through the space of
                possible hypotheses (the space of possible hypotheses is the space of
                possible values that can be assigned to the various probabilities).
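
Putting training and prediction together, a compact sketch (the data layout — attribute dicts plus a label — is a choice of this sketch, not the lecture’s notation):

```python
from collections import Counter, defaultdict

def train(examples):
    # examples: list of (attribute_dict, label) pairs.
    # Estimate P(v) and P(ai|v) by counting frequencies over the data.
    class_counts = Counter(label for _, label in examples)
    cond_counts = defaultdict(Counter)
    for attrs, label in examples:
        for attr, value in attrs.items():
            cond_counts[(attr, label)][value] += 1
    n = len(examples)
    priors = {v: c / n for v, c in class_counts.items()}

    def cond(attr, value, label):
        # P(ai = value | v = label), by relative frequency.
        return cond_counts[(attr, label)][value] / class_counts[label]

    return priors, cond

def predict(priors, cond, attrs):
    # v_NB = argmax_v P(v) * prod_i P(ai|v)
    def score(v):
        s = priors[v]
        for attr, value in attrs.items():
            s *= cond(attr, value, v)
        return s
    return max(priors, key=score)
```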
Example
        Given the training examples:
                         g     p
                   Day    Outlook    Temperature   Humidity   Wind     PlayTennis
                   D1     Sunny      Hot           High       Weak     No
                   D2     Sunny      Hot           High       Strong   No
                   D3     Overcast   Hot           High       Weak     Yes
                   D4     Rain       Mild          High       Weak     Yes
                   D5     Rain       Cool          Normal     Weak     Yes
                   D6     Rain       Cool          Normal     Strong   No
                   D7     Overcast   Cool          Normal     Strong   Yes
                   D8     Sunny      Mild          High       Weak     No
                   D9     Sunny      Cool          Normal     Weak     Yes
                   D10    Rain       Mild          Normal     Weak     Yes
                   D11    Sunny      Mild          Normal     Strong   Yes
                   D12    Overcast   Mild          High       Strong   Yes
                   D13    Overcast   Hot           Normal     Weak     Yes
                   D14    Rain       Mild          High       Strong   No



        Classify the new instance:
                (Outlook=sunny, Temp=cool, Humidity=high, Wind=strong)
Example
        Naive Bayes training

                Outlook|Yes:      Sunny|Yes     2/9      Outlook|No:      Sunny|No     3/5
                                  Overcast|Yes  4/9                       Overcast|No  0/5
                                  Rain|Yes      3/9                       Rain|No      2/5
                Temperature|Yes:  Hot|Yes       2/9      Temperature|No:  Hot|No       2/5
                                  Mild|Yes      4/9                       Mild|No      2/5
                                  Cool|Yes      3/9                       Cool|No      1/5
                Humidity|Yes:     High|Yes      3/9      Humidity|No:     High|No      4/5
                                  Normal|Yes    6/9                       Normal|No    1/5
                Wind|Yes:         Weak|Yes      6/9      Wind|No:         Weak|No      2/5
                                  Strong|Yes    3/9                       Strong|No    3/5


          P(Yes)=9/14

          P(No)=5/14

          Test:
          Classify (Outlook=sunny, Temp=cool, Humidity=high, Wind=strong)

          max{ 9/14·2/9·3/9·3/9·3/9 , 5/14·3/5·1/5·4/5·3/5 } = max{ 0.0053, 0.0206 } = 0.0206

          Do not play tennis!
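
The arithmetic can be checked directly; a sketch reusing the fractions read off the tables above:

```python
# Scores for (Outlook=sunny, Temp=cool, Humidity=high, Wind=strong).
score_yes = 9/14 * 2/9 * 3/9 * 3/9 * 3/9   # P(Yes) * product of conditionals
score_no = 5/14 * 3/5 * 1/5 * 4/5 * 3/5    # P(No)  * product of conditionals
print(round(score_yes, 4), round(score_no, 4))   # 0.0053 0.0206
print("No" if score_no > score_yes else "Yes")   # -> No: do not play tennis
```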

Estimation of Probabilities
        The process explained above for estimating probabilities can lead to poor
        estimates when the number of observations is small
                E.g., P(Outlook=overcast | No) = 0, yet we only have 5 examples of class No

        Use the following estimate instead:

                    (nc + mp) / (n + m)

                    where n is the number of training examples of the class and nc the
                          number of those in which the attribute takes the value of interest,
                    p is the prior estimate of the probability we wish to determine, and
                    m is a constant (the equivalent sample size), which determines the
                          weight assigned to the observed data
        Assuming a uniform distribution, p = 1/k, where k is the number of values
        of the attribute.
        E.g., P(Outlook=overcast | No), with nc = 0, n = 5, p = 1/3, and m = 2:

                          (nc + mp) / (n + m) = (0 + 2·1/3) / (5 + 2) ≈ 0.095
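
A one-function sketch of the m-estimate, plugging in the numbers above:

```python
def m_estimate(nc, n, p, m):
    # Smoothed estimate (nc + m*p) / (n + m): blends the observed
    # frequency nc/n with the prior p, weighted by the equivalent
    # sample size m.
    return (nc + m * p) / (n + m)

# P(Outlook=overcast | No): nc = 0 of n = 5 examples, p = 1/3, m = 2.
print(m_estimate(nc=0, n=5, p=1/3, m=2))  # ≈ 0.0952 instead of 0
```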

Next Class



        Neural Networks and Support Vector Machines




