Note to other teachers and users of these slides. Andrew would be delighted if you found this source material useful in giving your own lectures. Feel free to use these slides verbatim, or to modify them to fit your own needs. PowerPoint originals are available. If you make use of a significant portion of these slides in your own lecture, please include this message, or the following link to the source repository of Andrew’s tutorials: http://www.cs.cmu.edu/~awm/tutorials . Comments and corrections gratefully received.




                      PAC-learning
                                    Andrew W. Moore
                                   Associate Professor
                                School of Computer Science
                                Carnegie Mellon University
                                     www.cs.cmu.edu/~awm
                                        awm@cs.cmu.edu
                                          412-268-7599




         Copyright © 2001, Andrew W. Moore                                             Nov 30th, 2001




   Probably Approximately Correct
           (PAC) Learning
• Imagine we’re doing classification with categorical
  inputs.
• All inputs and outputs are binary.
• Data is noiseless.
• There’s a machine f(x,h) which has H possible
  settings (a.k.a. hypotheses), called h_1, h_2, …, h_H.




Copyright © 2001, Andrew W. Moore                                                             PAC-learning: Slide 2




Example of a machine
• f(x,h) consists of all logical sentences about X1, X2
  .. Xm that contain only logical ands.
• Example hypotheses:
• X1 ^ X3 ^ X19
• X3 ^ X18
• X7
• X1 ^ X2 ^ X3 ^ X4 ^ … ^ Xm
• Question: if there are 3 attributes, what is the
  complete set of hypotheses in f?



Copyright © 2001, Andrew W. Moore                                 PAC-learning: Slide 3




                   Example of a machine
• f(x,h) consists of all logical sentences about X1, X2
  .. Xm that contain only logical ands.
• Example hypotheses:
• X1 ^ X3 ^ X19
• X3 ^ X18
• X7
• X1 ^ X2 ^ X3 ^ X4 ^ … ^ Xm
• Question: if there are 3 attributes, what is the
  complete set of hypotheses in f? (H = 8)
              True                  X2        X3        X2 ^ X3

              X1                    X1 ^ X2   X1 ^ X3   X1 ^ X2 ^ X3
Copyright © 2001, Andrew W. Moore                                 PAC-learning: Slide 4




And-Positive-Literals Machine
• f(x,h) consists of all logical sentences about X1, X2
  .. Xm that contain only logical ands.
• Example hypotheses:
• X1 ^ X3 ^ X19
• X3 ^ X18
• X7
• X1 ^ X2 ^ X3 ^ X4 ^ … ^ Xm
• Question: if there are m attributes, how many
  hypotheses in f?



Copyright © 2001, Andrew W. Moore            PAC-learning: Slide 5




       And-Positive-Literals Machine
• f(x,h) consists of all logical sentences about X1, X2
  .. Xm that contain only logical ands.
• Example hypotheses:
• X1 ^ X3 ^ X19
• X3 ^ X18
• X7
• X1 ^ X2 ^ X3 ^ X4 ^ … ^ Xm
• Question: if there are m attributes, how many
  hypotheses in f? (H = 2^m)



Copyright © 2001, Andrew W. Moore            PAC-learning: Slide 6
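
A quick sanity check of the 2^m count: each of the m attributes is either included in the conjunction or left out, and the empty conjunction is the constant True. The sketch below is an illustration (not part of the slides) that simply enumerates the hypotheses for small m:

    from itertools import combinations

    def and_positive_literal_hypotheses(m):
        # A hypothesis is the set of attribute indices it requires to be 1;
        # the empty set is the constant-True hypothesis.
        attrs = range(1, m + 1)
        return [s for k in range(m + 1) for s in combinations(attrs, k)]

    print(len(and_positive_literal_hypotheses(3)))   # 8 = 2**3, matching Slide 4
    print(len(and_positive_literal_hypotheses(10)))  # 1024 = 2**10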




And-Literals Machine
• f(x,h) consists of all logical
  sentences about X1, X2 ..
  Xm or their negations that
  contain only logical ands.
• Example hypotheses:
• X1 ^ ~X3 ^ X19
• X3 ^ ~X18
• ~X7
• X1 ^ X2 ^ ~X3 ^ … ^ Xm
• Question: if there are 2
  attributes, what is the
  complete set of hypotheses
  in f?

Copyright © 2001, Andrew W. Moore              PAC-learning: Slide 7




                     And-Literals Machine
• f(x,h) consists of all logical sentences about X1, X2 .. Xm
  or their negations that contain only logical ands.
• Example hypotheses:
• X1 ^ ~X3 ^ X19
• X3 ^ ~X18
• ~X7
• X1 ^ X2 ^ ~X3 ^ … ^ Xm
• Question: if there are 2 attributes, what is the complete
  set of hypotheses in f? (H = 9)

      True         True
      True         X2
      True         ~X2
      X1           True
      X1     ^     X2
      X1     ^     ~X2
      ~X1          True
      ~X1    ^     X2
      ~X1    ^     ~X2

Copyright © 2001, Andrew W. Moore              PAC-learning: Slide 8




And-Literals Machine
• f(x,h) consists of all logical sentences about X1, X2 .. Xm
  or their negations that contain only logical ands.
• Example hypotheses:
• X1 ^ ~X3 ^ X19
• X3 ^ ~X18
• ~X7
• X1 ^ X2 ^ ~X3 ^ … ^ Xm
• Question: if there are m attributes, what is the size of
  the complete set of hypotheses in f?

      True         True
      True         X2
      True         ~X2
      X1           True
      X1     ^     X2
      X1     ^     ~X2
      ~X1          True
      ~X1    ^     X2
      ~X1    ^     ~X2

Copyright © 2001, Andrew W. Moore               PAC-learning: Slide 9




                     And-Literals Machine
• f(x,h) consists of all logical sentences about X1, X2 .. Xm
  or their negations that contain only logical ands.
• Example hypotheses:
• X1 ^ ~X3 ^ X19
• X3 ^ ~X18
• ~X7
• X1 ^ X2 ^ ~X3 ^ … ^ Xm
• Question: if there are m attributes, what is the size of
  the complete set of hypotheses in f? (H = 3^m)

      True         True
      True         X2
      True         ~X2
      X1           True
      X1     ^     X2
      X1     ^     ~X2
      ~X1          True
      ~X1    ^     X2
      ~X1    ^     ~X2

Copyright © 2001, Andrew W. Moore              PAC-learning: Slide 10
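
The 3^m count follows because each attribute independently appears positively, appears negated, or is left out of the conjunction. A small illustrative sketch (mine, not the slides'):

    from itertools import product

    def and_literal_hypotheses(m):
        # Each attribute Xi is in one of three states: absent, required true
        # (Xi), or required false (~Xi).
        return list(product(('absent', 'pos', 'neg'), repeat=m))

    print(len(and_literal_hypotheses(2)))  # 9 = 3**2, the table above
    print(len(and_literal_hypotheses(4)))  # 81 = 3**4, matching H = 3^m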




Lookup Table Machine
• f(x,h) consists of all truth tables mapping combinations
  of input attributes to true and false
• Example hypothesis: the truth table below
• Question: if there are m attributes, what is the size of
  the complete set of hypotheses in f?

      X1   X2   X3   X4   Y
      0    0    0    0    0
      0    0    0    1    1
      0    0    1    0    1
      0    0    1    1    0
      0    1    0    0    1
      0    1    0    1    0
      0    1    1    0    0
      0    1    1    1    1
      1    0    0    0    0
      1    0    0    1    0
      1    0    1    0    0
      1    0    1    1    1
      1    1    0    0    0
      1    1    0    1    0
      1    1    1    0    0
      1    1    1    1    0




Copyright © 2001, Andrew W. Moore                                   PAC-learning: Slide 11




                  Lookup Table Machine
• f(x,h) consists of all truth tables mapping combinations
  of input attributes to true and false
• Example hypothesis: the truth table below
• Question: if there are m attributes, what is the size of
  the complete set of hypotheses in f?

      H = 2^(2^m)

      X1   X2   X3   X4   Y
      0    0    0    0    0
      0    0    0    1    1
      0    0    1    0    1
      0    0    1    1    0
      0    1    0    0    1
      0    1    0    1    0
      0    1    1    0    0
      0    1    1    1    1
      1    0    0    0    0
      1    0    0    1    0
      1    0    1    0    0
      1    0    1    1    1
      1    1    0    0    0
      1    1    0    1    0
      1    1    1    0    0
      1    1    1    1    0




Copyright © 2001, Andrew W. Moore                                   PAC-learning: Slide 12
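
The 2^(2^m) count arises because a truth table assigns an independent true/false output to each of the 2^m possible input rows. For illustration (not from the slides):

    m = 4
    rows = 2 ** m     # 16 distinct input combinations of X1..X4
    H = 2 ** rows     # each row's output can be 0 or 1 independently
    print(rows, H)    # 16, 65536 = 2^(2^4), i.e. H = 2^(2^m)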




A Game
• We specify f, the machine
• Nature chooses a hidden random hypothesis h*
• Nature randomly generates R datapoints
   • How is a datapoint generated?
     1. Vector of inputs x_k = (x_k1, x_k2, …, x_km) is
        drawn from a fixed unknown distribution D
     2. The corresponding output y_k = f(x_k, h*)
• We learn an approximation of h* by choosing
  some h_est for which the training set error is 0




 Copyright © 2001, Andrew W. Moore                      PAC-learning: Slide 13
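
To make the game concrete, here is a minimal simulation sketch (my own, under the assumption that f is the And-Positive-Literals machine): Nature draws h* and R datapoints, and the learner returns any hypothesis with zero training-set error.

    import random
    from itertools import combinations

    def f(x, h):
        # And-Positive-Literals machine: h is a set of attribute indices;
        # f(x, h) = 1 iff every attribute in h equals 1 in x.
        return int(all(x[i] for i in h))

    def play_game(m=5, R=20, seed=0):
        rng = random.Random(seed)
        # Nature picks a hidden hypothesis h* ...
        h_star = {i for i in range(m) if rng.random() < 0.5}
        # ... and generates R datapoints: x_k drawn from D (here uniform),
        # y_k = f(x_k, h*).
        data = []
        for _ in range(R):
            x = [rng.randint(0, 1) for _ in range(m)]
            data.append((x, f(x, h_star)))
        # Learner: choose some h_est with zero training-set error.
        for k in range(m + 1):
            for subset in combinations(range(m), k):
                h_est = set(subset)
                if all(f(x, h_est) == y for x, y in data):
                    return h_star, h_est

    print(play_game())  # h_est may differ from h*, yet agrees on all R training points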




    Test Error Rate
• We specify f, the machine
• Nature chooses a hidden random hypothesis h*
• Nature randomly generates R datapoints
   • How is a datapoint generated?
     1. Vector of inputs x_k = (x_k1, x_k2, …, x_km) is
        drawn from a fixed unknown distribution D
     2. The corresponding output y_k = f(x_k, h*)
• We learn an approximation of h* by choosing
  some h_est for which the training set error is 0
• For each hypothesis h,
• Say h is Correctly Classified (CCd) if h has
  zero training set error
• Define TESTERR(h)
      = Fraction of test points that h will classify
        incorrectly
      = P(h misclassifies a random test point)
• Say h is BAD if TESTERR(h) > ε




 Copyright © 2001, Andrew W. Moore                      PAC-learning: Slide 14
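
In code terms, TESTERR(h) is the misclassification rate of h on points drawn from D, and "bad" means that rate exceeds ε. A small sketch (the function names are mine, not the slides'):

    def testerr(h, f, h_star, test_points):
        # Fraction of test points that h misclassifies, with true labels
        # coming from the hidden hypothesis h*.
        wrong = sum(f(x, h) != f(x, h_star) for x in test_points)
        return wrong / len(test_points)

    def is_bad(h, f, h_star, test_points, eps):
        return testerr(h, f, h_star, test_points) > eps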




Test Error Rate
• We specify f, the machine
• Nature chooses a hidden random hypothesis h*
• Nature randomly generates R datapoints
   • How is a datapoint generated?
     1. Vector of inputs x_k = (x_k1, x_k2, …, x_km) is
        drawn from a fixed unknown distribution D
     2. The corresponding output y_k = f(x_k, h*)
• We learn an approximation of h* by choosing
  some h_est for which the training set error is 0
• For each hypothesis h,
• Say h is Correctly Classified (CCd) if h has
  zero training set error
• Define TESTERR(h)
      = Fraction of test points that h will classify
        incorrectly
      = P(h misclassifies a random test point)
• Say h is BAD if TESTERR(h) > ε

      P(h is CCd | h is bad) = P(∀k ∈ Training Set, f(x_k, h) = y_k | h is bad) ≤ (1 − ε)^R

(If h is bad it classifies each independently drawn training point correctly with probability less than 1 − ε, so it classifies all R of them correctly with probability at most (1 − ε)^R.)




 Copyright © 2001, Andrew W. Moore                                                          PAC-learning: Slide 15




    Test Error Rate
• We specify f, the machine
• Nature chooses a hidden random hypothesis h*
• Nature randomly generates R datapoints
   • How is a datapoint generated?
     1. Vector of inputs x_k = (x_k1, x_k2, …, x_km) is
        drawn from a fixed unknown distribution D
     2. The corresponding output y_k = f(x_k, h*)
• We learn an approximation of h* by choosing
  some h_est for which the training set error is 0
• For each hypothesis h,
• Say h is Correctly Classified (CCd) if h has
  zero training set error
• Define TESTERR(h)
      = Fraction of test points that h will classify
        incorrectly
      = P(h misclassifies a random test point)
• Say h is BAD if TESTERR(h) > ε

      P(h is CCd | h is bad) = P(∀k ∈ Training Set, f(x_k, h) = y_k | h is bad) ≤ (1 − ε)^R

      P(we learn a bad h)
        ≤ P(the set of CCd h's contains a bad h)
        = P(∃h. h is CCd ^ h is bad)
        = P( (h_1 is CCd ^ h_1 is bad) ∨ (h_2 is CCd ^ h_2 is bad) ∨ … ∨ (h_H is CCd ^ h_H is bad) )
        ≤ Σ_{i=1..H} P(h_i is CCd ^ h_i is bad)
        ≤ Σ_{i=1..H} P(h_i is CCd | h_i is bad)
        = H × P(h_i is CCd | h_i is bad) ≤ H (1 − ε)^R
 Copyright © 2001, Andrew W. Moore                                                          PAC-learning: Slide 16
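
As a rough empirical check of the final bound (an illustrative sketch, not from the slides), one can draw many random training sets for the And-Positive-Literals machine, record how often some bad hypothesis ends up correctly classified (zero training error), and compare that frequency with H(1 − ε)^R:

    import random
    from itertools import combinations

    def f(x, h):
        return int(all(x[i] for i in h))

    def check_bound(m=4, R=20, eps=0.3, trials=2000, seed=1):
        rng = random.Random(seed)
        hyps = [set(s) for k in range(m + 1) for s in combinations(range(m), k)]
        H = len(hyps)                                  # 2**m
        h_star = {0}                                   # hidden truth: X1
        inputs = [[(i >> j) & 1 for j in range(m)] for i in range(2 ** m)]
        def true_err(h):                               # exact error under uniform D
            return sum(f(x, h) != f(x, h_star) for x in inputs) / len(inputs)
        bad = [h for h in hyps if true_err(h) > eps]
        hits = 0
        for _ in range(trials):
            data = [(x, f(x, h_star))
                    for x in ([rng.randint(0, 1) for _ in range(m)] for _ in range(R))]
            if any(all(f(x, h) == y for x, y in data) for h in bad):
                hits += 1
        return hits / trials, H * (1 - eps) ** R       # observed vs. union bound

    print(check_bound())  # the observed frequency should not exceed the bound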




PAC Learning
• Choose R such that with probability less than δ we’ll
  select a bad h_est (i.e. an h_est which makes mistakes
  more than a fraction ε of the time)
 • Probably Approximately Correct
 • As we just saw, this can be achieved by choosing
   R such that
                  P( we learn a bad h ) ≤ H (1 − ε)^R ≤ δ

 • i.e. R such that
                        R ≥ (0.69/ε) ( log2 H + log2(1/δ) )
 Copyright © 2001, Andrew W. Moore                                             PAC-learning: Slide 17
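
The step from H(1 − ε)^R ≤ δ to the lower bound on R is not spelled out on the slide; one standard route (a sketch, using the inequality ln(1 − ε) ≤ −ε) is:

      H (1 − ε)^R ≤ δ   ⟺   R · ln(1/(1 − ε)) ≥ ln(H/δ)

      Since ln(1/(1 − ε)) ≥ ε, it suffices to take

      R ≥ (1/ε)(ln H + ln(1/δ)) = (ln 2 / ε)(log2 H + log2(1/δ)) ≈ (0.69/ε)(log2 H + log2(1/δ))

because ln 2 ≈ 0.69.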




                                     PAC in action
Machine                  Example Hypothesis             H                   R required to PAC-learn
And-positive-literals    X3 ^ X7 ^ X8                   2^m                 (0.69/ε)( m + log2(1/δ) )
And-literals             X3 ^ ~X7                       3^m                 (0.69/ε)( (log2 3) m + log2(1/δ) )
Lookup Table             a full truth table over        2^(2^m)             (0.69/ε)( 2^m + log2(1/δ) )
                         X1..X4 (as on Slide 11)
And-lits or And-lits     (X1 ^ X5) v (X2 ^ ~X7 ^ X8)    (3^m)^2 = 3^(2m)    (0.69/ε)( (2 log2 3) m + log2(1/δ) )
 Copyright © 2001, Andrew W. Moore                                             PAC-learning: Slide 18
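
To make the table concrete, a small sketch (mine, not the slides') plugs numbers into R ≥ (0.69/ε)(log2 H + log2(1/δ)) for each machine:

    import math

    def pac_sample_size(log2_H, eps, delta):
        # R >= (0.69/eps) * (log2 H + log2(1/delta)), rounded up
        return math.ceil((0.69 / eps) * (log2_H + math.log2(1 / delta)))

    m, eps, delta = 10, 0.05, 0.01
    machines = {
        'And-positive-literals': m,                      # log2 H = m
        'And-literals':          math.log2(3) * m,       # log2 H = (log2 3) m
        'Lookup Table':          2 ** m,                 # log2 H = 2^m
        'And-lits or And-lits':  2 * math.log2(3) * m,   # log2 H = (2 log2 3) m
    }
    for name, log2_H in machines.items():
        print(name, pac_sample_size(log2_H, eps, delta))

Note how the lookup-table machine, with log2 H = 2^m, needs exponentially many records, while the conjunction machines need only on the order of m/ε plus a term in log(1/δ).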




PAC for decision trees of depth k
• Assume m attributes
• H_k = number of decision trees of depth k

• H_0 = 2
• H_{k+1} = (# choices of root attribute) ×
            (# possible left subtrees) ×
            (# possible right subtrees)
          = m × H_k × H_k

• Write L_k = log2 H_k
• L_0 = 1
• L_{k+1} = log2 m + 2 L_k
• So L_k = (2^k − 1)(1 + log2 m) + 1
• So to PAC-learn, need

                  R ≥ (0.69/ε) ( (2^k − 1)(1 + log2 m) + 1 + log2(1/δ) )
Copyright © 2001, Andrew W. Moore                                             PAC-learning: Slide 19
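
A quick numerical check of the recurrence and of the resulting sample-size bound (an illustrative sketch, not from the slides):

    import math

    def log2_H_decision_trees(m, k):
        # L_k = log2(H_k) via the slide's recurrence: L_0 = 1, L_{k+1} = log2 m + 2 L_k
        L = 1.0
        for _ in range(k):
            L = math.log2(m) + 2 * L
        # agrees with the closed form (2^k - 1)(1 + log2 m) + 1
        assert abs(L - ((2 ** k - 1) * (1 + math.log2(m)) + 1)) < 1e-6
        return L

    def pac_R(m, k, eps, delta):
        return math.ceil((0.69 / eps) * (log2_H_decision_trees(m, k) + math.log2(1 / delta)))

    print(pac_R(m=10, k=3, eps=0.1, delta=0.05))  # depth-3 trees over 10 binary attributes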




                 What you should know
• Be able to understand every step in the math that
  gets you to
                   P( we learn a bad h ) ≤ H (1 − ε)^R ≤ δ

• Understand that you thus need this many records
  to PAC-learn a machine with H hypotheses
                       R ≥ (0.69/ε) ( log2 H + log2(1/δ) )
• Understand examples of deducing H for various
  machines
Copyright © 2001, Andrew W. Moore                                             PAC-learning: Slide 20



