PAC Learning

Note to other teachers and users of these slides: Andrew would be delighted if you found this source material useful in giving your own lectures. Feel free to use these slides verbatim, or to modify them to fit your own needs. PowerPoint originals are available. If you make use of a significant portion of these slides in your own lecture, please include this message, or the following link to the source repository of Andrew's tutorials: http://www.cs.cmu.edu/~awm/tutorials . Comments and corrections gratefully received.

PAC-learning
Andrew W. Moore, Associate Professor
School of Computer Science, Carnegie Mellon University
www.cs.cmu.edu/~awm | awm@cs.cmu.edu | 412-268-7599
Copyright © 2001, Andrew W. Moore. Nov 30th, 2001

Probably Approximately Correct (PAC) Learning
• Imagine we're doing classification with categorical inputs.
• All inputs and outputs are binary.
• Data is noiseless.
• There's a machine f(x,h) which has H possible settings (a.k.a. hypotheses), called h1, h2, .., hH.
Example of a machine
• f(x,h) consists of all logical sentences about X1, X2 .. Xm that contain only logical ands.
• Example hypotheses:
  • X1 ^ X3 ^ X19
  • X3 ^ X18
  • X7
  • X1 ^ X2 ^ X3 ^ X4 ^ … ^ Xm
• Question: if there are 3 attributes, what is the complete set of hypotheses in f? (H = 8)
  True, X2, X3, X2 ^ X3, X1, X1 ^ X2, X1 ^ X3, X1 ^ X2 ^ X3
And-Positive-Literals Machine
• f(x,h) consists of all logical sentences about X1, X2 .. Xm that contain only logical ands.
• Example hypotheses:
  • X1 ^ X3 ^ X19
  • X3 ^ X18
  • X7
  • X1 ^ X2 ^ X3 ^ X4 ^ … ^ Xm
• Question: if there are m attributes, how many hypotheses are in f? (H = 2^m)
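The count H = 2^m is just the number of subsets of {X1, .., Xm}: each attribute is either in the conjunction or not, and the empty conjunction is the always-true sentence. Below is a minimal sketch that enumerates them; the representation of a hypothesis as a set of attribute indices and the helper names are illustrative choices, not something the slides specify.

```python
from itertools import combinations

def and_positive_hypotheses(m):
    """All conjunctions of positive literals over X1..Xm.

    A hypothesis is the set of attribute indices that must be true;
    the empty set is the always-true sentence."""
    attrs = range(1, m + 1)
    return [frozenset(c) for k in range(m + 1) for c in combinations(attrs, k)]

def f(x, h):
    """Evaluate hypothesis h on input x (a dict: attribute index -> bool)."""
    return all(x[i] for i in h)

print(len(and_positive_hypotheses(3)))   # 8, matching H = 8 for 3 attributes
print(len(and_positive_hypotheses(5)))   # 32 = 2**5, matching H = 2^m
```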
And-Literals Machine
• f(x,h) consists of all logical sentences about X1, X2 .. Xm or their negations that contain only logical ands.
• Example hypotheses:
  • X1 ^ ~X3 ^ X19
  • X3 ^ ~X18
  • ~X7
  • X1 ^ X2 ^ ~X3 ^ … ^ Xm
• Question: if there are 2 attributes, what is the complete set of hypotheses in f? (H = 9)
  True, X2, ~X2, X1, X1 ^ X2, X1 ^ ~X2, ~X1, ~X1 ^ X2, ~X1 ^ ~X2
• Question: if there are m attributes, what is the size of the complete set of hypotheses in f? (H = 3^m)
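Here each attribute independently appears positively, appears negated, or is left out, giving three choices per attribute and hence 3^m sentences. A sketch under an illustrative encoding (the +1/-1/0 roles are my own convention, not the slides'):

```python
from itertools import product

# A hypothesis assigns each attribute one of three roles:
#   +1 = appears positively, -1 = appears negated, 0 = absent.
def and_literal_hypotheses(m):
    return list(product((+1, -1, 0), repeat=m))

def f(x, h):
    """x is a tuple of booleans; h is a tuple of +1/-1/0, one entry per attribute."""
    return all(x[i] if role == +1 else (not x[i]) if role == -1 else True
               for i, role in enumerate(h))

print(len(and_literal_hypotheses(2)))    # 9, matching H = 9 for 2 attributes
print(len(and_literal_hypotheses(4)))    # 81 = 3**4, matching H = 3^m
print(f((True, False), (+1, -1)))        # X1 ^ ~X2 on x = (1, 0) -> True
```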
Lookup Table Machine
• f(x,h) consists of all truth tables mapping combinations of input attributes to true and false.
• Example hypothesis:

  X1 X2 X3 X4 | Y
   0  0  0  0 | 0
   0  0  0  1 | 1
   0  0  1  0 | 1
   0  0  1  1 | 0
   0  1  0  0 | 1
   0  1  0  1 | 0
   0  1  1  0 | 0
   0  1  1  1 | 1
   1  0  0  0 | 0
   1  0  0  1 | 0
   1  0  1  0 | 0
   1  0  1  1 | 1
   1  1  0  0 | 0
   1  1  0  1 | 0
   1  1  1  0 | 0
   1  1  1  1 | 0

• Question: if there are m attributes, what is the size of the complete set of hypotheses in f? (H = 2^(2^m))
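The count comes from two levels of choice: there are 2^m input rows, and each row independently gets a true or false output, hence H = 2^(2^m). A small counting sketch (enumerating the tables explicitly is only feasible for tiny m; the function name is illustrative):

```python
from itertools import product

def lookup_table_hypotheses(m):
    """A hypothesis is one output bit for each of the 2**m input rows,
    so there are 2**(2**m) distinct truth tables."""
    rows = list(product((0, 1), repeat=m))                  # the 2**m input combinations
    return [dict(zip(rows, outs)) for outs in product((0, 1), repeat=len(rows))]

print(len(lookup_table_hypotheses(2)))   # 16 = 2**(2**2)
print(2 ** (2 ** 4))                     # 65536 tables when m = 4, as in the slide's example
```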
A Game
• We specify f, the machine.
• Nature chooses a hidden random hypothesis h*.
• Nature randomly generates R datapoints.
• How is a datapoint generated?
  1. A vector of inputs $x_k = (x_{k1}, x_{k2}, \ldots, x_{km})$ is drawn from a fixed unknown distribution D.
  2. The corresponding output is $y_k = f(x_k, h^*)$.
• We learn an approximation of h* by choosing some h_est for which the training set error is 0.
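The game is easy to simulate. Below is a minimal sketch that uses the and-positive-literals machine as f and a uniform distribution for D; both choices, and all names, are illustrative assumptions rather than anything the slides fix.

```python
import random
from itertools import combinations

def f(x, h):
    """And-positive-literals machine: h is the set of attribute indices that must be true."""
    return all(x[i] for i in h)

def play_game(m, R, seed=0):
    """Nature picks a hidden h*, draws R inputs i.i.d. from D (uniform here),
    labels them y_k = f(x_k, h*); we return any h_est with zero training-set error."""
    rng = random.Random(seed)
    hypotheses = [frozenset(c) for k in range(m + 1)
                  for c in combinations(range(1, m + 1), k)]
    h_star = rng.choice(hypotheses)
    xs = [{i: rng.random() < 0.5 for i in range(1, m + 1)} for _ in range(R)]
    data = [(x, f(x, h_star)) for x in xs]
    h_est = next(h for h in hypotheses if all(f(x, h) == y for x, y in data))
    return h_star, h_est

print(play_game(m=4, R=20))   # h_est is consistent with the data but need not equal h*
```

With small R, h_est often differs from h*; the question the rest of the lecture answers is how large R must be before any surviving h_est is, with high probability, close to h*.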
Test Error Rate
• For each hypothesis h:
  • Say h is Correctly Classified (CCd) if h has zero training set error.
  • Define TESTERR(h) = fraction of test points that h will classify incorrectly = P(h misclassifies a random test point).
  • Say h is BAD if TESTERR(h) > ε.
• The chance that one particular bad hypothesis nevertheless has zero training set error:

  $$P(h \text{ is CCd} \mid h \text{ is bad}) = P(\forall k \in \text{Training Set},\ f(x_k, h) = y_k \mid h \text{ is bad}) \le (1 - \varepsilon)^R$$

• The chance that we learn any bad hypothesis, by the union bound over all H hypotheses:

  $$\begin{aligned}
  P(\text{we learn a bad } h)
  &\le P(\text{the set of CCd } h\text{'s contains a bad } h) \\
  &= P(\exists h.\ h \text{ is CCd} \wedge h \text{ is bad}) \\
  &= P\big((h_1 \text{ is CCd} \wedge h_1 \text{ is bad}) \vee (h_2 \text{ is CCd} \wedge h_2 \text{ is bad}) \vee \cdots \vee (h_H \text{ is CCd} \wedge h_H \text{ is bad})\big) \\
  &\le \sum_{i=1}^{H} P(h_i \text{ is CCd} \wedge h_i \text{ is bad}) \\
  &\le \sum_{i=1}^{H} P(h_i \text{ is CCd} \mid h_i \text{ is bad}) \\
  &\le H (1 - \varepsilon)^R
  \end{aligned}$$
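A quick numeric sanity check of the first inequality (not part of the slides): a hypothesis whose true error rate is exactly ε survives R independent training points with probability exactly (1 − ε)^R, and a bad hypothesis (error rate above ε) survives with at most that probability. The ε and R values below are arbitrary illustrations.

```python
import random

def p_bad_hypothesis_survives(eps, R, trials=200_000, seed=0):
    """Monte Carlo estimate of P(h is CCd | h is bad), for a hypothesis whose
    test error rate is exactly eps: it must classify all R i.i.d. training
    points correctly, each with probability 1 - eps."""
    rng = random.Random(seed)
    hits = sum(all(rng.random() >= eps for _ in range(R)) for _ in range(trials))
    return hits / trials

eps, R = 0.1, 40
print(p_bad_hypothesis_survives(eps, R))   # close to 0.0148
print((1 - eps) ** R)                      # the slide's bound (1 - eps)^R = 0.9**40
```

Multiplying by H, as in the union bound above, gives the slide's final H(1 − ε)^R bound on the probability of learning any bad hypothesis.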
PAC Learning
• Choose R such that with probability less than δ we'll select a bad h_est (i.e. an h_est which makes mistakes more than a fraction ε of the time).
• Probably Approximately Correct.
• As we just saw, this can be achieved by choosing R such that

  $$\delta = P(\text{we learn a bad } h) \le H (1 - \varepsilon)^R$$

• i.e. R such that

  $$R \ge \frac{0.69}{\varepsilon}\left(\log_2 H + \log_2 \frac{1}{\delta}\right)$$

  (The constant 0.69 ≈ ln 2 appears because (1 − ε)^R ≤ e^{−εR}, so it suffices that εR ≥ ln(H/δ).)

PAC in action

  Machine                 Example hypothesis             H                  R required to PAC-learn
  And-positive-literals   X3 ^ X7 ^ X8                   2^m                (0.69/ε)(m + log2(1/δ))
  And-literals            X3 ^ ~X7                       3^m                (0.69/ε)((log2 3)m + log2(1/δ))
  Lookup table            the truth table shown earlier  2^(2^m)            (0.69/ε)(2^m + log2(1/δ))
  And-lits or And-lits    (X1 ^ X5) v (X2 ^ ~X7 ^ X8)    (3^m)^2 = 3^(2m)   (0.69/ε)((2 log2 3)m + log2(1/δ))
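The table's right-hand column is just the R bound with log2 H substituted for each machine. A sketch that evaluates it; the ε = 0.05, δ = 0.01, m = 10 values are made-up examples.

```python
import math

def pac_sample_size(log2_H, eps, delta):
    """R >= (0.69 / eps) * (log2 H + log2(1/delta)), with 0.69 ~ ln 2 coming from
    bounding (1 - eps)^R <= exp(-eps * R)."""
    return math.ceil((math.log(2) / eps) * (log2_H + math.log2(1 / delta)))

eps, delta, m = 0.05, 0.01, 10
print(pac_sample_size(m, eps, delta))                     # and-positive-literals: log2 H = m
print(pac_sample_size(m * math.log2(3), eps, delta))      # and-literals: log2 H = (log2 3) m
print(pac_sample_size(2 ** m, eps, delta))                # lookup table: log2 H = 2^m
print(pac_sample_size(2 * m * math.log2(3), eps, delta))  # (and-lits) v (and-lits): (2 log2 3) m
```

The lookup-table machine's log2 H = 2^m is what makes it hopeless to PAC-learn for even moderate m, while the conjunctive machines need only a number of records linear in m.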
PAC for decision trees of depth k
• Assume m attributes.
• Hk = number of decision trees of depth k.
• H0 = 2
• Hk+1 = (# choices of root attribute) × (# possible left subtrees) × (# possible right subtrees) = m × Hk × Hk
• Write Lk = log2 Hk:
  • L0 = 1
  • Lk+1 = log2 m + 2 Lk
  • So Lk = (2^k − 1)(1 + log2 m) + 1
• So to PAC-learn, we need

  $$R \ge \frac{0.69}{\varepsilon}\left((2^k - 1)(1 + \log_2 m) + 1 + \log_2 \frac{1}{\delta}\right)$$

  (a numeric check of this formula appears in the sketch at the end of these notes).

What you should know
• Be able to understand every step in the math that gets you to

  $$\delta = P(\text{we learn a bad } h) \le H (1 - \varepsilon)^R$$

• Understand that you thus need this many records to PAC-learn a machine with H hypotheses:

  $$R \ge \frac{0.69}{\varepsilon}\left(\log_2 H + \log_2 \frac{1}{\delta}\right)$$

• Understand examples of deducing H for various machines.
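Circling back to the decision-tree bound above: a sketch (my own, not from the slides) that cross-checks the Lk recurrence against the closed form and plugs the result into the R formula. The m = 10, k = 4, ε = 0.05, δ = 0.01 values are arbitrary illustrations.

```python
import math

def log2_num_trees(m, k):
    """L_k = log2 H_k via the slide's recurrence: L_0 = 1, L_{k+1} = log2 m + 2 L_k."""
    L = 1.0
    for _ in range(k):
        L = math.log2(m) + 2 * L
    return L

def log2_num_trees_closed(m, k):
    """The slide's closed form: L_k = (2^k - 1)(1 + log2 m) + 1."""
    return (2 ** k - 1) * (1 + math.log2(m)) + 1

m, k, eps, delta = 10, 4, 0.05, 0.01
assert abs(log2_num_trees(m, k) - log2_num_trees_closed(m, k)) < 1e-6
R = (math.log(2) / eps) * (log2_num_trees_closed(m, k) + math.log2(1 / delta))
print(math.ceil(R))   # records needed to PAC-learn depth-4 trees over 10 attributes
```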
