A Theory of the Learnable; PAC Learning

A Theory of the Learnable
Leslie Valiant
Dhruv Gairola
Computational Complexity, Michael Soltys
gairold@mcmaster.ca ; dhruvgairola.blogspot.ca
November 13, 2013
Dhruv Gairola (McMaster Univ.)
A Theory of the Learnable
November 13, 2013
1 / 15

Overview
1. Learning
2. Contribution
3. PAC learning
   Sample complexity
   Boolean functions
   k-decision lists
4. Conclusion

Learning
Humans can learn.
Machine learning (ML): learning from data, i.e., acquiring knowledge
without explicit programming.
Explore computational models for learning.
Use models to get insights about learning.
Use models to develop new learning algorithms.

Modelling supervised learning
Given a training set of labelled examples, a learning algorithm generates a
hypothesis (a candidate function). Running the hypothesis on a test set
checks how good it is.
But how good, really? The training and test data may consist of
unrepresentative examples, so the hypothesis may not generalize well.
Insight: introduce probabilities to measure the degree of certainty and of
correctness.
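The test-set check described above can be sketched in Python. This is a minimal illustration, assuming examples are (input tuple, label) pairs and a hypothesis is any function from inputs to {0, 1}; the name `empirical_error` and the toy data are hypothetical.

```python
from typing import Callable, Sequence, Tuple

Example = Tuple[Tuple[int, ...], int]  # (input vector, label)

def empirical_error(h: Callable[[Tuple[int, ...]], int],
                    examples: Sequence[Example]) -> float:
    """Fraction of labelled examples on which hypothesis h predicts the wrong label."""
    if not examples:
        raise ValueError("need at least one example")
    mistakes = sum(1 for x, y in examples if h(x) != y)
    return mistakes / len(examples)

# Hypothetical test set labelled by the hidden target x0 AND x1.
test_set = [((0, 0), 0), ((0, 1), 0), ((1, 0), 0), ((1, 1), 1)]
h = lambda x: x[0]  # hypothetical hypothesis: just look at x0
print(empirical_error(h, test_set))  # 0.25
```

On a small or unrepresentative test set this estimate can be far from the hypothesis's true error, which is exactly the worry raised on the slide.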

Contribution
With high probability, an (efficient) learning algorithm will find a
hypothesis that is approximately identical to the hidden target
function.
Intuition: a hypothesis that is consistent with a large amount of training
data is unlikely to be badly wrong, i.e., it is Probably Approximately Correct (PAC).

PAC learning
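Goal: show that after training on enough examples, with high probability,
every hypothesis consistent with the training data is approximately correct.
Notation:
X: set of all possible examples
D: distribution from which examples are drawn
H: set of all possible hypotheses
N: |X_training|, the number of training examples
f: target function
ε: error tolerance (largest acceptable error)
δ: largest acceptable probability of failure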

PAC learning (2)
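Hypothesis $h_g \in H$ is approximately correct if:
$\mathrm{error}(h_g) \le \epsilon$, where
$\mathrm{error}(h) = P\big(h(x) \neq f(x) \mid x \text{ drawn from } D\big)$
Bad hypothesis $h_b$:
$\mathrm{error}(h_b) > \epsilon$, i.e.,
$P(h_b \text{ disagrees with 1 example}) > \epsilon$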
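When X is finite and D is given explicitly, error(h) can be computed exactly by adding up the probability of every example on which h and f disagree. The sketch below assumes D is a dictionary from examples to probabilities; the uniform distribution, the target, the hypothesis, and ε = 0.1 are all hypothetical choices.

```python
from itertools import product
from typing import Callable, Dict, Tuple

Point = Tuple[int, ...]

def true_error(h: Callable[[Point], int], f: Callable[[Point], int],
               D: Dict[Point, float]) -> float:
    """error(h) = P(h(x) != f(x)) for x drawn from a finite distribution D."""
    return sum(p for x, p in D.items() if h(x) != f(x))

n = 2
uniform = {x: 1 / 2 ** n for x in product((0, 1), repeat=n)}  # hypothetical D
f = lambda x: x[0] & x[1]  # hypothetical target: x0 AND x1
h = lambda x: x[0]         # hypothetical hypothesis: x0
epsilon = 0.1

err = true_error(h, f, uniform)
print(err, "bad" if err > epsilon else "approximately correct")  # 0.25 bad
```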

PAC learning (3)
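$P(h_b \text{ agrees with 1 example}) \le 1 - \epsilon$.
$P(h_b \text{ agrees with } N \text{ independent examples}) \le (1 - \epsilon)^N$.
$P(H_b \text{ contains a consistent hypothesis}) \le |H_b|(1 - \epsilon)^N \le |H|(1 - \epsilon)^N$, where $H_b$ is the set of bad hypotheses.
Let's say $|H|(1 - \epsilon)^N \le \delta$.
...
$N \ge \frac{1}{\epsilon}\left(\ln\frac{1}{\delta} + \ln|H|\right)$
This bound expresses the sample complexity.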
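One standard way to fill in the elided step uses the inequality $1 - \epsilon \le e^{-\epsilon}$: since $|H|(1-\epsilon)^N \le |H|e^{-\epsilon N}$, it suffices to make the right-hand side at most δ, and solving for N gives the bound $N \ge \frac{1}{\epsilon}\left(\ln\frac{1}{\delta} + \ln|H|\right)$. This is a sketch of that route, not necessarily the derivation used on the original slide.

```latex
\begin{align*}
|H|\,e^{-\epsilon N} \le \delta
  &\iff \ln|H| - \epsilon N \le \ln\delta \\
  &\iff \epsilon N \ge \ln\frac{1}{\delta} + \ln|H| \\
  &\iff N \ge \frac{1}{\epsilon}\left(\ln\frac{1}{\delta} + \ln|H|\right)
\end{align*}
```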

Sample complexity
$N \ge \frac{1}{\epsilon}\left(\ln\frac{1}{\delta} + \ln|H|\right)$
If you train the learning algorithm on X_training of size N, then with
probability at least 1 − δ, the returned (consistent) hypothesis has an
error of at most ε, i.e., it is PAC.
e.g., if you want smaller and smaller δ, you need a larger N (more
examples).
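The bound is easy to evaluate numerically. A minimal sketch, assuming a finite H whose size is passed as ln|H| (so huge hypothesis spaces never have to be built as integers); the helper `pac_sample_size` and the value |H| = 1000 are hypothetical.

```python
import math

def pac_sample_size(epsilon: float, delta: float, ln_h: float) -> int:
    """Smallest integer N with N >= (1/epsilon) * (ln(1/delta) + ln|H|).

    ln_h is ln|H|, passed as a logarithm so huge hypothesis spaces stay manageable.
    """
    if not (0 < epsilon < 1 and 0 < delta < 1):
        raise ValueError("epsilon and delta must lie in (0, 1)")
    return math.ceil((math.log(1 / delta) + ln_h) / epsilon)

# Hypothetical finite hypothesis space with |H| = 1000 and epsilon = 0.1.
for delta in (0.05, 0.01):
    print(delta, pac_sample_size(0.1, delta, math.log(1000)))
# 0.05 100
# 0.01 116
```

Tightening δ from 0.05 to 0.01 raises N only from 100 to 116: because δ enters through ln(1/δ), extra confidence is cheap in examples.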
Let's look at an example of H: Boolean functions.

Why Boolean functions?
Because Boolean functions can represent concepts, which are what we
commonly want machines to learn.
Concepts are predicates, e.g., isMaleOrFemale(height).

Boolean functions
Boolean functions are of the form $f : \{0, 1\}^n \to \{0, 1\}$, where n is
the number of Boolean variables.
Let H = {all Boolean functions on n variables}, ∴ $|H| = 2^{2^n}$.
Substituting |H| into the sample complexity expression gives $\ln|H| = 2^n \ln 2$,
so $N = O(2^n)$, i.e., the class of all Boolean functions is not PAC-learnable:
it needs exponentially many examples.
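To see the blow-up concretely, the sketch below plugs $\ln|H| = 2^n \ln 2$ into the sample complexity bound. It uses a hypothetical `pac_sample_size` helper (defined here so the block runs on its own) with hypothetical ε = 0.1 and δ = 0.05.

```python
import math

def ln_num_boolean_functions(n: int) -> float:
    """ln|H| for H = all Boolean functions on n variables: ln(2**(2**n)) = 2**n * ln 2."""
    return (2 ** n) * math.log(2)

def pac_sample_size(epsilon: float, delta: float, ln_h: float) -> int:
    """Smallest integer N with N >= (1/epsilon) * (ln(1/delta) + ln|H|)."""
    return math.ceil((math.log(1 / delta) + ln_h) / epsilon)

# Hypothetical epsilon = 0.1, delta = 0.05; N roughly doubles with each extra variable.
for n in (5, 10, 20):
    print(n, pac_sample_size(0.1, 0.05, ln_num_boolean_functions(n)))
```

Going from 5 to 10 variables already takes N from a few hundred to several thousand, and 20 variables push it past seven million.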
Can we restrict the size of H?

k-decision lists
A decision list (DL) represents a single Boolean function. Unrestricted
DLs can represent every Boolean function, so they are not PAC-learnable either.
A DL consists of a series of tests,
e.g., if f1 then return b1; elseif f2 then return b2; ... elseif fn then return bn;
A single DL corresponds to a single hypothesis.
Apply a restriction: a k-decision list (k-DL) is a decision list in which each test is
a conjunction of at most k literals.
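The if/elseif structure above maps directly onto a list of (test, output) pairs. A minimal sketch, assuming a test is a conjunction encoded as (variable index, required value) pairs and that the list ends with the always-true empty test as its final "else"; the encoding, the helper names, and the example list are hypothetical.

```python
from typing import List, Tuple

Literal = Tuple[int, int]            # (variable index, required value)
Conj = Tuple[Literal, ...]           # conjunction of literals; () is always true
DecisionList = List[Tuple[Conj, int]]

def satisfies(x: Tuple[int, ...], conj: Conj) -> bool:
    """True if input x makes every literal in the conjunction true."""
    return all(x[i] == v for i, v in conj)

def evaluate(dl: DecisionList, x: Tuple[int, ...]) -> int:
    """Return the output b_i of the first test f_i that x satisfies."""
    for conj, output in dl:
        if satisfies(x, conj):
            return output
    raise ValueError("no test matched; end the list with the empty test ()")

def is_k_dl(dl: DecisionList, k: int) -> bool:
    """Check the k-DL restriction: every test has at most k literals."""
    return all(len(conj) <= k for conj, _ in dl)

# Hypothetical 2-DL: if x0 and not x2 then 1; elseif x1 then 0; else 1.
dl = [(((0, 1), (2, 0)), 1), (((1, 1),), 0), ((), 1)]
print([evaluate(dl, x) for x in [(1, 0, 0), (0, 1, 1), (0, 0, 0)]])  # [1, 0, 1]
print(is_k_dl(dl, 2), is_k_dl(dl, 1))  # True False
```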

k-decision lists (2)
What is |H| for k-DL, i.e., what is |k-DL(n)|, where n is the number of
variables?
After calculations, $|k\text{-DL}(n)| = 2^{O(n^k \log(n^k))}$
Substitute |k-DL(n)| into the sample complexity expression:
$N \ge \frac{1}{\epsilon}\left(\ln\frac{1}{\delta} + O\big(n^k \log(n^k)\big)\right)$
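The "calculations" can be reproduced with one standard counting argument: with n variables there are 2n literals, so there are at most $c = \sum_{i=0}^{k}\binom{2n}{i} = O(n^k)$ conjunctions of at most k literals, and $|k\text{-DL}(n)| \le 3^{c}\,c!$, whose logarithm is $O(n^k \log(n^k))$. The sketch below evaluates that upper bound numerically; the helper names and the values of k, ε and δ are hypothetical.

```python
import math

def num_conjunctions(n: int, k: int) -> int:
    """Upper bound on conjunctions of at most k literals drawn from the 2n literals."""
    return sum(math.comb(2 * n, i) for i in range(k + 1))

def ln_kdl_upper_bound(n: int, k: int) -> float:
    """ln(3^c * c!) with c = num_conjunctions(n, k).

    Each candidate test is either unused or used with output 0 or 1 (3 choices),
    and the used tests can be ordered in at most c! ways.
    """
    c = num_conjunctions(n, k)
    return c * math.log(3) + math.lgamma(c + 1)  # lgamma(c + 1) == ln(c!)

def pac_sample_size(epsilon: float, delta: float, ln_h: float) -> int:
    """Smallest integer N with N >= (1/epsilon) * (ln(1/delta) + ln|H|)."""
    return math.ceil((math.log(1 / delta) + ln_h) / epsilon)

# Hypothetical k = 2, epsilon = 0.1, delta = 0.05.
for n in (10, 20, 40):
    print(n, num_conjunctions(n, 2), pac_sample_size(0.1, 0.05, ln_kdl_upper_bound(n, 2)))
```

For n = 40 the resulting N stays in the hundreds of thousands, whereas plugging $\ln|H| = 2^{40}\ln 2$ for all Boolean functions into the same bound gives several trillion.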
For fixed k, the sample complexity is polynomial in n! What about the
computational complexity of learning?
There are efficient algorithms for learning k-decision lists (e.g., a
greedy algorithm).
We have polynomial sample complexity and efficient k-DL learning algorithms,
∴ k-DL is PAC-learnable!
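A sketch of one common greedy strategy for learning k-DLs: repeatedly pick any conjunction of at most k literals that covers some remaining examples, all with the same label, append it with that label, and discard the covered examples. If the data is consistent with some k-DL, such a test always exists (the first test of that k-DL covering a remaining example qualifies), and for fixed k there are only O(n^k) candidate tests, so each step is polynomial. The function names, the encoding, and the default output 0 are assumptions for illustration, not necessarily the algorithm the slide refers to.

```python
from itertools import combinations, product
from typing import Iterator, List, Optional, Sequence, Tuple

Literal = Tuple[int, int]              # (variable index, required value)
Conj = Tuple[Literal, ...]             # () is the always-true test
Example = Tuple[Tuple[int, ...], int]  # (input vector, label)

def satisfies(x: Tuple[int, ...], conj: Conj) -> bool:
    return all(x[i] == v for i, v in conj)

def candidate_tests(n: int, k: int) -> Iterator[Conj]:
    """All conjunctions of at most k literals over distinct variables, shortest first."""
    for size in range(k + 1):
        for variables in combinations(range(n), size):
            for values in product((0, 1), repeat=size):
                yield tuple(zip(variables, values))

def learn_kdl(examples: Sequence[Example], n: int, k: int) -> Optional[List[Tuple[Conj, int]]]:
    """Greedily build a k-DL consistent with the examples, or return None if none exists."""
    remaining = list(examples)
    dl: List[Tuple[Conj, int]] = []
    while remaining:
        for test in candidate_tests(n, k):
            labels = {y for x, y in remaining if satisfies(x, test)}
            if len(labels) == 1:  # test covers >= 1 remaining example, all with one label
                dl.append((test, labels.pop()))
                remaining = [(x, y) for x, y in remaining if not satisfies(x, test)]
                break
        else:
            return None  # no consistent k-DL exists for these examples
    if not dl or dl[-1][0] != ():
        dl.append(((), 0))  # arbitrary default for inputs no test covers
    return dl

def evaluate(dl: List[Tuple[Conj, int]], x: Tuple[int, ...]) -> int:
    """Output of the first test x satisfies (the list always ends with ())."""
    return next(out for test, out in dl if satisfies(x, test))

# Hypothetical data: all 3-bit inputs labelled by the 2-DL
# "if x0 and not x2 then 1; elseif x1 then 0; else 1".
target = [(((0, 1), (2, 0)), 1), (((1, 1),), 0), ((), 1)]
data = [(x, evaluate(target, x)) for x in product((0, 1), repeat=3)]
learned = learn_kdl(data, n=3, k=2)
print(learned)
print(all(evaluate(learned, x) == y for x, y in data))  # True
```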

Conclusion
PAC learning: with high probability, an (efficient) learning algorithm
will find a hypothesis that is approximately identical to the hidden
target function.
k-DL is PAC-learnable.
Computational learning theory: concerned with the analysis of ML
algorithms; it spans many areas.

