Buenos Aires, April 2016
Eduardo Poggi
http://www.shutterstock.com/
Concept Learning
 • Definitions
 • Search Space and General-Specific Ordering
 • Concept learning as search
 • FIND-S
 • The Candidate Elimination Algorithm
 • Inductive Bias
First definition
 • The problem is to learn a function mapping examples into two classes: positive and negative.
 • We are given a database of examples already classified as positive or negative.
 • Concept learning: the process of inducing a function mapping input examples into a Boolean output.
Notation
 • Set of instances X
 • Target concept c : X → {+,-}
 • Training examples E = {(x, c(x))}
 • Data set D ⊆ X
 • Set of possible hypotheses H
 • h ∈ H, h : X → {+,-}
 • Goal: find h such that h(x) = c(x)
Representation of Examples
Features:
• color {red, brown, gray}
• size {small, large}
• shape {round,elongated}
• land {humid,dry}
• air humidity {low,high}
• texture {smooth, rough}
The Input and Output Space
X: the space of all possible examples (input space). Only a small subset is contained in our database.
Y = {+,-}: the space of classes (output space).
An example in X is a feature vector, e.g. x = (red,small,elongated,humid,low,rough).
X is the Cartesian product of the sets of feature values.
The Training Examples
D: The set of training examples.
D is a set of pairs { (x, c(x)) }, where c is the target concept.
Example of D:
((red,small,round,humid,low,smooth), +)
((red,small,elongated,humid,low,smooth),+)
((gray,large,elongated,humid,low,rough), -)
((red,small,elongated,humid,high,rough), +)
In each pair, the first component is an instance from the input space and the second is from the output space.
Hypothesis Representation
Consider the following hypotheses:
(*,*,*,*,*,*): all mushrooms are poisonous
(0,0,0,0,0,0): no mushroom is poisonous
Special symbols:
 • * : any value is acceptable
 • 0 : no value is acceptable
Any hypothesis h is a function from X to Y:
h : X → Y
We will explore the space of conjunctions.
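As a concrete aid (not part of the original slides), here is a minimal Python sketch of how a conjunctive hypothesis with the * and 0 symbols could be evaluated; the function and variable names are illustrative assumptions.

```python
# Hedged sketch (illustrative only): evaluating a conjunctive hypothesis
# such as (red, small, *, humid, *, *) on a single example.

def h_classify(hypothesis, example):
    """Return '+' if the hypothesis covers the example, '-' otherwise."""
    for constraint, value in zip(hypothesis, example):
        if constraint == '0':                     # '0': no value is acceptable
            return '-'
        if constraint != '*' and constraint != value:
            return '-'                            # specific constraint not satisfied
    return '+'                                    # every attribute matched or was '*'

# Example usage:
h = ('red', 'small', '*', 'humid', '*', '*')
x = ('red', 'small', 'elongated', 'humid', 'low', 'rough')
print(h_classify(h, x))  # '+'
```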
Hypothesis Space
The space of all hypotheses is represented by H
Let h be a hypothesis in H.
Let X be an example of a mushroom.
if h(X) = + then X is poisonous,
otherwise X is not-poisonous
Our goal is to find the hypothesis, h*, that is very “close”
to target concept c.
A hypothesis is said to “cover” those examples it classifies
as positive.
(Diagram: the subset of the instance space X covered by a hypothesis h.)
Assumption 1
We will explore the space of all conjunctions.
We assume the target concept falls within this space.
(Diagram: the target concept c lies inside the hypothesis space H.)
Assumption 2
A hypothesis close to target concept c obtained after
seeing many training examples will result in high
accuracy on the set of unobserved examples.
(Diagram: a hypothesis h* that is good on the training set D is also good on the complement set D′.)
Concept Learning as Search
There is a general-to-specific ordering inherent to any hypothesis space.
Consider these two hypotheses:
h1 = (red,*,*,humid,*,*)
h2 = (red,*,*,*,*,*)
We say h2 is more general than h1 because h2 covers more instances than h1: every instance covered by h1 is also covered by h2.
General-Specific
For example, consider the following hypotheses:
(Diagram: h1 sits above h2 and h3 in the general-to-specific ordering.)
h1 is more general than h2 and h3.
h2 and h3 are neither more specific nor more general
than each other.
Definition
Let hj and hk be two hypotheses mapping examples into {+,-}.
We say hj is more general than or equal to hk iff
for all examples X, hk(X) = + ⇒ hj(X) = +.
We write this fact as hj ≥ hk.
The ≥ relation imposes a partial ordering over the
hypothesis space H (it is reflexive, antisymmetric, and transitive).
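For conjunctive hypotheses this relation can also be checked syntactically, attribute by attribute. A minimal sketch, assuming the tuple representation used in the earlier sketch; the helper name is an assumption:

```python
# Hedged sketch: syntactic test of "hj is more general than or equal to hk"
# for conjunctive hypotheses. Each constraint of hj must accept every value
# accepted by the corresponding constraint of hk.

def more_general_or_equal(hj, hk):
    for cj, ck in zip(hj, hk):
        if cj == '*':
            continue                  # '*' accepts anything ck accepts
        if ck == '0':
            continue                  # '0' accepts nothing, so any cj is fine
        if cj != ck:
            return False              # hj rejects a value that hk accepts
    return True

# (red,*,*,*,*,*) >= (red,*,*,humid,*,*):
print(more_general_or_equal(('red','*','*','*','*','*'),
                            ('red','*','*','humid','*','*')))  # True
```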
Lattice
Any input space X thus defines a lattice of hypotheses ordered according to the general-specific relation:
(Diagram: a lattice of hypotheses h1…h8 ordered from general to specific.)
Working Example: Mushrooms
Class of Tasks: Predicting poisonous mushrooms
Performance: Accuracy of Classification
Experience: Database describing mushrooms with their class
Knowledge to learn:
Function mapping mushrooms to {+,-}
where -:not-poisonous and +:poisonous
Representation of target knowledge:
conjunction of attribute values.
Learning mechanism:
Find-S
Finding a Maximally-Specific Hypothesis
Algorithm to search the space of conjunctions:
 • Start with the most specific hypothesis
 • Generalize the hypothesis whenever it fails to cover a positive example
Algorithm:
1. Initialize h to the most specific hypothesis
2. For each positive training example X
     For each attribute constraint a in h
       If example X satisfies constraint a, do nothing
       else replace a by the next more general constraint satisfied by X
3. Output hypothesis h
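Below is a minimal Python sketch of FIND-S under the conjunctive representation used in these slides; the function name and data layout are illustrative assumptions, not a prescribed implementation.

```python
# Hedged sketch of FIND-S for conjunctive hypotheses.
# D is a list of (example, label) pairs with labels '+' or '-'.

def find_s(D, n_attributes):
    h = ['0'] * n_attributes              # most specific hypothesis
    for example, label in D:
        if label != '+':
            continue                      # negative examples are ignored
        for i, value in enumerate(example):
            if h[i] == '0':
                h[i] = value              # first generalization: copy the value
            elif h[i] != value:
                h[i] = '*'                # next more general constraint
    return tuple(h)

D = [(('red','small','round','humid','low','smooth'), '+'),
     (('red','small','elongated','humid','low','smooth'), '+'),
     (('gray','large','elongated','humid','low','rough'), '-'),
     (('red','small','elongated','humid','high','rough'), '+')]
print(find_s(D, 6))   # ('red', 'small', '*', 'humid', '*', '*')
```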
Example
Let’s run the learning algorithm above with the
following examples:
((red,small,round,humid,low,smooth), +)
((red,small,elongated,humid,low,smooth),+)
((gray,large,elongated,humid,low,rough), -)
((red,small,elongated,humid,high,rough), +)
We start with the most specific hypothesis:
h = (0,0,0,0,0,0)
The first example comes and since the example is positive and h
fails to cover it, we simply generalize h to cover exactly this
example: h = (red,small,round,humid,low,smooth)
Example
Hypothesis h basically says that the first example is the only
positive example; all other examples are negative.
Then comes example 2:
((red,small,elongated,humid,low,smooth), poisonous)
This example is positive. All attributes match hypothesis h
except for attribute shape: it has the value elongated, not
round.
We generalize this attribute using symbol * yielding:
h: (red,small,*,humid,low,smooth)
The third example is negative, so we simply ignore it.
Why don't we need to be concerned with negative examples?
Example
Upon observing the 4th example, hypothesis h is
generalized to the following:
h = (red,small,*,humid,*,*)
h is interpreted as: any mushroom that is red, small, and
found on humid land should be classified as poisonous.
Analyzing the Algorithm
• The algorithm is guaranteed to find the most specific hypothesis that is consistent with the set of training examples.
• It takes advantage of the general-specific ordering to move over the corresponding lattice, searching for the next most specific hypothesis.
(Diagram: a lattice of hypotheses h1…h8 ordered from general to specific.)
X-H Relation
(Diagrams: the relation between the instance space X and the hypothesis space H.)
Points to Consider
 • There are many hypotheses consistent with the training data D.
 • Why should we prefer the most specific hypothesis?
 • What would happen if the examples are not consistent? What if they contain errors or noise?
 • What if there is a hypothesis space H where one can find more than one maximally specific hypothesis h? The search over the lattice must then be different to allow for this possibility.
Summary FIND-S
 • The input space is the space of all examples; the output space is the space of all classes.
 • A hypothesis maps examples into classes.
 • We want a hypothesis close to the target concept c.
 • The input space establishes a partial ordering over the hypothesis space.
 • One can exploit this ordering to move along the corresponding lattice.
Working Example: Mushrooms
Class of Tasks: Predicting poisonous mushrooms
Performance: Accuracy of Classification
Experience: Database describing mushrooms with their class
Knowledge to learn:
Function mapping mushrooms to {+,-}
where -:not-poisonous and +:poisonous
Representation of target knowledge:
conjunction of attribute values.
Learning mechanism:
candidate-elimination
Candidate Elimination
 • The algorithm that finds the maximally specific hypothesis is limited in that it only finds one of many hypotheses consistent with the training data.
 • The Candidate Elimination Algorithm (CEA) finds ALL hypotheses consistent with the training data.
 • CEA does this without explicitly enumerating all consistent hypotheses.
Consistency vs Coverage
(Diagram: the training set D, containing positive and negative examples, with two hypotheses h1 and h2 drawn over it.)
h1 covers a different set of examples than h2.
h2 is consistent with training set D.
h1 is not consistent with training set D.
Version Space VS
(Diagram: the version space VS is a region inside the hypothesis space H.)
Version space: the subset of hypotheses from H that are consistent with the training set D.
List-Then-Eliminate Algorithm
Algorithm:
1. Version space VS ← all hypotheses in H
2. For each training example X
     Remove from VS every hypothesis h inconsistent with X, i.e., every h with h(X) ≠ c(X)
3. Output the version space VS
Comment: this is infeasible; the size of H is unmanageable.
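For completeness, a sketch of List-Then-Eliminate, assuming the hypothesis space H can be enumerated explicitly (which is precisely what makes the algorithm impractical); names are illustrative assumptions:

```python
# Hedged sketch of List-Then-Eliminate. It assumes H is small enough to
# enumerate explicitly, which is exactly why the algorithm is impractical.

def list_then_eliminate(H, D, classify):
    """H: iterable of hypotheses; D: (example, label) pairs;
    classify(h, x): the '+'/'-' prediction of h on x."""
    VS = list(H)                                        # start with all of H
    for example, label in D:
        VS = [h for h in VS if classify(h, example) == label]
    return VS
```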
Previous Exercise: Mushrooms
Let’s remember our exercise in which we tried to classify
mushrooms as poisonous (+) or not-poisonous (-).
Training set D:
((red,small,round,humid,low,smooth), +)
((red,small,elongated,humid,low,smooth), +)
((gray,large,elongated,humid,low,rough), -)
((red,small,elongated,humid,high,rough), +)
Consistent Hypotheses
Our first algorithm found only one of the six consistent hypotheses:
S (most specific): (red,small,*,humid,*,*)
Middle layer:      (red,*,*,humid,*,*)  (red,small,*,*,*,*)  (*,small,*,humid,*,*)
G (most general):  (red,*,*,*,*,*)  (*,small,*,*,*,*)
Candidate-Elimination Algorithm
S: (red,small,*,humid,*,*)
G: (red,*,*,*,*,*)  (*,small,*,*,*,*)
The candidate elimination algorithm keeps two lists of hypotheses consistent with the training data:
 • the list of most specific hypotheses, S, and
 • the list of most general hypotheses, G.
This is enough to derive the whole version space VS.
Candidate-Elimination Algorithm
• Initialize G to the set of maximally general hypotheses in H
• Initialize S to the set of maximally specific hypotheses in H
• For each training example X do
• If X is positive: generalize S if necessary
• If X is negative: specialize G if necessary
• Output {G,S}
Candidate-Elimination Algorithm
 • Initialize G to the set of maximally general hypotheses in H
 • Initialize S to the set of maximally specific hypotheses in H
 • For each training example d, do
    • If d is positive:
       • Remove from G any hypothesis inconsistent with d
       • For each hypothesis s in S that is not consistent with d:
          • Remove s from S
          • Add to S all minimal generalizations h of s such that h is consistent with d and some member of G is more general than h
          • Remove from S any hypothesis that is more general than another hypothesis in S
    • If d is negative:
       • Remove from S any hypothesis inconsistent with d
       • For each hypothesis g in G that is not consistent with d:
          • Remove g from G
          • Add to G all minimal specializations h of g such that h is consistent with d and some member of S is more specific than h
          • Remove from G any hypothesis that is less general than another hypothesis in G
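The following Python sketch is one possible reading of this pseudocode for conjunctive hypotheses over finite attribute domains; it is illustrative, and the helper names are assumptions rather than a standard API.

```python
# Hedged sketch of the Candidate Elimination Algorithm for conjunctive
# hypotheses over finite attribute domains; helper names are illustrative.

def covers(h, x):
    """True if hypothesis h classifies example x as positive."""
    return all(c == '*' or c == v for c, v in zip(h, x))

def more_general_or_equal(hj, hk):
    """Syntactic test of hj >= hk for conjunctive hypotheses."""
    return all(cj == '*' or ck == '0' or cj == ck for cj, ck in zip(hj, hk))

def min_generalization(s, x):
    """The unique minimal generalization of s that covers x."""
    return tuple(v if c == '0' else (c if c == v else '*')
                 for c, v in zip(s, x))

def min_specializations(g, x, domains):
    """Minimal specializations of g that exclude the negative example x."""
    specs = []
    for i, c in enumerate(g):
        if c == '*':
            for v in domains[i]:
                if v != x[i]:
                    specs.append(g[:i] + (v,) + g[i + 1:])
    return specs

def candidate_elimination(D, domains):
    n = len(domains)
    S = [('0',) * n]                       # maximally specific boundary
    G = [('*',) * n]                       # maximally general boundary
    for x, label in D:
        if label == '+':
            G = [g for g in G if covers(g, x)]
            for s in [s for s in S if not covers(s, x)]:
                S.remove(s)
                h = min_generalization(s, x)
                if any(more_general_or_equal(g, h) for g in G):
                    S.append(h)
            S = [s for s in S if not any(
                more_general_or_equal(s, s2) and s != s2 for s2 in S)]
        else:
            S = [s for s in S if not covers(s, x)]
            for g in [g for g in G if covers(g, x)]:
                G.remove(g)
                for h in min_specializations(g, x, domains):
                    if any(more_general_or_equal(h, s) for s in S):
                        G.append(h)
            G = [g for g in G if not any(
                more_general_or_equal(g2, g) and g != g2 for g2 in G)]
    return S, G
```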
Positive Examples
a) If X is positive:
 • Remove from G any hypothesis inconsistent with X
 • For each hypothesis h in S not consistent with X:
    • Remove h from S
    • Add all minimal generalizations of h that are consistent with X and such that some member of G is more general than them
    • Remove from S any hypothesis more general than any other hypothesis in S
(Diagram: the inconsistent hypothesis h is dropped from the S boundary and replaced by its minimal generalizations, moving S toward G.)
Negative Examples
b) If X is negative:
 • Remove from S any hypothesis inconsistent with X
 • For each hypothesis h in G not consistent with X:
    • Remove h from G
    • Add all minimal specializations of h that are consistent with X and such that some member of S is more specific than them
    • Remove from G any hypothesis less general than any other hypothesis in G
(Diagram: the inconsistent hypothesis h is dropped from the G boundary and replaced by its minimal specializations, moving G toward S.)
An Exercise
Initialize the S and G sets:
S: (0,0,0,0,0,0)
G: (*,*,*,*,*,*)
Let’s look at the first two examples:
((red,small,round,humid,low,smooth), +)
((red,small,elongated,humid,low,smooth), +)
An Exercise: two positives
The first two examples are positive:
((red,small,round,humid,low,smooth), +)
((red,small,elongated,humid,low,smooth), +)
S: (0,0,0,0,0,0)
   → generalize to (red,small,round,humid,low,smooth)
   → generalize to (red,small,*,humid,low,smooth)
G: (*,*,*,*,*,*)   (unchanged: there is nothing to specialize yet)
An Exercise: first negative
The third example is a negative example:
((gray,large,elongated,humid,low,rough), -)
S: (red,small,*,humid,low,smooth)   (unchanged)
G: (*,*,*,*,*,*) is specialized to:
   (red,*,*,*,*,*)  (*,small,*,*,*,*)  (*,*,*,*,*,smooth)
Why is (*,*,round,*,*,*) not a valid specialization of G?
An Exercise: another positive
The fourth example is a positive example:
((red,small,elongated,humid,high,rough), +)
S: (red,small,*,humid,low,smooth) is generalized to (red,small,*,humid,*,*)
G: (red,*,*,*,*,*)  (*,small,*,*,*,*)
   ((*,*,*,*,*,smooth) is dropped because it is inconsistent with this positive example)
The Learned Version Space VS
G: (red,*,*,*,*,*)  (*,small,*,*,*,*)
Middle layer: (red,*,*,humid,*,*)  (red,small,*,*,*,*)  (*,small,*,humid,*,*)
S: (red,small,*,humid,*,*)
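As an illustrative check, running the candidate-elimination sketch shown earlier on these four training examples reproduces the boundary sets on this slide (assuming that sketch and the attribute domains from the features slide):

```python
# Assumes the candidate_elimination sketch shown earlier in these notes.
domains = [('red', 'brown', 'gray'),      # color
           ('small', 'large'),            # size
           ('round', 'elongated'),        # shape
           ('humid', 'dry'),              # land
           ('low', 'high'),               # air humidity
           ('smooth', 'rough')]           # texture

D = [(('red','small','round','humid','low','smooth'), '+'),
     (('red','small','elongated','humid','low','smooth'), '+'),
     (('gray','large','elongated','humid','low','rough'), '-'),
     (('red','small','elongated','humid','high','rough'), '+')]

S, G = candidate_elimination(D, domains)
print(S)  # expected: [('red', 'small', '*', 'humid', '*', '*')]
print(G)  # expected: (red,*,*,*,*,*) and (*,small,*,*,*,*), in some order
```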
Points to Consider
 • Will the algorithm converge to the right hypothesis?
 • The algorithm is guaranteed to converge to the right hypothesis provided that:
    • no errors exist in the examples, and
    • the target concept is included in the hypothesis space H.
 • What happens if there are errors in the examples?
    • The right hypothesis would be judged inconsistent and thus eliminated.
    • If the S and G sets converge to an empty space, we have evidence that the true concept lies outside the space H.
Query Learning
Remember the version space VS after seeing our 4 examples
on the mushroom database:
G: (red,*,*,*,*,*)  (*,small,*,*,*,*)
Middle layer: (red,*,*,humid,*,*)  (red,small,*,*,*,*)  (*,small,*,humid,*,*)
S: (red,small,*,humid,*,*)
What would be a good question to pose to the algorithm?
What example is best next?
Query Learning
 • Remember there are three settings for learning:
    • tasks are generated by a random process outside the learner,
    • the learner can pose queries to a teacher,
    • the learner explores its surroundings autonomously.
 • Here we focus on the second setting: posing queries to an expert.
 • Version space strategy: ask about the class of an example that would prune half of the space.
 • Example: (red,small,round,dry,low,smooth)
Query Learning
 • In general, if we are able to prune the version space by half with each new query, then we can find an optimal hypothesis in log2 |VS| steps.
 • Can you explain why? (A short derivation follows below.)
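One way to spell out the answer (not stated on the original slide): if each query eliminates half of the remaining hypotheses, then after k queries only |VS| / 2^k candidates remain, and a single hypothesis is left when

```latex
\frac{|VS|}{2^{k}} = 1 \quad\Longrightarrow\quad k = \log_2 |VS|
```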
Classifying Examples
 • What if the version space VS has not collapsed into a single hypothesis and we are asked to classify a new instance?
 • Suppose all hypotheses in set S agree that the instance is positive.
    • Then we are sure that all hypotheses in VS agree that the instance is positive. Why?
    • The same can be said if the instance is classified as negative by all members of set G. Why?
 • In general, one can vote over all hypotheses in VS if there is no unanimous agreement. (A voting sketch follows below.)
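A minimal sketch of this classification rule, assuming the S and G boundary sets and the covers helper from the earlier candidate-elimination sketch; labels are returned only when the relevant boundary is unanimous:

```python
# Hedged sketch of classifying with a version space represented by S and G.
# Assumes the covers(h, x) helper defined in the earlier sketches.

def vs_classify(S, G, x):
    if all(covers(s, x) for s in S):
        return '+'          # every hypothesis in VS must also cover x
    if all(not covers(g, x) for g in G):
        return '-'          # no hypothesis in VS covers x
    return '?'              # VS disagrees; one could vote over all of VS instead
```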
Inductive Bias
 • Inductive bias is the preference for a hypothesis space H and a search mechanism over H.
 • What would happen if we chose an H that contains all possible hypotheses?
 • What would the size of H be?
    • |H| = the size of the power set of the input space X.
 • Example (see the sketch below):
    • with n Boolean features, |X| = 2^n,
    • and the size of H is |H| = 2^(2^n).
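A quick, purely illustrative numeric check of these counts:

```python
# Illustrative check: with n Boolean features there are 2**n possible
# examples, and an unbiased H has one hypothesis per subset of the input
# space, i.e. 2**(2**n) hypotheses.
for n in (2, 4, 6):
    X_size = 2 ** n
    H_size = 2 ** X_size
    print(f"n={n}: |X| = {X_size}, |H| = {H_size}")
```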
Inductive Bias
In this case, the candidate elimination algorithm could classify with certainty only the training examples it has already seen; it would be unable to generalize to unseen examples, because H is so large that every possible labeling of the unseen examples is consistent with the training data.
A property of any inductive algorithm:
it must have some embedded assumptions about the nature of H.
Without assumptions, learning is impossible.
Summary
 • The candidate elimination algorithm exploits the general-to-specific ordering of hypotheses to find all hypotheses consistent with the training data.
 • The version space contains all consistent hypotheses and is compactly represented by just two lists: S and G.
 • The candidate elimination algorithm is not robust to noise and assumes the target concept is included in the hypothesis space.
 • Any inductive algorithm needs some assumptions about the hypothesis space; otherwise it would be impossible to make predictions.
eduardopoggi@yahoo.com.ar
eduardo-poggi
http://ar.linkedin.com/in/eduardoapoggi
https://www.facebook.com/eduardo.poggi
@eduardoapoggi
Bibliography
 • Chapter 2 of Mitchell