2. Concept Learning as Search
• We assume that the target concept lies in the
hypothesis space, so we search for a
hypothesis in this space that best fits the
training examples, i.e. one whose output
matches the true output of the concept
• When such a hypothesis is found, the search
has achieved learning of the actual concept
from the given training set
3. Concept Learning as Search
• In short:
Assume c ∈ H; search for an h ∈ H that best fits D,
such that ∀ xi ∈ D, h(xi) = c(xi)
Where c is the concept we are trying to determine (the
output of the training set)
H is the hypothesis space
D is the training set
h is the hypothesis
xi is the ith instance of the instance space
4. Ordering of Hypothesis Space
• General to Specific Ordering of Hypothesis
Space
• Most General Hypothesis:
– hg = < ?, ? >
• Most Specific Hypothesis:
– hs = < Ø, Ø >
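To make this concrete, here is a minimal Python sketch (an illustration, assuming the two-attribute SICK domain used in these slides) of how such hypotheses can be represented and applied: '?' accepts any attribute value, while Ø accepts none.

# Hypotheses are tuples of attribute constraints:
# '?' is satisfied by any value, 'Ø' by no value.

def h_classify(h, x):
    """Return 1 if hypothesis h covers instance x, else 0."""
    for constraint, value in zip(h, x):
        if constraint == 'Ø' or (constraint != '?' and constraint != value):
            return 0
    return 1

hg = ('?', '?')   # most general: classifies every instance as positive
hs = ('Ø', 'Ø')   # most specific: classifies no instance as positive

print(h_classify(hg, ('H', 'H')))   # 1
print(h_classify(hs, ('H', 'H')))   # 0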
5. Ordering of Hypothesis Space
SK = < T, BP >, T = { H, N, L } and BP = { H, N, L }
< ?, ? >
< H, ? > < N, ? > < L, ? > < ?, H > < ?, N > < ?, L >
< H, H >  < H, N >  < H, L >  < N, H >  < N, N >  < N, L >  < L, H >  < L, N >  < L, L >
< Ø , Ø >
6. Find-S Algorithm
• FIND-S finds the most specific hypothesis
within the version space that is consistent
with the given training data
• It exploits the general-to-specific ordering of
the hypothesis space, moving from the most
specific hypothesis towards more general ones
7. Find-S Algorithm
Initialize hypothesis h to the most specific hypothesis in H
(the hypothesis space)
For each positive training instance x (i.e. output is 1)
For each attribute constraint ai in h
If the constraint ai is satisfied by x
Then do nothing
Else
Replace ai in h by the next more
general constraint that is satisfied by x
Output hypothesis h
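A runnable sketch of this loop for the two-attribute SICK domain (the helper name is hypothetical; it reuses the tuple representation from the earlier sketch, with '?' = any value and 'Ø' = no value):

def find_s(D):
    """D is a list of (instance, label) pairs.
    Returns the most specific hypothesis consistent with the positives."""
    h = ['Ø', 'Ø']                        # start with the most specific hypothesis
    for x, label in D:
        if label != 1:                    # FIND-S ignores negative examples
            continue
        for i, (ai, xi) in enumerate(zip(h, x)):
            if ai == 'Ø':                 # Ø generalizes to the observed value
                h[i] = xi
            elif ai != '?' and ai != xi:  # a violated value generalizes to '?'
                h[i] = '?'
    return tuple(h)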
8. Find-S Algorithm
To illustrate this algorithm, let us assume that the learner is given the following
sequence of training examples from the SICK domain:
D T BP SK
x1 H H 1
x2 L L 0
x3 N H 1
The first step of FIND-S is to initialize hypothesis h to the most specific hypothesis in
H:
h = < Ø , Ø >
9. Find-S Algorithm
D T BP SK
x1 H H 1
The first training example is positive:
But h = < Ø, Ø > fails on this first instance,
because h(x1) = 0: the constraint Ø is satisfied by no
attribute value
Since h = < Ø, Ø > is so specific that it classifies no instance as
positive, we replace it with the next more general hypothesis that fits
this first instance x1 of the training set D:
h = < H, H >
10. Find-S Algorithm
< ?, ? >
< H, ? > < N, ? > < L, ? > < ?, H > < ?, N > < ?, L >
< H, H >  < H, N >  < H, L >  < N, H >  < N, N >  < N, L >  < L, H >  < L, N >  < L, L >
< Ø , Ø >
SK = < T, BP >, T = { H, N, L } and BP = { H, N, L }
11. Find-S Algorithm
D T BP SK
x1 H H 1
x2 L L 0
Upon encountering the second example, in this case a negative example, the algorithm makes no
change to h. In fact, FIND-S simply ignores every negative example.
So the hypothesis remains: h = < H, H >
12. Find-S Algorithm
D T BP SK
x1 H H 1
x2 L L 0
x3 N H 1
The third example is positive, but h = < H, H > fails on it, since T = N
violates the constraint T = H; FIND-S therefore generalizes T to ?
Final hypothesis:
h = < ?, H >
What does this hypothesis state?
It classifies every future patient with BP = H as SICK, for any
value of T
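Running the find_s sketch from earlier on this training set reproduces the result:

D = [(('H', 'H'), 1),   # x1
     (('L', 'L'), 0),   # x2
     (('N', 'H'), 1)]   # x3

print(find_s(D))        # ('?', 'H'), i.e. h = < ?, H >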
13. Find-S Algorithm
< ?, ? >
< H, ? > < N, ? > < L, ? > < ?, H > < ?, N > < ?, L >
< H, H >  < H, N >  < H, L >  < N, H >  < N, N >  < N, L >  < L, H >  < L, N >  < L, L >
< Ø , Ø >
D T BP SK
x1 H H 1
x2 L L 0
x3 N H 1
14. Candidate-Elimination Algorithm
• Although FIND-S does find a consistent
hypothesis
• In general, however, there may be many
hypotheses consistent with D, of which
FIND-S finds only one
• Candidate-Elimination finds all the
hypotheses in the Version Space
15. Version Space (VS)
• The version space is the set of all
hypotheses that are consistent with all the
training examples
• By consistent we mean
h(xi) = c(xi), for all instances xi in the
training set D
16. Version Space
Let us take the following training set D:
D T BP SK
x1 H H 1
x2 L L 0
x3 N N 0
Another representation of this set D (rows: BP, columns: T):

        T=L  T=N  T=H
BP=H     -    -    1
BP=N     -    0    -
BP=L     0    -    -
17. Version Space
Is there a hypothesis that can generate this D?

        T=L  T=N  T=H
BP=H     -    -    1
BP=N     -    0    -
BP=L     0    -    -

One consistent hypothesis is h1 = < H, H >:

        T=L  T=N  T=H
BP=H     0    0    1
BP=N     0    0    0
BP=L     0    0    0
18. Version Space
There are other hypotheses consistent with D, such as h2 = < H, ? >:

        T=L  T=N  T=H
BP=H     0    0    1
BP=N     0    0    1
BP=L     0    0    1

There is another hypothesis, h3 = < ?, H >:

        T=L  T=N  T=H
BP=H     1    1    1
BP=N     0    0    0
BP=L     0    0    0
19. Version Space
• The version space is denoted as
VS_H,D = {h1, h2, h3}
• This translates as: the version space, the subset
of the hypothesis space H that is consistent with
D, is composed of h1, h2 and h3
• In other words, the version space is the set of
all hypotheses consistent with D, not just the
one hypothesis we saw in the previous case
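A brute-force sketch that recovers this version space by enumerating every hypothesis in H and keeping the consistent ones (it reuses the h_classify helper from the earlier sketch; the full enumeration is feasible only because this H is tiny):

from itertools import product

VALUES = ('H', 'N', 'L')

def hypothesis_space():
    """Every conjunctive hypothesis over T and BP, plus < Ø, Ø >."""
    return [('Ø', 'Ø')] + list(product(VALUES + ('?',), repeat=2))

def version_space(D):
    """All hypotheses h in H with h(x) = c(x) for every (x, c) in D."""
    return [h for h in hypothesis_space()
            if all(h_classify(h, x) == c for x, c in D)]

D = [(('H', 'H'), 1), (('L', 'L'), 0), (('N', 'N'), 0)]
print(version_space(D))   # [('H', 'H'), ('H', '?'), ('?', 'H')] = {h1, h2, h3}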
20. Candidate-Elimination Algorithm
• Candidate Elimination works with two sets:
– Set G (General hypotheses)
– Set S (Specific hypotheses)
• Starts with:
– G0 = {< ? , ? >} considers negative examples only
– S0 = {< Ø , Ø >} considers positive examples only
• Within these two boundaries is the entire
Hypothesis space
21. Candidate-Elimination Algorithm
• Intuitively:
– As each training example is observed one by
one
• The S boundary is made more and more general
• The G boundary set is made more and more specific
• This eliminates from the version space any hypotheses found
inconsistent with the new training example
– At the end, we are left with VS
22. Candidate-Elimination Algorithm
Initialize G to the set of maximally general hypotheses in H
Initialize S to the set of maximally specific hypotheses in H
For each training example d, do
If d is a positive example
Remove from G any hypothesis inconsistent with d
For each hypothesis s in S that is inconsistent with d
Remove s from S
Add to S all minimal generalization h of s, such that
h is consistent with d, and some member of G is more general than h
Remove from S any hypothesis that is more general than another one in S
If d is a negative example
Remove from S any hypothesis inconsistent with d
For each hypothesis g in G that is inconsistent with d
Remove g from G
Add to G all minimal specializations h of g, such that
h is consistent with d, and some member of S is more specific than h
Remove from G any hypothesis that is less general than another one in G
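The following is a sketch of this loop in Python for the two-attribute conjunctive domain (the helper names are hypothetical; it builds on h_classify from the earlier sketch). Minimal generalization and specialization are simple here because each hypothesis is just a pair of constraints:

def more_general_or_equal(h1, h2):
    """True if h1 covers every instance that h2 covers."""
    return all(a == '?' or a == b or b == 'Ø' for a, b in zip(h1, h2))

def min_generalizations(s, x):
    """The minimal generalization of s that covers instance x."""
    return [tuple(xi if ai == 'Ø' else (ai if ai == xi else '?')
                  for ai, xi in zip(s, x))]

def min_specializations(g, x, values=('H', 'N', 'L')):
    """Minimal specializations of g that exclude instance x."""
    return [g[:i] + (v,) + g[i + 1:]
            for i, (ai, xi) in enumerate(zip(g, x)) if ai == '?'
            for v in values if v != xi]

def candidate_elimination(D):
    G = {('?', '?')}                     # maximally general boundary
    S = {('Ø', 'Ø')}                     # maximally specific boundary
    for x, c in D:
        if c == 1:                       # positive example
            G = {g for g in G if h_classify(g, x) == 1}
            for s in [s for s in S if h_classify(s, x) != 1]:
                S.remove(s)
                S |= {h for h in min_generalizations(s, x)
                      if any(more_general_or_equal(g, h) for g in G)}
            S = {s for s in S if not any(
                 s != t and more_general_or_equal(s, t) for t in S)}
        else:                            # negative example
            S = {s for s in S if h_classify(s, x) == 0}
            for g in [g for g in G if h_classify(g, x) != 0]:
                G.remove(g)
                G |= {h for h in min_specializations(g, x)
                      if any(more_general_or_equal(h, s) for s in S)}
            G = {g for g in G if not any(
                 g != t and more_general_or_equal(t, g) for t in G)}
    return G, S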
27. Candidate-Elimination Algorithm
D T BP SK
x1 H H 1
Starting boundaries:
G0 = {< ?, ? >}
S0 = {< Ø, Ø >}
First training example: d1 = (<H, H>, 1) [a positive example]
Remove < Ø, Ø > from S0, since it is not consistent with d1,
and add the next minimally general hypothesis from H to form S1
G1 = G0 = {< ?, ? >}, since < ?, ? > is consistent with d1: both
give positive outputs
Result: G1 = {< ?, ? >}, S1 = {< H, H >}
28. Candidate-Elimination Algorithm
D T BP SK
x2 L L 0
Previous boundaries: G1 = {< ?, ? >}, S1 = {< H, H >}
Second training example: d2 = (<L, L>, 0) [a negative example]
S2 = S1 = {< H, H >}, since < H, H > is consistent with d2: both give
negative outputs for x2
Remove < ?, ? > from G1, since it is not consistent with d2, and add the next
minimally specialized hypotheses from H to form G2, keeping in mind one
rule:
"Add to G all minimal specializations h of g, such that
h is consistent with d, and some member of S is more specific than h"
Now, observe that the immediate one-step specializations of < ?, ? > are:
{< H, ? >, < N, ? >, < L, ? >, < ?, H >, < ?, N >, < ?, L >}
Of these, only < H, ? > and < ?, H > satisfy the rule
Result: G2 = {< H, ? >, < ?, H >}, S2 = {< H, H >}
29. Candidate-Elimination Algorithm
D T BP SK
x3 N H 1
Previous boundaries: G2 = {< H, ? >, < ?, H >}, S2 = {< H, H >}
Third and final training example: d3 = (<N, H>, 1) [a positive example]
In G2, < H, ? > is not consistent with d3, so we remove it.
However < ?, H > is consistent, hence it is retained
In S2, < H, H > is not consistent with d3, so we remove it and add
minimal generalizations of < H, H >. The two one-step choices are
< H, ? > and < ?, H >; we keep only < ?, H >, since < H, ? > is not
consistent with d3
Result: G3 = {< ?, H >}, S3 = {< ?, H >}
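Running the candidate_elimination sketch from earlier on these three examples reproduces the trace:

D = [(('H', 'H'), 1),   # d1
     (('L', 'L'), 0),   # d2
     (('N', 'H'), 1)]   # d3

G, S = candidate_elimination(D)
print(G)   # {('?', 'H')}  -> G3
print(S)   # {('?', 'H')}  -> S3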
30. Conjunctive vs Disjunctive
Conjunctive rule (ANDing):
h = < T=H AND BP=? >

        T=L  T=N  T=H
BP=H     0    0    1
BP=N     0    0    1
BP=L     0    0    1

Disjunctive rule (ORing):
h = < T=H AND BP=?
      OR
      T=? AND BP=H >

        T=L  T=N  T=H
BP=H     1    1    1
BP=N     0    0    1
BP=L     0    0    1
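As a small illustrative sketch, the two rules can be written directly as predicates; note that the disjunctive rule is not representable as a single conjunctive hypothesis in H:

def conjunctive(x):                 # h = < T=H AND BP=? >
    t, bp = x
    return 1 if t == 'H' else 0

def disjunctive(x):                 # (T=H AND BP=?) OR (T=? AND BP=H)
    t, bp = x
    return 1 if t == 'H' or bp == 'H' else 0

for x in [('H', 'L'), ('L', 'H'), ('L', 'L')]:
    print(x, conjunctive(x), disjunctive(x))
# ('H', 'L') 1 1   both rules fire when T = H
# ('L', 'H') 0 1   only the disjunctive rule covers BP = H alone
# ('L', 'L') 0 0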