2. CSEC-4142 & PHDCS-4 @2022
Machine Learning
• Machine learning is programming computers to optimize a performance criterion using example
data or past experience.
• There is no need to “learn” to calculate payroll
• Learning is used when:
Human expertise does not exist (navigating on Mars),
Humans are unable to explain their expertise (speech recognition)
Solution changes in time (routing on a computer network)
Solution needs to be adapted to particular cases (user biometrics)
What to Learn
• Learning general models from data of particular examples
• Data is cheap and abundant (data warehouses, data marts); knowledge is expensive and scarce.
• Example in retail: Customer transactions to consumer behavior:
People who bought “Blink” also bought “Outliers” (www.amazon.com)
The sales of diapers and beer were correlated on Friday evening (Walmart)
• Build a model that is a good and useful approximation to the data.
Well-Posed Problem
• A learning problem is called well-posed if a solution to it exists, that solution is unique and
the solution depends on data/experience but is not sensitive to (reasonably small) change in
the data/experience.
• In general, to have a well-defined learning problem, we must identify these three features:
The class of tasks
The measure of performance to be improved
The source of experience
Machine Learning Concept in a Nutshell
• Machine learning is a subfield of Artificial Intelligence (AI) concerned with developing
computational theories of learning and building learning machines.
• Learning is the phenomenon or process of gaining new symbolic knowledge and developing
cognitive skills through instruction and practice
• It is also the discovery of new facts and theories through observations and experiments.
• Machine learning is programming computers to optimize a performance criterion using
example data or past experience.
• It is very hard to write a program that solves a problem like recognizing a human face, as we
do not know how our brain does it.
• Instead of writing a program by hand, it is possible to collect many examples that specify the
correct output for a given input.
Concept of Machine Learning (cont.)
• A machine learning algorithm then takes these examples and produces a program that does the
job.
• The main goal of machine learning is to devise learning algorithms that do the learning
automatically, without human intervention or assistance
• The machine learning paradigm can be viewed as ‘programming by example’
• Another goal is to develop computational models of human learning process and perform
computer simulation
• That is, the goal of machine learning is to build computer systems that can adapt and learn
from their experience
Reason for using Machine Learning
• Machine learning algorithms can figure out how to perform important tasks by generalizing
from examples
• Machine learning algorithms discover the relationship between the variables of a system
(input, output and hidden) from direct samples of the system
• Some real-world problems, like recognizing a person from their voice, cannot be defined
well.
• Relationships and correlations can be hidden within large amounts of data
Reason for using Machine Learning (cont.)
• To solve these problems, machine learning and data mining may be used to find the
relationships
• The amount of knowledge available about certain tasks might be too large for explicit
encoding by humans
• Environments change from time to time
• New knowledge about tasks is constantly being discovered
Phases for Machine Learning
• Machine learning typically follows three phases:
1. Training: A training set of examples of correct behaviour is analysed and some
representation of the newly learnt knowledge is stored, typically in the form of rules.
2. Validation: The rules are checked and, if necessary, additional training is given.
Sometimes additional test data are used. A human expert or automatic knowledge-based
components may be used to validate the rules. The role of the tester is often called the
opponent.
3. Application: The rules are used in responding to some new situations.
Designing a learning system
1. Data: Choose the training experience
D = {d1 , d2 ,.., dn }
2. Feature Selection
• Features depend on the problem. Measure ‘relevant’ quantities.
• Some techniques available to extract ‘more relevant’ quantities
from the initial measurements. (e.g., PCA)
• After feature extraction each pattern is a vector
3. Model selection:
(a) Select a model or a set of models (with parameters)
E.g., y = ax + b + ε, where ε ~ N(0, σ)
This determines exactly what type of knowledge will be learned and
how it will be used by the performance program
Designing a learning system (cont.)
(b) Select the error function to be optimized, e.g., the mean squared error
(1/n) Σ_{i=1}^{n} (y_i − f(x_i))²
4. Learning:
Find the set of parameters optimizing the error function
– The model and parameters with the smallest error
5. Application (Evaluation):
• Apply the learned model
– E.g., predict y for new inputs x using the learned f(x)
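The numbered steps can be run end to end for the linear model above. This is a minimal sketch of ours: the data points are invented for illustration, and a closed-form least-squares fit stands in for whatever optimizer one might actually use.

```python
# (Data) a toy training set, roughly y = 2x + 1 plus noise
xs = [0.0, 1.0, 2.0, 3.0]
ys = [1.0, 3.1, 4.9, 7.0]
n = len(xs)

# (Learning) closed-form least squares for the model y = a*x + b
mx = sum(xs) / n
my = sum(ys) / n
a = sum((x - mx) * (y - my) for x, y in zip(xs, ys)) / sum((x - mx) ** 2 for x in xs)
b = my - a * mx

# (Error function) mean squared error (1/n) * sum_i (y_i - f(x_i))^2
mse = sum((y - (a * x + b)) ** 2 for x, y in zip(xs, ys)) / n

# (Application) predict y for a new input x
y_new = a * 4.0 + b
print(a, b, mse, y_new)
```

With this data the fit comes out near a ≈ 1.98, b ≈ 1.03 with a small residual error, illustrating how the error function ties the model and the learning step together.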
Concept Learning
• Inducing general functions from specific training examples is a main issue of machine
learning
• Acquiring the definition of a general category from given positive and negative training
examples of the category is known as concept learning
• A machine learning hypothesis is a candidate model that approximates a target function for
mapping inputs to outputs.
• The hypothesis space has a general-to-specific ordering of hypotheses, and the search can be
efficiently organized by taking advantage of a naturally occurring structure over the
hypothesis space
Concept Learning (cont.)
• Concept learning is formally defined as 'inferring a Boolean-valued function from
training examples of its input and output'
• Concept learning involves determining a mapping from a set of input variables to a Boolean
variable. Such methods are known as inductive learning methods.
• If a function can be found which maps training data to correct classifications, the hope is that
it will also work well for unseen data. This process is known as generalization
A concept learning task
• An example of concept learning is learning the concept EnjoySport from given positive
and negative training examples
• We are trying to learn the definition of a concept from given examples.
Table. EnjoySport Training Examples
A concept learning task (cont.)
• A set of example days, and each is described by six attributes.
• The task is to learn to predict the value of EnjoySport for an arbitrary day, based on the
values of its attributes.
• Each hypothesis consists of a conjunction of constraints on the instance attributes.
• A hypothesis is a vector of constraints for each attribute
A concept learning task (cont.)
• In this example, each hypothesis will be a vector of six constraints, specifying the values of
the six attributes – (Sky, AirTemp, Humidity, Wind, Water, and Forecast).
Indicate by a "?" that any value is acceptable for this attribute
Specify a single required value for the attribute
Indicate by Φ that no value is acceptable
• The most general hypothesis – that every day is a positive example <?,?,?,?,?,?>
A concept learning task (cont.)
• The most specific hypothesis – that no day is a positive example
<Φ, Φ, Φ, Φ, Φ, Φ>
• If some instance x satisfies all the constraints of hypothesis h, then h classifies x as a
positive example (h(x)=1)
• To illustrate that A enjoys his favourite sport only on cold day with high humidity
(independent of the other attributes) is represented by the expression
<?, Cold, High, ?, ?, ?>
• EnjoySport concept learning task requires learning the sets of days for which
EnjoySport=yes, describing this set by a conjunction of constraints over the instance
attributes.
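These constraint semantics are easy to make concrete. In this minimal Python sketch (representation and names are ours, not from the slides), a hypothesis is a tuple of six constraints, where "?" accepts any value and "Φ" accepts none:

```python
def satisfies(h, x):
    """True iff instance x meets every constraint of hypothesis h, i.e. h(x) = 1."""
    return all(c == "?" or c == v for c, v in zip(h, x))

# 'A enjoys his favourite sport only on cold days with high humidity'
h = ("?", "Cold", "High", "?", "?", "?")
x = ("Sunny", "Cold", "High", "Strong", "Warm", "Same")

print(satisfies(h, x))           # True: h classifies x as positive
print(satisfies(("Φ",) * 6, x))  # False: the all-Φ hypothesis rejects every instance
```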
A concept learning task
• Given
– Instances X : set of all possible days, each described by the attributes
Sky – (values: Sunny, Cloudy, Rainy)
AirTemp – (values: Warm, Cold)
Humidity – (values: Normal, High)
Wind – (values: Strong, Weak)
Water – (values: Warm, Cold)
Forecast – (values: Same, Change)
– Target Concept (Function) c : EnjoySport : X → {0,1}
– Hypotheses H : Each hypothesis is described by a conjunction of
constraints on the attributes.
– Training Examples D : positive and negative examples of the target function
Determine
– A hypothesis h in H such that h(x) = c(x) for all x in D.
Inductive learning hypothesis
• Any hypothesis found to approximate the target function well over a sufficiently large set of
training examples will also approximate the target function well over other unobserved
examples
• Although the learning task is to determine a hypothesis (h) identical to the target concept
c over the entire set of instances (X), the only information available about c is its value over
the training examples
• Inductive learning algorithms can at best guarantee that the output hypothesis fits the target
concept over the training data
• Lacking any further information, our assumption is that the best hypothesis regarding unseen
instances is the hypothesis that best fits the observed training data. This is the fundamental
assumption of inductive learning
Concept learning as search
• Concept learning can be viewed as the task of searching through a large space of hypothesis
implicitly defined by the hypothesis representation
• The goal of this search is to find the hypothesis that best fits the training examples.
• By selecting a hypothesis representation, the designer of the learning algorithm implicitly
defines the space of all hypotheses that the program can ever represent and therefore can ever
learn
Enjoy Sport - Hypothesis Space
• Sky has 3 possible values, and other 5 attributes have 2 possible values.
• There are 96 (= 3×2×2×2×2×2) distinct instances in X.
• A similar calculation shows that there are 5120 (= 5×4×4×4×4×4) syntactically distinct
hypotheses in H – two more values ("?" and "Φ") for each attribute
• However, every hypothesis containing one or more Φ symbols represents the empty set of
instances; that is, it classifies every instance as negative.
Enjoy Sport - Hypothesis Space (cont.)
• Therefore, the number of semantically distinct hypotheses is only 973 (= 1 + 4×3×3×3×3×3).
• The EnjoySport is a very simple learning task having small, finite hypothesis space
• Most practical learning tasks have much larger (even infinite) hypothesis spaces.
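The three counts above can be checked mechanically; this small sketch of ours just multiplies out the attribute domain sizes:

```python
from math import prod

# Number of values per attribute: Sky, AirTemp, Humidity, Wind, Water, Forecast
sizes = [3, 2, 2, 2, 2, 2]

instances = prod(sizes)                    # distinct instances in X
syntactic = prod(s + 2 for s in sizes)     # each attribute also allows '?' and 'Φ'
semantic = 1 + prod(s + 1 for s in sizes)  # all Φ-hypotheses collapse into one empty concept

print(instances, syntactic, semantic)      # 96 5120 973
```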
General-to-Specific Ordering of Hypotheses
• Many algorithms for concept learning organize the search through the hypothesis space by
relying on a general-to-specific ordering of hypotheses.
• By taking advantage of this naturally occurring structure over the hypothesis space, we can
design learning algorithms that exhaustively search even infinite hypothesis spaces without
explicitly enumerating every hypothesis.
• Consider two hypotheses
h1 = (Sunny, ?, ?, Strong, ?, ?)
h2 = (Sunny, ?, ?, ?, ?, ?)
General-to-Specific Ordering of Hypotheses
(cont.)
• Now consider the sets of instances that are classified positive by h1 and by h2.
Because h2 imposes fewer constraints on the instance, it classifies more instances as
positive.
In fact, any instance classified positive by h1 will also be classified positive by h2.
Therefore, we say that h2 is more general than h1.
More-General-Than Relation
• For any instance x in X and hypothesis h in H, we say that x satisfies h if and only if
h(x) = 1.
• More-General-Than-Or-Equal Relation:
Let h1 and h2 be two Boolean-valued functions defined over X. Then h1 is more-
general-than-or-equal-to h2 (written h1 ≥ h2)
if and only if any instance that satisfies h2 also satisfies h1,
i.e., ∀x ∈ X [ h2(x) = 1 → h1(x) = 1 ]
• h1 is more-general-than h2 (h1 > h2) if and only if h1 ≥ h2 is true and h2 ≥ h1 is false.
• We also say h2 is more-specific-than h1.
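A sketch of the relation in Python (the naming is ours). For the conjunctive hypotheses used in these slides, the set-based definition reduces to an attribute-wise check: each constraint of h1 must be at least as permissive as the corresponding constraint of h2.

```python
def more_general_or_equal(h1, h2):
    """Attribute-wise test of h1 >= h2 for conjunctive hypotheses.

    A constraint covers another if it is '?', is identical, or the other
    constraint is the empty constraint 'Φ'.
    """
    def covers(c1, c2):
        return c1 == "?" or c1 == c2 or c2 == "Φ"
    return all(covers(c1, c2) for c1, c2 in zip(h1, h2))

h1 = ("Sunny", "?", "?", "Strong", "?", "?")
h2 = ("Sunny", "?", "?", "?", "?", "?")
print(more_general_or_equal(h2, h1))  # True: h2 imposes fewer constraints
print(more_general_or_equal(h1, h2))  # False: so h2 is strictly more general
```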
More-General-Relation
• Instances, hypotheses and the
more_general_than relation
• The box on the left represents the set
X of all instances
• The box on the right is the set of all
hypotheses H
• Each hypothesis corresponds to some
subset of X – the subset of instances
that it classifies positive
More-General-Relation (cont.)
• The arrows connecting hypotheses represent the more_general_than relation, with the arrow
pointing toward the less general hypothesis.
• Note that the subset of instances characterized by h2 subsumes the subset characterized by h1,
so h2 is more_general_than h1
• But there is no more-general relation between h1 and h3
Find-S: Finding a Maximally Specific Hypothesis
• In Find-S algorithm, the ‘more_general_than’ partial ordering is used to organize a search
for a hypothesis consistent with the observed training examples.
• The algorithm begins with the most specific possible hypothesis in H.
• This hypothesis is then generalized each time it fails to cover an observed positive training
example. We say that a hypothesis 'covers' a positive example if it correctly classifies the
example as positive
Find-S Algorithm
• The algorithm is given below
1. Initialize h to the most specific hypothesis in H
2. For each positive training instance x
For each attribute constraint ai in h
IF the constraint ai in h is satisfied by x
THEN do nothing
ELSE replace ai in h by the next more general constraint that is satisfied by x
3. Output hypothesis h
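The pseudocode translates into a short runnable sketch under the tuple representation used in the earlier snippets (naming is ours); the training data are the four EnjoySport examples from these slides:

```python
def find_s(examples):
    """FIND-S: start from the most specific hypothesis and generalize
    minimally on each positive example; negative examples are ignored."""
    h = ["Φ"] * len(examples[0][0])
    for x, label in examples:
        if not label:
            continue                  # FIND-S skips negative examples
        for i, v in enumerate(x):
            if h[i] == "Φ":
                h[i] = v              # first positive: copy the attribute value
            elif h[i] != v:
                h[i] = "?"            # conflict: relax to 'any value'
    return tuple(h)

# EnjoySport training examples from the slides (True = positive)
D = [
    (("Sunny", "Warm", "Normal", "Strong", "Warm", "Same"), True),
    (("Sunny", "Warm", "High", "Strong", "Warm", "Same"), True),
    (("Rainy", "Cold", "High", "Strong", "Warm", "Change"), False),
    (("Sunny", "Warm", "High", "Strong", "Cool", "Change"), True),
]
print(find_s(D))  # ('Sunny', 'Warm', '?', 'Strong', '?', '?')
```

The printed hypothesis matches the final h of the trace worked through on the next slides.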
Find-S (cont.)
• FIND-S algorithm ignores negative examples.
As long as the hypothesis space contains a hypothesis that describes the true target
concept, and the training data contain no errors, ignoring negative examples causes
no problems.
• FIND-S algorithm finds the most specific hypothesis within H that is consistent with the
positive training examples.
The final hypothesis will also be consistent with the negative examples if the correct
target concept is in H, and the training examples are correct.
• To illustrate this algorithm, assume the learner is given the sequence of training examples
from the EnjoySport example
Find-S (cont.)
The first step of Find-S is to initialize h to the most specific hypothesis in H
h ← <Φ, Φ, Φ, Φ, Φ, Φ>
Upon observing the first training example, which happens to be a positive example, it
becomes clear that our hypothesis is too specific, so each Φ is replaced by the next
more general constraint that fits the example
h ← <Sunny, Warm, Normal, Strong, Warm, Same>
Find-S (cont.)
• The second training example, also positive, forces the algorithm to further generalize h,
this time substituting a "?" in place of any attribute value in h that is not satisfied by the
new example.
• The revised hypothesis is:
h ← <Sunny, Warm, ?, Strong, Warm, Same>
Find-S (cont.)
• Upon encountering the third training example, in this case a negative example, the algorithm
makes no change to h
• Note that our hypothesis is still consistent with this training example; this is always the
case if the training data are correct.
• To complete our trace of FIND-S, the fourth (positive) example leads to a further generalization
of h
h ← <Sunny, Warm, ?, Strong, ?, ?>
Unanswered Questions by FIND-S Algorithm
• Has FIND-S converged to the correct target concept?
Although FIND-S will find a hypothesis consistent with the training data, it has no way to
determine whether it has found the only hypothesis in H consistent with the data (i.e., the
correct target concept), or whether there are many other consistent hypotheses as well.
We would prefer a learning algorithm that could determine whether it had converged and,
if not, at least characterize its uncertainty regarding the true identity of the target concept.
Unanswered Questions by FIND-S Algorithm (cont.)
• Why prefer the most specific hypothesis?
In case there are multiple hypotheses consistent with the training examples, FIND-S will
find the most specific.
It is unclear whether we should prefer this hypothesis over, say, the most general, or some
other hypothesis of intermediate generality.
Unanswered Questions by FIND-S Algorithm (cont.)
• Are the training examples consistent?
In most practical learning problems there is some chance that the training examples will
contain at least some errors or noise.
Such inconsistent sets of training examples can severely mislead FIND-S, given the fact
that it ignores negative examples.
We would prefer an algorithm that could at least detect when the training data is
inconsistent and, preferably, accommodate such errors.
Unanswered Questions by FIND-S Algorithm (cont.)
• What if there are several maximally specific consistent hypotheses?
In the hypothesis language H for the EnjoySport task, there is always a unique, most
specific hypothesis consistent with any set of positive examples.
However, for other hypothesis spaces there can be several maximally specific hypotheses
consistent with the data.
In this case, FIND-S must be extended to allow it to backtrack on its choices of how to
generalize the hypothesis, to accommodate the possibility that the target concept lies
along a different branch of the partial ordering than the branch it has selected.
Summary: Find-S
• Advantages:
It is simple
Outcome is independent of order of examples
• Alternative overcomes these problems
Keep all consistent hypotheses!
o Candidate elimination algorithm
Summary: Find-S (cont.)
• Drawbacks:
Throws away information!
o Negative examples
Cannot tell whether it has learned the concept
o Depending on H, there might be several h's that fit the training examples!
o Picks a maximally specific h
Cannot tell when the training data are inconsistent
o Since it ignores negative training examples
Consistent Hypotheses and Version Space
• A hypothesis h is consistent with a set of training examples D of target concept c
if h(x) = c(x) for each training example 〈x, c(x)〉 in D
Note that consistency is with respect to specific D.
• Notation:
Consistent (h, D) ≡ ∀〈x, c(x)〉∈D :: h(x) = c(x)
• The version space, VSH,D , with respect to hypothesis space H and training examples D, is the
subset of hypotheses from H consistent with D
• Notation:
VSH,D = {h | h ∈ H ∧ Consistent (h, D)}
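Both definitions translate directly into code. The two-attribute data and hypotheses below are a toy example of ours, just to exercise the predicate:

```python
def satisfies(h, x):
    return all(c == "?" or c == v for c, v in zip(h, x))

def consistent(h, D):
    """h is consistent with D iff h(x) = c(x) for every <x, c(x)> in D."""
    return all(satisfies(h, x) == label for x, label in D)

def version_space(H, D):
    """Subset of H consistent with D."""
    return [h for h in H if consistent(h, D)]

# Toy data: two attributes, two labeled examples
D = [(("Sunny", "Warm"), True), (("Rainy", "Cold"), False)]
H = [("Sunny", "?"), ("?", "Warm"), ("?", "?"), ("Rainy", "?")]
print(version_space(H, D))  # [('Sunny', '?'), ('?', 'Warm')]
```

The all-"?" hypothesis is rejected because it classifies the negative example as positive, and ("Rainy", "?") because it misses the positive one.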
List-Then-Eliminate Algorithm
• List-Then-Eliminate algorithm initializes the version space to contain all hypotheses in H,
then eliminates any hypothesis found inconsistent with any training example.
• The version space of candidate hypotheses thus shrinks as more examples are observed, until
ideally just one hypothesis remains that is consistent with all the observed examples.
Presumably, this is the desired target concept.
If insufficient data is available to narrow the version space to a single hypothesis, then the
algorithm can output the entire set of hypotheses consistent with the observed data.
List-Then-Eliminate Algorithm (cont.)
• List-Then-Eliminate algorithm can be applied whenever the hypothesis space H is finite.
• It has many advantages, including the fact that it is guaranteed to output all hypotheses
consistent with the training data.
• Unfortunately, it requires exhaustively enumerating all hypotheses in H - an unrealistic
requirement for all but the most trivial hypothesis spaces.
List-Then-Eliminate Algorithm (cont.)
1. VersionSpace ← list of all hypotheses in H
2. For each training example 〈x, c(x)〉
remove from VersionSpace any hypothesis h for which h(x) ≠ c(x)
3. Output the list of hypotheses in VersionSpace
• This is essentially a brute-force procedure
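Because the EnjoySport hypothesis space is finite and small, List-Then-Eliminate can actually be run. This sketch of ours enumerates every Φ-free hypothesis (any hypothesis containing Φ classifies all instances as negative, so the first positive example eliminates it anyway) and keeps those consistent with the four training examples:

```python
from itertools import product

domains = [("Sunny", "Cloudy", "Rainy"), ("Warm", "Cold"), ("Normal", "High"),
           ("Strong", "Weak"), ("Warm", "Cold"), ("Same", "Change")]

def satisfies(h, x):
    return all(c == "?" or c == v for c, v in zip(h, x))

# Step 1: list all hypotheses (each attribute: a concrete value or '?')
H = list(product(*[d + ("?",) for d in domains]))

D = [
    (("Sunny", "Warm", "Normal", "Strong", "Warm", "Same"), True),
    (("Sunny", "Warm", "High", "Strong", "Warm", "Same"), True),
    (("Rainy", "Cold", "High", "Strong", "Warm", "Change"), False),
    (("Sunny", "Warm", "High", "Strong", "Cool", "Change"), True),
]

# Step 2: eliminate every hypothesis inconsistent with some example
VS = [h for h in H if all(satisfies(h, x) == label for x, label in D)]
print(len(VS))  # 6: exactly the six-hypothesis version space of this task
```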
Example of Find-S, Revisited
x1 = 〈Sunny Warm Normal Strong Warm Same〉 +
x2 = 〈Sunny Warm High Strong Warm Same〉 +
x3 = 〈Rainy Cold High Strong Warm Change〉 −
x4 = 〈Sunny Warm High Strong Cool Change〉 +
h0 = 〈∅ ∅ ∅ ∅ ∅ ∅〉 (most specific)
h1 = 〈Sunny Warm Normal Strong Warm Same〉
h2 = 〈Sunny Warm ? Strong Warm Same〉
h3 = 〈Sunny Warm ? Strong Warm Same〉 (unchanged by the negative x3)
h4 = 〈Sunny Warm ? Strong ? ?〉
Version Space for this Example
S: {〈Sunny Warm ? Strong ? ?〉}
〈Sunny ? ? Strong ? ?〉 〈Sunny Warm ? ? ? ?〉 〈? Warm ? Strong ? ?〉
G: {〈Sunny ? ? ? ? ?〉, 〈? Warm ? ? ? ?〉}
• A version space with its general and specific boundary sets. The version space includes
all six hypotheses shown here
• It can be represented more simply by S and G. Arrows indicate instances of the more-
general-than relation.
• This is the version space for the Enjoysport concept learning problem and training
examples described above
Representing Version Spaces
• Want more compact representation of VS
Store most/least general boundaries of space
Generate all intermediate h’s in VS
Idea that any h in VS must be consistent with all Training Examples (TEs)
o Generalize from most specific boundaries
o Specialize from most general boundaries
Representing Version Spaces (cont.)
• The general boundary, G, of version space VSH,D is the set of its maximally general members
consistent with D
Summarizes the negative examples; anything more general will cover a negative TE
• The specific boundary, S, of version space VSH,D is the set of its maximally specific
members consistent with D
Summarizes the positive examples; anything more specific will fail to cover a positive
TE
Theorem
Theorem: Every member of the version space lies between the S,G boundary
VSH,D = {h | h ∈ H ∧ ∃s∈S ∃g∈G (g ≥ h ≥ s)}
• Must prove:
1) every h satisfying RHS is in VSH,D;
2) every member of VSH,D satisfies RHS.
• For 1), let g, h, s be arbitrary members of G, H, S respectively with g ≥ h ≥ s
s must be satisfied by all positive (+) TEs and so must h because it is more general;
g cannot be satisfied by any negative (–) TEs, and so nor can h
h is in VSH,D since satisfied by all positive (+) TEs and no negative (–) TEs
• For 2), since h covers all positive (+) TEs and no negative (–) TEs, there exist s ∈ S and
g ∈ G with h ≥ s and g ≥ h.
Candidate Elimination Algorithm
• The Candidate-Elimination algorithm computes the version space containing all hypotheses
from H that are consistent with an observed sequence of training examples.
• It begins by initializing the version space to the set of all hypotheses in H; that is, by
initializing the G boundary set to contain the most general hypothesis in H
G0 ← { <?, ?, ?, ?, ?, ?> }
• and initializing the S boundary set to contain the most specific hypothesis
S0 ← { <Φ, Φ, Φ, Φ, Φ, Φ> }
Candidate Elimination Algorithm (cont.)
• These two boundary sets delimit the entire hypothesis space, because every other
hypothesis in H is both more general than S0 and more specific than G0.
• As each training example is considered, the S and G boundary sets are generalized and
specialized, respectively, to eliminate from the version space any hypotheses found
inconsistent with the new training example.
• After all examples have been processed, the computed version space contains all the
hypotheses of hypothesis space H consistent with these examples
Candidate Elimination Algorithm (cont.)
Initialization
G ← maximally general hypotheses in H
S ← maximally specific hypotheses in H
For each training example d, do
• If d is positive
Remove from G every hypothesis inconsistent with d
For each hypothesis s in S that is inconsistent with d
oRemove s from S
oAdd to S all minimal generalizations h of s such that
1. h is consistent with d, and
2. some member of G is more general than h
Remove from S every hypothesis that is more general than another hypothesis in S
Candidate Elimination Algorithm (cont.)
• If d is a negative example
Remove from S every hypothesis inconsistent with d
For each hypothesis g in G that is inconsistent with d
o Remove g from G
o Add to G all minimal specializations h of g such that
1. h is consistent with d, and
2. some member of S is more specific than h
Remove from G every hypothesis that is less general than another hypothesis in G
• Essentially use
Positive TEs to generalize S
Negative TEs to specialize G
• Independent of order of TEs
• Convergence guaranteed if:
No errors
There is h in H describing c.
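The two branches above can be sketched compactly under the tuple representation used earlier (naming is ours). The sketch is written for conjunctive hypothesis spaces like EnjoySport, where the minimal generalization of a hypothesis is unique and S stays a singleton, so the S-pruning step is not needed:

```python
def satisfies(h, x):
    return all(c == "?" or c == v for c, v in zip(h, x))

def more_general_or_equal(h1, h2):
    return all(c1 == "?" or c1 == c2 or c2 == "Φ" for c1, c2 in zip(h1, h2))

def min_generalize(s, x):
    """Minimal generalization of s so that it covers positive instance x."""
    return tuple(v if c == "Φ" else (c if c == v else "?") for c, v in zip(s, x))

def min_specializations(g, x, domains):
    """Minimal specializations of g that exclude negative instance x."""
    specs = []
    for i, c in enumerate(g):
        if c == "?":
            specs += [g[:i] + (v,) + g[i + 1:] for v in domains[i] if v != x[i]]
    return specs

def candidate_elimination(examples, domains):
    S = [("Φ",) * len(domains)]
    G = [("?",) * len(domains)]
    for x, positive in examples:
        if positive:
            G = [g for g in G if satisfies(g, x)]
            S = [min_generalize(s, x) for s in S]
            S = [s for s in S if any(more_general_or_equal(g, s) for g in G)]
        else:
            S = [s for s in S if not satisfies(s, x)]
            G = [h for g in G
                 for h in (min_specializations(g, x, domains)
                           if satisfies(g, x) else [g])]
            G = [h for h in G if any(more_general_or_equal(h, s) for s in S)]
            G = [h for h in G
                 if not any(h2 != h and more_general_or_equal(h2, h) for h2 in G)]
    return S, G

domains = [("Sunny", "Cloudy", "Rainy"), ("Warm", "Cold"), ("Normal", "High"),
           ("Strong", "Weak"), ("Warm", "Cold"), ("Same", "Change")]
D = [
    (("Sunny", "Warm", "Normal", "Strong", "Warm", "Same"), True),
    (("Sunny", "Warm", "High", "Strong", "Warm", "Same"), True),
    (("Rainy", "Cold", "High", "Strong", "Warm", "Change"), False),
    (("Sunny", "Warm", "High", "Strong", "Cool", "Change"), True),
]
S, G = candidate_elimination(D, domains)
print(S)  # [('Sunny', 'Warm', '?', 'Strong', '?', '?')]
print(G)  # [('Sunny', '?', '?', '?', '?', '?'), ('?', 'Warm', '?', '?', '?', '?')]
```

On the four EnjoySport examples this reproduces the S and G boundary sets derived in the worked example.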
Example
S0 {〈∅ ∅ ∅ ∅ ∅ ∅〉} G0 {〈? ? ? ? ? ?〉}
x1 = 〈Sunny Warm Normal Strong Warm Same〉 +
S1 {〈Sunny Warm Normal Strong Warm Same〉} G1 {〈? ? ? ? ? ?〉}
x2 = 〈Sunny Warm High Strong Warm Same〉 +
S2 {〈Sunny Warm ? Strong Warm Same〉} G2 {〈? ? ? ? ? ?〉}
x3 = 〈Rainy Cold High Strong Warm Change〉 −
S3 {〈Sunny Warm ? Strong Warm Same〉}
G3 {〈Sunny ? ? ? ? ?〉, 〈? Warm ? ? ? ?〉, 〈? ? ? ? ? Same〉}
Current G boundary is incorrect
So, need to make it more specific.
Example (cont.)
• Given that there are six attributes that could be specified to specialize G2, there are only three
new hypotheses in G3
• For example, the hypothesis h = <?, ?, Normal, ?, ?, ?> is a minimal specialization of G2
that correctly labels the new example as a negative example, but it is not included in G3.
The reason this hypothesis is excluded is that it is inconsistent with S2.
The algorithm determines this simply by noting that h is not more general than the
current specific boundary, S2.
Example (cont.)
• In fact, the S boundary of the version space forms a summary of the previously encountered
positive examples that can be used to determine whether any given hypothesis is consistent
with these examples.
• The G boundary summarizes the information from previously encountered negative
examples. Any hypothesis more specific than G is assured to be consistent with past negative
examples
Example (cont.)
• The fourth training example further generalizes the S boundary of the version space.
It also results in removing one member of the G boundary, because this member fails to
cover the new positive example.
To understand the rationale for this step, it is useful to consider why the offending
hypothesis must be removed from G.
Example (cont.)
• Notice it cannot be specialized, because specializing it would not make it cover the new
example.
• It also cannot be generalized, because by the definition of G, any more general hypothesis
will cover at least one negative training example.
• Therefore, the hypothesis must be dropped from the G boundary, thereby removing an entire
branch of the partial ordering from the version space of hypotheses remaining under
consideration
Version Space of the Example
S: {〈Sunny Warm ? Strong ? ?〉}
〈Sunny ? ? Strong ? ?〉 〈Sunny Warm ? ? ? ?〉 〈? Warm ? Strong ? ?〉
G: {〈Sunny ? ? ? ? ?〉, 〈? Warm ? ? ? ?〉}
Convergence of algorithm
• Convergence guaranteed if:
No errors
There is h in H describing c.
• Ambiguity removed from VS when S = G
Containing single h
When have seen enough TEs
• If there is a false-negative TE (a positive instance mislabeled as negative), the algorithm will
remove every h that covers it, and hence will remove the correct target concept from VS
If we observe enough such TEs, the S and G boundaries converge to an empty VS
Which Next Training Example?
• Assume the learner can choose the next TE
• It should choose d such that it maximally reduces the number of hypotheses in VS
• Best TE: satisfied by precisely 50% of the hypotheses
o This cannot always be done; when it can, the correct target concept can be reached in
only log2 |VS| experiments
o Example: 〈Sunny Warm Normal Weak Warm Same〉
o Satisfied by only 3 hypotheses, i.e. 50% of the total hypotheses
o If positive, generalizes S; if negative, specializes G
• Order of examples matters for intermediate sizes of S, G; not for the final S, G
Classifying new cases using VS
• Use a voting procedure on the following examples:
〈Sunny Warm Normal Strong Cool Change〉 – (+ by all) – classified as positive with
confidence. This is possible when all the hypotheses of S satisfy the new instance
〈Rainy Cool Normal Weak Warm Same〉 – (− by all) – classified as negative with
confidence. This is possible when the new instance is not satisfied by any hypothesis of G
〈Sunny Warm Normal Weak Warm Same〉 – (+ by 3 and − by 3) – more TEs needed;
cannot decide
〈Sunny Cold Normal Strong Warm Same〉 – (+ by 2 and − by 4) – classified as negative
with 67% confidence
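The four votes above can be reproduced by counting, for each new instance, how many of the six version-space hypotheses it satisfies (a minimal sketch; `positive_votes` is our name):

```python
def satisfies(h, x):
    return all(c == "?" or c == v for c, v in zip(h, x))

# The six hypotheses of the final EnjoySport version space
VS = [
    ("Sunny", "Warm", "?", "Strong", "?", "?"),
    ("Sunny", "?", "?", "Strong", "?", "?"),
    ("Sunny", "Warm", "?", "?", "?", "?"),
    ("?", "Warm", "?", "Strong", "?", "?"),
    ("Sunny", "?", "?", "?", "?", "?"),
    ("?", "Warm", "?", "?", "?", "?"),
]

def positive_votes(x):
    """How many version-space hypotheses classify x as positive."""
    return sum(satisfies(h, x) for h in VS)

print(positive_votes(("Sunny", "Warm", "Normal", "Strong", "Cool", "Change")))  # 6: positive with confidence
print(positive_votes(("Rainy", "Cool", "Normal", "Weak", "Warm", "Same")))      # 0: negative with confidence
print(positive_votes(("Sunny", "Warm", "Normal", "Weak", "Warm", "Same")))      # 3: tie, cannot decide
print(positive_votes(("Sunny", "Cold", "Normal", "Strong", "Warm", "Same")))    # 2: negative, 4/6 = 67% confidence
```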
Effect of incomplete hypothesis space
• Preceding algorithms work if target function is in H
Will generally not work if target function not in H
• Consider following examples which represent target function
“sky = sunny or sky = cloudy”:
〈Sunny Warm Normal Strong Cool Change〉 Y
〈Cloudy Warm Normal Strong Cool Change〉 Y
〈Rainy Warm Normal Strong Cool Change〉 N
Effect of incomplete hypothesis space (cont.)
• If apply Candidate Elimination algorithm as before, end up with empty VS
After first two TEs, S= 〈? Warm Normal Strong Cool Change〉
New hypothesis is overly general
oit covers the third negative TE!
• Our H does not include the appropriate c
• Need more expressive hypotheses
Incomplete hypothesis space
• If c not in H, then consider generalizing representation of H to contain c
For example, add disjunctions or negations to representation of hypotheses in H
• One way to avoid problem is to allow all possible representations of h’s
Equivalent to allowing all possible subsets of instances as defining the concept of
EnjoySport
o Recall that there are 96 instances in EnjoySport; hence there are 2^96 ≈ 10^28 possible
hypotheses in the full space H
o Can do this by using full propositional calculus with AND, OR, NOT
o Hence H defined only by conjunctions of attributes is biased (containing only 973 h's)
Unbiased Learners and Inductive Bias
• BUT if have no limits on representation of hypotheses
(i.e., full logical representation: and, or, not), can only learn examples…no generalization
possible!
Say have 5 TEs {x1, x2, x3, x4, x5}, with x4, x5 negative TEs
• Apply Candidate Elimination algorithm
S will be disjunction of positive examples (S={x1 OR x2 OR x3})
G will be negation of disjunction of negative examples (G={not (x4 or x5)})
Need to use all instances to learn the concept!
Unbiased Learners and Inductive Bias
• “A learner that makes no a priori assumptions regarding the identity of the target concept has
no rational basis for classifying any unseen instances”
• Cannot predict usefully:
TEs have unanimous vote
other h’s have 50/50 vote!
oFor every h in H that predicts +, there is another that predicts -
• Approach:
Place constraints on representation of hypotheses
oExample of limiting connectives to conjunctions
oAllows learning of generalized hypotheses
oIntroduces bias that depends on hypothesis representation
Inductive System and Equivalent Deductive System
• Inductive bias (IB) of learning algorithm L is any minimal set of assertions B such that for any
target concept c and training examples D, we can logically infer value c(x) of any instance x
from B, D, and x
• L(x, D) = k implies that all members of VS_{H,D}, including c, vote for class k (unanimous
voting). Therefore c(x) = k = L(x, D).
• This means that the output of the learner L(x, D) can be logically deduced from B ∧ D ∧ x