Concept Learning Presentation (based on Tom M. Mitchell's Machine Learning)
1.
Concept Learning and the General-to-Specific Ordering
You are all encouraged to study from the referred textbook, or you can go to VTUPulse.
Textbook: Tom Mitchell, "Machine Learning", McGraw Hill, 1997.
2.
Concept Learning and General-to-Specific Ordering
• Concept Learning Task
• Concept Learning as Search
• Find-S Algorithm
• Version Spaces & Candidate-Elimination
• Remarks on Version Spaces
• Inductive Bias
3.
Concept Learning Task
• Goal: Identify the target concept from the given examples.
• Instances described by attributes (e.g., Sky, Temp, etc.).
• Output: Hypothesis h that matches the target concept c.
• Challenge: We only have training examples, not all instances.
Inductive Learning Hypothesis
• If a hypothesis fits the training data well, it will likely fit unseen data.
• Core assumption behind inductive learning.
• Cannot be proven universally; it is an inductive bias.
• We can only guarantee:
• h(x) = c(x) for all training examples seen.
• For unseen examples, it is a guess.
6.
Concept Learning as Search
• Hypothesis space: All hypotheses expressible
in chosen representation.
• Learning = searching for the best hypothesis in
this space.
• Choice of representation defines what can be
learned.
• Example: EnjoySport task — small finite
hypothesis space (973 semantic hypotheses).
General-to-Specific Ordering
• Hypotheses can be arranged by how many instances they accept.
• Most general: accepts all instances.
Example: <?,?,?,?,?,?>
• Most specific: accepts none.
Example: <Ø, Ø, Ø, Ø, Ø, Ø>
• Learning algorithms can navigate this
structure efficiently.
9.
Example: Consider two hypotheses h1 and h2.
For the above example, we can say h2 is more general than or equal to h1 (written h2 ≥g h1): every instance that satisfies h1 also satisfies h2.
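This ordering can be sketched as a short Python check, assuming the attribute-vector hypothesis representation used throughout these slides ('?' accepts any value); the attribute names and values below are made up for illustration:

```python
def more_general_or_equal(hg, hs):
    # hg >=g hs: every instance that satisfies hs also satisfies hg.
    # Assumes neither hypothesis contains the null constraint Ø.
    return all(g == '?' or g == s for g, s in zip(hg, hs))

# Hypothetical hypotheses over three attributes (Sky, Temp, Humidity)
h1 = ('Sunny', 'Warm', 'Normal')   # more specific
h2 = ('Sunny', '?', '?')           # more general
print(more_general_or_equal(h2, h1))  # True
print(more_general_or_equal(h1, h2))  # False
```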
10.
Find-S Algorithm
• Starts with the most specific hypothesis.
• Generalizes to cover positive examples.
• Stops when all positive examples are covered.
• Limitation: Ignores negative examples.
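The steps above can be sketched in a few lines of Python; the dataset below is a made-up example in the attribute-vector representation (Ø written as '0'):

```python
def find_s(examples):
    # Find-S: start with the most specific hypothesis (all null constraints)
    # and minimally generalize it over the positive examples only.
    h = ['0'] * len(examples[0][0])
    for x, label in examples:
        if label != 'Positive':        # negative examples are ignored
            continue
        for i, (hc, xc) in enumerate(zip(h, x)):
            if hc == '0':
                h[i] = xc              # first positive example: copy the value
            elif hc != xc:
                h[i] = '?'             # conflicting values: generalize to '?'
    return h

# Made-up data in the Size/Color/Shape format used later in these slides
examples = [
    (('Small', 'Red', 'Circle'), 'Positive'),
    (('Big', 'Red', 'Triangle'), 'Negative'),
    (('Small', 'Blue', 'Circle'), 'Positive'),
]
print(find_s(examples))  # ['Small', '?', 'Circle']
```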
Limitations of Find-S Algorithm
• Considers only positive examples, ignores negative ones.
• Sensitive to noise – a single incorrect example can mislead the
hypothesis.
• Cannot handle incomplete or inconsistent data.
• Produces only the most specific hypothesis, does not represent
all possible consistent hypotheses.
• Assumes the target concept exists in the hypothesis space
(may fail if it doesn’t).
19.
When is a Hypothesis Consistent?
A hypothesis is said to be consistent with the training examples if
it correctly classifies all the training examples.
That means:
● For every positive example, the hypothesis predicts positive.
● For every negative example, the hypothesis predicts negative.
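The definition translates directly to code; `matches` implements h(x) for the attribute-vector representation, and the labeled examples are hypothetical:

```python
def matches(h, x):
    # h(x) = Positive iff every constraint is '?' or equals the attribute value.
    return all(hc == '?' or hc == xc for hc, xc in zip(h, x))

def consistent(h, examples):
    # h is consistent iff its prediction agrees with the label on every example.
    return all(matches(h, x) == (label == 'Positive') for x, label in examples)

# Hypothetical labeled examples over (Size, Color, Shape)
examples = [
    (('Small', 'Red', 'Circle'), 'Positive'),
    (('Big', 'Blue', 'Triangle'), 'Negative'),
]
print(consistent(('Small', '?', '?'), examples))  # True
print(consistent(('?', '?', '?'), examples))      # False: also matches the negative
```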
20.
Version Space
● The Version Space is the set of all hypotheses that are consistent with the training examples.
● It is bounded by the most specific hypothesis (S) and the most general hypothesis (G).
● As more training examples are added, the version space shrinks, narrowing down the set of possible target concepts.
21.
List-Then-Eliminate Algorithm
Idea: Start with all hypotheses in the hypothesis space, then eliminate those that are inconsistent with the training examples.
Steps:
● Initialize the version space VS to contain all hypotheses in the hypothesis space H.
● For each training example (x, c(x)):
○ Remove from VS any hypothesis h such that h(x) ≠ c(x).
● After processing all examples, the remaining hypotheses in VS are consistent with the data.
Output:
● The version space containing all consistent hypotheses.
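A brute-force sketch of List-Then-Eliminate, assuming the small Size/Color/Shape domain from the worked example later in the deck (hypotheses containing Ø are omitted, since they become inconsistent as soon as a positive example appears):

```python
from itertools import product

def matches(h, x):
    # h(x) = Positive iff every constraint is '?' or equals the attribute value.
    return all(hc == '?' or hc == xc for hc, xc in zip(h, x))

def list_then_eliminate(H, examples):
    vs = list(H)
    for x, label in examples:
        # Keep only hypotheses whose prediction h(x) agrees with c(x).
        vs = [h for h in vs if matches(h, x) == (label == 'Positive')]
    return vs

# Enumerate all conjunctive hypotheses over the assumed domains: 3*3*3 = 27
domains = [('Big', 'Small'), ('Red', 'Blue'), ('Circle', 'Triangle')]
H = list(product(*[vals + ('?',) for vals in domains]))

# Made-up training data
examples = [
    (('Small', 'Red', 'Circle'), 'Positive'),
    (('Big', 'Blue', 'Triangle'), 'Negative'),
]
print(len(list_then_eliminate(H, examples)))  # 7 consistent hypotheses remain
```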
22.
Candidate Elimination Algorithm – Overview
● The Candidate Elimination Algorithm finds the version space efficiently by maintaining two boundary sets:
1. S (Specific boundary): The set of most specific hypotheses consistent with the data.
2. G (General boundary): The set of most general hypotheses consistent with the data.
23.
Candidate Elimination Algorithm – Overview
● Process:
○ Initialize S to the most specific hypothesis and G to the most general hypothesis.
○ For each training example:
■ If the example is positive → generalize S just enough to include it (and prune from G any hypothesis that excludes it).
■ If the example is negative → specialize G just enough to exclude it (and prune from S any hypothesis that includes it).
■ Remove hypotheses from S and G that are inconsistent with the training data.
○ After all examples, the version space is represented by all hypotheses between S and G.
● Key Point: Unlike Find-S, it uses both positive and negative examples and represents all consistent hypotheses, not just one.
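The process above can be sketched as follows. This is a simplified version (S is kept as a single conjunctive hypothesis, and handling of noisy or inconsistent data is omitted); the training data and attribute domains are made up:

```python
def matches(h, x):
    return all(hc == '?' or hc == xc for hc, xc in zip(h, x))

def more_general_or_equal(hg, hs):
    # hg >=g hs in the general-to-specific ordering.
    if '0' in hs:                      # hs matches nothing: any hg is vacuously >= hs
        return True
    return all(g == '?' or g == s for g, s in zip(hg, hs))

def min_generalize(s, x):
    # Minimal generalization of s that covers the positive instance x.
    return tuple(xc if sc == '0' else (sc if sc == xc else '?')
                 for sc, xc in zip(s, x))

def min_specializations(g, x, domains):
    # Minimal specializations of g that exclude the negative instance x.
    return [g[:i] + (v,) + g[i + 1:]
            for i, gc in enumerate(g) if gc == '?'
            for v in domains[i] if v != x[i]]

def candidate_elimination(examples, domains):
    S = ('0',) * len(domains)          # most specific hypothesis (Ø written as '0')
    G = {('?',) * len(domains)}        # most general hypothesis
    for x, label in examples:
        if label == 'Positive':
            G = {g for g in G if matches(g, x)}       # prune G
            if not matches(S, x):
                S = min_generalize(S, x)              # generalize S just enough
        else:
            new_G = set()
            for g in G:
                if not matches(g, x):
                    new_G.add(g)                      # already excludes x
                else:
                    for h in min_specializations(g, x, domains):
                        if more_general_or_equal(h, S):
                            new_G.add(h)
            # remove members less general than another member of G
            G = {g for g in new_G
                 if not any(g2 != g and more_general_or_equal(g2, g)
                            for g2 in new_G)}
    return S, G

# Made-up training data over the Size/Color/Shape domains
domains = [('Big', 'Small'), ('Red', 'Blue'), ('Circle', 'Triangle')]
examples = [
    (('Small', 'Red', 'Circle'), 'Positive'),
    (('Big', 'Blue', 'Triangle'), 'Negative'),
    (('Small', 'Blue', 'Circle'), 'Positive'),
]
S, G = candidate_elimination(examples, domains)
print('S =', S)          # ('Small', '?', 'Circle')
print('G =', sorted(G))  # [('?', '?', 'Circle'), ('Small', '?', '?')]
```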
25.
Worked Example
Dataset Attributes:
● Size ∈ {Big, Small}
● Color ∈ {Red, Blue}
● Shape ∈ {Circle, Triangle}
Key Idea:
We will trace the Specific (S) and General (G) boundaries as examples are processed.
26.
● Initialization:
○ S0 = (Ø, Ø, Ø) (most specific boundary)
○ G0 = (?, ?, ?) (most general boundary)
● After Example 1 (Negative):
○ S1 = (Ø, Ø, Ø)
○ G1 = {(Small, ?, ?), (?, Blue, ?), (?, ?, Triangle)}
● After Example 2 (Negative):
○ S2 = (Ø, Ø, Ø)
○ G2 = {(Small, Blue, ?), (Small, ?, Circle), (?, Blue, ?), (Big, ?, Triangle), (?, Blue, Triangle)}
27.
● After Example 3 (Positive):
○ S3 = (Small, Red, Circle)
○ G3 = (Small, ?, Circle)
● After Example 4 (Negative):
○ S4 = (Small, Red, Circle)
○ G4 = (Small, ?, Circle)
● After Example 5 (Positive):
○ S5 = (Small, ?, Circle)
○ G5 = (Small, ?, Circle)
The version space learned by the Candidate Elimination Algorithm for the given data set is:
S = G = (Small, ?, Circle)
Reference link: Candidate Elimination Algorithm Solved Example - 2 - VTUPulse.com
28.
Try it on your own
You can refer to Textbook 1 or learn from the VTUPulse website.
29.
Remarks on Version Spaces and Candidate-Elimination
1. Will Candidate Elimination Converge?
● Candidate Elimination converges to the correct hypothesis if:
○ No errors in training examples.
○ Target concept is representable in hypothesis space H.
● As more examples arrive:
○ Version space shrinks, reducing ambiguity.
○ Learning is complete when S and G converge to a single identical hypothesis.
Explanation:
• If the training data has errors, the correct hypothesis may be eliminated.
• This can lead to an empty version space, meaning no consistent hypothesis exists.
• An empty version space also occurs when the true concept is not expressible in H (e.g., when H supports only conjunctions (AND) but the true concept is disjunctive (OR)).
30.
2. What Training Example Should Be Requested Next?
● If learner can query for new examples, it should pick examples that help discriminate
among competing hypotheses.
● Good query: one that some hypotheses classify as positive and others as negative.
● Example: Instance (Sunny, Warm, Normal, Light, Warm, Same)
○ If labeled Positive → S expands.
○ If labeled Negative → G contracts.
Explanation (Simple):
● Best strategy: choose examples that split the version space roughly in half each time (half of the hypotheses predict Positive and half predict Negative).
● The correct target concept can therefore be found with only ⌈log₂ |VS|⌉ queries.
● Similar to the "20 Questions" game → each query narrows down the possible hypotheses fastest.
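The ⌈log₂ |VS|⌉ bound is easy to check numerically:

```python
import math

def queries_needed(vs_size):
    # Each perfectly balanced query halves the version space, so
    # ceil(log2 |VS|) queries suffice to isolate a single hypothesis.
    return math.ceil(math.log2(vs_size))

print(queries_needed(8))    # 3, as in the 8-hypothesis walkthrough
print(queries_needed(973))  # 10, for EnjoySport's 973 hypotheses
```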
31.
Example: Query Strategy in Version Space
Step 0: Initial Version Space
|VS| = 8: {h1, h2, h3, h4, h5, h6, h7, h8}
Step 1: First Query
● Pick an instance where half predict Positive and half Negative.
● Suppose the teacher says Positive → we eliminate all hypotheses that said Negative.
● Remaining: |VS| = 4: {h1, h2, h3, h4}
Step 2: Second Query
● Again, choose an instance splitting the 4 hypotheses into 2 Positive & 2 Negative.
● The teacher says Negative → eliminate the 2 Positive ones.
● Remaining: |VS| = 2: {h3, h4}
32.
Step 3: Third Query
● Choose an instance that splits the 2 remaining hypotheses.
● The teacher says Positive → only 1 hypothesis remains.
● Remaining: |VS| = 1: {h3}
● (Target concept found)
Summary
● Started with 8 hypotheses.
● After 3 queries, found the target concept.
● Matches the formula: log₂ |VS| = log₂(8) = 3
33.
3. How Can Partially Learned Concepts Be Used?
● Even if version space has multiple hypotheses, we can sometimes classify
unseen instances confidently:
○ If all hypotheses agree Positive → classify as Positive.
○ If all hypotheses agree Negative → classify as Negative.
● Ambiguous cases = best candidates for next training examples (active
learning idea).
34.
3. How Can Partially Learned Concepts Be Used?
Example:
• Instance A: all hypotheses → Positive → safe classification.
• Instance B: all hypotheses → Negative → safe classification.
• Instance C/D: mixed votes → uncertain; may use a majority vote.
Note: D can be classified as negative using the majority of votes, but C cannot be explicitly classified, since half of the version space classifies it as positive and half as negative.
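Classification with a partially learned concept can be sketched by letting every hypothesis in the version space vote; the three-hypothesis version space and the instances below are hypothetical:

```python
def matches(h, x):
    # h(x) = Positive iff every constraint is '?' or equals the attribute value.
    return all(hc == '?' or hc == xc for hc, xc in zip(h, x))

def classify_with_version_space(vs, x):
    # Unanimous vote -> confident label; otherwise report the split
    # (using the majority is only a heuristic).
    votes = [matches(h, x) for h in vs]
    if all(votes):
        return 'Positive'
    if not any(votes):
        return 'Negative'
    return f'Uncertain ({sum(votes)} of {len(votes)} vote Positive)'

# Hypothetical version space in the Size/Color/Shape format used earlier
vs = [('Small', '?', 'Circle'), ('Small', '?', '?'), ('?', '?', 'Circle')]
print(classify_with_version_space(vs, ('Small', 'Red', 'Circle')))  # Positive
print(classify_with_version_space(vs, ('Big', 'Blue', 'Triangle'))) # Negative
print(classify_with_version_space(vs, ('Big', 'Red', 'Circle')))    # Uncertain (1 of 3 vote Positive)
```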
35.
Inductive Bias
1: What is Inductive Bias?
● Definition:
Inductive bias = assumptions made by a learner to generalize beyond training data.
● Without bias:
○ Any hypothesis may fit the data.
○ Learner cannot decide how to classify unseen examples.
● Key Question:
Should we make H (hypothesis space) very large (unbiased) or restrict it (biased)?
(Example: "Sky = Sunny OR Cloudy" cannot be represented in a conjunctive H → shows the need for an expressive H)
36.
2: Biased vs. Unbiased Hypothesis Spaces
● Biased Hypothesis Space
○ Restricts forms of hypotheses (e.g., only conjunctions).
○ May fail if target concept is not representable in H.
● Unbiased Hypothesis Space (Power Set of X):
○ Can represent all possible concepts.
○ But → no generalization beyond training data.
○ S = disjunction of positives, G = negated disjunction of negatives → only training examples classified.
● Example:
If we present three positive examples (x1, x2, x3) and two negative examples (x4, x5) to the learner, the S boundary will consist of {(x1 ∨ x2 ∨ x3)} and the G boundary will consist of {¬(x4 ∨ x5)}.
● Conclusion:
Too restrictive → miss target concept.
Too broad → cannot generalize.
37.
3: The Futility of Bias-Free Learning
● Key Insight:
A completely unbiased learner has no rational basis for classifying unseen
data.
● Candidate-Elimination works only because of implicit bias:
○ Assumes the target concept is representable as conjunction of
attributes.
● If this assumption is wrong → misclassification is guaranteed.
● Therefore: Every inductive learner must employ some bias.
38.
4. Deduction vs Induction & Inductive Bias
Deductive Inference
● General rule → specific case.
● If the premises are true, the conclusion is guaranteed.
● Example:
○ All humans are mortal.
○ Socrates is a human.
○ ⇒ Socrates is mortal.
Inductive Inference
● Specific cases → general rule.
● The conclusion is probable, not guaranteed.
● Example:
○ Socrates, Plato, and Aristotle are mortal.
○ ⇒ All humans are mortal?
Inductive Bias
● Extra assumptions that allow a learner's inductive guesses to be justified as if they were deductive proofs.
● Minimal set = the smallest set of assumptions needed.
● Example (Candidate Elimination):
○ Bias = "The target concept c ∈ H."
39.
5: Formal Definition of Inductive Bias
The inductive bias of a learner L is any minimal set of assertions B such that, for any target concept c and training examples Dc:
(∀x ∈ X) [(B ∧ Dc ∧ x) ⊢ L(x, Dc)]
where the notation y ⊢ z indicates that z follows deductively from y (i.e., that z is provable from y).
Example with Candidate Elimination:
Bias B = "The target concept c ∈ H."
Training data D = set of labeled examples.
New instance x.
Then the classification L(x, D) is provable if and only if all hypotheses in the version space agree.
41.
6: Comparing Strength of Biases
● Rote-Learner:
○ No bias (only memorizes).
○ Classifies only seen examples.
● Candidate-Elimination:
○ Bias = "c ∈ H" (the target concept c is contained in the given hypothesis space H).
○ Classifies if all hypotheses in the version space agree.
● Find-S:
○ Stronger bias:
■ Assumes "c ∈ H" (the target concept c is contained in the given hypothesis space H).
■ Assumes instances not covered by the hypothesis are negative.
● Observation:
○ Stronger bias → more generalization, but riskier if the assumptions are wrong.
42.
Thank You
For any queries you can reach out to me
Contact no. - 9162884594
Email - amresh@mitkundapura.com
Room no. - AD202(Temporary)
Editor's Notes
#29: Can show using page 49. For example, suppose the second training example above is incorrectly presented as a negative example instead of a positive example. Given sufficient additional training data, the learner will eventually detect the inconsistency by noticing that the S and G boundary sets converge to an empty version space. A similar symptom will appear when the training examples are correct but the target concept cannot be described in the hypothesis representation (e.g., if the target concept is a disjunction of feature attributes and the hypothesis space supports only conjunctive descriptions).
#39: "Minimal set" here means the smallest collection of assumptions that you need to add to the training data so that the learner's predictions make logical sense. If you remove any assumption from this set, you will no longer be able to justify the learner's predictions.