Lecture - 2
CSEE-4142 & PHDCS-834: Machine Learning
Machine Learning
• Machine learning is programming computers to optimize a performance criterion using example
data or past experience.
• There is no need to “learn” to calculate payroll
• Learning is used when:
 Human expertise does not exist (navigating on Mars),
 Humans are unable to explain their expertise (speech recognition)
 Solution changes in time (routing on a computer network)
 Solution needs to be adapted to particular cases (user biometrics)
What to Learn
• Learning general models from data of particular examples
• Data is cheap and abundant (data warehouses, data marts); knowledge is expensive and scarce.
• Example in retail: Customer transactions to consumer behavior:
 People who bought “Blink” also bought “Outliers” (www.amazon.com)
 The sales of diapers and beer were correlated on Friday evening (Walmart)
• Build a model that is a good and useful approximation to the data.
Well-posed Problem
• A learning problem is called well-posed if a solution to it exists, that solution is unique, and the solution depends on the data/experience but is not sensitive to (reasonably small) changes in the data/experience.
• In general, to have a well-defined learning problem, we must identify these three features:
 The class of tasks
 The measure of performance to be improved
 The source of experience
Machine Learning Concept in a Nutshell
• Machine learning is a subfield of Artificial Intelligence (AI) concerned with developing computational theories of learning and with building learning machines.
• Learning is the phenomenon or process of gaining new symbolic knowledge and developing cognitive skills through instruction and practice.
• It is also the discovery of new facts and theories through observation and experiment.
• Machine learning is programming computers to optimize a performance criterion using example data or past experience.
• It is very hard to write a program that solves a problem like recognizing a human face, because we do not know how our brain does it.
• Instead of writing a program by hand, it is possible to collect many examples that specify the correct output for a given input.
Concept of Machine Learning (cont.)
• A machine learning algorithm then takes these examples and produces a program that does the job.
• The main goal of machine learning is to devise learning algorithms that learn automatically, without human intervention or assistance.
• The machine learning paradigm can be viewed as ‘programming by example’.
• Another goal is to develop computational models of the human learning process and to perform computer simulations of it.
• That is, the goal of machine learning is to build computer systems that can adapt and learn from their experience.
Reasons for Using Machine Learning
• Machine learning algorithms can figure out how to perform important tasks by generalizing from examples.
• Machine learning algorithms discover the relationships between the variables of a system (input, output and hidden) from direct samples of the system.
• Some real-world problems, such as recognizing a person from their voice, cannot be defined well.
• Relationships and correlations can be hidden within large amounts of data.
Reasons for Using Machine Learning (cont.)
• To solve such problems, machine learning and data mining may be used to find the relationships.
• The amount of knowledge available about a certain task might be too large for explicit encoding by humans.
• Environments change over time.
• New knowledge about tasks is constantly being discovered.
Phases for Machine Learning
• Machine learning typically follows three phases:
1. Training: A training set of examples of correct behaviour is analysed and some representation of the newly learnt knowledge is stored, typically in some form of rules.
2. Validation: The rules are checked and, if necessary, additional training is given. Sometimes additional test data is used. A human expert or automatic knowledge-based components may be used to validate the rules. The role of the tester is often called the opponent.
3. Application: The rules are used in responding to new situations.
Designing a learning system
1. Data: Choose the training experience
D = {d1, d2, ..., dn}
2. Feature Selection
• Features depend on the problem. Measure ‘relevant’ quantities.
• Some techniques are available to extract ‘more relevant’ quantities from the initial measurements (e.g., PCA).
• After feature extraction, each pattern is a vector.
3. Model selection:
(a) Select a model or a set of models (with parameters)
E.g., y = ax + b + ε, where ε ~ N(0, σ²)
This determines exactly what type of knowledge will be learned and how it will be used by the performance program.
Designing a learning system (cont.)
(b) Select the error function to be optimized, e.g., the mean squared error
(1/n) Σᵢ₌₁ⁿ (yᵢ − f(xᵢ))²
4. Learning:
Find the set of parameters optimizing the error function
– The model and parameters with the smallest error
5. Application (Evaluation):
• Apply the learned model
– E.g., predict y for new inputs x using the learned f(x)
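The following is a minimal end-to-end sketch of these steps in Python, assuming the linear model y = ax + b from the model-selection step and the squared-error criterion above; the toy data, seed, and "true" parameter values are invented for illustration.

import numpy as np

# 1./2. Data and features: here each pattern is a single number x with target y
rng = np.random.default_rng(0)
x = rng.uniform(0, 10, size=50)
y = 2.0 * x + 1.0 + rng.normal(0.0, 0.5, size=50)  # hidden truth: a=2, b=1

# 3. Model selection: y = a*x + b + eps, with eps ~ N(0, sigma^2)
# 4. Learning: choose (a, b) minimizing (1/n) * sum_i (y_i - f(x_i))^2
a, b = np.polyfit(x, y, deg=1)  # least-squares fit of a degree-1 polynomial

# 5. Application: predict y for a new input using the learned f(x)
x_new = 4.2
print(f"a={a:.3f}, b={b:.3f}, f({x_new})={a * x_new + b:.3f}")
print("training MSE:", np.mean((y - (a * x + b)) ** 2))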
Concept Learning
• Inducing general functions from specific training examples is a main issue of machine learning.
• Acquiring the definition of a general category from given positive and negative training examples of the category is known as concept learning.
• A machine learning hypothesis is a candidate model that approximates a target function mapping inputs to outputs.
• The hypothesis space has a general-to-specific ordering of hypotheses, and the search can be efficiently organized by taking advantage of this naturally occurring structure over the hypothesis space.
Concept Learning (cont.)
• Concept learning is formally defined as ‘inferring a Boolean-valued function from training examples of its input and output’.
• Concept learning involves determining a mapping from a set of input variables to a Boolean variable. Such methods are known as inductive learning methods.
• If a function can be found which maps the training data to correct classifications, the expectation is that it will also work well for unseen data. This process is known as generalization.
A concept learning task
• An example of concept learning is learning the EnjoySport concept from given positive and negative examples.
• We are trying to learn the definition of a concept from given examples.
Table. EnjoySport Training Examples
Example | Sky | AirTemp | Humidity | Wind | Water | Forecast | EnjoySport
1 | Sunny | Warm | Normal | Strong | Warm | Same | Yes
2 | Sunny | Warm | High | Strong | Warm | Same | Yes
3 | Rainy | Cold | High | Strong | Warm | Change | No
4 | Sunny | Warm | High | Strong | Cool | Change | Yes
A concept learning task (cont.)
• A set of example days, each described by six attributes.
• The task is to learn to predict the value of EnjoySport for an arbitrary day, based on the values of its attributes.
• Each hypothesis consists of a conjunction of constraints on the instance attributes.
• A hypothesis is a vector with one constraint for each attribute.
A concept learning task (cont.)
• In this example, each hypothesis will be a vector of six constraints, specifying the values of
the six attributes – (Sky, AirTemp, Humidity, Wind, Water, and Forecast).
 Indicate by a ? that any value is acceptable for this attribute
 Specify a single required value for the attribute
 Indicate by Φ that no value is acceptable
• The most general hypothesis – that every day is a positive example <?,?,?,?,?,?>
A concept learning task (cont.)
• The most specific hypothesis – that no day is a positive example
<Φ, Φ, Φ, Φ, Φ, Φ>
• If some instance x satisfies all the constraints of hypothesis h, then h classifies x as a
positive example (h(x)=1)
• To illustrate: the hypothesis that the person enjoys their favourite sport only on cold days with high humidity (independent of the other attributes) is represented by the expression
<?, Cold, High, ?, ?, ?>
• The EnjoySport concept learning task requires learning the set of days for which EnjoySport = yes, describing this set by a conjunction of constraints over the instance attributes.
A concept learning task
• Given
– Instances X : set of all possible days, each described by the attributes
Sky – (values: Sunny, Cloudy, Rainy)
AirTemp – (values: Warm, Cold)
Humidity – (values: Normal, High)
Wind – (values: Strong, Weak)
Water – (values: Warm, Cold)
Forecast – (values: Same, Change)
– Target Concept (Function) c : EnjoySport : X → {0,1}
– Hypotheses H : Each hypothesis is described by a conjunction of
constraints on the attributes.
– Training Examples D : positive and negative examples of the target function
Determine
– A hypothesis h in H such that h(x) = c(x) for all x in D.
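As a concrete illustration of this representation, here is a small Python sketch; the tuple encoding and the helper name satisfies are our own choices for illustration, not part of the original task definition.

# A hypothesis is a 6-tuple of constraints: '?' accepts any value,
# 'Φ' accepts no value, and anything else requires that exact value.
def satisfies(x, h):
    """True iff instance x satisfies every constraint of h, i.e. h(x) = 1."""
    return all(c == '?' or c == v for c, v in zip(h, x))

h = ('?', 'Cold', 'High', '?', '?', '?')  # cold day with high humidity
x = ('Sunny', 'Cold', 'High', 'Strong', 'Warm', 'Same')
print(satisfies(x, h))           # True: h classifies x as positive
print(satisfies(x, ('Φ',) * 6))  # False: a 'Φ' constraint matches nothing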
Inductive learning hypothesis
• Any hypothesis found to approximate the target function well over a sufficiently large set of
training examples will also approximate the target function well over other unobserved
examples
• Although the learning task is to determine a hypothesis h identical to the target concept c over the entire set of instances X, the only information available about c is its value over the training examples.
• Inductive learning algorithms can at best guarantee that the output hypothesis fits the target concept over the training data.
• Lacking any further information, our assumption is that the best hypothesis regarding unseen instances is the hypothesis that best fits the observed training data. This is the fundamental assumption of inductive learning.
Concept learning as search
• Concept learning can be viewed as the task of searching through a large space of hypotheses implicitly defined by the hypothesis representation.
• The goal of this search is to find the hypothesis that best fits the training examples.
• By selecting a hypothesis representation, the designer of the learning algorithm implicitly defines the space of all hypotheses that the program can ever represent, and therefore can ever learn.
Enjoy Sport - Hypothesis Space
• Sky has 3 possible values, and the other 5 attributes have 2 possible values each.
• There are 96 (= 3·2·2·2·2·2) distinct instances in X.
• A similar calculation shows that there are 5120 (= 5·4·4·4·4·4) syntactically distinct hypotheses in H – two more values (? and Φ) for each attribute.
• However, every hypothesis containing one or more Φ symbols represents the empty set of instances; that is, it classifies every instance as negative.
Enjoy Sport - Hypothesis Space (cont.)
• Therefore, the number of semantically distinct hypotheses is only 973 (= 1 + 4·3·3·3·3·3).
• EnjoySport is a very simple learning task, with a small, finite hypothesis space.
• Most practical learning tasks have much larger (even infinite) hypothesis spaces.
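These counts are easy to verify; the short Python computation below mirrors the arithmetic on the two slides above.

from math import prod

domain_sizes = [3, 2, 2, 2, 2, 2]  # Sky has 3 values, the other attributes 2

instances = prod(domain_sizes)                    # 3*2*2*2*2*2 = 96
syntactic = prod(d + 2 for d in domain_sizes)     # add '?' and 'Φ' per attribute: 5*4*4*4*4*4 = 5120
semantic = 1 + prod(d + 1 for d in domain_sizes)  # one empty concept + Φ-free hypotheses: 1 + 4*3*3*3*3*3 = 973
print(instances, syntactic, semantic)             # 96 5120 973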
General-to-Specific Ordering of Hypotheses
• Many algorithms for concept learning organize the search through the hypothesis space by
relying on a general-to-specific ordering of hypotheses.
• By taking advantage of this naturally occurring structure over the hypothesis space, we can
design learning algorithms that exhaustively search even infinite hypothesis spaces without
explicitly enumerating every hypothesis.
• Consider two hypotheses
h1 = (Sunny, ?, ?, Strong, ?, ?)
h2 = (Sunny, ?, ?, ?, ?, ?)
General-to-Specific Ordering of Hypotheses
(cont.)
• Now consider the sets of instances that are classified positive by h1 and by h2.
 Because h2 imposes fewer constraints on the instance, it classifies more instances as
positive.
 In fact, any instance classified positive by h1 will also be classified positive by h2.
 Therefore, we say that h2 is more general than h1.
More-General-Than Relation
• For any instance x in X and hypothesis h in H, we say that x satisfies h if and only if
h(x) = 1.
• More-General-Than-Or-Equal Relation:
Let h1 and h2 be two Boolean-valued functions defined over X. Then h1 is more-general-than-or-equal-to h2 (written h1 ≥ h2) if and only if any instance that satisfies h2 also satisfies h1,
i.e., ∀x ∈ X [ h2(x) = 1 → h1(x) = 1 ]
• h1 is more-general-than h2 (h1 > h2) if and only if h1 ≥ h2 is true and h2 ≥ h1 is false.
• We also say h2 is more-specific-than h1.
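For conjunctive hypotheses this relation can be tested attribute by attribute, rather than by enumerating all of X; a minimal sketch (the helper name is ours):

def more_general_or_equal(h1, h2):
    """h1 >= h2: every instance that satisfies h2 also satisfies h1."""
    if 'Φ' in h2:  # h2 matches no instance at all, so any h1 is >= h2
        return True
    return all(a == '?' or a == b for a, b in zip(h1, h2))

h1 = ('Sunny', '?', '?', 'Strong', '?', '?')
h2 = ('Sunny', '?', '?', '?', '?', '?')
print(more_general_or_equal(h2, h1))  # True: h2 is more general than h1
print(more_general_or_equal(h1, h2))  # False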
More-General-Relation
• Instances, hypotheses and the
more_general_than relation
• The box on the left represents the set
X of all instances
• The box on the right is the set of all
hypotheses H
• Each hypothesis corresponds to some subset of X – the subset of instances that it classifies as positive
More-General-Relation (cont.)
• The arrows connecting hypotheses represent the more_general_than relation, with the arrow pointing toward the less general hypothesis.
• Note that the subset of instances characterized by h2 subsumes the subset characterized by h1, so h2 is more_general_than h1.
• But there is no more-general relation between h1 and h3.
Find-S: Finding a Maximally Specific Hypothesis
• In Find-S algorithm, the ‘more_general_than’ partial ordering is used to organize a search
for a hypothesis consistent with the observed training examples.
• The algorithm begins with the most specific possible hypothesis in H.
• This hypothesis is then generalized each time it fails to cover an observed positive training example. We say that a hypothesis ‘covers’ a positive example if it correctly classifies the example as positive.
Find-S Algorithm
• The algorithm is given below
1. Initialize h to the most specific hypothesis in H
2. For each positive training instance x
For each attribute constraint ai in h
IF the constraint ai in h is satisfied by x
THEN do nothing
ELSE replace ai in h by the next more general constraint satisfied by x
3. Output hypothesis h
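A direct Python transcription of this algorithm, run on the four EnjoySport examples traced on the following slides (function and variable names are ours):

def find_s(examples):
    """examples: list of (instance, label) pairs, each instance a 6-tuple."""
    n = len(examples[0][0])
    h = ('Φ',) * n                # 1. most specific hypothesis in H
    for x, positive in examples:  # 2. consider positive instances only
        if not positive:
            continue
        # generalize every constraint of h that x does not satisfy
        h = tuple(v if c in ('Φ', v) else '?' for c, v in zip(h, x))
    return h                      # 3. output hypothesis h

D = [(('Sunny', 'Warm', 'Normal', 'Strong', 'Warm', 'Same'), True),
     (('Sunny', 'Warm', 'High', 'Strong', 'Warm', 'Same'), True),
     (('Rainy', 'Cold', 'High', 'Strong', 'Warm', 'Change'), False),
     (('Sunny', 'Warm', 'High', 'Strong', 'Cool', 'Change'), True)]
print(find_s(D))  # ('Sunny', 'Warm', '?', 'Strong', '?', '?')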
Find-S (cont.)
• FIND-S algorithm ignores negative examples.
 As long as the hypothesis space contains a hypothesis that describes the true target concept, and the training data contains no errors, ignoring negative examples does not cause any problem.
• FIND-S algorithm finds the most specific hypothesis within H that is consistent with the
positive training examples.
 The final hypothesis will also be consistent with negative examples if the correct
target concept is in H, and the training examples are correct.
 To illustrate this algorithm, assume the learner is given the sequence of training examples
from the EnjoySport example
Find-S (cont.)
 The first step of Find-S is to initialize h to the most specific hypothesis in H
h←<Φ, Φ, Φ, Φ, Φ, Φ>
 Upon observing the first training example, which happens to be a positive example, it becomes clear that our hypothesis is too specific, so each of its constraints is replaced by the next more general constraint that fits the example
h ← <Sunny, Warm, Normal, Strong, Warm, Same>
Find-S (cont.)
• The second training example, also positive, forces the algorithm to generalize h further, this time substituting a “?” in place of any attribute value in h that is not satisfied by the new example.
• The refined hypothesis is:
h ← <Sunny, Warm, ?, Strong, Warm, Same>
Find-S (cont.)
• Upon encountering the third training example, in this case a negative example, the algorithm
makes no change to h
• Note that in the current case our hypothesis is still consistent with the training examples; this is always the case when the training data is correct.
• To complete our trace of FIND-S, the fourth (positive) example leads to a further generalization of h
h ← <Sunny, Warm, ?, Strong, ?, ?>
Unanswered Questions by FIND-S Algorithm
• Has FIND-S converged to the correct target concept?
 Although FIND-S will find a hypothesis consistent with the training data, it has no way to
determine whether it has found the only hypothesis in H consistent with the data (i.e., the
correct target concept), or whether there are many other consistent hypotheses as well.
 We would prefer a learning algorithm that could determine whether it had converged and,
if not, at least characterize its uncertainty regarding the true identity of the target concept.
Unanswered Questions by FIND-S Algorithm (cont.)
• Why prefer the most specific hypothesis?
 In case there are multiple hypotheses consistent with the training examples, FIND-S will
find the most specific.
 It is unclear whether we should prefer this hypothesis over, say, the most general, or some
other hypothesis of intermediate generality.
Unanswered Questions by FIND-S Algorithm (cont.)
• Are the training examples consistent?
 In most practical learning problems there is some chance that the training examples will
contain at least some errors or noise.
 Such inconsistent sets of training examples can severely mislead FIND-S, given the fact
that it ignores negative examples.
 We would prefer an algorithm that could at least detect when the training data is
inconsistent and, preferably, accommodate such errors.
Unanswered Questions by FIND-S Algorithm (cont.)
• What if there are several maximally specific consistent hypotheses?
 In the hypothesis language H for the EnjoySport task, there is always a unique, most
specific hypothesis consistent with any set of positive examples.
 However, for other hypothesis spaces there can be several maximally specific hypotheses
consistent with the data.
 In this case, FIND-S must be extended to allow it to backtrack on its choices of how to
generalize the hypothesis, to accommodate the possibility that the target concept lies
along a different branch of the partial ordering than the branch it has selected.
Summary: Find-S
• Advantages:
 It is simple
 Outcome is independent of order of examples
• An alternative overcomes the drawbacks listed on the next slide
 Keep all consistent hypotheses!
o The Candidate Elimination algorithm
Summary: Find-S (cont.)
• Drawbacks:
 Throws away information!
o Negative examples
 Cannot tell whether it has learned the concept
o Depending on H, there might be several h’s that fit the Training Examples!
o Picks a maximally specific h
 Cannot tell when the training data is inconsistent
o Since it ignores negative Training Examples
Consistent Hypotheses and Version Space
• A hypothesis h is consistent with a set of training examples D of target concept c
if h(x) = c(x) for each training example 〈x, c(x)〉 in D
 Note that consistency is with respect to a specific D.
• Notation:
Consistent (h, D) ≡ ∀〈x, c(x)〉∈D :: h(x) = c(x)
• The version space, VSH,D , with respect to hypothesis space H and training examples D, is the
subset of hypotheses from H consistent with D
• Notation:
VSH,D = {h | h ∈ H ∧ Consistent (h, D)}
List-Then-Eliminate Algorithm
• List-Then-Eliminate algorithm initializes the version space to contain all hypotheses in H,
then eliminates any hypothesis found inconsistent with any training example.
• The version space of candidate hypotheses thus shrinks as more examples are observed, until
ideally just one hypothesis remains that is consistent with all the observed examples.
 Presumably, this is the desired target concept.
 If insufficient data is available to narrow the version space to a single hypothesis, then the
algorithm can output the entire set of hypotheses consistent with the observed data.
List-Then-Eliminate Algorithm (cont.)
• List-Then-Eliminate algorithm can be applied whenever the hypothesis space H is finite.
• It has many advantages, including the fact that it is guaranteed to output all hypotheses
consistent with the training data.
• Unfortunately, it requires exhaustively enumerating all hypotheses in H - an unrealistic
requirement for all but the most trivial hypothesis spaces.
List-Then-Eliminate Algorithm (cont.)
1. VersionSpace ← list of all hypotheses in H
2. For each training example 〈x, c(x)〉
remove from VersionSpace any hypothesis h for which h(x) ≠ c(x)
3. Output the list of hypotheses in VersionSpace
• This is essentially a brute-force procedure
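Since the EnjoySport hypothesis space contains only 5120 syntactic hypotheses, this brute-force procedure is actually runnable; a sketch, reusing the tuple representation from the earlier examples:

from itertools import product

domains = [('Sunny', 'Cloudy', 'Rainy'), ('Warm', 'Cold'), ('Normal', 'High'),
           ('Strong', 'Weak'), ('Warm', 'Cold'), ('Same', 'Change')]

def satisfies(x, h):
    return all(c == '?' or c == v for c, v in zip(h, x))

def list_then_eliminate(examples):
    # 1. VersionSpace <- every hypothesis in H (each slot: a value, '?', or 'Φ')
    vs = list(product(*[vals + ('?', 'Φ') for vals in domains]))
    # 2. remove every h with h(x) != c(x) for some training example
    for x, label in examples:
        vs = [h for h in vs if satisfies(x, h) == label]
    return vs  # 3. the hypotheses consistent with all observed examples

D = [(('Sunny', 'Warm', 'Normal', 'Strong', 'Warm', 'Same'), True),
     (('Sunny', 'Warm', 'High', 'Strong', 'Warm', 'Same'), True),
     (('Rainy', 'Cold', 'High', 'Strong', 'Warm', 'Change'), False),
     (('Sunny', 'Warm', 'High', 'Strong', 'Cool', 'Change'), True)]
print(len(list_then_eliminate(D)))  # 6: exactly the version space shown below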
Example of Find-S, Revisited
Training examples:
x1 = 〈Sunny Warm Normal Strong Warm Same〉 +
x2 = 〈Sunny Warm High Strong Warm Same〉 +
x3 = 〈Rainy Cold High Strong Warm Change〉 −
x4 = 〈Sunny Warm High Strong Cool Change〉 +
Hypothesis trace (from most specific toward more general):
h0 = 〈∅ ∅ ∅ ∅ ∅ ∅〉
h1 = 〈Sunny Warm Normal Strong Warm Same〉
h2 = 〈Sunny Warm ? Strong Warm Same〉
h3 = 〈Sunny Warm ? Strong Warm Same〉 (unchanged by the negative example x3)
h4 = 〈Sunny Warm ? Strong ? ?〉
Version Space for this Example
S: {〈Sunny Warm ? Strong ? ?〉}
Intermediate hypotheses: 〈Sunny ? ? Strong ? ?〉, 〈Sunny Warm ? ? ? ?〉, 〈? Warm ? Strong ? ?〉
G: {〈Sunny ? ? ? ? ?〉, 〈? Warm ? ? ? ?〉}
• A version space with its general and specific boundary sets. The version space includes all six hypotheses shown here.
• It can be represented more simply by S and G. Arrows indicate instances of the more-general-than relation.
• This is the version space for the EnjoySport concept learning problem and the training examples described above.
Representing Version Spaces
• We want a more compact representation of the VS
 Store the most/least general boundaries of the space
 Generate all intermediate h’s in the VS from these boundaries
 Idea: any h in the VS must be consistent with all Training Examples (TEs)
o Generalize from the most specific boundary
o Specialize from the most general boundary
Representing Version Spaces (cont.)
• The general boundary, G, of version space VSH,D is the set of its maximally general members
consistent with D
 Summarizes the negative examples; anything more general will cover a negative TE
• The specific boundary, S, of version space VSH,D is the set of its maximally specific
members consistent with D
 Summarizes the positive examples; anything more specific will fail to cover a positive
TE
Theorem
Theorem: Every member of the version space lies between the S and G boundaries:
VSH,D = {h | h ∈ H ∧ ∃s∈S ∃g∈G (g ≥ h ≥ s)}
• Must prove:
1) every h satisfying the RHS is in VSH,D;
2) every member of VSH,D satisfies the RHS.
• For 1), let g, h, s be arbitrary members of G, H, S respectively with g ≥ h ≥ s
 s must be satisfied by all positive (+) TEs, and so must h, because it is more general;
 g cannot be satisfied by any negative (–) TEs, and so neither can h, because it is more specific;
 hence h is in VSH,D, since it is satisfied by all positive (+) TEs and by no negative (–) TEs
• For 2), since such an h is satisfied by all positive (+) TEs and by no negative (–) TEs, there exist s ∈ S and g ∈ G with h ≥ s and g ≥ h, by the definitions of the boundary sets.
Candidate Elimination Algorithm
• The Candidate-Elimination algorithm computes the version space containing all hypotheses
from H that are consistent with an observed sequence of training examples.
• It begins by initializing the version space to the set of all hypotheses in H; that is, by
initializing the G boundary set to contain the most general hypothesis in H
G0 ← { <?, ?, ?, ?, ?, ?> }
• and initializing the S boundary set to contain the most specific hypothesis
S0 ← { <∅, ∅, ∅, ∅, ∅, ∅> }
Candidate Elimination Algorithm (cont.)
• These two boundary sets delimit the entire hypothesis space, because every other
hypothesis in H is both more general than S0 and more specific than G0.
• As each training example is considered, the S and G boundary sets are generalized and
specialized, respectively, to eliminate from the version space any hypotheses found
inconsistent with the new training example.
• After all examples have been processed, the computed version space contains all the
hypotheses of hypothesis space H consistent with these examples
Candidate Elimination Algorithm (cont.)
Initialization
G ← maximally general hypotheses in H
S ← maximally specific hypotheses in H
For each training example d, do
• If d is positive
 Remove from G every hypothesis inconsistent with d
 For each hypothesis s in S that is inconsistent with d
o Remove s from S
o Add to S all minimal generalizations h of s such that
1. h is consistent with d, and
2. some member of G is more general than h
 Remove from S every hypothesis that is more general than another hypothesis in S
Candidate Elimination Algorithm (cont.)
• If d is a negative example
 Remove from S every hypothesis inconsistent with d
 For each hypothesis g in G that is inconsistent with d
o Remove g from G
o Add to G all minimal specializations h of g such that
1. h is consistent with d, and
2. some member of S is more specific than h
 Remove from G every hypothesis that is less general than another hypothesis in G
• Essentially use
 Positive TEs to generalize S
 Negative TEs to specialize G
• Independent of order of TEs
• Convergence guaranteed if:
 No errors
 There is h in H describing c.
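A compact Python sketch of the algorithm for the conjunctive representation used throughout this lecture; the minimal-generalization and minimal-specialization helpers are specialized to that representation, and all names are ours. On the four EnjoySport examples it reproduces the S4 and G4 boundaries derived below.

def satisfies(x, h):
    return all(c == '?' or c == v for c, v in zip(h, x))

def more_general_or_equal(h1, h2):
    if 'Φ' in h2:  # h2 covers no instance, so anything is >= h2
        return True
    return all(a == '?' or a == b for a, b in zip(h1, h2))

def min_generalizations(h, x):
    # a conjunction has a unique minimal generalization that covers x
    return [tuple(v if c in ('Φ', v) else '?' for c, v in zip(h, x))]

def min_specializations(h, x, domains):
    # replace one '?' with any attribute value that rules out x
    return [h[:i] + (v,) + h[i + 1:]
            for i, c in enumerate(h) if c == '?'
            for v in domains[i] if v != x[i]]

def candidate_elimination(examples, domains):
    n = len(domains)
    S, G = {('Φ',) * n}, {('?',) * n}  # boundary initialization
    for x, positive in examples:
        if positive:                   # positive d: generalize S
            G = {g for g in G if satisfies(x, g)}
            for s in [s for s in S if not satisfies(x, s)]:
                S.remove(s)
                S |= {h for h in min_generalizations(s, x)
                      if any(more_general_or_equal(g, h) for g in G)}
            S = {s for s in S
                 if not any(more_general_or_equal(s, t) and s != t for t in S)}
        else:                          # negative d: specialize G
            S = {s for s in S if not satisfies(x, s)}
            for g in [g for g in G if satisfies(x, g)]:
                G.remove(g)
                G |= {h for h in min_specializations(g, x, domains)
                      if any(more_general_or_equal(h, s) for s in S)}
            G = {g for g in G
                 if not any(more_general_or_equal(t, g) and g != t for t in G)}
    return S, G

domains = [('Sunny', 'Cloudy', 'Rainy'), ('Warm', 'Cold'), ('Normal', 'High'),
           ('Strong', 'Weak'), ('Warm', 'Cold'), ('Same', 'Change')]
D = [(('Sunny', 'Warm', 'Normal', 'Strong', 'Warm', 'Same'), True),
     (('Sunny', 'Warm', 'High', 'Strong', 'Warm', 'Same'), True),
     (('Rainy', 'Cold', 'High', 'Strong', 'Warm', 'Change'), False),
     (('Sunny', 'Warm', 'High', 'Strong', 'Cool', 'Change'), True)]
S, G = candidate_elimination(D, domains)
print(S)  # {('Sunny', 'Warm', '?', 'Strong', '?', '?')}
print(G)  # {('Sunny', '?', '?', '?', '?', '?'), ('?', 'Warm', '?', '?', '?', '?')}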
Example
S0 = {〈∅ ∅ ∅ ∅ ∅ ∅〉}, G0 = {〈? ? ? ? ? ?〉}
Example 1 (+): 〈Sunny Warm Normal Strong Warm Same〉
S1 = {〈Sunny Warm Normal Strong Warm Same〉}, G1 = {〈? ? ? ? ? ?〉}
Example 2 (+): 〈Sunny Warm High Strong Warm Same〉
S2 = {〈Sunny Warm ? Strong Warm Same〉}, G2 = {〈? ? ? ? ? ?〉}
Example 3 (−): 〈Rainy Cold High Strong Warm Change〉
S3 = {〈Sunny Warm ? Strong Warm Same〉}
G3 = {〈Sunny ? ? ? ? ?〉, 〈? Warm ? ? ? ?〉, 〈? ? ? ? ? Same〉}
The negative example shows that the current G boundary is too general, so it must be made more specific.
Example (cont.)
• Given that there are six attributes that could be specified to specialize G2, there are only three new hypotheses in G3.
• For example, the hypothesis h = <?, ?, Normal, ?, ?, ?> is a minimal specialization of G2 that correctly labels the new example as negative, but it is not included in G3.
 The reason this hypothesis is excluded is that it is inconsistent with S2.
 The algorithm determines this simply by noting that h is not more general than the
current specific boundary, S2.
Example (cont.)
• In fact, the S boundary of the version space forms a summary of the previously encountered
positive examples that can be used to determine whether any given hypothesis is consistent
with these examples.
• The G boundary summarizes the information from previously encountered negative
examples. Any hypothesis more specific than G is assured to be consistent with past negative
examples
Example (cont.)
S3 = {〈Sunny Warm ? Strong Warm Same〉}
G3 = {〈Sunny ? ? ? ? ?〉, 〈? Warm ? ? ? ?〉, 〈? ? ? ? ? Same〉}
Example 4 (+): 〈Sunny Warm High Strong Cool Change〉
S4 = {〈Sunny Warm ? Strong ? ?〉}
G4 = {〈Sunny ? ? ? ? ?〉, 〈? Warm ? ? ? ?〉}
Example (cont.)
• The fourth training example further generalizes the S boundary of the version space.
 It also results in removing one member of the G boundary, because this member fails to
cover the new positive example.
 To understand the rationale for this step, it is useful to consider why the offending
hypothesis must be removed from G.
Example (cont.)
• Notice it cannot be specialized, because specializing it would not make it cover the new
example.
• It also cannot be generalized, because by the definition of G, any more general hypothesis
will cover at least one negative training example.
• Therefore, the hypothesis must be dropped from the G boundary, thereby removing an entire
branch of the partial ordering from the version space of hypotheses remaining under
consideration
Version Space of the Example
S: {〈Sunny Warm ? Strong ? ?〉}
Intermediate hypotheses: 〈Sunny ? ? Strong ? ?〉, 〈Sunny Warm ? ? ? ?〉, 〈? Warm ? Strong ? ?〉
G: {〈Sunny ? ? ? ? ?〉, 〈? Warm ? ? ? ?〉}
Convergence of algorithm
• Convergence guaranteed if:
 No errors
 There is h in H describing c.
• Ambiguity is removed from the VS when S = G
 The VS then contains a single h
 This happens once enough TEs have been seen
• If there is a false-negative TE, the algorithm will remove every h consistent with that TE, and hence will remove the correct target concept from the VS
 If enough TEs are then observed, the S and G boundaries converge to an empty VS
Which Next Training Example?
(Consider the version space obtained above: S = {〈Sunny Warm ? Strong ? ?〉}, G = {〈Sunny ? ? ? ? ?〉, 〈? Warm ? ? ? ?〉}, plus the three intermediate hypotheses 〈Sunny ? ? Strong ? ?〉, 〈Sunny Warm ? ? ? ?〉 and 〈? Warm ? Strong ? ?〉.)
• Assume the learner can choose the next TE
• It should choose a d that maximally reduces the number of hypotheses in the VS
 The best TE satisfies precisely 50% of the hypotheses
o This cannot always be done; when it is possible, the correct target concept can be reached in only log₂|VS| experiments
o Example: 〈Sunny Warm Normal Weak Warm Same〉?
o It is satisfied by exactly 3 hypotheses, i.e., 50% of the total
o If positive, it generalizes S; if negative, it specializes G
• The order of examples matters for the intermediate sizes of S and G, but not for the final S and G
Classifying new cases using VS
• Use a voting procedure on the following examples (with the six-hypothesis version space above):
 〈Sunny Warm Normal Strong Cool Change〉 – (+ by all) – classified as positive with confidence. This is possible when the new instance satisfies all the hypotheses of S.
 〈Rainy Cool Normal Weak Warm Same〉 – (– by all) – classified as negative with confidence. This is possible when the new instance does not satisfy any hypothesis of G.
 〈Sunny Warm Normal Weak Warm Same〉 – (+ by 3 and – by 3) – more TEs are needed; cannot decide.
 〈Sunny Cold Normal Strong Warm Same〉 – (+ by 2 and – by 4) – classified as negative with 67% confidence.
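Both the query-selection slide and this voting slide reduce to counting how many version-space hypotheses a candidate instance satisfies; a sketch, with the six hypotheses copied from the version space above:

def satisfies(x, h):
    return all(c == '?' or c == v for c, v in zip(h, x))

version_space = [
    ('Sunny', 'Warm', '?', 'Strong', '?', '?'),  # S
    ('Sunny', '?', '?', 'Strong', '?', '?'),
    ('Sunny', 'Warm', '?', '?', '?', '?'),
    ('?', 'Warm', '?', 'Strong', '?', '?'),
    ('Sunny', '?', '?', '?', '?', '?'),          # G
    ('?', 'Warm', '?', '?', '?', '?'),           # G
]

def votes(x):
    plus = sum(satisfies(x, h) for h in version_space)
    return plus, len(version_space) - plus

# 6-0: confident positive; 0-6: confident negative; 3-3: undecided (and the
# most informative next query); 2-4: negative with 4/6 ≈ 67% confidence
for x in [('Sunny', 'Warm', 'Normal', 'Strong', 'Cool', 'Change'),
          ('Rainy', 'Cool', 'Normal', 'Weak', 'Warm', 'Same'),
          ('Sunny', 'Warm', 'Normal', 'Weak', 'Warm', 'Same'),
          ('Sunny', 'Cold', 'Normal', 'Strong', 'Warm', 'Same')]:
    print(x, votes(x))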
Effect of incomplete hypothesis space
• The preceding algorithms work if the target function is in H
 They will generally not work if the target function is not in H
• Consider the following examples, which represent the target concept
“sky = sunny or sky = cloudy”:
 〈Sunny Warm Normal Strong Cool Change〉 Y
 〈Cloudy Warm Normal Strong Cool Change〉 Y
 〈Rainy Warm Normal Strong Cool Change〉 N
Effect of incomplete hypothesis space (cont.)
• If we apply the Candidate Elimination algorithm as before, we end up with an empty VS
 After the first two TEs, S = {〈? Warm Normal Strong Cool Change〉}
 This new hypothesis is overly general
o It covers the third (negative) TE!
• Our H does not include the appropriate c
• We need more expressive hypotheses
Incomplete hypothesis space
• If c is not in H, consider generalizing the representation of H so that it contains c
 For example, add disjunctions or negations to the representation of hypotheses in H
• One way to avoid the problem is to allow all possible representations of h’s
 This is equivalent to allowing all possible subsets of instances to define the concept EnjoySport
o Recall that there are 96 instances in EnjoySport; hence there are 2⁹⁶ ≈ 10²⁸ possible hypotheses in the full space H
o This can be done by using the full propositional calculus with AND, OR, NOT
o Hence H, defined only by conjunctions of attributes, is biased (containing only 973 h’s)
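The size gap can be checked directly (96 instances and 973 conjunctive hypotheses, from the earlier counting slides):

unbiased = 2 ** 96      # one concept per subset of the 96 instances
print(unbiased)         # 79228162514264337593543950336, roughly 7.9 * 10**28
print(unbiased // 973)  # the conjunctive space is a vanishing fraction of this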
Unbiased Learners and Inductive Bias
• But if there are no limits on the representation of hypotheses (i.e., a full logical representation with AND, OR, NOT), the learner can only memorize the examples: no generalization is possible!
 Say we have 5 TEs {x1, x2, x3, x4, x5}, with x4 and x5 negative
• Apply the Candidate Elimination algorithm
 S will be the disjunction of the positive examples (S = {x1 OR x2 OR x3})
 G will be the negation of the disjunction of the negative examples (G = {not (x4 or x5)})
 We would need to see all instances to learn the concept!
Unbiased Learners and Inductive Bias
• “A learner that makes no a priori assumptions regarding the identity of the target concept has
no rational basis for classifying any unseen instances”
• Such a learner cannot predict usefully:
 TEs get a unanimous vote
 all other instances get a 50/50 vote!
o For every h in H that predicts +, there is another that predicts –
• Approach:
 Place constraints on the representation of hypotheses
o For example, limit the connectives to conjunctions
o This allows learning of generalized hypotheses
o But it introduces a bias that depends on the hypothesis representation
Inductive System and Equivalent Deductive System
• The inductive bias (IB) of a learning algorithm L is any minimal set of assertions B such that, for any target concept c and training examples D, we can logically infer the value c(x) of any instance x from B, D, and x
• L(x, D) = k implies that all members of VSH,D, including c, vote for class k (unanimous voting). Therefore c(x) = k = L(x, D).
• This means that the output of the learner L(x, D) can be logically deduced from B ∧ D ∧ x

More Related Content

Similar to Lecture-2.pdf

AI_Unit-4_Learning.pptx
AI_Unit-4_Learning.pptxAI_Unit-4_Learning.pptx
AI_Unit-4_Learning.pptxMohammadAsim91
 
CocomoModels MGK .ppt
CocomoModels MGK .pptCocomoModels MGK .ppt
CocomoModels MGK .pptssuser3d1dad3
 
Using Deep Learning on Apache Spark to Diagnose Thoracic Pathology from Chest...
Using Deep Learning on Apache Spark to Diagnose Thoracic Pathology from Chest...Using Deep Learning on Apache Spark to Diagnose Thoracic Pathology from Chest...
Using Deep Learning on Apache Spark to Diagnose Thoracic Pathology from Chest...Databricks
 
Cse 7th-sem-machine-learning-laboratory-csml1819
Cse 7th-sem-machine-learning-laboratory-csml1819Cse 7th-sem-machine-learning-laboratory-csml1819
Cse 7th-sem-machine-learning-laboratory-csml1819HODCSE21
 
Lecture 2 Basic Concepts of Optimal Design and Optimization Techniques final1...
Lecture 2 Basic Concepts of Optimal Design and Optimization Techniques final1...Lecture 2 Basic Concepts of Optimal Design and Optimization Techniques final1...
Lecture 2 Basic Concepts of Optimal Design and Optimization Techniques final1...Khalil Alhatab
 
Data Analytics Life Cycle [EMC² - Data Science and Big data analytics]
Data Analytics Life Cycle [EMC² - Data Science and Big data analytics]Data Analytics Life Cycle [EMC² - Data Science and Big data analytics]
Data Analytics Life Cycle [EMC² - Data Science and Big data analytics]ssuser23e4f31
 
Bda life cycle slideshare
Bda life cycle   slideshareBda life cycle   slideshare
Bda life cycle slideshareSathyaseelanK1
 
UNIT_5_Data Wrangling.pptx
UNIT_5_Data Wrangling.pptxUNIT_5_Data Wrangling.pptx
UNIT_5_Data Wrangling.pptxBhagyasriPatel2
 
LinkedUp kickoff meeting session 4
LinkedUp kickoff meeting session 4LinkedUp kickoff meeting session 4
LinkedUp kickoff meeting session 4Hendrik Drachsler
 
Towards explanations for Data-Centric AI using provenance records
Towards explanations for Data-Centric AI using provenance recordsTowards explanations for Data-Centric AI using provenance records
Towards explanations for Data-Centric AI using provenance recordsPaolo Missier
 
StackNet Meta-Modelling framework
StackNet Meta-Modelling frameworkStackNet Meta-Modelling framework
StackNet Meta-Modelling frameworkSri Ambati
 
Artificial Neural Networks for data mining
Artificial Neural Networks for data miningArtificial Neural Networks for data mining
Artificial Neural Networks for data miningALIZAIB KHAN
 
Machine Learning for Everyone
Machine Learning for EveryoneMachine Learning for Everyone
Machine Learning for EveryoneAly Abdelkareem
 
CS3114_09212011.ppt
CS3114_09212011.pptCS3114_09212011.ppt
CS3114_09212011.pptArumugam90
 
[SEKE 2014] Practical Human Resource Allocation in Software Projects Using Ge...
[SEKE 2014] Practical Human Resource Allocation in Software Projects Using Ge...[SEKE 2014] Practical Human Resource Allocation in Software Projects Using Ge...
[SEKE 2014] Practical Human Resource Allocation in Software Projects Using Ge...Jihun Park
 

Similar to Lecture-2.pdf (20)

Lesson 33
Lesson 33Lesson 33
Lesson 33
 
AI_Unit-4_Learning.pptx
AI_Unit-4_Learning.pptxAI_Unit-4_Learning.pptx
AI_Unit-4_Learning.pptx
 
algo 1.ppt
algo 1.pptalgo 1.ppt
algo 1.ppt
 
CocomoModels MGK .ppt
CocomoModels MGK .pptCocomoModels MGK .ppt
CocomoModels MGK .ppt
 
Using Deep Learning on Apache Spark to Diagnose Thoracic Pathology from Chest...
Using Deep Learning on Apache Spark to Diagnose Thoracic Pathology from Chest...Using Deep Learning on Apache Spark to Diagnose Thoracic Pathology from Chest...
Using Deep Learning on Apache Spark to Diagnose Thoracic Pathology from Chest...
 
Cse 7th-sem-machine-learning-laboratory-csml1819
Cse 7th-sem-machine-learning-laboratory-csml1819Cse 7th-sem-machine-learning-laboratory-csml1819
Cse 7th-sem-machine-learning-laboratory-csml1819
 
Lecture 2 Basic Concepts of Optimal Design and Optimization Techniques final1...
Lecture 2 Basic Concepts of Optimal Design and Optimization Techniques final1...Lecture 2 Basic Concepts of Optimal Design and Optimization Techniques final1...
Lecture 2 Basic Concepts of Optimal Design and Optimization Techniques final1...
 
Data Analytics Life Cycle [EMC² - Data Science and Big data analytics]
Data Analytics Life Cycle [EMC² - Data Science and Big data analytics]Data Analytics Life Cycle [EMC² - Data Science and Big data analytics]
Data Analytics Life Cycle [EMC² - Data Science and Big data analytics]
 
Bda life cycle slideshare
Bda life cycle   slideshareBda life cycle   slideshare
Bda life cycle slideshare
 
UNIT_5_Data Wrangling.pptx
UNIT_5_Data Wrangling.pptxUNIT_5_Data Wrangling.pptx
UNIT_5_Data Wrangling.pptx
 
LinkedUp kickoff meeting session 4
LinkedUp kickoff meeting session 4LinkedUp kickoff meeting session 4
LinkedUp kickoff meeting session 4
 
Towards explanations for Data-Centric AI using provenance records
Towards explanations for Data-Centric AI using provenance recordsTowards explanations for Data-Centric AI using provenance records
Towards explanations for Data-Centric AI using provenance records
 
ML Basics
ML BasicsML Basics
ML Basics
 
StackNet Meta-Modelling framework
StackNet Meta-Modelling frameworkStackNet Meta-Modelling framework
StackNet Meta-Modelling framework
 
Artificial Neural Networks for data mining
Artificial Neural Networks for data miningArtificial Neural Networks for data mining
Artificial Neural Networks for data mining
 
Artificial Neural Networks for Data Mining
Artificial Neural Networks for Data MiningArtificial Neural Networks for Data Mining
Artificial Neural Networks for Data Mining
 
Machine Learning for Everyone
Machine Learning for EveryoneMachine Learning for Everyone
Machine Learning for Everyone
 
CS3114_09212011.ppt
CS3114_09212011.pptCS3114_09212011.ppt
CS3114_09212011.ppt
 
[SEKE 2014] Practical Human Resource Allocation in Software Projects Using Ge...
[SEKE 2014] Practical Human Resource Allocation in Software Projects Using Ge...[SEKE 2014] Practical Human Resource Allocation in Software Projects Using Ge...
[SEKE 2014] Practical Human Resource Allocation in Software Projects Using Ge...
 
CSC446: Pattern Recognition (LN6)
CSC446: Pattern Recognition (LN6)CSC446: Pattern Recognition (LN6)
CSC446: Pattern Recognition (LN6)
 

Recently uploaded

S1S2 B.Arch MGU - HOA1&2 Module 3 -Temple Architecture of Kerala.pptx
S1S2 B.Arch MGU - HOA1&2 Module 3 -Temple Architecture of Kerala.pptxS1S2 B.Arch MGU - HOA1&2 Module 3 -Temple Architecture of Kerala.pptx
S1S2 B.Arch MGU - HOA1&2 Module 3 -Temple Architecture of Kerala.pptxSCMS School of Architecture
 
GEAR TRAIN- BASIC CONCEPTS AND WORKING PRINCIPLE
GEAR TRAIN- BASIC CONCEPTS AND WORKING PRINCIPLEGEAR TRAIN- BASIC CONCEPTS AND WORKING PRINCIPLE
GEAR TRAIN- BASIC CONCEPTS AND WORKING PRINCIPLEselvakumar948
 
DC MACHINE-Motoring and generation, Armature circuit equation
DC MACHINE-Motoring and generation, Armature circuit equationDC MACHINE-Motoring and generation, Armature circuit equation
DC MACHINE-Motoring and generation, Armature circuit equationBhangaleSonal
 
Computer Lecture 01.pptxIntroduction to Computers
Computer Lecture 01.pptxIntroduction to ComputersComputer Lecture 01.pptxIntroduction to Computers
Computer Lecture 01.pptxIntroduction to ComputersMairaAshraf6
 
Moment Distribution Method For Btech Civil
Moment Distribution Method For Btech CivilMoment Distribution Method For Btech Civil
Moment Distribution Method For Btech CivilVinayVitekari
 
Tamil Call Girls Bhayandar WhatsApp +91-9930687706, Best Service
Tamil Call Girls Bhayandar WhatsApp +91-9930687706, Best ServiceTamil Call Girls Bhayandar WhatsApp +91-9930687706, Best Service
Tamil Call Girls Bhayandar WhatsApp +91-9930687706, Best Servicemeghakumariji156
 
Hostel management system project report..pdf
Hostel management system project report..pdfHostel management system project report..pdf
Hostel management system project report..pdfKamal Acharya
 
Computer Networks Basics of Network Devices
Computer Networks  Basics of Network DevicesComputer Networks  Basics of Network Devices
Computer Networks Basics of Network DevicesChandrakantDivate1
 
Design For Accessibility: Getting it right from the start
Design For Accessibility: Getting it right from the startDesign For Accessibility: Getting it right from the start
Design For Accessibility: Getting it right from the startQuintin Balsdon
 
Online food ordering system project report.pdf
Online food ordering system project report.pdfOnline food ordering system project report.pdf
Online food ordering system project report.pdfKamal Acharya
 
Double Revolving field theory-how the rotor develops torque
Double Revolving field theory-how the rotor develops torqueDouble Revolving field theory-how the rotor develops torque
Double Revolving field theory-how the rotor develops torqueBhangaleSonal
 
Thermal Engineering Unit - I & II . ppt
Thermal Engineering  Unit - I & II . pptThermal Engineering  Unit - I & II . ppt
Thermal Engineering Unit - I & II . pptDineshKumar4165
 
Introduction to Serverless with AWS Lambda
Introduction to Serverless with AWS LambdaIntroduction to Serverless with AWS Lambda
Introduction to Serverless with AWS LambdaOmar Fathy
 
School management system project Report.pdf
School management system project Report.pdfSchool management system project Report.pdf
School management system project Report.pdfKamal Acharya
 
A Study of Urban Area Plan for Pabna Municipality
A Study of Urban Area Plan for Pabna MunicipalityA Study of Urban Area Plan for Pabna Municipality
A Study of Urban Area Plan for Pabna MunicipalityMorshed Ahmed Rahath
 
1_Introduction + EAM Vocabulary + how to navigate in EAM.pdf
1_Introduction + EAM Vocabulary + how to navigate in EAM.pdf1_Introduction + EAM Vocabulary + how to navigate in EAM.pdf
1_Introduction + EAM Vocabulary + how to navigate in EAM.pdfAldoGarca30
 
HAND TOOLS USED AT ELECTRONICS WORK PRESENTED BY KOUSTAV SARKAR
HAND TOOLS USED AT ELECTRONICS WORK PRESENTED BY KOUSTAV SARKARHAND TOOLS USED AT ELECTRONICS WORK PRESENTED BY KOUSTAV SARKAR
HAND TOOLS USED AT ELECTRONICS WORK PRESENTED BY KOUSTAV SARKARKOUSTAV SARKAR
 
Thermal Engineering-R & A / C - unit - V
Thermal Engineering-R & A / C - unit - VThermal Engineering-R & A / C - unit - V
Thermal Engineering-R & A / C - unit - VDineshKumar4165
 

Recently uploaded (20)

S1S2 B.Arch MGU - HOA1&2 Module 3 -Temple Architecture of Kerala.pptx
S1S2 B.Arch MGU - HOA1&2 Module 3 -Temple Architecture of Kerala.pptxS1S2 B.Arch MGU - HOA1&2 Module 3 -Temple Architecture of Kerala.pptx
S1S2 B.Arch MGU - HOA1&2 Module 3 -Temple Architecture of Kerala.pptx
 
FEA Based Level 3 Assessment of Deformed Tanks with Fluid Induced Loads
FEA Based Level 3 Assessment of Deformed Tanks with Fluid Induced LoadsFEA Based Level 3 Assessment of Deformed Tanks with Fluid Induced Loads
FEA Based Level 3 Assessment of Deformed Tanks with Fluid Induced Loads
 
Call Girls in South Ex (delhi) call me [🔝9953056974🔝] escort service 24X7
Call Girls in South Ex (delhi) call me [🔝9953056974🔝] escort service 24X7Call Girls in South Ex (delhi) call me [🔝9953056974🔝] escort service 24X7
Call Girls in South Ex (delhi) call me [🔝9953056974🔝] escort service 24X7
 
GEAR TRAIN- BASIC CONCEPTS AND WORKING PRINCIPLE
GEAR TRAIN- BASIC CONCEPTS AND WORKING PRINCIPLEGEAR TRAIN- BASIC CONCEPTS AND WORKING PRINCIPLE
GEAR TRAIN- BASIC CONCEPTS AND WORKING PRINCIPLE
 
DC MACHINE-Motoring and generation, Armature circuit equation
DC MACHINE-Motoring and generation, Armature circuit equationDC MACHINE-Motoring and generation, Armature circuit equation
DC MACHINE-Motoring and generation, Armature circuit equation
 
Computer Lecture 01.pptxIntroduction to Computers
Computer Lecture 01.pptxIntroduction to ComputersComputer Lecture 01.pptxIntroduction to Computers
Computer Lecture 01.pptxIntroduction to Computers
 
Moment Distribution Method For Btech Civil
Moment Distribution Method For Btech CivilMoment Distribution Method For Btech Civil
Moment Distribution Method For Btech Civil
 
Tamil Call Girls Bhayandar WhatsApp +91-9930687706, Best Service
Tamil Call Girls Bhayandar WhatsApp +91-9930687706, Best ServiceTamil Call Girls Bhayandar WhatsApp +91-9930687706, Best Service
Tamil Call Girls Bhayandar WhatsApp +91-9930687706, Best Service
 
Hostel management system project report..pdf
Hostel management system project report..pdfHostel management system project report..pdf
Hostel management system project report..pdf
 
Computer Networks Basics of Network Devices
Computer Networks  Basics of Network DevicesComputer Networks  Basics of Network Devices
Computer Networks Basics of Network Devices
 
Design For Accessibility: Getting it right from the start
Design For Accessibility: Getting it right from the startDesign For Accessibility: Getting it right from the start
Design For Accessibility: Getting it right from the start
 
Online food ordering system project report.pdf
Online food ordering system project report.pdfOnline food ordering system project report.pdf
Online food ordering system project report.pdf
 
Double Revolving field theory-how the rotor develops torque
Double Revolving field theory-how the rotor develops torqueDouble Revolving field theory-how the rotor develops torque
Double Revolving field theory-how the rotor develops torque
 
Thermal Engineering Unit - I & II . ppt
Thermal Engineering  Unit - I & II . pptThermal Engineering  Unit - I & II . ppt
Thermal Engineering Unit - I & II . ppt
 
Introduction to Serverless with AWS Lambda
Introduction to Serverless with AWS LambdaIntroduction to Serverless with AWS Lambda
Introduction to Serverless with AWS Lambda
 
School management system project Report.pdf
School management system project Report.pdfSchool management system project Report.pdf
School management system project Report.pdf
 
A Study of Urban Area Plan for Pabna Municipality
A Study of Urban Area Plan for Pabna MunicipalityA Study of Urban Area Plan for Pabna Municipality
A Study of Urban Area Plan for Pabna Municipality
 
1_Introduction + EAM Vocabulary + how to navigate in EAM.pdf
1_Introduction + EAM Vocabulary + how to navigate in EAM.pdf1_Introduction + EAM Vocabulary + how to navigate in EAM.pdf
1_Introduction + EAM Vocabulary + how to navigate in EAM.pdf
 
HAND TOOLS USED AT ELECTRONICS WORK PRESENTED BY KOUSTAV SARKAR
HAND TOOLS USED AT ELECTRONICS WORK PRESENTED BY KOUSTAV SARKARHAND TOOLS USED AT ELECTRONICS WORK PRESENTED BY KOUSTAV SARKAR
HAND TOOLS USED AT ELECTRONICS WORK PRESENTED BY KOUSTAV SARKAR
 
Thermal Engineering-R & A / C - unit - V
Thermal Engineering-R & A / C - unit - VThermal Engineering-R & A / C - unit - V
Thermal Engineering-R & A / C - unit - V
 

Lecture-2.pdf

  • 1. Lecture - 2 CSEE-4142 & PHDCS-834: Machine Learning
  • 2. CSEC-4142 & PHDCS-4 @2022 Machine Learning • Machine learning is programming computers to optimize a performance criterion using example data or past experience. • There is no need to “learn” to calculate payroll • Learning is used when:  Human expertise does not exist (navigating on Mars),  Humans are unable to explain their expertise (speech recognition)  Solution changes in time (routing on a computer network)  Solution needs to be adapted to particular cases (user biometrics) 2
  • 3. CSEC-4142 & PHDCS-4 @2022 What to Learn • Learning general models from a data of particular examples • Data is cheap and abundant (data warehouses, data marts); knowledge is expensive and scarce. • Example in retail: Customer transactions to consumer behavior:  People who bought “Blink” also bought “Outliers” (www.amazon.com)  The sales of diapers and beer were correlated on Friday evening (Walmart) • Build a model that is a good and useful approximation to the data. 3
  • 4. CSEC-4142 & PHDCS-4 @2022 well-posed Problem • A learning problem is called well-posed if a solution to it exists, that solution is unique and the solution depends on data/experience but is not sensitive to (reasonably small) change in the data/experience. • In general, to have a well-defined learning problem, we must identify these three features:  The class of tasks  The measure of performance to be improved  The source of experience
  • 5. CSEC-4142 & PHDCS-4 @2022 Machine Learning Concept in Nutshell • Machine learning is a subfield of Artificial intelligence (AI) which concerns with developing computational theories of learning and building learning machine. • Learning is the phenomenon or process which is concern with gaining of new symbolic knowledge and development of cognitive skill through instruction and practice • It is also discovery of new facts and theories through observations and experiments. • Machine learning is programming computer to optimize a performance criteria using example data or past experience. • It is very hard to write program that solve problems like recognition of human face as we do not know how our brain do it. • Instead of writing a program by hand, it is possible to collect lots of example that specify correct output for a given input.
  • 6. CSEC-4142 & PHDCS-4 @2022 Concept of Machine Learning (cont.) • A machine learning algorithm then takes these examples and produces a program that does the job. • Main goal of machine learning is to devise learning algorithm that do the learning automatically without human intervention or assistance • The machine learning paradigm can be viewed as ‘programming by example’ • Another goal is to develop computational models of human learning process and perform computer simulation • That is, the goal of machine learning is to build computer systems that can adapt and learn from their experience
  • 7. CSEC-4142 & PHDCS-4 @2022 Reason for using Machine Learning • Machine learning algorithm can figure out how to perform important tasks by generalizing from examples • Machine learning algorithms discover the relationship between the variables of a system (input, output and hidden) from direct samples of the system • There are some real world problems, like recognizing person from voice, can not be defined well. • Relationship and correlation can be hidden within large amount of data
  • 8. CSEC-4142 & PHDCS-4 @2022 Reason for using Machine Learning (cont.) • To solve these problems, machine learning and data mining may be used to find the relationships • The amount of knowledge available about certain task might be too large for explicit encoding by humans • Environments changes time to time • New knowledge about tasks is constantly being discovered
  • 9. CSEC-4142 & PHDCS-4 @2022 Phases for Machine Learning • Machine learning typically follows three phases: 1. Training: A training set of examples of correct behaviour is analysed and some representation of the newly learnt knowledge is stored. This is some form of rules. 2. Validation: The rules are checked and if necessary, additional training is given. Sometimes additional test data is used. A human expert or automatic knowledge based components may be used to validate the rules. The role of the tester is often called opponent. 3. Application: The rules are used in responding to some new situations.
  • 10. CSEC-4142 & PHDCS-4 @2022 Designing a learning system 1. Data: Choose the training experience D = {d1 , d2 ,.., dn } 2. Feature Selection • Features depend on the problem. Measure ‘relevant’ quantities. • Some techniques available to extract ‘more relevant’ quantities from the initial measurements. (e.g., PCA) • After feature extraction each pattern is a vector 2. Model selection: (a) Select a model or a set of models (with parameters) E.g. y = ax + b + ε where ε=N(0,σ) It determines exactly what type of knowledge will be learned and how this will be used by the performance program
  • 11. CSEC-4142 & PHDCS-4 @2022 Designing a learning system (cont.) (b) Select the error function to be optimized, e.g., 1 𝑛𝑛 � 𝑖𝑖=1 𝑛𝑛 𝑦𝑦𝑖𝑖 − 𝑓𝑓(𝑥𝑥𝑖𝑖) 2 3. Learning: Find the set of parameters optimizing the error function – The model and parameters with the smallest error 4. Application (Evaluation): • Apply the learned model – E.g. predict y’s for new inputs x using learned f ( x )
  • 12. CSEC-4142 & PHDCS-4 @2022 Concept Learning • Inducing general functions from specific training examples is a main issue of machine learning • Acquiring the definition of a general category from given sample positive and sample negative training examples of the category is know as concept learning • A machine learning hypothesis is a candidate model that approximates a target function for mapping inputs to outputs. • The hypothesis space has a general to specific ordering of hypothesis and the search can be efficiently organized by taking advantage of a naturally occurring structure over the hypothesis space
  • 13. CSEC-4142 & PHDCS-4 @2022 Concept Learning (cont.) • Concept learning is formally defined as the ‘Inferring of a Boolean valued function from training examples of its input and output’ • Concept learning involves determining a mapping from a set of input variables to a Boolean variable. Such methods are known as inductive learning method. • If a function can be found which maps training data to correct classifications, then it will also work well for unseen data. This process is known as generalization
  • 14. CSEC-4142 & PHDCS-4 @2022 A concept learning task • An example for concept-learning is the learning of Enjoy-Sports from the given examples of positive examples and negative examples • We are trying to learn the definition of a concept from given examples. Table. Enjoy Sports Training Examples
  • 15. CSEC-4142 & PHDCS-4 @2022 A concept learning task (cont.) • A set of example days, and each is described by six attributes. • The task is to learn to predict the value of EnjoySport for arbitrary day, based on the values of its attribute values. • Each hypothesis consists of a conjunction of constraints on the instance attributes. • A hypothesis is a vector of constraints for each attribute
  • 16. CSEC-4142 & PHDCS-4 @2022 A concept learning task (cont.) • In this example, each hypothesis will be a vector of six constraints, specifying the values of the six attributes – (Sky, AirTemp, Humidity, Wind, Water, and Forecast).  Indicate by a ? That any value is acceptable for this attribute  Specify a single required value for the attribute  Indicate by Φ that no value is acceptable • The most general hypothesis – that every day is a positive example <?,?,?,?,?,?>
  • 17. CSEC-4142 & PHDCS-4 @2022 A concept learning task (cont.) • The most specific hypothesis – that no day is a positive example <Φ, Φ, Φ, Φ, Φ, Φ> • If some instance x satisfies all the constraints of hypothesis h, then h classifies x as a positive example (h(x)=1) • To illustrate that A enjoys his favourite sport only on cold day with high humidity (independent of the other attributes) is represented by the expression <?, Cold, High, ?, ?, ?> • EnjoySport concept learning task requires learning the sets of days for which EnjoySport=yes, describing this set by a conjunction of constraints over the instance attributes.
  • 18. CSEC-4142 & PHDCS-4 @2022 A concept learning task • Given – Instances X : set of all possible days, each described by the attributes Sky – (values: Sunny, Cloudy, Rainy) AirTemp – (values: Warm, Cold) Humidity – (values: Normal, High) Wind – (values: Strong, Weak) Water – (values: Warm, Cold) Forecast – (values: Same, Change) – Target Concept (Function) c : EnjoySport : X → {0,1} – Hypotheses H : Each hypothesis is described by a conjunction of constraints on the attributes. – Training Examples D : positive and negative examples of the target function Determine – A hypothesis h in H such that h(x) = c(x) for all x in D.
  • 19. CSEC-4142 & PHDCS-4 @2022 Inductive learning hypothesis • Any hypothesis found to approximate the target function well over a sufficiently large set of training examples will also approximate the target function well over other unobserved examples • Although the learning task is to determine a hypothesis (h) identical to the target concept cover the entire set of instances (X), the only information available about c is its value over the training examples • Inductive learning algorithms can at best guarantee that the output hypothesis fits the target concepts over the training data • Lacking any further information, our assumption is that the best hypothesis regarding unseen instances is the hypothesis that best fits the observed training data. This is fundamental assumption of inductive learning
  • 20. CSEC-4142 & PHDCS-4 @2022 Concept learning as search • Concept learning can be viewed as the task of searching through a large space of hypotheses implicitly defined by the hypothesis representation • The goal of this search is to find the hypothesis that best fits the training examples. • By selecting a hypothesis representation, the designer of the learning algorithm implicitly defines the space of all hypotheses that the program can ever represent and therefore can ever learn
  • 21. CSEC-4142 & PHDCS-4 @2022 Enjoy Sport - Hypothesis Space • Sky has 3 possible values, and each of the other 5 attributes has 2 possible values. • There are 96 (= 3·2·2·2·2·2) distinct instances in X. • A similar calculation shows that there are 5120 (= 5·4·4·4·4·4) syntactically distinct hypotheses in H. – Two more constraint values per attribute: ? and Φ • However, every hypothesis containing one or more Φ symbols represents the empty set of instances; that is, it classifies every instance as negative.
  • 22. CSEC-4142 & PHDCS-4 @2022 Enjoy Sport - Hypothesis Space (cont.) • Therefore, the number of semantically distinct hypotheses is only 973 (= 1 + 4·3·3·3·3·3). • EnjoySport is a very simple learning task with a small, finite hypothesis space • Most practical learning tasks have much larger (even infinite) hypothesis spaces.
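These counts are easy to verify; a quick sketch:

```python
from math import prod

n_values = [3, 2, 2, 2, 2, 2]                 # legal values per attribute

instances = prod(n_values)                    # 3*2*2*2*2*2 = 96
syntactic = prod(n + 2 for n in n_values)     # each attribute also allows ? and Φ: 5120
semantic  = 1 + prod(n + 1 for n in n_values) # every Φ-hypothesis denotes the same
                                              # empty concept, counted once: 1 + 972 = 973
print(instances, syntactic, semantic)         # 96 5120 973
```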
  • 23. CSEC-4142 & PHDCS-4 @2022 General-to-Specific Ordering of Hypotheses • Many algorithms for concept learning organize the search through the hypothesis space by relying on a general-to-specific ordering of hypotheses. • By taking advantage of this naturally occurring structure over the hypothesis space, we can design learning algorithms that exhaustively search even infinite hypothesis spaces without explicitly enumerating every hypothesis. • Consider two hypotheses h1 = (Sunny, ?, ?, Strong, ?, ?) h2 = (Sunny, ?, ?, ?, ?, ?)
  • 24. CSEC-4142 & PHDCS-4 @2022 General-to-Specific Ordering of Hypotheses (cont.) • Now consider the sets of instances that are classified positive by h1 and by h2.  Because h2 imposes fewer constraints on the instance, it classifies more instances as positive.  In fact, any instance classified positive by h1 will also be classified positive by h2.  Therefore, we say that h2 is more general than h1.
  • 25. CSEC-4142 & PHDCS-4 @2022 More-General-Than Relation • For any instance x in X and hypothesis h in H, we say that x satisfies h if and only if h(x) = 1. • More-General-Than-Or-Equal Relation: Let h1 and h2 be two Boolean-valued functions defined over X. Then h1 is more-general-than-or-equal-to h2 (written h1 ≥ h2) if and only if any instance that satisfies h2 also satisfies h1, i.e., ∀x ∈ X [ (h2(x) = 1) → (h1(x) = 1) ] • h1 is more-general-than h2 (h1 > h2) if and only if h1 ≥ h2 is true and h2 ≥ h1 is false. • We also say h2 is more-specific-than h1.
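Because X is finite here, the relation can be checked directly from the definition. A sketch (the attribute domains follow the task description above; helper names are mine):

```python
from itertools import product

DOMAIN = [('Sunny', 'Cloudy', 'Rainy'), ('Warm', 'Cold'), ('Normal', 'High'),
          ('Strong', 'Weak'), ('Warm', 'Cold'), ('Same', 'Change')]
X = list(product(*DOMAIN))                    # all 96 instances

def matches(h, x):
    return all(c == '?' or c == v for c, v in zip(h, x))

def more_general_or_equal(h1, h2):
    """h1 >= h2 iff every instance that satisfies h2 also satisfies h1."""
    return all(matches(h1, x) for x in X if matches(h2, x))

h1 = ('Sunny', '?', '?', 'Strong', '?', '?')
h2 = ('Sunny', '?', '?', '?', '?', '?')
print(more_general_or_equal(h2, h1))          # True: h2 >= h1
print(more_general_or_equal(h1, h2))          # False: the ordering is only partial
```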
  • 26. CSEC-4142 & PHDCS-4 @2022 More-General-Relation • Instances, hypotheses and the more_general_than relation • The box on the left represents the set X of all instances • The box on the right is the set of all hypotheses H • Each hypothesis corresponds to some subset of X – the subset of instances that it classifies positive
  • 27. CSEC-4142 & PHDCS-4 @2022 More-General-Relation (cont.) • The arrows connecting hypotheses represent the more_general_than relation, with the arrow pointing toward the less general hypothesis. • Note that the subset of instances characterized by h2 subsumes the subset characterized by h1, so h2 is more_general_than h1 • But there is no more-general relation between h1 and h3
  • 28. CSEC-4142 & PHDCS-4 @2022 Find-S: Finding a Maximally Specific Hypothesis • In the Find-S algorithm, the ‘more_general_than’ partial ordering is used to organize a search for a hypothesis consistent with the observed training examples. • The algorithm begins with the most specific possible hypothesis in H, then generalizes this hypothesis each time it fails to cover an observed positive training example. • We say that a hypothesis ‘covers’ a positive example if it correctly classifies the example as positive
  • 29. CSEC-4142 & PHDCS-4 @2022 Find-S Algorithm • The algorithm is given below
   1. Initialize h to the most specific hypothesis in H
   2. For each positive training instance x
       For each attribute constraint ai in h
         IF the constraint ai in h is satisfied by x
         THEN do nothing
         ELSE replace ai in h by the next more general constraint that is satisfied by x
   3. Output hypothesis h
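A straightforward Python rendering of these three steps for the conjunctive representation used here (a sketch, not the lecture's code):

```python
def find_s(examples, n_attrs=6):
    """FIND-S. `examples` is a list of (instance, label) pairs; label True = positive."""
    h = [None] * n_attrs                  # step 1: most specific hypothesis <Φ,...,Φ>
    for x, positive in examples:
        if not positive:
            continue                      # FIND-S ignores negative examples
        for i, v in enumerate(x):         # step 2: minimally generalize violated ai
            if h[i] is None:
                h[i] = v                  # Φ -> the single observed value
            elif h[i] != v:
                h[i] = '?'                # conflicting values -> any value
    return tuple(h)                       # step 3
```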
  • 30. CSEC-4142 & PHDCS-4 @2022 Find-S (cont.) • The FIND-S algorithm ignores negative examples.  As long as the hypothesis space contains a hypothesis that describes the true target concept, and the training data contains no errors, ignoring negative examples does not cause any problems. • The FIND-S algorithm finds the most specific hypothesis within H that is consistent with the positive training examples.  The final hypothesis will also be consistent with the negative examples if the correct target concept is in H and the training examples are correct.  To illustrate this algorithm, assume the learner is given the sequence of training examples from the EnjoySport example
  • 31. CSEC-4142 & PHDCS-4 @2022 Find-S (cont.)  The first step of Find-S is to initialize h to the most specific hypothesis in H h←<Φ, Φ, Φ, Φ, Φ, Φ>  Upon observing the first training example, which happens to be a positive example, it becomes clear that our hypothesis is too specific, so each constraint is replaced by the next more general constraint that fits the example h ← <Sunny, Warm, Normal, Strong, Warm, Same>
  • 32. CSEC-4142 & PHDCS-4 @2022 Find-S (cont.) • The second training example, also a positive example, forces the algorithm to further generalize h, this time substituting a “?” in place of any attribute value in h that is not satisfied by the new example. • The revised hypothesis is: h ← <Sunny, Warm, ?, Strong, Warm, Same>
  • 33. CSEC-4142 & PHDCS-4 @2022 Find-S (cont.) • Upon encountering the third training example, in this case a negative example, the algorithm makes no change to h • Note that our current hypothesis is still consistent with this training example; this is always the case as long as the training data is correct and the target concept is in H. • To complete our trace of FIND-S, the fourth (positive) example leads to a further generalization of h h ← <Sunny, Warm, ?, Strong, ?, ?>
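Running the find_s sketch from above on these four examples reproduces this trace:

```python
train = [
    (('Sunny', 'Warm', 'Normal', 'Strong', 'Warm', 'Same'),   True),
    (('Sunny', 'Warm', 'High',   'Strong', 'Warm', 'Same'),   True),
    (('Rainy', 'Cold', 'High',   'Strong', 'Warm', 'Change'), False),
    (('Sunny', 'Warm', 'High',   'Strong', 'Cool', 'Change'), True),
]
print(find_s(train))   # ('Sunny', 'Warm', '?', 'Strong', '?', '?')
```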
  • 34. CSEC-4142 & PHDCS-4 @2022 Find-S (cont.) (Figure: trace of FIND-S through the hypothesis space on the four training examples.)
  • 35. CSEC-4142 & PHDCS-4 @2022 Unanswered Questions by FIND-S Algorithm • Has FIND-S converged to the correct target concept?  Although FIND-S will find a hypothesis consistent with the training data, it has no way to determine whether it has found the only hypothesis in H consistent with the data (i.e., the correct target concept), or whether there are many other consistent hypotheses as well.  We would prefer a learning algorithm that could determine whether it had converged and, if not, at least characterize its uncertainty regarding the true identity of the target concept.
  • 36. CSEC-4142 & PHDCS-4 @2022 Unanswered Questions by FIND-S Algorithm (cont.) • Why prefer the most specific hypothesis?  In case there are multiple hypotheses consistent with the training examples, FIND-S will find the most specific.  It is unclear whether we should prefer this hypothesis over, say, the most general, or some other hypothesis of intermediate generality.
  • 37. CSEC-4142 & PHDCS-4 @2022 Unanswered Questions by FIND-S Algorithm (cont.) • Are the training examples consistent?  In most practical learning problems there is some chance that the training examples will contain at least some errors or noise.  Such inconsistent sets of training examples can severely mislead FIND-S, given the fact that it ignores negative examples.  We would prefer an algorithm that could at least detect when the training data is inconsistent and, preferably, accommodate such errors.
  • 38. CSEC-4142 & PHDCS-4 @2022 Unanswered Questions by FIND-S Algorithm (cont.) • What if there are several maximally specific consistent hypotheses?  In the hypothesis language H for the EnjoySport task, there is always a unique, most specific hypothesis consistent with any set of positive examples.  However, for other hypothesis spaces there can be several maximally specific hypotheses consistent with the data.  In this case, FIND-S must be extended to allow it to backtrack on its choices of how to generalize the hypothesis, to accommodate the possibility that the target concept lies along a different branch of the partial ordering than the branch it has selected.
  • 39. CSEC-4142 & PHDCS-4 @2022 Summary: Find-S • Advantages:  It is simple  The outcome is independent of the order of the examples • An alternative overcomes the problems listed on the next slide:  Keep all consistent hypotheses! o Candidate elimination algorithm
  • 40. CSEC-4142 & PHDCS-4 @2022 Summary: Find-S (cont.) • Drawbacks:  Throws away information! o Negative examples  Cannot tell whether it has learned the concept o Depending on H, there might be several h’s that fit the training examples! o Picks a maximally specific h  Cannot tell when the training data is inconsistent o Since it ignores negative training examples
  • 41. CSEC-4142 & PHDCS-4 @2022 Consistent Hypotheses and Version Space • A hypothesis h is consistent with a set of training examples D of target concept c if h(x) = c(x) for each training example 〈x, c(x)〉 in D  Note that consistency is with respect to a specific D. • Notation: Consistent (h, D) ≡ ∀〈x, c(x)〉∈D :: h(x) = c(x) • The version space, VSH,D , with respect to hypothesis space H and training examples D, is the subset of hypotheses from H consistent with D • Notation: VSH,D = {h | h ∈ H ∧ Consistent (h, D)}
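Both definitions translate directly into code; a minimal sketch (matches is the same helper as in the earlier sketches):

```python
def matches(h, x):
    return all(c == '?' or c == v for c, v in zip(h, x))

def consistent(h, examples):
    """Consistent(h, D): h reproduces the label c(x) on every example in D."""
    return all(matches(h, x) == label for x, label in examples)
```

The version space is then simply {h in H | consistent(h, D)}, which the next algorithm computes by brute force.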
  • 42. CSEC-4142 & PHDCS-4 @2022 List-Then-Eliminate Algorithm • List-Then-Eliminate algorithm initializes the version space to contain all hypotheses in H, then eliminates any hypothesis found inconsistent with any training example. • The version space of candidate hypotheses thus shrinks as more examples are observed, until ideally just one hypothesis remains that is consistent with all the observed examples.  Presumably, this is the desired target concept.  If insufficient data is available to narrow the version space to a single hypothesis, then the algorithm can output the entire set of hypotheses consistent with the observed data.
  • 43. CSEC-4142 & PHDCS-4 @2022 List-Then-Eliminate Algorithm (cont.) • List-Then-Eliminate algorithm can be applied whenever the hypothesis space H is finite. • It has many advantages, including the fact that it is guaranteed to output all hypotheses consistent with the training data. • Unfortunately, it requires exhaustively enumerating all hypotheses in H - an unrealistic requirement for all but the most trivial hypothesis spaces.
  • 44. CSEC-4142 & PHDCS-4 @2022 List-Then-Eliminate Algorithm (cont.)
   1. VersionSpace ← a list of all hypotheses in H
   2. For each training example 〈x, c(x)〉, remove from VersionSpace any hypothesis h for which h(x) ≠ c(x)
   3. Output the list of hypotheses in VersionSpace
  • This is essentially a brute-force procedure
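For this tiny H, the brute-force procedure is actually feasible. A self-contained sketch (the training data follows the running EnjoySport example; names are mine):

```python
from itertools import product

DOMAIN = [('Sunny', 'Cloudy', 'Rainy'), ('Warm', 'Cold'), ('Normal', 'High'),
          ('Strong', 'Weak'), ('Warm', 'Cold'), ('Same', 'Change')]

def matches(h, x):
    return all(c == '?' or c == v for c, v in zip(h, x))

def consistent(h, examples):
    return all(matches(h, x) == label for x, label in examples)

def list_then_eliminate(examples):
    """Enumerate every conjunctive hypothesis; keep those consistent with D."""
    choices = [vals + ('?', None) for vals in DOMAIN]   # value, '?', or Φ (None)
    return [h for h in product(*choices) if consistent(h, examples)]

train = [
    (('Sunny', 'Warm', 'Normal', 'Strong', 'Warm', 'Same'),   True),
    (('Sunny', 'Warm', 'High',   'Strong', 'Warm', 'Same'),   True),
    (('Rainy', 'Cold', 'High',   'Strong', 'Warm', 'Change'), False),
    (('Sunny', 'Warm', 'High',   'Strong', 'Cool', 'Change'), True),
]
print(len(list_then_eliminate(train)))   # 6: the version space shown on the next slides
```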
  • 45. CSEC-4142 & PHDCS-4 @2022 Example of Find-S, Revisited
  x1 = 〈Sunny Warm Normal Strong Warm Same〉 +
  x2 = 〈Sunny Warm High Strong Warm Same〉 +
  x3 = 〈Rainy Cold High Strong Warm Change〉 −
  x4 = 〈Sunny Warm High Strong Cool Change〉 +
  h0 = 〈∅ ∅ ∅ ∅ ∅ ∅〉
  h1 = 〈Sunny Warm Normal Strong Warm Same〉
  h2 = 〈Sunny Warm ? Strong Warm Same〉
  h3 = 〈Sunny Warm ? Strong Warm Same〉 (unchanged by the negative example x3)
  h4 = 〈Sunny Warm ? Strong ? ?〉
  (Figure: instances X on the left, hypotheses H on the right, ordered from specific to general.)
  • 46. CSEC-4142 & PHDCS-4 @2022 Version Space for this Example
  S: {〈Sunny Warm ? Strong ? ?〉}
     〈Sunny ? ? Strong ? ?〉, 〈Sunny Warm ? ? ? ?〉, 〈? Warm ? Strong ? ?〉
  G: {〈Sunny ? ? ? ? ?〉, 〈? Warm ? ? ? ?〉}
  • A version space with its general and specific boundary sets. The version space includes all six hypotheses shown here
  • It can be represented more simply by S and G. Arrows indicate instances of the more-general-than relation.
  • This is the version space for the EnjoySport concept learning problem and training examples described above
  • 47. CSEC-4142 & PHDCS-4 @2022 Representing Version Spaces • We want a more compact representation of the VS  Store only the most/least general boundaries of the space  Generate all intermediate h’s in the VS as needed  The idea is that any h in the VS must be consistent with all training examples (TEs) o Generalize from the most specific boundary o Specialize from the most general boundary
  • 48. CSEC-4142 & PHDCS-4 @2022 Representing Version Spaces (cont.) • The general boundary, G, of version space VSH,D is the set of its maximally general members consistent with D  Summarizes the negative examples; anything more general will cover a negative TE • The specific boundary, S, of version space VSH,D is the set of its maximally specific members consistent with D  Summarizes the positive examples; anything more specific will fail to cover a positive TE
  • 49. CSEC-4142 & PHDCS-4 @2022 Theorem Theorem: Every member of the version space lies between the S and G boundaries VSH,D = {h | h ∈ H ∧ ∃s∈S ∃g∈G (g ≥ h ≥ s)} • Must prove: 1) every h satisfying the RHS is in VSH,D; 2) every member of VSH,D satisfies the RHS. • For 1), let g, h, s be arbitrary members of G, H, S respectively with g ≥ h ≥ s  s is satisfied by all positive (+) TEs, and so is h, because it is more general than s;  g is satisfied by no negative (–) TE, and so neither is h, because it is more specific than g  Hence h is in VSH,D, since it is satisfied by all positive (+) TEs and no negative (–) TEs • For 2),  Since h satisfies all positive (+) TEs and no negative (–) TEs, there exist s ∈ S and g ∈ G with h ≥ s and g ≥ h.
  • 50. CSEC-4142 & PHDCS-4 @2022 Candidate Elimination Algorithm • The Candidate-Elimination algorithm computes the version space containing all hypotheses from H that are consistent with an observed sequence of training examples. • It begins by initializing the version space to the set of all hypotheses in H; that is, by initializing the G boundary set to contain the most general hypothesis in H G0 ← { <?, ?, ?, ?, ?, ?> } • and initializing the S boundary set to contain the most specific hypothesis S0 ← { <Φ, Φ, Φ, Φ, Φ, Φ> }
  • 51. CSEC-4142 & PHDCS-4 @2022 Candidate Elimination Algorithm (cont.) • These two boundary sets delimit the entire hypothesis space, because every other hypothesis in H is both more general than S0 and more specific than G0. • As each training example is considered, the S and G boundary sets are generalized and specialized, respectively, to eliminate from the version space any hypotheses found inconsistent with the new training example. • After all examples have been processed, the computed version space contains all the hypotheses of hypothesis space H consistent with these examples
  • 52. CSEC-4142 & PHDCS-4 @2022 Candidate Elimination Algorithm (cont.)
  Initialization
   G ← the set of maximally general hypotheses in H
   S ← the set of maximally specific hypotheses in H
  For each training example d, do
  • If d is a positive example
    Remove from G every hypothesis inconsistent with d
    For each hypothesis s in S that is inconsistent with d
     o Remove s from S
     o Add to S all minimal generalizations h of s such that
       1. h is consistent with d, and
       2. some member of G is more general than h
    Remove from S every hypothesis that is more general than another hypothesis in S
  • 53. CSEC-4142 & PHDCS-4 @2022 Candidate Elimination Algorithm (cont.)
  • If d is a negative example
    Remove from S every hypothesis inconsistent with d
    For each hypothesis g in G that is inconsistent with d
     o Remove g from G
     o Add to G all minimal specializations h of g such that
       1. h is consistent with d, and
       2. some member of S is more specific than h
    Remove from G every hypothesis that is less general than another hypothesis in G
  • Essentially: use positive TEs to generalize S and negative TEs to specialize G (a runnable sketch follows below)
  • The result is independent of the order of the TEs
  • Convergence is guaranteed if:  there are no errors, and  there is some h in H describing c
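A compact Python sketch of the whole algorithm for the conjunctive representation used here (an illustration under my own helper names, not the lecture's code; Φ is represented by None):

```python
DOMAIN = [('Sunny', 'Cloudy', 'Rainy'), ('Warm', 'Cold'), ('Normal', 'High'),
          ('Strong', 'Weak'), ('Warm', 'Cold'), ('Same', 'Change')]

def matches(h, x):
    return all(c == '?' or c == v for c, v in zip(h, x))

def more_general_eq(g, h):
    """g >= h; for conjunctive hypotheses this is checkable attribute-wise."""
    if None in h:                             # h contains Φ: it covers nothing
        return True
    return all(cg == '?' or cg == ch for cg, ch in zip(g, h))

def min_generalization(s, x):
    """The unique minimal generalization of s that covers positive example x."""
    h = []
    for cs, v in zip(s, x):
        if cs is None:       h.append(v)      # Φ -> the observed value
        elif cs in ('?', v): h.append(cs)     # constraint already covers v
        else:                h.append('?')    # conflict -> any value
    return tuple(h)

def min_specializations(g, x):
    """All minimal specializations of g that exclude negative example x."""
    out = []
    for i, cg in enumerate(g):
        if cg == '?':
            out += [g[:i] + (v,) + g[i + 1:] for v in DOMAIN[i] if v != x[i]]
    return out

def candidate_elimination(examples):
    S = {(None,) * len(DOMAIN)}               # S0 = {<Φ,...,Φ>}
    G = {('?',) * len(DOMAIN)}                # G0 = {<?,...,?>}
    for x, positive in examples:
        if positive:
            G = {g for g in G if matches(g, x)}
            S = {min_generalization(s, x) for s in S}
            S = {h for h in S if any(more_general_eq(g, h) for g in G)}
            S = {h for h in S
                 if not any(h != h2 and more_general_eq(h, h2) for h2 in S)}
        else:
            S = {s for s in S if not matches(s, x)}
            new_G = set()
            for g in G:
                if not matches(g, x):
                    new_G.add(g)              # g already excludes the negative example
                else:
                    new_G |= {h for h in min_specializations(g, x)
                              if any(more_general_eq(h, s) for s in S)}
            G = {h for h in new_G
                 if not any(h != h2 and more_general_eq(h2, h) for h2 in new_G)}
    return S, G
```

Running it on the four EnjoySport examples (the `train` list from the List-Then-Eliminate sketch) reproduces the boundary sets derived on the following slides: S = {('Sunny', 'Warm', '?', 'Strong', '?', '?')} and G = {('Sunny', '?', '?', '?', '?', '?'), ('?', 'Warm', '?', '?', '?', '?')}.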
  • 54. CSEC-4142 & PHDCS-4 @2022 Example
  S0 = {〈∅ ∅ ∅ ∅ ∅ ∅〉}, G0 = {〈? ? ? ? ? ?〉}
  Example 1 (+): 〈Sunny Warm Normal Strong Warm Same〉
   S1 = {〈Sunny Warm Normal Strong Warm Same〉}, G1 = {〈? ? ? ? ? ?〉}
  Example 2 (+): 〈Sunny Warm High Strong Warm Same〉
   S2 = {〈Sunny Warm ? Strong Warm Same〉}, G2 = {〈? ? ? ? ? ?〉}
  Example 3 (−): 〈Rainy Cold High Strong Warm Change〉
   S3 = {〈Sunny Warm ? Strong Warm Same〉}; the current G boundary is now incorrect, so it must be made more specific:
   G3 = {〈Sunny ? ? ? ? ?〉, 〈? Warm ? ? ? ?〉, 〈? ? ? ? ? Same〉}
  • 55. CSEC-4142 & PHDCS-4 @2022 Example (cont.) • Given that there are six attributes that could be specified to specialize G2, why are there only three new hypotheses in G3? • For example, the hypothesis h = <?, ?, Normal, ?, ?, ?> is a minimal specialization of G2 that correctly labels the new example as negative, but it is not included in G3.  The reason this hypothesis is excluded is that it is inconsistent with S2.  The algorithm determines this simply by noting that h is not more general than the current specific boundary, S2.
  • 56. CSEC-4142 & PHDCS-4 @2022 Example (cont.) • In fact, the S boundary of the version space forms a summary of the previously encountered positive examples that can be used to determine whether any given hypothesis is consistent with these examples. • The G boundary summarizes the information from previously encountered negative examples. Any hypothesis more specific than G is assured to be consistent with past negative examples
  • 57. CSEC-4142 & PHDCS-4 @2022 Example (cont.)
  S3 = {〈Sunny Warm ? Strong Warm Same〉}
  G3 = {〈Sunny ? ? ? ? ?〉, 〈? Warm ? ? ? ?〉, 〈? ? ? ? ? Same〉}
  Example 4 (+): 〈Sunny Warm High Strong Cool Change〉
  S4 = {〈Sunny Warm ? Strong ? ?〉}
  G4 = {〈Sunny ? ? ? ? ?〉, 〈? Warm ? ? ? ?〉}
  • 58. CSEC-4142 & PHDCS-4 @2022 Example (cont.) • The fourth training example further generalizes the S boundary of the version space.  It also results in removing one member of the G boundary, because this member fails to cover the new positive example.  To understand the rationale for this step, it is useful to consider why the offending hypothesis must be removed from G.
  • 59. CSEC-4142 & PHDCS-4 @2022 Example (cont.) • Notice it cannot be specialized, because specializing it would not make it cover the new example. • It also cannot be generalized, because by the definition of G, any more general hypothesis will cover at least one negative training example. • Therefore, the hypothesis must be dropped from the G boundary, thereby removing an entire branch of the partial ordering from the version space of hypotheses remaining under consideration
  • 60. CSEC-4142 & PHDCS-4 @2022 Version Space of the Example
  S: {〈Sunny Warm ? Strong ? ?〉}
     〈Sunny ? ? Strong ? ?〉, 〈Sunny Warm ? ? ? ?〉, 〈? Warm ? Strong ? ?〉
  G: {〈Sunny ? ? ? ? ?〉, 〈? Warm ? ? ? ?〉}
  (Figure: the final version space, bounded by S below and G above.)
  • 61. CSEC-4142 & PHDCS-4 @2022 Convergence of algorithm • Convergence is guaranteed if:  there are no errors  there is some h in H describing c. • Ambiguity is removed from the VS when S = G  Then the VS contains a single h  This happens once enough TEs have been seen • Given a mislabeled (e.g., false negative) TE, the algorithm will remove every h inconsistent with that TE, and hence will remove the correct target concept from the VS  If enough TEs are observed, the S and G boundaries will converge to an empty VS
  • 62. CSEC-4142 & PHDCS-4 @2022 Which Next Training Example? (The slide repeats the version space figure above.) • Assume the learner can choose the next TE • It should choose d such that it maximally reduces the number of hypotheses in the VS  The best TE satisfies precisely 50% of the hypotheses o This cannot always be done; when it can, the correct target concept is reached in only log2 |VS| experiments o Example: 〈Sunny Warm Normal Weak Warm Same〉 is satisfied by exactly 3 of the 6 hypotheses, i.e., 50% of the total o If labeled positive, it generalizes S; if negative, it specializes G • The order of the examples matters for the intermediate sizes of S and G, but not for the final S and G
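A small sketch of this query-scoring idea, with the six-hypothesis version space from the previous slide listed explicitly:

```python
def matches(h, x):
    return all(c == '?' or c == v for c, v in zip(h, x))

VS = [('Sunny', 'Warm', '?', 'Strong', '?', '?'),                     # S
      ('Sunny', '?', '?', 'Strong', '?', '?'),
      ('Sunny', 'Warm', '?', '?', '?', '?'),
      ('?', 'Warm', '?', 'Strong', '?', '?'),
      ('Sunny', '?', '?', '?', '?', '?'),                             # G ...
      ('?', 'Warm', '?', '?', '?', '?')]

def split(x, vs):
    """How a query x would split the version space: (votes +, votes -)."""
    pos = sum(matches(h, x) for h in vs)
    return pos, len(vs) - pos

print(split(('Sunny', 'Warm', 'Normal', 'Weak', 'Warm', 'Same'), VS))  # (3, 3)
```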
  • 63. CSEC-4142 & PHDCS-4 @2022 Classifying new cases using VS • Use a voting procedure on the following examples:  〈Sunny Warm Normal Strong Cool Change〉 (+ by all six) – classified as positive with confidence; this happens whenever the new instance satisfies every hypothesis of S  〈Rainy Cool Normal Weak Warm Same〉 (− by all six) – classified as negative with confidence; this happens whenever the new instance is satisfied by no hypothesis of G  〈Sunny Warm Normal Weak Warm Same〉 (+ by 3 and − by 3) – need more TEs; cannot decide  〈Sunny Cold Normal Strong Warm Same〉 (+ by 2 and − by 4) – classified as negative with 67% confidence (The slide repeats the version space figure above.)
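The same machinery gives a voting classifier; a sketch (reusing matches and VS from the previous snippet), where the confidence is simply the majority fraction:

```python
def classify_by_vote(x, vs):
    """Majority vote of the version space on instance x."""
    pos = sum(matches(h, x) for h in vs)
    neg = len(vs) - pos
    if pos == len(vs): return '+', 1.0        # satisfies all of S (hence all of VS)
    if neg == len(vs): return '-', 1.0        # satisfies no member of G
    if pos == neg:     return 'undecided', 0.5
    return ('+', pos / len(vs)) if pos > neg else ('-', neg / len(vs))

print(classify_by_vote(('Sunny', 'Cold', 'Normal', 'Strong', 'Warm', 'Same'), VS))
# ('-', 0.666...): negative with ~67% confidence, matching the last case above
```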
  • 64. CSEC-4142 & PHDCS-4 @2022 Effect of incomplete hypothesis space • The preceding algorithms work if the target function is in H  They will generally not work if the target function is not in H • Consider the following examples, which represent the target concept “Sky = Sunny or Sky = Cloudy”:  〈Sunny Warm Normal Strong Cool Change〉 Y  〈Cloudy Warm Normal Strong Cool Change〉 Y  〈Rainy Warm Normal Strong Cool Change〉 N
  • 65. CSEC-4142 & PHDCS-4 @2022 Effect of incomplete hypothesis space (cont.) • If we apply the Candidate Elimination algorithm as before, we end up with an empty VS  After the first two TEs, S = {〈? Warm Normal Strong Cool Change〉}  This hypothesis is overly general o it covers the third (negative) TE! • Our H does not include the appropriate c • We need more expressive hypotheses
  • 66. CSEC-4142 & PHDCS-4 @2022 Incomplete hypothesis space • If c is not in H, then consider generalizing the representation of H to contain c  For example, add disjunctions or negations to the representation of hypotheses in H • One way to avoid the problem is to allow all possible representations of h’s  Equivalent to allowing all possible subsets of instances as definitions of the concept EnjoySport o Recall that there are 96 instances in EnjoySport; hence there are 2^96 ≈ 10^28 possible hypotheses in the full space H o This can be done by using the full propositional calculus with AND, OR, NOT o Hence H defined only by conjunctions of attributes is biased (containing only 973 h’s)
  • 67. CSEC-4142 & PHDCS-4 @2022 Unbiased Learners and Inductive Bias • BUT if there are no limits on the representation of hypotheses (i.e., a full logical representation: and, or, not), the learner can only memorize the examples… no generalization is possible!  Say we have 5 TEs {x1, x2, x3, x4, x5}, with x4, x5 negative TEs • Applying the Candidate Elimination algorithm:  S will be the disjunction of the positive examples (S = {x1 OR x2 OR x3})  G will be the negation of the disjunction of the negative examples (G = {not (x4 or x5)})  We would need to see all instances to learn the concept!
  • 68. CSEC-4142 & PHDCS-4 @2022 Unbiased Learners and Inductive Bias • “A learner that makes no a priori assumptions regarding the identity of the target concept has no rational basis for classifying any unseen instances” • Cannot predict usefully:  TEs have a unanimous vote  all other instances get a 50/50 vote! o For every h in H that predicts +, there is another that predicts − • Approach:  Place constraints on the representation of hypotheses o Example: limiting connectives to conjunctions o Allows learning of generalized hypotheses o Introduces a bias that depends on the hypothesis representation
  • 69. CSEC-4142 & PHDCS-4 @2022 Inductive System and Equivalent Deductive System • The inductive bias (IB) of learning algorithm L is any minimal set of assertions B such that for any target concept c and training examples D, we can logically infer the value c(x) of any instance x from B, D, and x • L(x, D) = k implies that all members of VSH,D, including c, vote for class k (unanimous voting). Therefore: c(x) = k = L(x, D). • This means that the output of the learner L(x, D) can be logically deduced from B ∧ D ∧ x