Machine Learning
Dr.M.Rajendiran
Dept. of Computer Science and Engineering
Panimalar Engineering College
Contents
Types of Machine Learning
Supervised Learning
The Brain and the Neuron
Design a Learning System
Perspectives and Issues in Machine Learning
Concept Learning Task
Concept Learning as Search
Finding a Maximally Specific Hypothesis
Version Spaces and Candidate Elimination Algorithm
Perceptron
Linear Separability
Linear Regression.
TYPES OF MACHINE LEARNING
Example: webpage
 Suppose we want to predict what software a visitor to a website might buy, based on information that we can collect.
There are a couple of interesting things in there.
 The first is the data. It might be useful to know what software visitors have
bought before, and how old they are.
 However, it is not possible to get that information from their web browser, so we
can't use it.
 Picking the variables that you want to use is a very important part of finding
good solutions to problems.
 The second is that how you process the data can be important.
 This can be seen in the example of the time of access. Your computer can
store it to the millisecond, but that precision isn't useful, because what we want
to spot is the coarser patterns shared between users.
 We define learning as getting better at some task through
practice.
This leads to a couple of vital questions:
 How does the computer know whether it is getting better or not?
 How does it know how to improve?
 There are several different possible answers to these questions and
they produce different types of machine learning.
(1)Supervised Learning
(2)Unsupervised Learning
(3)Reinforcement Learning
(4)Evolutionary Learning
Supervised Learning
 The correct classes of the training data are known.
 A training set of examples with the correct responses (targets) is
provided.
 Based on this training set, the algorithm generalizes to respond
correctly to all possible inputs.
 This is also called learning from exemplars.
Unsupervised Learning
 The correct classes of the training data are not known.
 Correct responses are not provided; instead the algorithm tries to
identify similarities between the inputs, so that inputs that have
something in common are categorised together.
 The statistical approach to unsupervised learning is known as density
estimation.
Reinforcement Learning
 Allows the machine or software agent to learn its behavior based on feedback from
the environment.
 This behavior can be learnt once and for all, or it can keep adapting as time goes by.
 This is somewhere between supervised and unsupervised learning.
 The algorithm gets told when the answer is wrong, but does not get told how to
correct it.
 It has to explore and try out different possibilities until it works out how to get the
answer right.
 Reinforcement learning is sometimes called learning with a critic, because of this
monitor that scores the answer but does not suggest improvements.
Evolutionary learning
 Biological evolution can be seen as a learning process:
Biological organisms adapt to improve their survival rates
and chance of having offspring in their environment.
 We’ll look at how we can model this in a computer, using an
idea of fitness, which corresponds to a score for how good
the current solution is.
SUPERVISED LEARNING
 Example: learning the class of a "family car." People look at cars and label them:
 (1) Positive examples: the cars that
they believe are family cars.
 (2) Negative examples: the other
cars.
 Class learning is finding a description
that is shared by all of the positive
examples and none of the negative
examples.
INPUT REPRESENTATION
 Suppose we represent each car by
two attributes: price and engine
power. These two attributes are the
inputs to the class recognizer.
 When we decide on this particular
input representation, we are ignoring
various other attributes as irrelevant.
 Let x1 denote the price as the first
input attribute (e.g., in U.S. dollars).
 Let x2 denote the engine power as
the second attribute (e.g., engine
volume in cubic centimeters).
 Thus we represent each car using two
numeric values.
 Our training data can now be plotted
in the two-dimensional (x1, x2) space,
where each instance t is a data point
at coordinates (x1^t, x2^t), and its type,
namely, positive versus negative, is
given by r^t (see figure 2.1).
 After further discussions with the
expert and analysis of the data, we
may assume that for a family car,
price and engine power should each
be in a certain range:
(p1 ≤ price ≤ p2) AND (e1 ≤ engine power ≤ e2)
for suitable values of p1, p2, e1, and e2.
Hypothesis Class
 Equation 2.4 fixes H, the hypothesis
class from which C is drawn, namely,
the set of rectangles.
Hypothesis
 The learning algorithm then finds the
particular hypothesis, h∈H, to
approximate C as closely as possible.
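The rectangle hypothesis class can be sketched in a few lines of Python. This is a minimal sketch; the boundary values p1, p2, e1, and e2 below are hypothetical, chosen only for illustration.

```python
# Sketch of the rectangle hypothesis class for the family-car example.
# The parameter values p1, p2, e1, e2 are hypothetical placeholders.

def h(price, engine_power, p1=10_000, p2=20_000, e1=1200, e2=1800):
    """Return 1 (positive: family car) if the instance falls inside
    the axis-aligned rectangle, else 0 (negative)."""
    return 1 if (p1 <= price <= p2) and (e1 <= engine_power <= e2) else 0

print(h(15_000, 1500))  # inside the rectangle -> 1
print(h(30_000, 2500))  # outside the rectangle -> 0
```

The learning algorithm's job is then to pick the four parameters (the corners of the rectangle) so that h agrees with the training labels.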
 The aim is to find h∈H that is as similar
as possible to C.
 The hypothesis h makes a prediction for
an instance x: h(x) = 1 if h classifies x as
a positive example, and h(x) = 0 if h
classifies x as a negative example.
 In practice we do not know C(x), so we
cannot evaluate directly how well h(x)
matches C(x); all we have is the training
set.
Empirical Error
 The training set X is a small
subset of the set of all possible
instances x.
 The empirical error is the proportion of
training instances where the predictions of h
do not match the required values given
in X.
 The error of hypothesis h given the
training set X is
E(h | X) = (1/N) Σ_{t=1}^{N} 1(h(x^t) ≠ r^t)
 where 1(a ≠ b) is 1 if a ≠ b and is
0 if a = b (figure 2.3).
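The empirical error can be computed directly as the fraction of mismatched training labels. In this sketch, the hypothesis boundaries and the four training pairs are hypothetical.

```python
# A minimal sketch of the empirical error E(h | X): the fraction of
# training instances whose label h predicts incorrectly.
# The hypothesis parameters and training set X are hypothetical.

def h(x):
    x1, x2 = x  # (price, engine power)
    return 1 if (10 <= x1 <= 20) and (1.2 <= x2 <= 1.8) else 0

# Training set X: ((x1, x2), r) pairs with required labels r.
X = [((15, 1.5), 1), ((12, 1.3), 1), ((30, 2.5), 0), ((15, 2.5), 1)]

E = sum(h(x) != r for x, r in X) / len(X)
print(E)  # one of the four predictions is wrong -> 0.25
```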
Generalization
 If x1 and x2 are real-valued, there are
infinitely many hypotheses h for which
the empirical error E is 0.
 But a future example may fall close
to the boundary between the positive and
negative examples, where such
hypotheses disagree.
 Different candidate hypotheses may
make different generalization
predictions.
 This is the problem of generalization:
how well the hypothesis classifies future
examples that are not part of the training set.
Most Specific Hypothesis
 The most specific hypothesis
S is the tightest rectangle that
includes all the positive examples and
none of the negative examples
(figure 2.4).
 This gives us one hypothesis, h = S,
as our induced class.
 The actual class C may be larger than
S but is never smaller.
Figure 2.4 S is the most specific and G is the
most general hypothesis.
Most General Hypothesis
 The most general hypothesis G, is the
largest rectangle we can draw that
includes all the positive examples and
none of the negative examples (figure
2.4).
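For this rectangle representation, finding S amounts to taking the componentwise minimum and maximum over the positive examples. A sketch, with hypothetical data points:

```python
# Sketch: the most specific hypothesis S is the tightest axis-aligned
# rectangle containing all the positive examples.
# The (price, engine power) data points below are hypothetical.

positives = [(12, 1.4), (15, 1.6), (18, 1.3)]

p1 = min(x1 for x1, _ in positives)  # left edge
p2 = max(x1 for x1, _ in positives)  # right edge
e1 = min(x2 for _, x2 in positives)  # bottom edge
e2 = max(x2 for _, x2 in positives)  # top edge

print((p1, p2, e1, e2))  # -> (12, 18, 1.3, 1.6)
```

G, by contrast, would be grown outward until it touches a negative example, so any consistent rectangle lies between these two extremes.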
Version space
 Any h∈ H between S and G is a
valid hypothesis with no error, said
to be consistent with the training
set, and such h make up the
version space.
Margin
 The margin, which is the distance
between the boundary and the
instances closest to it (figure 2.5).
 *
 Figure 2.5 We choose the hypothesis with
the largest margin, for best separation.
The shaded instances are those that
define (or support) the margin; other
instances can be removed without
affecting h.
Machine Learning Technique
 .
14
Machine Learning Technique
Ever since computers were invented, we have wondered
whether they might be made to learn.
If we could understand how to program them to learn-
to improve automatically with experience-the impact
would be dramatic.
Imagine computers learning from medical records which
treatments are most effective for new diseases, houses
learning from experience to optimize energy costs based on
the particular usage patterns of their occupants, or personal
software assistants learning the evolving interests of their
users in order to highlight especially relevant stories from
the online morning newspaper.
A detailed understanding of information processing
algorithms for machine learning might lead to a better
understanding of human learning abilities as well.
Many practical computer programs have been developed to
exhibit useful types of learning, and significant commercial
applications have begun to appear.
For problems such as speech recognition, algorithms based on
machine learning outperform all other approaches that
have been attempted to date.
In the field known as data mining, machine learning
algorithms are being used routinely to discover valuable
knowledge from large commercial databases containing
equipment maintenance records, loan applications,
financial transactions, medical records, and the like.
The field of machine learning describes a variety of
learning paradigms, algorithms, theoretical results,
and applications.
Machine learning is inherently a multidisciplinary
field.
It draws on artificial intelligence, probability and statistics,
computational complexity theory, control theory,
information theory, philosophy, psychology,
neurobiology, and other fields.
WELL-POSED LEARNING PROBLEMS
Let us begin our study of machine learning by considering a
few learning tasks.
We will define learning broadly, to include any computer
program that improves its performance at some task through
experience.
Definition: A computer program is said to learn from
experience E with respect to some class of tasks T and
performance measure P, if its performance at tasks in T, as
measured by P, improves with experience E.
To have a well-defined learning problem, we must identify the task, the
performance measure, and the source of experience. Four example problems:
 Learning to recognize spoken words.
 Learning to drive an autonomous vehicle
 Learning to classify new astronomical structures
 Learning to play world-class backgammon
1.Learning to recognize spoken words.
All of the most successful speech recognition systems employ
machine learning in some form.
Ex: The SPHINX system
This system learns speaker-specific strategies for recognizing
the primitive sounds and words from the observed speech
signal.
Neural network learning methods and methods for learning
hidden Markov models are effective for automatically
customizing to individual speakers, vocabularies,
microphone characteristics, background noise, etc.
2.Learning to drive an autonomous vehicle
Machine learning methods have been used to train computer-
controlled vehicles to steer correctly when driving on a variety
of road types.
Example: the ALVINN system.
It has used its learned strategies to drive unassisted at 70 miles
per hour for 90 miles on public highways among other cars.
Similar techniques have possible applications in many sensor-
based control problems.
3.Learning to classify new astronomical structures
Machine learning methods have been applied to a variety of
large databases to learn general regularities implicit in the
data.
Example: Decision tree learning algorithms have been used by
NASA to learn how to classify celestial objects from the
second Palomar Observatory Sky Survey.
This system is now used to automatically classify all objects in
the Sky Survey, which consists of three terabytes of image
data.
4.Learning to play world-class backgammon
The most successful computer programs for playing games
such as backgammon are based on machine learning
algorithms.
Example: The world's top computer program for
backgammon, TD-GAMMON, learned its strategy by playing
over one million practice games against itself.
 It now plays at a level competitive with the human world
champion.
Similar techniques have applications in many practical
problems where very large search spaces must be examined
efficiently.
Any well-posed learning problem, including the successful applications
above, can be specified by three features:
1.The class of tasks
2.The measure of performance to be improved, and
3.The source of experience.
A checkers learning problem:
Task T: playing checkers
Performance measure P: percent of games won against
opponents
Training experience E: playing practice games against itself.
A handwriting recognition learning problem:
Task T: recognizing and classifying handwritten words within
images
Performance measure P: percent of words correctly
classified
Training experience E: a database of handwritten words with
given classifications.
A robot driving learning problem:
Task T: driving on public four-lane highways using vision
sensors.
Performance measure P: average distance traveled before an
error (as judged by human overseer)
Training experience E: a sequence of images and steering
commands recorded while observing a human driver.
DESIGNING A LEARNING SYSTEM
1. Choosing the Training Experience
2. Choosing the Target Function
3. Choosing a Representation for the Target Function
4. Choosing a Function Approximation Algorithm
5. Estimating Training Values
6. Adjusting the weights
7. The Final Design
1.Choosing the Training Experience
 The first design choice is to select the type of training experience
from which our system will learn.
 The type of training experience available can have a
significant impact on success or failure of the learner.
 One key attribute is whether the training experience
provides direct or indirect feedback regarding the choices
made by the performance system.
 A second important attribute of the training experience is
the degree to which the learner controls the sequence of
training examples.
 A third important attribute of the training experience is how
well it represents the distribution of examples over which the
final system performance P must be measured.
 In our checkers learning scenario, the performance metric
P is the percent of games the system wins in the world
tournament.
 If its training experience E consists only of games played
against itself, this experience may not be fully representative
of the situations it will face in the tournament.
A checkers learning problem:
 Task T: playing checkers
 Performance measure P: percent of games won in the world
tournament
 Training experience E: games played against itself
In order to complete the design of the learning system, we must
now choose
1. The exact type of knowledge to be learned
2. A representation for this target knowledge
3. A learning mechanism
2. Choosing the Target Function
The next design choice is to determine exactly what type of
knowledge will be learned and how this will be used by the
performance program.
Let us begin with a checkers-playing program that can
generate the legal moves from any board state.
The program needs only to learn how to choose the best move
from among these legal moves.
Given that we must learn to choose among the legal moves, the
most obvious choice for the type of information to be learned
is a program, or function, that chooses the best move.
Let us call this function ChooseMove and use the notation
ChooseMove : B -> M
This notation indicates that the function accepts as input any board
from the set of legal board states B and produces as output some
move from the set of legal moves M.
We will find it useful to reduce the problem of improving
performance P at task T to the problem of learning some
particular target function such as ChooseMove.
ChooseMove is an obvious choice for the target function in
our example.
However, ChooseMove will turn out to be very difficult to learn
given the kind of indirect training experience available to our
system. An alternative target function, which is easier to learn in
this setting, is an evaluation function that assigns a numerical
score to any given board state.
Let us call this target function V and again use the notation
V : B -> R to denote that V maps any legal board state from the
set B to some real value R
We intend for this target function V to assign higher scores to
better board states.
If the system can successfully learn such a target function V,
then it can easily use it to select the best move from any
current board position.
Therefore define the target value V(b) for an arbitrary board
state b in B, as follows:
1. if b is a final board state that is won, then V(b) = 100
2. if b is a final board state that is lost, then V(b) = -100
3. if b is a final board state that is drawn, then V(b) = 0
4. if b is not a final state in the game, then V(b) = V(b'),
where b' is the best final board state that can be achieved
starting from b and playing optimally until the end of the
game
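The three terminal cases of this definition can be sketched directly. The status-string encoding below is a hypothetical stand-in for real board states, and the non-final case (4) is omitted because, as noted next, it is not efficiently computable.

```python
# Sketch of the terminal cases of the target value V(b).
# The "won"/"lost"/"drawn" status strings are a hypothetical encoding.

def terminal_value(status):
    """V(b) for a final board state b with the given outcome."""
    return {"won": 100, "lost": -100, "drawn": 0}[status]

print(terminal_value("won"))    # -> 100
print(terminal_value("lost"))   # -> -100
print(terminal_value("drawn"))  # -> 0
```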
While this recursive definition specifies a value V(b) for every
board state b, it is not usable by our checkers
player because it is not efficiently computable.
The goal of learning in this case is to discover an operational
description of V.
We have reduced the learning task in this case to the problem
of discovering an operational description of the ideal target
function V.
It may be very difficult in general to learn such an operational
form of V perfectly.
The process of learning the target function is often called
function approximation.
3. Choosing a Representation for the Target Function
We have specified the ideal target function V.
We must choose a representation that the learning program
will use to describe the function that it will learn.
Example:
 The program could represent V using a large table with a distinct
entry specifying the value for each distinct board state. (OR)
 It could represent V using a collection of rules that match against
features of the board state. (OR)
 A quadratic polynomial function of predefined board features
(OR)
 An artificial neural network.
Let us choose a simple representation: for any given board
state, the function will be calculated as a linear
combination of the following board features:
 x1: the number of black pieces on the board
 x2: the number of red pieces on the board
 x3: the number of black kings on the board
 x4: the number of red kings on the board
 x5: the number of black pieces threatened by red
(i.e., which can be captured on red's next turn)
 x6: the number of red pieces threatened by black
 Thus, our learning program will represent V̂(b) as a linear
function of the form
V̂(b) = w0 + w1x1 + w2x2 + w3x3 + w4x4 + w5x5 + w6x6
 where w0 through w6 are numerical coefficients, or weights, to
be chosen by the learning algorithm.
 Learned values for the weights w1 through w6 will determine
the relative importance of the various board features in
determining the value of the board.
 The weight w0 provides an additive constant to
the board value.
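This linear combination of board features is easy to sketch; the specific weight and feature values below are hypothetical, chosen only for illustration.

```python
# Sketch of the linear evaluation function
#   V_hat(b) = w0 + w1*x1 + ... + w6*x6.
# The feature values and weights below are hypothetical.

def v_hat(features, weights):
    """features: (x1, ..., x6); weights: (w0, w1, ..., w6)."""
    w0, *ws = weights
    return w0 + sum(w * x for w, x in zip(ws, features))

features = (3, 0, 1, 0, 0, 0)                      # x1..x6 for some board b
weights = (0.5, 1.0, -1.0, 2.0, -2.0, -0.5, 0.5)   # w0..w6

print(v_hat(features, weights))  # 0.5 + 3*1.0 + 1*2.0 = 5.5
```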
 The first three items above correspond to the specification of
the learning task.
 Whereas the final two items constitute design choices for the
implementation of the learning program.
 The net effect of this set of design choices is to reduce the
problem of learning a checkers strategy to the problem of
learning values for the coefficients w0 through w6 in the target
function representation.
4.Choosing a Function Approximation Algorithm
 In order to learn the target function we require a set of
training examples, each describing a specific board state b
and the training value Vtrain(b) for b.
 Each training example is an ordered pair of the form
(b, Vtrain(b)).
 For instance, the following training example describes a board
state b in which black has won the game (note x2 = 0 indicates
that red has no remaining pieces) and for which the target
function value Vtrain(b) is therefore +100:
((x1 = 3, x2 = 0, x3 = 1, x4 = 0, x5 = 0, x6 = 0), +100)
4.1 ESTIMATING TRAINING VALUES
 The only training information available to our learner is whether the game
was eventually won or lost.
 On the other hand, we require training examples that assign specific
scores to specific board states.
 The fact that the game was eventually won or lost does not necessarily
indicate that every board state along the game path was necessarily
good or bad.
Example:
 If the program loses the game, it may still be the case that board states
occurring early in the game should be rated very highly and that the
cause of the loss was a subsequent poor move.
 Despite the ambiguity inherent in estimating training values for
intermediate board states, one simple approach has been found to be
surprisingly successful.
 One simple approach is to set the training value Vtrain(b) for any
intermediate board state b to V̂(Successor(b)), where V̂ is the learner's
current approximation to V and Successor(b) denotes the next board
state following b for which it is again the program's turn to move.
 This rule for estimating training values can be summarized as:
Vtrain(b) ← V̂(Successor(b))
 Notice that we are using the current version of V̂ to estimate training
values that will be used to refine this very same function: we use
estimates of the value of Successor(b) to estimate the value of board
state b.
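A minimal sketch of this estimation step follows. Both the placeholder evaluation function and the feature-vector trace are hypothetical; only the rule itself (successor value for intermediate states, actual outcome for the final state) comes from the slide.

```python
# Sketch of the training-value estimation rule
#   Vtrain(b) <- V_hat(Successor(b)).
# v_hat and the game trace below are hypothetical stand-ins.

def v_hat(b):
    # placeholder for the learner's current approximation of V
    return float(sum(b))

def estimate_training_values(trace, final_outcome):
    """trace: successive board feature vectors from one game."""
    examples = []
    for i, b in enumerate(trace):
        if i + 1 < len(trace):
            # intermediate state: use V_hat of the successor state
            examples.append((b, v_hat(trace[i + 1])))
        else:
            # terminal state: use the actual game outcome
            examples.append((b, final_outcome))
    return examples

trace = [(3, 0, 1), (2, 0, 1), (2, 0, 2)]
print(estimate_training_values(trace, 100))
```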
4.2 ADJUSTING THE WEIGHTS.
All that remains is to specify the learning algorithm for choosing the
weights wi to best fit the set of training examples {(b, Vtrain(b))}.
As a first step we must define what we mean by the best fit to the
training data.
One common approach is to define the best hypothesis, or set
of weights, as that which minimizes the squared error E
between the training values and the values predicted by the
hypothesis:
E ≡ Σ (Vtrain(b) − V̂(b))², summed over the training examples (b, Vtrain(b)).
Several algorithms are known for finding weights of a linear
function that minimize E defined in this way.
In this case, we require an algorithm that will incrementally
refine the weights as new training examples become available
and that will be robust to errors in these estimated training
values.
One such algorithm is called the Least Mean Squares, or LMS
training rule.
The LMS algorithm is defined as follows.
LMS weight update rule: for each training example (b, Vtrain(b)):
 Use the current weights to calculate V̂(b)
 For each weight wi, update it as wi ← wi + η (Vtrain(b) − V̂(b)) xi
Here η is a small constant (e.g., 0.1) that moderates the size of the
weight update.
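One LMS update step can be sketched as follows; the feature values, training value, and learning rate are hypothetical, and the bias weight w0 is updated as if its feature x0 were always 1.

```python
# A minimal sketch of one LMS weight update:
#   wi <- wi + eta * (Vtrain(b) - V_hat(b)) * xi
# The feature values, target value, and eta below are hypothetical.

def v_hat(features, w):
    return w[0] + sum(wi * xi for wi, xi in zip(w[1:], features))

def lms_update(w, features, v_train, eta=0.1):
    error = v_train - v_hat(features, w)
    w[0] += eta * error                      # bias term: x0 is implicitly 1
    for i, xi in enumerate(features, start=1):
        w[i] += eta * error * xi
    return w

w = [0.0] * 7                                # w0..w6, all starting at zero
w = lms_update(w, (3, 0, 1, 0, 0, 0), v_train=100.0)
print(w[:4])  # -> [10.0, 30.0, 0.0, 10.0]
```

Repeating this update over many training examples gradually moves the weights toward values that minimize the squared error E.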
5.The Final Design
 The final design of our checkers
learning system can be naturally
described by four distinct program
modules that represent the central
components in many learning systems.
 These four modules, summarized in
Figure 1.1, are as follows:
 FIGURE 1.1 Final design of the
checkers learning program.
 The Performance System is the
module that must solve the given
performance task, in this case
playing checkers, by using the
learned target function(s).
 It takes an instance of a new
problem (new game) as input and
produces a trace of its solution
(game history) as output.
 The strategy used by the
Performance System to select its
next move at each step is
determined by the learned
evaluation function.
 Therefore, we expect its
performance to improve as this
evaluation function becomes
increasingly accurate.
 The Critic takes as input the history or
trace of the game and produces as
output a set of training examples of the
target function.
 Each training example in this case
corresponds to some game state in the
trace, along with an estimate Vtrain of
the target function value for this
example.
 The Critic corresponds to the training
rule for estimating training values given
earlier.
 The Generalizer takes as input the
training examples and produces an
output hypothesis.
 That is its estimate of the target function.
 It generalizes from the specific training
examples, hypothesizing a general
function that covers these examples and
other cases beyond the training
examples.
 The Generalizer corresponds to the LMS
algorithm, and the output hypothesis is
the function V̂ described by the learned
weights w0, . . . , w6.
 The Experiment Generator takes as
input the current hypothesis and outputs
a new problem (i.e., initial board state)
for the Performance System to explore.
 Its role is to pick new practice problems
that will maximize the learning rate of the
overall system.
 The Experiment Generator follows a
very simple strategy: It always proposes
the same initial game board to begin a
new game.
 More sophisticated strategies could
involve creating board positions
designed to explore particular regions of
the state space.
 The sequence of design choices made
for the checkers program is summarized
in Figure 1.2.
 These design choices have constrained
the learning task in a number of ways.
 We have restricted the type of
knowledge that can be acquired to a
single linear evaluation function.
 We have constrained this evaluation
function to depend on only the six
specific board features provided.
 .
DESIGNING A LEARNING SYSTEM
PERSPECTIVES AND ISSUES IN MACHINE LEARNING
 One useful perspective on machine learning is that it involves searching
a very large space of possible hypotheses to determine one that best fits
the observed data and any prior knowledge held by the learner.
Example:
 Consider the space of hypotheses that could in principle be output by
the above checkers learner.
 This hypothesis space consists of all evaluation functions that can be
represented by some choice of values for the weights w0 through w6.
 The LMS algorithm for fitting weights achieves this goal by iteratively
tuning the weights, adding a correction to each weight each time the
hypothesized evaluation function predicts a value that differs from the
training value.
 This algorithm works well when the hypothesis representation
considered by the learner defines a continuously parameterized space
of potential hypotheses.
 Many learning algorithms search a hypothesis space defined by some
underlying representation.
 Example:
Linear functions
Logical descriptions
Decision trees
Artificial neural networks
 These different hypothesis representations are appropriate for learning
different kinds of target functions.
 This perspective of learning as a search problem allows us to characterize
learning methods by their search strategies and by the underlying
structure of the search spaces they explore.
Issues in Machine Learning
 The checkers example raises a number of generic questions about machine
learning.
 What algorithms exist for learning general target functions from specific
training examples?
 Under what conditions will particular algorithms converge to the desired
function, given sufficient training data?
 Which algorithms perform best for which types of problems and
representations?
 How much training data is sufficient?
 What general bounds can be found to relate the confidence in learned
hypotheses to the amount of training experience?
 When and how can prior knowledge held by the learner guide the
process of generalizing from examples?
 Can prior knowledge be helpful even when it is only approximately
correct?
 What is the best strategy for choosing a useful next training experience?
 How does the choice of this strategy alter the complexity of the learning
problem?
 What is the best way to reduce the learning task to one or more function
approximation problems?
 What specific functions should the system attempt to learn?
 Can this process itself be automated?
 How can the learner automatically alter its representation to improve its
ability to represent and learn the target function?
CONCEPT LEARNING
 Concept Learning: The definition of a general category
given a sample of positive and negative training examples
of the category.
 Concept learning can be formulated as a problem of
searching through a predefined space of potential
hypotheses that best fits the training examples.
 Each such task involves acquiring the general definition of some
concept, given training examples labelled as members or
non-members of the concept.
 This task is commonly referred to as concept learning, or
approximating a boolean-valued function from examples.
 That is, inferring a boolean-valued function from training examples
of its input and output.
A CONCEPT LEARNING TASK
 Consider the task of learning the target concept "days on which my friend Aldo
enjoys his favorite water sport."
 Table 2.1 describes a set of example days, each represented by a set of
attributes.
 The attribute EnjoySport indicates whether or not Aldo enjoys his
favorite water sport on this day.
 The task is to learn to predict the value of EnjoySport for an arbitrary
day, based on the values of its other attributes.
 What hypothesis representation shall we provide to the learner in this
case?
 Let each hypothesis be a vector of six constraints, specifying the
values of the six attributes Sky, AirTemp, Humidity, Wind, Water,
and Forecast.
 For each attribute, the hypothesis will either:
 indicate by a "?" that any value is acceptable for this attribute,
 specify a single required value (e.g., Warm) for the attribute, or
 indicate by a "0" that no value is acceptable.
 If some instance x satisfies all the constraints of hypothesis h, then h
classifies x as a positive example (h(x) = 1); otherwise x is classified
as a negative example (h(x) = 0).
 To illustrate, the hypothesis that Aldo enjoys his favorite sport only on
cold days with high humidity is represented by the expression
<?, Cold, High, ?, ?, ?>
 The most general hypothesis-that every day is a positive example-is
represented by
<?, ?, ?, ?, ?, ?>
The most specific possible hypothesis-that no day is a positive example-is
represented by <0,0,0,0,0,0>
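A sketch of how such a hypothesis classifies an instance, using tuples of constraint strings; the particular instance values below are hypothetical.

```python
# Sketch of hypothesis evaluation for the EnjoySport representation:
# "?" accepts any value, "0" accepts no value, and anything else
# requires an exact match. The instance below is hypothetical.

def h_classifies(hypothesis, x):
    return 1 if all(c != "0" and (c == "?" or c == v)
                    for c, v in zip(hypothesis, x)) else 0

h = ("?", "Cold", "High", "?", "?", "?")   # <?, Cold, High, ?, ?, ?>
x = ("Sunny", "Cold", "High", "Strong", "Warm", "Same")

print(h_classifies(h, x))           # all constraints satisfied -> 1
print(h_classifies(("0",) * 6, x))  # most specific hypothesis -> always 0
```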
 The EnjoySport concept learning task requires learning the set of
days for which EnjoySport = yes, describing this set by a
conjunction of constraints over the instance attributes.
 In general, any concept learning task can be described by the set
of instances over which the target function is defined, the target
function, the set of candidate hypotheses considered by the
learner, and the set of available training examples.
Notation
The following terminology when discussing concept learning
problems.
 The set of items over which the concept is defined is called the
set of instances, which we denote by X.
 X is the set of all possible days, each represented by the
attributes Sky, AirTemp, Humidity, Wind, Water, and Forecast.
 The concept or function to be learned is called the target concept,
which we denote by c.
 In general, c can be any boolean-valued function defined over the
instances X;
that is, c : X-> {0,1}.
 The target concept corresponds to the value of the attribute EnjoySport
i.e., c(x) = 1 if EnjoySport = Yes, and c(x) = 0 if EnjoySport = No
The learner is presented a set of training examples, each
consisting of an instance x from X, along with its target concept
value c(x).
Instances for which c(x) = 1 are called positive examples or
members of the target concept.
Instances for which c(x) = 0 are called negative examples or
nonmembers of the target concept.
We write the ordered pair (x, c(x)) to describe the training example
consisting of the instance x and its target concept value c(x).
We use the symbol D to denote the set of available training examples.
Given a set of training examples of the target concept c, the
problem faced by the learner is to hypothesize or estimate c.
We use the symbol H to denote the set of all possible hypotheses that
the learner may consider regarding the identity of the target
concept.
Usually H is determined by the human designer's choice of
hypothesis representation.
Each hypothesis h in H represents a boolean-valued function
defined over X;
that is, h : X -> {0,1}.
The goal of the learner is to find a hypothesis h such that
h(x) = c(x) for all x in X.
CONCEPT LEARNING AS SEARCH
Concept learning can be viewed as the task of searching
through a large space of hypotheses implicitly defined by the
hypothesis representation.
The goal of this search is to find the hypothesis that best fits
the training examples.
It is important to note that by selecting a hypothesis
representation, the designer of the learning algorithm
implicitly defines the space of all hypotheses that the program
can ever represent and therefore can ever learn.
 Consider for example, the instances X and hypotheses H in
the EnjoySport learning task.
 The attribute Sky has three possible values and that AirTemp,
Humidity, Wind, Water, and Forecast each have two possible
values.
The instance space X contains exactly 3 · 2 · 2 · 2 · 2 · 2 = 96
distinct instances.
A similar calculation shows that there are 5 · 4 · 4 · 4 · 4 · 4 = 5120
syntactically distinct hypotheses within H.
The number of semantically distinct hypotheses is only
1 + (4 · 3 · 3 · 3 · 3 · 3) = 973.
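These counts can be verified mechanically. A short sketch in Python, with the attribute value counts (3 for Sky, 2 for each of the other five attributes) taken from the task description above:

```python
# Attribute value counts for EnjoySport: Sky has 3 values;
# AirTemp, Humidity, Wind, Water and Forecast have 2 each.
value_counts = [3, 2, 2, 2, 2, 2]

distinct_instances = 1
for n in value_counts:
    distinct_instances *= n              # 3*2*2*2*2*2 = 96

# Syntactically, each attribute may also be "?" or the empty
# constraint, giving n + 2 choices per attribute.
syntactic_hypotheses = 1
for n in value_counts:
    syntactic_hypotheses *= n + 2        # 5*4*4*4*4*4 = 5120

# Every hypothesis containing an empty constraint classifies all
# instances negative, so semantically they collapse to a single
# hypothesis; the rest allow a value or "?" per attribute.
semantic_hypotheses = 1
for n in value_counts:
    semantic_hypotheses *= n + 1         # 4*3*3*3*3*3 = 972
semantic_hypotheses += 1                 # plus the one all-empty: 973

print(distinct_instances, syntactic_hypotheses, semantic_hypotheses)
# 96 5120 973
```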
General-to-Specific Ordering of Hypotheses
We can design learning algorithms that exhaustively search
even infinite hypothesis spaces without explicitly enumerating
every hypothesis.
To illustrate the general-to-specific ordering, consider the two
hypotheses
h1 = (Sunny, ?, ?, Strong, ?, ?)
h2 = (Sunny, ?, ?, ?, ?, ?)
The sets of instances that are classified positive by h1 and by
h2.
Because h2 imposes fewer constraints on the instance, it
classifies more instances as positive.
Any instance classified positive by h1 will also be classified
positive by h2. Therefore, we say that h2 is more general than
h1.
This intuitive "more general than" relationship between
hypotheses can be defined more precisely as follows.
First, for any instance x in X and hypothesis h in H, we say that
x satisfies h if and only if h(x) = 1.
We define the more-general-than-or-equal-to relation in terms of the
sets of instances that satisfy the two hypotheses:
Given hypotheses hj and hk, hj is more-general-
than_or_equal_to hk if and only if any instance that satisfies
hk also satisfies hj.
It is also useful to consider cases where one hypothesis is
strictly more general than the other: we say that hj is (strictly)
more-general-than hk if and only if hj is more-general-than-or-equal-to
hk and hk is not more-general-than-or-equal-to hj.
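For the conjunctive hypotheses used here, the relation can be checked attribute-wise, which is equivalent to the set-based definition above. A sketch (the empty constraint Ø is left out for brevity; all-Ø hypotheses are mutually equivalent):

```python
def satisfies(constraint, value):
    # A single attribute constraint accepts a value.
    return constraint == "?" or constraint == value

def covers(h, x):
    # h(x) = 1: every attribute constraint in h is satisfied by x.
    return all(satisfies(c, v) for c, v in zip(h, x))

def more_general_or_equal(hj, hk):
    # hj >=g hk: attribute-wise, each constraint of hj is at least as
    # weak as the corresponding constraint of hk.
    return all(cj == "?" or cj == ck for cj, ck in zip(hj, hk))

h1 = ("Sunny", "?", "?", "Strong", "?", "?")
h2 = ("Sunny", "?", "?", "?", "?", "?")

print(more_general_or_equal(h2, h1))  # True: h2 is more general than h1
print(more_general_or_equal(h1, h2))  # False
```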
FIND-S: FINDING A MAXIMALLY SPECIFIC HYPOTHESIS
To illustrate this algorithm, assume the learner is given the
sequence of training examples from Table 2.1 for the
EnjoySport task.
The first step of FIND-S is to initialize h to the most specific
hypothesis in H: h ← ⟨Ø, Ø, Ø, Ø, Ø, Ø⟩.
Upon observing the first training example from Table 2.1, which
happens to be a positive example, it becomes clear that our
hypothesis is too specific.
None of the Ø constraints in h are satisfied by this example, so
each is replaced by the next more general constraint that fits the
example; namely, the attribute values of this training example.
This h is still very specific; it labels every instance negative
except the single positive training example.
Next, the second training example forces the algorithm to
further generalize h, this time substituting a "?' in place of any
attribute value in h that is not satisfied by the new example.
The refined hypothesis in this case is h = ⟨Sunny, Warm, ?, Strong, Warm, Same⟩.
Upon encountering the third training example (in this case a
negative example), the algorithm makes no change to h:
the FIND-S algorithm simply ignores every negative example.
While this may at first seem strange, notice that in the current
case our hypothesis h is already consistent with the new
negative example, and hence no revision is needed.
To complete our trace of FIND-S, the fourth (positive)
example leads to a further generalization of h.
The FIND-S algorithm illustrates one way in which the more-
general-than partial ordering can be used to organize the
search for an acceptable hypothesis.
The search moves from hypothesis to hypothesis, searching
from the most specific to progressively more general
hypotheses along one chain of the partial ordering.
Figure 2.2 illustrates this search in terms of the instance and
hypothesis spaces.
At each step, the hypothesis is generalized only as far as
necessary to cover the new positive example.
Therefore, at each stage the hypothesis is the most specific
hypothesis consistent with the training examples observed up
to this point (hence the name FIND-S)
The key property of the FIND-S algorithm is that, for hypothesis
spaces described by conjunctions of attribute constraints (such as
H for the EnjoySport task), FIND-S is guaranteed to output the most
specific hypothesis within H that is consistent with the positive
training examples.
Its final hypothesis will also be consistent with the negative
examples, provided the correct target concept is contained in H
and the training examples are correct.
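The whole trace can be reproduced in a few lines. A sketch of FIND-S on the four EnjoySport examples from Table 2.1, using the string "0" to stand for the Ø constraint:

```python
# Table 2.1 training examples: (instance, EnjoySport = Yes?)
examples = [
    (("Sunny", "Warm", "Normal", "Strong", "Warm", "Same"), True),
    (("Sunny", "Warm", "High", "Strong", "Warm", "Same"), True),
    (("Rainy", "Cold", "High", "Strong", "Warm", "Change"), False),
    (("Sunny", "Warm", "High", "Strong", "Cool", "Change"), True),
]

def find_s(examples, n_attrs=6):
    h = ["0"] * n_attrs              # most specific hypothesis: all empty
    for x, positive in examples:
        if not positive:
            continue                 # FIND-S ignores negative examples
        for i, value in enumerate(x):
            if h[i] == "0":          # first positive example: copy its values
                h[i] = value
            elif h[i] != value:      # mismatch: generalize attribute to "?"
                h[i] = "?"
    return tuple(h)

print(find_s(examples))  # ('Sunny', 'Warm', '?', 'Strong', '?', '?')
```

The intermediate hypotheses match the trace above: after the second example h is (Sunny, Warm, ?, Strong, Warm, Same), and the fourth example generalizes Water and Forecast to "?".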
VERSION SPACES AND THE CANDIDATE-ELIMINATION ALGORITHM
This is a second approach to concept learning: the CANDIDATE-ELIMINATION
algorithm, which addresses several of the limitations of FIND-S.
FIND-S outputs one hypothesis from H that is consistent with the
training examples, but this is just one of many hypotheses from H that
might fit the training data equally well.
The key idea in the CANDIDATE-ELIMINATION algorithm is to
output a description of the set of all hypotheses consistent with the
training examples.
It computes the description of this set without explicitly
enumerating all of its members.
The CANDIDATE-ELIMINATION algorithm has been applied to
problems such as learning regularities in chemical mass
spectroscopy and learning control rules for heuristic search.
Both the CANDIDATE-ELIMINATION and FIND-S algorithms
perform poorly when given noisy training data.
Nevertheless, CANDIDATE-ELIMINATION provides a useful conceptual
framework for introducing several fundamental issues in machine learning.
Representation
The CANDIDATE-ELIMINATION algorithm finds all
describable hypotheses that are consistent with the observed
training examples.
A hypothesis is consistent with the training examples if it
correctly classifies these examples.
Definition: A hypothesis h is consistent with a set of training
examples D if and only if h(x) = c(x) for each example (x, c(x))
in D.
Note that x is said to satisfy hypothesis h when h(x) = 1, regardless of
whether x is a positive or negative example of the target concept.
Whether such an example is consistent with h, however, depends
on the target concept: in particular, on whether h(x) = c(x).
The CANDIDATE-ELIMINATION algorithm represents the set of all
hypotheses consistent with the observed training examples.
This subset of all hypotheses is called the version space with
respect to the hypothesis space H and the training examples D,
because it contains all plausible versions of the target concept.
Definition: The version space, denoted VS_H,D, with respect to
hypothesis space H and training examples D, is the subset of
hypotheses from H consistent with the training examples in D.
The LIST-THEN-ELIMINATE Algorithm
One obvious way to represent the version space is simply to list all
of its members.
This leads to a simple learning algorithm, which we might call
the LIST-THEN-ELIMINATE algorithm, defined in Table 2.4.
This algorithm first initializes the version space to contain all
hypotheses in H, then eliminates any hypothesis found
inconsistent with any training example.
The version space of candidate hypotheses thus shrinks as
more examples are observed, until ideally just one hypothesis
remains that is consistent with all the observed examples.
If insufficient data is available to narrow the version space to a
single hypothesis, then the algorithm can output the entire set
of hypotheses consistent with the observed data.
The algorithm can be applied whenever the hypothesis space
H is finite.
It has many advantages, including the guarantee that it outputs all
hypotheses consistent with the training data.
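Because H is finite here, the algorithm can be run directly by brute force. A sketch, with the EnjoySport attribute values and the Table 2.1 examples written out (the all-Ø hypotheses, which cover nothing, are omitted from the enumeration):

```python
from itertools import product

attr_values = [
    ["Sunny", "Cloudy", "Rainy"],   # Sky
    ["Warm", "Cold"],               # AirTemp
    ["Normal", "High"],             # Humidity
    ["Strong", "Weak"],             # Wind
    ["Warm", "Cool"],               # Water
    ["Same", "Change"],             # Forecast
]

examples = [
    (("Sunny", "Warm", "Normal", "Strong", "Warm", "Same"), True),
    (("Sunny", "Warm", "High", "Strong", "Warm", "Same"), True),
    (("Rainy", "Cold", "High", "Strong", "Warm", "Change"), False),
    (("Sunny", "Warm", "High", "Strong", "Cool", "Change"), True),
]

def covers(h, x):
    return all(c == "?" or c == v for c, v in zip(h, x))

def list_then_eliminate(examples, attr_values):
    # Enumerate every hypothesis over attribute values plus "?"
    # (4*3*3*3*3*3 = 972 here), then drop any hypothesis found
    # inconsistent with some training example.
    vs = list(product(*[vals + ["?"] for vals in attr_values]))
    for x, positive in examples:
        vs = [h for h in vs if covers(h, x) == positive]
    return vs

vs = list_then_eliminate(examples, attr_values)
print(len(vs))   # 6 hypotheses remain in the version space
```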
A More Compact Representation for Version Spaces
 The CANDIDATE-ELIMINATION
algorithm works on the same
principle as the LIST-THEN-
ELIMINATE algorithm.
 However, it employs a much more
compact representation of the
version space.
 The version space is represented by
its most general and least general
members.
 These members form general and
specific boundary sets that delimit
the version space within the partially
ordered hypothesis space.
 .
78
VERSION SPACES AND THE CANDIDATE-ELIMINATION ALGORITHM
 Recall that given the four training examples
from Table 2.1, FIND-S outputs the
hypothesis ⟨Sunny, Warm, ?, Strong, ?, ?⟩.
 This is one of six different hypotheses from
H that are consistent with these training
examples.
 All six hypotheses are shown in Figure 2.3.
They constitute the version space relative to
this set of data and this hypothesis
representation.
 The arrows among these six
hypotheses in Figure 2.3 indicate
instances of the more-general-than
relation.
 The CANDIDATE-ELIMINATION
algorithm represents the version
space by storing only its most general
members (labeled G in Figure 2.3)
and its most specific members
(labeled S in the figure).
 Given only the two sets S and G, it is
possible to enumerate all
members of the version space as
needed, by generating the
hypotheses that lie between these
two sets in the general-to-specific
partial ordering over hypotheses.
 Below we define the boundary sets G
and S precisely and prove that
these sets do in fact represent the
version space.
 Version space for a "rectangle" hypothesis
language in two dimensions.
 Green pluses are positive examples, and
red circles are negative examples.
 GB is the maximally general positive
hypothesis boundary.
 SB is the maximally specific positive
hypothesis boundary.
 The intermediate (thin) rectangles
represent the hypotheses in the version
space.
 The version space is precisely the set of hypotheses contained in G, plus
those contained in S, plus those that lie between G and S in the partially
ordered hypothesis space. This is stated precisely in Theorem 2.1.
To prove the theorem it suffices to show that
(1) every h satisfying the right-hand side of the above expression is in VS_H,D, and
(2) every member of VS_H,D satisfies the right-hand side of the expression.
Let g be an arbitrary member of G, s be an arbitrary member of S, and h
be an arbitrary member of H.
CANDIDATE-ELIMINATION Algorithm
The CANDIDATE-ELIMINATION algorithm computes the version
space containing all hypotheses from H that are consistent with an
observed sequence of training examples.
It begins by initializing the version space to the set of all hypotheses
in H; that is, by initializing the G boundary set to contain the most
general hypothesis in H, G0 = {⟨?, ?, ?, ?, ?, ?⟩},
and initializing the S boundary set to contain the most specific
hypothesis, S0 = {⟨Ø, Ø, Ø, Ø, Ø, Ø⟩}.
These two boundary sets delimit the entire hypothesis space,
because every other hypothesis in H is both more general than S0
and more specific than G0.
As each training example is considered, the S and G boundary sets are
generalized and specialized, respectively, to eliminate from the
version space any hypotheses found inconsistent with the new
training example.
The computed version space contains all the hypotheses consistent
with these examples and only these hypotheses.
This algorithm is summarized in Table 2.5.
 Training Examples:
 T1: (Sunny,Warm,Normal,Strong,Warm,Same),Yes
 T2: (Sunny,Warm,High,Strong,Warm,Same),Yes
 T3: (Rainy,Cold,High,Strong,Warm,Change),No
 T4: (Sunny,Warm,High,Strong,Cool,Change),Yes
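The trace that follows can be reproduced with a compact sketch of the boundary-set updates for this conjunctive hypothesis language (an illustration, not Table 2.5 verbatim; `min_generalize` and `min_specializations` are helper names introduced here):

```python
attr_values = [
    ["Sunny", "Cloudy", "Rainy"],   # Sky
    ["Warm", "Cold"],               # AirTemp
    ["Normal", "High"],             # Humidity
    ["Strong", "Weak"],             # Wind
    ["Warm", "Cool"],               # Water
    ["Same", "Change"],             # Forecast
]

examples = [
    (("Sunny", "Warm", "Normal", "Strong", "Warm", "Same"), True),
    (("Sunny", "Warm", "High", "Strong", "Warm", "Same"), True),
    (("Rainy", "Cold", "High", "Strong", "Warm", "Change"), False),
    (("Sunny", "Warm", "High", "Strong", "Cool", "Change"), True),
]

def covers(h, x):
    return all(c == "?" or c == v for c, v in zip(h, x))

def more_general_or_equal(hj, hk):
    return all(cj == "?" or cj == ck for cj, ck in zip(hj, hk))

def min_generalize(s, x):
    # Least generalization of s that covers the positive example x
    # ("0" stands for the empty constraint).
    return tuple(v if c == "0" else (c if c == v else "?")
                 for c, v in zip(s, x))

def min_specializations(g, x, attr_values):
    # Minimal specializations of g that exclude the negative example x.
    specs = []
    for i, c in enumerate(g):
        if c == "?":
            for v in attr_values[i]:
                if v != x[i]:
                    specs.append(g[:i] + (v,) + g[i + 1:])
    return specs

def candidate_elimination(examples, attr_values):
    n = len(attr_values)
    S = {("0",) * n}                 # most specific boundary S0
    G = {("?",) * n}                 # most general boundary G0
    for x, positive in examples:
        if positive:
            G = {g for g in G if covers(g, x)}
            S = {min_generalize(s, x) for s in S}
            S = {s for s in S
                 if any(more_general_or_equal(g, s) for g in G)}
        else:
            S = {s for s in S if not covers(s, x)}
            new_G = set()
            for g in G:
                if not covers(g, x):
                    new_G.add(g)
                    continue
                for spec in min_specializations(g, x, attr_values):
                    if any(more_general_or_equal(spec, s) for s in S):
                        new_G.add(spec)
            # Drop any member less general than another member of G.
            G = {g for g in new_G
                 if not any(h != g and more_general_or_equal(h, g)
                            for h in new_G)}
    return S, G

S, G = candidate_elimination(examples, attr_values)
print(sorted(S))
print(sorted(G))
```

Run on the four examples above this yields S4 = {⟨Sunny, Warm, ?, Strong, ?, ?⟩} and G4 = {⟨Sunny, ?, ?, ?, ?, ?⟩, ⟨?, Warm, ?, ?, ?, ?⟩}, matching the trace in Figures 2.4 to 2.7.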
Figure 2.4 traces the
CANDIDATE-ELIMINATION
algorithm applied to the first two
training examples from Table 2.1.
The boundaries are first initialized
to G0 and S0, the most general and
most specific hypotheses in H,
respectively.
When the first training example
is presented, the algorithm checks
the S boundary and finds that it
is overly specific: it fails to cover
the positive example.
 .
 Fig:2.4 CANDIDATE-ELIMINATION
Trace 1. S0 and G0 are the initial
boundary sets corresponding to the
most specific and most general
hypotheses.
 Training examples 1 and 2 force the S
boundary to become more general, as
in the FIND-S algorithm. They have no
effect on the G boundary.
 The S boundary is revised by moving it
to the minimally more general hypothesis
that covers the new example.
 This revised boundary is shown as S1 in
Figure 2.4
 No update of the G boundary is
needed in response to this training
example because G0 correctly covers
this example.
 When the second training example is
observed, it has a similar effect of
generalizing S further to S2, leaving G
again unchanged (i.e., G2 = G1 = G0).
 .
 Fig:2.4 CANDIDATE-ELIMINATION
Trace 1. S0 and G0 are the initial
boundary sets corresponding to the
most specific and most general
hypotheses.
 Training examples 1 and 2 force the S
boundary to become more general, as
in the FIND-S algorithm. They have no
effect on the G boundary.
Negative training examples play
the complementary role of forcing
the G boundary to become
increasingly specific.
Consider the third training
example, shown in Figure 2.5.
The hypothesis in the G
boundary must therefore be
specialized until it correctly
classifies this new negative
example.
As Figure 2.5 shows, there are several
alternative minimally more
specific hypotheses.
 .
 Figure 2.5: CEA Trace 2: Training
example 3 is a negative example
that forces the G2 boundary to
be specialized to G3.
 Note several alternative
maximally general hypotheses
are included in G3.
The fourth training example,
shown in Figure 2.6, further
generalizes the S boundary of the
version space.
It also results in removing one
member of the G boundary,
because this member fails to
cover the new positive example.
After processing these four
examples, the boundary sets S4
and G4 delimit the version space
of all hypotheses consistent with
the set of incrementally observed
training examples.
 .
 Figure 2.6. CEA Trace 3: the
positive training example
generalizes the S boundary, from
S3 to S4.
 One member of G3 must also be
deleted, because it is no longer
more general than the S4
boundary.
The entire version space,
including those hypotheses
bounded by S4 and G4, is shown
in Figure 2.7.
This learned version space is
independent of the sequence in
which the training examples are
presented (because in the end it
contains all hypotheses
consistent with the set of
examples).
 .
 Figure 2.7. The final version
space for the EnjoySport concept
learning problem and training
examples described earlier.
Linear Separability
 Consider two-input patterns (X1, X2)
being classified into two classes, as
shown in the figure.
 Each point, marked with either
x or o, represents a pattern with a pair
of values (X1, X2).
 Each pattern is classified into one of
two classes.
 These classes can be separated with
a single line L; such patterns are known
as linearly separable patterns.
 Linear separability refers to the fact that
classes of patterns with
n-dimensional input vectors x = (x1, x2, ...,
xn) can be separated with a single
decision surface.
 The line L represents the decision
surface.
 The processing unit of a single-layer
perceptron network can
categorize a set of patterns into two
classes when a linear threshold function
separates them.
 The two classes must be linearly
separable in order for the perceptron
network to function correctly. This is the
main limitation of a single-layer
perceptron network.
Example:
 The classic example of a linearly
inseparable pattern is the logical
exclusive-OR (XOR) function, shown in
the figure: the two classes (0, black dots;
1, white dots) cannot be separated with a
single line.
 The patterns of (X1, X2) can, however, be
classified with two lines, L1 and L2.
 FIGURE 3.4 Data for the OR logic function and a
plot of the four datapoints.
 In the graph on the right side of
Figure 3.4, we can draw a straight line that
separates the crosses from the circles
without difficulty (this is done in Figure 3.6).
 What does the Perceptron do? It tries to
find a straight line such that the neuron fires
on one side of the line and doesn't fire on the
other.
 This line is called the decision boundary
or discriminant function (Figure 3.7).
 FIGURE 3.6 The decision boundary computed by a
Perceptron for the OR function.
 Figure 3.6 shows the decision boundary:
where the decision about which
class to categorise the input as changes
from crosses to circles.
 FIGURE 3.7 A decision boundary separating two classes of
data.
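A small experiment makes the limitation concrete: a perceptron with a bias input (fixed at −1, following the convention used in this chapter) converges on OR but can never classify all four XOR points correctly. The training function and learning rate below are illustrative choices:

```python
import numpy as np

def train_perceptron(X, t, eta=0.25, max_epochs=20):
    """Batch perceptron with a bias input fixed at -1."""
    Xb = np.hstack([X, -np.ones((len(X), 1))])   # append bias input
    w = np.zeros(Xb.shape[1])
    for _ in range(max_epochs):
        y = (Xb @ w > 0).astype(int)             # threshold activation
        if np.array_equal(y, t):
            return w, True                       # converged
        w += eta * (t - y) @ Xb                  # perceptron learning rule
    y = (Xb @ w > 0).astype(int)
    return w, bool(np.array_equal(y, t))

X = np.array([[0, 0], [0, 1], [1, 0], [1, 1]])
t_or = np.array([0, 1, 1, 1])
t_xor = np.array([0, 1, 1, 0])

_, ok_or = train_perceptron(X, t_or)
_, ok_xor = train_perceptron(X, t_xor)
print(ok_or, ok_xor)   # OR is linearly separable; XOR is not
```

No iteration cap helps with XOR: since no straight line separates its classes, no weight vector can ever classify all four points, so the algorithm cycles forever.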
 Consider one input vector x. The
neuron fires if x · wᵀ > 0, where w is the
row of W that connects the
inputs to one particular neuron.
 wᵀ denotes the transpose of w and is
used to make both of the vectors into
column vectors.
 The a · b notation describes the inner
or scalar product between two
vectors: a · b = ‖a‖ ‖b‖ cos φ,
where φ is the angle between a and b
and ‖a‖ is the length of the vector a.
 In the Perceptron, the boundary case is
an input vector x1 that satisfies
x1 · wᵀ = 0. Take another input vector x2
on the boundary that satisfies x2 · wᵀ = 0.
 Putting these two equations together we
get (x1 − x2) · wᵀ = 0: the weight vector is
perpendicular to the decision boundary.
 FIGURE 3.7 A decision boundary separating
two classes of data.
 It is worth thinking about what happens
when we have more than one output
neuron.
 The weights for each neuron separately
describe a straight line.
 With several neurons we get several straight
lines that each try to separate a different
part of the space.
 Figure 3.8 shows an example of
decision boundaries computed by a
Perceptron with four neurons.
 FIGURE 3.8 Different decision boundaries
computed by a Perceptron with four neurons.
The Perceptron Convergence Theorem
 The Perceptron will converge to a solution
that separates the classes, and it
will do so after a finite number of
iterations.
 The number of iterations is bounded by
1/γ², where γ is the distance between the
separating hyperplane and the closest
datapoint to it.
 Assume that the length of every input
vector is bounded by
some constant R, so that ‖x‖ ≤ R.
 We know that there is some weight
vector w* that separates the data, since
the data is linearly separable.
 The Perceptron learning algorithm aims
to find some vector w that is parallel
to w*.
 To measure how close w is to w*, we use
the inner product w* · w.
 When the two vectors are parallel, the
angle between them is φ = 0 and so
cos φ = 1, and the inner product is at its
maximum.
 So if each weight update increases w* · w
(while the length of w stays under control),
the algorithm will converge.
 There are therefore two quantities that we
need to track:
(1) the value of w* · w, and (2) the length of w.
 Suppose that at the tth iteration of the
algorithm a misclassified example x with
target y is presented, where the (t−1) index
means the weights at the (t−1)st step.
The weight update will be w(t) = w(t−1) + y x.
 We now compute how this update changes
the two quantities.
 Because w* separates the data,
y (w* · x) ≥ γ for every datapoint, where γ is
the smallest distance between the optimal
hyperplane defined by w* and any datapoint.
 This means that the inner product
increases by at least γ at every update,
so after t updates of the weights,
w* · w(t) ≥ t γ.
 By using the Cauchy–Schwartz
inequality, w* · w(t) ≤ ‖w*‖ ‖w(t)‖.
 The length of the weight vector after t
steps satisfies ‖w(t)‖² ≤ ‖w(t−1)‖² + R²
(the cross term is non-positive for a
misclassified example), and so
‖w(t)‖² ≤ t R².
 Taking ‖w*‖ = 1, this gives t γ ≤ √t R,
i.e., t ≤ R²/γ². Hence after we have made
that many updates the algorithm must
have converged.
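The argument can be written compactly. Assuming ‖w*‖ = 1, margin γ, ‖x‖ ≤ R, and the update w(t) = w(t−1) + y x applied only to misclassified examples (for which y w(t−1) · x ≤ 0):

```latex
\begin{align}
\mathbf{w}^* \cdot \mathbf{w}^{(t)}
  &= \mathbf{w}^* \cdot \mathbf{w}^{(t-1)} + y\,\mathbf{w}^* \cdot \mathbf{x}
   \;\ge\; \mathbf{w}^* \cdot \mathbf{w}^{(t-1)} + \gamma
   \;\Rightarrow\; \mathbf{w}^* \cdot \mathbf{w}^{(t)} \ge t\gamma \\
\|\mathbf{w}^{(t)}\|^2
  &= \|\mathbf{w}^{(t-1)}\|^2
     + 2y\,\mathbf{w}^{(t-1)} \cdot \mathbf{x} + \|\mathbf{x}\|^2
   \;\le\; \|\mathbf{w}^{(t-1)}\|^2 + R^2
   \;\Rightarrow\; \|\mathbf{w}^{(t)}\|^2 \le tR^2 \\
t\gamma
  &\le \mathbf{w}^* \cdot \mathbf{w}^{(t)}
   \le \|\mathbf{w}^*\|\,\|\mathbf{w}^{(t)}\|
   \le \sqrt{t}\,R
   \;\Rightarrow\; t \le R^2/\gamma^2
\end{align}
```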
LINEAR REGRESSION
Linear regression is a linear
approach for modeling the
relationship between a scalar
dependent variable y and one
or more explanatory variables
(or independent variables)
denoted X.
The case of one explanatory
variable is called simple
linear regression.
 In linear regression, the observations
(red) are assumed to be the result of
random deviations (green) from an
underlying relationship (blue) between
the dependent variable (y) and
independent variable (x).
 It is common to turn classification
problems into regression problems.
 This can be done in two ways.
 First by introducing an indicator
variable, which simply says which class
each datapoint belongs to.
 The problem is now to use the data to
predict the indicator variable, which is a
regression problem.
 The second approach is to do repeated
regression, once for each class, with
the indicator value being 1 for
examples in the class and 0 for all of
the others.
 Since classification can be replaced by
regression using these methods, the
difference between the Perceptron
and more statistical approaches lies mainly
in the way that the problem is set up.
 FIGURE 3.13 Linear regression in two
and three dimensions.
 For regression we are making a
prediction about an unknown value y
by computing some function of known
values xi.
 y might be the indicator variable for
classes or a future value of some data.
 The output y is a sum of the
xi values, each multiplied by a
constant parameter: y = Σi βi xi.
 The βi define a straight line that goes
through the datapoints.
 Figure 3.13 shows this in two and three
dimensions.
 We want to define the line that best fits the
data.
 The most common solution is to try to
minimise the distance between each
datapoint and the line that we fit.
 We can use Pythagoras' theorem to
measure that distance.
 We then minimise an error
function that measures the sum of all
these distances.
 If we ignore the square roots and
minimise the sum-of-squares of the
errors, then we get the most common
minimisation, which is known as least-
squares optimisation.
 That is, we minimise the squared
difference between the prediction and
the actual data value, summed over all
of the datapoints:
E(β) = Σj (tj − Σi βi xij)²
 This can be written in matrix form as:
E(β) = (t − Xβ)ᵀ (t − Xβ)
 where t is a column vector containing the
targets, and X is the matrix of input values (even
including the bias inputs), just as for the
Perceptron.
 Computing the smallest value of this
means differentiating it with respect to
the (column) parameter vector β and
setting the derivative to 0.
 Therefore, setting the derivative to zero gives
Xᵀ(t − Xβ) = 0; that is, the normal equations
XᵀXβ = Xᵀt, which have the solution
β = (XᵀX)⁻¹ Xᵀ t
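A quick numerical check of the closed-form solution, on hypothetical data lying exactly on t = 1 + 2x:

```python
import numpy as np

# Fit t = b0 + b1*x by least squares using the normal equations.
x = np.array([0.0, 1.0, 2.0, 3.0])
t = np.array([1.0, 3.0, 5.0, 7.0])          # exactly t = 1 + 2x

X = np.column_stack([np.ones_like(x), x])   # bias column plus input
beta = np.linalg.inv(X.T @ X) @ X.T @ t     # beta = (X^T X)^{-1} X^T t

print(beta)   # [1. 2.]
```

On noisy data the same formula returns the line minimising the sum-of-squares error; in practice `np.linalg.lstsq` is preferred over the explicit inverse for numerical stability.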
Reference:
1. Ethem Alpaydin, "Introduction to Machine Learning (Adaptive
Computation and Machine Learning Series)", Third Edition, MIT
Press, 2014.
2. Tom M. Mitchell, "Machine Learning", First Edition, McGraw Hill
Education, 2013.
 
Essentials of machine learning algorithms
Essentials of machine learning algorithmsEssentials of machine learning algorithms
Essentials of machine learning algorithms
 
A Few Useful Things to Know about Machine Learning
A Few Useful Things to Know about Machine LearningA Few Useful Things to Know about Machine Learning
A Few Useful Things to Know about Machine Learning
 
Introduction to Machine Learning.
Introduction to Machine Learning.Introduction to Machine Learning.
Introduction to Machine Learning.
 
Intro/Overview on Machine Learning Presentation
Intro/Overview on Machine Learning PresentationIntro/Overview on Machine Learning Presentation
Intro/Overview on Machine Learning Presentation
 
Machine Learning by Rj
Machine Learning by RjMachine Learning by Rj
Machine Learning by Rj
 
Chapter 6 - Learning data and analytics course
Chapter 6 - Learning data and analytics courseChapter 6 - Learning data and analytics course
Chapter 6 - Learning data and analytics course
 
Machine learning presentation (razi)
Machine learning presentation (razi)Machine learning presentation (razi)
Machine learning presentation (razi)
 
Machine Learning Chapter one introduction
Machine Learning Chapter one introductionMachine Learning Chapter one introduction
Machine Learning Chapter one introduction
 
Machine Learning Ch 1.ppt
Machine Learning Ch 1.pptMachine Learning Ch 1.ppt
Machine Learning Ch 1.ppt
 

More from Panimalar Engineering College

Python for Data Science
Python for Data SciencePython for Data Science
Python for Data Science
Panimalar Engineering College
 
Python for Beginners(v3)
Python for Beginners(v3)Python for Beginners(v3)
Python for Beginners(v3)
Panimalar Engineering College
 
Python for Beginners(v2)
Python for Beginners(v2)Python for Beginners(v2)
Python for Beginners(v2)
Panimalar Engineering College
 
Python for Beginners(v1)
Python for Beginners(v1)Python for Beginners(v1)
Python for Beginners(v1)
Panimalar Engineering College
 
Cellular Networks
Cellular NetworksCellular Networks
Wireless Networks
Wireless NetworksWireless Networks
Wireless Networks
Wireless NetworksWireless Networks
Multiplexing,LAN Cabling,Routers,Core and Distribution Networks
Multiplexing,LAN Cabling,Routers,Core and Distribution NetworksMultiplexing,LAN Cabling,Routers,Core and Distribution Networks
Multiplexing,LAN Cabling,Routers,Core and Distribution Networks
Panimalar Engineering College
 

More from Panimalar Engineering College (8)

Python for Data Science
Python for Data SciencePython for Data Science
Python for Data Science
 
Python for Beginners(v3)
Python for Beginners(v3)Python for Beginners(v3)
Python for Beginners(v3)
 
Python for Beginners(v2)
Python for Beginners(v2)Python for Beginners(v2)
Python for Beginners(v2)
 
Python for Beginners(v1)
Python for Beginners(v1)Python for Beginners(v1)
Python for Beginners(v1)
 
Cellular Networks
Cellular NetworksCellular Networks
Cellular Networks
 
Wireless Networks
Wireless NetworksWireless Networks
Wireless Networks
 
Wireless Networks
Wireless NetworksWireless Networks
Wireless Networks
 
Multiplexing,LAN Cabling,Routers,Core and Distribution Networks
Multiplexing,LAN Cabling,Routers,Core and Distribution NetworksMultiplexing,LAN Cabling,Routers,Core and Distribution Networks
Multiplexing,LAN Cabling,Routers,Core and Distribution Networks
 

Recently uploaded

Unit 8 - Information and Communication Technology (Paper I).pdf
Unit 8 - Information and Communication Technology (Paper I).pdfUnit 8 - Information and Communication Technology (Paper I).pdf
Unit 8 - Information and Communication Technology (Paper I).pdf
Thiyagu K
 
GIÁO ÁN DẠY THÊM (KẾ HOẠCH BÀI BUỔI 2) - TIẾNG ANH 8 GLOBAL SUCCESS (2 CỘT) N...
GIÁO ÁN DẠY THÊM (KẾ HOẠCH BÀI BUỔI 2) - TIẾNG ANH 8 GLOBAL SUCCESS (2 CỘT) N...GIÁO ÁN DẠY THÊM (KẾ HOẠCH BÀI BUỔI 2) - TIẾNG ANH 8 GLOBAL SUCCESS (2 CỘT) N...
GIÁO ÁN DẠY THÊM (KẾ HOẠCH BÀI BUỔI 2) - TIẾNG ANH 8 GLOBAL SUCCESS (2 CỘT) N...
Nguyen Thanh Tu Collection
 
The approach at University of Liverpool.pptx
The approach at University of Liverpool.pptxThe approach at University of Liverpool.pptx
The approach at University of Liverpool.pptx
Jisc
 
Palestine last event orientationfvgnh .pptx
Palestine last event orientationfvgnh .pptxPalestine last event orientationfvgnh .pptx
Palestine last event orientationfvgnh .pptx
RaedMohamed3
 
Polish students' mobility in the Czech Republic
Polish students' mobility in the Czech RepublicPolish students' mobility in the Czech Republic
Polish students' mobility in the Czech Republic
Anna Sz.
 
Additional Benefits for Employee Website.pdf
Additional Benefits for Employee Website.pdfAdditional Benefits for Employee Website.pdf
Additional Benefits for Employee Website.pdf
joachimlavalley1
 
How libraries can support authors with open access requirements for UKRI fund...
How libraries can support authors with open access requirements for UKRI fund...How libraries can support authors with open access requirements for UKRI fund...
How libraries can support authors with open access requirements for UKRI fund...
Jisc
 
aaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaa
aaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaa
aaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaa
siemaillard
 
How to Make a Field invisible in Odoo 17
How to Make a Field invisible in Odoo 17How to Make a Field invisible in Odoo 17
How to Make a Field invisible in Odoo 17
Celine George
 
Students, digital devices and success - Andreas Schleicher - 27 May 2024..pptx
Students, digital devices and success - Andreas Schleicher - 27 May 2024..pptxStudents, digital devices and success - Andreas Schleicher - 27 May 2024..pptx
Students, digital devices and success - Andreas Schleicher - 27 May 2024..pptx
EduSkills OECD
 
Phrasal Verbs.XXXXXXXXXXXXXXXXXXXXXXXXXX
Phrasal Verbs.XXXXXXXXXXXXXXXXXXXXXXXXXXPhrasal Verbs.XXXXXXXXXXXXXXXXXXXXXXXXXX
Phrasal Verbs.XXXXXXXXXXXXXXXXXXXXXXXXXX
MIRIAMSALINAS13
 
Home assignment II on Spectroscopy 2024 Answers.pdf
Home assignment II on Spectroscopy 2024 Answers.pdfHome assignment II on Spectroscopy 2024 Answers.pdf
Home assignment II on Spectroscopy 2024 Answers.pdf
Tamralipta Mahavidyalaya
 
Fish and Chips - have they had their chips
Fish and Chips - have they had their chipsFish and Chips - have they had their chips
Fish and Chips - have they had their chips
GeoBlogs
 
Instructions for Submissions thorugh G- Classroom.pptx
Instructions for Submissions thorugh G- Classroom.pptxInstructions for Submissions thorugh G- Classroom.pptx
Instructions for Submissions thorugh G- Classroom.pptx
Jheel Barad
 
2024.06.01 Introducing a competency framework for languag learning materials ...
2024.06.01 Introducing a competency framework for languag learning materials ...2024.06.01 Introducing a competency framework for languag learning materials ...
2024.06.01 Introducing a competency framework for languag learning materials ...
Sandy Millin
 
Sha'Carri Richardson Presentation 202345
Sha'Carri Richardson Presentation 202345Sha'Carri Richardson Presentation 202345
Sha'Carri Richardson Presentation 202345
beazzy04
 
Operation Blue Star - Saka Neela Tara
Operation Blue Star   -  Saka Neela TaraOperation Blue Star   -  Saka Neela Tara
Operation Blue Star - Saka Neela Tara
Balvir Singh
 
Mule 4.6 & Java 17 Upgrade | MuleSoft Mysore Meetup #46
Mule 4.6 & Java 17 Upgrade | MuleSoft Mysore Meetup #46Mule 4.6 & Java 17 Upgrade | MuleSoft Mysore Meetup #46
Mule 4.6 & Java 17 Upgrade | MuleSoft Mysore Meetup #46
MysoreMuleSoftMeetup
 
Thesis Statement for students diagnonsed withADHD.ppt
Thesis Statement for students diagnonsed withADHD.pptThesis Statement for students diagnonsed withADHD.ppt
Thesis Statement for students diagnonsed withADHD.ppt
EverAndrsGuerraGuerr
 
Basic phrases for greeting and assisting costumers
Basic phrases for greeting and assisting costumersBasic phrases for greeting and assisting costumers
Basic phrases for greeting and assisting costumers
PedroFerreira53928
 

Recently uploaded (20)

Unit 8 - Information and Communication Technology (Paper I).pdf
Unit 8 - Information and Communication Technology (Paper I).pdfUnit 8 - Information and Communication Technology (Paper I).pdf
Unit 8 - Information and Communication Technology (Paper I).pdf
 
GIÁO ÁN DẠY THÊM (KẾ HOẠCH BÀI BUỔI 2) - TIẾNG ANH 8 GLOBAL SUCCESS (2 CỘT) N...
GIÁO ÁN DẠY THÊM (KẾ HOẠCH BÀI BUỔI 2) - TIẾNG ANH 8 GLOBAL SUCCESS (2 CỘT) N...GIÁO ÁN DẠY THÊM (KẾ HOẠCH BÀI BUỔI 2) - TIẾNG ANH 8 GLOBAL SUCCESS (2 CỘT) N...
GIÁO ÁN DẠY THÊM (KẾ HOẠCH BÀI BUỔI 2) - TIẾNG ANH 8 GLOBAL SUCCESS (2 CỘT) N...
 
The approach at University of Liverpool.pptx
The approach at University of Liverpool.pptxThe approach at University of Liverpool.pptx
The approach at University of Liverpool.pptx
 
Palestine last event orientationfvgnh .pptx
Palestine last event orientationfvgnh .pptxPalestine last event orientationfvgnh .pptx
Palestine last event orientationfvgnh .pptx
 
Polish students' mobility in the Czech Republic
Polish students' mobility in the Czech RepublicPolish students' mobility in the Czech Republic
Polish students' mobility in the Czech Republic
 
Additional Benefits for Employee Website.pdf
Additional Benefits for Employee Website.pdfAdditional Benefits for Employee Website.pdf
Additional Benefits for Employee Website.pdf
 
How libraries can support authors with open access requirements for UKRI fund...
How libraries can support authors with open access requirements for UKRI fund...How libraries can support authors with open access requirements for UKRI fund...
How libraries can support authors with open access requirements for UKRI fund...
 
aaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaa
aaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaa
aaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaa
 
How to Make a Field invisible in Odoo 17
How to Make a Field invisible in Odoo 17How to Make a Field invisible in Odoo 17
How to Make a Field invisible in Odoo 17
 
Students, digital devices and success - Andreas Schleicher - 27 May 2024..pptx
Students, digital devices and success - Andreas Schleicher - 27 May 2024..pptxStudents, digital devices and success - Andreas Schleicher - 27 May 2024..pptx
Students, digital devices and success - Andreas Schleicher - 27 May 2024..pptx
 
Phrasal Verbs.XXXXXXXXXXXXXXXXXXXXXXXXXX
Phrasal Verbs.XXXXXXXXXXXXXXXXXXXXXXXXXXPhrasal Verbs.XXXXXXXXXXXXXXXXXXXXXXXXXX
Phrasal Verbs.XXXXXXXXXXXXXXXXXXXXXXXXXX
 
Home assignment II on Spectroscopy 2024 Answers.pdf
Home assignment II on Spectroscopy 2024 Answers.pdfHome assignment II on Spectroscopy 2024 Answers.pdf
Home assignment II on Spectroscopy 2024 Answers.pdf
 
Fish and Chips - have they had their chips
Fish and Chips - have they had their chipsFish and Chips - have they had their chips
Fish and Chips - have they had their chips
 
Instructions for Submissions thorugh G- Classroom.pptx
Instructions for Submissions thorugh G- Classroom.pptxInstructions for Submissions thorugh G- Classroom.pptx
Instructions for Submissions thorugh G- Classroom.pptx
 
2024.06.01 Introducing a competency framework for languag learning materials ...
2024.06.01 Introducing a competency framework for languag learning materials ...2024.06.01 Introducing a competency framework for languag learning materials ...
2024.06.01 Introducing a competency framework for languag learning materials ...
 
Sha'Carri Richardson Presentation 202345
Sha'Carri Richardson Presentation 202345Sha'Carri Richardson Presentation 202345
Sha'Carri Richardson Presentation 202345
 
Operation Blue Star - Saka Neela Tara
Operation Blue Star   -  Saka Neela TaraOperation Blue Star   -  Saka Neela Tara
Operation Blue Star - Saka Neela Tara
 
Mule 4.6 & Java 17 Upgrade | MuleSoft Mysore Meetup #46
Mule 4.6 & Java 17 Upgrade | MuleSoft Mysore Meetup #46Mule 4.6 & Java 17 Upgrade | MuleSoft Mysore Meetup #46
Mule 4.6 & Java 17 Upgrade | MuleSoft Mysore Meetup #46
 
Thesis Statement for students diagnonsed withADHD.ppt
Thesis Statement for students diagnonsed withADHD.pptThesis Statement for students diagnonsed withADHD.ppt
Thesis Statement for students diagnonsed withADHD.ppt
 
Basic phrases for greeting and assisting costumers
Basic phrases for greeting and assisting costumersBasic phrases for greeting and assisting costumers
Basic phrases for greeting and assisting costumers
 

Introduction to Machine Learning

  • 1. Machine Learning Dr.M.Rajendiran Dept. of Computer Science and Engineering Panimalar Engineering College
  • 2. Contents Types of Machine Learning Supervised Learning The Brain and the Neuron Design a Learning System Perspectives and Issues in Machine Learning Concept Learning Task Concept Learning as Search Finding a Maximally Specific Hypothesis Version Spaces and Candidate Elimination Algorithm Perceptron Linear Separability Linear Regression. 2
  • 3. TYPES OF MACHINE LEARNING Example: webpage  To predict what software a visitor to the website might buy, based on information that you can collect. There are a couple of interesting things in there.  The first is the data. It might be useful to know what software visitors have bought before, and how old they are.  But it is not possible to get that information from their web browser, so we can’t use it.  Picking the variables that you want to use is a very important part of finding good solutions to problems.  The second is that how you process the data can be important.  This can be seen in the example of the time of access. Your computer can store it to the millisecond, but that isn’t very useful, because such precision makes it hard to spot similar patterns between users.
  • 4. TYPES OF MACHINE LEARNING  We define learning as getting better at some task through practice. This leads to a couple of vital questions:  How does the computer know whether it is getting better or not?  How does it know how to improve?  There are several different possible answers to these questions, and they produce different types of machine learning. (1)Supervised Learning (2)Unsupervised Learning (3)Reinforcement Learning (4)Evolutionary Learning
  • 5. Supervised Learning  The correct classes of the training data are known.  A training set of examples with the correct responses (targets) is provided.  Based on this training set, the algorithm generalizes to respond correctly to all possible inputs.  This is also called learning from exemplars.
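The idea of generalising from labelled exemplars can be sketched in a few lines. This is an illustration only, not from the slides: a tiny 1-nearest-neighbour classifier that, given a training set of (input, target) pairs, responds to new inputs by recalling the closest exemplar.

```python
def nearest_neighbour(train, query):
    """Return the target of the training exemplar closest to `query`."""
    def sq_dist(a, b):
        return sum((x - y) ** 2 for x, y in zip(a, b))
    _, target = min(train, key=lambda ex: sq_dist(ex[0], query))
    return target

# Training set: each example pairs an input with its correct response (target).
train = [((1.0, 1.0), "A"), ((5.0, 5.0), "B")]
print(nearest_neighbour(train, (1.2, 0.9)))  # generalises to an unseen input
```

Because the correct targets are supplied for every training input, the algorithm can answer any query, not just the inputs it has seen.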
  • 6. Unsupervised Learning  The correct classes of the training data are not known.  Correct responses are not provided; instead the algorithm tries to identify similarities between the inputs, so that inputs that have something in common are categorised together.  The statistical approach to unsupervised learning is known as density estimation.
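Grouping inputs by similarity, with no labels provided, can be sketched with a very small k-means-style loop (an illustration, not part of the slides):

```python
def kmeans_1d(points, k=2, iters=10):
    """Tiny k-means sketch: group unlabelled 1-D points by similarity."""
    centres = points[:k]                      # naive initialisation
    for _ in range(iters):
        clusters = [[] for _ in range(k)]
        for x in points:                      # assign each point to its
            nearest = min(range(k), key=lambda j: abs(x - centres[j]))
            clusters[nearest].append(x)       # ...nearest centre
        centres = [sum(c) / len(c) if c else centres[j]
                   for j, c in enumerate(clusters)]
    return sorted(centres)

# Two natural groups emerge without any labels being provided.
print(kmeans_1d([1.0, 1.2, 0.8, 9.0, 9.5, 10.0]))
```

No "correct" answer is ever given; the structure is discovered from the inputs alone.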
  • 7. Reinforcement Learning  Allows the machine or software agent to learn its behavior based on feedback from the environment.  This behavior can be learnt once and for all, or it can keep adapting as time goes by.  This is somewhere between supervised and unsupervised learning.  The algorithm gets told when the answer is wrong, but does not get told how to correct it.  It has to explore and try out different possibilities until it works out how to get the answer right.  Reinforcement learning is sometimes called learning with a critic, because of this monitor that scores the answer but does not suggest improvements.
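The "learning with a critic" idea can be sketched as follows (a toy illustration: the environment, actions, and reward function are all hypothetical). The critic only scores each answer; it never says what the right answer is, so the learner explores the possibilities and keeps what scores best.

```python
def learn_by_trial(actions, critic):
    """Explore every action; remember the one the critic scores highest."""
    best_action, best_score = None, float("-inf")
    for action in actions:
        score = critic(action)        # feedback, but no correction
        if score > best_score:
            best_action, best_score = action, score
    return best_action

# Hypothetical environment whose reward peaks at action 2.
print(learn_by_trial([0, 1, 2, 3], lambda a: -(a - 2) ** 2))
```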
  • 8. Evolutionary Learning  Biological evolution can be seen as a learning process: biological organisms adapt to improve their survival rates and chances of having offspring in their environment.  We will look at how we can model this in a computer, using the idea of fitness, which corresponds to a score for how good the current solution is.
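The fitness idea can be sketched as a minimal mutate-and-select loop (an illustration only; the fitness function is hypothetical): mutate the current solution and keep a mutant only when its fitness score improves.

```python
def evolve(fitness, start, steps=50):
    """Evolutionary-learning sketch: keep mutants that score better."""
    current = start
    for _ in range(steps):
        for mutant in (current - 1, current + 1):   # two simple mutations
            if fitness(mutant) > fitness(current):
                current = mutant                    # selection by fitness
    return current

# Fitness peaks at 7; the search climbs towards it from 0.
print(evolve(lambda x: -(x - 7) ** 2, start=0))
```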
  • 9. SUPERVISED LEARNING  The people look at the cars and label them:  (1) Positive examples: the cars that they believe are family cars.  (2) Negative examples: the other cars.  Class learning is finding a description that is shared by all positive examples and none of the negative examples. INPUT REPRESENTATION  These two attributes are the inputs to the class recognizer.  When we decide on this particular input representation, we are ignoring various other attributes as irrelevant.  Let x1 denote price as the first input attribute (e.g., in U.S. dollars), and x2 denote engine power as the second attribute (e.g., engine volume in cubic centimeters).  Thus we represent each car using two numeric values.
  • 10. SUPERVISED LEARNING  Our training data can now be plotted in the two-dimensional (x1, x2) space, where each instance t is a data point at coordinates (x1^t, x2^t) and its type, namely positive versus negative, is given by r^t (see figure 2.1).  After further discussions with the expert and analysis of the data, we find that price and engine power should each lie in a certain range: (p1 ≤ price ≤ p2) AND (e1 ≤ engine power ≤ e2), for suitable values of p1, p2, e1, and e2. Hypothesis Class  Equation 2.4 fixes H, the hypothesis class from which C is drawn, namely the set of rectangles. Hypothesis  The learning algorithm then finds the particular hypothesis, h∈H, to approximate C as closely as possible.
  • 11. SUPERVISED LEARNING  The aim is to find h∈H that is as similar as possible to C.  The hypothesis h makes a prediction for an instance x: h(x) = 1 if h classifies x as a positive example, and 0 otherwise.  We do not know C(x), so we cannot evaluate directly how well h(x) matches C(x). Empirical Error  What we have is the training set X, which is a small subset of the set of all possible x.  The empirical error is the proportion of training instances where the predictions of h do not match the required values given in X.  The error of hypothesis h given the training set X is E(h|X) = (1/N) Σ_{t=1}^{N} 1(h(x^t) ≠ r^t), where 1(a ≠ b) is 1 if a ≠ b and is 0 if a = b (figure 2.3).
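The empirical error E(h|X) is easy to compute for a rectangle hypothesis. The sketch below is an illustration with made-up prices and engine powers, not data from the text:

```python
def rectangle_hypothesis(p1, p2, e1, e2):
    """h(x): positive iff price and engine power fall inside the rectangle."""
    return lambda x: p1 <= x[0] <= p2 and e1 <= x[1] <= e2

def empirical_error(h, X, r):
    """E(h|X): the fraction of training instances where h(x) != r."""
    return sum(1 for x, label in zip(X, r) if h(x) != label) / len(X)

h = rectangle_hypothesis(10, 20, 1.0, 2.0)
X = [(15, 1.5), (25, 1.5), (12, 1.8), (15, 3.0)]   # (price, engine power)
r = [True, False, True, True]                       # last one is misclassified
print(empirical_error(h, X, r))                     # 1 error out of 4 -> 0.25
```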
  • 12. SUPERVISED LEARNING Generalization  If x1 and x2 are real-valued, there are infinitely many h for which the error E is 0 on the training set.  But a future example may lie somewhere close to the boundary between positive and negative examples.  Different candidate hypotheses may make different generalization predictions.  This is the problem of generalization. Most Specific Hypothesis  The most specific hypothesis, S, is the tightest rectangle that includes all the positive examples and none of the negative examples (figure 2.4).  This gives us one hypothesis, h = S, as our induced class.  The actual class C may be larger than S but is never smaller. Figure 2.4 S is the most specific and G is the most general hypothesis. Most General Hypothesis  The most general hypothesis, G, is the largest rectangle we can draw that includes all the positive examples and none of the negative examples (figure 2.4).
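For axis-aligned rectangles, the most specific hypothesis S is just the bounding box of the positive examples. A minimal sketch (the example points are hypothetical):

```python
def most_specific_rectangle(positives):
    """S: the tightest axis-aligned rectangle containing every positive
    example; any smaller rectangle would exclude some positive."""
    prices = [p for p, e in positives]
    engines = [e for p, e in positives]
    return (min(prices), max(prices), min(engines), max(engines))

positives = [(12, 1.4), (15, 1.8), (18, 1.6)]       # (price, engine power)
print(most_specific_rectangle(positives))           # (12, 18, 1.4, 1.8)
```

G, by contrast, would be grown outward from S until just before it touches a negative example.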
  • 13. SUPERVISED LEARNING Version Space  Any h∈H between S and G is a valid hypothesis with no error, said to be consistent with the training set, and such h make up the version space. Margin  The margin is the distance between the boundary and the instances closest to it (figure 2.5).  Figure 2.5 We choose the hypothesis with the largest margin, for best separation. The shaded instances are those that define (or support) the margin; other instances can be removed without affecting h.
  • 15. Machine Learning Technique Ever since computers were invented, we have wondered whether they might be made to learn. If we could understand how to program them to learn, to improve automatically with experience, the impact would be dramatic. Imagine computers learning from medical records which treatments are most effective for new diseases, houses learning from experience to optimize energy costs based on the particular usage patterns of their occupants, or personal software assistants learning the evolving interests of their users in order to highlight especially relevant stories from the online morning newspaper. 15
  • 16. Machine Learning Technique A detailed understanding of information-processing algorithms for machine learning might lead to a better understanding of human learning abilities as well. Many practical computer programs have been developed to exhibit useful types of learning, and significant commercial applications have begun to appear. For problems such as speech recognition, algorithms based on machine learning outperform all other approaches that have been attempted to date. In the field known as data mining, machine learning algorithms are being used routinely to discover valuable knowledge from large commercial databases containing equipment maintenance records, loan applications, financial transactions, medical records, and the like. 16
  • 17. Machine Learning Technique The field of machine learning describes a variety of learning paradigms, algorithms, theoretical results, and applications. Machine learning is inherently a multidisciplinary field, drawing on artificial intelligence, probability and statistics, computational complexity theory, control theory, information theory, philosophy, psychology, neurobiology, and other fields. 17
  • 18. WELL-POSED LEARNING PROBLEMS Let us begin our study of machine learning by considering a few learning tasks. We will define learning broadly, to include any computer program that improves its performance at some task through experience. Definition: A computer program is said to learn from experience E with respect to some class of tasks T and performance measure P, if its performance at tasks in T, as measured by P, improves with experience E. For a well-defined learning problem, we must identify T, P, and E. Some example learning tasks:  Learning to recognize spoken words  Learning to drive an autonomous vehicle  Learning to classify new astronomical structures  Learning to play world-class backgammon 18
  • 19. WELL-POSED LEARNING PROBLEMS 1.Learning to recognize spoken words. All of the most successful speech recognition systems employ machine learning in some form. Ex: The SPHINX system. This system learns speaker-specific strategies for recognizing the primitive sounds and words from the observed speech signal. Neural network learning methods and methods for learning hidden Markov models are effective for automatically customizing to individual speakers, vocabularies, microphone characteristics, background noise, etc. 19
  • 20. WELL-POSED LEARNING PROBLEMS 2.Learning to drive an autonomous vehicle Machine learning methods have been used to train computer- controlled vehicles to steer correctly when driving on a variety of road types. Example: the ALVINN system. It has used its learned strategies to drive unassisted at 70 miles per hour for 90 miles on public highways among other cars. Similar techniques have possible applications in many sensor- based control problems. 20
  • 21. WELL-POSED LEARNING PROBLEMS 3.Learning to classify new astronomical structures Machine learning methods have been applied to a variety of large databases to learn general regularities implicit in the data. Example: Decision tree learning algorithms have been used by NASA to learn how to classify celestial objects from the second Palomar Observatory Sky Survey. This system is now used to automatically classify all objects in the Sky Survey, which consists of three terabytes of image data. 21
  • 22. WELL-POSED LEARNING PROBLEMS 4.Learning to play world-class backgammon The most successful computer programs for playing games such as backgammon are based on machine learning algorithms. Example: The world's top computer program for backgammon,TD-GAMMON, learned its strategy by playing over one million practice games against itself.  It now plays at a level competitive with the human world champion. Similar techniques have applications in many practical problems where very large search spaces must be examined efficiently. 22
  • 23. WELL-POSED LEARNING PROBLEMS Some successful applications of machine learning. Three features: 1.The class of tasks 2.The measure of performance to be improved, and 3.The source of experience. A checkers learning problem: Task T: playing checkers Performance measure P: percent of games won against opponents Training experience E: playing practice games against itself. 23
  • 24. WELL-POSED LEARNING PROBLEMS A handwriting recognition learning problem: Task T: recognizing and classifying handwritten words within images Performance measure P: percent of words correctly classified Training experience E: a database of handwritten words with given classifications. Disciplines that have influenced machine learning:  Artificial intelligence  Bayesian methods  Computational complexity theory  Control theory  Information theory  Philosophy  Psychology and neurobiology  Statistics 24
  • 25. WELL-POSED LEARNING PROBLEMS A robot driving learning problem: Task T: driving on public four-lane highways using vision sensors. Performance measure P: average distance traveled before an error (as judged by human overseer) Training experience E: a sequence of images and steering commands recorded while observing a human driver. 25
  • 26. DESIGNING A LEARNING SYSTEM 1. Choosing the Training Experience 2. Choosing the Target Function 3. Choosing a Representation for the Target Function 4. Choosing a Function Approximation Algorithm 5. Estimating Training Values 6. Adjusting the weights 7. The Final Design 26
  • 27. DESIGNING A LEARNING SYSTEM 1.Choosing the Training Experience  The first design choice is the type of training experience from which our system will learn.  The type of training experience available can have a significant impact on the success or failure of the learner.  One key attribute is whether the training experience provides direct or indirect feedback regarding the choices made by the performance system.  A second important attribute of the training experience is the degree to which the learner controls the sequence of training examples. 27
  • 28. DESIGNING A LEARNING SYSTEM 1.Choosing the Training Experience  A third important attribute of the training experience is how well it represents the distribution of examples over which the final system performance P must be measured.  In our checkers learning scenario, the performance metric P is the percent of games the system wins in the world tournament, yet if its training experience E consists only of games played against itself, that experience may not fully represent the distribution of board states the system will later face. A checkers learning problem:  Task T: playing checkers  Performance measure P: percent of games won in the world tournament  Training experience E: games played against itself 28
  • 29. DESIGNING A LEARNING SYSTEM In order to complete the design of the learning system, we must now choose 1. The exact type of knowledge to be, learned 2. A representation for this target knowledge 3. A learning mechanism 29
  • 30. DESIGNING A LEARNING SYSTEM 2. Choosing the Target Function The next design choice is to determine exactly what type of knowledge will be learned and how this will be used by the performance program. Let us begin with a checkers-playing program that can generate the legal moves from any board state. The program needs only to learn how to choose the best move from among these legal moves. Since we must learn to choose among the legal moves, the most obvious choice for the type of information to be learned is a program, or function, that selects the best move. 30
  • 31. DESIGNING A LEARNING SYSTEM 2. Choosing the Target Function Let us call this function ChooseMove and use the notation ChooseMove : B -> M to indicate that this function accepts as input any board from the set of legal board states B and produces as output some move from the set of legal moves M. We will find it useful to reduce the problem of improving performance P at task T to the problem of learning some particular target function such as ChooseMove. ChooseMove is an obvious choice for the target function in our example. 31
  • 32. DESIGNING A LEARNING SYSTEM 2. Choosing the Target Function ChooseMove, however, will turn out to be very difficult to learn given the kind of indirect training experience available to our system. An alternative target function, which proves easier to learn, is an evaluation function that assigns a numerical score to any given board state. Let us call this target function V and use the notation V : B -> R to denote that V maps any legal board state from the set B to some real value R. We intend for this target function V to assign higher scores to better board states. If the system can successfully learn such a target function V, then it can easily use it to select the best move from any current board position. 32
  • 33. DESIGNING A LEARNING SYSTEM 2. Choosing the Target Function Therefore define the target value V(b) for an arbitrary board state b in B, as follows: 1. if b is a final board state that is won, then V(b) = 100 2. if b is a final board state that is lost, then V(b) = -100 3. if b is a final board state that is drawn, then V(b) = 0 4. if b is not a final state in the game, then V(b) = V(b'), where b' is the best final board state that can be achieved starting from b and playing optimally until the end of the game 33
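The four cases above can be written down directly (a sketch; the string encoding of final states is made up for illustration). Only the three final-state cases are directly computable; the fourth is recursive and, as the next slide notes, not efficiently computable:

```python
def target_value(b):
    """V(b) for *final* board states only.
    For a non-final b, V(b) = V(b') for the best reachable final state b',
    which requires searching to the end of the game."""
    if b == "won":
        return 100
    if b == "lost":
        return -100
    if b == "drawn":
        return 0
    raise ValueError("non-final state: V(b) is defined recursively")

print(target_value("won"))
```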
  • 34. DESIGNING A LEARNING SYSTEM 2. Choosing the Target Function This recursive definition specifies a value of V(b) for every board state b, this definition is not usable by our checkers player because it is not efficiently computable. The goal of learning in this case is to discover an operational description of V. We have reduced the learning task in this case to the problem of discovering an operational description of the ideal target function V. It may be very difficult in general to learn such an operational form of V perfectly. The process of learning the target function is often called function approximation. 34
  • 35. DESIGNING A LEARNING SYSTEM 3. Choosing a Representation for the Target Function We have specified the ideal target function V. We must choose a representation that the learning program will use to describe the function that it will learn. Example:  The program to represent using a large table with a distinct entry specifying the value for each distinct board state. (OR)  It represent using a collection of rules that match against features of the board state. (OR)  A quadratic polynomial function of predefined board features (OR)  An artificial neural network. 35
  • 36. DESIGNING A LEARNING SYSTEM 3. Choosing a Representation for the Target Function Let us choose a simple representation: for any given board state, the function will be calculated as a linear combination of the following board features:  x1: the number of black pieces on the board  x2: the number of red pieces on the board  x3: the number of black kings on the board  x4: the number of red kings on the board  x5: the number of black pieces threatened by red (i.e., which can be captured on red's next turn)  x6: the number of red pieces threatened by black 36
  • 37. DESIGNING A LEARNING SYSTEM 3. Choosing a Representation for the Target Function  Our learning program will represent V̂(b) as a linear function of these features: V̂(b) = w0 + w1x1 + w2x2 + w3x3 + w4x4 + w5x5 + w6x6  where w0 through w6 are numerical coefficients, or weights, to be chosen by the learning algorithm.  Learned values for the weights w1 through w6 will determine the relative importance of the various board features in determining the value of the board,  whereas the weight w0 provides an additive constant to the board value. 37
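As a quick illustration of this representation, here is a minimal sketch of the linear evaluation function. The function and variable names are our own, and extracting the feature values from a real checkers board is not shown.

```python
# A minimal sketch of the linear evaluation function V_hat(b).
# The feature values x1..x6 are assumed to be precomputed for a board state.

def v_hat(weights, features):
    """weights = [w0, w1, ..., w6]; features = [x1, ..., x6]."""
    w0 = weights[0]
    return w0 + sum(w * x for w, x in zip(weights[1:], features))

# Example: all weights 1.0, a board with 3 black pieces, 2 red pieces,
# 1 black king, 0 red kings, and no threatened pieces on either side.
print(v_hat([1.0] * 7, [3, 2, 1, 0, 0, 0]))  # prints 7.0
```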
  • 38. DESIGNING A LEARNING SYSTEM 3. Choosing a Representation for the Target Function  The first three items above correspond to the specification of the learning task.  Whereas the final two items constitute design choices for the implementation of the learning program.  The net effect of this set of design choices is to reduce the problem of learning a checkers strategy to the problem of learning values for the coefficients w0 through w6 in the target function representation. 38
  • 39. DESIGNING A LEARNING SYSTEM 4. Choosing a Function Approximation Algorithm  In order to learn the target function we require a set of training examples,  each describing a specific board state b and the training value Vtrain(b) for b.  Each training example is an ordered pair of the form (b, Vtrain(b)).  For instance, the training example ((x1=3, x2=0, x3=1, x4=0, x5=0, x6=0), +100) describes a board state b in which black has won the game (note x2 = 0 indicates that red has no remaining pieces) and for which the target function value Vtrain(b) is therefore +100. 39
  • 40. DESIGNING A LEARNING SYSTEM 4.1 ESTIMATING TRAINING VALUES  The only training information available to our learner is whether the game was eventually won or lost.  On the other hand, we require training examples that assign specific scores to specific board states.  The fact that the game was eventually won or lost does not necessarily indicate that every board state along the game path was good or bad. Example:  If the program loses the game, it may still be the case that board states occurring early in the game should be rated very highly, and that the cause of the loss was a subsequent poor move.  Despite the ambiguity inherent in estimating training values for intermediate board states, one simple approach has been found to be surprisingly successful. 40
  • 41. DESIGNING A LEARNING SYSTEM 4.1 ESTIMATING TRAINING VALUES  This rule for estimating training values can be summarized as: Vtrain(b) <- V̂(Successor(b)),  where Successor(b) denotes the next board state following b for which it is again the program's turn to move, and V̂ is the learner's current approximation to V.  The current version of V̂ is used to estimate the training values that will, in turn, be used to refine this very same function.  In other words, we are using estimates of the value of the Successor(b) to estimate the value of board state b. 41
  • 42. DESIGNING A LEARNING SYSTEM 4.2 ADJUSTING THE WEIGHTS. All that remains is a learning algorithm for choosing the weights wi to best fit the set of training examples {(b, Vtrain(b))}. As a first step we must define what it means to best fit the training data. One common approach is to define the best hypothesis, or set of weights, as that which minimizes the squared error E between the training values and the values predicted by the hypothesis: E = Σ (Vtrain(b) - V̂(b))², where the sum runs over the training examples (b, Vtrain(b)). Several algorithms are known for finding weights of a linear function that minimize E defined in this way. 42
  • 43. DESIGNING A LEARNING SYSTEM 4.2 ADJUSTING THE WEIGHTS. In this case, we require an algorithm that will incrementally refine the weights as new training examples become available and that will be robust to errors in these estimated training values. One such algorithm is called the Least Mean Squares, or LMS training rule. The LMS algorithm is defined as follows: 43
  • 44. DESIGNING A LEARNING SYSTEM 4.2 ADJUSTING THE WEIGHTS. The LMS algorithm is defined as follows: for each training example (b, Vtrain(b)), use the current weights to calculate V̂(b); then update each weight wi as wi <- wi + η (Vtrain(b) - V̂(b)) xi, where η is a small constant (e.g., 0.1) that moderates the size of the weight update. 44
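The LMS rule above can be sketched in a few lines. This is a toy illustration with our own names, a single made-up training example, and η = 0.1 as suggested; a real learner would apply updates across the examples produced by the Critic.

```python
# A sketch of the LMS weight-update rule, assuming the linear
# representation V_hat(b) = w0 + w1*x1 + ... + w6*x6.
ETA = 0.1  # small learning-rate constant

def v_hat(weights, features):
    return weights[0] + sum(w * x for w, x in zip(weights[1:], features))

def lms_update(weights, features, v_train):
    """One LMS step: w_i <- w_i + eta * (Vtrain(b) - V_hat(b)) * x_i."""
    error = v_train - v_hat(weights, features)
    new_weights = [weights[0] + ETA * error]  # the bias input x0 is implicitly 1
    new_weights += [w + ETA * error * x for w, x in zip(weights[1:], features)]
    return new_weights

# Repeated presentations of one (hypothetical) example shrink the error.
w = [0.0] * 7
for _ in range(100):
    w = lms_update(w, [3, 2, 1, 0, 0, 0], 100.0)
print(round(v_hat(w, [3, 2, 1, 0, 0, 0]), 2))  # converges to 100.0
```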
  • 45. DESIGNING A LEARNING SYSTEM 5. The Final Design  The final design of our checkers learning system can be naturally described by four distinct program modules that represent the central components in many learning systems.  These four modules, summarized in Figure 1.1, are as follows:  FIGURE 1.1 Final design of the checkers learning program.  The Performance System is the module that must solve the given performance task, in this case playing checkers, by using the learned target function(s).  It takes an instance of a new problem (new game) as input and produces a trace of its solution (game history) as output.  The strategy used by the Performance System to select its next move at each step is determined by the learned evaluation function.  Therefore, we expect its performance to improve as this evaluation function becomes increasingly accurate. 45
  • 46. DESIGNING A LEARNING SYSTEM 5. The Final Design  The Critic takes as input the history or trace of the game and produces as output a set of training examples of the target function.  Each training example in this case corresponds to some game state in the trace, along with an estimate Vtrain of the target function value for this example.  The Critic corresponds to the training rule given above.  The Generalizer takes as input the training examples and produces an output hypothesis that is its estimate of the target function.  It generalizes from the specific training examples, hypothesizing a general function that covers these examples and other cases beyond the training examples.  The Generalizer corresponds to the LMS algorithm, and the output hypothesis is the function V̂ described by the learned weights w0, ..., w6.
  • 47. DESIGNING A LEARNING SYSTEM 5. The Final Design  The Experiment Generator takes as input the current hypothesis and outputs a new problem (i.e., an initial board state) for the Performance System to explore.  Its role is to pick new practice problems that will maximize the learning rate of the overall system.  Our Experiment Generator follows a very simple strategy: it always proposes the same initial game board to begin a new game.  More sophisticated strategies could involve creating board positions designed to explore particular regions of the state space.
  • 48. DESIGNING A LEARNING SYSTEM 5. The Final Design  The sequence of design choices made for the checkers program is summarized in Figure 1.2.  These design choices have constrained the learning task in a number of ways.  We have restricted the type of knowledge that can be acquired to a single linear evaluation function.  We have constrained this evaluation function to depend on only the six specific board features provided.
  • 49. DESIGNING A LEARNING SYSTEM PERSPECTIVES AND ISSUES IN MACHINE LEARNING  One useful perspective on machine learning is that it involves searching a very large space of possible hypotheses to determine one that best fits the observed data and any prior knowledge held by the learner. Example:  Consider the space of hypotheses that could in principle be output by the above checkers learner.  This hypothesis space consists of all evaluation functions that can be represented by some choice of values for the weights w0 through w6.  The LMS algorithm for fitting weights achieves this goal by iteratively tuning the weights, adding a correction to each weight each time the hypothesized evaluation function predicts a value that differs from the training value.  This algorithm works well when the hypothesis representation considered by the learner defines a continuously parameterized space of potential hypotheses. 49
  • 50. DESIGNING A LEARNING SYSTEM PERSPECTIVES AND ISSUES IN MACHINE LEARNING  Many learning algorithms search a hypothesis space defined by some underlying representation.  Example: Linear functions Logical descriptions Decision trees Artificial neural networks  These different hypothesis representations are appropriate for learning different kinds of target functions.  This perspective of learning as a search problem lets us characterize learning methods by their search strategies and by the underlying structure of the search spaces they explore. 50
  • 51. DESIGNING A LEARNING SYSTEM Issues in Machine Learning  The checkers example raises a number of generic questions about machine learning.  What algorithms exist for learning general target functions from specific training examples?  Under what conditions will particular algorithms converge to the desired function, given sufficient training data?  Which algorithms perform best for which types of problems and representations?  How much training data is sufficient?  What general bounds can be found to relate the confidence in learned hypotheses to the amount of training experience?  When and how can prior knowledge held by the learner guide the process of generalizing from examples? 51
  • 52. DESIGNING A LEARNING SYSTEM Issues in Machine Learning  Can prior knowledge be helpful even when it is only approximately correct?  What is the best strategy for choosing a useful next training experience?  How does the choice of this strategy alter the complexity of the learning problem?  What is the best way to reduce the learning task to one or more function approximation problems?  What specific functions should the system attempt to learn?  Can this process itself be automated?  How can the learner automatically alter its representation to improve its ability to represent and learn the target function? 52
  • 53. CONCEPT LEARNING  Concept Learning: inferring the definition of a general category from a sample of positive and negative training examples of the category.  Concept learning can be formulated as a problem of searching through a predefined space of potential hypotheses for the hypothesis that best fits the training examples.  The task of inferring the general definition of some concept, given examples labeled as members or nonmembers of the concept, is commonly referred to as concept learning, or approximating a boolean-valued function from examples.  That is, it involves inferring a boolean-valued function from training examples of its input and output. 53
  • 54. A CONCEPT LEARNING TASK  The task of learning the target concept "days on which my friend Aldo enjoys his favorite water sport."  Table 2.1 describes a set of example days, each represented by a set of attributes.  The attribute EnjoySport indicates whether or not Aldo enjoys his favorite water sport on this day.  The task is to learn to predict the value of EnjoySport for an arbitrary day, based on the values of its other attributes.  What hypothesis representation shall we provide to the learner in this case? 54
  • 55. A CONCEPT LEARNING TASK  Let each hypothesis be a vector of six constraints, specifying the values of the six attributes Sky, AirTemp, Humidity, Wind, Water, and Forecast.  For each attribute, the hypothesis will either  indicate by a "?" that any value is acceptable for this attribute,  specify a single required value (e.g., Warm) for the attribute, or  indicate by a "0" that no value is acceptable. 55
  • 56. A CONCEPT LEARNING TASK  If some instance x satisfies all the constraints of hypothesis h, then h classifies x as a positive example (h(x) = 1); otherwise h classifies x as a negative example (h(x) = 0).  To illustrate, the hypothesis that Aldo enjoys his favorite sport only on cold days with high humidity is represented by the expression <?, Cold, High, ?, ?, ?>  The most general hypothesis - that every day is a positive example - is represented by <?, ?, ?, ?, ?, ?> The most specific possible hypothesis - that no day is a positive example - is represented by <0, 0, 0, 0, 0, 0> 56
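The way a hypothesis classifies an instance can be sketched directly; the helper name and tuple encoding below are our own.

```python
# A sketch of how a conjunctive hypothesis classifies an instance:
# '?' matches any value, '0' matches nothing, and any other constraint
# must equal the instance's attribute value exactly.

def classify(hypothesis, instance):
    """Return 1 if the instance satisfies every constraint, else 0."""
    for h_val, x_val in zip(hypothesis, instance):
        if h_val == '0' or (h_val != '?' and h_val != x_val):
            return 0
    return 1

h = ('?', 'Cold', 'High', '?', '?', '?')
print(classify(h, ('Sunny', 'Cold', 'High', 'Strong', 'Warm', 'Same')))  # 1
print(classify(h, ('Sunny', 'Warm', 'High', 'Strong', 'Warm', 'Same')))  # 0
```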
  • 57. A CONCEPT LEARNING TASK  The EnjoySport concept learning task requires learning the set of days for which EnjoySport = yes, describing this set by a conjunction of constraints over the instance attributes.  In general, any concept learning task can be described by the set of instances over which the target function is defined, the target function, the set of candidate hypotheses considered by the learner, and the set of available training examples. 57
  • 58. A CONCEPT LEARNING TASK Notation The following terminology when discussing concept learning problems.  The set of items over which the concept is defined is called the set of instances, which we denote by X.  X is the set of all possible days, each represented by the attributes Sky, AirTemp, Humidity, Wind, Water, and Forecast.  The concept or function to be learned is called the target concept, which we denote by c.  In general, c can be any boolean-valued function defined over the instances X; that is, c : X-> {0,1}. 58
  • 59. A CONCEPT LEARNING TASK - NOTATION  The target concept corresponds to the value of the attribute EnjoySport i.e., c(x) = 1 if EnjoySport = Yes, and c(x) = 0 if EnjoySport = No 59
  • 60. A CONCEPT LEARNING TASK - NOTATION The learner is presented with a set of training examples, each consisting of an instance x from X, along with its target concept value c(x). Instances for which c(x) = 1 are called positive examples, or members of the target concept. Instances for which c(x) = 0 are called negative examples, or nonmembers of the target concept. We write the ordered pair (x, c(x)) to describe the training example consisting of the instance x and its target concept value c(x). We use the symbol D to denote the set of available training examples. 60
  • 61. A CONCEPT LEARNING TASK - NOTATION Given a set of training examples of the target concept c, the problem faced by the learner is to hypothesize, or estimate, c. We use the symbol H to denote the set of all possible hypotheses that the learner may consider regarding the identity of the target concept. Usually H is determined by the human designer's choice of hypothesis representation. Each hypothesis h in H represents a boolean-valued function defined over X; that is, h : X -> {0,1}. The goal of the learner is to find a hypothesis h such that h(x) = c(x) for all x in X. 61
  • 62. CONCEPT LEARNING AS SEARCH Concept learning can be viewed as the task of searching through a large space of hypotheses implicitly defined by the hypothesis representation. The goal of this search is to find the hypothesis that best fits the training examples. It is important to note that by selecting a hypothesis representation, the designer of the learning algorithm implicitly defines the space of all hypotheses that the program can ever represent and therefore can ever learn.  Consider for example, the instances X and hypotheses H in the EnjoySport learning task.  The attribute Sky has three possible values and that AirTemp, Humidity, Wind, Water, and Forecast each have two possible values. 62
  • 63. CONCEPT LEARNING AS SEARCH The instance space X contains exactly 3·2·2·2·2·2 = 96 distinct instances. A similar calculation shows that there are 5·4·4·4·4·4 = 5120 syntactically distinct hypotheses within H. The number of semantically distinct hypotheses is only 1 + (4·3·3·3·3·3) = 973. General-to-Specific Ordering of Hypotheses We can design learning algorithms that exhaustively search even infinite hypothesis spaces without explicitly enumerating every hypothesis. To illustrate the general-to-specific ordering, consider the two hypotheses h1 = <Sunny, ?, ?, Strong, ?, ?> h2 = <Sunny, ?, ?, ?, ?, ?> 63
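These three counts can be checked directly from the attribute value counts stated above (the list encoding is our own).

```python
# Verifying the instance and hypothesis-space counts for EnjoySport.
from functools import reduce

sizes = [3, 2, 2, 2, 2, 2]  # Sky has 3 values; the other five attributes have 2

instances = reduce(lambda acc, n: acc * n, sizes, 1)           # 3*2*2*2*2*2
syntactic = reduce(lambda acc, n: acc * (n + 2), sizes, 1)     # each attribute also allows '?' and '0'
semantic = 1 + reduce(lambda acc, n: acc * (n + 1), sizes, 1)  # value-or-'?' per attribute, plus one empty hypothesis

print(instances, syntactic, semantic)  # 96 5120 973
```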
  • 64. CONCEPT LEARNING AS SEARCH Consider the sets of instances that are classified positive by h1 and by h2. Because h2 imposes fewer constraints on the instance, it classifies more instances as positive: any instance classified positive by h1 will also be classified positive by h2. Therefore, we say that h2 is more general than h1. This intuitive "more general than" relationship between hypotheses can be defined more precisely as follows. First, for any instance x in X and hypothesis h in H, we say that x satisfies h if and only if h(x) = 1. We then define the more-general-than-or-equal-to relation in terms of the sets of instances that satisfy the two hypotheses: given hypotheses hj and hk, hj is more-general-than-or-equal-to hk if and only if any instance that satisfies hk also satisfies hj. 64
  • 65. CONCEPT LEARNING AS SEARCH We also find it useful to consider cases where one hypothesis is strictly more general than the other: we say that hj is more-general-than hk if and only if hj is more-general-than-or-equal-to hk and hk is not more-general-than-or-equal-to hj. 65
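For conjunctive hypotheses this relation also has a simple syntactic test, sketched below with our own helper name: hj is more general than or equal to hk if, attribute by attribute, hj's constraint is "?", equals hk's constraint, or hk's constraint is "0" (hk accepts nothing there anyway).

```python
# A sketch of the more-general-than-or-equal-to relation for
# conjunctive hypotheses over the same attributes.

def more_general_or_equal(hj, hk):
    return all(a == '?' or a == b or b == '0' for a, b in zip(hj, hk))

h1 = ('Sunny', '?', '?', 'Strong', '?', '?')
h2 = ('Sunny', '?', '?', '?', '?', '?')
print(more_general_or_equal(h2, h1))  # True:  h2 imposes fewer constraints
print(more_general_or_equal(h1, h2))  # False: h2 is strictly more general
```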
  • 67. FIND-S: FINDING A MAXIMALLY SPECIFIC HYPOTHESIS To illustrate this algorithm, assume the learner is given the sequence of training examples from Table 2.1 for the EnjoySport task. The first step of FIND-S is to initialize h to the most specific hypothesis in H: h <- <0, 0, 0, 0, 0, 0> 67
  • 68. FIND-S: FINDING A MAXIMALLY SPECIFIC HYPOTHESIS Upon observing the first training example from Table 2.1, which happens to be a positive example, it becomes clear that our hypothesis is too specific. None of the "0" constraints in h are satisfied by this example, so each is replaced by the next more general constraint that fits the example; namely, the attribute values for this training example. This h is still very specific; it asserts that all instances are negative except for the single positive training example. Next, the second training example forces the algorithm to further generalize h, this time substituting a "?" in place of any attribute value in h that is not satisfied by the new example. The refined hypothesis in this case is h = <Sunny, Warm, ?, Strong, Warm, Same> 68
  • 69. FIND-S: FINDING A MAXIMALLY SPECIFIC HYPOTHESIS Upon encountering the third training example - in this case a negative example - the algorithm makes no change to h. The FIND-S algorithm simply ignores every negative example. While this may at first seem strange, notice that in the current case our hypothesis h is already consistent with the new negative example, and hence no revision is needed. To complete our trace of FIND-S, the fourth (positive) example leads to a further generalization of h: h = <Sunny, Warm, ?, Strong, ?, ?> The FIND-S algorithm illustrates one way in which the more-general-than partial ordering can be used to organize the search for an acceptable hypothesis. The search moves from hypothesis to hypothesis, searching from the most specific to progressively more general hypotheses along one chain of the partial ordering. 69
  • 70. FIND-S: FINDING A MAXIMALLY SPECIFIC HYPOTHESIS Figure 2.2 illustrates this search in terms of the instance and hypothesis spaces. At each step, the hypothesis is generalized only as far as necessary to cover the new positive example. Therefore, at each stage the hypothesis is the most specific hypothesis consistent with the training examples observed up to this point (hence the name FIND-S) 70
  • 71. FIND-S: FINDING A MAXIMALLY SPECIFIC HYPOTHESIS FIGURE 2.2 The hypothesis space search performed by FIND-S. 71
  • 72. FIND-S: FINDING A MAXIMALLY SPECIFIC HYPOTHESIS The key property of the FIND-S algorithm is that, for hypothesis spaces described by conjunctions of attribute constraints (such as H for the EnjoySport task), FIND-S is guaranteed to output the most specific hypothesis within H that is consistent with the positive training examples. Its final hypothesis will also be consistent with the negative examples, provided the correct target concept is contained in H and the training examples are correct. 72
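The full FIND-S trace above can be reproduced with a short sketch; the function and variable names are our own, and the data follows Table 2.1.

```python
# A sketch of FIND-S on the EnjoySport data: ignore negatives,
# and minimally generalize h to cover each positive example.

def find_s(examples):
    n = len(examples[0][0])
    h = ['0'] * n                      # most specific hypothesis
    for instance, label in examples:
        if label != 'Yes':
            continue                   # FIND-S ignores negative examples
        for i, value in enumerate(instance):
            if h[i] == '0':
                h[i] = value           # first positive example: copy its values
            elif h[i] != value:
                h[i] = '?'             # mismatched constraint: generalize to '?'
    return h

data = [
    (('Sunny', 'Warm', 'Normal', 'Strong', 'Warm', 'Same'), 'Yes'),
    (('Sunny', 'Warm', 'High', 'Strong', 'Warm', 'Same'), 'Yes'),
    (('Rainy', 'Cold', 'High', 'Strong', 'Warm', 'Change'), 'No'),
    (('Sunny', 'Warm', 'High', 'Strong', 'Cool', 'Change'), 'Yes'),
]
print(find_s(data))  # ['Sunny', 'Warm', '?', 'Strong', '?', '?']
```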
  • 73. VERSION SPACES AND THE CANDIDATE-ELIMINATION ALGORITHM This section presents a second approach to concept learning: the CANDIDATE-ELIMINATION algorithm, which addresses several of the limitations of FIND-S. Although FIND-S outputs a hypothesis from H that is consistent with the training examples, this is just one of many hypotheses from H that might fit the training data equally well. The key idea in the CANDIDATE-ELIMINATION algorithm is to output a description of the set of all hypotheses consistent with the training examples. It computes the description of this set without explicitly enumerating all of its members. The CANDIDATE-ELIMINATION algorithm has been applied to problems such as learning regularities in chemical mass spectroscopy and learning control rules for heuristic search. 73
  • 74. VERSION SPACES AND THE CANDIDATE-ELIMINATION ALGORITHM The CANDIDATE-ELIMINATION and FIND-S algorithms both perform poorly when given noisy training data. Nevertheless, the version-space framework provides a useful conceptual setting for introducing several fundamental issues in machine learning. Representation The CANDIDATE-ELIMINATION algorithm finds all describable hypotheses that are consistent with the observed training examples. A hypothesis is consistent with the training examples if it correctly classifies these examples. Definition: A hypothesis h is consistent with a set of training examples D if and only if h(x) = c(x) for each example (x, c(x)) in D. 74
  • 75. VERSION SPACES AND THE CANDIDATE-ELIMINATION ALGORITHM Note that x is said to satisfy hypothesis h when h(x) = 1, regardless of whether x is a positive or negative example of the target concept. Whether such an example is consistent with h depends on the target concept, and in particular on whether h(x) = c(x). The CANDIDATE-ELIMINATION algorithm represents the set of all hypotheses consistent with the observed training examples. This subset of all hypotheses is called the version space with respect to the hypothesis space H and the training examples D, because it contains all plausible versions of the target concept. 75
  • 76. VERSION SPACES AND THE CANDIDATE-ELIMINATION ALGORITHM Definition: The version space, denoted VS(H,D), with respect to hypothesis space H and training examples D, is the subset of hypotheses from H consistent with the training examples in D. The LIST-THEN-ELIMINATE algorithm One obvious way to represent the version space is simply to list all of its members. This leads to a simple learning algorithm, which we might call the LIST-THEN-ELIMINATE algorithm, defined in Table 2.4. This algorithm first initializes the version space to contain all hypotheses in H, then eliminates any hypothesis found inconsistent with any training example. 76
  • 77. VERSION SPACES AND THE CANDIDATE-ELIMINATION ALGORITHM The version space of candidate hypotheses thus shrinks as more examples are observed, until ideally just one hypothesis remains that is consistent with all the observed examples. If insufficient data is available to narrow the version space to a single hypothesis, then the algorithm can output the entire set of hypotheses consistent with the observed data. The algorithm can be applied whenever the hypothesis space H is finite. It has the advantage that it is guaranteed to output all hypotheses consistent with the training data. 77
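Because LIST-THEN-ELIMINATE must enumerate the whole hypothesis space, a sketch is only practical on a toy space; the two-attribute domain below is our own invention, not from the text.

```python
# A sketch of LIST-THEN-ELIMINATE: enumerate every conjunctive
# hypothesis, then drop any that misclassifies a training example.
from itertools import product

ATTR_VALUES = [('Sunny', 'Rainy'), ('Warm', 'Cold')]  # toy 2-attribute space

def classify(h, x):
    """'?' matches anything; '0' matches nothing; values must be equal."""
    return 1 if all(a == '?' or a == b for a, b in zip(h, x)) else 0

def list_then_eliminate(examples):
    space = list(product(*[vals + ('?', '0') for vals in ATTR_VALUES]))
    for x, c in examples:
        space = [h for h in space if classify(h, x) == c]  # eliminate inconsistent h
    return space

version_space = list_then_eliminate([(('Sunny', 'Warm'), 1),
                                     (('Rainy', 'Cold'), 0)])
print(version_space)  # [('Sunny', 'Warm'), ('Sunny', '?'), ('?', 'Warm')]
```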
  • 78. VERSION SPACES AND THE CANDIDATE-ELIMINATION ALGORITHM A More Compact Representation for Version Spaces  The CANDIDATE-ELIMINATION algorithm works on the same principle as the LIST-THEN-ELIMINATE algorithm.  However, it employs a much more compact representation of the version space.  The version space is represented by its most general and least general members.  These members form general and specific boundary sets that delimit the version space within the partially ordered hypothesis space. 78
  • 79. VERSION SPACES AND THE CANDIDATE-ELIMINATION ALGORITHM  Recall that given the four training examples from Table 2.1, FIND-S outputs the hypothesis <Sunny, Warm, ?, Strong, ?, ?>.  This is one of six different hypotheses from H that are consistent with these training examples.  All six hypotheses are shown in Figure 2.3; they constitute the version space relative to this set of data and this hypothesis representation.  The arrows among these six hypotheses in Figure 2.3 indicate instances of the more-general-than relation.  The CANDIDATE-ELIMINATION algorithm represents the version space by storing only its most general members (labeled G in Figure 2.3) and its most specific (labeled S in the figure). 79
  • 80. VERSION SPACES AND THE CANDIDATE-ELIMINATION ALGORITHM  Given only the two sets S and G, it is possible to enumerate all members of the version space as needed by generating the hypotheses that lie between these two sets in the general-to-specific partial ordering over hypotheses.  Below, we define the boundary sets G and S precisely and prove that these sets do in fact represent the version space.  Consider a version space for a "rectangle" hypothesis language in two dimensions.  Green pluses are positive examples, and red circles are negative examples.  GB is the maximally general positive hypothesis boundary.  SB is the maximally specific positive hypothesis boundary.  The intermediate (thin) rectangles represent the hypotheses in the version space. 80
  • 81. VERSION SPACES AND THE CANDIDATE-ELIMINATION ALGORITHM  The version space is precisely the set of hypotheses contained in G, plus those contained in S, plus those that lie between G and S in the partially ordered hypothesis space. This is stated precisely in Theorem 2.1. 81
  • 82. VERSION SPACES AND THE CANDIDATE-ELIMINATION ALGORITHM To prove the theorem it suffices to show that (1) every h satisfying the right-hand side of the above expression is in VS(H,D), and (2) every member of VS(H,D) satisfies the right-hand side of the expression. Let g be an arbitrary member of G, s be an arbitrary member of S, and h be an arbitrary member of H. 82
  • 83. VERSION SPACES AND THE CANDIDATE-ELIMINATION ALGORITHM CANDIDATE-ELIMINATION Algorithm The CANDIDATE-ELIMINATION algorithm computes the version space containing all hypotheses from H that are consistent with an observed sequence of training examples. It begins by initializing the version space to the set of all hypotheses in H; that is, by initializing the G boundary set to contain the most general hypothesis in H, G0 <- {<?, ?, ?, ?, ?, ?>}, and initializing the S boundary set to contain the most specific hypothesis, S0 <- {<0, 0, 0, 0, 0, 0>}. These two boundary sets delimit the entire hypothesis space, because every other hypothesis in H is both more general than S0 and more specific than G0. 83
  • 84. VERSION SPACES AND THE CANDIDATE-ELIMINATION ALGORITHM CANDIDATE-ELIMINATION Algorithm As each training example is considered, the S and G boundary sets are generalized and specialized, respectively, to eliminate from the version space any hypotheses found inconsistent with the new training example. After all examples have been processed, the computed version space contains all the hypotheses consistent with these examples, and only these hypotheses. This algorithm is summarized in Table 2.5. 84
  • 85. VERSION SPACES AND THE CANDIDATE-ELIMINATION ALGORITHM CANDIDATE-ELIMINATION Algorithm  Training Examples:  T1: (Sunny, Warm, Normal, Strong, Warm, Same), Yes  T2: (Sunny, Warm, High, Strong, Warm, Same), Yes  T3: (Rainy, Cold, High, Strong, Warm, Change), No  T4: (Sunny, Warm, High, Strong, Cool, Change), Yes 85
  • 86. VERSION SPACES AND THE CANDIDATE-ELIMINATION ALGORITHM Figure 2.4 traces the CANDIDATE-ELIMINATION algorithm applied to the first two training examples from Table 2.1. The boundaries are first initialized to G0 and S0, the most general and most specific hypotheses in H, respectively. When the first training example is presented, the algorithm checks the S boundary and finds that it is overly specific: it fails to cover the positive example.  Fig 2.4 CANDIDATE-ELIMINATION Trace 1. S0 and G0 are the initial boundary sets corresponding to the most specific and most general hypotheses.  Training examples 1 and 2 force the S boundary to become more general, as in the FIND-S algorithm. They have no effect on the G boundary. 86
  • 87. VERSION SPACES AND THE CANDIDATE-ELIMINATION ALGORITHM  The S boundary is revised by moving it to the least general hypothesis that covers this new example.  This revised boundary is shown as S1 in Figure 2.4.  No update of the G boundary is needed in response to this training example, because G0 correctly covers this example.  When the second training example is observed, it has a similar effect of generalizing S further to S2, leaving G again unchanged (i.e., G2 = G1 = G0). 87
  • 88. VERSION SPACES AND THE CANDIDATE-ELIMINATION ALGORITHM Negative training examples play the complementary role of forcing the G boundary to become increasingly specific. Consider the third training example, shown in Figure 2.5. The hypotheses in the G boundary must therefore be specialized until they correctly classify this new negative example. As Figure 2.5 shows, there are several alternative minimally more specific hypotheses.  Figure 2.5: CEA Trace 2: Training example 3 is a negative example that forces the G2 boundary to be specialized to G3.  Note that several alternative maximally general hypotheses are included in G3. 88
  • 89. VERSION SPACES AND THE CANDIDATE-ELIMINATION ALGORITHM The fourth training example, as shown in Figure 2.6, further generalizes the S boundary of the version space. It also results in removing one member of the G boundary, because this member fails to cover the new positive example. After processing these four examples, the boundary sets S4 and G4 delimit the version space of all hypotheses consistent with the set of incrementally observed training examples.  Figure 2.6. CEA Trace 3: The positive training example generalizes the S boundary, from S3 to S4.  One member of G3 must also be deleted, because it is no longer more general than the S4 boundary. 89
  • 90. VERSION SPACES AND THE CANDIDATE-ELIMINATION ALGORITHM The entire version space, including those hypotheses bounded by S4 and G4, is shown in Figure 2.7. This learned version space is independent of the sequence in which the training examples are presented (because in the end it contains all hypotheses consistent with the set of examples).  Figure 2.7. The final version space for the EnjoySport concept learning problem and training examples described earlier. 90
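The trace above can be reproduced with a compact sketch of CANDIDATE-ELIMINATION for conjunctive hypotheses. The helper names are our own, the attribute domains are read off the training data rather than the full task description, and duplicate handling is kept minimal, so this is an illustration rather than a full implementation.

```python
# A compact sketch of CANDIDATE-ELIMINATION on the four EnjoySport examples.

def matches(h, x):
    return all(a == '?' or a == b for a, b in zip(h, x))

def more_general_or_equal(hj, hk):
    return all(a == '?' or a == b or b == '0' for a, b in zip(hj, hk))

def min_generalize(s, x):
    """Minimal generalization of s that covers the positive example x."""
    return tuple(v if c == '0' else (c if c in ('?', v) else '?')
                 for c, v in zip(s, x))

def min_specialize(g, x, domains):
    """Minimal specializations of g that exclude the negative example x."""
    return [g[:i] + (v,) + g[i + 1:]
            for i, c in enumerate(g) if c == '?'
            for v in domains[i] if v != x[i]]

def candidate_elimination(examples, domains):
    S = {('0',) * len(domains)}
    G = {('?',) * len(domains)}
    for x, positive in examples:
        if positive:
            G = {g for g in G if matches(g, x)}      # drop inconsistent g
            S = {min_generalize(s, x) for s in S}    # generalize S minimally
            S = {s for s in S if any(more_general_or_equal(g, s) for g in G)}
        else:
            S = {s for s in S if not matches(s, x)}  # drop inconsistent s
            newG = set()
            for g in G:
                if not matches(g, x):
                    newG.add(g)                      # already consistent
                    continue
                for g2 in min_specialize(g, x, domains):
                    if any(more_general_or_equal(g2, s) for s in S):
                        newG.add(g2)
            G = {g for g in newG                     # keep only maximal members
                 if not any(h != g and more_general_or_equal(h, g) for h in newG)}
    return S, G

DOMAINS = [('Sunny', 'Rainy'), ('Warm', 'Cold'), ('Normal', 'High'),
           ('Strong',), ('Warm', 'Cool'), ('Same', 'Change')]
DATA = [
    (('Sunny', 'Warm', 'Normal', 'Strong', 'Warm', 'Same'), True),
    (('Sunny', 'Warm', 'High', 'Strong', 'Warm', 'Same'), True),
    (('Rainy', 'Cold', 'High', 'Strong', 'Warm', 'Change'), False),
    (('Sunny', 'Warm', 'High', 'Strong', 'Cool', 'Change'), True),
]
S, G = candidate_elimination(DATA, DOMAINS)
print('S =', S)  # {('Sunny', 'Warm', '?', 'Strong', '?', '?')}
print('G =', G)  # the two maximally general hypotheses of Figure 2.7
```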
  • 92. Linear separability  Consider two-input patterns (X1, X2) being classified into two classes, as shown in the figure.  Each point, marked with the symbol x or o, represents a pattern with a set of values (X1, X2).  Each pattern is classified into one of two classes.  These classes can be separated with a single line L; such patterns are known as linearly separable patterns.  Linear separability refers to the fact that classes of patterns with n-dimensional vectors x = (x1, x2, ..., xn) can be separated with a single decision surface.  The line L represents the decision surface.
  • 93. Linear separability  The processing unit of a single-layer perceptron network is able to categorize a set of patterns into two classes, as the linear threshold function defines their linear separability.  The two classes must be linearly separable in order for the perceptron network to function correctly. This is the main limitation of a single-layer perceptron network. Example:  The classic example of a linearly inseparable pattern is the logical exclusive-OR (XOR) function. As shown in the figure (0 for a black dot and 1 for a white dot), the XOR patterns cannot be separated with a single line.  The solution is that patterns of (X1, X2) can instead be classified with two lines, L1 and L2.
  • 94. Linear separability  FIGURE 3.4 Data for the OR logic function and a plot of the four datapoints.  In the graph on the right of Figure 3.4 it is easy to draw a straight line that separates the crosses from the circles (this is done in Figure 3.6).  What does the Perceptron do? It tries to find a straight line such that the neuron fires on one side of the line and doesn't fire on the other.  This line is called the decision boundary or discriminant function (Figure 3.7).  FIGURE 3.6 The decision boundary computed by a Perceptron for the OR function.  Figure 3.6 shows the decision boundary, that is, where the decision about which class to categorise the input as changes from crosses to circles.  FIGURE 3.7 A decision boundary separating two classes of data.
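The Perceptron's search for such a line can be sketched directly. This is a minimal illustrative implementation (variable names and the learning rate are my own choices, not from the slides): it runs the standard mistake-driven update rule on the OR data until every point is classified correctly.

```python
# Minimal Perceptron sketch: learn the OR function with the mistake-driven
# update rule w <- w + eta * (t - y) * x, including a bias weight whose
# input is fixed at 1.

def train_perceptron(data, eta=0.25, max_epochs=20):
    w = [0.0, 0.0]   # input weights
    b = 0.0          # bias weight
    for _ in range(max_epochs):
        errors = 0
        for (x1, x2), t in data:
            y = 1 if w[0] * x1 + w[1] * x2 + b > 0 else 0
            if y != t:                      # update only on mistakes
                w[0] += eta * (t - y) * x1
                w[1] += eta * (t - y) * x2
                b += eta * (t - y)          # bias input is always 1
                errors += 1
        if errors == 0:                     # converged: all points correct
            break
    return w, b

OR_DATA = [((0, 0), 0), ((0, 1), 1), ((1, 0), 1), ((1, 1), 1)]
w, b = train_perceptron(OR_DATA)
preds = [1 if w[0] * x1 + w[1] * x2 + b > 0 else 0 for (x1, x2), _ in OR_DATA]
print(preds)  # [0, 1, 1, 1] - the learned line separates the two classes
```

The learned weights define the decision boundary of Figure 3.6: the set of inputs where w1*x1 + w2*x2 + b = 0.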
  • 95. Linear separability  Consider one input vector x. The neuron fires if x · wᵀ > 0, where w is the row of W that connects the inputs to one particular neuron, and wᵀ denotes the transpose of w, used to make both vectors into column vectors.  The a · b notation describes the inner or scalar product between two vectors: a · b = ‖a‖ ‖b‖ cos θ, where θ is the angle between a and b and ‖a‖ is the length of the vector a.  In the Perceptron, the boundary case is an input vector x1 that has x1 · wᵀ = 0. Take another input vector x2 that also satisfies x2 · wᵀ = 0.  Putting these two equations together we get (x1 − x2) · wᵀ = 0: the weight vector w is perpendicular to the decision boundary.  FIGURE 3.7 A decision boundary separating two classes of data.
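The perpendicularity conclusion can be seen with a tiny numeric check. The specific weights and boundary points below are my own example (a boundary through the origin, with no bias term): any two points on the boundary satisfy x · w = 0, so their difference, which lies along the boundary, has zero inner product with w.

```python
# Numeric check: for two points x1, x2 on the decision boundary (x . w = 0),
# the direction x1 - x2 lies along the boundary and is perpendicular to w.

w = (2.0, 1.0)        # weight vector; boundary is 2*x + y = 0
x1 = (1.0, -2.0)      # 2*1 + 1*(-2) = 0, so x1 is on the boundary
x2 = (-0.5, 1.0)      # 2*(-0.5) + 1*1 = 0, so x2 is on the boundary

diff = (x1[0] - x2[0], x1[1] - x2[1])     # a vector along the boundary
dot = diff[0] * w[0] + diff[1] * w[1]
print(dot)  # 0.0 - the boundary direction is perpendicular to w
```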
  • 96. Linear separability  It is worth thinking about what happens when we have more than one output neuron.  The weights for each neuron separately describe a straight line, so with several neurons we get several straight lines, each of which tries to separate a different part of the space.  Figure 3.8 shows an example of decision boundaries computed by a Perceptron with four neurons.  FIGURE 3.8 Different decision boundaries computed by a Perceptron with four neurons.
  • 97. The Perceptron Convergence Theorem  The Perceptron will converge to a solution that separates the classes, and it will do so after a finite number of iterations.  The number of iterations is bounded by R²/γ², where γ is the distance between the separating hyperplane and the closest datapoint to it.  Assume that the length of every input vector is bounded by some constant R, i.e. ‖x‖ ≤ R.  Since the data are linearly separable, we know that there is some weight vector w* that separates them.  The Perceptron learning algorithm aims to find some vector w that is parallel to w*.  We measure this with the inner product w* · w: when the two vectors are parallel, the angle between them is φ = 0 and so cos φ = 1, making the inner product maximal.  If each weight update increases w* · w, the algorithm will converge. We do need a little bit more, though, since w* · w also grows if w simply gets longer.  There are therefore two quantities that we need to check: (1) the value of w* · w and (2) the length of w.  Suppose that at the tth iteration of the algorithm a datapoint x with target y is misclassified. The weight update is w(t) = w(t−1) + y x, where the (t−1) index means the weights at the (t−1)st step.
  • 98. The Perceptron Convergence Theorem  We now compute how this update changes the two quantities.  First, w* · w(t) = w* · (w(t−1) + y x) = w* · w(t−1) + y (w* · x) ≥ w* · w(t−1) + γ, where γ is the smallest distance between the optimal hyperplane defined by w* and any datapoint.  This means that the inner product increases by at least γ at every update, so after t updates of the weights, w* · w(t) ≥ t γ.  By the Cauchy–Schwarz inequality, w* · w(t) ≤ ‖w*‖ ‖w(t)‖.  The length of the weight vector after t steps is ‖w(t)‖² = ‖w(t−1)‖² + 2 y (w(t−1) · x) + ‖x‖² ≤ ‖w(t−1)‖² + R², since the update happens only when x is misclassified, meaning y (w(t−1) · x) ≤ 0. So ‖w(t)‖² ≤ t R².  Combining the bounds, t γ ≤ w* · w(t) ≤ ‖w*‖ ‖w(t)‖ ≤ ‖w*‖ √t R, so t ≤ R² ‖w*‖² / γ². Taking w* to be a unit vector, t ≤ R²/γ². Hence after we have made that many updates the algorithm must have converged.
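The bound can be demonstrated numerically. The toy dataset below is my own construction: six points separable by the unit vector w* = (1, 0), so the margin γ is the smallest |x1| and R is the largest input length. Counting the mistake-driven updates (with ±1 targets and no bias, as in the theorem) shows the count stays under R²/γ².

```python
# Numeric illustration of the convergence bound: count Perceptron weight
# updates on a separable dataset and compare with R^2 / gamma^2 for the
# known unit separator w* = (1, 0).

import math

DATA = [((1.0, 1.0), 1), ((2.0, -1.0), 1), ((1.5, 0.5), 1),
        ((-1.0, 1.0), -1), ((-2.0, -1.0), -1), ((-1.0, -0.5), -1)]

R = max(math.hypot(x1, x2) for (x1, x2), _ in DATA)   # bound on input lengths
gamma = min(abs(x1) for (x1, x2), _ in DATA)          # margin w.r.t. w* = (1, 0)
bound = R ** 2 / gamma ** 2

w = [0.0, 0.0]
updates = 0
for _ in range(100):                  # epochs; stops once all points are correct
    mistakes = 0
    for (x1, x2), y in DATA:
        if y * (w[0] * x1 + w[1] * x2) <= 0:   # misclassified (or on boundary)
            w[0] += y * x1
            w[1] += y * x2
            updates += 1
            mistakes += 1
    if mistakes == 0:
        break

print(updates, bound)  # the number of updates never exceeds the bound
```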
  • 100. Linear Regression  Linear regression is a linear approach for modeling the relationship between a scalar dependent variable y and one or more explanatory variables (or independent variables) denoted X. The case of one explanatory variable is called simple linear regression.  Example: In linear regression, the observations (red) are assumed to be the result of random deviations (green) from an underlying relationship (blue) between the dependent variable (y) and the independent variable (x).
  • 101. Linear Regression  It is common to turn classification problems into regression problems.  This can be done in two ways.  The first is to introduce an indicator variable, which simply says which class each datapoint belongs to. The problem is then to use the data to predict the indicator variable, which is a regression problem.  The second approach is to do repeated regression, once for each class, with the indicator value being 1 for examples in the class and 0 for all of the others.  Classification can thus be replaced by regression using either of these methods.  The difference between the Perceptron and the more statistical approaches lies in the way that the problem is set up.  FIGURE 3.13 Linear regression in two and three dimensions.
  • 102. Linear Regression  For regression we are making a prediction about an unknown value y, such as the indicator variable for classes or a future value of some data, by computing some function of known values xi.  The output y is a sum of the xi values, each multiplied by a constant parameter: y = Σᵢ βᵢ xᵢ.  These parameters define a straight line that passes through the datapoints.  Figure 3.13 shows this in two and three dimensions.  We want to define the line that best fits the data.  FIGURE 3.13 Linear regression in two and three dimensions.
  • 103. Linear Regression  The most common solution is to try to minimise the distance between each datapoint and the line that we fit; Pythagoras' theorem tells us that distance.  We can then minimise an error function that measures the sum of all these distances.  If we ignore the square roots and minimise the sum-of-squares of the errors, we get the most common minimisation, known as least-squares optimisation.  We minimise the squared difference between the prediction and the actual data value, summed over all of the datapoints. That is, we have: Σᵢ (tᵢ − Σⱼ βⱼ xᵢⱼ)².  This can be written in matrix form as (t − Xβ)ᵀ (t − Xβ), where t is a column vector containing the targets and X is the matrix of input values (including the bias inputs), just as for the Perceptron.  Computing the smallest value of this means differentiating it with respect to the (column) parameter vector β and setting the derivative to 0.
  • 104. Linear Regression  Differentiating (t − Xβ)ᵀ (t − Xβ) with respect to β, remembering that the derivative of βᵀ Xᵀ X β is 2 Xᵀ X β and that the derivative of the cross term −2 tᵀ X β is −2 Xᵀ t, and setting the result to zero gives the normal equations Xᵀ X β = Xᵀ t, which have the solution β = (Xᵀ X)⁻¹ Xᵀ t.
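For the one-variable case the normal equations reduce to a familiar closed form, which can be sketched without any matrix code. The data points below are my own example, chosen to lie exactly on the line y = 1 + 2x so the recovered parameters are easy to verify.

```python
# Least-squares sketch for simple linear regression: fit y = b0 + b1*x.
# For one explanatory variable the normal equations reduce to
# b1 = cov(x, y) / var(x) and b0 = mean(y) - b1 * mean(x).

xs = [0.0, 1.0, 2.0, 3.0]
ys = [1.0, 3.0, 5.0, 7.0]          # exactly on the line y = 1 + 2x

n = len(xs)
mean_x = sum(xs) / n
mean_y = sum(ys) / n

# Slope: sum of cross-deviations over sum of squared x-deviations.
b1 = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys)) \
     / sum((x - mean_x) ** 2 for x in xs)
b0 = mean_y - b1 * mean_x          # intercept from the means

print(b0, b1)  # 1.0 2.0
```

With noisy data the same formulas give the line minimising the sum-of-squares error; with more explanatory variables one solves the full matrix form β = (XᵀX)⁻¹Xᵀt instead.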
  • 105. References: 1. Ethem Alpaydin, "Introduction to Machine Learning" (Adaptive Computation and Machine Learning Series), Third Edition, MIT Press, 2014. 2. Tom M. Mitchell, "Machine Learning", First Edition, McGraw Hill Education, 2013.