2. Outlines
1 Introduction to Supervised Learning
2 Motivation
3 Scope of Machine Learning
4 References
3. Introduction to Supervised Learning:
Introduction:
In supervised learning, the aim is to learn a mapping from the input to an
output whose correct values are provided by a supervisor.
Example: identifying a family car, framed as a classification problem.
Attributes:
It should not be too expensive, i.e., p1 < price < p2.
Its engine capacity should be adequate, i.e., e1 hp < ec < e2 hp.
It should be spacious.
Key Points:
Other attributes, such as comfort, mileage, and top speed, have not
been considered in this case. For analytical simplicity, we consider only
price and engine capacity.
Class learning is finding a description that is shared by all positive
examples and none of the negative examples.
4. Positive and Negative Example:
Positive and Negative Example:
When a car's price and engine capacity satisfy the relation
(p1 < price < p2) & (e1 < ec < e2), it is called a positive example;
otherwise it is a negative example.
Training set for the class of a family car.
Figure: Each data point corresponds to one example car. '+' denotes a positive
example of the class, and '-' denotes a negative example.
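As a concrete illustration (not from the slides), this labeling rule can be sketched in Python; the threshold values p1, p2, e1, e2 below are hypothetical placeholders for the true but unknown bounds.

```python
def is_family_car(price, ec, p1=10_000, p2=20_000, e1=100, e2=200):
    """Label a car as positive (1) or negative (0) for the family-car class.

    The bounds p1, p2 (price) and e1, e2 (engine capacity, hp) are
    hypothetical placeholders; the true bounds are unknown in practice.
    """
    return 1 if (p1 < price < p2) and (e1 < ec < e2) else 0

print(is_family_car(15_000, 150))  # 1: positive example
print(is_family_car(30_000, 150))  # 0: negative example (too expensive)
```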
5. Actual Hypothesis H
Actual Hypothesis H
The class of family cars is a rectangle in the price-engine capacity space.
The hypothesis class H contains all possibilities, i.e., every candidate
rectangle under which a car comes under the family-car class at the given
constraints.
6. Mathematical Intuition:
Mathematical representation:
Let the input attributes be expressed as a vector

$$\mathbf{x} = \begin{bmatrix} x_1 \\ x_2 \end{bmatrix}, \qquad x_1 \to \text{Price}, \quad x_2 \to \text{Engine capacity}$$
Similarly, the label r indicates whether an example is positive (+ve) or negative (-ve):

$$r = \begin{cases} 1, & \text{if } \mathbf{x} \text{ is a positive example} \\ 0, & \text{if } \mathbf{x} \text{ is a negative example} \end{cases}$$
Each car is represented by such an ordered pair (x, r).
Training set: Let X be the training set containing information about N
different cars. Mathematically,

$$X = \{\mathbf{x}^t, r^t\}_{t=1}^{N},$$

where t indexes the different examples in the set.
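As a minimal sketch (with hypothetical class bounds and uniformly sampled attribute values, none of which come from the lecture), such a training set could be generated as follows:

```python
import random

# Hypothetical bounds of the true class C (unknown to the learner in practice).
P1, P2, E1, E2 = 10_000, 20_000, 100, 200

def label(x):
    """Supervisor's label r: 1 for a positive example, 0 for a negative one."""
    price, ec = x
    return 1 if (P1 < price < P2) and (E1 < ec < E2) else 0

# Training set X = {(x^t, r^t)} for t = 1..N, with x^t = (price, engine capacity).
N = 20
cars = [(random.uniform(5_000, 30_000), random.uniform(50, 300)) for _ in range(N)]
X = [(x, label(x)) for x in cars]
```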
7. Learning Hypothesis h:
Aim of learning hypothesis h
Let C be the class of family cars. The aims of learning a particular hypothesis are as follows:
i. To reduce the search space.
ii. To select a single hypothesis h ∈ H from the hypothesis class H.
The learning algorithm finds the particular hypothesis h ∈ H that
approximates C as closely as possible.
Let us say the hypothesis h makes a prediction for an instance x such that

$$h(\mathbf{x}) = \begin{cases} 1, & \text{if } h \text{ classifies } \mathbf{x} \text{ as a positive example} \\ 0, & \text{if } h \text{ classifies } \mathbf{x} \text{ as a negative example} \end{cases}$$
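A rectangle hypothesis h and its prediction rule can be sketched as below; the parameter values are hypothetical, not from the lecture.

```python
def h(x, params):
    """Prediction of a rectangle hypothesis with params = (p1, p2, e1, e2):
    returns 1 if h classifies x as positive, 0 otherwise."""
    price, ec = x
    p1, p2, e1, e2 = params
    return 1 if (p1 < price < p2) and (e1 < ec < e2) else 0

h_params = (11_000, 19_000, 110, 190)  # one candidate rectangle from H
print(h((15_000, 150), h_params))      # 1: classified as positive
```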
Key points:
In real life we do not know C(x).
Hence we cannot evaluate how well h(x) matches C(x).
We only have a training set X, which is a small subset of the full problem space.
8. Error of hypothesis h:
Error of hypothesis h:
The error of hypothesis h given the training set X is

$$E(h \mid X) = \sum_{t=1}^{N} \mathbb{1}\left(h(\mathbf{x}^t) \neq r^t\right),$$

where 1(a ≠ b) is 1 if a ≠ b and is 0 if a = b.
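This error count translates directly into code; a sketch on a tiny hypothetical training set:

```python
def h(x, params):
    """Rectangle hypothesis: 1 if x falls inside the rectangle, else 0."""
    price, ec = x
    p1, p2, e1, e2 = params
    return 1 if (p1 < price < p2) and (e1 < ec < e2) else 0

def empirical_error(params, X):
    """E(h|X): the number of training examples where h(x^t) != r^t."""
    return sum(1 for x, r in X if h(x, params) != r)

# Tiny hypothetical training set: ((price, engine capacity), label).
X = [((15_000, 150), 1), ((25_000, 150), 0), ((12_000, 180), 1)]
print(empirical_error((11_000, 19_000, 110, 190), X))  # 0 on this set
```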
9. Generalization:
Problem of generalization:
Let x1 and x2 be real-valued.
There are infinitely many hypotheses h that satisfy the condition E = 0.
At the boundary between positive and negative examples, different
candidate hypotheses may make different generalization predictions.
The problem of generalization refers to how well our hypothesis will
correctly classify future examples that are not part of the training set.
Most specific hypothesis:
The most specific hypothesis S is the tightest rectangle that includes all
the positive examples and none of the negative examples.
Most general hypothesis:
The most general hypothesis G is the largest rectangle that includes all the
positive examples and none of the negative examples.
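A sketch of computing S on a toy training set: S follows from the minima and maxima of the positive examples' attributes, while G would instead be found by growing the rectangle until just before it includes a negative example.

```python
def most_specific(X):
    """S: the tightest axis-aligned rectangle covering all positive examples.

    Assumes X = [((price, ec), r), ...] with at least one positive example.
    """
    pos = [x for x, r in X if r == 1]
    prices = [price for price, _ in pos]
    ecs = [ec for _, ec in pos]
    return (min(prices), max(prices), min(ecs), max(ecs))

X = [((15_000, 150), 1), ((12_000, 180), 1), ((25_000, 150), 0)]
print(most_specific(X))  # (12000, 15000, 150, 180)
```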
10. Comparison of different hypotheses:
Comparison of different hypotheses:
Any h ∈ H between S and G is a valid hypothesis with no error, and is
said to be consistent with the training set; together they make up the
version space.
With a different training set, S, G, the version space, the parameters,
and thus the learned hypothesis h can all be different.
11. Margin:
Margin:
Actually, depending on X and H, there may be several hypotheses S_i and
G_j that make up the S-set and the G-set.
Margin:
The distance between the boundary and the instances closest to it.
The margin decides the degree of accuracy of the learned hypothesis h.
12. Probably Approximately Correct (PAC) Learning:
PAC learning:
In PAC learning:
Let C be the given class to be learned.
Unknown examples are drawn from a fixed probability distribution p(x).
We want to find the number of examples N such that, with probability
at least 1 − δ, the hypothesis h has error at most ε, for arbitrary
δ ≤ 1/2 and ε > 0:

$$P(C \,\triangle\, h \le \epsilon) \ge 1 - \delta,$$

where C Δ h is the region of difference between C and h.
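For the tightest-rectangle hypothesis, the cited Alpaydin text derives a bound on N by covering the error region C Δ h with four strips; a sketch of that standard argument:

```latex
% Sample-complexity sketch for axis-aligned rectangles
% (following the argument in the cited Alpaydin text).
\begin{align*}
  &\text{Cover } C \,\triangle\, h \text{ by four strips, each required to have} \\
  &\text{probability at most } \epsilon/4 \text{ under } p(\mathbf{x}). \\
  &P(\text{a given strip contains none of the } N \text{ examples})
     \le \left(1 - \tfrac{\epsilon}{4}\right)^{N} \le e^{-N\epsilon/4}. \\
  &\text{Union bound over the four strips: } 4\,e^{-N\epsilon/4} \le \delta
     \;\Longrightarrow\; N \ge \tfrac{4}{\epsilon} \ln \tfrac{4}{\delta}.
\end{align*}
```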
13. References
E. Alpaydin, Introduction to Machine Learning. MIT Press, 2020.
T. M. Mitchell, The Discipline of Machine Learning. Carnegie Mellon University,
School of Computer Science, Machine Learning Department, 2006, vol. 9.
J. Grus, Data Science from Scratch: First Principles with Python. O'Reilly Media,
2019.