INTRODUCTION TO MACHINE LEARNING
[Diagram: Data and Outputs are fed to a Computer, which produces a Program.]
Machine Learning History:
• 1950s:
– Samuel's checker-playing program
• 1960s:
– Neural network: Rosenblatt's perceptron
– Minsky & Papert prove limitations of Perceptron
• 1970s:
– Expert systems and knowledge acquisition
– Quinlan’s ID3
– Natural language processing (symbolic)
Machine Learning History:
• 1980s:
– Advanced decision tree and rule learning
– Learning combined with planning and problem solving
– Focus on experimental methodology
• 1990s: ML and statistics
– Data Mining
– Adaptive agents and web applications
– Text learning
– Reinforcement learning
– Bayes Net learning
• 1994: Self-driving car road test
• 1997: Deep Blue beats Garry Kasparov
Machine Learning History:
• The field's recent surge in popularity, and the reasons behind it:
– New software / algorithms
- Neural networks
- Deep learning
– New hardware
- GPUs
– Cloud-enabled computing
– Availability of Big Data
• 2009: Google builds a self-driving car
• 2011: IBM Watson wins Jeopardy!
• 2014: ML systems reported to surpass human performance on some vision benchmarks
Programs vs Learning Algorithms:
[Diagram: Algorithmic solution: Data and a Program are fed to a Computer, which produces the Output. Machine Learning solution: Data and the desired Outputs are fed to a Computer, which produces a Program.]
Machine Learning - Definition
Learning is the ability to improve one's behaviour based on
experience.
• Build computer systems that automatically improve with
experience.
• Machine Learning explores algorithms that can
– learn from data / build a model from data
– use the model for prediction, decision making, or solving other tasks.
Machine Learning - Definition:
A computer program is said to learn from experience E with respect to some class of tasks T and performance measure P, if its performance at tasks in T, as measured by P, improves with experience E. [Tom M. Mitchell]
Machine Learning - Problem Definition:
A checkers learning problem:
• Task T: playing checkers
• Performance measure P: percent of games won against
opponents
• Training experience E: playing practice games against itself.
A handwriting recognition learning problem:
• Task T: recognizing and classifying handwritten words
within images
• Performance measure P: percent of words correctly
classified.
• Training experience E: a database of handwritten words
with given classifications
A robot driving learning problem:
• Task T: driving on public four-lane highways using vision
sensors
• Performance measure P: average distance travelled before an error (as judged by a human overseer)
• Training experience E: a sequence of images and steering
commands recorded while observing a human driver
Spam e-mail detection:
• Task T: classify e-mails as Spam or Not Spam.
• Performance measure P: percent of e-mails correctly classified as “Spam” or “Not Spam”.
• Training experience E: a set of e-mails labelled “Spam” / “Not Spam”.
Components of a learning problem:
• Task: The behaviour or task being improved.
– For example: classification, acting in an environment
• Data: The experiences that are being used to improve
performance in the task.
• Measure of improvement:
– For example: increasing accuracy in prediction, acquiring new skills, or improved speed and efficiency
Black box learner
[Diagram: a black-box Learner takes Experiences/Data, a Problem/Task, and Background Knowledge/Bias as inputs and produces an Answer/Performance.]
In machine learning, these black box models are created directly from data by an algorithm,
meaning that humans, even those who design them, cannot understand how variables are being combined
to make predictions. Even if one has a list of the input variables, black box predictive models can be such
complicated functions of the variables that no human can understand how the variables are jointly related
to each other to reach a final prediction.
Learner
[Diagram: the Learner uses Experiences/Data and Background Knowledge to build Models; a Reasoner applies the Models to the Problem/Task to produce an Answer/Performance.]
Domains and applications
Medicine:
• Diagnose a disease
– Input: symptoms, lab measurements, test
results, DNA tests etc.,
– Output: one of a set of possible diseases, or none of the above
• Data: historical medical records
• Learn: which future patients will respond best to
which treatments
Domains and applications
Vision:
• say what objects appear in an image
• convert hand-written digits to characters 0..9
• detect where objects appear in an image
Robot control:
• Design autonomous mobile robots that learn from
experience to
– Play soccer
– Navigate from their own experience
Domains and applications
NLP:
• detect where entities are mentioned in NL
• detect what facts are expressed in NL
• detect if a product/movie review is positive, negative or
neutral
Speech recognition
Machine translation
Financial:
• predict if a stock will rise or fall
• predict if a user will click on an ad or not
Domains and applications
• Forecasting product sales quantities taking
seasonality and trend into account.
• Identifying cross-selling promotional opportunities for consumer goods.
• Fraud detection: credit card providers.
• Etc.
Design a Learner:
Choose the training experience
Choose the target function (that is to be learned)
Choose how to represent the target function
Choose a learning algorithm to infer the target function
Final Design
Types of Machine Learning:
Supervised Learning
Unsupervised Learning
Semi-supervised Learning
Reinforcement Learning
Broad types of learning
Supervised Learning
[Diagram: a training set of labelled pairs, Input 1 → Output 1, Input 2 → Output 2, …, Input n → Output n, is fed to a Learning Algorithm, which produces a Model; given a new input X, the Model predicts the output Y.]
• Supervised learning algorithms experience a dataset containing
features, but each example is also associated with a label or
target.
• The term supervised learning originates from the view of the
target y being provided by an instructor or teacher who shows the
machine learning system what to do.
• Supervised machine learning algorithms are designed to learn by
example.
• The objective of a supervised learning model is to predict the
correct label for newly presented input data.
• Say we want to learn the class, C, of a “family car.” We have a set of example cars, and we survey a group of people, showing them these cars. The people look at the cars and label them: the cars they believe are family cars are positive examples, and the other cars are negative examples.
• Suppose the features that separate a family car from other cars are its price and engine power.
[Figure: the example cars plotted in a 2-D feature space with axes x1 (price) and x2 (engine power), with positive and negative examples marked.]
[Figure: the class C of family cars drawn as an axis-aligned rectangle in the (x1: price, x2: engine power) plane, bounded by p1 and p2 on the price axis and e1 and e2 on the engine-power axis.]
After further discussions with the expert and analysis of the data, we may have reason to believe that for a car to be a family car, its price and engine power should each lie in a certain range:
(p1 ≤ price ≤ p2) AND (e1 ≤ engine power ≤ e2)
This equation fixes H, the hypothesis class from which we believe C is drawn: the set of axis-aligned rectangles. The learning algorithm then finds the particular hypothesis h ∈ H that approximates C as closely as possible.
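To make this concrete, here is a minimal Python sketch (not from the slides; the prices and engine powers are made up) that fits the most specific hypothesis, the tightest rectangle enclosing all positive examples, and uses it to label new cars:

```python
# Minimal sketch of learning an axis-aligned rectangle hypothesis
# h = (p1, p2, e1, e2) from labelled examples. Data are illustrative.

def fit_tightest_rectangle(examples):
    """Return the most specific hypothesis: the tightest rectangle
    enclosing all positive examples (price, engine_power, label)."""
    positives = [(x1, x2) for x1, x2, label in examples if label == "+"]
    prices = [x1 for x1, _ in positives]
    powers = [x2 for _, x2 in positives]
    return min(prices), max(prices), min(powers), max(powers)

def predict(h, x1, x2):
    """Label a new car '+' iff it falls inside the rectangle h."""
    p1, p2, e1, e2 = h
    return "+" if (p1 <= x1 <= p2) and (e1 <= x2 <= e2) else "-"

# Toy training set: (price in $1000s, engine power in hp, label)
train = [(18, 120, "+"), (22, 150, "+"), (25, 140, "+"),
         (9, 70, "-"), (60, 400, "-"), (20, 300, "-")]

h = fit_tightest_rectangle(train)   # h = (18, 25, 120, 150)
print(predict(h, 21, 130))          # '+': inside the learned rectangle
print(predict(h, 50, 350))          # '-': outside
```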
• Supervised learning can be split into two subcategories: Classification and Regression.
[Figures: classification: a scatter plot of weight vs height separating male (♂) and female (♀) examples; regression: grade (50–100) plotted against hours studied (1–9).]
Unsupervised Learning
[Diagram: unlabelled inputs Input 1, Input 2, …, Input n are fed to a Learning Algorithm, which groups them into Clusters.]
• Unsupervised learning uses machine learning algorithms to analyse and cluster unlabelled datasets.
• A cluster is a collection of objects that are “similar” to one another and “dissimilar” to the objects belonging to other clusters.
• These algorithms discover hidden patterns or data groupings without the need for human intervention.
• The ability to discover similarities and differences in information makes unsupervised learning well suited to exploratory data analysis, image recognition, and similar tasks.
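As an illustration, a minimal clustering sketch using scikit-learn's KMeans (assuming scikit-learn is available; the data points are made up):

```python
# Minimal clustering sketch: group unlabelled 2-D points with k-means.
import numpy as np
from sklearn.cluster import KMeans

# Unlabelled inputs: two loose blobs of 2-D points
X = np.array([[1.0, 1.1], [0.9, 1.3], [1.2, 0.8],
              [8.0, 8.2], [8.3, 7.9], [7.8, 8.1]])

kmeans = KMeans(n_clusters=2, n_init=10, random_state=0).fit(X)
print(kmeans.labels_)           # cluster index assigned to each input
print(kmeans.cluster_centers_)  # learned cluster centres
```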
Semi-supervised Learning
• In this type of learning, the algorithm is trained on a combination of labelled and unlabelled data.
• Typically, this combination contains a very small amount of labelled data and a very large amount of unlabelled data.
• A semi-supervised machine-learning algorithm uses the limited set of labelled sample data to train itself, resulting in a ‘partially trained’ model.
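One common semi-supervised technique (not necessarily the one the slides have in mind) is self-training: train on the small labelled set, pseudo-label the unlabelled points the model is confident about, and retrain on the enlarged set. A minimal sketch, assuming scikit-learn; the data, confidence threshold, and model choice are all illustrative:

```python
# Minimal self-training sketch for semi-supervised learning.
import numpy as np
from sklearn.linear_model import LogisticRegression

rng = np.random.default_rng(0)
X_lab = np.array([[0.0], [1.0], [9.0], [10.0]])   # small labelled set
y_lab = np.array([0, 0, 1, 1])
X_unl = rng.uniform(0, 10, size=(50, 1))          # large unlabelled set

model = LogisticRegression().fit(X_lab, y_lab)    # 'partially trained' model

# Pseudo-label the unlabelled points the model is confident about,
# add them to the training set, and retrain.
proba = model.predict_proba(X_unl).max(axis=1)
confident = proba > 0.9
X_aug = np.vstack([X_lab, X_unl[confident]])
y_aug = np.concatenate([y_lab, model.predict(X_unl[confident])])
model = LogisticRegression().fit(X_aug, y_aug)
```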
[Figure: semi-supervised learning illustration. Image source: towardsdatascience.com]
Reinforcement Learning
[Diagram: an Agent observes the state of an Environment, takes an action that changes the state, and receives a reward.]
The agent interacts with an environment. At any state of the environment, the agent takes an action that changes the state and returns a reward.
*Source: Introduction to ML - Ethem Alpaydin
• Reinforcement learning addresses the question of how an
autonomous agent that senses and acts in its environment can learn
to choose optimal actions to achieve its goals.
• The learner is a decision-making agent that takes actions in an
environment and receives reward (or penalty) for its actions in trying
to solve a problem.
• It is called “learning with a critic,” as opposed to learning with a
teacher which we have in supervised learning.
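As a concrete illustration of learning from rewards alone (a toy sketch, not from the slides), here is minimal Q-learning on a made-up 5-state corridor where a reward of 1 is given only at the goal state:

```python
# Minimal Q-learning sketch on a toy 5-state chain: move left/right,
# reward 1 only at the rightmost state. All details are illustrative.
import random

N_STATES, ACTIONS = 5, (-1, +1)        # states 0..4; actions: left, right
alpha, gamma, eps = 0.5, 0.9, 0.1      # learning rate, discount, exploration
Q = {(s, a): 0.0 for s in range(N_STATES) for a in ACTIONS}

for episode in range(200):
    s = 0
    while s != N_STATES - 1:           # episode ends at the goal state
        # epsilon-greedy: mostly exploit, sometimes explore
        a = random.choice(ACTIONS) if random.random() < eps \
            else max(ACTIONS, key=lambda act: Q[(s, act)])
        s_next = min(max(s + a, 0), N_STATES - 1)
        r = 1.0 if s_next == N_STATES - 1 else 0.0
        # Q-learning update: nudge Q toward reward + discounted best future value
        best_next = max(Q[(s_next, a2)] for a2 in ACTIONS)
        Q[(s, a)] += alpha * (r + gamma * best_next - Q[(s, a)])
        s = s_next

# The learned greedy policy should be 'move right' (+1) in every state.
print([max(ACTIONS, key=lambda act: Q[(s, act)]) for s in range(N_STATES - 1)])
```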
Classification
• Classification is the process of finding or discovering a model or function which helps separate the data into multiple categorical classes, i.e. discrete values.
• It is the task of approximating a mapping function (f) from input variables (X) to discrete output variables (Y).
• The output variables are often called labels or categories. The mapping function predicts the class or category for a given observation.
• Two common settings: binary classification and multi-class classification.
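A minimal sketch of learning such a mapping, assuming scikit-learn and a made-up credit-scoring dataset (income and savings as features, risk class as the label, anticipating the example below):

```python
# Minimal classification sketch: learn a mapping f: X -> Y with discrete Y.
from sklearn.tree import DecisionTreeClassifier

X = [[25, 2], [40, 1], [60, 30], [80, 45], [30, 5], [90, 60]]  # features
y = ["high-risk", "high-risk", "low-risk", "low-risk",
     "high-risk", "low-risk"]                                   # labels

clf = DecisionTreeClassifier().fit(X, y)   # approximate f from examples
print(clf.predict([[70, 35]]))             # predicted class for a new input
```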
Given:
– a set of input features X1, …, Xn
– a target feature Y
– a set of training examples where the values of the input features and the target feature are given for each example
– a new example, where only the values of the input features are given
• Predict the value of the target feature for the new example:
– classification when Y is discrete
– regression when Y is continuous
Classification
Example: Credit scoring
Differentiating between low-risk and high-risk customers from their income and savings, and predicting whether a new customer will pay their credit bill or not.
• Other examples: predicting cold/hot weather, student pass/fail, team win or lose, etc.
Regression
• Regression is the process of finding a model or function that maps the data to continuous real values instead of classes or discrete values.
• It is the task of approximating a mapping function (f) from input variables (X) to a continuous output variable (Y).
• A continuous output variable is a real value, such as an integer or floating-point value. These are often quantities, such as amounts and sizes.
Regression
Example: Price of a used car
x: car attributes (e.g., mileage)
y: price
y = g(x, θ), where g(·) is the model and θ are its parameters. A linear model is y = w x + w0.
[Figure: used-car price (y) plotted against mileage (x) with a fitted line.]
• Other examples: predicting rainfall from historical data, predicting a winning percentage, predicting tomorrow's temperature.
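A minimal sketch of fitting the linear model y = w x + w0 by least squares (assuming NumPy; the mileage/price numbers are made up):

```python
# Minimal regression sketch: fit y = w*x + w0 by least squares.
import numpy as np

x = np.array([20, 45, 60, 80, 110], dtype=float)   # mileage (1000s of km)
y = np.array([18, 14, 12, 9, 6], dtype=float)      # price ($1000s)

w, w0 = np.polyfit(x, y, deg=1)    # least-squares line: slope and intercept
print(f"y = {w:.3f}*x + {w0:.3f}")
print(np.polyval([w, w0], 70.0))   # predicted price at 70,000 km
```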
Hypothesis Space
• A hypothesis (h) is a function that best describes the target in supervised machine learning (h: a function that approximates f).
• The hypothesis space (H) is the set of all possible legal hypotheses (H: the set of functions we allow for approximating f).
• The hypothesis space used by a machine learning system is the set of all hypotheses that might possibly be returned by it.
• This is the set from which the learning algorithm determines the single hypothesis that best describes the target function or the outputs.
• Each setting of the parameters in the machine is a different
hypothesis about the function that maps input vectors to output
vectors.
[Figure: a 2-D input space (axes 0.0–6.0 and 0.0–3.0) containing a single labelled training example ⟨0.5, 2.5, +⟩.]
Hypothesis: a function for labelling examples.
[Figure: the same input space with several unlabelled points, marked “?”, that a hypothesis must label.]
Hypothesis space: the set of legal hypotheses.
[Figure: the same input space again; many different legal hypotheses could label the unknown points.]
Inductive bias / Learning bias
• Learning is an ill-posed problem: the data by itself is not sufficient to determine a unique solution.
• We must make some extra assumptions to obtain a unique solution from the data we have.
• The set of assumptions a learning algorithm makes in order to make learning possible is called the inductive bias of the learning algorithm.
• Learning is thus not possible without inductive bias, and the question becomes how to choose the right bias.
• This is called model selection: choosing between possible hypothesis classes H.
Bias & Variance
• There is no such thing as a perfect model, so the models we build and train will have errors.
• There will be differences between the predictions and the actual values; a model's performance is inversely related to the size of these differences.
• The smaller the differences, the better the model. Our goal is to minimize the error.
• The reducible part of the error has two components: bias and variance.
• The performance of a model depends on the balance between bias and variance.
• Bias occurs when we try to approximate a complex or complicated relationship with a much simpler model.
• By using a simple model we restrict the performance: the true relationship between the features and the target cannot be captured. Models with high bias miss the important relations.
• Thus, accuracy on both the training and test sets will be very low. This situation is known as underfitting.
• Models with high bias tend to underfit.
[Figure: an overly simple fit through the data; high bias, underfitting.]
• Variance occurs when the model is highly sensitive to changes in the independent variables (features).
• The model tries to pick up every detail of the relationship between features and target; it even learns the noise in the data.
• A very small change in a feature might change the prediction of the model.
• Thus, we end up with a model that captures each and every detail of the training set, so the accuracy on the training set will be very high.
• However, accuracy on new, previously unseen samples will not be good, because there will always be variation in the features.
• This situation is known as overfitting.
[Figure: an overly complex fit that chases every data point; high variance, overfitting.]
[Figure: illustration contrasting learning algorithms exhibiting low vs high bias and low vs high variance.]
Underfitting & Overfitting
• Underfitting: the model is too simple to represent all the relevant class characteristics
– high bias and low variance
– high training error and high test error
• Overfitting: the model is too complex and fits irrelevant characteristics (noise) in the data
– low bias and high variance
– low training error and high test error
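These regimes can be seen in a small experiment (illustrative, assuming NumPy): fit polynomials of increasing degree to noisy samples of a known function and compare training and test error:

```python
# Minimal under/overfitting sketch: fit polynomials of different degrees
# to noisy data and compare train vs test error. Data are made up.
import numpy as np

rng = np.random.default_rng(0)
x_train = np.linspace(0, 1, 15)
x_test = np.linspace(0, 1, 100)
f = lambda x: np.sin(2 * np.pi * x)                       # true function
y_train = f(x_train) + rng.normal(0, 0.2, x_train.size)  # noisy labels
y_test = f(x_test)

for degree in (1, 3, 9):
    coeffs = np.polyfit(x_train, y_train, degree)
    train_err = np.mean((np.polyval(coeffs, x_train) - y_train) ** 2)
    test_err = np.mean((np.polyval(coeffs, x_test) - y_test) ** 2)
    # Typically: degree 1 underfits (both errors high), degree 3 fits well,
    # degree 9 overfits (lowest train error, higher test error).
    print(f"degree {degree}: train MSE {train_err:.3f}, test MSE {test_err:.3f}")
```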
Evaluation
• Evaluating the performance of learning systems is important because learning systems are usually designed to predict the class of future, unlabelled data points.
• Typical choices for performance evaluation:
– Error
– Accuracy
– Precision
– Recall
Confusion Matrix
• A confusion matrix is an N × N matrix, where N is the number of classes or categories to be predicted.
For two classes:

              Predicted +   Predicted -
Actual +      TP            FN            P = TP + FN
Actual -      FP            TN            N = FP + TN

Accuracy = (TP + TN) / (P + N)
Precision = TP / (TP + FP)
Recall / Sensitivity = TP / P
Specificity = TN / N
False Alarm Rate = FP / N
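A minimal sketch computing these metrics from binary confusion-matrix counts (the counts are made up for illustration):

```python
# Compute the metrics above from binary confusion-matrix counts.
def metrics(tp, fn, fp, tn):
    p, n = tp + fn, fp + tn            # actual positives / negatives
    return {
        "accuracy": (tp + tn) / (p + n),
        "precision": tp / (tp + fp),
        "recall": tp / p,              # a.k.a. sensitivity, true positive rate
        "specificity": tn / n,
        "false_alarm_rate": fp / n,
    }

print(metrics(tp=40, fn=10, fp=5, tn=45))
# {'accuracy': 0.85, 'precision': 0.888..., 'recall': 0.8,
#  'specificity': 0.9, 'false_alarm_rate': 0.1}
```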
• Precision: the percentage of positive instances out of the total predicted positive instances (of all the instances we predicted as positive, how many are actually positive).
• Recall / Sensitivity / True Positive Rate: the percentage of positive instances out of the total actual positive instances (of all the actually positive instances, how many did we predict correctly).
• Specificity: the percentage of negative instances out of the total actual negative instances.
Suppose we want to classify 10 new photos. We could use our classifier to categorize them: each photo receives a prediction with a label (0 or 1) representing the two classes (dog or not a dog).
Now suppose we want to train a model that predicts whether a photo contains a dog, a cat, or a rabbit; the number of classes is 3. Imagine we pass 27 photos to be classified (predicted) and obtain the resulting confusion matrix.
[Figure: 3 × 3 confusion matrix for the dog/cat/rabbit classifier.]
Cross Validation
• We divide the available data: one part is used for training (i.e., to fit a hypothesis), and the remaining part, called the validation set, is used to test generalization ability.
• That is, given a set of possible hypothesis classes Hi, for each one we fit the best hi ∈ Hi on the training set.
• Then, assuming large enough training and validation sets, the hypothesis that is most accurate on the validation set is the best one (the one with the best inductive bias).
• This process is called cross-validation.
• Once we have used the validation set to choose the best model, it has effectively become part of the training set.
• So to report performance we need a third set, a test set, sometimes also called the publication set, containing examples not used in training or validation.
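A minimal sketch of such a three-way split (the 60/20/20 ratio and data are illustrative, assuming NumPy):

```python
# Minimal train/validation/test split sketch.
import numpy as np

rng = np.random.default_rng(0)
X = rng.normal(size=(100, 4))          # 100 examples, 4 features
y = rng.integers(0, 2, size=100)       # binary labels

idx = rng.permutation(len(X))          # shuffle before splitting
train, val, test = idx[:60], idx[60:80], idx[80:]

# Fit candidate models on X[train]; pick the one most accurate on X[val];
# report final performance once, on the untouched X[test].
X_train, y_train = X[train], y[train]
X_val, y_val = X[val], y[val]
X_test, y_test = X[test], y[test]
```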
K-fold Cross Validation
• Split the data into k equal subsets ("folds").
• Perform k rounds of learning; on each round
– 1/k of the data is held out as a test set, and
– the remaining examples are used as training data.
• Compute the average test-set score over the k rounds.
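A minimal k-fold sketch with k = 5 (assuming NumPy and scikit-learn; the synthetic data and model choice are illustrative):

```python
# Minimal k-fold cross-validation sketch (k = 5).
import numpy as np
from sklearn.linear_model import LogisticRegression

rng = np.random.default_rng(0)
X = rng.normal(size=(100, 4))
y = (X[:, 0] + X[:, 1] > 0).astype(int)   # labels depend on two features

k = 5
folds = np.array_split(rng.permutation(len(X)), k)   # k disjoint index sets

scores = []
for i in range(k):
    test_idx = folds[i]                               # hold out fold i
    train_idx = np.concatenate(folds[:i] + folds[i+1:])
    model = LogisticRegression().fit(X[train_idx], y[train_idx])
    scores.append(model.score(X[test_idx], y[test_idx]))  # fold accuracy

print(f"mean accuracy over {k} folds: {np.mean(scores):.3f}")
```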
Trade-off
In machine learning there is always a trade-off between
– complex hypotheses that fit the training data well, and
– simpler hypotheses that may generalise better.
• As the amount of training data increases, the generalization error decreases.
References
• Machine Learning - Tom Mitchell
• Introduction to Machine Learning - Ethem Alpaydin
• https://www.geeksforgeeks.org/design-a-learning-system-in-machine-learning/
• https://towardsdatascience.com/bias-and-variance-in-machine-learning-b8019a5a15bc