INTRODUCTION TO MACHINE LEARNING
[Diagram: Data and Outputs are fed to a Computer, which produces a Program.]
Machine Learning History:
• 1950s:
– Samuel's checker-playing program
• 1960s:
– Neural network: Rosenblatt's perceptron
– Minsky & Papert prove limitations of Perceptron
• 1970s:
– Expert systems and knowledge acquisition
– Quinlan’s ID3
– Natural language processing (symbolic)
Machine Learning History:
• 1980s:
– Advanced decision tree and rule learning
– Learning combined with planning and problem solving
– Focus on experimental methodology
• 1990s: ML and statistics
– Data Mining
– Adaptive agents and web applications
– Text learning
– Reinforcement learning
– Bayes Net learning
• 1994: Self-driving car road test
• 1997: Deep Blue beats Garry Kasparov
Machine Learning History:
• The field's recent surge in popularity, and the reasons behind it:
– New software / algorithms
- Neural networks
- Deep learning
– New hardware
- GPUs
– Cloud-enabled computing
– Availability of Big Data
• 2009: Google builds a self-driving car
• 2011: IBM Watson wins Jeopardy!
• 2014: ML systems reported to surpass human performance on some vision benchmarks
Programs vs Learning Algorithms:
[Diagram: Algorithmic solution: Data and a Program are fed to a Computer, which produces the Output. Machine Learning solution: Data and the desired Outputs are fed to a Computer, which produces a Program.]
Machine Learning - Definition
Learning is the ability to improve one's behaviour based on
experience.
• Build computer systems that automatically improve with
experience.
• Machine Learning explores algorithms that can
– learn from data / build a model from data
– use the model for prediction, decision making, or solving other tasks.
Machine Learning - Definition:
A computer program is said to learn from experience E with respect to some class of tasks T and performance measure P, if its performance at tasks in T, as measured by P, improves with experience E. [Tom M. Mitchell]
Machine Learning - Problem Definition:
A checkers learning problem:
• Task T: playing checkers
• Performance measure P: percent of games won against
opponents
• Training experience E: playing practice games against itself.
A handwriting recognition learning problem:
• Task T: recognizing and classifying handwritten words
within images
• Performance measure P: percent of words correctly
classified.
• Training experience E: a database of handwritten words
with given classifications
A robot driving learning problem:
• Task T: driving on public four-lane highways using vision
sensors
• Performance measure P: average distance travelled before an error (as judged by a human overseer)
• Training experience E: a sequence of images and steering
commands recorded while observing a human driver
Spam e-mail detection:
• Task T: classify e-mails as Spam or Not Spam.
• Performance measure P: percent of e-mails correctly classified as “Spam” or “Not Spam”.
• Training experience E: a set of e-mails labelled “Spam” / “Not Spam”.
Components of a learning problem:
• Task: The behaviour or task being improved.
– For example: classification, acting in an environment
• Data: The experiences that are being used to improve
performance in the task.
• Measure of improvement:
– For example: increasing accuracy in prediction, acquiring new skills, or improved speed and efficiency
Black box learner
[Diagram: a black-box Learner takes Experiences/Data, a Problem/Task, and Background Knowledge/Bias as inputs and produces an Answer/Performance.]
In machine learning, these black box models are created directly from data by an algorithm,
meaning that humans, even those who design them, cannot understand how variables are being combined
to make predictions. Even if one has a list of the input variables, black box predictive models can be such
complicated functions of the variables that no human can understand how the variables are jointly related
to each other to reach a final prediction.
Learner
[Diagram: the Learner uses Experiences/Data and Background Knowledge to build Models; a Reasoner applies the Models to the Problem/Task to produce an Answer/Performance.]
Domains and applications
Medicine:
• Diagnose a disease
– Input: symptoms, lab measurements, test
results, DNA tests etc.,
– Output: one of a set of possible diseases, or none of the above
• Data: historical medical records
• Learn: which future patients will respond best to
which treatments
Domains and applications
Vision:
• say what objects appear in an image
• convert hand-written digits to characters 0..9
• detect where objects appear in an image
Robot control:
• Design autonomous mobile robots that learn from
experience to
– Play soccer
– Navigate from their own experience
Domains and applications
NLP:
• detect where entities are mentioned in NL
• detect what facts are expressed in NL
• detect if a product/movie review is positive, negative or
neutral
Speech recognition
Machine translation
Financial:
• predict if a stock will rise or fall
• predict if a user will click on an ad or not
Domains and applications
• Forecasting product sales quantities taking
seasonality and trend into account.
• Identifying cross-selling promotional opportunities for consumer goods.
• Fraud detection: credit card providers.
• Etc.
Design a Learner:
Choose the training experience
Choose the target function (that is to be learned)
Choose how to represent the target function
Choose a learning algorithm to infer the target function
Final Design
Types of Machine Learning:
Supervised Learning
Unsupervised Learning
Semi-supervised Learning
Reinforcement Learning
Broad types of learning
Supervised Learning
[Diagram: a training set of labelled pairs, Input 1 → Output 1, Input 2 → Output 2, …, Input n → Output n, is fed to a Learning Algorithm, which produces a Model; given a new input X, the Model predicts the output Y.]
• Supervised learning algorithms experience a dataset containing
features, but each example is also associated with a label or
target.
• The term supervised learning originates from the view of the
target y being provided by an instructor or teacher who shows the
machine learning system what to do.
• Supervised machine learning algorithms are designed to learn by
example.
• The objective of a supervised learning model is to predict the
correct label for newly presented input data.
• Say we want to learn the class, C, of a “family car.” We have a set of example cars, and we survey a group of people, showing them these cars. The people look at the cars and label them: the cars they believe are family cars are positive examples, and the other cars are negative examples.
• Suppose the features that separate a family car from other cars are its price and engine power.
[Figure: the example cars plotted in a 2-D feature space with axes x1 (price) and x2 (engine power), with positive and negative examples marked.]
[Figure: the class C of family cars drawn as an axis-aligned rectangle in the (x1: price, x2: engine power) plane, bounded by p1 and p2 on the price axis and e1 and e2 on the engine-power axis.]
After further discussions with the expert and analysis of the data, we may have reason to believe that for a car to be a family car, its price and engine power should each lie in a certain range:
(p1 ≤ price ≤ p2) AND (e1 ≤ engine power ≤ e2)
This equation fixes H, the hypothesis class from which we believe C is drawn: the set of axis-aligned rectangles. The learning algorithm then finds the particular hypothesis h ∈ H that approximates C as closely as possible.
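To make this concrete, here is a minimal Python sketch (not from the slides; the prices and engine powers are made up) that fits the most specific hypothesis, the tightest rectangle enclosing all positive examples, and uses it to label new cars:

```python
# Minimal sketch of learning an axis-aligned rectangle hypothesis
# h = (p1, p2, e1, e2) from labelled examples. Data are illustrative.

def fit_tightest_rectangle(examples):
    """Return the most specific hypothesis: the tightest rectangle
    enclosing all positive examples (price, engine_power, label)."""
    positives = [(x1, x2) for x1, x2, label in examples if label == "+"]
    prices = [x1 for x1, _ in positives]
    powers = [x2 for _, x2 in positives]
    return min(prices), max(prices), min(powers), max(powers)

def predict(h, x1, x2):
    """Label a new car '+' iff it falls inside the rectangle h."""
    p1, p2, e1, e2 = h
    return "+" if (p1 <= x1 <= p2) and (e1 <= x2 <= e2) else "-"

# Toy training set: (price in $1000s, engine power in hp, label)
train = [(18, 120, "+"), (22, 150, "+"), (25, 140, "+"),
         (9, 70, "-"), (60, 400, "-"), (20, 300, "-")]

h = fit_tightest_rectangle(train)   # h = (18, 25, 120, 150)
print(predict(h, 21, 130))          # '+': inside the learned rectangle
print(predict(h, 50, 350))          # '-': outside
```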
• Supervised learning can be split into two subcategories: Classification and Regression.
[Figures: classification: a scatter plot of weight vs height separating male (♂) and female (♀) examples; regression: grade (50–100) plotted against hours studied (1–9).]
Unsupervised Learning
[Diagram: unlabelled inputs Input 1, Input 2, …, Input n are fed to a Learning Algorithm, which groups them into Clusters.]
• Unsupervised learning uses machine learning algorithms to analyse and cluster unlabelled datasets.
• A cluster is a collection of objects that are “similar” to one another and “dissimilar” to the objects belonging to other clusters.
• These algorithms discover hidden patterns or data groupings without the need for human intervention.
• The ability to discover similarities and differences in information makes unsupervised learning well suited to exploratory data analysis, image recognition, and similar tasks.
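As an illustration, a minimal clustering sketch using scikit-learn's KMeans (assuming scikit-learn is available; the data points are made up):

```python
# Minimal clustering sketch: group unlabelled 2-D points with k-means.
import numpy as np
from sklearn.cluster import KMeans

# Unlabelled inputs: two loose blobs of 2-D points
X = np.array([[1.0, 1.1], [0.9, 1.3], [1.2, 0.8],
              [8.0, 8.2], [8.3, 7.9], [7.8, 8.1]])

kmeans = KMeans(n_clusters=2, n_init=10, random_state=0).fit(X)
print(kmeans.labels_)           # cluster index assigned to each input
print(kmeans.cluster_centers_)  # learned cluster centres
```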
Semi-supervised Learning
• In this type of learning, the algorithm is trained on a combination of labelled and unlabelled data.
• Typically, this combination contains a very small amount of labelled data and a very large amount of unlabelled data.
• A semi-supervised machine-learning algorithm uses the limited set of labelled sample data to train itself, resulting in a ‘partially trained’ model.
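One common semi-supervised technique (not necessarily the one the slides have in mind) is self-training: train on the small labelled set, pseudo-label the unlabelled points the model is confident about, and retrain on the enlarged set. A minimal sketch, assuming scikit-learn; the data, confidence threshold, and model choice are all illustrative:

```python
# Minimal self-training sketch for semi-supervised learning.
import numpy as np
from sklearn.linear_model import LogisticRegression

rng = np.random.default_rng(0)
X_lab = np.array([[0.0], [1.0], [9.0], [10.0]])   # small labelled set
y_lab = np.array([0, 0, 1, 1])
X_unl = rng.uniform(0, 10, size=(50, 1))          # large unlabelled set

model = LogisticRegression().fit(X_lab, y_lab)    # 'partially trained' model

# Pseudo-label the unlabelled points the model is confident about,
# add them to the training set, and retrain.
proba = model.predict_proba(X_unl).max(axis=1)
confident = proba > 0.9
X_aug = np.vstack([X_lab, X_unl[confident]])
y_aug = np.concatenate([y_lab, model.predict(X_unl[confident])])
model = LogisticRegression().fit(X_aug, y_aug)
```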
[Figure: semi-supervised learning illustration. Image source: towardsdatascience.com]
Reinforcement Learning
[Diagram: an Agent observes the state of an Environment, takes an action that changes the state, and receives a reward.]
The agent interacts with an environment. At any state of the environment, the agent takes an action that changes the state and returns a reward.
*Source: Introduction to ML - Ethem Alpaydin
• Reinforcement learning addresses the question of how an
autonomous agent that senses and acts in its environment can learn
to choose optimal actions to achieve its goals.
• The learner is a decision-making agent that takes actions in an
environment and receives reward (or penalty) for its actions in trying
to solve a problem.
• It is called “learning with a critic,” as opposed to learning with a
teacher which we have in supervised learning.
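As a concrete illustration of learning from rewards alone (a toy sketch, not from the slides), here is minimal Q-learning on a made-up 5-state corridor where a reward of 1 is given only at the goal state:

```python
# Minimal Q-learning sketch on a toy 5-state chain: move left/right,
# reward 1 only at the rightmost state. All details are illustrative.
import random

N_STATES, ACTIONS = 5, (-1, +1)        # states 0..4; actions: left, right
alpha, gamma, eps = 0.5, 0.9, 0.1      # learning rate, discount, exploration
Q = {(s, a): 0.0 for s in range(N_STATES) for a in ACTIONS}

for episode in range(200):
    s = 0
    while s != N_STATES - 1:           # episode ends at the goal state
        # epsilon-greedy: mostly exploit, sometimes explore
        a = random.choice(ACTIONS) if random.random() < eps \
            else max(ACTIONS, key=lambda act: Q[(s, act)])
        s_next = min(max(s + a, 0), N_STATES - 1)
        r = 1.0 if s_next == N_STATES - 1 else 0.0
        # Q-learning update: nudge Q toward reward + discounted best future value
        best_next = max(Q[(s_next, a2)] for a2 in ACTIONS)
        Q[(s, a)] += alpha * (r + gamma * best_next - Q[(s, a)])
        s = s_next

# The learned greedy policy should be 'move right' (+1) in every state.
print([max(ACTIONS, key=lambda act: Q[(s, act)]) for s in range(N_STATES - 1)])
```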
Classification
• Classification is the process of finding or discovering a model or function which helps separate the data into multiple categorical classes, i.e. discrete values.
• It is the task of approximating a mapping function (f) from input variables (X) to discrete output variables (Y).
• The output variables are often called labels or categories. The mapping function predicts the class or category for a given observation.
• Two common settings: binary classification and multi-class classification.
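A minimal sketch of learning such a mapping, assuming scikit-learn and a made-up credit-scoring dataset (income and savings as features, risk class as the label, anticipating the example below):

```python
# Minimal classification sketch: learn a mapping f: X -> Y with discrete Y.
from sklearn.tree import DecisionTreeClassifier

X = [[25, 2], [40, 1], [60, 30], [80, 45], [30, 5], [90, 60]]  # features
y = ["high-risk", "high-risk", "low-risk", "low-risk",
     "high-risk", "low-risk"]                                   # labels

clf = DecisionTreeClassifier().fit(X, y)   # approximate f from examples
print(clf.predict([[70, 35]]))             # predicted class for a new input
```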
Given:
– a set of input features X1, …, Xn
– a target feature Y
– a set of training examples where the values of the input features and the target feature are given for each example
– a new example, where only the values of the input features are given
• Predict the value of the target feature for the new example:
– classification when Y is discrete
– regression when Y is continuous
Classification
Example: Credit scoring
Differentiating between low-risk and high-risk customers from their income and savings, and predicting whether a new customer will pay their credit bill or not.
• Other examples: predicting cold/hot weather, student pass/fail, team win or lose, etc.
Regression
• Regression is the process of finding a model or function that maps the data to continuous real values instead of classes or discrete values.
• It is the task of approximating a mapping function (f) from input variables (X) to a continuous output variable (Y).
• A continuous output variable is a real value, such as an integer or floating-point value. These are often quantities, such as amounts and sizes.
Regression
Example: Price of a used car
x: car attributes (e.g., mileage)
y: price
y = g(x, θ), where g(·) is the model and θ are its parameters. A linear model is y = w x + w0.
[Figure: used-car price (y) plotted against mileage (x) with a fitted line.]
• Other examples: predicting rainfall from historical data, predicting a winning percentage, predicting tomorrow's temperature.
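A minimal sketch of fitting the linear model y = w x + w0 by least squares (assuming NumPy; the mileage/price numbers are made up):

```python
# Minimal regression sketch: fit y = w*x + w0 by least squares.
import numpy as np

x = np.array([20, 45, 60, 80, 110], dtype=float)   # mileage (1000s of km)
y = np.array([18, 14, 12, 9, 6], dtype=float)      # price ($1000s)

w, w0 = np.polyfit(x, y, deg=1)    # least-squares line: slope and intercept
print(f"y = {w:.3f}*x + {w0:.3f}")
print(np.polyval([w, w0], 70.0))   # predicted price at 70,000 km
```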
Hypothesis Space
• A hypothesis (h) is a function that best describes the target in supervised machine learning (h: a function that approximates f).
• The hypothesis space (H) is the set of all possible legal hypotheses (H: the set of functions we allow for approximating f).
• The hypothesis space used by a machine learning system is the set of all hypotheses that might possibly be returned by it.
• This is the set from which the learning algorithm determines the single hypothesis that best describes the target function or the outputs.
• Each setting of the parameters in the machine is a different
hypothesis about the function that maps input vectors to output
vectors.
[Figure: a 2-D input space (axes 0.0–6.0 and 0.0–3.0) containing a single labelled training example ⟨0.5, 2.5, +⟩.]
Hypothesis: a function for labelling examples.
[Figure: the same input space with several unlabelled points, marked “?”, that a hypothesis must label.]
Hypothesis space: the set of legal hypotheses.
[Figure: the same input space again; many different legal hypotheses could label the unknown points.]
Inductive bias / Learning bias
• Learning is an ill-posed problem: the data by itself is not sufficient to determine a unique solution.
• We must make some extra assumptions to obtain a unique solution from the data we have.
• The set of assumptions a learning algorithm makes in order to make learning possible is called the inductive bias of the learning algorithm.
• Learning is thus not possible without inductive bias, and the question becomes how to choose the right bias.
• This is called model selection: choosing between possible hypothesis classes H.
Bias & Variance
• There is no such thing as a perfect model, so the models we build and train will have errors.
• There will be differences between the predictions and the actual values; a model's performance is inversely related to the size of these differences.
• The smaller the differences, the better the model. Our goal is to minimize the error.
• The reducible part of the error has two components: bias and variance.
• The performance of a model depends on the balance between bias and variance.
• Bias occurs when we try to approximate a complex or complicated relationship with a much simpler model.
• By using a simple model we restrict the performance: the true relationship between the features and the target cannot be captured. Models with high bias miss the important relations.
• Thus, accuracy on both the training and test sets will be very low. This situation is known as underfitting.
• Models with high bias tend to underfit.
[Figure: an overly simple fit through the data; high bias, underfitting.]
• Variance occurs when the model is highly sensitive to changes in the independent variables (features).
• The model tries to pick up every detail of the relationship between features and target; it even learns the noise in the data.
• A very small change in a feature might change the prediction of the model.
• Thus, we end up with a model that captures each and every detail of the training set, so the accuracy on the training set will be very high.
• However, accuracy on new, previously unseen samples will not be good, because there will always be variation in the features.
• This situation is known as overfitting.
[Figure: an overly complex fit that chases every data point; high variance, overfitting.]
[Figure: illustration contrasting learning algorithms exhibiting low vs high bias and low vs high variance.]
Underfitting & Overfitting
• Underfitting: the model is too simple to represent all the relevant class characteristics
– high bias and low variance
– high training error and high test error
• Overfitting: the model is too complex and fits irrelevant characteristics (noise) in the data
– low bias and high variance
– low training error and high test error
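These regimes can be seen in a small experiment (illustrative, assuming NumPy): fit polynomials of increasing degree to noisy samples of a known function and compare training and test error:

```python
# Minimal under/overfitting sketch: fit polynomials of different degrees
# to noisy data and compare train vs test error. Data are made up.
import numpy as np

rng = np.random.default_rng(0)
x_train = np.linspace(0, 1, 15)
x_test = np.linspace(0, 1, 100)
f = lambda x: np.sin(2 * np.pi * x)                       # true function
y_train = f(x_train) + rng.normal(0, 0.2, x_train.size)  # noisy labels
y_test = f(x_test)

for degree in (1, 3, 9):
    coeffs = np.polyfit(x_train, y_train, degree)
    train_err = np.mean((np.polyval(coeffs, x_train) - y_train) ** 2)
    test_err = np.mean((np.polyval(coeffs, x_test) - y_test) ** 2)
    # Typically: degree 1 underfits (both errors high), degree 3 fits well,
    # degree 9 overfits (lowest train error, higher test error).
    print(f"degree {degree}: train MSE {train_err:.3f}, test MSE {test_err:.3f}")
```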
Evaluation
• Evaluating the performance of learning systems is important because learning systems are usually designed to predict the class of future, unlabelled data points.
• Typical choices for performance evaluation:
– Error
– Accuracy
– Precision
– Recall
Confusion Matrix
• A confusion matrix is an N × N matrix, where N is the number of classes or categories to be predicted.
For two classes:

              Predicted +   Predicted -
Actual +      TP            FN            P = TP + FN
Actual -      FP            TN            N = FP + TN

Accuracy = (TP + TN) / (P + N)
Precision = TP / (TP + FP)
Recall / Sensitivity = TP / P
Specificity = TN / N
False Alarm Rate = FP / N
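A minimal sketch computing these metrics from binary confusion-matrix counts (the counts are made up for illustration):

```python
# Compute the metrics above from binary confusion-matrix counts.
def metrics(tp, fn, fp, tn):
    p, n = tp + fn, fp + tn            # actual positives / negatives
    return {
        "accuracy": (tp + tn) / (p + n),
        "precision": tp / (tp + fp),
        "recall": tp / p,              # a.k.a. sensitivity, true positive rate
        "specificity": tn / n,
        "false_alarm_rate": fp / n,
    }

print(metrics(tp=40, fn=10, fp=5, tn=45))
# {'accuracy': 0.85, 'precision': 0.888..., 'recall': 0.8,
#  'specificity': 0.9, 'false_alarm_rate': 0.1}
```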
• Precision: the percentage of positive instances out of the total predicted positive instances (of all the instances we predicted as positive, how many are actually positive).
• Recall / Sensitivity / True Positive Rate: the percentage of positive instances out of the total actual positive instances (of all the actually positive instances, how many did we predict correctly).
• Specificity: the percentage of negative instances out of the total actual negative instances.
Suppose we want to classify 10 new photos. We could use our classifier to categorize them: each photo receives a prediction with a label (0 or 1) representing the two classes (dog or not a dog).
Now suppose we want to train a model that predicts whether a photo contains a dog, a cat, or a rabbit; the number of classes is 3. Imagine we pass 27 photos to be classified (predicted) and obtain the resulting confusion matrix.
[Figure: 3 × 3 confusion matrix for the dog/cat/rabbit classifier.]
Cross Validation
• We divide the available data: one part is used for training (i.e., to fit a hypothesis), and the remaining part, called the validation set, is used to test generalization ability.
• That is, given a set of possible hypothesis classes Hi, for each one we fit the best hi ∈ Hi on the training set.
• Then, assuming large enough training and validation sets, the hypothesis that is most accurate on the validation set is the best one (the one with the best inductive bias).
• This process is called cross-validation.
• Once we have used the validation set to choose the best model, it has effectively become part of the training set.
• So to report performance we need a third set, a test set, sometimes also called the publication set, containing examples not used in training or validation.
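A minimal sketch of such a three-way split (the 60/20/20 ratio and data are illustrative, assuming NumPy):

```python
# Minimal train/validation/test split sketch.
import numpy as np

rng = np.random.default_rng(0)
X = rng.normal(size=(100, 4))          # 100 examples, 4 features
y = rng.integers(0, 2, size=100)       # binary labels

idx = rng.permutation(len(X))          # shuffle before splitting
train, val, test = idx[:60], idx[60:80], idx[80:]

# Fit candidate models on X[train]; pick the one most accurate on X[val];
# report final performance once, on the untouched X[test].
X_train, y_train = X[train], y[train]
X_val, y_val = X[val], y[val]
X_test, y_test = X[test], y[test]
```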
K-fold Cross Validation
• Split the data into k equal subsets ("folds").
• Perform k rounds of learning; on each round
– 1/k of the data is held out as a test set, and
– the remaining examples are used as training data.
• Compute the average test-set score over the k rounds.
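A minimal k-fold sketch with k = 5 (assuming NumPy and scikit-learn; the synthetic data and model choice are illustrative):

```python
# Minimal k-fold cross-validation sketch (k = 5).
import numpy as np
from sklearn.linear_model import LogisticRegression

rng = np.random.default_rng(0)
X = rng.normal(size=(100, 4))
y = (X[:, 0] + X[:, 1] > 0).astype(int)   # labels depend on two features

k = 5
folds = np.array_split(rng.permutation(len(X)), k)   # k disjoint index sets

scores = []
for i in range(k):
    test_idx = folds[i]                               # hold out fold i
    train_idx = np.concatenate(folds[:i] + folds[i+1:])
    model = LogisticRegression().fit(X[train_idx], y[train_idx])
    scores.append(model.score(X[test_idx], y[test_idx]))  # fold accuracy

print(f"mean accuracy over {k} folds: {np.mean(scores):.3f}")
```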
Trade-off
In machine learning there is always a trade-off between
– complex hypotheses that fit the training data well, and
– simpler hypotheses that may generalise better.
• As the amount of training data increases, the generalization error decreases.
References
• Machine Learning - Tom Mitchell
• Introduction to Machine Learning - Ethem Alpaydin
• https://www.geeksforgeeks.org/design-a-learning-system-in-machine-learning/
• https://towardsdatascience.com/bias-and-variance-in-machine-learning-b8019a5a15bc