Supervised Learning
• Supervised learning is the most common sub-branch of machine
learning today.
• Typically, new machine learning practitioners will begin their journey
with supervised learning algorithms. Therefore, the first of this three
post series will be about supervised learning.
• Supervised machine learning algorithms are designed to learn by
example.
• The name “supervised” learning originates from the idea that training
this type of algorithm is like having a teacher supervise the whole
process.
Supervised Learning
• When training a supervised learning algorithm, the training data will
consist of inputs paired with the correct outputs.
• During training, the algorithm will search for patterns in the data that
correlate with the desired outputs.
• After training, a supervised learning algorithm will take in new unseen
inputs and will determine which label the new inputs will be classified
as based on prior training data.
• The objective of a supervised learning model is to predict the correct
label for newly presented input data.
Supervised Learning
• In its most basic form, a supervised learning algorithm can be written
simply as:
Y = f(x)
• Here Y is the predicted output, determined by a mapping function f
that assigns a class to an input value x.
• The function used to connect input features to a predicted output is
created by the machine learning model during training.
Types of Supervised Learning
• Supervised learning can be split into
two subcategories: Classification and
regression.
• Classification:
• During training, a classification
algorithm will be given data points
with an assigned category. The job of a
classification algorithm is to then take
an input value and assign it a class, or
category, that it fits into based on the
training data provided.
Types of Supervised Learning
• Classification:
• The most common example of classification is determining if an email is
spam or not.
• With two classes to choose from (spam, or not spam), this problem is
called a binary classification problem. The algorithm will be given training
data with emails that are both spam and not spam.
• The model will find the features within the data that correlate to either
class and create the mapping function mentioned earlier: Y=f(x).
• Then, when provided with an unseen email, the model will use this
function to determine whether or not the email is spam.
Types of Supervised Learning
• Classification:
• Classification problems can be solved with numerous algorithms.
Which algorithm you choose depends on the data and the situation.
Here are a few popular classification algorithms:
• Linear Classifiers
• Support Vector Machines
• Decision Trees
• K-Nearest Neighbor
• Random Forest
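As a rough illustration of one algorithm from the list above, here is a minimal K-Nearest Neighbor sketch in plain Python. The function name `knn_predict`, the two made-up features per email, and the toy data are assumptions for illustration only, not part of the original slides.

```python
import math
from collections import Counter


def knn_predict(train_points, train_labels, query, k=3):
    """Return the majority label among the k training points closest to query."""
    # Pair each training point's Euclidean distance to the query with its label.
    distances = [
        (math.dist(point, query), label)
        for point, label in zip(train_points, train_labels)
    ]
    # Keep the k nearest neighbours and take a majority vote over their labels.
    nearest = sorted(distances, key=lambda pair: pair[0])[:k]
    votes = Counter(label for _, label in nearest)
    return votes.most_common(1)[0][0]


# Toy, made-up training data: two numeric features per email.
points = [(8, 9), (7, 8), (9, 7), (1, 0), (0, 1), (2, 1)]
labels = ["spam", "spam", "spam", "not spam", "not spam", "not spam"]

prediction_a = knn_predict(points, labels, (8, 8))
prediction_b = knn_predict(points, labels, (1, 1))
print(prediction_a, "|", prediction_b)
```

Here the "training" step is just storing the labeled points; the mapping Y = f(x) is the vote among the nearest stored examples.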
Types of Supervised Learning
• Regression
• Regression is a predictive statistical process in which the model
attempts to find the relationship between dependent and
independent variables. The goal of a regression algorithm is to predict
a continuous number such as sales, income, or test scores. The
equation for basic linear regression can be written as:
ŷ = w[0]*x[0] + w[1]*x[1] + ... + w[i]*x[i] + b
• Here x[i] are the features of the data, and w[i] and b are
parameters learned during training.
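The weighted-sum form described above (each feature x[i] multiplied by its weight w[i], plus the term b) can be sketched as a small function. The name `predict` and the example numbers are illustrative assumptions.

```python
def predict(features, weights, bias):
    """Linear regression prediction: sum of w[i] * x[i] over all features, plus b."""
    if len(features) != len(weights):
        raise ValueError("features and weights must have the same length")
    return sum(w * x for w, x in zip(weights, features)) + bias


# Made-up values: two features, two weights, one bias term.
y_hat = predict(features=[1.0, 2.0], weights=[3.0, 4.0], bias=0.5)
print(y_hat)  # 3.0*1.0 + 4.0*2.0 + 0.5
```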
Types of Supervised Learning
• For simple linear regression models with only one feature in the data,
the formula looks like this:
ŷ = w*x + b
• Here w is the slope, x is the single feature, and b is the y-intercept.
Familiar?
• For simple regression problems such as this, the model's predictions
are represented by the line of best fit.
• For models using two features, a plane is used. Finally, for a
model using more than two features, a hyperplane is used.
Types of Supervised Learning
• Imagine we want to determine a student’s test
grade based on how many hours they studied the
week of the test. Let’s say the plotted data with a
line of best fit looks like this:
• There is a clear positive correlation between
hours studied (independent variable) and the
student’s final test score (dependent variable).
• A line of best fit can be drawn through the data
points to show the model’s predictions when
given a new input.
• Say we wanted to know how well a student would
do with five hours of studying. We can use the
line of best fit to predict the test score based on
other students’ performances.
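A minimal sketch of how such a line of best fit could be computed with ordinary least squares for one feature. The hours and scores below are invented for illustration (they are not the data from the slide's plot), and `fit_line` is a hypothetical helper name.

```python
def fit_line(xs, ys):
    """Least-squares slope w and intercept b for a single feature."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    # Slope = covariance(x, y) / variance(x); the intercept makes the line
    # pass through the point of means.
    covariance = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    variance = sum((x - mean_x) ** 2 for x in xs)
    w = covariance / variance
    b = mean_y - w * mean_x
    return w, b


# Invented data: hours studied and test scores for five students.
hours = [1, 2, 3, 4, 6]
scores = [52, 59, 66, 73, 87]

w, b = fit_line(hours, scores)
predicted = w * 5 + b  # predicted score for a student who studies five hours
print(round(w, 2), round(b, 2), round(predicted, 2))
```

Because the invented scores lie exactly on a line, the fit recovers that line; real data would scatter around it.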
Types of Supervised Learning
• There are many different types of regression algorithms. Three
commonly cited ones are listed below:
• Linear Regression
• Logistic Regression (despite its name, it predicts class probabilities
and is normally used for classification)
• Polynomial Regression
Summary
• Supervised learning is often regarded as the most approachable
subcategory of machine learning and serves as an introduction to the
field for many practitioners.
• Supervised learning is the most commonly used form of machine
learning, and has proven to be an excellent tool in many fields.
Learning Curves
• A learning curve is a plot showing how a specific learning-related
metric evolves with experience (for example, training iterations)
during the training of a machine learning model.
• They are just a mathematical representation of the learning process.
Single Curves
• The most popular example of a learning
curve is loss over time. Loss (or cost)
measures our model error, or “how badly
our model is doing”.
• So, for now, the lower our loss becomes,
the better our model performance will
be.
• In the picture below, we can see the
expected behavior of the learning
process:
• Despite the fact it has slight ups and
downs, in the long term, the loss
decreases over time, so the model is
learning.
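To make the loss-over-time curve concrete, here is a small sketch that trains a one-feature linear model with full-batch gradient descent and records the mean squared error at each epoch. The data, learning rate, and epoch count are assumptions chosen for illustration. On this tiny noise-free dataset the recorded curve is smooth rather than showing the slight ups and downs described above.

```python
def mse(w, b, xs, ys):
    """Mean squared error of the line w*x + b on the data."""
    return sum((w * x + b - y) ** 2 for x, y in zip(xs, ys)) / len(xs)


def train(xs, ys, learning_rate=0.05, epochs=200):
    """Full-batch gradient descent; returns w, b and the loss at each epoch."""
    w, b = 0.0, 0.0
    history = []
    n = len(xs)
    for _ in range(epochs):
        # Record the loss before each update: this list is the learning curve.
        history.append(mse(w, b, xs, ys))
        grad_w = sum(2 * (w * x + b - y) * x for x, y in zip(xs, ys)) / n
        grad_b = sum(2 * (w * x + b - y) for x, y in zip(xs, ys)) / n
        w -= learning_rate * grad_w
        b -= learning_rate * grad_b
    return w, b, history


# Made-up data generated from y = 2x + 1.
xs = [0.0, 1.0, 2.0, 3.0, 4.0]
ys = [1.0, 3.0, 5.0, 7.0, 9.0]

w, b, loss_history = train(xs, ys)
print(loss_history[0], loss_history[-1])
```

Plotting `loss_history` against the epoch index would give the single-curve chart discussed here.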
Single Curves
• Other examples of very popular
learning curves are accuracy,
precision, and recall.
• All of these capture model
performance, so the higher they are,
the better our model becomes.
• See below an example of a typical
accuracy curve over time:
• The model performance is growing
over time, which means the model is
improving with experience (it’s
learning).
• We also see it grows at the beginning,
but over time it reaches a plateau,
meaning it’s not able to learn
anymore.
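For reference, the three performance metrics named above can be computed from true and predicted labels as in the sketch below. The function name `classification_metrics` and the example labels are illustrative assumptions; evaluating this after every epoch would give the points of an accuracy, precision, or recall curve.

```python
def classification_metrics(y_true, y_pred, positive=1):
    """Return (accuracy, precision, recall) for a binary classification task."""
    pairs = list(zip(y_true, y_pred))
    tp = sum(1 for t, p in pairs if t == positive and p == positive)
    fp = sum(1 for t, p in pairs if t != positive and p == positive)
    fn = sum(1 for t, p in pairs if t == positive and p != positive)
    correct = sum(1 for t, p in pairs if t == p)

    accuracy = correct / len(pairs)
    # Guard against division by zero when there are no positive predictions/labels.
    precision = tp / (tp + fp) if (tp + fp) else 0.0
    recall = tp / (tp + fn) if (tp + fn) else 0.0
    return accuracy, precision, recall


# Made-up labels: 1 = positive class, 0 = negative class.
y_true = [1, 1, 1, 0, 0, 0, 0, 1]
y_pred = [1, 0, 1, 0, 1, 0, 0, 1]

accuracy, precision, recall = classification_metrics(y_true, y_pred)
print(accuracy, precision, recall)
```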
Multiple Curves
• One of the most widely used metrics
combinations is training loss + validation loss
over time.
• The training loss indicates how well the model
is fitting the training data, while the validation
loss indicates how well the model fits new
data.
• We will see this combination later on, but for
now, see below a typical plot showing both
metrics:
• Another common practice is to have multiple
metrics in the same chart as well as those
metrics for different models.
Two Main Types
• We often see these two types of learning
curves appearing in charts:
• Optimization Learning Curves: Learning
curves calculated on the metric by which
the parameters of the model are being
optimized, such as loss or Mean Squared
Error
• Performance Learning Curves: Learning
curves calculated on the metric by which
the model will be evaluated and selected,
such as accuracy, precision, recall, or F1
score
• Below you can see an example in Machine
Translation showing BLEU (a performance
score) together with the loss (optimization
score) for two different models (orange and
green):
How to Detect Model Behavior?
• High Bias/Underfitting
• Bias: High bias occurs when the learning algorithm does not take into
account all the relevant information, so it cannot capture the
richness and complexity of the data
• Underfitting: When the algorithm is not able to model either training data
or new data, consistently obtaining high error values that don’t decrease
over time
• We can see they are closely tied, as the more biased a model is, the more it
underfits the data.
• Let’s imagine our data are the blue dots below, and we want to come up
with a linear model for regression purposes:
• Suppose we’re very lazy machine learning practitioners and we propose this line
as a model:
• Clearly, a straight line like that doesn’t represent the pattern of our dots. It lacks
some complexity to describe the nature of the given data. We can see how the
biased model doesn’t take into account relevant information, which leads to
underfitting.
How to Detect Model Behavior?
• It’s doing a terrible job with the training data already, so what would
be the performance for a new example?
• It’s pretty obvious it performs as poorly with the new example as it
does with the training data:
How to Detect Model Behavior?
• Now, how can we use learning curves to detect
that our model is underfitting? See an example
showing validation and training cost (loss)
curves:
• The cost (loss) function is high and doesn’t
decrease with the number of iterations, both for
the validation and training curves
• We could actually use just the training curve and
check that the loss is high and that it doesn’t
decrease, to see that it’s underfitting
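The check described above (training loss is high and does not decrease) could be automated with a simple heuristic like the following sketch. The thresholds `high_loss` and `min_relative_drop` are arbitrary assumptions; what counts as a "high" loss depends entirely on the task and the loss function.

```python
def looks_underfit(train_losses, high_loss=1.0, min_relative_drop=0.05):
    """Heuristic: the final loss is still high and barely improved since the start."""
    first, last = train_losses[0], train_losses[-1]
    relative_drop = (first - last) / first if first > 0 else 0.0
    return last >= high_loss and relative_drop < min_relative_drop


# Made-up training loss curves, one value per epoch.
flat_curve = [2.10, 2.08, 2.09, 2.07, 2.08]      # high and not decreasing
healthy_curve = [2.10, 1.20, 0.60, 0.30, 0.15]   # clearly decreasing

print(looks_underfit(flat_curve), looks_underfit(healthy_curve))
```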
High Variance/Overfitting
• Variance: High variance happens when the model is too complex, so it
fits noise in the training data instead of the simpler real patterns
existing in the data
• Overfitting: The algorithm captures the training data well, but it
performs poorly on new data, so it’s not able to generalize
• These are also directly related concepts: The higher the variance of a
model, the more it overfits the training data.
• Let’s take the same example as before, where we wanted a linear
model to approximate these blue dots:
High Variance/Overfitting
• Well, we understand intuitively that this line is not what we wanted,
either. Indeed, it fits the data, but it doesn’t represent the real
pattern in it.
• When a new example appears, the model will struggle with it. See a new
example (in orange):
• The overfitted model does not predict the new example well:
High Variance/Overfitting
• How could we use learning curves to detect that a
model is overfitting? We’ll need both the
validation and training loss curves:
• The training loss goes down over time, achieving
low error values
• The validation loss goes down until a turning
point is found, and there it starts going up again.
That point represents the beginning of
overfitting
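The turning point described above can be located after training by finding the epoch with the lowest validation loss. This is a minimal sketch under the assumption that the validation loss was recorded once per epoch; `overfitting_onset` is a hypothetical helper name and the loss values are invented.

```python
def overfitting_onset(val_losses):
    """Index of the lowest validation loss, or None if the minimum is the last epoch."""
    best_epoch = min(range(len(val_losses)), key=val_losses.__getitem__)
    # If the lowest value is the final one, no turning point has been observed yet.
    if best_epoch == len(val_losses) - 1:
        return None
    return best_epoch


# Invented curves: training loss keeps falling, validation loss turns upward.
train_losses = [0.95, 0.70, 0.50, 0.38, 0.30, 0.24, 0.20]
val_losses = [0.90, 0.70, 0.55, 0.50, 0.53, 0.60, 0.70]

turning_point = overfitting_onset(val_losses)
print(turning_point)
```

Real validation curves are noisy, so a single uptick is weak evidence; smoothing the curve or waiting for several worse epochs is a common precaution.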
Finding the Right Bias/Variance Tradeoff
• The solution to the bias/variance problem is to find a sweet spot
between them.
• In the example given above:
• a good linear model for the data would be a line like this:
• So, when a new example appears:
• We will make a better prediction:
Finding the Right Bias/Variance Tradeoff
• We can use the validation and training
loss curves to find the right bias/variance
tradeoff:
• The training process should be stopped
when the validation error trend changes
from descending to ascending
• If we stop the process before that point,
the model will underfit
• If we stop the process after that point, the
model will overfit
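Stopping when the validation error starts to rise is commonly called early stopping. The sketch below replays a list of validation losses and stops once the loss has failed to improve for `patience` consecutive epochs. The patience mechanism is an assumption added here to tolerate noise; the slides only describe stopping at the trend change.

```python
def early_stopping_epoch(val_losses, patience=2):
    """Return (best_epoch, stop_epoch) for a sequence of per-epoch validation losses."""
    if not val_losses:
        raise ValueError("val_losses must not be empty")
    best_loss = float("inf")
    best_epoch = 0
    epochs_without_improvement = 0
    stop_epoch = len(val_losses) - 1
    for epoch, loss in enumerate(val_losses):
        if loss < best_loss:
            # New best validation loss: remember it and reset the counter.
            best_loss, best_epoch = loss, epoch
            epochs_without_improvement = 0
        else:
            epochs_without_improvement += 1
            if epochs_without_improvement >= patience:
                stop_epoch = epoch
                break
    return best_epoch, stop_epoch


# Invented validation losses, one per epoch.
val_losses = [0.90, 0.70, 0.55, 0.50, 0.53, 0.60, 0.70]
best_epoch, stop_epoch = early_stopping_epoch(val_losses, patience=2)
print(best_epoch, stop_epoch)
```

In a real training loop the loss would be computed on the fly each epoch, and the model parameters saved at `best_epoch` would be the ones kept.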
Training, Validation and Test.
• Training data. This data is used to build the machine learning
model. The data scientist feeds the algorithm input data paired with
the expected output.
• The model processes the data repeatedly to learn more about the
data’s behavior and then adjusts itself to serve its intended purpose.
• Validation data. During training, validation data gives the model
inputs it has not been trained on. Validation data provides
the first test against unseen data, allowing data scientists to evaluate
how well the model makes predictions on new data.
• Not all data scientists use validation data, but it can provide
helpful information for optimizing hyperparameters, which influence
how the model learns from data.
Training, Validation and Test.
• Test data. After the model is built, test data checks once more
that it can make accurate predictions.
• Training and validation data include labels so that performance
metrics can be monitored during training. Test data is also labeled so
predictions can be scored, but its labels are withheld from the model
and it is never used for training or tuning. It provides a final,
real-world check on an unseen dataset to confirm that the ML
algorithm was trained effectively.
• While each of these three datasets has its place in creating and
training ML models, it’s easy to see some overlap between them.
• The difference between training data vs. test data is clear: one trains
a model, the other confirms it works correctly, but confusion can pop
up between the functional similarities and differences of other types
of datasets.
Training data vs. validation data
• ML algorithms require training data to achieve an objective. The algorithm
will analyze this training dataset, classify the inputs and outputs, then
analyze it again. Trained enough, an algorithm will essentially memorize all
of the inputs and outputs in a training dataset — this becomes a problem
when it needs to consider data from other sources, such as real-world
customers.
• Here is where validation data is useful. Validation data provides an initial
check that the model can return useful predictions in a real-world setting,
which training data cannot do. The ML algorithm can assess training data
and validation data at the same time.
• Validation data is an entirely separate segment of data, though a data
scientist might carve out part of the training dataset for validation — as
long as the datasets are kept separate throughout the entirety of training
and testing.
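Carving separate validation and test sets out of one dataset can be sketched as follows. The 60/20/20 proportions, the fixed seed, and the function name `train_val_test_split` are assumptions for illustration; the key property is that the three subsets never overlap.

```python
import random


def train_val_test_split(samples, val_fraction=0.2, test_fraction=0.2, seed=0):
    """Shuffle the samples once, then cut them into three disjoint subsets."""
    shuffled = list(samples)
    random.Random(seed).shuffle(shuffled)  # seeded so the split is reproducible

    n_test = int(len(shuffled) * test_fraction)
    n_val = int(len(shuffled) * val_fraction)

    test = shuffled[:n_test]
    val = shuffled[n_test:n_test + n_val]
    train = shuffled[n_test + n_val:]
    return train, val, test


# Ten made-up sample IDs stand in for real (input, label) pairs.
train_set, val_set, test_set = train_val_test_split(range(10))
print(len(train_set), len(val_set), len(test_set))
```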
Training data vs. validation data
• For example, let’s say an ML algorithm is supposed to analyze a
picture of a vertebrate and provide its scientific classification.
• The training dataset would include lots of pictures of mammals, but
not all pictures of all mammals, let alone all pictures of all
vertebrates. So, when the validation data provides a picture of a
squirrel, an animal the model hasn’t seen before, the data scientist
can assess how well the algorithm performs in that task.
• This is a check against an entirely different dataset than the one it was
trained on.
Training data vs. validation data
• Based on the accuracy of the predictions after the validation stage,
data scientists can adjust hyperparameters such as learning rate,
input features and hidden layers. These adjustments help prevent
overfitting, in which the algorithm makes excellent predictions
on the training data but cannot generalize to
additional data.
• The opposite problem, underfitting, occurs when the model isn’t
complex enough to make accurate predictions against either training
data or new data.
• In short, when you see good predictions on both the training datasets
and validation datasets, you can have confidence that the algorithm
works as intended on new data, not just a small subset of data.
Validation data vs. testing data
• Not all data scientists rely on both validation data and testing data. To
some degree, both datasets serve the same purpose: make sure the
model works on real data.
• However, there are some practical differences between validation
data and testing data.
• If you opt to include a separate stage for validation data analysis, this
dataset is typically labeled so the data scientist can collect metrics
that they can use to better train the model.
Validation data vs. testing data
• In this sense, validation data is used as part of the model training
process.
• Conversely, the model acts as a black box when you run testing data
through it. Thus, validation data tunes the model, whereas testing
data simply confirms that it works.
• There is some semantic ambiguity between validation data and
testing data.
• Some organizations call testing datasets “validation datasets.”
Ultimately, if there are three datasets to tune and check ML
algorithms, validation data typically helps tune the algorithm and
testing data provides the final assessment.
Generalization
• In machine learning, generalization describes how well a trained
model classifies or forecasts unseen data.
• A model that generalizes well works, in general, for any subset of
unseen data.
• An example is when we train a model to classify between dogs and
cats.
• If the model is trained on a dataset of dog images covering only two
breeds, it may still achieve good performance.
Generalization
• However, it may get a low classification score when it is tested on
other breeds of dogs as well.
• As a result, the model may classify an actual dog image from the
unseen dataset as a cat.
• Therefore, data diversity is a very important factor in making
good predictions.
• In the example above, the model may obtain an 85% performance score
when it is trained and tested on only two dog breeds, and 70% if it is
trained on all breeds.
• However, the first model may get a very low score (e.g. 45%) if it is
evaluated on an unseen dataset containing dogs of all breeds.
Generalization
• The score of the second model can remain unchanged, given that it has
been trained on highly diverse data including all possible breeds.
• It should be taken into account that data diversity is not the only
factor to consider in order to obtain a generalized model.
• Poor generalization can also result from the nature of the machine
learning algorithm or from poor hyperparameter configuration.
Variance-bias trade-off
• The prediction results of a machine learning model stand somewhere
between
• a) low-bias, low-variance
• b) low-bias, high-variance
• c) high-bias, low-variance,
• d) high-bias, high-variance.
• A low-biased, high-variance model is called overfit and a high-biased,
low-variance model is called underfit.
Variance-bias trade-off
• By generalization, we find the best trade-off between underfitting and
overfitting so that a trained model obtains the best performance.
• An overfit model obtains a high prediction score on seen data and a low
one on unseen datasets. An underfit model has low performance on
both seen and unseen datasets.
Three models with underfitting (left), goodfit (middle), and overfitting (right).
Determinant factors to train generalized
models
• Dataset
• In order to train a classifier that generalizes well, the dataset
used should be diverse. Note that this does not mean a huge dataset,
but one that covers all the different kinds of samples.
• This helps the classifier learn from more than one specific subset of
the data, so it generalizes better.
• In addition, during training, it is recommended to use cross-validation
techniques such as K-fold or Monte Carlo cross-validation. These
techniques make better use of all portions of the data and help
avoid generating an overfit model.
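As a sketch of the K-fold idea mentioned above: the data is cut into k folds, and each fold serves once as the validation set while the remaining folds are used for training. The generator name `k_fold_indices` is an assumption; for brevity this version does not shuffle, which is usually done first in practice.

```python
def k_fold_indices(n_samples, k):
    """Yield (train_indices, validation_indices) for each of the k folds."""
    if not 2 <= k <= n_samples:
        raise ValueError("k must be between 2 and the number of samples")
    indices = list(range(n_samples))
    # Spread any remainder over the first folds so sizes differ by at most one.
    fold_sizes = [n_samples // k + (1 if i < n_samples % k else 0) for i in range(k)]
    start = 0
    for size in fold_sizes:
        val_idx = indices[start:start + size]
        train_idx = indices[:start] + indices[start + size:]
        yield train_idx, val_idx
        start += size


folds = list(k_fold_indices(n_samples=10, k=3))
for train_idx, val_idx in folds:
    print("train:", train_idx, "validate:", val_idx)
```

A model would be trained and scored once per fold, and the k scores averaged to estimate how well it generalizes.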
Determinant factors to train generalized
models
• Machine Learning algorithm
• Machine learning algorithms differ in how prone they are to
overfitting and underfitting.
• Overfitting is more likely with nonlinear, non-parametric machine
learning algorithms.
• For instance, a decision tree is a non-parametric machine learning
algorithm, meaning its model is more prone to overfitting.
• On the other hand, some machine learning models are too simple to
capture complex underlying patterns in data.
• This leads to an underfit model. Examples are linear and logistic
regression.
Determinant factors to train generalized
models
• Model complexity
• When a machine learning model becomes too complex, it is usually prone
to overfitting. There are methods that help to make the model simpler.
• They are called regularization methods and are explained next.
• Regularization
• Regularization is a collection of methods for making a machine learning
model simpler.
• To this end, different approaches are applied to different machine learning
algorithms, for instance, pruning for decision trees, dropout for
neural networks, and adding a penalty term to the cost function in
regression.
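One common way to add a penalty to a regression cost function is an L2 (ridge) term: the sum of squared weights, scaled by a strength parameter. The slides do not say which penalty is meant, so the L2 choice, the name `ridge_cost`, and the parameter `lam` are assumptions for illustration.

```python
def ridge_cost(weights, bias, rows, targets, lam=0.1):
    """Mean squared error plus an L2 penalty lam * sum(w^2) on the weights."""
    n = len(rows)
    squared_errors = (
        (sum(w * x for w, x in zip(weights, row)) + bias - y) ** 2
        for row, y in zip(rows, targets)
    )
    mse = sum(squared_errors) / n
    # The bias term is conventionally left out of the penalty.
    penalty = lam * sum(w * w for w in weights)
    return mse + penalty


# Made-up data that the weights below fit exactly, so the MSE part is zero.
rows = [[1.0, 2.0], [2.0, 0.0]]
targets = [5.0, 2.0]
weights, bias = [1.0, 2.0], 0.0

plain_cost = ridge_cost(weights, bias, rows, targets, lam=0.0)
penalized_cost = ridge_cost(weights, bias, rows, targets, lam=0.5)
print(plain_cost, penalized_cost)
```

Minimizing the penalized cost pushes the weights toward smaller values, which is what makes the fitted model simpler.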

Painted Grey Ware.pptx, PGW Culture of India
 

Supervised Machine Learning Guide

  • 1. Supervised Learning • Supervised learning is the most common sub-branch of machine learning today. • Typically, new machine learning practitioners begin their journey with supervised learning algorithms. Therefore, the first post of this three-post series is about supervised learning. • Supervised machine learning algorithms are designed to learn by example. • The name “supervised” learning originates from the idea that training this type of algorithm is like having a teacher supervise the whole process.
  • 2. Supervised Learning • When training a supervised learning algorithm, the training data will consist of inputs paired with the correct outputs. • During training, the algorithm will search for patterns in the data that correlate with the desired outputs. • After training, a supervised learning algorithm will take in new unseen inputs and will determine which label the new inputs will be classified as based on prior training data. • The objective of a supervised learning model is to predict the correct label for newly presented input data.
  • 3. Supervised Learning • In its most basic form, a supervised learning algorithm can be written simply as: Y = f(x) • Where Y is the predicted output, determined by a mapping function f that assigns a class to an input value x. • The function used to connect input features to a predicted output is created by the machine learning model during training.
  • 4. Types of Supervised Learning • Supervised learning can be split into two subcategories: Classification and regression. • Classification: • During training, a classification algorithm will be given data points with an assigned category. The job of a classification algorithm is to then take an input value and assign it a class, or category, that it fits into based on the training data provided.
  • 5. Types of Supervised Learning • Classification: • The most common example of classification is determining if an email is spam or not. • With two classes to choose from (spam, or not spam), this problem is called a binary classification problem. The algorithm will be given training data with emails that are both spam and not spam. • The model will find the features within the data that correlate to either class and create the mapping function mentioned earlier: Y=f(x). • Then, when provided with an unseen email, the model will use this function to determine whether or not the email is spam.
  • 6. Types of Supervised Learning • Classification: • Classification problems can be solved with many different algorithms. Which algorithm you choose depends on the data and the situation. Here are a few popular classification algorithms: • Linear Classifiers • Support Vector Machines • Decision Trees • K-Nearest Neighbor • Random Forest
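As a rough illustration of one algorithm from the list above, here is a minimal sketch of a K-Nearest Neighbor classifier applied to the spam example. The two features (link count and count of promotional words) and all data values are hypothetical, chosen only to make the example runnable.

```python
import math
from collections import Counter

def knn_predict(train_points, train_labels, query, k=3):
    """Classify `query` by majority vote among its k nearest training points."""
    # Pair each training point's Euclidean distance to the query with its label.
    distances = sorted(
        (math.dist(point, query), label)
        for point, label in zip(train_points, train_labels)
    )
    nearest_labels = [label for _, label in distances[:k]]
    return Counter(nearest_labels).most_common(1)[0][0]

# Hypothetical features per email: (number of links, number of promotional words).
train_points = [(0, 0), (1, 0), (0, 1), (7, 5), (9, 4), (6, 6)]
train_labels = ["not spam", "not spam", "not spam", "spam", "spam", "spam"]

print(knn_predict(train_points, train_labels, (8, 5)))  # close to the spam cluster
print(knn_predict(train_points, train_labels, (1, 1)))  # close to the non-spam cluster
```

Here the "training" step is just storing the labeled examples; the mapping function Y = f(x) is the vote among neighbors.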
  • 7. Types of Supervised Learning • Regression • Regression is a predictive statistical process where the model attempts to find the important relationship between dependent and independent variables. The goal of a regression algorithm is to predict a continuous number such as sales, income, or test scores. The equation for basic linear regression can be written as: Y = w[0]·x[0] + w[1]·x[1] + … + w[i]·x[i] + b • Where x[i] is the feature(s) for the data and where w[i] and b are parameters which are developed during training.
  • 8. Types of Supervised Learning • For simple linear regression models with only one feature in the data, the formula looks like this: Y = wx + b • Where w is the slope, x is the single feature and b is the y-intercept. Familiar? • For simple regression problems such as this, the model's predictions are represented by the line of best fit. • For models using two features, a plane is used. Finally, for a model using more than two features, a hyperplane is used.
  • 9. Types of Supervised Learning • Imagine we want to determine a student’s test grade based on how many hours they studied the week of the test. Let’s say the plotted data with a line of best fit looks like this: • There is a clear positive correlation between hours studied (independent variable) and the student’s final test score (dependent variable). • A line of best fit can be drawn through the data points to show the model’s predictions when given a new input. • Say we wanted to know how well a student would do with five hours of studying. We can use the line of best fit to predict the test score based on other students’ performances.
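The line of best fit for the hours-studied example can be sketched with ordinary least squares. The data points below are invented for illustration (the original plot is not available), and `fit_line` is a hypothetical helper name.

```python
def fit_line(xs, ys):
    """Least-squares fit of y = w * x + b for a single feature."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    # Slope = covariance(x, y) / variance(x); intercept follows from the means.
    covariance = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    variance = sum((x - mean_x) ** 2 for x in xs)
    w = covariance / variance
    b = mean_y - w * mean_x
    return w, b

# Hypothetical data: hours studied and the resulting test scores.
hours = [1, 2, 3, 4, 6, 7, 8]
scores = [52, 58, 61, 70, 80, 84, 92]

w, b = fit_line(hours, scores)
predicted = w * 5 + b  # prediction for a student who studies five hours
print(f"slope={w:.2f}, intercept={b:.2f}, predicted score={predicted:.1f}")
```

The positive slope reflects the positive correlation between study time and score in this made-up data.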
  • 10. Types of Supervised Learning • There are many different types of regression algorithms. The three most common are listed below: • Linear Regression • Logistic Regression (despite its name, most often used for classification) • Polynomial Regression
  • 11. Summary • Supervised learning is the simplest subcategory of machine learning and serves as an introduction to the field for many practitioners. • Supervised learning is the most commonly used form of machine learning, and has proven to be an excellent tool in many fields.
  • 12. Learning Curves • A learning curve is a plot showing how a specific learning-related metric progresses with experience during the training of a machine learning model. • Learning curves are simply a mathematical representation of the learning process.
  • 13. Single Curves • The most popular example of a learning curve is loss over time. Loss (or cost) measures our model error, or “how bad our model is doing”. • So, for now, the lower our loss becomes, the better our model performance will be. • In the picture below, we can see the expected behavior of the learning process: • Despite the fact it has slight ups and downs, in the long term, the loss decreases over time, so the model is learning.
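A loss curve like the one described above can be produced by recording the loss at every training step. The sketch below assumes a one-feature linear model trained with full-batch gradient descent on mean squared error; the data, learning rate, and function name are illustrative assumptions. Because the data is noise-free and the whole batch is used, this curve is smooth rather than showing the slight ups and downs typical of real training runs.

```python
def train_and_record_loss(xs, ys, learning_rate=0.01, epochs=50):
    """Fit y = w * x + b by gradient descent and return the loss at each epoch."""
    w, b = 0.0, 0.0
    n = len(xs)
    loss_history = []
    for _ in range(epochs):
        errors = [(w * x + b) - y for x, y in zip(xs, ys)]
        # Mean squared error before this epoch's update.
        loss_history.append(sum(e * e for e in errors) / n)
        # Gradients of the mean squared error with respect to w and b.
        grad_w = 2 * sum(e * x for e, x in zip(errors, xs)) / n
        grad_b = 2 * sum(errors) / n
        w -= learning_rate * grad_w
        b -= learning_rate * grad_b
    return w, b, loss_history

# Hypothetical data generated from y = 2x + 1.
xs = [1, 2, 3, 4, 5]
ys = [3, 5, 7, 9, 11]

_, _, loss_history = train_and_record_loss(xs, ys)
for epoch in (0, 10, 49):
    print(f"epoch {epoch}: loss={loss_history[epoch]:.4f}")
```

Plotting `loss_history` against the epoch number gives the single learning curve.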
  • 14. Single Curves • Other examples of very popular learning curves are accuracy, precision, and recall. • All of these capture model performance, so the higher they are, the better our model becomes. • See below an example of a typical accuracy curve over time: • The model performance is growing over time, which means the model is improving with experience (it’s learning). • We also see it grows at the beginning, but over time it reaches a plateau, meaning it’s not able to learn anymore.
  • 15. Multiple Curves • One of the most widely used metric combinations is training loss + validation loss over time. • The training loss indicates how well the model is fitting the training data, while the validation loss indicates how well the model fits new data. • We will see this combination later on, but for now, see below a typical plot showing both metrics: • Another common practice is to have multiple metrics in the same chart as well as those metrics for different models.
  • 16. Two Main Types • We often see these two types of learning curves appearing in charts: • Optimization Learning Curves: Learning curves calculated on the metric by which the parameters of the model are being optimized, such as loss or Mean Squared Error • Performance Learning Curves: Learning curves calculated on the metric by which the model will be evaluated and selected, such as accuracy, precision, recall, or F1 score • Below you can see an example in Machine Translation showing BLEU (a performance score) together with the loss (optimization score) for two different models (orange and green):
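The performance metrics named above can be computed from true and predicted labels as in this minimal sketch. The function name and the example labels are hypothetical; in a performance learning curve, these values would be recomputed at each evaluation step and plotted over time.

```python
def classification_metrics(y_true, y_pred, positive=1):
    """Return accuracy, precision, recall and F1 for a binary classification task."""
    pairs = list(zip(y_true, y_pred))
    tp = sum(1 for t, p in pairs if t == positive and p == positive)
    fp = sum(1 for t, p in pairs if t != positive and p == positive)
    fn = sum(1 for t, p in pairs if t == positive and p != positive)
    accuracy = sum(1 for t, p in pairs if t == p) / len(pairs)
    # Guard against division by zero when a class is never predicted or never present.
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    f1 = 2 * precision * recall / (precision + recall) if precision + recall else 0.0
    return {"accuracy": accuracy, "precision": precision, "recall": recall, "f1": f1}

# Hypothetical labels: 1 = positive class, 0 = negative class.
y_true = [1, 0, 1, 1, 0, 0, 1, 0]
y_pred = [1, 0, 0, 1, 0, 1, 1, 0]

metrics = classification_metrics(y_true, y_pred)
print(metrics)
```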
  • 17. How to Detect Model Behavior? • High Bias/Underfitting • Bias: High bias occurs when the learning algorithm is not taking into account all the relevant information, becoming unable to capture the richness and complexity of the data • Underfitting: When the algorithm is not able to model either training data or new data, consistently obtaining high error values that don’t decrease over time • We can see they are closely tied, as the more biased a model is, the more it underfits the data. • Let’s imagine our data are the blue dots below, and we want to come up with a linear model for regression purposes:
  • 18. How to Detect Model Behavior? • High Bias/Underfitting • Let’s imagine our data are the blue dots below, and we want to come up with a linear model for regression purposes: • Suppose we’re very lazy machine learning practitioners and we propose this line as a model: • Clearly, a straight line like that doesn’t represent the pattern of our dots. It lacks some complexity to describe the nature of the given data. We can see how the biased model doesn’t take into account relevant information, which leads to underfitting.
  • 19. How to Detect Model Behavior? • It’s doing a terrible job with the training data already, so what would be the performance for a new example? • It’s pretty obvious it performs as poorly with the new example as it does with the training data:
  • 20. How to Detect Model Behavior? • Now, how can we use learning curves to detect our model is underfitting? See an example showing validation and training cost (loss) curves: • The cost (loss) function is high and doesn’t decrease with the number of iterations, both for the validation and training curves • We could actually use just the training curve and check that the loss is high and that it doesn’t decrease, to see that it’s underfitting
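The check described above (training loss that is high and does not decrease) can be written as a simple heuristic. This is only a sketch: the thresholds `high_loss` and `min_improvement` are arbitrary assumptions, and what counts as "high" depends entirely on the problem and the loss function.

```python
def looks_underfit(train_loss, high_loss=1.0, min_improvement=0.05):
    """Heuristic: the training loss stays high and barely decreases over time."""
    first, last = train_loss[0], train_loss[-1]
    if first == 0:
        return False  # a loss of zero from the start is not underfitting
    relative_improvement = (first - last) / first
    return last > high_loss and relative_improvement < min_improvement

# Hypothetical training-loss curves.
flat_curve = [2.10, 2.08, 2.09, 2.07, 2.08]      # high and not decreasing
learning_curve = [2.10, 1.20, 0.60, 0.30, 0.20]  # clearly decreasing

print(looks_underfit(flat_curve))
print(looks_underfit(learning_curve))
```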
  • 21. High Variance/Overfitting • Variance: High variance happens when the model is too complex and doesn’t represent the simpler real patterns existing in the data • Overfitting: The algorithm captures the training data well, but it performs poorly on new data, so it’s not able to generalize • These are also directly related concepts: The higher the variance of a model, the more it overfits the training data. • Let’s take the same example as before, where we wanted a linear model to approximate these blue dots:
  • 22. High Variance/Overfitting • Well, we understand intuitively that this line is not what we wanted, either. Indeed, it fits the data, but it doesn’t represent the real pattern in it. • When a new example appears, it will struggle to model it. See a new example (in orange): • The overfitted model won’t predict the new example well enough:
  • 23. High Variance/Overfitting • How could we use learning curves to detect a model is overfitting? We’ll need both the validation and training loss curves: • The training loss goes down over time, achieving low error values • The validation loss goes down until a turning point is found, and there it starts going up again. That point represents the beginning of overfitting
  • 24. Finding the Right Bias/Variance Tradeoff • The solution to the bias/variance problem is to find a sweet spot between them. • In the example given above: • a good linear model for the data would be a line like this: • So, when a new example appears: • We will make a better prediction:
  • 25. Finding the Right Bias/Variance Tradeoff • We can use the validation and training loss curves to find the right bias/variance tradeoff: • The training process should be stopped when the validation error trend changes from descending to ascending • If we stop the process before that point, the model will underfit • If we stop the process after that point, the model will overfit
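The stopping rule above is commonly implemented as early stopping. The sketch below works on a pre-recorded list of validation losses; the `patience` parameter (how many non-improving epochs to tolerate before stopping) is an assumption added to cope with noisy curves, and the loss values are invented.

```python
def early_stopping_epoch(val_losses, patience=2):
    """Return the epoch with the lowest validation loss seen before stopping."""
    best_loss = float("inf")
    best_epoch = 0
    epochs_without_improvement = 0
    for epoch, loss in enumerate(val_losses):
        if loss < best_loss:
            best_loss, best_epoch = loss, epoch
            epochs_without_improvement = 0
        else:
            epochs_without_improvement += 1
            # Stop once the validation loss has failed to improve `patience` times in a row.
            if epochs_without_improvement >= patience:
                break
    return best_epoch

# Hypothetical validation losses: descending, then ascending after epoch 3.
val_losses = [0.90, 0.70, 0.55, 0.50, 0.52, 0.58, 0.66]
print(early_stopping_epoch(val_losses))
```

In a real training loop the model parameters saved at the returned epoch would be the ones kept.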
  • 26. Training, Validation and Test. • Training data. This type of data builds up the machine learning algorithm. The data scientist feeds the algorithm input data, which corresponds to an expected output. • The model evaluates the data repeatedly to learn more about the data’s behavior and then adjusts itself to serve its intended purpose. • Validation data. During training, validation data infuses new data into the model that it hasn’t evaluated before. Validation data provides the first test against unseen data, allowing data scientists to evaluate how well the model makes predictions based on the new data. • Not all data scientists use validation data, but it can provide some helpful information to optimize hyperparameters, which influence how the model assesses data.
  • 27. Training, Validation and Test. • Test data. After the model is built, testing data once again validates that it can make accurate predictions. • While training and validation data include labels to monitor performance metrics of the model, the labels of the testing data should be withheld from the model and used only to score its predictions. Test data provides a final, real-world check of an unseen dataset to confirm that the ML algorithm was trained effectively. • While each of these three datasets has its place in creating and training ML models, it’s easy to see some overlap between them. • The difference between training data vs. test data is clear: one trains a model, the other confirms it works correctly, but confusion can pop up between the functional similarities and differences of other types of datasets.
  • 28. Training data vs. validation data • ML algorithms require training data to achieve an objective. The algorithm will analyze this training dataset, classify the inputs and outputs, then analyze it again. Trained enough, an algorithm will essentially memorize all of the inputs and outputs in a training dataset — this becomes a problem when it needs to consider data from other sources, such as real-world customers. • Here is where validation data is useful. Validation data provides an initial check that the model can return useful predictions in a real-world setting, which training data cannot do. The ML algorithm can assess training data and validation data at the same time. • Validation data is an entirely separate segment of data, though a data scientist might carve out part of the training dataset for validation — as long as the datasets are kept separate throughout the entirety of training and testing.
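Carving a dataset into separate training, validation and test segments might look like the sketch below. The 60/20/20 proportions, the fixed random seed, and the function name are assumptions made for illustration; the key property is that the three segments never overlap.

```python
import random

def train_val_test_split(samples, val_fraction=0.2, test_fraction=0.2, seed=0):
    """Shuffle the samples once and cut them into three disjoint segments."""
    shuffled = list(samples)
    random.Random(seed).shuffle(shuffled)  # seeded so the split is reproducible
    n_test = int(len(shuffled) * test_fraction)
    n_val = int(len(shuffled) * val_fraction)
    test = shuffled[:n_test]
    val = shuffled[n_test:n_test + n_val]
    train = shuffled[n_test + n_val:]
    return train, val, test

# Hypothetical dataset of 100 sample identifiers.
train, val, test = train_val_test_split(range(100))
print(len(train), len(val), len(test))
```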
  • 29. Training data vs. validation data • For example, let’s say an ML algorithm is supposed to analyze a picture of a vertebrate and provide its scientific classification. • The training dataset would include lots of pictures of mammals, but not all pictures of all mammals, let alone all pictures of all vertebrates. So, when the validation data provides a picture of a squirrel, an animal the model hasn’t seen before, the data scientist can assess how well the algorithm performs in that task. • This is a check against an entirely different dataset than the one it was trained on.
  • 30. Training data vs. validation data • Based on the accuracy of the predictions after the validation stage, data scientists can adjust hyperparameters such as learning rate, input features and hidden layers. These adjustments prevent overfitting, in which the algorithm can make excellent determinations on the training data, but can't effectively adjust predictions for additional data. • The opposite problem, underfitting, occurs when the model isn’t complex enough to make accurate predictions against either training data or new data. • In short, when you see good predictions on both the training datasets and validation datasets, you can have confidence that the algorithm works as intended on new data, not just a small subset of data.
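Using validation data to adjust a hyperparameter can be sketched as follows. This example swaps in a K-Nearest Neighbor classifier, whose hyperparameter is the number of neighbors k, because it is small enough to write in full; the one-dimensional data (including two deliberately noisy training labels) is invented.

```python
from collections import Counter

def knn_predict(train, x, k):
    """Majority vote among the k training values closest to x (1-D feature)."""
    nearest = sorted(train, key=lambda pair: abs(pair[0] - x))[:k]
    return Counter(label for _, label in nearest).most_common(1)[0][0]

def accuracy(train, validation, k):
    """Fraction of validation examples that are classified correctly."""
    hits = sum(knn_predict(train, x, k) == label for x, label in validation)
    return hits / len(validation)

def select_k(train, validation, candidates):
    """Pick the first candidate k with the highest validation accuracy."""
    return max(candidates, key=lambda k: accuracy(train, validation, k))

# Hypothetical data: low values are class "A", high values class "B",
# with two noisy training labels at 3.5 and 5.5.
train = [(1.0, "A"), (2.0, "A"), (3.0, "A"), (3.5, "B"),
         (5.5, "A"), (6.0, "B"), (7.0, "B"), (8.0, "B")]
validation = [(2.5, "A"), (3.4, "A"), (5.6, "B"), (6.5, "B")]

for k in (1, 3, 5):
    print(k, accuracy(train, validation, k))
print("selected k:", select_k(train, validation, [1, 3, 5]))
```

With k = 1 the model memorizes the noisy training labels and does worse on validation data, which is the overfitting behavior the validation stage is meant to expose.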
  • 31. Validation data vs. testing data • Not all data scientists rely on both validation data and testing data. To some degree, both datasets serve the same purpose: make sure the model works on real data. • However, there are some practical differences between validation data and testing data. • If you opt to include a separate stage for validation data analysis, this dataset is typically labeled so the data scientist can collect metrics that they can use to better train the model.
  • 32. Validation data vs. testing data • In this sense, validation data occurs as part of the model training process. • Conversely, the model acts as a black box when you run testing data through it. Thus, validation data tunes the model, whereas testing data simply confirms that it works. • There is some semantic ambiguity between validation data and testing data. • Some organizations call testing datasets “validation datasets.” Ultimately, if there are three datasets to tune and check ML algorithms, validation data typically helps tune the algorithm and testing data provides the final assessment.
  • 33. Generalization • In machine learning, generalization describes how well a trained model classifies or forecasts unseen data. • A generalized machine learning model is one that, in general, works on any subset of unseen data. • An example is training a model to distinguish between dogs and cats. • If the model is given a dataset of dog images containing only two breeds, it may obtain good performance.
  • 34. Generalization • However, it may get a low classification score when it is also tested on other breeds of dogs. • As a result, an actual dog image from the unseen dataset may be classified as a cat. • Therefore, data diversity is a very important factor in making good predictions. • In the example above, the model may obtain an 85% performance score when it is tested on only two dog breeds, and 70% if it is trained on all breeds. • However, the first model may get a very low score (e.g. 45%) if it is evaluated on an unseen dataset containing all dog breeds.
  • 35. Generalization • The score of the latter can remain unchanged, given that it has been trained on highly diverse data including all possible breeds. • Note that data diversity is not the only factor to consider in order to obtain a generalized model. • Generalization can also be affected by the nature of the machine learning algorithm or by a poor hyperparameter configuration.
  • 36. Variance-bias trade-off • The prediction results of a machine learning model fall somewhere among • a) low-bias, low-variance • b) low-bias, high-variance • c) high-bias, low-variance • d) high-bias, high-variance. • A low-bias, high-variance model is said to be overfit, and a high-bias, low-variance model is said to be underfit.
  • 37. Variance-bias trade-off • Through generalization, we find the best trade-off between underfitting and overfitting so that a trained model obtains the best performance. • An overfit model obtains a high prediction score on seen data and a low one on unseen datasets. An underfit model has low performance on both seen and unseen datasets. Three models with underfitting (left), good fit (middle), and overfitting (right).
  • 38. Determinant factors to train generalized models • Dataset • To train a classifier that generalizes well, the dataset should be diverse. Note that this does not necessarily mean a huge dataset, but one that covers the different kinds of samples. • This keeps the classifier from learning only from a specific subset of the data, so it generalizes better. • In addition, during training, it is recommended to use cross-validation techniques such as K-fold or Monte Carlo cross-validation. These techniques help ensure that all portions of the data are used and help avoid producing an overfit model.
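The K-fold technique mentioned above can be sketched as an index generator: the data is cut into k folds, and each fold serves once as the validation set while the remaining folds form the training set. The function name is hypothetical, and shuffling before splitting (usually advisable) is left out to keep the sketch short.

```python
def k_fold_indices(n_samples, k):
    """Return k (train_indices, validation_indices) pairs covering all samples."""
    indices = list(range(n_samples))
    # The first (n_samples % k) folds get one extra sample so every sample is used.
    fold_sizes = [n_samples // k + (1 if i < n_samples % k else 0) for i in range(k)]
    folds = []
    start = 0
    for size in fold_sizes:
        val_idx = indices[start:start + size]
        train_idx = indices[:start] + indices[start + size:]
        folds.append((train_idx, val_idx))
        start += size
    return folds

folds = k_fold_indices(n_samples=10, k=3)
for train_idx, val_idx in folds:
    print("train:", train_idx, "validation:", val_idx)
```

A model would be trained and scored once per fold, and the k validation scores averaged.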
  • 39. Determinant factors to train generalized models • Machine Learning algorithm • Machine learning algorithms differ in how susceptible they are to overfitting and underfitting. • Overfitting is more likely with nonlinear, non-parametric machine learning algorithms. • For instance, a decision tree is a non-parametric machine learning algorithm, so its models are more prone to overfitting. • On the other hand, some machine learning models are too simple to capture complex underlying patterns in data. • This leads to an underfit model. Examples are linear and logistic regression.
  • 40. Determinant factors to train generalized models • Model complexity • When a machine learning model becomes too complex, it is usually prone to overfitting. Some methods help make the model simpler. • They are called regularization methods and are explained next. • Regularization • Regularization is a collection of methods for making a machine learning model simpler. • To this end, specific approaches are applied to different machine learning algorithms: for instance, pruning for decision trees, dropout for neural networks, and adding a penalty term to the cost function in regression.
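Adding a penalty to the cost function in regression can be illustrated with an L2 (ridge-style) penalty. To keep the sketch short it assumes a single feature and a line through the origin, so the cost is sum((w·x − y)²) + penalty·w²; setting its derivative to zero gives w = Σxy / (Σx² + penalty). The function name and data are hypothetical.

```python
def fit_slope_with_l2(xs, ys, penalty=0.0):
    """Minimise sum((w * x - y)^2) + penalty * w^2 for a line through the origin."""
    sum_xy = sum(x * y for x, y in zip(xs, ys))
    sum_xx = sum(x * x for x in xs)
    # Closed-form minimiser; a larger penalty shrinks the slope toward zero.
    return sum_xy / (sum_xx + penalty)

# Hypothetical data lying exactly on y = 2x.
xs = [1, 2, 3]
ys = [2, 4, 6]

for penalty in (0.0, 1.0, 14.0):
    print(f"penalty={penalty}: w={fit_slope_with_l2(xs, ys, penalty):.3f}")
```

The penalty trades a slightly worse fit on the training data for a smaller, simpler parameter value.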