Chapter Two
Machine Learning
“To gain knowledge or understanding of, or skill in, by
study, instruction, or experience.”
 Learning a set of new facts.
 Learning HOW to do something.
 Improving the ability of something already learned.
What is Machine Learning?
 Machine Learning is the study of methods for programming
computers to learn.
 Building machines that automatically learn from experience.
 Enable computers to learn without being explicitly
programmed (Arthur Samuel, 1959, at IBM).
What is Learning?
 Learning is gaining knowledge from experience.
 A computer program is said to learn from experience E
with respect to some class of tasks T and performance
measure P, if its performance at tasks in T, as measured by P,
improves with experience E.
ML
Examples: i) Handwriting recognition learning problem
• Task T: Recognizing and classifying handwritten words within
images
• Performance P: Percent of words correctly classified
• Training experience E: A dataset of handwritten words with given
classifications
ii) A robot driving learning problem
• Task T: Driving on highways using vision sensors
• Performance measure P: Average distance traveled before an
error
• Training experience E: A sequence of images and steering
commands recorded while observing a human driver
ML
 It is tough to write programs that solve complex problems
(even defining the requirements is hard).
 Computing the probability that a credit card transaction is
fraudulent.
 Recognizing a three-dimensional object.
 We don’t know what program to write because we don’t know
how it is done. (tacit knowledge not explicit)
 Even if we had a good idea about how to do it, the program
might be complicated.
Why ML?
 There may not be rules that are both simple and reliable.
 We need to combine a huge number of weak rules.
 Maybe the rules are changing frequently (dynamic)?
 E.g., fraud is a moving target.
 The program needs to keep changing.
 Instead of writing a program by hand for each specific task,
we collect many examples that specify the correct output for
a given input.
 A machine learning algorithm then takes these examples and
produces a program that does the job.
Why ML?
 The program produced by the learning algorithm may look
very different from a typical hand-written program.
 If we do it right, the program works for new cases as well as
the ones we trained it on.
 If the data changes, the program can change too by retraining
on the new data.
 Massive amounts of computation are now cheaper than
paying someone to write a task-specific program.
Why ML?
 Machine Learning is great for:
 Problems for which existing solutions require a lot of
hand-tuning or long lists of rules,
 Complex problems for which there is no good solution at
all using a traditional approach,
 Fluctuating environments: an ML system can adapt to new
data,
 Getting insights about complex problems and large
amounts of data.
 With a machine learning approach, the program is much
shorter, easier to maintain, and most likely more accurate.
Why ML?
The general structure of a learning system
Machine learning vs. “classic” (traditional) programming
In general, ML algorithms can be classified into 3 types.
1. Supervised Learning
• Classification
• Regression/Prediction
2. Unsupervised Learning
• Clustering
• Dimensionality Reduction
3. Reinforcement Learning
Types of Machine Learning
 Supervised Learning is a machine learning technique that
uses a collection of paired input-output training samples to
learn the input-output relationship of a system.
 Supervision: The training data (observations,
measurements, etc.) are accompanied by labels indicating
the class of the observations.
 New data is classified based on the training set.
Supervised Learning
 Supervised learning (function approximation): from well-
labeled data, learn a function that maps an input to an
output, output = f(input).
 Learn to predict an output when given an input vector
(see the sketch after this slide).
E.g., features: age, gender, smoking, drinking, etc.;
labels: has the disease / does not have the disease.
Supervised Learning
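Below is a minimal sketch of this setup, assuming scikit-learn is available; the patients, feature values, and labels are invented purely for illustration:

```python
# A minimal supervised-learning sketch: fit a classifier to labeled
# examples, then predict the label of a new, unseen input.
from sklearn.linear_model import LogisticRegression

# Features: [age, gender (0/1), smokes (0/1), drinks (0/1)] - toy data.
X = [[63, 1, 1, 0],
     [25, 0, 0, 1],
     [47, 1, 1, 1],
     [34, 0, 0, 0],
     [58, 1, 0, 1],
     [29, 0, 1, 0]]
# Labels: 1 = has the disease, 0 = does not have the disease.
y = [1, 0, 1, 0, 1, 0]

model = LogisticRegression().fit(X, y)   # learn output = f(input)
print(model.predict([[50, 1, 1, 0]]))    # predict for a new patient
```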
 Supervised Learning:
 Regression Problem: numerical / continuous value.
 Given some data, you assume that those values come
from some sort of function and try to find out what the
function is.
 It is a problem of function approximation or
interpolation.
 Classification Problem: nominal / discrete value.
 Grouping the data into predetermined classes.
Supervised Learning
 Example:
Supervised Learning
Classification
 predicts categorical class labels (discrete or
nominal)
 constructs a model from the training set and the
values (class labels) of a classifying attribute, and
uses the model to classify new data
Regression
 Regression is a type of supervised learning
task in which the output has a continuous value.
 The term regression is used when you try to find
the relationship between variables.
 It is used to understand the relationship between
dependent and independent variables (a minimal
sketch follows this slide).
Supervised Learning
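A minimal regression sketch, assuming scikit-learn and NumPy; the data are synthetic, generated around the line y = 2x + 1:

```python
# Regression: fit a continuous-valued function to (input, output) pairs.
import numpy as np
from sklearn.linear_model import LinearRegression

rng = np.random.default_rng(0)
X = rng.uniform(0, 10, size=(50, 1))                  # one input variable
y = 2 * X.ravel() + 1 + rng.normal(0, 0.5, size=50)   # noisy line

reg = LinearRegression().fit(X, y)
print(reg.coef_, reg.intercept_)   # should be close to 2 and 1
print(reg.predict([[4.0]]))        # continuous prediction for a new input
```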
 Unsupervised machine learning is the process of inferring
underlying hidden patterns from historical data.
 With such an approach, a machine learning model tries to
find any similarities, differences, patterns, and structure in
the data by itself.
 Used when the information used to train is neither classified
nor labeled.
 There is no complete and clean labeled dataset.
 No prior human intervention is needed.
Unsupervised Learning
 Unsupervised learning aims to find clusters of
similar inputs in the data without being explicitly told that
some data points belong to one class and others to other
classes.
 The algorithm has to discover this similarity by itself.
 Discover a good internal representation of the input.
 Unsupervised learning: extracting structure from
data.
 Example: segment grocery store shoppers into clusters
that exhibit similar behaviors.
 There is no “right answer”.
 Clustering
Unsupervised Learning
Clustering
 Clustering automatically categorizes data into groups
according to similarity criteria.
 It evaluates similarity based on a metric such as Euclidean
distance, cosine similarity, or Manhattan distance (see the
sketch after this slide).
Unsupervised Learning
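A minimal clustering sketch, assuming scikit-learn; k-means groups unlabeled points by Euclidean distance, and the data here are synthetic blobs:

```python
# k-means clustering on unlabeled data: the true labels from make_blobs
# are deliberately discarded, since clustering receives no supervision.
from sklearn.cluster import KMeans
from sklearn.datasets import make_blobs

X, _ = make_blobs(n_samples=150, centers=3, random_state=0)  # labels ignored
km = KMeans(n_clusters=3, n_init=10, random_state=0).fit(X)

print(km.labels_[:10])       # cluster assignment per point
print(km.cluster_centers_)   # learned group centers
```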
Dimensionality reduction
 In many learning problems, the datasets have a large
number of variables.
 For example, such situations have arisen in many scientific
fields such as image processing, time series analysis,
internet search engines, and automatic text analysis among
others.
 Statistical and machine learning methods have some
difficulty when dealing with such high-dimensional data.
 Normally the number of input variables is reduced before
the machine learning algorithms can be successfully applied.
Unsupervised Learning
Dimensionality reduction
 In statistics and machine learning, dimensionality reduction
is the process of reducing the number of variables under
consideration by obtaining a smaller set of principal
variables.
 It addresses the number of attributes of the dataset by
transforming it from its original representation to one with a
reduced set of features.
 The goal is to obtain a new dataset that preserves, to a
degree, the original structure of the data, so that its analysis
yields the same or equivalent patterns as the original.
Unsupervised Learning
Dimensionality reduction
There are two main approaches: feature selection and feature
extraction (contrasted in the sketch after this slide).
1. Feature Selection
 Finds the k of the n total features that give us the most
information and discards the other (n−k) dimensions.
 Only the most relevant variables from the original dataset
are kept.
2. Feature Extraction
 Transforms the space containing many dimensions into a
space with fewer dimensions.
 Used when we keep the whole information but use fewer
resources while processing it.
Unsupervised Learning
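A minimal sketch contrasting the two approaches on scikit-learn's built-in Iris data (assumed available); note that SelectKBest, as used here, scores features against the labels, i.e., a supervised selection criterion:

```python
# Feature selection keeps k of the original features; feature extraction
# (PCA) builds k new features as combinations of all the original ones.
from sklearn.datasets import load_iris
from sklearn.decomposition import PCA
from sklearn.feature_selection import SelectKBest, f_classif

X, y = load_iris(return_X_y=True)             # 4 original features

# Feature selection: keep the k=2 most informative original features.
X_sel = SelectKBest(f_classif, k=2).fit_transform(X, y)

# Feature extraction: project all 4 features onto 2 new PCA axes.
X_pca = PCA(n_components=2).fit_transform(X)

print(X.shape, X_sel.shape, X_pca.shape)      # (150, 4) (150, 2) (150, 2)
```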
Curse of Dimensionality
 Refers to the challenges that arise when working with high-
dimensional data.
 High-dimensional data is challenging to handle.
 More features increase model complexity and the risk of
overfitting.
 Overfitting, in turn, leads to poor performance on new
data.
Unsupervised Learning
The main drawbacks of high-dimensional datasets are
 Increased data requirements: More records are needed to
represent all feature combinations.
 Overfitting risk: More features can lead to overly complex
models that fit to outliers.
 Longer training times: Higher dimensionality increases
computational complexity, slowing training.
 Higher storage needs: Larger datasets consume more
storage space.
Cont.
How can an agent learn behaviors when it doesn’t have a
supervisor (teacher) to tell it how to perform?
 The agent has a task to perform
 It takes some actions in the world
 At some later point, it gets feedback telling it how well it
did on performing the task
 The agent performs the same task over and over again
This problem is called reinforcement learning:
 The agent gets positive reinforcement for tasks done well
 The agent gets negative reinforcement for tasks done
poorly
Reinforcement learning
 RL is learning from interaction with an environment to
achieve some long-term goal that is related to the state of
the environment.
 The goal is to get the agent to act in the world so as to
maximize its rewards.
 The agent has to figure out what it did that earned it the
reward/punishment.
 RL is applicable to game playing, robot control, and more.
Reinforcement learning
Component Definitions
 State: The current situation of
the agent in the environment.
 Action: The decision made by
the agent that affects the state.
 Reward: The feedback given
by the environment based on
the agent’s action.
 Perception: How the agent
observes and interprets its
environment to construct the
state (a toy sketch follows this
slide).
Reinforcement learning
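A toy tabular Q-learning sketch illustrating state, action, and reward; the environment (a 5-state corridor with a reward at the right end) and all constants are invented for illustration, not a standard benchmark:

```python
# Tabular Q-learning on a 5-state corridor: the agent moves left/right
# and receives a reward only on reaching the rightmost (goal) state.
import random

n_states, actions = 5, [0, 1]             # action 0 = left, 1 = right
Q = [[0.0, 0.0] for _ in range(n_states)]
alpha, gamma, epsilon = 0.5, 0.9, 0.1     # learning rate, discount, exploration

for episode in range(200):
    s = 0                                 # state: start at the left end
    while s != n_states - 1:              # episode ends at the goal
        if random.random() < epsilon:     # explore
            a = random.choice(actions)
        else:                             # exploit (random tie-breaking)
            a = max(actions, key=lambda a: (Q[s][a], random.random()))
        s2 = max(0, s - 1) if a == 0 else s + 1
        r = 1.0 if s2 == n_states - 1 else 0.0        # reward at the goal only
        Q[s][a] += alpha * (r + gamma * max(Q[s2]) - Q[s][a])  # Q-update
        s = s2

print(Q)  # right-moving actions should end up with the higher values
```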
Model
 A model is a program trained to find patterns in data and
make predictions.
 How it works:
 Input Data: receives data requests.
 Prediction: analyzes the input to make predictions.
 Output: provides responses based on predictions.
 Training Process:
 Initial Training: Models are trained on a dataset.
 Learning: Algorithms reason over data, extract patterns,
and learn.
 Usage:
 Once trained, models predict outcomes on new, unseen
data.
Model Evaluation
 Model evaluation is the process of using different evaluation
metrics to understand a machine learning model’s
performance and its strengths and weaknesses.
 Evaluation is necessary for ensuring that machine learning
models are reliable, generalizable, and capable of making
accurate predictions on new, unseen data.
 The two biggest causes of poor performance of machine
learning algorithms are:
 Overfitting and
 Underfitting
Model Evaluation
Overfitting: occurs when a model performs very well for
training data but performs poorly with test data (new data).
 Overfitting can happen due to low bias and high
variance.
Underfitting: Occurs when the model cannot adequately
capture the underlying structure of the data.
Right Fit: Occurs when both the training error and the test
error are minimal.
Model Evaluation
Confusion Matrix
 A confusion matrix is a table that is often used to describe
the performance of a classification model (or “classifier”)
on a set of test data for which the true values are known.
 Each prediction falls into one of four cells, determined by
the predicted value and the actual value.
Model Evaluation Metrics
True Positive (TP): predicted positive, and it’s true.
True Negative (TN): predicted negative, and it’s true.
False Positive (FP): predicted positive, and it’s false.
False Negative (FN): predicted negative, and it’s false.
(A sketch computing these, and the metrics derived from them,
follows.)
Model Evaluation Metrics
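A minimal sketch computing a confusion matrix and the standard metrics derived from it, assuming scikit-learn; the label vectors are invented:

```python
# Confusion matrix and derived metrics for a binary classifier.
from sklearn.metrics import (confusion_matrix, accuracy_score,
                             precision_score, recall_score, f1_score)

y_true = [1, 0, 1, 1, 0, 1, 0, 0, 1, 0]   # actual values
y_pred = [1, 0, 1, 0, 0, 1, 1, 0, 1, 0]   # predicted values

tn, fp, fn, tp = confusion_matrix(y_true, y_pred).ravel()
print("TP, TN, FP, FN:", tp, tn, fp, fn)

# Standard definitions: accuracy = (TP+TN)/all, precision = TP/(TP+FP),
# recall = TP/(TP+FN), F1 = 2*precision*recall/(precision+recall).
print("accuracy :", accuracy_score(y_true, y_pred))
print("precision:", precision_score(y_true, y_pred))
print("recall   :", recall_score(y_true, y_pred))
print("F1       :", f1_score(y_true, y_pred))
```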
ROC curve
 It is a visual representation of model performance across all
classification thresholds.
 It works by plotting the true positive rate (TPR) on the y-axis
against the false positive rate (FPR) on the x-axis.
Area Under the Curve (AUC)
 Measures the overall performance of a binary classification
model.
 As both TPR and FPR range between 0 and 1, the area
always lies between 0 and 1; a greater AUC denotes better
model performance.
Model Evaluation Metrics
 The main goal is to maximize this area, i.e., to obtain the
highest TPR and lowest FPR at each threshold (see the
sketch after this slide).
Model Evaluation Metrics
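A minimal ROC/AUC sketch, assuming scikit-learn; the true labels and model scores below are invented:

```python
# ROC curve points (FPR, TPR) across thresholds, plus the AUC score.
from sklearn.metrics import roc_curve, roc_auc_score

y_true   = [0, 0, 1, 1, 0, 1, 1, 0]
y_scores = [0.1, 0.4, 0.35, 0.8, 0.2, 0.7, 0.9, 0.5]  # model probabilities

fpr, tpr, thresholds = roc_curve(y_true, y_scores)
print("FPR:", fpr)
print("TPR:", tpr)
print("AUC:", roc_auc_score(y_true, y_scores))  # always between 0 and 1
```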
Reading Assignment
 Mean Absolute Error
 Mean Squared Error
 Root Mean Square Error
(A starting-point sketch computing all three follows.)
Model Evaluation Metrics
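As a starting point for the reading assignment, here are the three regression error metrics computed by hand with NumPy; the values are invented for illustration:

```python
# MAE, MSE, and RMSE from their standard definitions.
import numpy as np

y_true = np.array([3.0, 5.0, 2.5, 7.0])
y_pred = np.array([2.5, 5.0, 4.0, 8.0])

mae  = np.mean(np.abs(y_true - y_pred))   # Mean Absolute Error
mse  = np.mean((y_true - y_pred) ** 2)    # Mean Squared Error
rmse = np.sqrt(mse)                       # Root Mean Square Error
print(mae, mse, rmse)
```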
 Model selection is the process of deciding which algorithm
and model architecture is best suited for a particular task or
dataset.
 The first step in this process is to define a suitable
evaluation metric that matches the objectives of the
particular situation.
 Making a wise selection frequently calls for an iterative
process of testing several models and hyperparameter
settings.
 The objective is to find a model that fits the training data
well and generalizes well to new data.
Model Selection
Train-Test Split
 With this strategy, the available data is divided into two sets:
 a training set &
 a separate test set.
 The models are evaluated using a predetermined evaluation
metric on the test set after being trained on the training set.
Cross-Validation
 Divides the data into several groups, or folds.
 One or more folds are used as the test set and the
remaining folds as the training set; the models are trained
and evaluated on each fold in turn (both strategies are
sketched after this slide).
Model Selection Techniques
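A minimal sketch of both strategies, assuming scikit-learn and its built-in Iris data:

```python
# Hold-out evaluation (train-test split) vs. 5-fold cross-validation.
from sklearn.datasets import load_iris
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import train_test_split, cross_val_score

X, y = load_iris(return_X_y=True)
clf = LogisticRegression(max_iter=1000)

# Train-test split: hold out 20% of the data for evaluation.
X_tr, X_te, y_tr, y_te = train_test_split(X, y, test_size=0.2, random_state=0)
print("hold-out accuracy:", clf.fit(X_tr, y_tr).score(X_te, y_te))

# Cross-validation: train and evaluate on 5 different folds.
print("5-fold accuracies:", cross_val_score(clf, X, y, cv=5))
```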
 The main purpose of cross-validation is to prevent overfitting.
 By evaluating the model on multiple validation sets, cross-
validation provides a more realistic estimate of the model’s
generalization performance.
 Frequently used types of cross-validation:
 K-fold cross-validation
 Stratified cross-validation
Model Selection Techniques
K-fold Cross-validation
 We split the dataset into k subsets (folds), train on k−1 of
them, and hold out the remaining one to evaluate the
trained model.
 In this method, we iterate k times, with a different subset
reserved for testing each time (an explicit loop is sketched
after this slide).
Model Selection Techniques
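An explicit k-fold loop (k = 5) showing how the held-out fold rotates, assuming scikit-learn:

```python
# K-fold cross-validation written out as a loop over the k folds.
from sklearn.datasets import load_iris
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import KFold

X, y = load_iris(return_X_y=True)
kf = KFold(n_splits=5, shuffle=True, random_state=0)

for i, (train_idx, test_idx) in enumerate(kf.split(X)):
    # Train on k-1 folds, evaluate on the one held-out fold.
    clf = LogisticRegression(max_iter=1000).fit(X[train_idx], y[train_idx])
    print(f"fold {i}: accuracy = {clf.score(X[test_idx], y[test_idx]):.3f}")
```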
Stratified Cross-validation
 Used to ensure that each fold of the cross-validation process
maintains the same class distribution as the entire dataset.
 This is particularly important when dealing with imbalanced
datasets, where certain classes may be underrepresented.
 In this method,
 The dataset is divided into k folds while maintaining the
proportion of classes in each fold (see the sketch after
this slide).
Model Selection Techniques
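A minimal stratified k-fold sketch, assuming scikit-learn; the deliberately imbalanced toy labels (80% class 0, 20% class 1) are invented:

```python
# Stratified k-fold: every test fold preserves the class proportions.
import numpy as np
from sklearn.model_selection import StratifiedKFold

X = np.arange(20).reshape(-1, 1)
y = np.array([0] * 16 + [1] * 4)     # imbalanced: 16 of class 0, 4 of class 1

skf = StratifiedKFold(n_splits=4, shuffle=True, random_state=0)
for train_idx, test_idx in skf.split(X, y):
    # Each test fold contains exactly one minority-class sample.
    print("test fold class counts:", np.bincount(y[test_idx]))
```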
Hyperparameters
 They are parameters whose values control the learning
process and determine the values of the model parameters
that a learning algorithm ends up learning.
 The prefix ‘hyper-’ suggests that they are ‘top-level’
parameters that control the learning process and the model
parameters that result from it.
 They are said to be external to the model because the
model cannot change their values during learning/training.
 They are used by the learning algorithm while it is learning,
but they are not part of the resulting model.
Model Selection Techniques
Some common examples of hyperparameters:
 Learning rate in optimization algorithms (e.g., gradient
descent)
 Choice of optimization algorithm (e.g., gradient descent,
stochastic gradient descent, or the Adam optimizer)
 Activation function in a neural network (NN) layer (e.g.,
Sigmoid, ReLU, Tanh)
 Loss function
 Number of hidden layers in an NN
 Number of neurons in each layer
 Dropout rate in an NN
 Number of iterations (epochs) in training an NN
 Number of clusters in a clustering task
 Kernel or filter size in convolutional layers
 Pooling size
 Batch size
Model Selection Techniques
Hyperparameters Tuning
 It is the process of selecting the optimal values for a
machine learning model’s hyperparameters.
 Models can have many hyperparameters and finding the
best combination of parameters can be treated as a search
problem.
 Common strategies for hyperparameter tuning include:
 Grid Search
 Randomized Search
 Bayesian Optimization
Model Selection Techniques
Hyperparameter tuning example
Model Selection Techniques
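A minimal tuning sketch contrasting grid search and randomized search over an SVM's C and gamma, assuming scikit-learn; the grids are illustrative, not recommended defaults:

```python
# Grid search tries every combination; randomized search samples a few.
from sklearn.datasets import load_iris
from sklearn.model_selection import GridSearchCV, RandomizedSearchCV
from sklearn.svm import SVC

X, y = load_iris(return_X_y=True)
param_grid = {"C": [0.1, 1, 10], "gamma": [0.01, 0.1, 1]}

grid = GridSearchCV(SVC(), param_grid, cv=5).fit(X, y)       # all 9 combos
print("grid search best:  ", grid.best_params_, grid.best_score_)

rand = RandomizedSearchCV(SVC(), param_grid, n_iter=5, cv=5,
                          random_state=0).fit(X, y)          # 5 random combos
print("random search best:", rand.best_params_, rand.best_score_)
```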
 Select one ML/DL algorithm and discuss:
 Its overfitting-handling techniques
 Its hyperparameters and how they relate to one another
Due date: 08/10/2024
Presentation
Model Selection Techniques
