Chapter Two
Machine Learning
“To gain knowledge or understanding of, or skill in, by
study, instruction, or experience.”
 Learning a set of new facts.
 Learning HOW to do something.
 Improving the ability of something already learned.
What is Machine Learning?
 Machine Learning is the study of methods for programming
computers to learn.
 Building machines that automatically learn from experience.
 Enable computers to learn without being explicitly
programmed (Arthur Samuel, 1959, at IBM).
What is Learning?
 Learning is gaining knowledge from experience.
 A computer program is said to learn from experience E
with respect to some class of tasks T and performance
measure P, if its performance at tasks in T, as measured by P,
improves with experience E.
ML
Examples: i) Handwriting recognition learning problem
• Task T: Recognizing and classifying handwritten words within
images
• Performance P: Percent of words correctly classified
• Training experience E: A dataset of handwritten words with given
classifications
ii) A robot driving learning problem
• Task T: Driving on highways using vision sensors
• Performance measure P: Average distance traveled before an
error
• Training experience E: A sequence of images and steering
commands recorded while observing a human driver
ML
 It is tough to write programs that solve complex problems
(even defining the requirements is hard).
 Computing the probability that a credit card transaction is
fraudulent.
 Recognizing a three-dimensional object.
 We don’t know what program to write because we don’t know
how it is done. (tacit knowledge not explicit)
 Even if we had a good idea about how to do it, the program
might be complicated.
Why ML?
 There may not be rules that are both simple and reliable.
 We need to combine a huge number of weak rules.
 Maybe the rules are changing frequently (dynamic)?
 E.g., fraud is a moving target.
 The program needs to keep changing.
 Instead of writing a program by hand for each specific task,
we collect many examples that specify the correct output for
a given input.
 A machine learning algorithm then takes these examples and
produces a program that does the job.
Why ML?
 The program produced by the learning algorithm may look
very different from a typical hand-written program.
 If we do it right, the program works for new cases as well as
the ones we trained it on.
 If the data changes, the program can change too by retraining
on the new data.
 Massive amounts of computation are now cheaper than
paying someone to write a task-specific program.
Why ML?
 Machine Learning is great for:
 Problems for which existing solutions require a lot of
hand-tuning or long lists of rules,
 Complex problems for which there is no good solution at
all using a traditional approach,
 Fluctuating environments: an ML system can adapt to new
data,
 Getting insights about complex problems and large
amounts of data.
 With a machine learning approach, the program is much
shorter, easier to maintain, and most likely more accurate.
Why ML?
The general structure of a learning system
Machine learning vs. “classic” (traditional) programming
In general, ML algorithms can be classified into 3 types.
1. Supervised Learning
• Classification
• Regression/Prediction
2. Unsupervised Learning
• Clustering
• Dimensionality Reduction
3. Reinforcement Learning
Types of Machine Learning
 Supervised Learning is a machine learning technique that
uses a collection of paired input-output training samples to
learn the input-output relationship of a system.
 Supervision: The training data (observations,
measurements, etc.) are accompanied by labels indicating
the class of the observations.
 New data is classified based on the training set.
Supervised Learning
 Supervised learning (function approximation): from well-
labeled data, learn a function that maps an input to an
output, output = f(input).
 Learn to predict an output when given an input vector
(see the sketch after this slide).
E.g., features: age, gender, smoking, drinking, etc.;
labels: has the disease / does not have the disease.
Supervised Learning
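Below is a minimal sketch of this setup, assuming scikit-learn is available; the patients, feature values, and labels are invented purely for illustration:

```python
# A minimal supervised-learning sketch: fit a classifier to labeled
# examples, then predict the label of a new, unseen input.
from sklearn.linear_model import LogisticRegression

# Features: [age, gender (0/1), smokes (0/1), drinks (0/1)] - toy data.
X = [[63, 1, 1, 0],
     [25, 0, 0, 1],
     [47, 1, 1, 1],
     [34, 0, 0, 0],
     [58, 1, 0, 1],
     [29, 0, 1, 0]]
# Labels: 1 = has the disease, 0 = does not have the disease.
y = [1, 0, 1, 0, 1, 0]

model = LogisticRegression().fit(X, y)   # learn output = f(input)
print(model.predict([[50, 1, 1, 0]]))    # predict for a new patient
```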
 Supervised Learning:
 Regression Problem: numerical / continuous value.
 Given some data, you assume that those values come
from some sort of function and try to find out what the
function is.
 It is a problem of function approximation or
interpolation.
 Classification Problem: nominal / discrete value.
 Grouping the data into predetermined classes.
Supervised Learning
 Example:
Supervised Learning
Classification
 predicts categorical class labels (discrete or
nominal)
 constructs a model from the training set and the
values (class labels) of a classifying attribute, and
uses the model to classify new data
Regression
 Regression is a type of supervised learning
task in which the output has a continuous value.
 The term regression is used when you try to find
the relationship between variables.
 It is used to understand the relationship between
dependent and independent variables (a minimal
sketch follows this slide).
Supervised Learning
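A minimal regression sketch, assuming scikit-learn and NumPy; the data are synthetic, generated around the line y = 2x + 1:

```python
# Regression: fit a continuous-valued function to (input, output) pairs.
import numpy as np
from sklearn.linear_model import LinearRegression

rng = np.random.default_rng(0)
X = rng.uniform(0, 10, size=(50, 1))                  # one input variable
y = 2 * X.ravel() + 1 + rng.normal(0, 0.5, size=50)   # noisy line

reg = LinearRegression().fit(X, y)
print(reg.coef_, reg.intercept_)   # should be close to 2 and 1
print(reg.predict([[4.0]]))        # continuous prediction for a new input
```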
 Unsupervised machine learning is the process of inferring
underlying hidden patterns from historical data.
 With such an approach, a machine learning model tries to
find any similarities, differences, patterns, and structure in
the data by itself.
 Used when the information used to train is neither classified
nor labeled.
 There is no complete and clean labeled dataset.
 No prior human intervention is needed.
Unsupervised Learning
 Unsupervised learning aims to find clusters of
similar inputs in the data without being explicitly told that
some data points belong to one class and others to other
classes.
 The algorithm has to discover this similarity by itself.
 Discover a good internal representation of the input.
 Unsupervised learning: extracting structure from
data.
 Example: segment grocery store shoppers into clusters
that exhibit similar behaviors.
 There is no “right answer”.
 Clustering
Unsupervised Learning
Clustering
 Clustering automatically categorizes data into groups
according to similarity criteria.
 It evaluates similarity based on a metric such as Euclidean
distance, cosine similarity, or Manhattan distance (see the
sketch after this slide).
Unsupervised Learning
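A minimal clustering sketch, assuming scikit-learn; k-means groups unlabeled points by Euclidean distance, and the data here are synthetic blobs:

```python
# k-means clustering on unlabeled data: the true labels from make_blobs
# are deliberately discarded, since clustering receives no supervision.
from sklearn.cluster import KMeans
from sklearn.datasets import make_blobs

X, _ = make_blobs(n_samples=150, centers=3, random_state=0)  # labels ignored
km = KMeans(n_clusters=3, n_init=10, random_state=0).fit(X)

print(km.labels_[:10])       # cluster assignment per point
print(km.cluster_centers_)   # learned group centers
```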
Dimensionality reduction
 In many learning problems, the datasets have a large
number of variables.
 For example, such situations have arisen in many scientific
fields such as image processing, time series analysis,
internet search engines, and automatic text analysis among
others.
 Statistical and machine learning methods have some
difficulty when dealing with such high-dimensional data.
 Normally the number of input variables is reduced before
the machine learning algorithms can be successfully applied.
Unsupervised Learning
Dimensionality reduction
 In statistics and machine learning, dimensionality reduction
is the process of reducing the number of variables under
consideration by obtaining a smaller set of principal
variables.
 It addresses the number of attributes of the dataset by
transforming it from its original representation to one with a
reduced set of features.
 The goal is to obtain a new dataset that preserves, to a
degree, the original structure of the data, so that its analysis
yields the same or equivalent patterns as the original.
Unsupervised Learning
Dimensionality reduction
There are two main approaches: feature selection and feature
extraction (contrasted in the sketch after this slide).
1. Feature Selection
 Finds the k of the n total features that give us the most
information and discards the other (n−k) dimensions.
 Only the most relevant variables from the original dataset
are kept.
2. Feature Extraction
 Transforms the space containing many dimensions into a
space with fewer dimensions.
 Used when we keep the whole information but use fewer
resources while processing it.
Unsupervised Learning
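A minimal sketch contrasting the two approaches on scikit-learn's built-in Iris data (assumed available); note that SelectKBest, as used here, scores features against the labels, i.e., a supervised selection criterion:

```python
# Feature selection keeps k of the original features; feature extraction
# (PCA) builds k new features as combinations of all the original ones.
from sklearn.datasets import load_iris
from sklearn.decomposition import PCA
from sklearn.feature_selection import SelectKBest, f_classif

X, y = load_iris(return_X_y=True)             # 4 original features

# Feature selection: keep the k=2 most informative original features.
X_sel = SelectKBest(f_classif, k=2).fit_transform(X, y)

# Feature extraction: project all 4 features onto 2 new PCA axes.
X_pca = PCA(n_components=2).fit_transform(X)

print(X.shape, X_sel.shape, X_pca.shape)      # (150, 4) (150, 2) (150, 2)
```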
Curse of Dimensionality
 Refers to the challenges that arise when working with high-
dimensional data.
 High-dimensional data is challenging to handle.
 More features increase model complexity and the risk of
overfitting.
 Overfitting, in turn, leads to poor performance on new
data.
Unsupervised Learning
The main drawbacks of high-dimensional datasets are
 Increased data requirements: More records are needed to
represent all feature combinations.
 Overfitting risk: More features can lead to overly complex
models that fit to outliers.
 Longer training times: Higher dimensionality increases
computational complexity, slowing training.
 Higher storage needs: Larger datasets consume more
storage space.
Cont.
How can an agent learn behaviors when it doesn’t have a
supervisor (teacher) to tell it how to perform?
 The agent has a task to perform
 It takes some actions in the world
 At some later point, it gets feedback telling it how well it
did on performing the task
 The agent performs the same task over and over again
This problem is called reinforcement learning:
 The agent gets positive reinforcement for tasks done well
 The agent gets negative reinforcement for tasks done
poorly
Reinforcement learning
 RL is learning from interaction with an environment to
achieve some long-term goal that is related to the state of
the environment.
 The goal is to get the agent to act in the world so as to
maximize its rewards.
 The agent has to figure out what it did that earned it the
reward/punishment.
 RL is applicable to game playing, robot control, and more.
Reinforcement learning
Component Definitions
 State: The current situation of
the agent in the environment.
 Action: The decision made by
the agent that affects the state.
 Reward: The feedback given
by the environment based on
the agent’s action.
 Perception: How the agent
observes and interprets its
environment to construct the
state (a toy sketch follows this
slide).
Reinforcement learning
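A toy tabular Q-learning sketch illustrating state, action, and reward; the environment (a 5-state corridor with a reward at the right end) and all constants are invented for illustration, not a standard benchmark:

```python
# Tabular Q-learning on a 5-state corridor: the agent moves left/right
# and receives a reward only on reaching the rightmost (goal) state.
import random

n_states, actions = 5, [0, 1]             # action 0 = left, 1 = right
Q = [[0.0, 0.0] for _ in range(n_states)]
alpha, gamma, epsilon = 0.5, 0.9, 0.1     # learning rate, discount, exploration

for episode in range(200):
    s = 0                                 # state: start at the left end
    while s != n_states - 1:              # episode ends at the goal
        if random.random() < epsilon:     # explore
            a = random.choice(actions)
        else:                             # exploit (random tie-breaking)
            a = max(actions, key=lambda a: (Q[s][a], random.random()))
        s2 = max(0, s - 1) if a == 0 else s + 1
        r = 1.0 if s2 == n_states - 1 else 0.0        # reward at the goal only
        Q[s][a] += alpha * (r + gamma * max(Q[s2]) - Q[s][a])  # Q-update
        s = s2

print(Q)  # right-moving actions should end up with the higher values
```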
Model
 A model is a program trained to find patterns in data and
make predictions.
 How it works:
 Input Data: receives data requests.
 Prediction: analyzes the input to make predictions.
 Output: provides responses based on predictions.
 Training Process:
 Initial Training: Models are trained on a dataset.
 Learning: Algorithms reason over data, extract patterns,
and learn.
 Usage:
 Once trained, models predict outcomes on new, unseen
data.
Model Evaluation
 Model evaluation is the process of using different evaluation
metrics to understand a machine learning model’s
performance and its strengths and weaknesses.
 Evaluation is necessary for ensuring that machine learning
models are reliable, generalizable, and capable of making
accurate predictions on new, unseen data.
 The two biggest causes of poor performance of machine
learning algorithms are:
 Overfitting and
 Underfitting
Model Evaluation
Overfitting: occurs when a model performs very well for
training data but performs poorly with test data (new data).
 Overfitting can happen due to low bias and high
variance.
Underfitting: Occurs when the model cannot adequately
capture the underlying structure of the data.
Right Fit: Occurs when both the training error and the test
error are minimal.
Model Evaluation
Confusion Matrix
 A confusion matrix is a table that is often used to describe
the performance of a classification model (or “classifier”)
on a set of test data for which the true values are known.
 Each prediction falls into one of four cells, determined by
the predicted value and the actual value.
Model Evaluation Metrics
True Positive (TP): predicted positive, and it’s true.
True Negative (TN): predicted negative, and it’s true.
False Positive (FP): predicted positive, and it’s false.
False Negative (FN): predicted negative, and it’s false.
(A sketch computing these, and the metrics derived from them,
follows.)
Model Evaluation Metrics
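A minimal sketch computing a confusion matrix and the standard metrics derived from it, assuming scikit-learn; the label vectors are invented:

```python
# Confusion matrix and derived metrics for a binary classifier.
from sklearn.metrics import (confusion_matrix, accuracy_score,
                             precision_score, recall_score, f1_score)

y_true = [1, 0, 1, 1, 0, 1, 0, 0, 1, 0]   # actual values
y_pred = [1, 0, 1, 0, 0, 1, 1, 0, 1, 0]   # predicted values

tn, fp, fn, tp = confusion_matrix(y_true, y_pred).ravel()
print("TP, TN, FP, FN:", tp, tn, fp, fn)

# Standard definitions: accuracy = (TP+TN)/all, precision = TP/(TP+FP),
# recall = TP/(TP+FN), F1 = 2*precision*recall/(precision+recall).
print("accuracy :", accuracy_score(y_true, y_pred))
print("precision:", precision_score(y_true, y_pred))
print("recall   :", recall_score(y_true, y_pred))
print("F1       :", f1_score(y_true, y_pred))
```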
ROC curve
 It is a visual representation of model performance across all
classification thresholds.
 It works by plotting the true positive rate (TPR) on the y-axis
against the false positive rate (FPR) on the x-axis.
Area Under the Curve (AUC)
 Measures the overall performance of a binary classification
model.
 As both TPR and FPR range between 0 and 1, the area
always lies between 0 and 1; a greater AUC denotes better
model performance.
Model Evaluation Metrics
 The main goal is to maximize this area, i.e., to obtain the
highest TPR and lowest FPR at each threshold (see the
sketch after this slide).
Model Evaluation Metrics
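A minimal ROC/AUC sketch, assuming scikit-learn; the true labels and model scores below are invented:

```python
# ROC curve points (FPR, TPR) across thresholds, plus the AUC score.
from sklearn.metrics import roc_curve, roc_auc_score

y_true   = [0, 0, 1, 1, 0, 1, 1, 0]
y_scores = [0.1, 0.4, 0.35, 0.8, 0.2, 0.7, 0.9, 0.5]  # model probabilities

fpr, tpr, thresholds = roc_curve(y_true, y_scores)
print("FPR:", fpr)
print("TPR:", tpr)
print("AUC:", roc_auc_score(y_true, y_scores))  # always between 0 and 1
```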
Reading Assignment
 Mean Absolute Error
 Mean Squared Error
 Root Mean Square Error
(A starting-point sketch computing all three follows.)
Model Evaluation Metrics
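As a starting point for the reading assignment, here are the three regression error metrics computed by hand with NumPy; the values are invented for illustration:

```python
# MAE, MSE, and RMSE from their standard definitions.
import numpy as np

y_true = np.array([3.0, 5.0, 2.5, 7.0])
y_pred = np.array([2.5, 5.0, 4.0, 8.0])

mae  = np.mean(np.abs(y_true - y_pred))   # Mean Absolute Error
mse  = np.mean((y_true - y_pred) ** 2)    # Mean Squared Error
rmse = np.sqrt(mse)                       # Root Mean Square Error
print(mae, mse, rmse)
```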
 Model selection is the process of deciding which algorithm
and model architecture is best suited for a particular task or
dataset.
 The first step in this process is to define a suitable
evaluation metric that matches the objectives of the
particular situation.
 Making a wise selection frequently calls for an iterative
process of testing several models and hyperparameter
settings.
 The objective is to find a model that fits the training data
well and generalizes well to new data.
Model Selection
Train-Test Split
 With this strategy, the available data is divided into two sets:
 a training set &
 a separate test set.
 The models are evaluated using a predetermined evaluation
metric on the test set after being trained on the training set.
Cross-Validation
 Divides the data into several groups, or folds.
 One or more folds are used as the test set and the
remaining folds as the training set; the models are trained
and evaluated on each fold in turn (both strategies are
sketched after this slide).
Model Selection Techniques
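A minimal sketch of both strategies, assuming scikit-learn and its built-in Iris data:

```python
# Hold-out evaluation (train-test split) vs. 5-fold cross-validation.
from sklearn.datasets import load_iris
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import train_test_split, cross_val_score

X, y = load_iris(return_X_y=True)
clf = LogisticRegression(max_iter=1000)

# Train-test split: hold out 20% of the data for evaluation.
X_tr, X_te, y_tr, y_te = train_test_split(X, y, test_size=0.2, random_state=0)
print("hold-out accuracy:", clf.fit(X_tr, y_tr).score(X_te, y_te))

# Cross-validation: train and evaluate on 5 different folds.
print("5-fold accuracies:", cross_val_score(clf, X, y, cv=5))
```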
 The main purpose of cross-validation is to prevent overfitting.
 By evaluating the model on multiple validation sets, cross-
validation provides a more realistic estimate of the model’s
generalization performance.
 Frequently used types of cross-validation:
 K-fold cross-validation
 Stratified cross-validation
Model Selection Techniques
K-fold Cross-validation
 We split the dataset into k subsets (folds), train on k−1 of
them, and hold out the remaining one to evaluate the
trained model.
 In this method, we iterate k times, with a different subset
reserved for testing each time (an explicit loop is sketched
after this slide).
Model Selection Techniques
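An explicit k-fold loop (k = 5) showing how the held-out fold rotates, assuming scikit-learn:

```python
# K-fold cross-validation written out as a loop over the k folds.
from sklearn.datasets import load_iris
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import KFold

X, y = load_iris(return_X_y=True)
kf = KFold(n_splits=5, shuffle=True, random_state=0)

for i, (train_idx, test_idx) in enumerate(kf.split(X)):
    # Train on k-1 folds, evaluate on the one held-out fold.
    clf = LogisticRegression(max_iter=1000).fit(X[train_idx], y[train_idx])
    print(f"fold {i}: accuracy = {clf.score(X[test_idx], y[test_idx]):.3f}")
```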
Stratified Cross-validation
 Used to ensure that each fold of the cross-validation process
maintains the same class distribution as the entire dataset.
 This is particularly important when dealing with imbalanced
datasets, where certain classes may be underrepresented.
 In this method,
 The dataset is divided into k folds while maintaining the
proportion of classes in each fold (see the sketch after
this slide).
Model Selection Techniques
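A minimal stratified k-fold sketch, assuming scikit-learn; the deliberately imbalanced toy labels (80% class 0, 20% class 1) are invented:

```python
# Stratified k-fold: every test fold preserves the class proportions.
import numpy as np
from sklearn.model_selection import StratifiedKFold

X = np.arange(20).reshape(-1, 1)
y = np.array([0] * 16 + [1] * 4)     # imbalanced: 16 of class 0, 4 of class 1

skf = StratifiedKFold(n_splits=4, shuffle=True, random_state=0)
for train_idx, test_idx in skf.split(X, y):
    # Each test fold contains exactly one minority-class sample.
    print("test fold class counts:", np.bincount(y[test_idx]))
```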
Hyperparameters
 They are parameters whose values control the learning
process and determine the values of the model parameters
that a learning algorithm ends up learning.
 The prefix ‘hyper-’ suggests that they are ‘top-level’
parameters that control the learning process and the model
parameters that result from it.
 They are said to be external to the model because the
model cannot change their values during learning/training.
 They are used by the learning algorithm while it is learning,
but they are not part of the resulting model.
Model Selection Techniques
Some common examples of hyperparameters:
 Learning rate in optimization algorithms (e.g., gradient
descent)
 Choice of optimization algorithm (e.g., gradient descent,
stochastic gradient descent, or the Adam optimizer)
 Activation function in a neural network (NN) layer (e.g.,
Sigmoid, ReLU, Tanh)
 Loss function
 Number of hidden layers in an NN
 Number of neurons in each layer
 Dropout rate in an NN
 Number of iterations (epochs) in training an NN
 Number of clusters in a clustering task
 Kernel or filter size in convolutional layers
 Pooling size
 Batch size
Model Selection Techniques
Hyperparameters Tuning
 It is the process of selecting the optimal values for a
machine learning model’s hyperparameters.
 Models can have many hyperparameters and finding the
best combination of parameters can be treated as a search
problem.
 Common strategies for hyperparameter tuning include:
 Grid Search
 Randomized Search
 Bayesian Optimization
Model Selection Techniques
Hyperparameter tuning example
Model Selection Techniques
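A minimal tuning sketch contrasting grid search and randomized search over an SVM's C and gamma, assuming scikit-learn; the grids are illustrative, not recommended defaults:

```python
# Grid search tries every combination; randomized search samples a few.
from sklearn.datasets import load_iris
from sklearn.model_selection import GridSearchCV, RandomizedSearchCV
from sklearn.svm import SVC

X, y = load_iris(return_X_y=True)
param_grid = {"C": [0.1, 1, 10], "gamma": [0.01, 0.1, 1]}

grid = GridSearchCV(SVC(), param_grid, cv=5).fit(X, y)       # all 9 combos
print("grid search best:  ", grid.best_params_, grid.best_score_)

rand = RandomizedSearchCV(SVC(), param_grid, n_iter=5, cv=5,
                          random_state=0).fit(X, y)          # 5 random combos
print("random search best:", rand.best_params_, rand.best_score_)
```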
 Select one ML/DL algorithm and discuss:
 Its overfitting-handling techniques
 Its hyperparameters and how they relate to one another
Due date: 08/10/2024
Presentation
Model Selection Techniques
