Unit - 3
Artificial Neural Networks
Introduction
► Artificial Neural Networks (ANNs) are algorithms inspired by the way the brain works and are used
to model complex patterns and solve prediction problems. The Artificial Neural Network
(ANN) is a deep learning method that arose from the idea of mimicking the biological
neural networks of the human brain.
Introduction
► An artificial neural network (ANN) is a computational model inspired by the structure
and function of the human brain's neural networks. It consists of interconnected nodes,
called neurons or units, organized in layers. Each neuron receives input signals,
processes them through an activation function, and produces an output signal. ANNs are
a fundamental component of artificial intelligence (AI) and are used in various
applications such as image recognition, natural language processing, and predictive
analytics.
► Example:-
► Let's consider a simple neural network designed to classify images of handwritten digits
(0-9) into their respective categories.
1. Input Layer: The input layer consists of neurons representing the features of the input
data. In our example, each neuron corresponds to a pixel in the image of a handwritten
digit. If we're using grayscale images with dimensions of, say, 28x28 pixels, there would be
784 neurons (28x28) in the input layer.
Introduction
2. Hidden Layers: Between the input and output layers, there can be one or more hidden
layers. Each hidden layer contains neurons that perform computations on the input data.
These layers extract features and patterns from the input data through a series of weighted
connections and activation functions. The number of neurons and layers in the hidden
layers is determined based on the complexity of the problem.
3.Output Layer: The output layer produces the network's predictions or classifications. In
our example, it typically consists of 10 neurons, each representing one digit (0-9). The
neuron with the highest output value indicates the predicted digit.
4.Weights and Bias: Each connection between neurons in adjacent layers has associated
weights and a bias. These parameters are adjusted during the training process to minimize
the difference between the network's predictions and the actual labels of the training data.
5.Activation Function: Each neuron applies an activation function to the weighted sum of
its inputs. Common activation functions include sigmoid, tanh, ReLU (Rectified Linear
Unit), and softmax. Activation functions introduce non-linearity into the network, allowing
it to learn complex patterns.
Example of how the network learns:
► Training: Initially, the network's weights and biases are randomly initialized. Then, it's trained on
a dataset of labeled images (e.g., the MNIST dataset). During training, the network adjusts its
parameters using optimization algorithms like gradient descent to minimize the error (the
difference between predicted and actual labels).
► Forward Propagation: In the forward pass, input data is fed into the network, and computations
are performed layer by layer until the output is generated.
► Backpropagation: After forward propagation, the error is calculated based on the network's output
and the true labels. Then, through backpropagation, this error is propagated backward through the
network, and the weights and biases are adjusted accordingly using gradient descent.
► Testing: Once trained, the network is tested on a separate dataset to evaluate its performance. It
can classify new, unseen images of handwritten digits into their respective categories based on the
learned patterns.
Through this iterative process of training and adjustment, the neural network learns to recognize
patterns and make accurate predictions, demonstrating one of the fundamental capabilities of artificial
intelligence.
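A minimal sketch of the forward pass for such a 784-input, 10-output network in NumPy; the hidden-layer width, initialization, and input here are illustrative stand-ins, not a particular library's API:

```python
import numpy as np

def sigmoid(z):
    # Logistic activation: squashes inputs to (0, 1)
    return 1.0 / (1.0 + np.exp(-z))

def softmax(z):
    # Converts raw scores into class probabilities
    e = np.exp(z - z.max())          # shift for numerical stability
    return e / e.sum()

rng = np.random.default_rng(0)
# Illustrative sizes: 784 input pixels, 128 hidden units, 10 digit classes
W1, b1 = rng.normal(0, 0.01, (128, 784)), np.zeros(128)
W2, b2 = rng.normal(0, 0.01, (10, 128)), np.zeros(10)

x = rng.random(784)                  # stand-in for one flattened 28x28 image
hidden = sigmoid(W1 @ x + b1)        # hidden-layer activations
probs = softmax(W2 @ hidden + b2)    # probabilities for digits 0-9
print("predicted digit:", probs.argmax())
```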
Activation Function:-
Activation functions are mathematical functions applied to the output of each neuron in a neural
network. They introduce non-linearity to the network, enabling it to learn and approximate
complex relationships in the data.
1. Sigmoid Function (Logistic Function): σ(x) = 1 / (1 + e^(-x)), which squashes any real input into
the range (0, 1).
2. Hyperbolic Tangent Function (Tanh): tanh(x) = (e^x - e^(-x)) / (e^x + e^(-x)), which squashes
inputs into the range (-1, 1) and is zero-centred.
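A minimal sketch of these two functions in NumPy, with a few sample values:

```python
import numpy as np

def sigmoid(x):
    # sigma(x) = 1 / (1 + e^(-x)), output in (0, 1)
    return 1.0 / (1.0 + np.exp(-x))

xs = np.array([-2.0, 0.0, 2.0])
print(sigmoid(xs))   # approx [0.119, 0.5, 0.881]
print(np.tanh(xs))   # approx [-0.964, 0.0, 0.964], output in (-1, 1)
```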
Optimization Algorithm:- Gradient Descent
► Gradient descent is a popular optimization algorithm used in artificial intelligence,
especially in machine learning. It's used to minimize the loss function, which represents
the error between the predicted and actual values in a model. Here's a simplified
explanation with an example:
► 1.Initialization: Start with an initial guess for the parameters of the model.
► 2. Compute Gradient: Calculate the gradient of the loss function with respect to each
parameter. The gradient points in the direction of steepest ascent, so to minimize the loss,
we move in the opposite direction.
► 3. Update Parameters: Adjust the parameters in the direction opposite to the gradient,
scaled by a learning rate, which determines the size of the steps taken during
optimization.
► 4. Repeat: Continue steps 2 and 3 until convergence, which is typically determined by
either reaching a predefined number of iterations or when the improvement in the loss
function becomes negligible.
Optimization Algorithm:- Gradient Descent
► Example:
► Let's say we have a simple linear regression problem where we want to predict house prices based
on the size of the house. We have some data points with house sizes and their corresponding prices.
► Model: y = mx + b
► Loss function: mean squared error (MSE)
► 1. Initialization: Start with random values for m (slope) and b (intercept).
► 2. Compute Gradient: Calculate the gradient of the MSE loss function with respect to m and b.
This involves partial derivatives.
► 3. Update Parameters: Adjust m and b in the opposite direction of the gradient, scaled by a
learning rate.
► 4. Repeat: Iterate steps 2 and 3 until convergence.
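A minimal NumPy sketch of these four steps for the house-price example; the data points and learning rate are made-up assumptions:

```python
import numpy as np

# Made-up data: house sizes (in 1000 sq ft) and prices (in $100k)
x = np.array([1.0, 1.5, 2.0, 2.5, 3.0])
y = np.array([2.0, 2.9, 4.1, 5.0, 6.1])

m, b = 0.0, 0.0          # 1. Initialization
lr = 0.05                # learning rate

for _ in range(2000):    # 4. Repeat until (approximate) convergence
    error = (m * x + b) - y
    # 2. Compute gradient of MSE = mean((pred - y)^2) w.r.t. m and b
    grad_m = 2 * np.mean(error * x)
    grad_b = 2 * np.mean(error)
    # 3. Update parameters in the direction opposite to the gradient
    m -= lr * grad_m
    b -= lr * grad_b

print(f"m = {m:.2f}, b = {b:.2f}")   # roughly the best-fit line
```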
Networks:- Perceptron
► A perceptron is a type of artificial neural network (ANN) model that is often used for binary
classification tasks. It consists of a single layer of input nodes (neurons) connected directly to
an output node, without any hidden layers. Each input node is associated with a weight, and the
output node combines the weighted inputs and applies an activation function to produce the
output.
• A simple example to illustrate how a perceptron works: let's say we have a perceptron that we
want to train to classify whether a fruit is an apple or not based on two features: sweetness and
roundness.
• The perceptron will take these two features as inputs.
• 1.Initialization: Initially, the weights of the input features are set randomly or initialized to some
predefined values.
• 2. Training: During the training phase, the perceptron is presented with training examples, each
consisting of input features and their corresponding labels (e.g., (sweetness, roundness) →
apple or not apple)
Networks:- Perceptron
3. Prediction: For each training example, the perceptron computes the weighted sum of the input
features, applies an activation function (e.g., step function or sigmoid function), and produces an
output (either 0 or 1).
4. Error Calculation: The output is compared to the actual label, and the error (the difference
between the predicted output and the true label) is calculated.
5. Weight Update: The weights of the input features are adjusted based on the error, using a
learning algorithm such as the perceptron learning rule or gradient descent. The goal is to
minimize the error over the training examples.
6.Iteration: Steps 3-5 are repeated iteratively over the training dataset until the perceptron achieves
satisfactory performance (e.g., accurately classifies most examples).
Once trained, the perceptron can classify new fruits based on their sweetness and roundness by
computing the weighted sum of the input features and applying the learned weights and activation
function. It's important to note that perceptrons are limited to linearly separable problems,
meaning they can only learn to classify data that is linearly separable. For more complex tasks,
multi-layer perceptrons (MLPs) with hidden layers are used.
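A minimal sketch of this training loop for the apple example; the feature values, labels, and learning rate are invented for illustration:

```python
import numpy as np

# Invented training data: (sweetness, roundness) -> 1 = apple, 0 = not apple
X = np.array([[0.9, 0.8], [0.8, 0.9], [0.2, 0.3], [0.3, 0.1]])
y = np.array([1, 1, 0, 0])

w = np.zeros(2)          # 1. Initialization
bias = 0.0
lr = 0.1

for epoch in range(10):              # 6. Iterate over the training set
    for xi, target in zip(X, y):
        # 3. Prediction: weighted sum + step activation
        output = 1 if (w @ xi + bias) > 0 else 0
        error = target - output      # 4. Error calculation
        # 5. Weight update (perceptron learning rule)
        w += lr * error * xi
        bias += lr * error

print(w, bias)   # learned weights that separate apples from non-apples
```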
Multi-layer Perceptron neural architecture
• In a typical MLP network, the input units (X_i) are fully connected to all hidden layer units
(Y_j), and the hidden layer units are fully connected to all output layer units (Z_k).
• Each of the connections between the input-to-hidden and hidden-to-output layer units has an
associated weight attached to it (W_ij or W_jk).
• The hidden and output layer units also derive their bias values (b_j or b_k) from weighted
connections to units whose outputs are always 1 (true neurons).
MLP training algorithm
A Multi-Layer Perceptron (MLP) neural network trained using the Backpropagation learning
algorithm is one of the most powerful forms of supervised neural network.
The training of such a network involves three stages:
• feedforward of the input training pattern,
• calculation and backpropagation of the associated error
• adjustment of the weights
This procedure is repeated for each pattern over several complete passes
(epochs) through the training set.
After training, application of the net only involves the computations of the
feedforward phase.
Backpropagation Learning Algorithm
Feed Forward phase:
• X_i = input[i]
• Y_j = f( b_j + Σ_i X_i W_ij )
• Z_k = f( b_k + Σ_j Y_j W_jk )
Backpropagation of errors:
• δ_k = Z_k [1 - Z_k] (d_k - Z_k)
• δ_j = Y_j [1 - Y_j] Σ_k δ_k W_jk
Weight updating:
• W_jk(t+1) = W_jk(t) + η δ_k Y_j + α[W_jk(t) - W_jk(t-1)]
• b_k(t+1) = b_k(t) + η δ_k Y_tn + α[b_k(t) - b_k(t-1)]
• W_ij(t+1) = W_ij(t) + η δ_j X_i + α[W_ij(t) - W_ij(t-1)]
• b_j(t+1) = b_j(t) + η δ_j X_tn + α[b_j(t) - b_j(t-1)]
Here η is the learning rate, α the momentum term, d_k the target output, and Y_tn = X_tn = 1
are the constant outputs of the true neurons that feed the biases.
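A minimal NumPy sketch of one training step implementing these equations for a single pattern, with a sigmoid f; the layer sizes, learning rate, and momentum value are illustrative assumptions:

```python
import numpy as np

def f(z):                                 # sigmoid activation
    return 1.0 / (1.0 + np.exp(-z))

rng = np.random.default_rng(1)
n_in, n_hid, n_out = 4, 3, 2              # illustrative layer sizes
Wij = rng.normal(0, 0.5, (n_in, n_hid))   # input -> hidden weights
Wjk = rng.normal(0, 0.5, (n_hid, n_out))  # hidden -> output weights
bj, bk = np.zeros(n_hid), np.zeros(n_out)
dWij, dWjk = np.zeros_like(Wij), np.zeros_like(Wjk)  # previous changes (momentum)
dbj, dbk = np.zeros_like(bj), np.zeros_like(bk)
eta, alpha = 0.5, 0.9                     # learning rate and momentum

X = rng.random(n_in)                      # one training pattern (stand-in data)
d = np.array([1.0, 0.0])                  # its target output

# Feed forward phase
Y = f(bj + X @ Wij)                       # Y_j = f(b_j + sum_i X_i W_ij)
Z = f(bk + Y @ Wjk)                       # Z_k = f(b_k + sum_j Y_j W_jk)

# Backpropagation of errors
delta_k = Z * (1 - Z) * (d - Z)           # output-layer error terms
delta_j = Y * (1 - Y) * (Wjk @ delta_k)   # hidden-layer error terms

# Weight updating with momentum: alpha times the previous change
dWjk = eta * np.outer(Y, delta_k) + alpha * dWjk
dbk  = eta * delta_k + alpha * dbk        # bias input (true neuron) is 1
dWij = eta * np.outer(X, delta_j) + alpha * dWij
dbj  = eta * delta_j + alpha * dbj
Wjk += dWjk; bk += dbk; Wij += dWij; bj += dbj
```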
Test stopping condition
After each epoch of training, the Root Mean Square error of the network for all of the
patterns in a separate validation set is calculated:
E_RMS = √( Σ_n Σ_k (d_k - Z_k)² / (n · k) )
• n is the number of patterns in the set
• k is the number of neuron units in the output layer
Training is terminated when the E_RMS value for the validation set either starts to increase
or remains constant over several epochs.
This prevents the network from being over trained (i.e. memorising the training
set) and ensures that the ability of the network to generalise (i.e. correctly classify
non-trained patterns) will be at its maximum.
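A minimal sketch of computing E_RMS on a validation set, assuming the square root implied by the name "Root Mean Square"; the targets and outputs below are stand-in values:

```python
import numpy as np

def rms_error(targets, outputs):
    # E_RMS = sqrt( sum over patterns and output units of (d_k - Z_k)^2 / (n*k) )
    n, k = targets.shape                 # n patterns, k output units
    return np.sqrt(np.sum((targets - outputs) ** 2) / (n * k))

# Stand-in validation targets and network outputs: 3 patterns, 2 output units
d = np.array([[1, 0], [0, 1], [1, 0]], dtype=float)
Z = np.array([[0.9, 0.1], [0.2, 0.8], [0.7, 0.4]], dtype=float)
print(rms_error(d, Z))   # track this per epoch; stop when it stops falling
```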
Factors affecting network performance
Number of hidden nodes:
• Too many and the network may memorise training set
• Too few and the network may not learn the training set
Initial weight set:
• some starting weight sets may lead to a local minimum
• other starting weight sets avoid the local minimum.
Training set:
• must be statistically relevant
• patterns should be presented in random order
Data representation:
• Low level - very large training set might be required
• High level – human expertise required
MLP as classifiers
MLP classifiers are used in a wide range of domains from engineering to
medical diagnosis. A classic example of use is as an Optical Character
Recogniser.
A simple example would be a 35-8-26 MLP network. This network could learn
to map input patterns, corresponding
to the 5x7 matrix representations of
the capital letters A - Z, to 1 of 26
output patterns.
After training, this network then classifies ‘noisy’ input patterns to the
correct output pattern that the network was trained to produce.
Adaline neural network
• The Adaptive Linear Neuron, abbreviated as Adaline, is one of the fundamental artificial neural
networks (ANNs) used in machine learning and artificial intelligence. It was introduced by
Bernard Widrow and his graduate student Ted Hoff in 1960.
• Adaline is closely related to the perceptron, another early neural network model. In fact, Adaline
can be seen as a single-layer neural network, similar to the perceptron, with a linear activation
function. However, unlike the perceptron, Adaline's output is not binary; instead, it outputs a
continuous value. This makes it suitable for regression tasks rather than just classification.
Adaline neural network :- basic Architecture
• The basic architecture of Adaline consists of:
Input layer: Nodes representing input features.
Weights: Each input feature is associated with a weight, which is adjusted during training to
minimize the error.
Summation unit: Calculates the weighted sum of the input features.
Activation function: Typically a linear activation function, although sometimes other activation
functions may be used.
Output: The output of the activation function serves as the output of the Adaline network.
Adaline neural network :- Training
Training an Adaline network typically involves a process called the Widrow-Hoff learning
rule or the delta rule, which is a form of gradient descent. The goal of training is to adjust
the weights to minimize the difference between the predicted output and the true output
(i.e., the error). This is achieved by iteratively updating the weights in the direction that
reduces the error.
Adaline has been used in various applications, including pattern recognition, signal
processing, and prediction tasks. However, its simplicity and linear nature limit its
applicability to problems that are linearly separable or can be adequately approximated by
linear models.
While Adaline has been surpassed by more complex and powerful neural network
architectures such as multilayer perceptrons (MLPs) and deep learning models, it remains an
important milestone in the history of artificial neural networks and serves as a foundational
concept in machine learning and artificial intelligence.
Adaline neural network :- Widrow-Hoff learning
The Widrow-Hoff learning rule, also known as the delta rule or the LMS (Least Mean
Squares) algorithm, is the primary learning algorithm used to train the Adaline (Adaptive
Linear Neuron) neural network. The goal of the learning process is to adjust the weights
of the network in such a way that the output closely matches the desired target output
for a given input. Here's an overview of how the Widrow-Hoff learning rule works with
Adaline.
1. Initialization.
2. Forward Propagation.
3. Activation.
4. Error Calculation.
5. Weight Update.
6. Iterative learning.
Adaline neural network :- Widrow-Hoff learning
For a linear output y = Σ_i w_i x_i + b and target d, the delta rule updates each weight as
w_i(t+1) = w_i(t) + η (d - y) x_i and the bias as b(t+1) = b(t) + η (d - y), where η is the
learning rate. Repeating this over the training set drives the mean squared error downward.
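A minimal sketch of these six steps with the delta rule on invented one-dimensional data; all values here are illustrative assumptions:

```python
import numpy as np

rng = np.random.default_rng(0)
X = np.array([[0.0], [1.0], [2.0], [3.0]])   # invented inputs
d = np.array([1.0, 3.0, 5.0, 7.0])           # targets follow d = 2x + 1

w = rng.normal(0, 0.1, 1)    # 1. Initialization
b = 0.0
eta = 0.05                   # learning rate

for epoch in range(200):                 # 6. Iterative learning
    for xi, target in zip(X, d):
        y = w @ xi + b                   # 2-3. Forward pass, linear activation
        error = target - y               # 4. Error calculation
        w += eta * error * xi            # 5. Widrow-Hoff (LMS) weight update
        b += eta * error

print(w, b)   # approaches w = 2, b = 1
```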
Backpropagation Algorithm:- Introduction & Training Procedure
Backpropagation is a fundamental algorithm used for training artificial neural networks,
particularly multilayer perceptrons(MLPs) and deep neural networks (DNNs). It is a supervised
learning algorithm that adjusts the weights of the network to minimize the difference between
the predicted output and the actual target output. Here's an overview of how backpropagation
works:
1.Forward Pass:
• Input data is fed into the neural network, and computations are performed layer by layer to
generate an output.
• Each layer computes a weighted sum of its inputs, applies an activation function to the sum,
and passes the result to the next layer.
2. Compute Error:
• Once the output is generated, the error between the predicted output and the actual target
output is computed using a loss function.
• Common loss functions include mean squared error (MSE) for regression problems and
categorical cross-entropy for classification problems.
Backpropagation Algorithm:- Introduction & Training Procedure
3.Backward Pass (Backpropagation):
• Backpropagation involves propagating the error backward through the network to update the
weights.
• Starting from the output layer, the gradient of the loss function with respect to the weights
and biases of each layer is computed.
• This is done using the chain rule of calculus, which allows for the computation of gradients
layer by layer.
4. Weight Update:
• Once the gradients are computed, the weights and biases of each layer are updated in the
opposite direction of the gradient to minimize the loss function.
• The update rule typically involves subtracting a fraction of the gradient from the current
weights, scaled by a learning rate hyperparameter.
• The learning rate controls the step size of the weight updates and is crucial for the
convergence and stability of the training process.
Backpropagation Algorithm:- Introduction & Training Procedure
5.Iterative Training:
Steps 1-4 are repeated iteratively for multiple epochs (passes through the entire dataset) until
the network converges or until a stopping criterion is met.
During each epoch, the network sees the entire dataset in batches or as individual samples,
depending on the training strategy (e.g., mini-batch gradient descent, stochastic gradient
descent).
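A minimal sketch of this epoch/mini-batch loop; the dataset, batch size, and the train_step placeholder (standing in for steps 1-4) are hypothetical:

```python
import numpy as np

rng = np.random.default_rng(0)
X = rng.random((100, 4))                 # stand-in dataset: 100 samples, 4 features
Y = rng.integers(0, 2, (100, 1)).astype(float)

def train_step(x_batch, y_batch):
    """Hypothetical placeholder for steps 1-4: forward pass, error,
    backpropagation, and weight update on one mini-batch."""
    pass

batch_size, epochs = 16, 10
for epoch in range(epochs):
    order = rng.permutation(len(X))      # present patterns in random order
    for start in range(0, len(X), batch_size):
        idx = order[start:start + batch_size]
        train_step(X[idx], Y[idx])       # steps 1-4 on this mini-batch
```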
Backpropagation enables neural networks to learn complex patterns and relationships in data by
iteratively adjusting their weights to minimize prediction errors. It has been instrumental in the
success of deep learning, allowing for the training of neural networks with many layers, which
are capable of solving a wide range of tasks across various domains, including image
recognition, natural language processing, and speech recognition.
Tuning the Network Size
Tuning the network size in an artificial neural network (ANN) refers to adjusting the architecture
of the network, including the number of layers and the number of neurons in each layer, to
achieve optimal performance for a specific task. This process involves finding the right balance
between model complexity and generalization ability.
Key considerations and steps involved in tuning the network size:
1.Start with a Baseline Model: Begin by constructing a baseline ANN architecture with a reasonable
number of layers and neurons. This initial model serves as a reference point for comparison when
evaluating the performance of subsequent models.
2.Understand the Problem Complexity: Consider the complexity of the problem you are trying to
solve. Complex tasks, such as image recognition or natural language processing, may require larger and
more complex networks to capture intricate patterns and relationships in the data.
3. Avoid Overfitting: Overfitting occurs when the model learns to memorize the training data instead of
generalizing to unseen data. Increasing the network size can exacerbate overfitting, especially when
dealing with limited training data. Regularization techniques, such as dropout and weight decay, can
help mitigate overfitting by introducing constraints on the model parameters.
Tuning the Network Size
4.Evaluate Performance: Train the baseline model and evaluate its performance on a validation
dataset. Common metrics for evaluation include accuracy, precision, recall, F1-score, and mean
squared error, depending on the nature of the task (classification or regression).
5.Experiment with Network Size: Systematically vary the network size by adjusting the
number of layers and neurons in each layer. Explore different configurations, including shallow
vs. deep networks, wide vs. narrow networks, and the number of hidden units in each layer.
6.Monitor Training and Validation Performance: During training, monitor both training and
validation performance to detect signs of overfitting or underfitting. Overfitting typically
manifests as a large gap between training and validation performance, whereas underfitting
indicates that the model is too simple to capture the underlying patterns in the data.
7.Use Cross-Validation: Employ techniques like k-fold cross-validation to assess the
generalization performance of different network sizes more reliably. Cross-validation involves
partitioning the dataset into multiple subsets, training the model on different subsets, and
evaluating its performance on the remaining subset.
Tuning the Network Size
8.Select the Optimal Network Size: Choose the network size that achieves the best balance
between performance and generalization ability based on the evaluation metrics. It's essential to
strike a balance between model complexity and simplicity, ensuring that the selected architecture
can generalize well to unseen data.
9.Fine-Tuning: Once the optimal network size is determined, fine-tune other hyperparameters,
such as learning rate, batch size, and activation functions, to further optimize the model's
performance.
10.Test the Final Model: Assess the final model's performance on a separate test dataset that
was not used during training or validation. This step provides an unbiased estimate of the model's
generalization ability in real-world scenarios.
By systematically tuning the network size, we can develop an optimal ANN.
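As a concrete illustration of steps 5-8, a minimal sketch using scikit-learn's MLPClassifier to compare a few candidate hidden-layer sizes by cross-validated accuracy; the candidate sizes are arbitrary assumptions:

```python
from sklearn.datasets import load_digits
from sklearn.model_selection import cross_val_score
from sklearn.neural_network import MLPClassifier

X, y = load_digits(return_X_y=True)

# Arbitrary candidate architectures: one or two hidden layers of varying width
candidates = [(16,), (64,), (128,), (64, 32)]
for hidden in candidates:
    clf = MLPClassifier(hidden_layer_sizes=hidden, max_iter=500, random_state=0)
    scores = cross_val_score(clf, X, y, cv=5)   # 5-fold cross-validation
    print(hidden, scores.mean())
# Pick the smallest architecture whose score is close to the best (steps 8-9)
```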