The document discusses the multi-layer perceptron (MLP) model and the backpropagation algorithm. It provides details on the MLP model, including transfer functions and how outputs are calculated. It then derives the backpropagation learning algorithm and summarizes it step by step, covering initialization, forward computation to calculate outputs, and backward computation to calculate error gradients and update weights. The goal is to minimize the error between the network and desired outputs by adjusting the weights according to the gradient.
Black-box modeling of nonlinear system using evolutionary neural NARX model – IJECEIAES
Nonlinear systems with uncertainty and disturbance are very difficult to model using a mathematical approach. Therefore, a black-box modeling approach without any prior knowledge is necessary. Several modeling approaches have been used to develop black-box models, such as fuzzy logic, neural networks, and evolutionary algorithms. In this paper, an evolutionary neural network, combining a neural network and a modified differential evolution algorithm, is applied to model a nonlinear system. The feasibility and effectiveness of the proposed modeling are tested on a piezoelectric actuator SISO system and an experimental quadruple-tank MIMO system.
Integrating Adaptation Mechanisms Using Control Theory Centric Architecture M... – Filip Krikava
This document discusses integrating adaptation mechanisms in self-adaptive software systems using control theory models. It presents a case study of using feedback control loops and control theory models to optimize a web server's performance by self-adjusting tuning parameters. The challenges of engineering such self-adaptive systems include control challenges for control engineers and integration challenges for software engineers. The study models the web server as a multi-input multi-output system and designs a linear quadratic regulator controller to optimize performance based on CPU utilization and memory usage.
Amnestic neural network for classification – lolokikipipi
1. The document presents an Amnesic Neural Network model for stock trend prediction that incorporates "forgetting" to address time variation in customer behavior data.
2. It applies the model to data on 900 stocks from 2001-2004 to classify price changes, training the network on 1000 records and testing it on 500.
3. The model achieved the highest classification accuracy of the test data with a forgetting coefficient of 0.1, outperforming a standard BP neural network model without forgetting.
This document provides an overview of support vector machines and related pattern recognition techniques:
- SVMs find the optimal separating hyperplane between classes by maximizing the margin between classes using support vectors.
- Nonlinear decision surfaces can be achieved by transforming data into a higher-dimensional feature space using kernel functions.
- Soft margin classifiers allow some misclassified points by introducing slack variables to improve generalization.
- Relevance vector machines take a Bayesian approach, placing a sparsity-inducing prior over weights to provide a probabilistic interpretation.
Ml srhwt-machine-learning-based-superlative-rapid-haar-wavelet-transformation... – Jumlesha Shaik
Abstract: In this paper a digital image coding technique called ML-SRHWT (Machine Learning based image coding by Superlative Rapid HAAR Wavelet Transform) is introduced. Compression of the digital image is done using the Superlative Rapid HAAR Wavelet Transform (SRHWT) model. Least Squares Support Vector Machine regression predicts the hyper-coefficients obtained using the QPSO model. The mathematical models discussed briefly in this paper are SRHWT, which gives good performance and reduces complexity compared to FHAAR, and EQPSO, which replaces the least good particle with the new best obtained particle in QPSO. On comparing ML-SRHWT with the JPEG and JPEG2000 standards, the former is found to be the better.
The document discusses clustering techniques and provides details about the k-means clustering algorithm. It begins with an introduction to clustering and lists different clustering techniques. It then describes the k-means algorithm in detail, including how it works, the steps involved, and provides an example illustration. Finally, it discusses comments on the k-means algorithm, focusing on aspects like choosing the value of k, initializing cluster centroids, and different distance measurement methods.
Discrete mathematical structures are widely used in machine learning. Boolean algebra, in particular, is used in many machine learning algorithms and applications. The document discusses:
1) How Boolean logic is used in neural networks, with layers representing conjunction, disjunction, etc.
2) An example of classifying patterns with perceptrons using truth tables of Boolean AND and OR operations.
3) Real-life applications that use Boolean logic and machine learning, including diagnosing diseases from biomarker data and building a prognostic classifier for neuroblastoma.
Discrete mathematics concepts like Boolean algebra thus play an important role in machine learning algorithms and their applications to solve real-world problems.
OCR is the transformation of images of text into machine-encoded text.
A simple API to an OCR library might provide a function which takes an image as input and outputs a string.
In this project we have applied a deep learning neural network to solve Optical Character Recognition.
We have made use of TensorFlow and a Convolutional Neural Network.
This document discusses methods for one-shot learning using siamese neural networks. It provides an overview of several key papers in this area, including using siamese networks for signature verification (1993) and one-shot image recognition (2015), and introducing matching networks for one-shot learning (2016). Matching networks incorporate an attention mechanism into a neural network to rapidly learn from small datasets by matching training and test conditions. The document also reviews experiments demonstrating one-shot and few-shot learning on datasets like Omniglot using these siamese and matching network approaches.
Fuzzy clustering algorithms cannot obtain a good clustering effect when the sample characteristics are not obvious, and they require the number of clusters to be determined in advance. For this reason, this paper proposes an adaptive fuzzy kernel clustering algorithm. The algorithm first uses an adaptive function of the clustering number to calculate the optimal number of clusters; the samples of the input space are then mapped to a high-dimensional feature space using a Gaussian kernel and clustered in that feature space. Matlab simulation results confirmed that the algorithm performs considerably better than classical clustering algorithms, with faster convergence and more accurate clustering results.
This project report summarizes research on using deep learning methods for super resolution of document images to improve optical character recognition (OCR) accuracy. Two approaches are studied: SRCNN and incorporating OCR accuracy into the loss function. Experiments include modifying the dataset to focus on text regions and varying the SRCNN architecture. Results show the 2-layer SRCNN trained on the modified dataset achieves the best OCR accuracy, exceeding the ground truth in some cases.
Disease Classification using ECG Signal Based on PCA Feature along with GA & ... – IRJET Journal
This document describes a method for classifying ECG signals to detect cardiovascular diseases using principal component analysis (PCA), genetic algorithms, and artificial neural networks. PCA is used to extract features from ECG signals. A genetic algorithm is then used to select optimal features and train an artificial neural network classifier. The method is tested on datasets from Physionet.org to classify ECG signals as normal or indicating conditions like bradycardia or tachycardia with high accuracy. The goal is to develop an automated system for ECG analysis and heart disease diagnosis.
Classification of Iris Data using Kernel Radial Basis Probabilistic Neural N... – Scientific Review SR
This document summarizes a study that evaluated the performance of a kernel radial basis probabilistic neural network (Kernel RBPNN) model for classifying iris data, compared to backpropagation, radial basis function, and radial basis probabilistic neural network models. The Kernel RBPNN model achieved the highest classification accuracy of 89.12% on test data from the iris dataset, performing better than the other models. It also had the fastest training time, being over 80 times faster than the radial basis function model. Analysis of the receiver operating characteristic curves showed that the Kernel RBPNN model had the largest area under the curve, indicating it had the best classification prediction capability out of the four models evaluated.
Classification of Iris Data using Kernel Radial Basis Probabilistic Neural Ne... – Scientific Review
Radial Basis Probabilistic Neural Network (RBPNN) has a broad generalisation capability and has been successfully applied in multiple fields. In this paper, the Euclidean distance of each data point in RBPNN is extended by calculating its kernel-induced distance instead of the conventional sum-of-squares distance. The kernel function is a generalization of the distance metric that measures the distance between two data points as the data points are mapped into a high-dimensional space. Comparing the four constructed classification models, Kernel RBPNN, Radial Basis Function networks, RBPNN and Back-Propagation networks, results showed that classification of the Iris data with Kernel RBPNN displays an outstanding performance in this regard.
This document summarizes research on improving image classification results using neural networks. It compares common image classification methods like support vector machines (SVM) and K-nearest neighbors (KNN). It then evaluates the performance of multilayer perceptron (MLP) neural networks and radial basis function (RBF) neural networks on image classification. The document tests various configurations of MLP and RBF networks on a dataset containing 2310 images across 7 classes. It finds that an MLP network with two hidden layers of 10 neurons each achieves the best results, with an average accuracy of 98.84%. This is significantly higher than the 84.47% average accuracy of RBF networks and outperforms KNN classification as well. The research concludes that neural networks provide superior image classification performance.
The document provides an overview of neural networks for data mining. It discusses how neural networks can be used for classification tasks in data mining. It describes the structure of a multi-layer feedforward neural network and the backpropagation algorithm used for training neural networks. The document also discusses techniques like neural network pruning and rule extraction that can optimize neural network performance and interpretability.
MLPfit is a tool for designing and training multi-layer perceptrons (MLPs) for tasks like function approximation and classification. It implements stochastic minimization as well as more powerful methods like conjugate gradients and BFGS. MLPfit is designed to be simple, precise, fast and easy to use for both standalone and integrated applications. Documentation and source code are available online.
Large Scale Kernel Learning using Block Coordinate Descent – Shaleen Kumar Gupta
This paper explores using block coordinate descent to scale kernel learning methods to large datasets. It compares exact kernel methods to two approximation techniques, Nystrom and random Fourier features, on speech, text, and image datasets. Experimental results show that Nystrom generally achieves better accuracy than random features but requires more iterations. The paper also analyzes the performance and scalability of computing kernel blocks in a distributed setting.
[PR12] PR-036 Learning to Remember Rare Events – Taegyun Jeon
This document summarizes a paper on learning to remember rare events using a memory-augmented neural network. The paper proposes a memory module that stores examples from previous tasks to help learn new rare tasks from only a single example. The memory module is trained end-to-end with the neural network on two tasks: one-shot learning on Omniglot characters and machine translation of rare words. The implementation uses a TensorFlow memory module that stores key-value pairs to retrieve examples similar to a query. Experiments show the memory module improves one-shot learning performance and handles rare words better than baselines.
- Hierarchical clustering is an unsupervised machine learning algorithm that groups unlabeled datasets into a hierarchical structure of clusters. It does not require predetermining the number of clusters as in K-means clustering.
- There are two approaches: agglomerative (bottom-up) and divisive (top-down). Agglomerative clustering starts with each data point as a cluster and merges them iteratively, while divisive clustering splits clusters iteratively.
- Hierarchical clustering represents the clustering hierarchy as a dendrogram tree structure. The optimal number of clusters can be determined by cutting the dendrogram tree at an appropriate level.
On Selection of Periodic Kernels Parameters in Time Series Prediction – cscpconf
This paper describes an analysis of the parameters of periodic kernels. Periodic kernels can be used for the prediction task, performed as a typical regression problem. The prediction of real time series is performed on the basis of the Periodic Kernel Estimator (PerKE). As periodic kernels require the setting of their parameters, it is necessary to analyse their influence on the prediction quality. The paper describes an easy methodology, based on grid search, for finding the values of the parameters of periodic kernels. Two different error measures are taken into consideration as the prediction quality criteria but lead to comparable results. The methodology was tested on benchmark and real datasets and proved to give satisfactory results.
REAL TIME FACE DETECTION ON GPU USING OPENCL – NARMADA NAIK
This paper presents a novel approach for real-time face detection using heterogeneous computing. The algorithm uses the local binary pattern (LBP) as the feature vector for face detection. OpenCL is used to accelerate the code using the GPU [1]. Illumination invariance is achieved using gamma correction and Difference of Gaussians (DoG) to make the algorithm robust against varying lighting conditions. This implementation is compared with a previous parallel implementation and is found to perform faster.
Real Time Face Detection on GPU Using OPENCL – csandit
This document summarizes a research paper that presents a real-time face detection algorithm using local binary patterns (LBP) as features. The algorithm is implemented on a GPU using OpenCL to achieve faster processing speeds compared to CPU implementations. LBP features are extracted from divided image blocks in parallel on the GPU. Histograms of the LBP codes from each block are concatenated to form an overall feature vector for classification. Experimental results show the GPU implementation achieves a 5x speed improvement over CPU for 640x480 images.
Evaluation of program codes using machine learning – Vivek Maskara
This document discusses using machine learning to detect copied code submissions. It proposes using unsupervised learning via k-means clustering and dimensionality reduction with principal component analysis (PCA) to group similar codes and reduce complexity from O(n^2) to O(n). Key steps include extracting features from codes, applying PCA to reduce dimensions, running k-means to cluster codes, and detecting copies between clusters. This approach could help identify cheating in online programming contests and evaluate student code submissions.
This document provides an overview of artificial neural networks. It describes how biological neurons work and how artificial neurons are modeled after them. An artificial neuron receives weighted inputs, sums them, and outputs the result through an activation function. A neural network consists of interconnected artificial neurons arranged in layers. The network is trained using a learning algorithm called backpropagation that adjusts the weights to minimize error between the network's output and target output. Neural networks can be used for applications like handwritten digit recognition.
Similar to WK3 - Multi Layer Perceptron (20)
Using recycled concrete aggregates (RCA) for pavements is crucial to achieving sustainability. Implementing RCA for new pavement can minimize carbon footprint, conserve natural resources, reduce harmful emissions, and lower life-cycle costs. Compared to natural aggregate (NA) pavement, RCA pavement has been the subject of fewer comprehensive studies and sustainability assessments.
KuberTENes Birthday Bash Guadalajara - K8sGPT first impressions – Victor Morales
K8sGPT is a tool that analyzes and diagnoses Kubernetes clusters. This presentation was used to share the requirements and dependencies to deploy K8sGPT in a local environment.
DEEP LEARNING FOR SMART GRID INTRUSION DETECTION: A HYBRID CNN-LSTM-BASED MODEL – gerogepatton
As digital technology becomes more deeply embedded in power systems, protecting the communication networks of Smart Grids (SG) has emerged as a critical concern. Distributed Network Protocol 3 (DNP3) is a multi-tiered application-layer protocol extensively utilized in Supervisory Control and Data Acquisition (SCADA)-based smart grids to facilitate real-time data gathering and control functionalities. Robust Intrusion Detection Systems (IDS) are necessary for early threat detection and mitigation because the interconnection of these networks makes them vulnerable to a variety of cyberattacks. To address this issue, this paper develops a hybrid Deep Learning (DL) model specifically designed for intrusion detection in smart grids. The proposed approach combines a Convolutional Neural Network (CNN) with the Long Short-Term Memory (LSTM) algorithm. We employed a recent intrusion detection dataset (DNP3), which focuses on unauthorized commands and Denial of Service (DoS) cyberattacks, to train and test our model. The results of our experiments show that our CNN-LSTM method is much better at finding smart grid intrusions than other deep learning algorithms used for classification. In addition, our proposed approach improves accuracy, precision, recall, and F1 score, achieving a high detection accuracy rate of 99.50%.
ACEP Magazine 4th edition launched on 05.06.2024 – Rahul
This document provides information about the third edition of the magazine "Sthapatya" published by the Association of Civil Engineers (Practicing), Aurangabad. It includes messages from current and past presidents of ACEP, memories and photos from past ACEP events, information on lifetime achievement awards given by ACEP, and a technical article on concrete maintenance, repairs and strengthening. The document highlights the activities of ACEP and provides a technical educational article for members.
Presentation of IEEE Slovenia CIS (Computational Intelligence Society) Chapte... – University of Maribor
Slides from talk presenting:
Aleš Zamuda: Presentation of IEEE Slovenia CIS (Computational Intelligence Society) Chapter and Networking.
Presentation at IcETRAN 2024 session:
"Inter-Society Networking Panel GRSS/MTT-S/CIS
Panel Session: Promoting Connection and Cooperation"
IEEE Slovenia GRSS
IEEE Serbia and Montenegro MTT-S
IEEE Slovenia CIS
11TH INTERNATIONAL CONFERENCE ON ELECTRICAL, ELECTRONIC AND COMPUTING ENGINEERING
3-6 June 2024, Niš, Serbia
Low power architecture of logic gates using adiabatic techniques – nooriasukmaningtyas
The growing significance of portable systems and the need to limit power consumption in very-high-density ultra-large-scale-integration chips have recently led to rapid and inventive progress in low-power design. The most effective technique is adiabatic logic circuit design for energy-efficient hardware. This paper presents two adiabatic approaches for the design of low-power circuits: modified positive feedback adiabatic logic (modified PFAL) and direct-current diode-based positive feedback adiabatic logic (DC-DB PFAL). Logic gates are the preliminary components in any digital circuit design; by improving the performance of the basic gates, one can improve the performance of the whole system. In this paper, proposed low-power circuit designs of OR/NOR, AND/NAND, and XOR/XNOR gates are presented using the said approaches, and their results are analyzed for power dissipation, delay, power-delay product and rise time, and compared with other adiabatic techniques along with the conventional complementary metal oxide semiconductor (CMOS) designs reported in the literature. It has been found that the designs with the DC-DB PFAL technique outperform the modified PFAL technique at 10 MHz, with improvements of 65% for the NOR gate, 7% for the NAND gate and 34% for the XNOR gate.
CHINA'S GEO-ECONOMIC OUTREACH IN CENTRAL ASIAN COUNTRIES AND FUTURE PROSPECT – jpsjournal1
The rivalry between prominent international actors for dominance over Central Asia's hydrocarbon reserves and the ancient silk trade route, along with China's diplomatic endeavours in the area, has been referred to as the "New Great Game." This research centres on the power struggle, considering geopolitical, geostrategic, and geoeconomic variables. Topics including trade, political hegemony, oil politics, and conventional and nontraditional security are explored and explained by the researcher. Using Mackinder's Heartland, Spykman's Rimland, and Hegemonic Stability theories, it examines China's role in Central Asia. This study adheres to the empirical epistemological method and takes care to maintain objectivity. It critically analyzes primary and secondary research documents to elaborate the role of China's geo-economic outreach in Central Asian countries and its future prospects. According to this study, China is seeing significant success in trade, pipeline politics, and gaining influence over other governments, thanks to key instruments such as the Shanghai Cooperation Organisation and the Belt and Road Economic Initiative.
Harnessing WebAssembly for Real-time Stateless Streaming Pipelines – Christina Lin
Traditionally, dealing with real-time data pipelines has involved significant overhead, even for straightforward tasks like data transformation or masking. However, in this talk, we’ll venture into the dynamic realm of WebAssembly (WASM) and discover how it can revolutionize the creation of stateless streaming pipelines within a Kafka (Redpanda) broker. These pipelines are adept at managing low-latency, high-data-volume scenarios.
Electric vehicle and photovoltaic advanced roles in enhancing the financial p... – IJECEIAES
Climate change's impact on the planet has forced the United Nations and governments to promote green energies and electric transportation. The deployment of photovoltaic (PV) and electric vehicle (EV) systems has gained stronger momentum due to their numerous advantages over fossil fuel types. The advantages go beyond sustainability to reach financial support and stability. The work in this paper introduces a hybrid system between PV and EV to support industrial and commercial plants. The paper covers the theoretical framework of the proposed hybrid system, including the equations required to complete the cost analysis when PV and EV are present. In addition, the proposed design diagram, which sets the priorities and requirements of the system, is presented. The proposed approach allows setups to improve their power stability, especially during power outages. The presented information supports researchers and plant owners in completing the necessary analysis while promoting the deployment of clean energy. The results of a case study representing a dairy farm support the theoretical work and highlight its benefits to existing plants. The short return on investment of the proposed approach supports the paper's novel approach to a sustainable electrical system. In addition, the proposed system allows for an isolated power setup without the need for a transmission line, which enhances the safety of the electrical network.
6th International Conference on Machine Learning & Applications (CMLA 2024) – ClaraZara1
The 6th International Conference on Machine Learning & Applications (CMLA 2024) will provide an excellent international forum for sharing knowledge and results in the theory, methodology and applications of Machine Learning.
Technical Drawings: introduction to drawing of prisms
WK3 - Multi Layer Perceptron
1. WK3 – Multi Layer Perceptron
CS 476: Networks of Neural Computation, CSD, UOC, 2009
2. Contents
•MLP model details
•Back-propagation algorithm
•XOR Example
•Heuristics for Back-propagation
•Heuristics for learning rate
•Approximation of functions
•Generalisation
•Model selection through cross-validation
•Conjugate-Gradient method for BP
3. Contents II
•Advantages and disadvantages of BP
•Types of problems for applying BP
•Conclusions
4. MLP Model: Multi Layer Perceptron
•"Neurons" are positioned in layers. There are Input, Hidden and Output Layers.
5. MLP Model: Multi Layer Perceptron Output
•The output y_j is calculated by:
$y_j(n) = \varphi_j(v_j(n)), \qquad v_j(n) = \sum_{i=0}^{m} w_{ji}(n)\, y_i(n)$
where $w_{j0}(n)$ is the bias.
•The function $\varphi_j(\cdot)$ is a sigmoid function. Typical examples are:
6. MLP Model: Transfer Functions
•The logistic sigmoid:
$y = \frac{1}{1 + \exp(-x)}$
7. MLP Model: Transfer Functions II
•The hyperbolic tangent sigmoid:
$y = \tanh(x) = \frac{\sinh(x)}{\cosh(x)} = \frac{(\exp(x) - \exp(-x))/2}{(\exp(x) + \exp(-x))/2}$
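To make slides 5-7 concrete, here is a minimal NumPy sketch of one neuron's output under both transfer functions; the variable names and example values are illustrative assumptions, not from the slides:

```python
import numpy as np

def local_field(w, y_prev):
    """v_j = sum_{i=0}^{m} w_ji * y_i, with w[0] the bias and y_prev[0] = +1."""
    return np.dot(w, y_prev)

def logistic(v):
    """Logistic sigmoid: y = 1 / (1 + exp(-v))."""
    return 1.0 / (1.0 + np.exp(-v))

def tanh_sigmoid(v):
    """Hyperbolic tangent: y = sinh(v) / cosh(v)."""
    return (np.exp(v) - np.exp(-v)) / (np.exp(v) + np.exp(-v))

w = np.array([0.5, -1.0, 2.0])      # [bias w_j0, w_j1, w_j2] (made-up values)
y_prev = np.array([1.0, 0.3, 0.7])  # [+1, y_1, y_2]
v = local_field(w, y_prev)
print(logistic(v), tanh_sigmoid(v))
assert np.isclose(tanh_sigmoid(v), np.tanh(v))  # ratio form equals tanh
```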
8. BP Algorithm: Learning Algorithm
•Assume that a set of examples {x(n), d(n)}, n = 1, …, N is given. x(n) is the input vector of dimension $m_0$ and d(n) is the desired response vector of dimension M.
•Thus an error signal, $e_j(n) = d_j(n) - y_j(n)$, can be defined for the output neuron j.
•We can derive a learning algorithm for an MLP by assuming an optimisation approach which is based on the steepest descent direction, i.e.
$\Delta w(n) = -\eta\, g(n)$
where g(n) is the gradient vector of the cost function and $\eta$ is the learning rate.
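The steepest-descent rule can be illustrated in a few lines; the quadratic cost and all constants below are our own toy choices, not part of the slides:

```python
import numpy as np

def steepest_descent_step(w, g, eta=0.1):
    """One weight correction: delta_w(n) = -eta * g(n)."""
    return w - eta * g

# Toy check on the cost E(w) = ||w||^2 / 2, whose gradient is simply w:
w = np.array([1.0, -2.0])
for _ in range(50):
    w = steepest_descent_step(w, g=w)
print(w)  # approaches the minimum at the origin
```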
9. BP Algorithm: Learning Algorithm II
•The algorithm that is derived from the steepest descent direction is called back-propagation.
•Assume that we define an SSE instantaneous cost function (i.e. per example) as follows:
$\mathcal{E}(n) = \frac{1}{2} \sum_{j \in C} e_j^2(n)$
where C is the set of all output neurons.
•If we assume that there are N examples in the set then the average squared error is:
$\mathcal{E}_{av} = \frac{1}{N} \sum_{n=1}^{N} \mathcal{E}(n)$
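A short sketch of these two cost measures; the array shapes and values are illustrative:

```python
import numpy as np

def instantaneous_error(d, y):
    """E(n) = 1/2 * sum_{j in C} e_j(n)^2 over the output neurons."""
    e = d - y
    return 0.5 * np.sum(e ** 2)

def average_error(D, Y):
    """E_av = (1/N) * sum_{n=1}^{N} E(n) over all N examples."""
    return np.mean([instantaneous_error(d, y) for d, y in zip(D, Y)])

D = np.array([[1.0, 0.0], [0.0, 1.0]])  # desired responses d(n)
Y = np.array([[0.8, 0.1], [0.2, 0.7]])  # network outputs y(n)
print(average_error(D, Y))
```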
10. BP Algorithm: Learning Algorithm III
•We need to calculate the gradient wrt $\mathcal{E}_{av}$ or wrt $\mathcal{E}(n)$. In the first case we calculate the gradient per epoch (i.e. over all N patterns), while in the second the gradient is calculated per pattern.
•In the case of $\mathcal{E}_{av}$ we have the Batch mode of the algorithm. In the case of $\mathcal{E}(n)$ we have the Online or Stochastic mode of the algorithm.
•Assume that we use the online mode for the rest of the calculation. The gradient is defined as:
$g(n) = \frac{\partial \mathcal{E}(n)}{\partial w_{ji}(n)}$
11. BP Algorithm: Learning Algorithm IV
•Using the chain rule of calculus we can write:
$\frac{\partial \mathcal{E}(n)}{\partial w_{ji}(n)} = \frac{\partial \mathcal{E}(n)}{\partial e_j(n)} \frac{\partial e_j(n)}{\partial y_j(n)} \frac{\partial y_j(n)}{\partial v_j(n)} \frac{\partial v_j(n)}{\partial w_{ji}(n)}$
•We calculate the different partial derivatives as follows:
$\frac{\partial \mathcal{E}(n)}{\partial e_j(n)} = e_j(n), \qquad \frac{\partial e_j(n)}{\partial y_j(n)} = -1$
12. BP Algorithm: Learning Algorithm V
•And,
$\frac{\partial y_j(n)}{\partial v_j(n)} = \varphi'_j(v_j(n)), \qquad \frac{\partial v_j(n)}{\partial w_{ji}(n)} = y_i(n)$
•Combining all the previous equations we get finally:
$\Delta w_{ji}(n) = -\eta \frac{\partial \mathcal{E}(n)}{\partial w_{ji}(n)} = \eta\, e_j(n)\, \varphi'_j(v_j(n))\, y_i(n)$
13. BP Algorithm: Learning Algorithm VI
•The equation regarding the weight corrections can be written as:
$\Delta w_{ji}(n) = \eta\, \delta_j(n)\, y_i(n)$
where $\delta_j(n)$ is defined as the local gradient and is given by:
$\delta_j(n) = -\frac{\partial \mathcal{E}(n)}{\partial v_j(n)} = -\frac{\partial \mathcal{E}(n)}{\partial e_j(n)} \frac{\partial e_j(n)}{\partial y_j(n)} \frac{\partial y_j(n)}{\partial v_j(n)} = e_j(n)\, \varphi'_j(v_j(n))$
•We need to distinguish two cases:
•j is an output neuron
•j is a hidden neuron
14. BP Algorithm: Learning Algorithm VII
•Thus the Back-Propagation algorithm is an error-correction algorithm for supervised learning.
•If j is an output neuron, we already have a definition of $e_j(n)$, so $\delta_j(n)$ is defined (after substitution) as:
$\delta_j(n) = (d_j(n) - y_j(n))\, \varphi'_j(v_j(n))$
•If j is a hidden neuron then $\delta_j(n)$ is defined as:
$\delta_j(n) = -\frac{\partial \mathcal{E}(n)}{\partial y_j(n)} \frac{\partial y_j(n)}{\partial v_j(n)} = -\frac{\partial \mathcal{E}(n)}{\partial y_j(n)}\, \varphi'_j(v_j(n))$
15. BP Algorithm: Learning Algorithm VIII
•To calculate the partial derivative of $\mathcal{E}(n)$ wrt $y_j(n)$ we remember the definition of $\mathcal{E}(n)$ and we change the index for the output neuron to k, i.e.
$\mathcal{E}(n) = \frac{1}{2} \sum_{k \in C} e_k^2(n)$
•Then we have:
$\frac{\partial \mathcal{E}(n)}{\partial y_j(n)} = \sum_{k \in C} e_k(n)\, \frac{\partial e_k(n)}{\partial y_j(n)}$
16. BP Algorithm: Learning Algorithm IX
•We use again the chain rule of differentiation to get the partial derivative of $e_k(n)$ wrt $y_j(n)$:
$\frac{\partial \mathcal{E}(n)}{\partial y_j(n)} = \sum_{k \in C} e_k(n)\, \frac{\partial e_k(n)}{\partial v_k(n)} \frac{\partial v_k(n)}{\partial y_j(n)}$
•Remembering the definition of $e_k(n)$ we have:
$e_k(n) = d_k(n) - y_k(n) = d_k(n) - \varphi_k(v_k(n))$
•Hence:
$\frac{\partial e_k(n)}{\partial v_k(n)} = -\varphi'_k(v_k(n))$
17. BP Algorithm: Learning Algorithm X
•The local field $v_k(n)$ is defined as:
$v_k(n) = \sum_{j=0}^{m} w_{kj}(n)\, y_j(n)$
where m is the number of neurons (from the previous layer) which connect to neuron k. Thus we get:
$\frac{\partial v_k(n)}{\partial y_j(n)} = w_{kj}(n)$
•Hence:
$\frac{\partial \mathcal{E}(n)}{\partial y_j(n)} = -\sum_{k \in C} e_k(n)\, \varphi'_k(v_k(n))\, w_{kj}(n) = -\sum_{k \in C} \delta_k(n)\, w_{kj}(n)$
18. BP Algorithm: Learning Algorithm XI
•Putting all together we find for the local gradient of a hidden neuron j the following formula:
$\delta_j(n) = \varphi'_j(v_j(n)) \sum_{k \in C} \delta_k(n)\, w_{kj}(n)$
•It is useful to remember the special form of the derivatives for the logistic and hyperbolic tangent sigmoids:
•$\varphi'_j(v_j(n)) = y_j(n)\,[1 - y_j(n)]$ (Logistic)
•$\varphi'_j(v_j(n)) = [1 - y_j(n)]\,[1 + y_j(n)]$ (Hyp. Tangent)
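Gathering the output- and hidden-layer formulas, a sketch of the delta computations for a two-layer network of logistic units; the function name, shapes, and the convention that W_out excludes the bias column are our assumptions:

```python
import numpy as np

def local_gradients(y_out, d, y_hidden, W_out):
    """Local gradients for one example, logistic units throughout.

    y_out    : outputs y_j of the output layer
    d        : desired responses d_j
    y_hidden : outputs y_j of the hidden layer
    W_out    : hidden-to-output weights w_kj (bias column excluded)
    """
    # Output neuron: delta_j = e_j * phi'(v_j) = (d_j - y_j) * y_j * (1 - y_j)
    delta_out = (d - y_out) * y_out * (1.0 - y_out)
    # Hidden neuron: delta_j = phi'(v_j) * sum_k delta_k * w_kj
    delta_hidden = y_hidden * (1.0 - y_hidden) * (W_out.T @ delta_out)
    return delta_out, delta_hidden
```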
19. BP Algorithm: Summary of BP Algorithm
1. Initialisation: Assuming that no prior information is available, pick the synaptic weights and thresholds from a uniform distribution whose mean is zero and whose variance is chosen to make the standard deviation of the local fields of the neurons lie at the transition between the linear and saturated parts of the sigmoid function.
2. Presentation of training examples: Present the network with an epoch of training examples. For each example in the set, perform the sequence of the forward and backward computations described in points 3 & 4 below.
20. BP Algorithm: Summary of BP Algorithm II
3. Forward Computation:
• Let the training example in the epoch be denoted by (x(n), d(n)), where x is the input vector and d is the desired vector.
• Compute the local fields by proceeding forward through the network layer by layer. The local field for neuron j at layer l is defined as:
$v_j^{(l)}(n) = \sum_{i=0}^{m} w_{ji}^{(l)}(n)\, y_i^{(l-1)}(n)$
where m is the number of neurons which connect to j, $y_i^{(l-1)}(n)$ is the activation of neuron i at layer (l-1), and $w_{ji}^{(l)}(n)$ is the weight which connects the neurons j and i.
21. BP Algorithm: Summary of BP Algorithm III
• For i = 0, we have $y_0^{(l-1)}(n) = +1$ and $w_{j0}^{(l)}(n) = b_j^{(l)}(n)$ is the bias of neuron j.
• Assuming a sigmoid function, the output signal of the neuron j is:
$y_j^{(l)}(n) = \varphi_j(v_j^{(l)}(n))$
• If j is in the input layer we simply set:
$y_j^{(0)}(n) = x_j(n)$
where $x_j(n)$ is the jth component of the input vector x.
22. BP Algorithm: Summary of BP Algorithm IV
• If j is in the output layer we have:
$y_j^{(L)}(n) = o_j(n)$
where $o_j(n)$ is the jth component of the output vector o. L is the total number of layers in the network.
• Compute the error signal:
$e_j(n) = d_j(n) - o_j(n)$
where $d_j(n)$ is the desired response for the jth element.
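A compact sketch of this forward computation; the weight-matrix layout with the bias in column 0 and the demo sizes are our conventions, not the slides':

```python
import numpy as np

def forward(x, weights):
    """Forward computation, layer by layer.

    weights is a list of matrices; weights[l][j, i] holds w_ji for layer l+1,
    with column 0 holding the biases (y_0 = +1 is prepended to each layer).
    Returns the activations of every layer, input included.
    """
    ys = [x]
    y = x
    for W in weights:
        y_ext = np.concatenate(([1.0], y))  # prepend the fixed +1 input
        v = W @ y_ext                       # local fields v_j^(l)
        y = 1.0 / (1.0 + np.exp(-v))        # y_j^(l) = phi(v_j^(l))
        ys.append(y)
    return ys

# Demo with random layers: 3 inputs -> 4 hidden -> 2 outputs
rng = np.random.default_rng(0)
weights = [rng.normal(0, 0.5, (4, 4)), rng.normal(0, 0.5, (2, 5))]
ys = forward(np.array([0.1, 0.2, 0.3]), weights)
d = np.array([1.0, 0.0])
print(d - ys[-1])                           # error signal e_j = d_j - o_j
```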
23. BP Algorithm: Summary of BP Algorithm V
4. Backward Computation:
• Compute the $\delta$s of the network, defined by:
$\delta_j^{(L)}(n) = e_j^{(L)}(n)\, \varphi'_j(v_j^{(L)}(n))$ for neuron j in output layer L
$\delta_j^{(l)}(n) = \varphi'_j(v_j^{(l)}(n)) \sum_k \delta_k^{(l+1)}(n)\, w_{kj}^{(l+1)}(n)$ for neuron j in hidden layer l
where $\varphi'_j(\cdot)$ is the derivative of function $\varphi_j$ wrt the argument.
• Adjust the weights using the generalised delta rule:
$w_{ji}^{(l)}(n+1) = w_{ji}^{(l)}(n) + \alpha\, [w_{ji}^{(l)}(n) - w_{ji}^{(l)}(n-1)] + \eta\, \delta_j^{(l)}(n)\, y_i^{(l-1)}(n)$
where $\alpha$ is the momentum constant.
24. BP Algorithm: Summary of BP Algorithm VI
5. Iteration: Iterate the forward and backward
computations of steps 3 & 4 by presenting new
epochs of training examples until the stopping
criterion is met.
• The order of presentation of examples should be
randomised from epoch to epoch
• The momentum and the learning rate parameters
typically change (usually decreased) as the number
of training iterations increases.
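Putting steps 1-5 together, a minimal online-mode sketch for one hidden layer of logistic units. The network size, the constants, and the velocity form of the momentum term (dW(n) = alpha*dW(n-1) + eta*delta*y, algebraically equivalent to the generalised delta rule above) are our assumptions:

```python
import numpy as np

def train_bp(X, D, n_hidden=4, eta=0.5, alpha=0.8, epochs=2000, seed=0):
    """Online back-propagation for an MLP with one hidden layer of
    logistic units. Bias weights live in column 0 of each matrix."""
    rng = np.random.default_rng(seed)
    n_in, n_out = X.shape[1], D.shape[1]
    W1 = rng.uniform(-0.5, 0.5, (n_hidden, n_in + 1))
    W2 = rng.uniform(-0.5, 0.5, (n_out, n_hidden + 1))
    dW1 = np.zeros_like(W1)
    dW2 = np.zeros_like(W2)
    phi = lambda v: 1.0 / (1.0 + np.exp(-v))
    for _ in range(epochs):
        for n in rng.permutation(len(X)):       # randomise order per epoch
            x1 = np.concatenate(([1.0], X[n]))  # y^(0) with fixed +1
            h = phi(W1 @ x1)                    # hidden activations
            h1 = np.concatenate(([1.0], h))
            o = phi(W2 @ h1)                    # network output
            delta2 = (D[n] - o) * o * (1 - o)   # output deltas
            delta1 = h * (1 - h) * (W2[:, 1:].T @ delta2)  # hidden deltas
            # generalised delta rule with momentum (velocity form)
            dW2 = alpha * dW2 + eta * np.outer(delta2, h1)
            dW1 = alpha * dW1 + eta * np.outer(delta1, x1)
            W2 += dW2
            W1 += dW1
    return W1, W2
```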
25. BP Algorithm: Stopping Criteria
• The BP algorithm is considered to have converged
when the Euclidean norm of the gradient vector
reaches a sufficiently small gradient threshold.
• The BP is considered to have converged when the
absolute value of the change in the average square
error per epoch is sufficiently small
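The two stopping tests can be sketched as follows; the thresholds are illustrative, and grad and errors are assumed to be maintained by the training loop:

```python
import numpy as np

def should_stop(grad, errors, g_tol=1e-4, e_tol=1e-6):
    """grad: full gradient vector; errors: E_av per epoch so far."""
    small_gradient = np.linalg.norm(grad) < g_tol        # criterion 1
    small_change = (len(errors) > 1 and
                    abs(errors[-1] - errors[-2]) < e_tol)  # criterion 2
    return small_gradient or small_change
```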
26. BP Algorithm: XOR Example
• The XOR problem is defined by the following truth table:
x1 = 0, x2 = 0 -> y = 0
x1 = 0, x2 = 1 -> y = 1
x1 = 1, x2 = 0 -> y = 1
x1 = 1, x2 = 1 -> y = 0
• The following network solves the problem. The perceptron could not do this. (We use the Sgn function.)
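As a rough check of the XOR example, the train_bp sketch from the algorithm summary above can be reused (same assumptions apply; with so small a network, online BP can occasionally stall in a poor local minimum, so a different seed or more epochs may be needed):

```python
import numpy as np

X = np.array([[0, 0], [0, 1], [1, 0], [1, 1]], dtype=float)
D = np.array([[0], [1], [1], [0]], dtype=float)

W1, W2 = train_bp(X, D, n_hidden=2, eta=0.5, alpha=0.8, epochs=5000)

phi = lambda v: 1.0 / (1.0 + np.exp(-v))
for x, d in zip(X, D):
    h = phi(W1 @ np.concatenate(([1.0], x)))
    o = phi(W2 @ np.concatenate(([1.0], h)))
    print(x, "->", o.round(2), "target", d)
```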
27. BP Algorithm: Heuristics for Back-Propagation
• To speed the convergence of the back-propagation algorithm the following heuristics are applied:
• H1: Use sequential (online) vs batch update
• H2: Maximise information content
• Use examples that produce the largest error
• Use examples which are very different from all the previous ones
• H3: Use an antisymmetric activation function, such as the hyperbolic tangent. Antisymmetric means:
$\varphi(-x) = -\varphi(x)$
28. Contents
MLP Model
BP Algorithm
Approxim.
Model Selec.
BP & Opt.
CS 476: Networks of Neural Computation, CSD, UOC, 2009
Conclusions
BP Algorithm
Heuristics for Back-Propagation II
• H4: Use different target values inside a smaller
range, different from the asymptotic values of
the sigmoid
• H5: Normalise the inputs:
• Create zero-mean variables
• Decorrelate the variables
• Scale the variables to have covariances
approximately equal
• H6: Initialise properly the weights. Use a zero-
mean distribution with variance:
σ_w² = 1/m
where m is the number of connections arriving
at a neuron (see the sketch after this list)
• H7: Learn from hints
• H8: Adapt the learning rates appropriately (see
next section)
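A one-line realisation of H6 (the layer sizes below are arbitrary examples):

```python
import numpy as np

def init_weights(n_out, m, rng=np.random.default_rng(0)):
    """Zero-mean Gaussian weights with variance 1/m, where m is the
    fan-in, i.e. the number of connections arriving at each neuron."""
    return rng.normal(0.0, np.sqrt(1.0 / m), size=(n_out, m))

W = init_weights(4, 10)        # 4 neurons, each with fan-in m = 10
print(W.var())                 # close to 1/10 for large layers
```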
BP Algorithm
Heuristics for Learning Rate
• R1: Every adjustable parameter should have its
own learning rate
• R2: Every learning rate should be allowed to
adjust from one iteration to the next
• R3: When the derivative of the cost function
wrt a weight has the same algebraic sign for
several consecutive iterations of the algorithm,
the learning rate for that particular weight
should be increased.
• R4: When the algebraic sign of the derivative
above alternates for several consecutive
iterations of the algorithm the learning rate
should be decreased.
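Rules R1-R4 can be realised with a simple sign-agreement update of per-weight rates; the multiplicative constants below are our own illustrative choices (the slides do not fix any):

```python
import numpy as np

def adapt_rates(etas, grad, prev_grad, up=1.05, down=0.5):
    """R1: one rate per weight.  R3: grow a rate while the gradient
    keeps its sign.  R4: shrink it when the sign alternates."""
    same_sign = np.sign(grad) == np.sign(prev_grad)
    return np.where(same_sign, etas * up, etas * down)

etas = np.full(3, 0.01)
print(adapt_rates(etas, np.array([0.2, -0.1, 0.3]),
                  np.array([0.1, 0.2, 0.4])))
```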
Approxim.
Approximation of Functions
•Q: What is the minimum number of hidden layers in an
MLP that provides an approximate realisation of any
continuous mapping?
•A: Universal Approximation Theorem
Let φ(•) be a nonconstant, bounded, and monotone-
increasing continuous function. Let I_{m_0} denote the m_0-
dimensional unit hypercube [0,1]^{m_0}. The space of
continuous functions on I_{m_0} is denoted by C(I_{m_0}). Then,
given any function f ∈ C(I_{m_0}) and ε > 0, there exist an
integer m_1 and sets of real constants a_i, b_i and w_{ij},
where i=1,…,m_1 and j=1,…,m_0, such that we may
define:
F(x_1, …, x_{m_0}) = Σ_{i=1}^{m_1} a_i φ( Σ_{j=1}^{m_0} w_{ij} x_j + b_i )
as an approximate realisation of the function f(•); that is:
|F(x_1, …, x_{m_0}) - f(x_1, …, x_{m_0})| < ε
for all x_1, …, x_{m_0} that lie in the input space.
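The function F above maps directly onto code; the dimensions and the random constants here are arbitrary stand-ins, chosen only to show the shape of the construction:

```python
import numpy as np

sigmoid = lambda v: 1.0 / (1.0 + np.exp(-v))   # a valid choice of phi

def F(x, a, w, b):
    """F(x_1,...,x_m0) = sum_i a_i * phi(sum_j w_ij x_j + b_i)."""
    return a @ sigmoid(w @ x + b)

m0, m1 = 2, 8                                  # input dim., hidden units
rng = np.random.default_rng(0)
a = rng.normal(size=m1)
w = rng.normal(size=(m1, m0))
b = rng.normal(size=m1)
x = np.array([0.3, 0.7])                       # a point in [0,1]^m0
print(F(x, a, w, b))
```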
Approxim.
Approximation of Functions III
•The Universal Approximation Theorem is directly
applicable to MLPs. Specifically:
•The sigmoid functions cover the requirements for the
function φ
•The network has m_0 input nodes and a single
hidden layer consisting of m_1 neurons; the inputs
are denoted by x_1, …, x_{m_0}
•Hidden neuron i has synaptic weights w_{i1}, …, w_{i m_0}
and bias b_i
•The network output is a linear combination of the
outputs of the hidden neurons, with a_1, …, a_{m_1}
defining the synaptic weights of the output layer
Approxim.
Approximation of Functions IV
•The theorem is an existence theorem: it does not tell
us exactly what the number m_1 is; it just says that one
exists!!!
•The theorem states that a single hidden layer is
sufficient for an MLP to compute a uniform
approximation to a given training set represented by
the set of inputs x1, …, xm0 and a desired output f(x1,
…, xm0).
•The theorem does not say however that a single
hidden layer is optimum in the sense of the learning
time, ease of implementation or generalisation.
Approxim.
Approximation of Functions V
•Empirical knowledge shows that the number N of data
pairs needed in order to achieve a given error level ε is:
N = O(W/ε)
where W is the total number of adjustable parameters
of the model. There is mathematical support for this
observation (but we will not analyse this further!)
•There is the “curse of dimensionality” for
approximating functions in high-dimensional spaces.
•It is theoretically justified to use two hidden layers.
Model Selec.
Generalisation
Def: A network generalises well when the input-output
mapping computed by the network is correct (or nearly
so) for test data never used in creating or training the
network. It is assumed that the test data are drawn
from the same population used to generate the training
data.
•To achieve generalisation we should try to approximate
the true mechanism that generates the data, not the
specific structure of the data. If we learn the specific
structure of the data we have overfitting, or overtraining.
Model Selec.
Generalisation III
•To achieve good generalisation we need:
•To have good data (see previous slides)
•To impose smoothness constraints on the function
•To add knowledge we have about the mechanism
•Reduce / constrain model parameters:
•Through cross-validation
•Through regularisation (Pruning, AIC, BIC, etc)
Model Selec.
Cross Validation
•In the cross-validation method for model selection we
split the training data into two sets:
•Estimation set
•Validation set
•We train our model in the estimation set.
•We evaluate the performance in the validation set.
•We select the model which performs “best” in the
validation set.
Model Selec.
Cross Validation II
•There are variations of the method depending on the
partition of the validation set. Typical variants are:
•Method of early stopping
•Leave k-out
Model Selec.
Method of Early Stopping
•Apply the method of early stopping when the number
of data pairs N is less than 30W, where W is the
number of free parameters in the network.
•Assume that r is the ratio of the training set which is
allocated to validation. It can be shown that the
optimal value of this parameter is given by:
r_opt = (√(2W - 1) - 1) / (2(W - 1))
•The method works as follows:
•Train the network in the usual way using the data
in the estimation set
Model Selec.
Method of Early Stopping II
•After a period of estimation, the weights and bias
levels of the MLP are all fixed and the network is
operated in its forward mode only. The validation
error is measured for each example in the
validation subset
•When the validation phase is completed, estimation
is resumed for another period (e.g. 10 epochs) and
the process is repeated
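A sketch of the whole procedure, including the r_opt formula above; train_step and val_error are hypothetical callables supplied by the user, and the period/patience constants are illustrative:

```python
import numpy as np

def r_opt(W):
    """Optimal fraction of the training set to allocate to validation."""
    return (np.sqrt(2 * W - 1) - 1) / (2 * (W - 1))

def early_stopping(train_step, val_error, period=10, max_periods=100,
                   patience=3):
    """Alternate estimation periods with validation checks; stop once
    the validation error has not improved for `patience` checks."""
    best, bad = np.inf, 0
    for _ in range(max_periods):
        for _ in range(period):
            train_step()          # one epoch on the estimation set
        err = val_error()         # network run in forward mode only
        if err < best:
            best, bad = err, 0
        else:
            bad += 1
            if bad >= patience:
                break
    return best

print(r_opt(100))                 # roughly 0.066 for W = 100
```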
Model Selec.
Leave k-out Validation
•We divide the set of available examples into K subsets
•The model is trained on all the subsets except for one,
and the validation error is measured by testing it on
the subset left out
•The procedure is repeated for a total of K trials, each
time using a different subset for validation
•The performance of the model is assessed by
averaging the squared error under validation over all
the trials of the experiment
•There is a limiting case for K=N in which case the
method is called leave-one-out.
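A generic sketch of the procedure; fit and error are hypothetical user-supplied callables (train a model, measure its squared error):

```python
import numpy as np

def leave_k_out(fit, error, X, y, K=4, seed=0):
    """Train on K-1 subsets, validate on the one left out, repeat for
    all K trials, and average the validation error.  K = len(X) gives
    the limiting leave-one-out case."""
    idx = np.random.default_rng(seed).permutation(len(X))
    folds = np.array_split(idx, K)
    errs = []
    for i in range(K):
        val = folds[i]
        est = np.concatenate([f for j, f in enumerate(folds) if j != i])
        errs.append(error(fit(X[est], y[est]), X[val], y[val]))
    return float(np.mean(errs))
```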
Model Selec.
Leave k-out Validation II
•An example with K=4 is shown below; in each of the
four trials a different quarter of the data is held out
for validation:
trial 1: [ VAL ][ est ][ est ][ est ]
trial 2: [ est ][ VAL ][ est ][ est ]
trial 3: [ est ][ est ][ VAL ][ est ]
trial 4: [ est ][ est ][ est ][ VAL ]
Model Selec.
Network Pruning
•To solve real-world problems we need to reduce the
number of free parameters of the model. We can
achieve this objective in one of two ways:
•Network growing: in which case we start with a
small MLP and then add a new neuron or layer of
hidden neurons only when we are unable to
achieve the performance level we want
•Network pruning: in this case we start with a large
MLP with an adequate performance for the
problem at hand, and then we prune it by
weakening or eliminating certain weights in a
principled manner
Model Selec.
Network Pruning II
•Pruning can be implemented as a form of
regularisation
Model Selec.
Regularisation
•In model selection we need to balance two needs:
•To achieve good performance, which usually leads
to a complex model
•To keep the complexity of the model manageable
due to practical estimation difficulties and the
overfitting phenomenon
•A principled approach to counterbalancing both
needs is given by regularisation theory.
•In this theory we assume that the estimation of the
model takes place using the usual cost function and a
second term which is called complexity penalty:
R(w) = E_s(w) + λ E_c(w)
where R is the total cost function, E_s is the standard
performance measure, E_c is the complexity penalty and
λ > 0 is a regularisation parameter
•Typically one imposes smoothness constraints as the
complexity term, i.e. we want to co-minimise the
smoothing integral of the kth order:
E_c(w, k) = (1/2) ∫ ‖∂^k F(x, w) / ∂x^k‖² μ(x) dx
where F(x, w) is the function performed by the model
and μ(x) is some weighting function which determines
the region of the input space where the function
F(x,w) is required to be smooth.
Model Selec.
Regularisation IV
•Other complexity penalty options include:
•Weight Decay:
E_c(w) = ‖w‖² = Σ_{i=1}^{W} w_i²
where W is the total number of all free parameters
in the model
•Weight Elimination:
E_c(w) = Σ_{i=1}^{W} (w_i / w_0)² / (1 + (w_i / w_0)²)
where w_0 is a pre-assigned parameter
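Both penalties in code, evaluated on an arbitrary example weight vector:

```python
import numpy as np

def weight_decay(w):
    """E_c(w) = ||w||^2 = sum_i w_i^2."""
    return float(np.sum(w ** 2))

def weight_elimination(w, w0=1.0):
    """E_c(w) = sum_i (w_i/w0)^2 / (1 + (w_i/w0)^2), w0 pre-assigned."""
    r = (w / w0) ** 2
    return float(np.sum(r / (1.0 + r)))

w = np.array([0.1, -2.0, 0.5])
print(weight_decay(w), weight_elimination(w))
```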
Model Selec.
Regularisation V
•There are other methods which base the decision on
which weights to eliminate on the Hessian H
•For example:
•The optimal brain damage procedure (OBD)
•The optimal brain surgeon procedure (OBS)
•In this case a weight w_i is eliminated when:
S_i < E_av
where the saliency S_i is defined as:
S_i = w_i² / (2 [H⁻¹]_{i,i})
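A direct transcription of the saliency formula; the weight vector and Hessian here are small made-up examples:

```python
import numpy as np

def saliencies(w, H):
    """S_i = w_i^2 / (2 [H^-1]_{i,i}); prune weights with small S_i."""
    return w ** 2 / (2.0 * np.diag(np.linalg.inv(H)))

w = np.array([0.05, 1.5])
H = np.array([[2.0, 0.3],
              [0.3, 1.0]])            # illustrative Hessian
print(saliencies(w, H))
```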
BP & Opt.
Conjugate-Gradient Method
•The conjugate-gradient method is a 2nd order
optimisation method, i.e. we assume that we can
approximate the cost function up to second degree in
the Taylor series:
f(x) = (1/2) xᵀA x - bᵀx + c
where A and b are an appropriate matrix and vector,
and x is a W-by-1 vector
•We can find the minimum point by solving the
equation:
x* = A⁻¹b
BP & Opt.
Conjugate-Gradient Method II
•Given the matrix A we say that a set of nonzero
vectors s(0), …, s(W-1) is A-conjugate if the following
condition holds:
sᵀ(n) A s(j) = 0,  for all n and j with n ≠ j
•If A is the identity matrix, conjugacy is the same as
orthogonality.
•A-conjugate vectors are linearly independent
BP & Opt.
Summary of the Conjugate-Gradient Method
1. Initialisation: Unless prior knowledge on the weight
vector w is available, choose the initial value w(0)
using a procedure similar to the ones which are
used for the BP algorithm
2. Computation:
1. For w(0), use the BP to compute the gradient vector
g(0)
2. Set s(0)=r(0)=-g(0)
3. At time step n, use a line search to find η(n) that
sufficiently minimises E_av(η), the cost function E_av
expressed as a function of η for fixed values of w
and s
BP & Opt.
Summary of the Conjugate-Gradient Method II
4. Test to determine if the Euclidean norm of the
residual r(n) has fallen below a specific value, that
is, a small fraction of the initial value ||r(0)||
5. Update the weight vector:
w(n+1) = w(n) + η(n) s(n)
6. For w(n+1), use the BP to compute the updated
gradient vector g(n+1)
7. Set r(n+1)=-g(n+1)
8. Use the Polak-Ribière formula to calculate β(n+1):
β(n+1) = max{ rᵀ(n+1) [r(n+1) - r(n)] / (rᵀ(n) r(n)), 0 }
BP & Opt.
Summary of the Conjugate-Gradient Method III
9. Update the direction vector:
s(n+1) = r(n+1) + β(n+1) s(n)
10. Set n=n+1 and go to step 3
3. Stopping Criterion: Terminate the algorithm when
the following condition is satisfied:
‖r(n)‖ ≤ ε ‖r(0)‖
where ε is a prescribed small number
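On the quadratic model of the earlier slides the whole summary collapses to a few lines; here the line search has a closed form (in the MLP setting η(n) would be found numerically, with gradients supplied by BP):

```python
import numpy as np

def conjugate_gradient(A, b, x0, eps=1e-8):
    """Minimise f(x) = 0.5 x^T A x - b^T x by the Polak-Ribiere
    conjugate-gradient iteration summarised above."""
    x = x0.copy()
    r = b - A @ x                     # residual = negative gradient
    s = r.copy()                      # s(0) = r(0)
    r0 = np.linalg.norm(r)
    while np.linalg.norm(r) > eps * r0:
        eta = (r @ r) / (s @ (A @ s))            # exact line search
        x = x + eta * s                          # w(n+1) = w(n) + eta s(n)
        r_new = r - eta * (A @ s)                # r(n+1)
        beta = max((r_new @ (r_new - r)) / (r @ r), 0.0)  # Polak-Ribiere
        s = r_new + beta * s                     # s(n+1)
        r = r_new
    return x

A = np.array([[3.0, 1.0],
              [1.0, 2.0]])            # symmetric positive definite
b = np.array([1.0, 1.0])
print(conjugate_gradient(A, b, np.zeros(2)), np.linalg.solve(A, b))
```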
Conclusions
Advantages & Disadvantages
•MLP and BP are used in Cognitive and Computational
Neuroscience modelling, but the algorithm still has no
real neurophysiological support
•The algorithm can be used to build encoding/decoding
and compression systems; it is useful for data
pre-processing operations
•The MLP with the BP algorithm is a universal
approximator of functions
•The algorithm is computationally efficient, having O(W)
complexity in the number of model parameters
•The algorithm has “local” robustness
•The convergence of BP can be very slow, especially
in large problems, depending on the method
Conclusions
Advantages & Disadvantages II
•The BP algorithm suffers from the problem of local
minima
Conclusions
Types of problems
•The BP algorithm is used in a great variety of
problems:
•Time series predictions
•Credit risk assessment
•Pattern recognition
•Speech processing
•Cognitive modelling
•Image processing
•Control
•Etc
•BP is the standard algorithm against which all other
NN algorithms are compared!!