We propose a novel interactive learning framework, which we refer to as Interactive Attention Learning (IAL), in which human supervisors interactively manipulate the allocated attentions to correct the model's behavior by updating the attention-generating network. However, such a model is prone to overfitting due to the scarcity of human annotations, and requires costly retraining. Moreover, it is almost infeasible for human annotators to examine attentions on large numbers of instances and features. We tackle these challenges by proposing a sample-efficient attention mechanism and a cost-effective reranking algorithm for instances and features. First, we propose the Neural Attention Process (NAP), an attention generator that can update its behavior by incorporating new attention-level supervision without any retraining. Second, we propose an algorithm that prioritizes instances and features by their negative impacts, such that the model can yield large improvements with minimal human feedback. We validate IAL on various time-series datasets from multiple domains (healthcare, real estate, and computer vision), on which it significantly outperforms baselines with conventional attention mechanisms or without cost-effective reranking, with substantially less retraining and human-model interaction cost.
Task Adaptive Neural Network Search with Meta-Contrastive Learning, by MLAI2
Most conventional Neural Architecture Search (NAS) approaches are limited in that they only generate architectures without searching for the optimal parameters. While some NAS methods handle this issue by utilizing a supernet trained on a large-scale dataset such as ImageNet, they may be suboptimal if the target tasks are highly dissimilar from the dataset the supernet is trained on. To address such limitations, we introduce a novel problem of Neural Network Search (NNS), whose goal is to search for the optimal pretrained network for a novel dataset and constraints (e.g., the number of parameters) from a model zoo. Then, we propose a novel framework to tackle the problem, namely Task-Adaptive Neural Network Search (TANS). Given a model zoo that consists of networks pretrained on diverse datasets, we use a novel amortized meta-learning framework to learn a cross-modal latent space with a contrastive loss, to maximize the similarity between a dataset and a high-performing network on it, and to minimize the similarity between irrelevant dataset-network pairs. We validate the effectiveness and efficiency of our method on ten real-world datasets against existing NAS/AutoML baselines. The results show that our method instantly retrieves networks that outperform models obtained with the baselines, with significantly fewer training steps to reach the target performance, thus minimizing the total cost of obtaining a task-optimal network. Our code and the model zoo are available at https://anonymous.4open.science/r/TANS-33D6
Learning to Balance: Bayesian Meta-Learning for Imbalanced and Out-of-distribution Tasks, by MLAI2
While tasks may come with varying numbers of instances and classes in realistic settings, existing meta-learning approaches for few-shot classification assume that the number of instances per task and class is fixed. Due to this restriction, they learn to utilize the meta-knowledge equally across all tasks, even when the number of instances per task and class varies largely. Moreover, they do not consider the distributional difference in unseen tasks, on which the meta-knowledge may be less useful depending on task relatedness. To overcome these limitations, we propose a novel meta-learning model that adaptively balances the effect of meta-learning and task-specific learning within each task. Through the learning of the balancing variables, we can decide whether to obtain a solution by relying on the meta-knowledge or on task-specific learning. We formulate this objective in a Bayesian inference framework and tackle it using variational inference. We validate our Bayesian Task-Adaptive Meta-Learning (Bayesian TAML) on two realistic task- and class-imbalanced datasets, on which it significantly outperforms existing meta-learning approaches. A further ablation study confirms the effectiveness of each balancing component and of the Bayesian learning framework.
How to create a neural network that detects people wearing masks: the complete, A-to-Z workflow for building a neural network that recognizes images.
A short intro to the paper: https://blog.fulcrum.rocks/neural-network-image-recognition
Pattern Recognition using Artificial Neural Network, by Editor IJCATR
An artificial neural network (ANN), usually called a neural network, can be considered a computational paradigm inspired by the biological nervous system. In the network, signals are transmitted by means of connection links. Each link possesses an associated weight, which is multiplied with the incoming signal. The output signal is obtained by applying an activation function to the net input. Neural networks are one of the most exciting and challenging research areas. As ANNs mature into commercial systems, they are likely to be implemented in hardware. Their fault tolerance and reliability are therefore vital to the functioning of the systems in which they are embedded. The pattern recognition system is implemented with a backpropagation network and a Hopfield network to remove distortion from the input. The high fault tolerance of the Hopfield network helps this system produce accurate output.
This document provides an introduction to neural networks, including their basic components and types. It discusses neurons, activation functions, different types of neural networks based on connection type, topology, and learning methods. It also covers applications of neural networks in areas like pattern recognition and control systems. Neural networks have advantages like the ability to learn from experience and handle incomplete information, but also disadvantages like the need for training and high processing times for large networks. In conclusion, neural networks can provide more human-like artificial intelligence by taking approximation and hard-coded reactions out of AI design, though they still require fine-tuning.
Neural Network and Artificial Intelligence.
WHAT IS A NEURAL NETWORK?
The method of calculation is based on the interaction of a plurality of processing elements, called neurons, inspired by the biological nervous system.
It is a powerful technique for solving real-world problems.
A neural network is composed of a number of nodes, or units [1], connected by links. Each link has a numeric weight [2] associated with it.
Weights are the primary means of long-term storage in neural networks, and learning usually takes place by updating the weights.
Artificial neurons are the constitutive units in an artificial neural network.
WHY USE NEURAL NETWORKS?
It has the ability to learn from experience.
It can deal with incomplete information.
It can produce results for inputs it has not been taught to deal with.
It is used to extract useful patterns from given data, e.g., pattern recognition.
Biological Neurons
Four parts of a typical nerve cell:
• DENDRITES: accept the inputs
• SOMA: processes the inputs
• AXON: turns the processed inputs into outputs
• SYNAPSES: the electrochemical contacts between neurons
ARTIFICIAL NEURON MODEL
Inputs to the network are represented by the mathematical symbols x1, ..., xn.
Each of these inputs is multiplied by a connection weight w1, ..., wn.
sum = w1x1 + w2x2 + ... + wnxn
These products are summed and fed through the transfer function f( ) to generate a result, which is then output.
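A minimal sketch of this weighted-sum-plus-transfer-function computation, assuming tanh as the transfer function f( ) and made-up inputs and weights:

```python
import math

def neuron(inputs, weights, f=math.tanh):
    """Weighted sum of inputs passed through a transfer function f."""
    total = sum(w * x for w, x in zip(weights, inputs))  # w1*x1 + ... + wn*xn
    return f(total)

# Two inputs with their connection weights.
print(neuron([0.5, -1.0], [0.8, 0.2]))  # tanh(0.5*0.8 + (-1.0)*0.2) = tanh(0.2)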
NEURON MODEL
A neuron consists of:
• Inputs (synapses): the input signals.
• Weights (dendrites): determine the importance of each incoming value.
• Output (axon): the output to other neurons or of the NN.
The document discusses hyperparameters and hyperparameter tuning in deep learning models. It defines hyperparameters as parameters that govern how the model parameters (weights and biases) are determined during training, in contrast to model parameters which are learned from the training data. Important hyperparameters include the learning rate, number of layers and units, and activation functions. The goal of training is for the model to perform optimally on unseen test data. Model selection, such as through cross-validation, is used to select the optimal hyperparameters. Training, validation, and test sets are also discussed, with the validation set used for model selection and the test set providing an unbiased evaluation of the fully trained model.
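A hedged sketch of the model-selection loop described above: pick the hyperparameter (here, a learning rate) that minimizes validation error, then evaluate once on the test set. The toy regression data, gradient-descent fitter, and learning-rate grid are all illustrative assumptions, not from the document:

```python
import numpy as np

rng = np.random.default_rng(0)

# Toy regression data split into training, validation, and test sets.
X = rng.normal(size=(300, 5)); w_true = rng.normal(size=5)
y = X @ w_true + 0.1 * rng.normal(size=300)
X_tr, y_tr = X[:200], y[:200]
X_va, y_va = X[200:250], y[200:250]
X_te, y_te = X[250:], y[250:]

def fit(X, y, lr, steps=200):
    """Plain gradient descent on mean squared error."""
    w = np.zeros(X.shape[1])
    for _ in range(steps):
        w -= lr * 2 * X.T @ (X @ w - y) / len(y)
    return w

def mse(w, X, y):
    return float(np.mean((X @ w - y) ** 2))

# Model selection: pick the learning rate that minimizes validation error...
best_lr = min([0.001, 0.01, 0.1], key=lambda lr: mse(fit(X_tr, y_tr, lr), X_va, y_va))
# ...then report an unbiased estimate on the held-out test set.
print(best_lr, mse(fit(X_tr, y_tr, best_lr), X_te, y_te))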
Examinations of humans' central nervous systems inspired the concept of artificial neural networks. In an artificial neural network, simple artificial nodes, known as "neurons", "neurodes", "processing elements" or "units", are connected together to form a network which mimics a biological neural network
This document provides an overview of different techniques for hyperparameter tuning in machine learning models. It begins with introductions to grid search and random search, then discusses sequential model-based optimization techniques like Bayesian optimization and Tree-of-Parzen Estimators. Evolutionary algorithms like CMA-ES and particle-based methods like particle swarm optimization are also covered. Multi-fidelity methods like successive halving and Hyperband are described, along with recommendations on when to use different techniques. The document concludes by listing several popular libraries for hyperparameter tuning.
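A minimal random-search sketch in Python, for contrast with grid search: sample configurations instead of enumerating a grid. The synthetic validation_score stand-in and the sampled ranges are illustrative assumptions, not from the document:

```python
import random
random.seed(0)

def validation_score(params):
    """Stand-in for training a model and scoring it on a validation set."""
    lr, units = params["lr"], params["units"]
    return -(lr - 0.01) ** 2 - ((units - 64) / 100) ** 2  # peak at lr=0.01, units=64

# Random search: sample configurations rather than enumerating a grid.
best, best_score = None, float("-inf")
for _ in range(50):
    params = {"lr": 10 ** random.uniform(-4, -1),   # log-uniform learning rate
              "units": random.choice([16, 32, 64, 128, 256])}
    score = validation_score(params)
    if score > best_score:
        best, best_score = params, score

print(best, best_score)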
This document introduces neural networks and their applications. It discusses how neural networks simulate the human brain using processing nodes and weights to learn from patterns in data. Applications include prediction, pattern detection, and classification. The document also provides an overview of neural network theory, architecture, learning process, and development tools. It notes benefits like handling nonlinear problems and noisy data, as well as limitations such as the "black box" nature and lack of explainability.
This presentation provides an introduction to the artificial neural networks topic, its learning, network architecture, back propagation training algorithm, and its applications.
An artificial neural network is a mathematical model that maps inputs to outputs. It consists of an input layer, hidden layers, and an output layer connected by weights and biases. Activation functions determine the output of each node. Training a neural network involves adjusting the weights and biases through backpropagation to minimize a loss function and improve predictions based on the input data. Feedforward involves calculating predictions, while backpropagation calculates gradients to update weights and biases through gradient descent.
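A rough end-to-end sketch of the feedforward and backpropagation steps just described, on a toy XOR problem; the architecture, learning rate, and data are illustrative choices, not from the document:

```python
import numpy as np
rng = np.random.default_rng(0)

# Tiny one-hidden-layer network trained on XOR with gradient descent.
X = np.array([[0, 0], [0, 1], [1, 0], [1, 1]], dtype=float)
y = np.array([[0], [1], [1], [0]], dtype=float)
W1, b1 = rng.normal(size=(2, 8)), np.zeros(8)
W2, b2 = rng.normal(size=(8, 1)), np.zeros(1)
sigmoid = lambda z: 1 / (1 + np.exp(-z))

for _ in range(5000):
    # Feedforward: compute predictions layer by layer.
    h = np.tanh(X @ W1 + b1)
    p = sigmoid(h @ W2 + b2)
    # Backpropagation: chain rule from the loss back to each weight.
    dp = (p - y) / len(X)               # gradient of cross-entropy w.r.t. logits
    dW2, db2 = h.T @ dp, dp.sum(0)
    dh = (dp @ W2.T) * (1 - h ** 2)     # tanh derivative
    dW1, db1 = X.T @ dh, dh.sum(0)
    # Gradient descent step on every weight and bias.
    for param, grad in [(W1, dW1), (b1, db1), (W2, dW2), (b2, db2)]:
        param -= 1.0 * grad

print(np.round(p.ravel(), 2))  # typically approaches [0, 1, 1, 0]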
The document proposes using an artificial neural network with a modified backpropagation algorithm for load forecasting. It describes developing a model to forecast electrical load for the next 24 hours on a daily basis. The neural network is trained using historical load data from a load dispatch center. Once trained, the network can generate daily load forecasts. The document provides background on artificial neural networks, including their structure of interconnected processing units inspired by biological neurons, and how they are trained through a process of backward propagation of errors.
Human uncertainty makes classification more robust, ICCV 2019 Review, by LEE HOSEONG
1. The document summarizes a research paper that proposes training deep neural networks on soft labels representing human uncertainty in image classification, which improves generalization and robustness compared to training on hard labels.
2. Experiments show that models trained on soft labels constructed from human responses better fit patterns of human uncertainty and improve accuracy, cross-entropy, and a new second-best accuracy measure on various generalization datasets.
3. Alternative soft label methods are also explored, finding that human uncertainty provides a more important contribution than soft labels alone. While robustness to adversarial attacks is improved, defenses are still needed.
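To make the soft-label training idea summarized above concrete, a small sketch comparing cross-entropy against hard and soft targets; the 60/30/10 rater split and the class names are made-up examples:

```python
import numpy as np

def cross_entropy(soft_labels, predicted_probs, eps=1e-12):
    """CE against a full label distribution rather than a one-hot target."""
    return -np.sum(soft_labels * np.log(predicted_probs + eps), axis=-1)

# Hard label: all mass on "cat". Soft label: human raters split 60/30/10.
hard = np.array([1.0, 0.0, 0.0])
soft = np.array([0.6, 0.3, 0.1])
pred = np.array([0.5, 0.4, 0.1])

print(cross_entropy(hard, pred))  # penalizes everything except the argmax class
print(cross_entropy(soft, pred))  # rewards matching the full human distribution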
This document discusses using graph-based semi-supervised learning for sentiment categorization. It begins by explaining the problem of sentiment categorization with limited labeled data. A graph-based approach is proposed, where documents are represented as nodes in a graph and edge weights indicate similarity between documents. A loss function is defined over the graph that incorporates both labeled and unlabeled data. By minimizing this loss function, predictions can be made for unlabeled data. A closed-form solution is derived by solving a system of linear equations involving the graph Laplacian matrix. Experimental results show this approach improves over a supervised-only baseline, demonstrating the benefits of leveraging unlabeled data through graph-based semi-supervised learning.
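A minimal sketch of the closed-form solution described above, in the style of harmonic label propagation: clamp the labeled nodes and solve a linear system involving the graph Laplacian. The 5-document similarity graph and the two labels are made-up toy data:

```python
import numpy as np

# Symmetric similarity (weight) matrix over 5 documents; first two are labeled.
W = np.array([[0, 1, 1, 0, 0],
              [1, 0, 0, 1, 0],
              [1, 0, 0, 1, 1],
              [0, 1, 1, 0, 1],
              [0, 0, 1, 1, 0]], dtype=float)
y_labeled = np.array([1.0, 0.0])          # sentiment of the two labeled docs
L = np.diag(W.sum(1)) - W                 # graph Laplacian

# Harmonic solution: clamp labeled nodes, solve L_uu f_u = W_ul y_l.
labeled, unlabeled = [0, 1], [2, 3, 4]
L_uu = L[np.ix_(unlabeled, unlabeled)]
W_ul = W[np.ix_(unlabeled, labeled)]
f_u = np.linalg.solve(L_uu, W_ul @ y_labeled)
print(f_u)  # predicted sentiment scores for the unlabeled documents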
Although a relatively new technological advancement, Deep Learning is expanding in scope exponentially. Advanced Deep Learning technology aims to imitate the biological neural network of the human brain.
https://takeoffprojects.com/advanced-deep-learning-projects
We are providing you with some of the greatest ideas for building Final Year projects with proper guidance and assistance.
Neural networks are a type of data mining technique inspired by biological neural systems. They are composed of interconnected nodes similar to neurons in the brain. Neural networks can learn patterns from complex data through supervised or unsupervised learning methods. They are widely used for applications like fraud detection, risk assessment, image recognition, and stock market prediction due to their ability to learn from examples without being explicitly programmed.
Artificial Neural Network Paper Presentation, by guestac67362
The document provides an introduction to artificial neural networks. It discusses how neural networks are designed to mimic the human brain by using interconnected processing elements like neurons. The key aspects covered are:
- Neural networks can perform tasks like pattern recognition that are difficult for traditional algorithms.
- They are composed of interconnected nodes that transmit scalar messages to each other via weighted connections like synapses.
- Neural networks are trained by presenting examples, allowing the weighted connections to adjust until the network produces the desired output for each input.
Genetic algorithm for hyperparameter tuning, by Dr. Jyoti Obia
This document discusses using genetic algorithms to tune hyperparameters in predictive models. It begins by providing an overview of genetic algorithms, describing them as a heuristic approach that mimics natural selection to generate multiple solutions. It then defines key terms related to genetic algorithms and chromosomes. The document outlines the genetic algorithm methodology and provides pseudocode. It applies this approach to tune hyperparameters C and gamma in an SVM model and finds it achieves higher accuracy than grid search in less computation time. In an appendix, it references related work and describes a spam email dataset used to classify emails as spam or not spam.
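A simplified sketch of the approach (selection plus mutation only, no crossover), tuning C and gamma of an SVM as the document describes; scikit-learn's SVC and the Iris data are used purely for illustration rather than the document's spam dataset:

```python
import random
from sklearn.datasets import load_iris
from sklearn.model_selection import cross_val_score
from sklearn.svm import SVC

random.seed(0)
X, y = load_iris(return_X_y=True)

def fitness(chrom):
    """Cross-validated accuracy of an SVM with the chromosome's C and gamma."""
    C, gamma = chrom
    return cross_val_score(SVC(C=C, gamma=gamma), X, y, cv=3).mean()

def mutate(chrom):
    return [max(g * random.uniform(0.5, 2.0), 1e-6) for g in chrom]

# Each chromosome encodes (C, gamma); evolve by selection + mutation.
population = [[10 ** random.uniform(-2, 2), 10 ** random.uniform(-4, 0)]
              for _ in range(8)]
for _ in range(5):                            # generations
    population.sort(key=fitness, reverse=True)
    parents = population[:4]                  # selection: keep the fittest half
    population = parents + [mutate(random.choice(parents)) for _ in range(4)]

best = max(population, key=fitness)
print("best C=%.4f gamma=%.6f acc=%.3f" % (best[0], best[1], fitness(best)))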
Data Science, Machine Learning and Neural Networks, by BICA Labs
Lecture briefly overviewing state of the art of Data Science, Machine Learning and Neural Networks. Covers main Artificial Intelligence technologies, Data Science algorithms, Neural network architectures and cloud computing facilities enabling the whole stack.
Deep Learning Enabled Question Answering System to Automate Corporate Helpdesk, by Saurabh Saxena
Studied the feasibility of applying state-of-the-art deep learning models, such as end-to-end memory networks and neural attention-based models, to the problem of machine comprehension and subsequent question answering in corporate settings with huge amounts of unstructured textual data. Used pre-trained embeddings like word2vec and GloVe to avoid huge training costs.
Provides a brief overview of what machine learning is, how it works (theory), how to prepare data for a machine learning problem, an example case study, and additional resources.
Forecasting of Sales using Neural network techniques, by Hitesh Dua
This document discusses using neural network techniques for sales forecasting. It begins by defining sales forecasting and explaining its need in areas like human resources, R&D, marketing, finance, production and purchasing. The document then outlines the sales forecasting process including setting goals, data gathering, analysis, mining, and applying neural network models. It describes the basic concepts of artificial neural networks and different neural network models like feed forward, recurrent, and backpropagation. It provides details on how these models work, especially explaining the backpropagation training algorithm and how it minimizes network error through forward and backward passes. Finally, it lists several references for further information.
Deep Learning: concepts and use cases (October 2018), by Julien SIMON
An introduction to Deep Learning theory
Neurons & Neural Networks
The Training Process
Backpropagation
Optimizers
Common network architectures and use cases
Convolutional Neural Networks
Recurrent Neural Networks
Long Short Term Memory Networks
Generative Adversarial Networks
Getting started
Deep learning systems are susceptible to adversarial manipulation through techniques like generating adversarial samples and substitute models. By making small, targeted perturbations to inputs, an attacker can cause misclassifications or reduce a model's confidence without affecting human perception of the inputs. This is possible due to blind spots in how models learn representations that are different from human concepts. Defending against such attacks requires training models with adversarial techniques to make them more robust.
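A minimal sketch of such a perturbation in the style of the standard fast gradient sign method (FGSM), applied to a toy logistic model; the weights and input are made-up, and nothing here is specific to this document:

```python
import numpy as np

# Logistic "model": p(y=1|x) = sigmoid(w.x + b)
w, b = np.array([2.0, -3.0, 1.5]), 0.1
sigmoid = lambda z: 1 / (1 + np.exp(-z))
x = np.array([0.4, -0.2, 0.3])
y = 1.0                                   # true label

# FGSM-style attack: one signed-gradient step of size eps on the input.
p = sigmoid(w @ x + b)
grad_x = (p - y) * w                      # d(cross-entropy)/dx for this model
eps = 0.1
x_adv = x + eps * np.sign(grad_x)         # small, targeted perturbation

print("clean confidence:", sigmoid(w @ x + b))
print("adversarial confidence:", sigmoid(w @ x_adv + b))  # reduced confidence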
This document provides an overview of artificial neural networks (ANNs). It begins with an introduction explaining that ANNs are computational models inspired by the human brain's neural network. The document then discusses what ANNs are, how they work by simulating biological neurons in interconnected layers, and the learning paradigms of supervised, unsupervised, and reinforcement learning. It also outlines common applications of ANNs like image and language processing. The document concludes by noting both the advantages of ANNs in adapting to new situations and modeling complex functions, as well as their disadvantages like forgetting and large network complexity.
Survey on contrastive self-supervised learning, by Anirudh Ganguly
Contrastive self-supervised learning aims to group similar images together and dissimilar images apart by randomly augmenting each image and training the model to group originals with their augmentations but not with other images. Pretext tasks generate pseudo labels for self-supervised learning using data attributes without labels. Major pretext tasks include color/geometric transformations and context/cross-model based tasks. Encoders output representations for downstream tasks like classification, and contrastive loss updates encoder parameters to bring positive samples closer and negative samples farther in latent space.
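A small sketch of an InfoNCE-style contrastive loss that pulls each image toward its augmentation (the positive) and away from other images (the negatives); the embedding sizes, noise level, and temperature are illustrative assumptions:

```python
import numpy as np

def info_nce(z1, z2, temperature=0.1):
    """Contrastive loss: each row of z1 should match the same row of z2."""
    z1 = z1 / np.linalg.norm(z1, axis=1, keepdims=True)
    z2 = z2 / np.linalg.norm(z2, axis=1, keepdims=True)
    logits = z1 @ z2.T / temperature          # cosine similarities of all pairs
    logits -= logits.max(axis=1, keepdims=True)
    log_probs = logits - np.log(np.exp(logits).sum(axis=1, keepdims=True))
    return -np.mean(np.diag(log_probs))       # positives are on the diagonal

rng = np.random.default_rng(0)
z = rng.normal(size=(4, 16))                  # embeddings of 4 images
z_aug = z + 0.05 * rng.normal(size=(4, 16))   # embeddings of their augmentations
print(info_nce(z, z_aug))                     # low: positives already aligned
print(info_nce(z, rng.normal(size=(4, 16))))  # high: random "positives"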
[PR12] Understanding deep learning requires rethinking generalization, by JaeJun Yoo
The document discusses a paper that argues traditional theories of generalization may not fully explain why large neural networks generalize well in practice. It summarizes the paper's key points:
1) The paper shows neural networks can easily fit random labels, calling into question traditional measures of complexity.
2) Regularization helps but is not the fundamental reason for generalization. Neural networks have sufficient capacity to memorize data.
3) Implicit biases in algorithms like SGD may better explain generalization by driving solutions toward minimum norm.
4) The paper suggests rethinking generalization as the effective capacity of neural networks may differ from theoretical measures. Understanding finite sample expressivity is important.
Neural networks can be used for machine learning tasks like classification. They consist of interconnected nodes that update their weight values during a training process using examples. Neural networks have been applied successfully to tasks like handwritten character recognition, autonomous vehicle control by observing human drivers, and pronunciation of written text. Their design is inspired by biological neural networks in the brain.
Neural networks can be used for machine learning tasks like classification. They consist of interconnected nodes that update their weight values during a training process using examples. Neural networks have been applied successfully to tasks like handwritten character recognition, autonomous vehicle control by observing human drivers, and text-to-speech pronunciation generation. Their architecture is inspired by the human brain but neural networks are trained using computational methods while the brain uses biological processes.
Deep Learning: Evolution of ML from Statistical to Brain-like Computing - Data..., by Impetus Technologies
Presentation on 'Deep Learning: Evolution of ML from Statistical to Brain-like Computing'
Speaker- Dr. Vijay Srinivas Agneeswaran,Director, Big Data Labs, Impetus
The main objective of the presentation is to give an overview of our cutting edge work on realizing distributed deep learning networks over GraphLab. The objectives can be summarized as below:
- First-hand experience and insights into implementation of distributed deep learning networks.
- Thorough view of GraphLab (including descriptions of code) and the extensions required to implement these networks.
- Details of how the extensions were realized/implemented in GraphLab source – they have been submitted to the community for evaluation.
- Arrhythmia detection use case as an application of the large scale distributed deep learning network.
Learning to learn unlearned feature for segmentation, by NAVER Engineering
Meta-learning, an area under active study in machine learning, aims to overcome the enormous data requirements that are a recognized limitation of conventional gradient-descent-based training, by enabling a model to reach sufficient performance from only a few samples. Among meta-learning techniques, Model-Agnostic Meta-Learning (MAML) showed that, regardless of the target model's architecture, a new gradient-descent-based algorithm can in practice train high-performing models for classification and reinforcement learning tasks in a short time. However, MAML does not perform effectively on tasks with complex network models such as image segmentation. This talk therefore examines MAML-based training methods applicable to segmentation and, in particular, introduces a meta-learning technique that can be used when a segmentation network must be fine-tuned, as in re-training or transfer learning. The proposed technique, called active meta-tune, uses an active-learning-based algorithm to determine the order of the training data learned through meta-learning, so that segmentation, which has a more complex structure than classification, can be performed well. The talk covers the theoretical background of how active learning and meta-learning can be combined, the active meta-tune algorithm, and its practical application areas.
This document provides an overview of deep learning techniques including neural networks, convolutional neural networks (CNNs), and long short-term memory (LSTM) algorithms. It defines key concepts like Bayesian inference, heuristics, perceptrons, and backpropagation. It also describes how to configure neural networks by specifying hyperparameters, hidden layers, normalization methods, and training parameters. CNN architectures are explained including convolution, pooling, and applications in computer vision tasks. Finally, predictive maintenance using deep learning to predict equipment failures from sensor data is briefly discussed.
1. The document discusses model interpretation and techniques for interpreting machine learning models, especially deep neural networks.
2. It describes what model interpretation is, its importance and benefits, and provides examples of interpretability algorithms like dimensionality reduction, manifold learning, and visualization techniques.
3. The document aims to help make machine learning models more transparent and understandable to humans in order to build trust and improve model evaluation, debugging and feature engineering.
[DSC Adria 23] Davor Horvatic: Human-Centric Explainable AI In Time Series Analysis, by DataScienceConferenc1
To fully trust, accept, and adopt newly emerging AI solutions in our everyday lives and practices, we need human-centric explainable AI that can provide human-understandable interpretations for their algorithmic behaviour and outcomes—consequently enabling us to control and continuously improve their performance, robustness, fairness, accountability, transparency, and explainability throughout the entire lifecycle of AI applications. The recently emerging trend within diverse and multidisciplinary research forms the basis of the next wave of AI. In this talk, we will present research that plans to produce interpretable deep learning models for time series analysis with a broad scope of applications.
The document discusses developing a computerized paper evaluation system using neural networks. It proposes replacing the current manual evaluation system, which is biased, inconsistent, and slow, with an automated system. A neural network would analyze student answers, search reference materials for relevant information, assign marks, and ask follow-up questions to further assess student understanding. The network would learn to accurately evaluate papers through a supervised learning process using example papers. Key chapters address the basic structure of the proposed examination system, the role neural networks could play in automatic language analysis and evaluation, and algorithms that could enable unsupervised learning.
This paper describes a method to classify messages from online health forums about psoriasis into those describing treatments that worked and those that do not. The authors use natural language processing and a convolutional neural network model. They collected over 2000 posts from various forums discussing psoriasis treatments. The CNN model was trained on this labeled data and achieved an accuracy of 84% at classifying messages as describing a solution or not. The authors developed tools to automatically search forums, extract posts, and prepare the text for analysis using NLP techniques prior to classification with the CNN model.
This document discusses deep learning and its applications in the real world. It begins with an introduction to deep learning and then discusses using pre-trained deep learning models for new problems and applications. Some key points discussed include starting from scratch to build a model for a new problem with no existing literature, repurposing pre-trained models for new ideas, and tips for using pre-trained models for mobile applications such as model conversion. Real-life examples of using pre-trained models for new applications like human pose estimation are also provided.
This document provides an introduction to deep learning. It defines artificial intelligence, machine learning, data science, and deep learning. Machine learning is a subfield of AI that gives machines the ability to improve performance over time without explicit human intervention. Deep learning is a subfield of machine learning that builds artificial neural networks using multiple hidden layers, like the human brain. Popular deep learning techniques include convolutional neural networks, recurrent neural networks, and autoencoders. The document discusses key components and hyperparameters of deep learning models.
This is material created for a lab study session about the "Transformer", which is the base of recent NLP × Deep Learning research. The citations of reference materials are intended to be accurate, but please point out any errors.
This document discusses learning components and types of learning in artificial intelligence. It will differentiate between supervised, unsupervised and reinforcement learning, and implement applications of each. Students will learn and implement perceptron and neural networks, as well as ensemble learning techniques like bagging and boosting. The objective is to discuss learning components, types of learning in AI, and implement algorithms for supervised, unsupervised and reinforcement learning.
Similar to Cost-effective Interactive Attention Learning with Neural Attention Process
Meta Learning Low Rank Covariance Factors for Energy-Based Deterministic Uncertainty, by MLAI2
Numerous recent works utilize bi-Lipschitz regularization of neural network layers to preserve relative distances between data instances in the feature spaces of each layer. This distance sensitivity with respect to the data aids in tasks such as uncertainty calibration and out-of-distribution (OOD) detection. In previous works, features extracted with a distance sensitive model are used to construct feature covariance matrices which are used in deterministic uncertainty estimation or OOD detection. However, in cases where there is a distribution over tasks, these methods result in covariances which are sub-optimal, as they may not leverage all of the meta information which can be shared among tasks. With the use of an attentive set encoder, we propose to meta learn either diagonal or diagonal plus low-rank factors to efficiently construct task specific covariance matrices. Additionally, we propose an inference procedure which utilizes scaled energy to achieve a final predictive distribution which is well calibrated under a distributional dataset shift.
Online Hyperparameter Meta-Learning with Hypergradient Distillation, by MLAI2
Many gradient-based meta-learning methods assume a set of parameters that do not participate in inner-optimization, which can be considered as hyperparameters. Although such hyperparameters can be optimized using the existing gradient-based hyperparameter optimization (HO) methods, they suffer from the following issues. Unrolled differentiation methods do not scale well to high-dimensional hyperparameters or horizon length, Implicit Function Theorem (IFT) based methods are restrictive for online optimization, and short horizon approximations suffer from short horizon bias. In this work, we propose a novel HO method that can overcome these limitations, by approximating the second-order term with knowledge distillation. Specifically, we parameterize a single Jacobian-vector product (JVP) for each HO step and minimize the distance from the true second-order term. Our method allows online optimization and also is scalable to the hyperparameter dimension and the horizon length. We demonstrate the effectiveness of our method on two different meta-learning methods and three benchmark datasets.
Online Coreset Selection for Rehearsal-based Continual Learning, by MLAI2
A dataset is a crucial piece of evidence for describing a task. However, each data point in the dataset does not have the same potential, as some data points can be more representative or informative than others. This unequal importance among the data points may have a large impact in rehearsal-based continual learning, where we store a subset of the training examples (coreset) to be replayed later to alleviate catastrophic forgetting. In continual learning, the quality of the samples stored in the coreset directly affects the model's effectiveness and efficiency. The coreset selection problem becomes even more important under realistic settings, such as imbalanced continual learning or noisy data scenarios. To tackle this problem, we propose Online Coreset Selection (OCS), a simple yet effective method that selects the most representative and informative coreset at each iteration and trains on it in an online manner. Our proposed method maximizes the model's adaptation to a target dataset while selecting high-affinity samples to past tasks, which directly inhibits catastrophic forgetting. We validate the effectiveness of our coreset selection mechanism over various standard, imbalanced, and noisy datasets against strong continual learning baselines, demonstrating that it improves task adaptation and prevents catastrophic forgetting in a sample-efficient manner.
Representational Continuity for Unsupervised Continual Learning, by MLAI2
Continual learning (CL) aims to learn a sequence of tasks without forgetting the previously acquired knowledge. However, recent CL advances are restricted to supervised continual learning (SCL) scenarios. Consequently, they are not scalable to real-world applications where the data distribution is often biased and unannotated. In this work, we focus on unsupervised continual learning (UCL), where we learn the feature representations on an unlabelled sequence of tasks and show that reliance on annotated data is not necessary for continual learning. We conduct a systematic study analyzing the learned feature representations and show that unsupervised visual representations are surprisingly more robust to catastrophic forgetting, consistently achieve better performance, and generalize better to out-of-distribution tasks than SCL. Furthermore, we find that UCL achieves a smoother loss landscape through qualitative analysis of the learned representations and learns meaningful feature representations. Additionally, we propose Lifelong Unsupervised Mixup (Lump), a simple yet effective technique that interpolates between the current task and previous tasks' instances to alleviate catastrophic forgetting for unsupervised representations.
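A rough sketch of the Lump-style interpolation described above; the Beta-distributed mixing coefficient and the buffer-sampling scheme are assumptions based on standard mixup, not the paper's exact recipe:

```python
import numpy as np
rng = np.random.default_rng(0)

def lump_batch(x_current, x_buffer, alpha=0.4):
    """Interpolate current-task inputs with replayed past-task inputs."""
    lam = rng.beta(alpha, alpha)                              # mixing coefficient
    idx = rng.integers(0, len(x_buffer), size=len(x_current)) # sample replay items
    return lam * x_current + (1 - lam) * x_buffer[idx], lam

x_now = rng.normal(size=(8, 32))     # batch from the task being learned
x_past = rng.normal(size=(64, 32))   # rehearsal buffer from earlier tasks
x_mixed, lam = lump_batch(x_now, x_past)
print(x_mixed.shape, round(lam, 3))  # mixed batch fed to the unsupervised loss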
Sequential Reptile: Inter-Task Gradient Alignment for Multilingual Learning, by MLAI2
Multilingual models jointly pretrained on multiple languages have achieved remarkable performance on various multilingual downstream tasks. Moreover, models finetuned on a single monolingual downstream task have shown to generalize to unseen languages. In this paper, we first show that it is crucial for those tasks to align gradients between them in order to maximize knowledge transfer while minimizing negative transfer. Despite its importance, the existing methods for gradient alignment either have a completely different purpose, ignore inter-task alignment, or aim to solve continual learning problems in rather inefficient ways. As a result of the misaligned gradients between tasks, the model suffers from severe negative transfer in the form of catastrophic forgetting of the knowledge acquired from the pretraining. To overcome the limitations, we propose a simple yet effective method that can efficiently align gradients between tasks. Specifically, we perform each inner-optimization by sequentially sampling batches from all the tasks, followed by a Reptile outer update. Thanks to the gradients aligned between tasks by our method, the model becomes less vulnerable to negative transfer and catastrophic forgetting. We extensively validate our method on various multi-task learning and zero-shot cross-lingual transfer tasks, where our method largely outperforms all the relevant baselines we consider.
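A toy sketch of the structure described above: inner SGD over batches drawn sequentially from every task, followed by a Reptile outer update. The regression tasks, learning rates, and batch size are illustrative assumptions:

```python
import numpy as np
rng = np.random.default_rng(0)

def sequential_reptile_step(theta, tasks, inner_lr=0.01, outer_lr=0.5):
    """One outer update: inner SGD over batches sampled from all tasks in turn."""
    phi = theta.copy()
    for X, y in tasks:                               # sequentially visit each task
        i = rng.integers(0, len(X), size=4)          # sample a batch from this task
        grad = 2 * X[i].T @ (X[i] @ phi - y[i]) / 4  # MSE gradient
        phi -= inner_lr * grad
    return theta + outer_lr * (phi - theta)          # Reptile outer update

# Two toy regression "tasks" sharing parameters.
tasks = [(rng.normal(size=(32, 5)), rng.normal(size=32)) for _ in range(2)]
theta = np.zeros(5)
for _ in range(100):
    theta = sequential_reptile_step(theta, tasks)
print(theta)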
While deep reinforcement learning methods have shown impressive results in robot learning, their sample inefficiency makes the learning of complex, long-horizon behaviors with real robot systems infeasible. To mitigate this issue, meta-reinforcement learning methods aim to enable fast learning on novel tasks by learning how to learn. Yet, the application has been limited to short-horizon tasks with dense rewards. To enable learning long-horizon behaviors, recent works have explored leveraging prior experience in the form of offline datasets without reward or task annotations. While these approaches yield improved sample efficiency, millions of interactions with environments are still required to solve complex tasks. In this work, we devise a method that enables meta-learning on long-horizon, sparse-reward tasks, allowing us to solve unseen target tasks with orders of magnitude fewer environment interactions. Our core idea is to leverage prior experience extracted from offline datasets during meta-learning. Specifically, we propose to (1) extract reusable skills and a skill prior from offline datasets, (2) meta-train a high-level policy that learns to efficiently compose learned skills into long-horizon behaviors, and (3) rapidly adapt the meta-trained policy to solve an unseen target task. Experimental results on continuous control tasks in navigation and manipulation demonstrate that the proposed method can efficiently solve long-horizon novel target tasks by combining the strengths of meta-learning and the usage of offline datasets, while prior approaches in RL, meta-RL, and multi-task RL require substantially more environment interactions to solve the tasks.
Edge Representation Learning with Hypergraphs, by MLAI2
Graph neural networks have recently achieved remarkable success in representing graph-structured data, with rapid progress in both the node embedding and graph pooling methods. Yet, they mostly focus on capturing information from the nodes considering their connectivity, and not much work has been done in representing the edges, which are essential components of a graph. However, for tasks such as graph reconstruction and generation, as well as graph classification tasks for which the edges are important for discrimination, accurately representing edges of a given graph is crucial to the success of the graph representation learning. To this end, we propose a novel edge representation learning framework based on Dual Hypergraph Transformation (DHT), which transforms the edges of a graph into the nodes of a hypergraph. This dual hypergraph construction allows us to apply message-passing techniques for node representations to edges. After obtaining edge representations from the hypergraphs, we then cluster or drop edges to obtain holistic graph-level edge representations. We validate our edge representation learning method with hypergraphs on diverse graph datasets for graph representation and generation performance, on which our method largely outperforms existing graph representation learning methods. Moreover, our edge representation learning and pooling method also largely outperforms state-of-theart graph pooling methods on graph classification, not only because of its accurate edge representation learning, but also due to its lossless compression of the nodes and removal of irrelevant edges for effective message-passing. Code is available at https://github.com/harryjo97/EHGNN.
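One reading of the Dual Hypergraph Transformation is via the incidence matrix: transposing it turns edges into nodes and nodes into hyperedges. A minimal illustration on a made-up 4-node cycle graph (the actual message passing on the dual is omitted):

```python
import numpy as np

# A cycle graph with 4 nodes and 4 edges, as a (node x edge) incidence matrix.
edges = [(0, 1), (1, 2), (2, 3), (3, 0)]
M = np.zeros((4, len(edges)))
for e, (u, v) in enumerate(edges):
    M[u, e] = M[v, e] = 1

# Dual hypergraph: transposing the incidence matrix makes each edge a node of
# the dual and each node a hyperedge grouping its incident edges, so node-level
# message passing on the dual now mixes edge features.
M_dual = M.T
print(M.shape, M_dual.shape)   # (node x edge) -> (edge-node x node-hyperedge)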
Hit and Lead Discovery with Explorative RL and Fragment-based Molecule Generation, by MLAI2
1) The document presents a method called FREED that uses fragment-based molecule generation guided by reinforcement learning to discover novel drug hits.
2) FREED explicitly constrains molecule generation to pharmacologically acceptable fragments to avoid toxic structures, which is more effective than implicit constraint methods.
3) FREED's exploratory RL algorithm prioritizes experience replay to encourage visiting novel states and finding diverse optima in the constrained chemical space.
Mini-Batch Consistent Slot Set Encoder For Scalable Set Encoding, by MLAI2
Most existing set encoding algorithms operate under the implicit assumption that all the set elements are accessible, and that there are ample computational and memory resources to load the set into memory during training and inference. However, both assumptions fail when the set is excessively large such that it is impossible to load all set elements into memory, or when data arrives in a stream. To tackle such practical challenges in large-scale set encoding, the general set-function constraints of permutation invariance and equivariance are not sufficient. We introduce a new property termed Mini-Batch Consistency (MBC) that is required for large scale mini-batch set encoding. Additionally, we present a scalable and efficient attention-based set encoding mechanism that is amenable to mini-batch processing of sets, and capable of updating set representations as data arrives. The proposed method adheres to the required symmetries of invariance and equivariance as well as maintaining MBC for any partition of the input set. We perform extensive experiments and show that our method is computationally efficient and results in rich set encoding representations for set-structured data.
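Mean pooling trivially satisfies the Mini-Batch Consistency property described above; this toy sketch (not the paper's attention-based encoder) shows partial states accumulated over mini-batches merging to the same encoding as processing the whole set at once:

```python
import numpy as np

class StreamingMeanEncoder:
    """Mean pooling is Mini-Batch Consistent: encoding a set chunk by chunk
    and merging the partial states equals encoding the whole set at once."""
    def __init__(self, dim):
        self.total = np.zeros(dim)
        self.count = 0
    def update(self, batch):            # process one mini-batch of set elements
        self.total += batch.sum(axis=0)
        self.count += len(batch)
    def encode(self):
        return self.total / self.count

rng = np.random.default_rng(0)
big_set = rng.normal(size=(1000, 8))    # say, too large to load at once

enc = StreamingMeanEncoder(8)
for chunk in np.array_split(big_set, 10):   # data arrives in a stream
    enc.update(chunk)
assert np.allclose(enc.encode(), big_set.mean(axis=0))  # same as full encoding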
Federated Semi-Supervised Learning with Inter-Client Consistency & Disjoint L...MLAI2
While existing federated learning approaches mostly require that clients have fully-labeled data to train on, in realistic settings, data obtained at the client-side often comes without any accompanying labels. Such deficiency of labels may result from either high labeling cost, or difficulty of annotation due to the requirement of expert knowledge. Thus the private data at each client may be either partly labeled, or completely unlabeled with labeled data being available only at the server, which leads us to a new practical federated learning problem, namely Federated Semi-Supervised Learning (FSSL). In this work, we study two essential scenarios of FSSL based on the location of the labeled data. The first scenario considers a conventional case where clients have both labeled and unlabeled data (labels-at-client), and the second scenario considers a more challenging case, where the labeled data is only available at the server (labels-at-server). We then propose a novel method to tackle the problems, which we refer to as Federated Matching (FedMatch). FedMatch improves upon naive combinations of federated learning and semi-supervised learning approaches with a new inter-client consistency loss and decomposition of the parameters for disjoint learning on labeled and unlabeled data. Through extensive experimental validation of our method in the two different scenarios, we show that our method outperforms both local semi-supervised learning and baselines which naively combine federated learning with semi-supervised learning.
Meta-GMVAE: Mixture of Gaussian VAE for Unsupervised Meta-LearningMLAI2
Unsupervised learning aims to learn meaningful representations from unlabeled data that capture its intrinsic structure and can be transferred to downstream tasks. Meta-learning, whose objective is to learn to generalize across tasks such that the learned model can rapidly adapt to a novel task, shares the spirit of unsupervised learning in that both seek a more effective and efficient learning procedure than learning from scratch. The fundamental difference between the two is that most meta-learning approaches are supervised, assuming full access to the labels. However, acquiring a labeled dataset for meta-training not only is costly, as it requires human effort in labeling, but also limits the applications to pre-defined task distributions. In this paper, we propose a principled unsupervised meta-learning model, namely Meta-GMVAE, based on the Variational Autoencoder (VAE) and set-level variational inference. Moreover, we introduce a mixture of Gaussians (GMM) prior, assuming that each modality represents a class concept in a randomly sampled episode, which we optimize with Expectation-Maximization (EM). The learned model can then be used for downstream few-shot classification tasks, where we obtain task-specific parameters by performing semi-supervised EM on the latent representations of the support and query sets, and predict labels of the query set by computing aggregated posteriors. We validate our model on the Omniglot and Mini-ImageNet datasets by evaluating its performance on downstream few-shot classification tasks. The results show that our model obtains impressive performance gains over existing unsupervised meta-learning baselines, even outperforming supervised MAML in a certain setting.
Accurate Learning of Graph Representations with Graph Multiset PoolingMLAI2
Graph neural networks have been widely used on modeling graph data, achieving impressive results on node classification and link prediction tasks. Yet, obtaining an accurate representation for a graph further requires a pooling function that maps a set of node representations into a compact form. A simple sum or average over all node representations considers all node features equally without consideration of their task relevance, and any structural dependencies among them. Recently proposed hierarchical graph pooling methods, on the other hand, may yield the same representation for two different graphs that are distinguished by the Weisfeiler-Lehman test, as they suboptimally preserve information from the node features. To tackle these limitations of existing graph pooling methods, we first formulate the graph pooling problem as a multiset encoding problem with auxiliary information about the graph structure, and propose a Graph Multiset Transformer (GMT) which is a multi-head attention based global pooling layer that captures the interaction between nodes according to their structural dependencies. We show that GMT satisfies both injectiveness and permutation invariance, such that it is at most as powerful as the Weisfeiler-Lehman graph isomorphism test. Moreover, our methods can be easily extended to the previous node clustering approaches for hierarchical graph pooling. Our experimental results show that GMT significantly outperforms state-of-the-art graph pooling methods on graph classification benchmarks with high memory and time efficiency, and obtains even larger performance gain on graph reconstruction and generation tasks.
Contrastive Learning with Adversarial Perturbations for Conditional Text Gene...MLAI2
Recently, sequence-to-sequence (seq2seq) models with the Transformer architecture have achieved remarkable performance on various conditional text generation tasks, such as machine translation. However, most of them are trained with teacher forcing, with the ground-truth label given at each time step, without being exposed to incorrectly generated tokens during training, which hurts their generalization to unseen inputs; this is known as the “exposure bias” problem. In this work, we propose to mitigate this problem by contrasting positive pairs with negative pairs, such that the model is exposed to various valid or incorrect perturbations of the inputs, for improved generalization. However, training the model with a naïve contrastive learning framework using random non-target sequences as negative examples is suboptimal, since they are easily distinguishable from the correct output, especially so with models pretrained on large text corpora. Also, generating positive examples requires domain-specific augmentation heuristics, which may not generalize over diverse domains. To tackle this problem, we propose a principled method to generate positive and negative samples for contrastive learning of seq2seq models. Specifically, we generate negative examples by adding small perturbations to the input sequence to minimize its conditional likelihood, and positive examples by adding large perturbations while enforcing a high conditional likelihood. Such “hard” positive and negative pairs generated using our method guide the model to better distinguish correct outputs from incorrect ones. We empirically show that our proposed method significantly improves the generalization of seq2seq models on three text generation tasks — machine translation, text summarization, and question generation.
Clinical Risk Prediction with Temporal Probabilistic Asymmetric Multi-Task Le...MLAI2
Although recent multi-task learning methods have been shown to be effective in improving the generalization of deep neural networks, they should be used with caution for safety-critical applications, such as clinical risk prediction. This is because even if they achieve improved task-average performance, they may still yield degraded performance on individual tasks, which may be critical (e.g., prediction of mortality risk). Existing asymmetric multi-task learning methods tackle this negative transfer problem by performing knowledge transfer from tasks with low loss to tasks with high loss. However, using loss as a measure of reliability is risky, since a low loss could be a result of overfitting. In the case of time-series prediction tasks, knowledge learned for one task (e.g., predicting the sepsis onset) at a specific timestep may be useful for learning another task (e.g., prediction of mortality) at a later timestep, but the lack of a loss at each timestep makes it difficult to measure the reliability at each timestep. To capture such dynamically changing asymmetric relationships between tasks in time-series data, we propose a novel temporal asymmetric multi-task learning model that performs knowledge transfer from certain tasks/timesteps to relevant uncertain tasks, based on feature-level uncertainty. We validate our model on multiple clinical risk prediction tasks against various deep learning models for time-series prediction, which our model significantly outperforms without any sign of negative transfer. Further qualitative analysis of the learned knowledge graphs by clinicians shows that they are helpful in analyzing the model's predictions. Our code is available at https://github.com/anhtuan5696/TPAMTL.
MetaPerturb: Transferable Regularizer for Heterogeneous Tasks and ArchitecturesMLAI2
MetaPerturb is a meta-learned perturbation function that can enhance the generalization of neural networks on different tasks and architectures. The paper proposes a novel meta-learning framework that jointly trains a main model and a perturbation module on multiple source tasks to learn a transferable perturbation function. This meta-learned perturbation function can then be transferred to improve the performance of a target model on an unseen target task or architecture, outperforming baselines on various datasets and architectures.
Adversarial Self-Supervised Contrastive LearningMLAI2
Existing adversarial learning approaches mostly use class labels to generate adversarial samples that lead to incorrect predictions, which are then used to augment the training of the model for improved robustness. While some recent works propose semi-supervised adversarial learning methods that utilize unlabeled data, they still require class labels. However, do we really need class labels at all for adversarially robust training of deep neural networks? In this paper, we propose a novel adversarial attack for unlabeled data, which makes the model confuse the instance-level identities of the perturbed data samples. Further, we present a self-supervised contrastive learning framework to adversarially train a robust neural network without labeled data, which aims to maximize the similarity between a random augmentation of a data sample and its instance-wise adversarial perturbation. We validate our method, Robust Contrastive Learning (RoCL), on multiple benchmark datasets, on which it obtains comparable robust accuracy to state-of-the-art supervised adversarial learning methods, and significantly improved robustness against black-box and unseen types of attacks. Moreover, with further joint fine-tuning with a supervised adversarial loss, RoCL obtains even higher robust accuracy than using self-supervised learning alone. Notably, RoCL also demonstrates impressive results in robust transfer learning.
Learning to Extrapolate Knowledge: Transductive Few-shot Out-of-Graph Link Pr...MLAI2
Many practical graph problems, such as knowledge graph construction and drug-drug interaction prediction, require handling multi-relational graphs. However, handling real-world multi-relational graphs with Graph Neural Networks (GNNs) is often challenging due to their evolving nature, as new entities (nodes) can emerge over time. Moreover, newly emerged entities often have few links, which makes learning even more difficult. Motivated by this challenge, we introduce a realistic problem of few-shot out-of-graph link prediction, where we predict the links not only between seen and unseen nodes, as in a conventional out-of-knowledge link prediction task, but also between the unseen nodes, with only a few edges per node. We tackle this problem with a novel transductive meta-learning framework, which we refer to as Graph Extrapolation Networks (GEN). GEN meta-learns both the node embedding network for inductive inference (seen-to-unseen) and the link prediction network for transductive inference (unseen-to-unseen). For transductive link prediction, we further propose a stochastic embedding layer to model uncertainty in the link prediction between unseen entities. We validate our model on multiple benchmark datasets for knowledge graph completion and drug-drug interaction prediction. The results show that our model significantly outperforms relevant baselines on out-of-graph link prediction tasks.
Neural Mask Generator: Learning to Generate Adaptive Word Maskings for Langu...MLAI2
We propose a method to automatically generate domain- and task-adaptive maskings of the given text for self-supervised pre-training, such that we can effectively adapt the language model to a particular target task (e.g. question answering). Specifically, we present a novel reinforcement learning-based framework that learns the masking policy, such that using the generated masks for further pre-training of the target language model helps improve task performance on unseen texts. We use off-policy actor-critic with entropy regularization and experience replay for reinforcement learning, and propose a Transformer-based policy network that can consider the relative importance of words in a given text. We validate our Neural Mask Generator (NMG) on several question answering and text classification datasets using BERT and DistilBERT as the language models, on which it outperforms rule-based masking strategies by automatically learning optimal adaptive maskings.
Adversarial Neural Pruning with Latent Vulnerability SuppressionMLAI2
Despite the remarkable performance of deep neural networks on various computer vision tasks, they are known to be susceptible to adversarial perturbations, which makes it challenging to deploy them in real-world safety-critical applications. In this paper, we conjecture that the leading cause of adversarial vulnerability is the distortion in the latent feature space, and provide methods to suppress it effectively. Explicitly, we define a vulnerability for each latent feature and then propose a new loss for adversarial learning, the Vulnerability Suppression (VS) loss, which aims to minimize the feature-level vulnerability during training. We further propose a Bayesian framework to prune features with high vulnerability to reduce both the vulnerability and the loss on adversarial samples. We validate our Adversarial Neural Pruning with Vulnerability Suppression (ANP-VS) method on multiple benchmark datasets, on which it not only obtains state-of-the-art adversarial robustness but also improves the performance on clean examples, using only a fraction of the parameters used by the full network. Further qualitative analysis suggests that the improvements come from the suppression of feature-level vulnerability.
Generating Diverse and Consistent QA pairs from Contexts with Information-Max...MLAI2
One of the most crucial challenges in question answering (QA) is the scarcity of labeled data, since it is costly to obtain question-answer (QA) pairs for a target text domain with human annotation. An alternative approach is to use automatically generated QA pairs from either the problem context or from a large amount of unstructured text (e.g. Wikipedia). In this work, we propose a hierarchical conditional variational autoencoder (HCVAE) for generating QA pairs given unstructured texts as contexts, while maximizing the mutual information between generated QA pairs to ensure their consistency. We validate our Information Maximizing Hierarchical Conditional Variational AutoEncoder (Info-HCVAE) on several benchmark datasets by evaluating the performance of the QA model (BERT-base) trained using only the generated QA pairs (QA-based evaluation), or using both the generated and human-labeled pairs (semi-supervised learning), against state-of-the-art baseline models. The results show that our model obtains impressive performance gains over all baselines on both tasks, using only a fraction of data for training.
Cost-effective Interactive Attention Learning with Neural Attention Process
1. Cost-Effective Interactive Attention Learning with Neural Attention Processes
Jay Heo1, Junhyeon Park1, Hyewon Jeong1, Kwang Joon Kim2, Juho Lee3, Eunho Yang1,3, Sung Ju Hwang1,3
KAIST1, Yonsei University College of Medicine2, AITRICS3
4. Model Interpretability
The complex nature of deep neural networks has led to a recent surge of interest in interpretable models, which provide explanations for the model's decisions.
(Figure: an interpretation tool takes the input data and the main network's inference and produces a model interpretation.)
7. Challenge: Incorrect & Unreliable Interpretation
Not all machine-generated interpretations are correct or human-understandable.
• The correctness and reliability of a learning model heavily depend on the quality and quantity of its training data.
• Neural networks tend to learn non-robust features that help with predictions but are not human-perceptible.
(Figure: does the model learn too many non-robust features during training? For a given model interpretation: 1. Is it correct? 2. Is it understandable enough to trust?)
9. Interactive Learning Framework
We propose an interactive learning framework that iteratively updates the model by interacting with human supervisors who adjust the provided interpretations.
• Human supervisors are actively used as a channel for human-model communication.
(Figure: an attentional network delivers its decision and interpretation (attentions of 0.8, 0.6, 0.3, with low or high uncertainty) to a human annotator such as a physician, who annotates the attentions; the model is then retrained.)
13. Challenge: Model Retraining Cost
To reflect human feedback, the model needs to be retrained, which is costly.
• Retraining the model on scarce human feedback may also cause the model to overfit.
(Figure: a physician provides scarce attention-level feedback on annotated examples; retraining the learning model on it leads to overfitting.)
16. Challenge: Expensive Human Supervision Cost
Obtaining human feedback on datasets with large numbers of training instances and features is extremely costly.
• Obtaining feedback on interpretations that are already correct, or that were corrected in earlier rounds, is wasteful.
(Figure: an annotator produces attention masks M_attention^(t) ∈ {0, 1} over big data, which is costly.)
19. Interactive Attention Learning Framework
Domain experts interactively evaluate the learned attentions and provide feedback, so that we obtain models that generate human-intuitive interpretations. The framework consists of two components: 1. Neural Attention Processes and 2. Cost-effective Reranking.
(Figure: the learning model's attention mechanism delivers attentions over clinical features such as LDL, respiration, cholesterol, creatinine, and BMI for tasks such as diabetes, heart failure, and hypertension; a physician inspects them through deep interpretation tools for correlation and causal-relationship analysis (influence function, MC dropout, counterfactual estimation, Granger causality) and annotates an attention mask M_attention^(t) ∈ {0, 1}, which the Neural Attention Processes use to update the attentions.)
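To make the loop concrete, here is a toy end-to-end sketch of the IAL rounds under our own assumptions; attentions, negative_impact, and annotate are hypothetical stand-ins for the NAP attention generator, the CER scoring functions, and the human expert, not the authors' released code.

import numpy as np

rng = np.random.default_rng(0)
X = rng.normal(size=(200, 8))            # toy data: 200 instances, 8 features
w = rng.normal(size=8)                   # fixed weights for the stand-in attention generator

def attentions(X, context):
    # Stand-in for the NAP attention generator; a real NAP conditions on `context`.
    return 1.0 / (1.0 + np.exp(-X * w))  # per-feature attentions in (0, 1)

def negative_impact(att):
    # Stand-in negative-impact score (influence / uncertainty / counterfactual in IAL).
    return att.var(axis=1)

def annotate(att_rows):
    # Stand-in for the human expert: returns binary attention masks in {0, 1}.
    return (att_rows > 0.5).astype(float)

context = []                             # annotated (instance, mask) pairs, grown across rounds
for s in range(1, 5):                    # interaction rounds s = 1..4
    att = attentions(X, context)
    picked = np.argsort(-negative_impact(att))[:25]  # CER: annotate only the top instances
    masks = annotate(att[picked])                    # expert corrects the selected attentions
    context += list(zip(X[picked], masks))           # NAP absorbs feedback without retraining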
21. Neural Attention Processes (NAP)
NAP naturally reflects the information from the annotation summarization z via amortized inference.
• NAP learns to summarize the delivered annotations into a latent vector z, and feeds this summarization as an additional input to the attention-generating network.
(Figure: a domain expert's annotations are summarized by the Neural Attention Processes.)
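As a rough illustration of this amortized scheme, the PyTorch sketch below encodes each (input, annotation-mask) context pair, mean-pools the encodings into a latent summary z, and conditions the attention generator on z. The module names and layer sizes are our own assumptions, not the paper's exact architecture.

import torch
import torch.nn as nn

class NAPSketch(nn.Module):
    # Illustrative only: a context encoder summarizes (x, mask) annotations into z,
    # and the attention generator conditions on both x and z.
    def __init__(self, dim_x=8, dim_z=16):
        super().__init__()
        self.encoder = nn.Sequential(nn.Linear(dim_x * 2, 32), nn.ReLU(), nn.Linear(32, dim_z))
        self.generator = nn.Sequential(nn.Linear(dim_x + dim_z, 32), nn.ReLU(),
                                       nn.Linear(32, dim_x), nn.Sigmoid())
        self.dim_z = dim_z

    def forward(self, x, ctx_x=None, ctx_mask=None):
        if ctx_x is None:                       # no annotations delivered yet
            z = torch.zeros(x.size(0), self.dim_z)
        else:                                   # permutation-invariant summary of annotations
            z = self.encoder(torch.cat([ctx_x, ctx_mask], dim=-1)).mean(dim=0, keepdim=True)
            z = z.expand(x.size(0), -1)
        return self.generator(torch.cat([x, z], dim=-1))   # attentions in (0, 1)

# New annotations change only the forward pass, not the weights:
nap = NAPSketch()
att = nap(torch.randn(5, 8), ctx_x=torch.randn(3, 8), ctx_mask=torch.randint(0, 2, (3, 8)).float())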
25. Neural Attention Processes (NAP)
NAP minimizes retraining cost by incorporating new labeled instances without retraining or overfitting.
• NAP does not require retraining for newly arriving observations: it automatically adapts to them at the cost of a forward pass through a network g.
(Figure: first round (s=1); the context points and the new observations are encoded together to produce the attention.)
28. Neural Attention Processes (NAP)
NAP minimizes retraining cost by incorporating new labeled instances without retraining or overfitting.
• NAP is trained in a meta-learning fashion for few-shot function estimation: given a randomly selected labeled set as context, it is trained to predict the attention masks of the other labeled samples.
(Figure: further rounds (s=2,3,4); previously annotated observations serve as context points for the new observations.)
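A possible episodic training step for this setup is sketched below; it reuses the hypothetical NAPSketch module from the earlier snippet, and the split sizes are arbitrary. Each episode randomly splits the annotated pool into a context and a target set and maximizes the likelihood of the target masks.

import torch

def meta_step(nap, X_lab, M_lab, opt, n_ctx=10):
    # One few-shot episode: predict target masks from a random context subset.
    perm = torch.randperm(X_lab.size(0))
    ctx, tgt = perm[:n_ctx], perm[n_ctx:]
    pred = nap(X_lab[tgt], ctx_x=X_lab[ctx], ctx_mask=M_lab[ctx])
    loss = torch.nn.functional.binary_cross_entropy(pred, M_lab[tgt])
    opt.zero_grad(); loss.backward(); opt.step()
    return loss.item()

# Usage with the NAPSketch above:
# opt = torch.optim.Adam(nap.parameters(), lr=1e-3)
# meta_step(nap, torch.randn(40, 8), torch.randint(0, 2, (40, 8)).float(), opt)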
31. Cost-Effective Instance & Feature Reranking (CER)
CER addresses the expensive human labeling cost by reranking the instances, features, and timesteps (for time-series data) by their negative impacts.
(Figure: instance-level reranking first estimates a negative-impact score, I(u_i) or Var(u_i), over repeated train/validation splits, then re-ranks and selects the instances to annotate; feature-level reranking then scores each of the F features by I(u_i,j^(t)), Var(u_i,j^(t)), or the counterfactual score ψ(u_i,j^(t)), and re-ranks and selects the features.)
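The selection step itself reduces to a sort over per-candidate scores; below is a generic re-rank-and-select sketch, our own simplification of CER rather than the paper's implementation.

import numpy as np

def rerank_and_select(scores, k, already_annotated=()):
    # Order candidates by negative impact (highest first), skip ones the
    # expert has already corrected, and return the top-k indices to annotate.
    order = np.argsort(-np.asarray(scores))
    order = [i for i in order if i not in set(already_annotated)]
    return order[:k]

# e.g. instance scores first, then feature scores for each selected instance:
# top_instances = rerank_and_select(instance_scores, k=25)
# top_features = rerank_and_select(feature_scores_of_instance, k=5)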
32. CER: 1. Influence Score
We use the influence function (Koh & Liang, 2017) to approximate the impact of individual training points on the model's prediction.
(Figure: training images labeled "Fish" and "Dog" and a test input predicted as "Dog", illustrating how individual training points influence a test prediction.)
[Koh and Liang, 2017] Understanding Black-box Predictions via Influence Functions, ICML 2017
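For intuition, here is a brute-force version of the influence score for a tiny logistic-regression model. Explicit Hessian inversion is only feasible at this scale; larger models need approximations such as Koh & Liang's LiSSA scheme. The function names and the damping value are our own choices.

import torch
import torch.nn.functional as F

def point_loss(w, x, y):
    return F.binary_cross_entropy_with_logits(x @ w, y)

def influence_on_test(w, X_tr, y_tr, x_te, y_te, damping=0.01):
    # I_up,loss(z_i, z_test) = -grad L(z_test)^T H^{-1} grad L(z_i)   (Koh & Liang, 2017)
    def train_loss(w_):
        return F.binary_cross_entropy_with_logits(X_tr @ w_, y_tr)
    H = torch.autograd.functional.hessian(train_loss, w)
    H = H + damping * torch.eye(H.shape[0])          # damping keeps H invertible
    g_test = torch.autograd.functional.jacobian(lambda w_: point_loss(w_, x_te, y_te), w)
    s_test = torch.linalg.solve(H, g_test)           # H^{-1} grad L(z_test)
    return [-(torch.dot(s_test,
              torch.autograd.functional.jacobian(lambda w_: point_loss(w_, xi, yi), w))).item()
            for xi, yi in zip(X_tr, y_tr)]

# w = torch.zeros(8); scores = influence_on_test(w, torch.randn(50, 8),
#     torch.randint(0, 2, (50,)).float(), torch.randn(8), torch.tensor(1.0))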
34. CER: 2. Uncertainty Score
We measure the negative impacts using predictive uncertainty, which can be estimated with Monte-Carlo sampling (Gal & Ghahramani, 2016).
• A less expensive approach to measuring negative impacts.
• Assumes that instances with high predictive uncertainty are likely candidates for correction.
(Figure: instance-wise and feature-wise uncertainties, e.g. over SpO2, pulse, and respiration, modeled as N(μ, σ); attentions of 0.8, 0.6, 0.3 with low or high uncertainty.)
[Jay Heo*, Hae Beom Lee*, Saehun Kim, Juho Lee, Kwang Joon Kim, Eunho Yang, Sung Ju Hwang] Uncertainty-Aware Attention for Reliable Interpretation and Prediction, NeurIPS 2018
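MC-dropout uncertainty is obtained by keeping dropout active at test time and taking the variance of repeated stochastic forward passes, as in the sketch below; the two-layer model is an arbitrary stand-in, not the paper's network.

import torch
import torch.nn as nn

model = nn.Sequential(nn.Linear(8, 32), nn.ReLU(), nn.Dropout(p=0.3), nn.Linear(32, 1))

def mc_dropout_uncertainty(model, x, n_samples=30):
    # Keep dropout stochastic at inference time and sample repeated forward passes;
    # the per-instance variance serves as the predictive uncertainty score.
    model.train()                      # train mode keeps nn.Dropout active
    with torch.no_grad():
        preds = torch.stack([torch.sigmoid(model(x)) for _ in range(n_samples)])
    return preds.mean(dim=0), preds.var(dim=0)

mu, var = mc_dropout_uncertainty(model, torch.randn(5, 8))   # rank instances by `var`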
38. CER: 3. Counterfactual Score
How would the prediction change if we ignored a certain feature by manually turning its corresponding attention value on or off?
• No retraining is needed, since we can simply set the attention value to zero.
• Used to rerank the features with respect to their importance.
(Figure: the counterfactual estimation interface.)
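A minimal sketch of this score: given any predict(x, att) interface that accepts explicit attention weights (a hypothetical interface, for illustration), zero out one feature's attention at a time and measure how far the prediction moves.

import torch

def counterfactual_scores(predict, x, att):
    # predict(x, att) -> model output given explicit attention weights.
    # Score feature j by how much the prediction moves when its attention is zeroed.
    base = predict(x, att)
    scores = []
    for j in range(att.shape[-1]):
        att_cf = att.clone()
        att_cf[..., j] = 0.0                       # "turn off" feature j's attention
        scores.append((predict(x, att_cf) - base).abs().item())
    return scores                                  # larger = more important feature

# e.g. with a hypothetical attention-weighted linear predictor:
# predict = lambda x, a: (x * a) @ torch.ones(x.shape[-1])
# counterfactual_scores(predict, torch.randn(8), torch.rand(8))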
41. Experimental Setting – Datasets
We use electronic health records, real-estate sales transaction records, and squat posture correction records for classification and regression tasks.
1. EHR datasets (binary classification): 1) cerebral infarction, 2) cardiovascular disease, 3) heart failure
2. Real-estate dataset (regression): 1) housing price forecasting
3. Squat posture dataset (multi-label classification): 1) squat posture correction
47. Experiment Results
We conducted experiments on three risk prediction tasks, one fitness (squat) task, and one real-estate forecasting task.

Group                 Model           Heart Failure   Cerebral Infarction   CVD             Fitness Squat   Real Estate Forecasting
One-time Training     RETAIN          0.6069 ± 0.01   0.6394 ± 0.02         0.6018 ± 0.02   0.8425 ± 0.03   0.2136 ± 0.01
One-time Training     Random-RETAIN   0.5952 ± 0.02   0.6256 ± 0.02         0.5885 ± 0.01   0.8221 ± 0.05   0.2140 ± 0.01
One-time Training     IF-RETAIN       0.6134 ± 0.03   0.6422 ± 0.02         0.5882 ± 0.02   0.8363 ± 0.03   0.2049 ± 0.01
Random Re-ranking     Random-UA       0.6231 ± 0.03   0.6491 ± 0.01         0.6112 ± 0.02   0.8521 ± 0.02   0.2222 ± 0.02
Random Re-ranking     Random-NAP      0.6414 ± 0.01   0.6674 ± 0.02         0.6284 ± 0.01   0.8525 ± 0.01   0.2061 ± 0.01
IAL (Cost-effective)  AILA            0.6363 ± 0.03   0.6602 ± 0.03         0.6193 ± 0.02   0.8425 ± 0.01   0.2119 ± 0.01
IAL (Cost-effective)  IAL-NAP         0.6612 ± 0.02   0.6892 ± 0.03         0.6371 ± 0.02   0.8689 ± 0.01   0.1835 ± 0.01

(Heart Failure, Cerebral Infarction, and CVD are the EHR tasks; Real Estate Forecasting is the regression task, where lower is better.)
• Random-UA, which is retrained with human attention-level supervision on randomly selected samples, performs worse than Random-NAP.
• IAL-NAP significantly outperforms Random-NAP, showing that attention annotation has little effect on the model when the instances to annotate are selected at random.
50. Experiment Results
Ablation study with the proposed IAL-NAP combinations for instance- and feature-level reranking on all tasks.

Instance-level       Feature-level        Heart Failure   Cerebral Infarction   CVD             Fitness Squat   Real Estate Forecasting
Influence Function   Uncertainty          0.6563 ± 0.01   0.6821 ± 0.02         0.6308 ± 0.02   0.8712 ± 0.01   0.1921 ± 0.01
Influence Function   Influence Function   0.6514 ± 0.02   0.6825 ± 0.01         0.6329 ± 0.03   0.8632 ± 0.01   0.1865 ± 0.02
Influence Function   Counterfactual       0.6592 ± 0.02   0.6921 ± 0.03         0.6379 ± 0.02   0.8682 ± 0.01   0.1863 ± 0.02
Uncertainty          Counterfactual       0.6612 ± 0.01   0.6892 ± 0.03         0.6371 ± 0.02   0.8689 ± 0.02   0.1835 ± 0.02

• For instance-level scoring, the influence and uncertainty scores work similarly well, while the counterfactual score was the most effective for feature-wise reranking.
• The uncertainty-counterfactual combination is the most cost-effective solution, since it avoids the expensive computation of Hessians.
53. Effect of Neural Attention Processes
Retraining time on examples with human annotations, and the mean response time of the human annotations, on the risk prediction tasks.
(Figures: (a) Heart Failure, (b) Cerebral Infarction, (c) CVD.)
54. Effect of Cost-Effective Reranking
Change of accuracy with 100 annotations across four rounds (s) for IAL-NAP (blue) vs. Random-NAP (red).
(Figures: (a) Heart Failure, (b) Cerebral Infarction, (c) CVD, (d) Squat.)
• IAL-NAP needs fewer annotated examples (100) than Random-NAP (400) to improve the model to a comparable accuracy (AUC 0.6414).
56. Qualitative Analysis – Risk Prediction
We further analyze the contribution of each feature for a CVD patient (label = 1).
• Features: Age; Smoking (whether the patient smokes); SysBP (systolic blood pressure); HDL (high-density lipoprotein); LDL (low-density lipoprotein).
• At s=3, IAL allocated more attention weight to an important feature (Smoking) that the initially trained model failed to attend to.
→ Clinicians guided the model to learn this, since smoking is a key factor in assessing CVD risk.
(Figure: a patient's cardiovascular disease records with attention maps for (a) the pretrained model, (b) s=1, and (c) s=2.)
59. Summary
We propose a novel interactive learning framework that iteratively updates the model by interacting with the human supervisor via the generated attentions.
• Unlike conventional active learning, IAL allows the human annotators to actively interpret and manipulate the model's behavior and see the effect. IAL allows online learning without retraining the main network, by training a separate attention generator.
• Neural Attention Processes is a novel attention mechanism that can generate attentions on unlabeled instances given a few labeled samples, and can incorporate new labeled instances without retraining or overfitting.
• Our reranking strategy re-ranks the instances and features by their negative impacts, which substantially reduces the annotation cost and time for high-dimensional inputs.