[Paper reading] L-SHAPLEY AND C-SHAPLEY: EFFICIENT MODEL INTERPRETATION FOR S... (Daiki Tanaka)
This document proposes two new algorithms, L-SHAPLEY and C-SHAPLEY, for interpreting black-box machine learning models in an instance-wise and model-agnostic manner. L-SHAPLEY and C-SHAPLEY are approximations of the SHAPLEY value that take graph structure between features into account to improve computational efficiency. The algorithms were evaluated on text and image classification tasks and were shown to outperform baselines like KERNELSHAP and LIME, providing more accurate feature importance scores according to both automatic metrics and human evaluation.
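A minimal sketch of the idea behind these approximations: compute an ordinary Shapley value, but sum only over subsets of a feature's local graph neighborhood, which is what makes the estimate cheap. Here `value_fn` and the neighborhood structure are illustrative assumptions, not the paper's exact estimator:

```python
from itertools import combinations
from math import comb

def local_shapley(i, neighbors, value_fn):
    """Shapley value of feature i, restricted to its graph neighborhood.

    value_fn(S) is the model score when only the features in S are kept
    (e.g., all others masked); `neighbors` excludes i itself. Restricting
    the sum to a k-hop neighborhood is what makes L-Shapley tractable."""
    n = len(neighbors) + 1  # size of the local "game"
    phi = 0.0
    for r in range(len(neighbors) + 1):
        for S in combinations(neighbors, r):
            weight = 1.0 / (n * comb(n - 1, r))
            phi += weight * (value_fn(frozenset(S) | {i}) - value_fn(frozenset(S)))
    return phi
```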
[Paper Reading] Attention is All You Need (Daiki Tanaka)
The document summarizes the "Attention Is All You Need" paper, which introduced the Transformer model for natural language processing. The Transformer uses attention mechanisms rather than recurrent or convolutional layers, allowing for more parallelization. It achieved state-of-the-art results in machine translation tasks using techniques like multi-head attention, positional encoding, and beam search decoding. The paper demonstrated the Transformer's ability to draw global dependencies between input and output with constant computational complexity.
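For reference, a minimal numpy sketch of the scaled dot-product attention at the core of the Transformer; shapes and names are illustrative:

```python
import numpy as np

def scaled_dot_product_attention(Q, K, V):
    """Attention(Q, K, V) = softmax(Q K^T / sqrt(d_k)) V."""
    d_k = Q.shape[-1]
    scores = Q @ K.T / np.sqrt(d_k)                   # (n_q, n_k) similarities
    weights = np.exp(scores - scores.max(axis=-1, keepdims=True))
    weights /= weights.sum(axis=-1, keepdims=True)    # row-wise softmax
    return weights @ V                                # weighted sum of values
```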
This document proposes a new semantic relatedness measure based on representing words as co-occurrence networks instead of vectors. It addresses two key issues: 1) defining network operations to represent phrases and 2) measuring similarity between networks using a graph kernel. The approach is evaluated on tasks like synonym finding, word sense disambiguation, and translation disambiguation, showing improved performance over vector-based baselines.
Recently, WaveNet, which auto-regressively predicts the probability distribution of speech samples, has provided a new paradigm for speech synthesis tasks.
Since the way WaveNet is used for speech synthesis varies with the conditioning vectors, it is very important to design the baseline system structure effectively.
In this talk, I would like to first introduce various types of WaveNet vocoders, such as the conventional speech-domain approach and the recently proposed source-filter-theory-based approach.
Then, I will explain linear prediction (LP)-based WaveNet speech synthesis, i.e., LP-WaveNet, which overcomes the limitations of source-filter-theory-based WaveNet vocoders caused by the mismatch between the speech excitation signal and the vocal tract filter.
While presenting the experimental setups and results, I would also like to share some know-how for successfully training the network.
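As background for the LP part of the talk, a minimal numpy sketch of classical linear-prediction synthesis, where each speech sample is an excitation sample plus a weighted sum of past output samples (the filter coefficients below are placeholders, not LP-WaveNet's):

```python
import numpy as np

def lp_synthesize(excitation, a):
    """All-pole LP synthesis: s[n] = e[n] + sum_k a[k] * s[n-k-1]."""
    s = np.zeros_like(excitation)
    for n in range(len(excitation)):
        past = sum(a[k] * s[n - k - 1] for k in range(len(a)) if n - k - 1 >= 0)
        s[n] = excitation[n] + past
    return s

# Toy usage: white-noise excitation through a stable 2nd-order filter.
s = lp_synthesize(np.random.randn(16000), a=[1.3, -0.4])
```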
The dominant sequence transduction models are based on complex recurrent or convolutional neural networks in an encoder-decoder configuration. The best performing models also connect the encoder and decoder through an attention mechanism. We propose a new simple network architecture, the Transformer, based solely on attention mechanisms, dispensing with recurrence and convolutions entirely. Experiments on two machine translation tasks show these models to be superior in quality while being more parallelizable and requiring significantly less time to train.
Our model achieves 28.4 BLEU on the WMT 2014 English-to-German translation task, improving over the existing best results, including ensembles by over 2 BLEU. On the WMT 2014 English-to-French translation task, our model establishes a new single-model state-of-the-art BLEU score of 41.0 after training for 3.5 days on eight GPUs, a small fraction of the training costs of the best models from the literature. We show that the Transformer generalizes well to other tasks by applying it successfully to English constituency parsing both with large and limited training data.
○ Overview
Many researchers currently obtain networks with high recognition accuracy by designing them deeper and wider. As network size grows, the number of parameters and the amount of computation increase, and pruning-based compression algorithms have been proposed to address this problem. However, because such methods cannot change the network architecture itself, they cannot overcome the limitations that stem from the structure.
Network recasting is a method that changes the network architecture itself in order to overcome the limitations arising from structural characteristics. With network recasting, the blocks that make up a network can be converted into blocks of a different type. Each block can be converted via block-wise recasting, and applying this method sequentially changes the structure of the entire network. Sequential recasting preserves inference accuracy better and also mitigates the vanishing gradient problem regardless of the network architecture. Applying network recasting within the same network architecture reduces parameters and computation, while converting to a different kind of architecture accelerates the network; in this case, because the architecture itself can be changed, speedups beyond the structural limits can be obtained.
This document proposes a method called Factor Transfer for compressing complex networks via knowledge transfer from a teacher network to a student network. It introduces paraphrasing and translating modules to extract factors from the teacher and student networks and minimize their difference, unlike existing methods that directly compare outputs. Experiments on image classification datasets CIFAR-10, CIFAR-100 and ImageNet, as well as object detection, show the proposed method helps increase student network accuracy compared to directly transferring knowledge or attention from the teacher.
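A minimal sketch of the factor-matching objective described above, assuming the paraphraser and translator have already produced teacher and student factors (their architectures are left out; the normalize-then-Lp-distance form follows the summary):

```python
import torch
import torch.nn.functional as F

def factor_transfer_loss(factor_t, factor_s, p=1):
    """Match L2-normalized teacher and student factors with an L_p loss,
    instead of directly comparing raw network outputs."""
    ft = F.normalize(factor_t.flatten(1), dim=1)  # normalize per sample
    fs = F.normalize(factor_s.flatten(1), dim=1)
    return (ft - fs).abs().pow(p).mean()
```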
The document discusses relational knowledge distillation (RKD), a technique for transferring knowledge from a teacher model to a student model. It begins by providing background on knowledge distillation and recent approaches. It then introduces RKD, which transfers relational information between examples in the teacher's embedding space, such as distances and angles, rather than just individual example outputs. The document describes experiments applying RKD to metric learning, image classification, and few-shot learning, finding it improves student model performance over other distillation methods. It concludes RKD effectively leverages relational information to transfer knowledge between models.
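A minimal sketch of the distance-wise part of this idea: match the teacher's and student's pairwise-distance structure rather than individual outputs (the smooth-L1 matching and mean normalization are assumptions for illustration):

```python
import torch
import torch.nn.functional as F

def rkd_distance_loss(emb_t, emb_s):
    """Distance-wise RKD: penalize differences between teacher and student
    pairwise-distance matrices computed over a batch of embeddings."""
    d_t = torch.cdist(emb_t, emb_t)      # (n, n) teacher distances
    d_s = torch.cdist(emb_s, emb_s)
    d_t = d_t / d_t[d_t > 0].mean()      # scale-invariant normalization
    d_s = d_s / d_s[d_s > 0].mean()
    return F.smooth_l1_loss(d_s, d_t)
```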
Revised presentation slide for PFN Seminar, 2017/3/9.
Learning Communication with Neural Networks.
Presentation video: https://www.youtube.com/watch?v=ZrLiNAMHszo
Presentation about Tree-LSTMs networks described in "Improved Semantic Representations From Tree-Structured Long Short-Term Memory Networks" by Kai Sheng Tai, Richard Socher, Christopher D. Manning
Recurrent Neural Networks have been shown to be very powerful models, as they can propagate context over several time steps. Because of this, they can be applied effectively to several problems in Natural Language Processing, such as language modelling, tagging problems, and speech recognition. In this presentation we introduce the basic RNN model and discuss the vanishing gradient problem. We describe LSTM (Long Short-Term Memory) and Gated Recurrent Units (GRU). We also discuss bidirectional RNNs with an example. RNN architectures can be considered deep learning systems in which the number of time steps plays the role of the depth of the network. It is also possible to build an RNN with multiple hidden layers, each having recurrent connections to the previous time steps, representing abstraction both in time and in space.
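A minimal numpy sketch of the basic recurrence this presentation starts from; the comment notes why gradients vanish:

```python
import numpy as np

def rnn_forward(xs, W_x, W_h, b):
    """h_t = tanh(W_x x_t + W_h h_{t-1} + b): context propagates through h.

    Repeated multiplication by W_h in backpropagation is also why gradients
    vanish or explode over long sequences, motivating LSTM and GRU gating."""
    h = np.zeros(W_h.shape[0])
    states = []
    for x in xs:
        h = np.tanh(W_x @ x + W_h @ h + b)
        states.append(h)
    return states
```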
Neural network basics and an introduction to deep learning (Tapas Majumdar)
Deep learning tools and techniques can be used to build convolutional neural networks (CNNs). Neural networks learn from observational training data by automatically inferring rules to solve problems. Neural networks use multiple hidden layers of artificial neurons to process input data and produce output. Techniques like backpropagation, cross-entropy cost functions, softmax activations, and regularization help neural networks learn more effectively and avoid issues like overfitting.
Deep Learning in Recommender Systems - RecSys Summer School 2017 (Balázs Hidasi)
This is the presentation accompanying my tutorial about deep learning methods in the recommender systems domain. The tutorial consists of a brief general overview of deep learning and an introduction to the four most prominent research directions of DL in recsys as of 2017. Presented during the RecSys Summer School 2017 in Bolzano, Italy.
Revised presentation slide for NLP-DL, 2016/6/22.
Recent Progress (from 2014) in Recurrent Neural Networks and Natural Language Processing.
Profile http://www.cl.ecei.tohoku.ac.jp/~sosuke.k/
Japanese ver. https://www.slideshare.net/hytae/rnn-63761483
Attentive Relational Networks for Mapping Images to Scene Graphs (Sangmin Woo)
M. Qi, W. Li, Z. Yang, Y. Wang, and J. Luo: Attentive relational networks for mapping images to scene graphs. In The IEEE Conference on Computer Vision and Pattern Recognition (CVPR), 2019.
is anyone_interest_in_auto-encoding_variational-bayes (NAVER Engineering)
The framework of the VAE, one of the deep generative models, has transformed generative modeling across machine learning, including computer vision and natural language processing.
For researchers new to VAEs, most VAE tutorials focus on the neural network architecture and the loss function with implementation in mind. This seminar instead examines the equations of Auto-Encoding Variational Bayes from the variational inference perspective, and also looks at how these equations are applied in implementations.
Deep learning techniques are increasingly being used for recommender systems. Neural network models such as word2vec, doc2vec and prod2vec learn embedding representations of items from user interaction data that capture their relationships. These embeddings can then be used to make recommendations by finding similar items. Deep collaborative filtering models apply neural networks to matrix factorization techniques to learn joint representations of users and items from rating data.
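A minimal sketch of the prod2vec-style idea: treat interaction sessions as sentences, learn item embeddings with word2vec, and recommend nearest neighbors (assuming gensim is available; the session data is a toy example):

```python
from gensim.models import Word2Vec

# Toy interaction sessions: each user's clicked/purchased item ids in order.
sessions = [["i1", "i7", "i3"], ["i7", "i3", "i9"], ["i2", "i1", "i7"]]

model = Word2Vec(sentences=sessions, vector_size=16, window=3,
                 min_count=1, sg=1, epochs=50)  # skip-gram, as in prod2vec

# Recommend by embedding similarity to a seed item.
print(model.wv.most_similar("i7", topn=2))
```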
The document discusses various deep learning techniques for recommendation systems, including representation learning and neural networks. It describes using embeddings to represent users, items, reviews and other data, as well as neural networks like multilayer perceptrons, convolutional neural networks and recurrent neural networks to model sequential data and generate recommendations. Architectures like joint models that combine user and item representations are also summarized.
Dual Learning for Machine Translation (NIPS 2016) (Toru Fujino)
The paper introduces a dual learning algorithm that utilizes monolingual data to improve neural machine translation. The algorithm trains two translation models in both directions simultaneously. Experimental results show that when trained with only 10% of parallel data, the dual learning model achieves comparable results to baseline models trained on 100% of data. The dual learning mechanism also outperforms baselines when trained on full data and can help address the lack of large parallel corpora.
Overview of TensorFlow For Natural Language Processing (ananth)
TensorFlow, recently open-sourced by Google, is one of the key frameworks that support the development of deep learning architectures. In this slideset, part 1, we get started with a few basic primitives of TensorFlow. We will also discuss when and when not to use TensorFlow.
Recursive neural networks (RNNs) were developed to model recursive structures like images, sentences, and phrases. RNNs construct feature representations recursively from components. Later models like recursive autoencoders (RAEs), matrix-vector RNNs (MV-RNNs), and recursive neural tensor networks (RNTNs) improved on RNNs by handling unlabeled data, incorporating different composition rules, and reducing parameters. These recursive models achieved strong performance on tasks like image segmentation, sentiment analysis, and paraphrase detection.
Selective encoding for abstractive sentence summarization (Kodaira Tomonori)
This document describes a selective encoding model for abstractive sentence summarization. The model uses a selective gate to filter unimportant information from the encoder states before decoding. It achieves state-of-the-art results on several datasets, outperforming sequence-to-sequence and attention-based models. The model consists of an encoder, selective gate, and decoder. It is trained end-to-end to maximize the likelihood of generating reference summaries.
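A minimal sketch of a selective gate of this kind, assuming it is a sigmoid gate computed from each encoder state and a sentence summary vector (dimensions and parameterization are placeholders):

```python
import torch
import torch.nn as nn

hid = 256  # placeholder dimension
W = nn.Linear(hid, hid, bias=False)
U = nn.Linear(hid, hid, bias=False)

def selective_gate(H, s):
    """H: (n, hid) encoder states; s: (hid,) sentence summary vector.

    Each state is element-wise scaled by a gate in (0, 1), filtering out
    unimportant information before decoding."""
    gate = torch.sigmoid(W(H) + U(s))  # s broadcasts over the n states
    return H * gate
```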
Improving neural question generation using answer separation (NAVER Engineering)
Neural question generation (NQG) is the task of generating a question from a given passage with deep neural networks. Previous NQG models suffer from a problem that a significant proportion of the generated questions include words in the question target, resulting in the generation of unintended questions. In this paper, we propose answer-separated seq2seq, which better utilizes the information from both the passage and the target answer. By replacing the target answer in the original passage with a special token, our model learns to identify which interrogative word should be used. We also propose a new module termed keyword-net, which helps the model better capture the key information in the target answer and generate an appropriate question. Experimental results demonstrate that our answer separation method significantly reduces the number of improper questions which include answers. Consequently, our model significantly outperforms previous state-of-the-art NQG models.
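A minimal sketch of the answer-separation preprocessing described above; the special token name is a placeholder:

```python
def separate_answer(passage: str, answer: str, token: str = "<ANS>"):
    """Replace the target answer with a special token so the question
    generator sees the context without being able to copy answer words."""
    masked = passage.replace(answer, token, 1)  # mask first occurrence only
    return masked, answer

masked, ans = separate_answer(
    "Marie Curie won the Nobel Prize in 1903.", "Marie Curie")
# masked -> "<ANS> won the Nobel Prize in 1903."
```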
This document discusses domain transfer and domain adaptation in deep learning. It begins with introductions to domain transfer, which learns a mapping between domains, and domain adaptation, which learns a mapping between domains with labels. It then covers several approaches for domain transfer, including neural style transfer, instance normalization, and GAN-based methods. It also discusses general approaches for domain adaptation such as source/target feature matching and target data augmentation.
The Munich LSTM-RNN Approach to the MediaEval 2014 “Emotion in Music” Task (multimediaeval)
In this paper we describe TUM's approach for the MediaEval "Emotion in Music" task. The goal of this task is to automatically estimate the emotions expressed by music (in terms of Arousal and Valence) in a time-continuous fashion. Our system consists of Long Short-Term Memory Recurrent Neural Networks (LSTM-RNN) for dynamic Arousal and Valence regression. We used two different sets of acoustic and psychoacoustic features that have previously proven effective for emotion prediction in music and speech. The best model yielded an average Pearson's correlation coefficient of 0.354 (Arousal) and 0.198 (Valence), and an average Root Mean Squared Error of 0.102 (Arousal) and 0.079 (Valence).
http://ceur-ws.org/Vol-1263/mediaeval2014_submission_7.pdf
Reference Scope Identification of Citances Using Convolutional Neural Network (Saurav Jha)
In the task of summarization of a scientific paper, a lot of information stands to be gained about a reference paper, from the papers that cite it. Automatically generating the reference scope (the span of cited text) in a reference paper, corresponding to citances (sentences in the citing papers that cite it) has great significance in preparing a structured summary of the reference paper. We treat this task as a binary classification problem, by extracting feature vectors from pairs of citances and reference sentences. These features are lexical, corpus-based, surface and knowledge-based. We extend the current feature set employed for reference-citance pair identification in the current state-of-the-art system. Using these features, we present a novel classification approach for this task, that employs a deep Convolutional Neural Network along with two boosting ensemble algorithms. We outperform the existing state-of-the-art for distinguishing between cited spans and non-cited spans of text in the reference paper.
SummaRuNNer: A Recurrent Neural Network based Sequence Model for Extractive S... (Sharath TS)
SummaRuNNer is a simple recurrent neural network model for extractive text summarization. It treats summarization as a sequence classification problem, making binary decisions about whether to include each sentence. The model uses a bi-directional GRU to encode sentences. It is trained end-to-end using abstractive summaries, which allows the model to be trained on data where only abstractive and not extractive summaries are available. Experimental results show SummaRuNNer performs comparably or better than state-of-the-art extractive models on several datasets.
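A minimal sketch of the extractive decision: a bidirectional GRU reads sentence vectors and a logistic layer emits a per-sentence include/exclude probability (a simplification of SummaRuNNer's full scoring function):

```python
import torch
import torch.nn as nn

class TinyExtractor(nn.Module):
    def __init__(self, sent_dim=100, hid=64):
        super().__init__()
        self.gru = nn.GRU(sent_dim, hid, bidirectional=True, batch_first=True)
        self.clf = nn.Linear(2 * hid, 1)

    def forward(self, sent_vecs):            # (batch, n_sents, sent_dim)
        states, _ = self.gru(sent_vecs)      # (batch, n_sents, 2*hid)
        return torch.sigmoid(self.clf(states)).squeeze(-1)  # P(include)
```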
NS-CUK Journal club: HBKim, Review on "Neural Graph Collaborative Filtering",... (ssuser4b1f48)
1) The document proposes a new recommendation framework called NGCF that uses graph neural networks to explicitly encode collaborative signals from high-order user-item connections.
2) NGCF performs embedding propagation to refine user and item embeddings based on their neighbors' embeddings and connectivity.
3) Experiments on three million-scale datasets show that NGCF outperforms other collaborative filtering methods, and that its performance improves as the number of propagation layers increases, demonstrating the importance of modeling high-order connections.
Ask Me Any Rating: A Content-based Recommender System based on Recurrent Neur... (Alessandro Suglia)
Presentation for "Ask Me Any Rating: A Content-based Recommender System based on Recurrent Neural Networks" at the 7th Italian Information Retrieval Workshop.
See paper: http://ceur-ws.org/Vol-1653/paper_11.pdf
Ask Me Any Rating: A Content-based Recommender System based on Recurrent Neur... (Claudio Greco)
Slides for the presentation of the paper "Ask Me Any Rating: A Content-based Recommender System based on Recurrent Neural Networks" at the 7th Italian Information Retrieval Workshop.
RAMSES: Robust Analytic Models for Science at Extreme Scales (Ian Foster)
This document discusses the RAMSES project, which aims to develop a new science of end-to-end analytical performance modeling of science workflows in extreme-scale science environments. The RAMSES research agenda involves developing component and end-to-end models, tools to provide performance advice, data-driven estimation methods, automated experiments, and a performance database. The models will be evaluated using five challenge workflows: high-performance file transfer, diffuse scattering experimental data analysis, data-intensive distributed analytics, exascale application kernels, and in-situ analysis placement.
The document summarizes the Trajectory Transformer model, which frames reinforcement learning as a single sequence modeling problem that can be solved using a Transformer architecture. It describes how the model unifies components like the critic, actor, and dynamics model. The Trajectory Transformer directly models state, action, and reward sequences. It can be used for tasks like imitation learning, goal-reaching, and offline RL by applying techniques like beam search while conditioning on goals or rewards. Experiments show it achieves good performance on imitation learning, goal-reaching, and offline RL benchmarks.
network mining and representation learning (sun peiyuan)
This document discusses two papers related to network embedding and ranking over multilayer networks.
The first paper proposes metapath2vec, a network embedding technique for heterogeneous networks. It extends word2vec to learn latent representations of nodes in a heterogeneous network by considering metapath-guided random walks.
The second paper proposes CrossRank and CrossQuery algorithms for ranking and querying over a network of networks (NoN). CrossRank learns global ranking vectors for each domain network in the NoN by optimizing for within-network smoothness, query preference, and cross-network consistency. CrossQuery efficiently finds the top-k most relevant nodes in a target network for a query node in a source network. Both methods are evaluated on
This document describes research on implementing Curran's approximation algorithm for pricing Asian options using a dataflow architecture. The algorithm was implemented on a Maxeler dataflow engine (DFE) and compared to a CPU implementation. Different fixed-point precisions were tested on the DFE and 54-bit fixed-point provided the best balance of precision and resource usage. Implementing the algorithm across multiple DFEs provided speedups of 5-12x over a 48-core CPU. Further optimization of dynamic ranges allowed increasing the unrolling factor, improving performance and energy efficiency.
Iterative Multi-document Neural Attention for Multiple Answer Prediction (Claudio Greco)
Iterative Multi-document Neural Attention for Multiple Answer Prediction is a method for conversational recommender systems that can answer questions and provide recommendations. It extends previous work to leverage evidence from multiple documents. The model iteratively performs attention over the query and documents to uncover relationships. It then uses attention weights to generate relevance scores and predict multiple answers. An evaluation on a movie dialog dataset shows it outperforms baselines at question answering and recommendation tasks. Future work includes improving evidence retrieval and incorporating user preferences into the model.
With the explosive growth of online information, recommender system has been an effective tool to overcome information overload and promote sales. In recent years, deep learning's revolutionary advances in speech recognition, image analysis and natural language processing have gained significant attention. Meanwhile, recent studies also demonstrate its efficacy in coping with information retrieval and recommendation tasks. Applying deep learning techniques into recommender system has been gaining momentum due to its state-of-the-art performance. In this talk, I will present recent development of deep learning based recommender models and highlight some future challenges and open issues of this research field.
LSH for Prediction Problem in Recommendation (Maruf Aytekin)
This document discusses using locality sensitive hashing (LSH) for recommendations. It summarizes user-based and item-based collaborative filtering approaches and then describes how LSH works by mapping similar users to the same "buckets". The document evaluates LSH on a movie rating dataset containing 100,000 ratings from 943 users on 1682 items. It finds that while LSH decreases prediction accuracy slightly, it significantly improves the scalability and performance of the recommendation system.
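A minimal sketch of the bucket-mapping idea using random-hyperplane (cosine) LSH; users whose rating vectors fall on the same side of every hyperplane share a bucket and become cheap candidate neighbors:

```python
import numpy as np

rng = np.random.default_rng(0)

def lsh_bucket(user_vec, planes):
    """Hash a rating vector to a bucket id by its side of each hyperplane."""
    bits = (planes @ user_vec) > 0
    return "".join("1" if b else "0" for b in bits)

d, n_bits = 1682, 12          # MovieLens 100K: 1682 items
planes = rng.standard_normal((n_bits, d))
# Users in the same bucket are compared directly, trading a little
# prediction accuracy for much better scalability.
bucket = lsh_bucket(rng.standard_normal(d), planes)
```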
The document summarizes a student project to build a model that can efficiently represent nodes in large social networks as low-dimensional vectors. The model is based on the LINE algorithm presented in the baseline paper. The students implement both first-order and second-order proximity models in Torch, using the same node representations for both. Their model achieves F1 scores between 39-42% on the BlogCatalog dataset. The project took 5 weeks and challenges included understanding the baseline paper's mathematics and debugging neural networks in Lua.
Understanding Large Social Networks | IRE Major Project | Team 57 | LINE (Raj Patel)
The document summarizes a student project to build a model that can efficiently represent nodes in large social networks as low-dimensional vectors. The model is based on the LINE paper, which learns embeddings by optimizing for first-order and second-order proximity. For their project, the students implemented the LINE approach in Torch, using the same node representations for both proximities and evaluating on the BlogCatalog dataset. Their model achieved F1 scores between 39-41% for node classification.
The document describes a prototype that retrieves related scientific publications from different linked datasets through thesaurus alignment. It introduces several linked datasets, including Agrovoc, OpenAgris, STW and EconStor. The prototype matches concepts from a user query to concepts in the linked datasets' thesauri to identify related publications. Pseudocode is provided to illustrate the process of concept mapping and querying multiple datasets. The goal is to retrieve relevant publications from different sources through a single interface.
Poster: Controlled and Balanced Dataset for Japanese Lexical Simplification (Kodaira Tomonori)
This document presents a new controlled and balanced dataset for Japanese lexical simplification. The dataset contains 2,100 sentences each with a single difficult Japanese word. Five annotators provided substitution options for each complex word and ranked them in order of simplification. This dataset is the first for Japanese lexical simplification to only allow one complex word per sentence and include particles, resulting in higher correlation with human judgment than prior datasets. It will enable better machine learning methods for Japanese lexical simplification.
Noise or additional information? Leveraging crowdsource annotation item agree... (Kodaira Tomonori)
EMNLP 2015 paper reading group
Tomonori Kodaira
Noise or additional information? Leveraging crowdsource annotation item agreement for natural language tasks.
Emily K. Jamison and Iryna Gurevych
Paper introduction:
Presentation: Kodaira
PPDB 2.0: Better paraphrase ranking, fine-grained entailment relations, word embeddings, and style classification
Ellie Pavlick, Pushpendre Rastogi, Juri Ganitkevitch, Benjamin Van Durme, Chris Callison-Burch
Proceedings of the 53rd Annual Meeting of the Association for Computational Linguistics and the 7th International Joint Conference on Natural Language Processing
Aligning sentences from standard wikipedia to simple wikipedia (Kodaira Tomonori)
Aligning Sentences from Standard Wikipedia to Simple Wikipedia
NAACL reading group
William Hwang, Hannaneh Hajishirzi, Mari Ostendorf, Wei Wu
University of Washington
Proceedings of the 2015 Conference of the North American Chapter of the Association for Computational Linguistics: Human Language Technologies
The ability to recreate computational results with minimal effort and actionable metrics provides a solid foundation for scientific research and software development. When people can replicate an analysis at the touch of a button using open-source software, open data, and methods to assess and compare proposals, it significantly eases verification of results, engagement with a diverse range of contributors, and progress. However, we have yet to fully achieve this; there are still many sociotechnical frictions.
Inspired by David Donoho's vision, this talk aims to revisit the three crucial pillars of frictionless reproducibility (data sharing, code sharing, and competitive challenges) with the perspective of deep software variability.
Our observation is that multiple layers — hardware, operating systems, third-party libraries, software versions, input data, compile-time options, and parameters — are subject to variability that exacerbates frictions but is also essential for achieving robust, generalizable results and fostering innovation. I will first review the literature, providing evidence of how the complex variability interactions across these layers affect qualitative and quantitative software properties, thereby complicating the reproduction and replication of scientific studies in various fields.
I will then present some software engineering and AI techniques that can support the strategic exploration of variability spaces. These include the use of abstractions and models (e.g., feature models), sampling strategies (e.g., uniform, random), cost-effective measurements (e.g., incremental build of software configurations), and dimensionality reduction methods (e.g., transfer learning, feature selection, software debloating).
I will finally argue that deep variability is both the problem and solution of frictionless reproducibility, calling the software science community to develop new methods and tools to manage variability and foster reproducibility in software systems.
Invited talk at the Journées Nationales du GDR GPL 2024.
Deep Behavioral Phenotyping in Systems Neuroscience for Functional Atlasing a... (Ana Luísa Pinho)
Functional Magnetic Resonance Imaging (fMRI) provides means to characterize brain activations in response to behavior. However, cognitive neuroscience has been limited to group-level effects referring to the performance of specific tasks. To obtain the functional profile of elementary cognitive mechanisms, the combination of brain responses to many tasks is required. Yet, to date, both structural atlases and parcellation-based activations do not fully account for cognitive function and still present several limitations. Further, they do not adapt overall to individual characteristics. In this talk, I will give an account of deep-behavioral phenotyping strategies, namely data-driven methods in large task-fMRI datasets, to optimize functional brain-data collection and improve inference of effects-of-interest related to mental processes. Key to this approach is the employment of fast multi-functional paradigms rich on features that can be well parametrized and, consequently, facilitate the creation of psycho-physiological constructs to be modelled with imaging data. Particular emphasis will be given to music stimuli when studying high-order cognitive mechanisms, due to their ecological nature and quality to enable complex behavior compounded by discrete entities. I will also discuss how deep-behavioral phenotyping and individualized models applied to neuroimaging data can better account for the subject-specific organization of domain-general cognitive systems in the human brain. Finally, the accumulation of functional brain signatures brings the possibility to clarify relationships among tasks and create a univocal link between brain systems and mental functions through: (1) the development of ontologies proposing an organization of cognitive processes; and (2) brain-network taxonomies describing functional specialization. To this end, tools to improve commensurability in cognitive science are necessary, such as public repositories, ontology-based platforms and automated meta-analysis tools. I will thus discuss some brain-atlasing resources currently under development, and their applicability in cognitive as well as clinical neuroscience.
ANOMALOUS SECONDARY GROWTH IN DICOT ROOTS (RASHMI M G)
This presentation covers abnormal or anomalous secondary growth in plants. Secondary growth is defined as an increase in plant girth due to the vascular cambium or cork cambium. Anomalous secondary growth does not follow the normal pattern of a single vascular cambium producing xylem internally and phloem externally.
What are greenhouse gases, and how many gases affect the Earth? (moosaasad1975)
This presentation covers what greenhouse gases are, how they affect the Earth and its environment, and what the future holds for the Earth's environment, weather, and climate.
EWOCS-I: The catalog of X-ray sources in Westerlund 1 from the Extended Weste... (Sérgio Sacani)
Context. With a mass exceeding several 10^4 M⊙ and a rich and dense population of massive stars, supermassive young star clusters represent the most massive star-forming environment that is dominated by the feedback from massive stars and gravitational interactions among stars.
Aims. In this paper we present the Extended Westerlund 1 and 2 Open Clusters Survey (EWOCS) project, which aims to investigate the influence of the starburst environment on the formation of stars and planets, and on the evolution of both low and high mass stars. The primary targets of this project are Westerlund 1 and 2, the closest supermassive star clusters to the Sun.
Methods. The project is based primarily on recent observations conducted with the Chandra and JWST observatories. Specifically, the Chandra survey of Westerlund 1 consists of 36 new ACIS-I observations, nearly co-pointed, for a total exposure time of 1 Msec. Additionally, we included 8 archival Chandra/ACIS-S observations. This paper presents the resulting catalog of X-ray sources within and around Westerlund 1. Sources were detected by combining various existing methods, and photon extraction and source validation were carried out using the ACIS-Extract software.
Results. The EWOCS X-ray catalog comprises 5963 validated sources out of the 9420 initially provided to ACIS-Extract, reaching a photon flux threshold of approximately 2 × 10^-8 photons cm^-2 s^-1. The X-ray sources exhibit a highly concentrated spatial distribution, with 1075 sources located within the central 1 arcmin. We have successfully detected X-ray emissions from 126 out of the 166 known massive stars of the cluster, and we have collected over 71,000 photons from the magnetar CXO J164710.20-455217.
This presentation gives a brief overview of the structural and functional attributes of nucleotides and the structure and function of genetic material, along with the impact of UV rays and pH upon them.
Nucleophilic Addition of carbonyl compounds (SSR02)
Nucleophilic addition is the most important reaction of carbonyls, not just aldehydes and ketones but carboxylic acid derivatives in general.
Carbonyls undergo addition reactions with a large range of nucleophiles.
Comparing the relative basicity of the nucleophile and the product is extremely helpful in determining how reversible the addition reaction is. Reactions with Grignards and hydrides are irreversible. Reactions with weak bases like halides and carboxylates generally don’t happen.
Electronic effects (inductive effects, electron donation) have a large impact on reactivity.
Large groups adjacent to the carbonyl will slow the rate of reaction.
Neutral nucleophiles can also add to carbonyls, although their additions are generally slower and more reversible. Acid catalysis is sometimes employed to increase the rate of addition.
When I was asked to give a companion lecture in support of ‘The Philosophy of Science’ (https://shorturl.at/4pUXz) I decided not to walk through the detail of the many methodologies in order of use. Instead, I chose to employ a long standing, and ongoing, scientific development as an exemplar. And so, I chose the ever evolving story of Thermodynamics as a scientific investigation at its best.
Conducted over a period of >200 years, Thermodynamics R&D, and application, benefitted from the highest levels of professionalism, collaboration, and technical thoroughness. New layers of application, methodology, and practice were made possible by the progressive advance of technology. In turn, this has seen measurement and modelling accuracy continually improved at a micro and macro level.
Perhaps most importantly, Thermodynamics rapidly became a primary tool in the advance of applied science/engineering/technology, spanning micro-tech, to aerospace and cosmology. I can think of no better a story to illustrate the breadth of scientific methodologies and applications at their best.
Thematic appreciation test is a psychological assessment tool used to measure an individual's appreciation and understanding of specific themes or topics. This test helps evaluate an individual's ability to connect different ideas and concepts within a given theme, as well as their overall comprehension and interpretation skills. The results of the test can provide valuable insights into an individual's cognitive abilities, creativity, and critical thinking skills.
Phenomics assisted breeding in crop improvement (IshaGoswami9)
The population is increasing and will reach about 9 billion by 2050, and due to climate change it is difficult to meet the food requirements of such a large population. Facing the challenges presented by resource shortages, climate change, and an increasing global population, crop yield and quality need to be improved in a sustainable way over the coming decades. Genetic improvement by breeding is the best way to increase crop productivity. With the rapid progression of functional genomics, an increasing number of crop genomes have been sequenced and dozens of genes influencing key agronomic traits have been identified. However, current genome sequence information has not been adequately exploited for understanding the complex characteristics of multiple genes, owing to a lack of crop phenotypic data. Efficient, automatic, and accurate technologies and platforms that can capture phenotypic data linkable to genomics information at all growth stages have become as important as genotyping. Thus, high-throughput phenotyping has become the major bottleneck restricting crop breeding. Plant phenomics has been defined as the high-throughput, accurate acquisition and analysis of multi-dimensional phenotypes during crop growing stages at the organism level, including the cell, tissue, organ, individual plant, plot, and field levels. With the rapid development of novel sensors, imaging technology, and analysis methods, numerous infrastructure platforms have been developed for phenotyping.
3. Introduction
• The authors present an attention-based neural network model for generating abstractive summaries of opinionated text.
• The system takes a set of text units as input, and outputs a one-sentence abstractive summary.
• Two types of opinionated text are considered:
  - Movie reviews
  - Arguments on controversial topics
4. Introduction
• Systems:
  - An attention-based model (Bahdanau et al., 2014)
  - An importance-based sampling method
• The importance score of a text unit is estimated with a regression model using pairwise preference-based sampling.
5. Data Collection
Movie reviews: Rotten Tomatoes (www.rottentomatoes.com)
• The site carries both professional critic reviews and user-generated reviews.
• Each movie has a one-sentence critic consensus.
Data:
• 246,164 critic reviews and their opinion consensus for 3,731 movies
• Split: 2,458 train / 536 validation / 737 test movies
6. Data Collection
Arguments on controversial topics: idebate.org
• A Wikipedia-style website gathering pro and con arguments on controversial issues.
• Each point contains a one-sentence central claim.
Data:
• 676 debates with 2,259 claims
• Split: 450 train / 67 validation / 150 test debates
8. The Neural Network-Based Abstract Generation Model
Problem Formulation
• A summary y is a sequence of words y_1, …, y_|y|.
• The input consists of an arbitrary number of reviews or arguments ("text units"): x = {x_1, …, x_M}.
• Each text unit x_k is a sequence of words x_k,1, …, x_k,|x_k|.
9. The Neural Network-Based Abstract Generation Model
Decoder
• The summary is generated as a sequence of word-level predictions:
  log P(y | x) = Σ_{j=1}^{|y|} log P(y_j | y_1, …, y_{j-1}, x)
  P(y_j | y_1, …, y_{j-1}, x) = softmax(h_j)
• h_j is the RNN state variable at step j:
  h_j = g(y_{j-1}, h_{j-1}, s)
• g is an LSTM network (Hochreiter and Schmidhuber, 1997).
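A minimal PyTorch sketch of this decoding step, assuming a single-layer LSTMCell whose input u_j concatenates the previous word embedding with the input representation s (dimensions are placeholders; slide 10 describes the concatenation):

```python
import torch
import torch.nn as nn

# Placeholder sizes, not from the paper.
vocab_size, emb_dim, hid_dim, s_dim = 10000, 300, 150, 300

embed = nn.Embedding(vocab_size, emb_dim)
cell = nn.LSTMCell(emb_dim + s_dim, hid_dim)   # g consumes u_j = [emb(y_{j-1}); s]
proj = nn.Linear(hid_dim, vocab_size)

def decode_step(prev_word, s, h, c):
    """One word-level prediction: h_j = g(y_{j-1}, h_{j-1}, s)."""
    u = torch.cat([embed(prev_word), s], dim=-1)    # u_j
    h, c = cell(u, (h, c))
    log_probs = torch.log_softmax(proj(h), dim=-1)  # log P(y_j | y_<j, x)
    return log_probs, h, c
```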
10. The Neural Network-Based Abstract Generation Model
Decoder
• g is implemented as an LSTM.
• The model concatenates the representation of the previous output word y_{j-1} with the input representation s to form u_j.
11. The Neural Network-Based Abstract Generation Model
Encoder
• The representation s of the input text units is computed with an attention model (Bahdanau et al., 2014):
  s = Σ_i a_i b_i
• The b_i are built with a bidirectional LSTM, using the same LSTM formulation with u_j = x_j.
• a_i = softmax(v(b_i, h_{j-1})), where
  v(b_i, h_{j-1}) = W_s · tanh(W_cg b_i + W_hg h_{j-1})
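A minimal PyTorch sketch of this additive attention, with W_cg, W_hg, and W_s as learned parameters and dimensions taken loosely from slide 17:

```python
import torch
import torch.nn as nn

hid_dim, att_dim = 150, 100  # slide 17: 150-dim states, 100-dim attention

W_cg = nn.Linear(2 * hid_dim, att_dim, bias=False)  # biLSTM states are 2*hid_dim
W_hg = nn.Linear(hid_dim, att_dim, bias=False)
W_s = nn.Linear(att_dim, 1, bias=False)

def attend(b, h_prev):
    """b: (n_words, 2*hid_dim) encoder states; h_prev: (hid_dim,) decoder state.

    Returns s = sum_i a_i * b_i, with a = softmax over scores v(b_i, h_prev)."""
    scores = W_s(torch.tanh(W_cg(b) + W_hg(h_prev))).squeeze(-1)  # (n_words,)
    a = torch.softmax(scores, dim=-1)
    return a @ b  # weighted sum: s has shape (2*hid_dim,)
```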
12. The Neural Network-Based Abstract Generation Model
Attention Over Multiple Inputs
• The input consists of multiple separate text units, concatenated into one sequence z.
• Two problems arise:
  - The model is sensitive to the order of the text units.
  - z may contain thousands of words.
13. The Neural Network-Based Abstract Generation Model
Attention Over Multiple Inputs
Sub-sampling from the input:
• An importance score f(x_k) ∈ [0, 1] is defined for each text unit x_k.
• K candidate text units are sampled according to these scores.
14. The Neural Network-Based Abstract Generation Model
Importance Estimation
• A ridge regression model with a regularizer: f(x_k) = r_k · w is learned by minimizing
  ||Rw - L||_2^2 + λ||R'w - L'||_2^2 + β||w||_2^2
• Each text unit x_k is represented as a d-dimensional feature vector r_k ∈ R^d.
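A minimal numpy sketch of this objective, solved in closed form via its normal equations (R, L and R', L' are the two feature/label matrices implied by the pairwise sampling; the names are placeholders):

```python
import numpy as np

def fit_importance(R, L, R2, L2, lam=1.0, beta=1.0):
    """Minimize ||Rw - L||^2 + lam*||R2 w - L2||^2 + beta*||w||^2.

    Setting the gradient to zero gives
    (R^T R + lam R2^T R2 + beta I) w = R^T L + lam R2^T L2."""
    d = R.shape[1]
    A = R.T @ R + lam * (R2.T @ R2) + beta * np.eye(d)
    b = R.T @ L + lam * (R2.T @ L2)
    return np.linalg.solve(A, b)

def importance(r_k, w):
    """Importance score f(x_k) = r_k . w, clipped to [0, 1]."""
    return float(np.clip(r_k @ w, 0.0, 1.0))
```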
15. The Neural Network-Based Abstract Generation Model
Post-processing
• At test time, the n-best summaries are re-ranked by their cosine similarity with the input text units.
• The summary with the highest similarity is included in the final output.
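A minimal sketch of this re-ranking over bag-of-words vectors; the tokenization and vector representation are assumptions, not the paper's exact setup:

```python
import numpy as np
from collections import Counter

def bow(text, vocab):
    """Bag-of-words count vector over a fixed vocabulary."""
    counts = Counter(text.split())
    return np.array([counts[t] for t in vocab], dtype=float)

def rerank(candidates, input_text):
    """Pick the n-best candidate closest (cosine) to the concatenated input."""
    vocab = sorted(set(input_text.split()))
    x = bow(input_text, vocab)
    def cos(c):
        v = bow(c, vocab)
        denom = np.linalg.norm(v) * np.linalg.norm(x)
        return (v @ x) / denom if denom else 0.0
    return max(candidates, key=cos)
```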
16. Experimental Setup
• Data preprocessing: Stanford CoreNLP (Manning et al., 2014)
• Pre-trained embeddings and features: 300-dimensional word embeddings; the model is also extended with additional features.
17. Experimental Setup
• Hyperparameters: LSTM states and cells of 150 dimensions; attention of 100 dimensions. Training is performed via Adagrad (Duchi et al., 2011).
• Evaluation: BLEU
• The importance-based sampling rate K is set to 5.
• Decoding: beam search with a beam of 20.
22. Conclusion
• The authors presented a neural approach to generating abstractive summaries of opinionated text.
• They employed an attention-based method that finds salient information across different input text units.
• They deployed an importance-based sampling mechanism for model training.
• The system obtained state-of-the-art results.