NLP Project: Machine Comprehension Using Attention-Based LSTM Encoder-Decoder Architecture
LSTM Encoder-Decoder Architecture with Attention Mechanism for Machine Comprehension
Eugene Nho
MBA/MS, Intelligent Systems
Stanford University
enho@stanford.edu
Brian Higgins
BS, Computer Science
Stanford University
bhiggins@stanford.edu
Abstract
Machine comprehension remains a challenging open area of research. While many question answering
models have been explored for existing datasets, little work has been done with the newly released
MS MARCO dataset, which mirrors the reality much more closely and poses many unique challenges.
We explore an end-to-end neural architecture with attention mechanisms for comprehending relevant
information and generating text answers for MS MARCO.
1 Introduction
Machine comprehension—building systems that comprehend natural language documents—is a challenging open area
of research within natural language processing (NLP). Since the widespread adoption of statistical machine learning and, more recently, deep learning, progress in the field has come largely in lockstep with the introduction of large-scale datasets.
The latest such dataset is MS MARCO [1]. Unlike previous machine comprehension and question answering datasets, MS MARCO consists of real questions drawn from anonymized Bing queries, and its target answers take the form of sequences of words. Compared to other forms of answers, such as multiple choice, single-token answers, or spans of words, generating free text is significantly more challenging. To our knowledge, no literature exists on end-to-end systems specifically addressing this dataset yet.
In this paper, we explore an attention-based encoder-decoder architecture that comprehends input text and generates
text answers.
2 Related Work
Machine comprehension and question answering tasks went through multiple stages of evolution over the past decade.
Traditionally, these tasks relied on complex NLP pipelines involving steps like syntactic parsing, semantic parsing, and
question classification. With the rise of neural networks, end-to-end neural architectures have increasingly been applied
to comprehension tasks [2][3][4][5]. The evolution of available datasets was a driving force behind the progress in
this domain—from models simply choosing between multiple choice answers [4][5] to those generating single-token
answers [2][3] or span-of-words answers [6].
Throughout this evolution, attention has emerged as a key concept, particularly as the tasks demanded by datasets became more complex. Attention was first utilized in the context of non-NLP tasks, such as learning alignments between image objects and agent actions [7], or between the visual features of an image and its text description in caption generation.
Bahdanau et al. (2015) applied the attention mechanism to Neural Machine Translation (NMT) [8], and Luong et al.
(2015) explored different architectures for attention-based NMT, including a global approach and a local approach [9].
Figure 1: Architecture of the encoder-decoder model with attention
Several approaches have been tried to incorporate attention into machine comprehension and question answering tasks.
Seo et al. (2017) applied bidirectional models with attention to achieve near state-of-the-art results for SQuAD [10].
Wang & Jiang (2016) combined match-LSTM, an attention model originally proposed for text entailment, with Pointer Net, a sequence-to-sequence model proposed by Vinyals et al. (2015) that constrains output words to the tokens from input sequences [6][11].
3 Dataset: MS MARCO
MS MARCO is a reading comprehension dataset with a number of characteristics that make it the most realistic
available dataset for question answering. All questions are real anonymized queries from Bing, and the context passages,
from which target answers are derived, are sampled from real web documents, closely mirroring the real-world scenario
of finding an answer to a question on a search engine. The target answers, as mentioned earlier, are sequences of words and are human-generated.
The dataset has 100,000 queries. Each query consists of one question, approximately 10 context passages, and target
answers. While most queries have one answer, some have many and some have none. The average length of the
questions, passages, and target answers are approximately 15, 85, and 8 tokens, respectively. There is no particular topical focus, but the queries fall into one of the following five categories: description (52.6%), numeric (28.4%), entity (10.5%), location (5.7%), and person (2.7%).
Given the nature of the text generation task required by the dataset, we use ROUGE-L and BLEU as evaluation metrics.
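Both metrics can be computed with off-the-shelf tooling. The snippet below is a minimal sketch assuming the third-party nltk and rouge-score packages; it is an illustration rather than the exact evaluation script used in our experiments.

# Minimal sketch of the evaluation metrics; assumes the third-party
# packages nltk and rouge-score (illustrative, not our exact tooling).
from nltk.translate.bleu_score import sentence_bleu, SmoothingFunction
from rouge_score import rouge_scorer

reference = "the average length of an answer is eight tokens".split()
candidate = "an average answer is eight tokens long".split()

# BLEU-1: unigram precision, with smoothing for short sequences.
bleu1 = sentence_bleu([reference], candidate,
                      weights=(1.0, 0, 0, 0),
                      smoothing_function=SmoothingFunction().method1)

# ROUGE-L: longest-common-subsequence F-measure.
scorer = rouge_scorer.RougeScorer(["rougeL"], use_stemmer=True)
rougeL = scorer.score(" ".join(reference), " ".join(candidate))["rougeL"].fmeasure

print(f"BLEU-1 = {bleu1:.3f}, ROUGE-L = {rougeL:.3f}")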
4 Methods
The problem we are trying to solve can be defined formally as follows. For each example in the dataset, we are given a
question and a set of potentially relevant context passages. The question is represented as $Q \in \mathbb{R}^{d \times n}$, where $d$ is the embedding size and $n$ is the length of the question. The set of passages is represented as $S = \{P_1, P_2, \ldots, P_m\}$, where $m$ is the number of passages. Our objective is to (1) choose the most relevant passage $P$ out of $S$, and (2) generate the answer consisting of a sequence of words, represented as $R \in \mathbb{R}^{d \times k}$, where $k$ is the length of the answer. Because both tasks require very similar architectures, this paper will only focus on the second objective, assuming the best passage $P$ has already been selected.
We use an encoder-decoder architecture with multiple attention mechanisms to achieve this, as shown in Figure 1. The
components of this model are:
1. Question Encoder is an LSTM layer mapping the question information to a vector space.
2. Passage Encoder with Attention is an LSTM layer with an attention mechanism connecting the passage
information with the encoded knowledge about the question.
3. Attention Encoder is an extra LSTM layer further distilling the output from the Passage Encoder with
Attention.
4. Decoder is an LSTM layer that takes in information from the three encoders and generates the answer. It has
an attention mechanism connecting to the Attention Encoder.
4.1 Question Encoder
The purpose of the question encoder is to encode the contextual information from each token of the question into a vector space. We use a standard unidirectional LSTM [12] layer to process the question, represented as follows:
$H^q = \overrightarrow{\mathrm{LSTM}}(Q)$    (1)

The output matrix $H^q \in \mathbb{R}^{h \times n}$ is the hidden state representation of the question, where $h$ is the dimension of the hidden state. $h^q_i$ represents the $i$th column of $H^q$ and encapsulates the contextual information up to the $i$th token of the question ($q_i$).
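For illustration, this encoder reduces to a single LSTM pass. The following is a minimal PyTorch sketch of Equation (1) with assumed sizes for $d$, $h$, and $n$; it is a sketch, not our exact implementation.

# Minimal PyTorch sketch of the question encoder (Eq. 1).
# The sizes d, h, n are illustrative assumptions.
import torch
import torch.nn as nn

d, h, n = 300, 128, 15      # embedding size, hidden size, question length
Q = torch.randn(1, n, d)    # one embedded question: (batch, n, d)

lstm = nn.LSTM(input_size=d, hidden_size=h, batch_first=True)
Hq, _ = lstm(Q)             # Hq: (1, n, h); Hq[0, i] corresponds to h^q_i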
4.2 Passage Encoder with Attention
The objective of this layer is to capture the information from the passage relevant to the contents of the question. To that
end, it employs a standard LSTM layer with the global attention mechanism proposed by Luong et al. [9]. $P \in \mathbb{R}^{d \times l}$ is the matrix representing the most relevant context passage, where $d$ is the embedding size and $l$ is the length of the passage. At each time step $t$, this encoder uses the embedding of the $t$th token of the passage and the entire hidden state representation from the question encoder to capture contextual information up to the $t$th token relevant to the question. This is represented as follows:

$\tilde{h}^p_t = \overrightarrow{\mathrm{AttentionLSTM}}(p_t, H^q)$    (2)

where $\tilde{h}^p_t \in \mathbb{R}^h$ is the hidden vector capturing the information, $p_t \in \mathbb{R}^d$ is the $t$th token of the passage (i.e. the $t$th column of $P$), and $H^q$ is the matrix of all the hidden states of the question encoder.
$\overrightarrow{\mathrm{AttentionLSTM}}$ is an abstraction involving the following two steps. First, it captures information from the $t$th token using a regular LSTM cell, represented as $h^p_t = \overrightarrow{\mathrm{LSTM}}(p_t)$, where $h^p_t \in \mathbb{R}^h$. Second, using $H^q$ and $h^p_t$, it derives a context vector $c_t \in \mathbb{R}^h$ that captures relevant information from the question. $c_t$ is concatenated with $h^p_t$ to generate $\tilde{h}^p_t$, as shown below:

$\tilde{h}^p_t = \overrightarrow{\mathrm{AttentionLSTM}}(p_t, H^q) = \tanh(W_c [h^p_t; c_t] + b_c)$    (3)

where $W_c \in \mathbb{R}^{h \times 2h}$ is a weight matrix and $b_c \in \mathbb{R}^h$ is a bias.
The context vector $c_t$ captures all the hidden states from the question encoder, weighted by how relevant each question word is to the $t$th passage word. To derive $c_t$, we first calculate the relevance score between the $t$th passage word, represented by $h^p_t$, and the $j$th question word, represented by $h^q_j$:

$\mathrm{score}(h^p_t, h^q_j) = (W_s h^p_t + b_s)^\top h^q_j$    (4)

where $W_s \in \mathbb{R}^{h \times h}$ is a weight matrix and $b_s \in \mathbb{R}^h$ is a bias. We then create the attention vector $a_t = \{a^{(1)}_t, a^{(2)}_t, \ldots, a^{(n)}_t\} \in \mathbb{R}^{1 \times n}$, a horizontal vector whose $j$th element is the softmax value of $\mathrm{score}(h^p_t, h^q_j)$. $c_t$ is the weighted average of $H^q$ based on the attention vector $a_t$:

$a^{(j)}_t = \dfrac{\exp(\mathrm{score}(h^p_t, h^q_j))}{\sum_{\rho=1}^{n} \exp(\mathrm{score}(h^p_t, h^q_\rho))}$    (5)

$g_t = H^q \odot \bar{a}_t$    (6)

$c_t = \sum_{i=1}^{n} g^{(i)}_t$    (7)
Figure 2: Loss and evaluation metrics
where $\odot$ denotes the element-wise product, $\bar{a}_t \in \mathbb{R}^{h \times n}$ is the vertically broadcasted matrix of $a_t$, and $g^{(i)}_t$ is the $i$th column of $g_t$. All the formulas aside, the important intuition here is that the hidden state of this encoder at each time step $t$ incorporates not only information about the passage tokens up to that step (as with a normal LSTM hidden state), but also a measure of how relevant each of the question tokens is to that particular $t$th passage token.
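To make Equations (3) through (7) concrete, the following minimal PyTorch sketch computes one attention step. The tensor names mirror the notation above; the dimensions and the use of nn.Linear to carry the weight matrices and biases are illustrative assumptions.

# Minimal PyTorch sketch of one global-attention step (Eqs. 3-7).
# Sizes h and n are illustrative assumptions.
import torch
import torch.nn as nn
import torch.nn.functional as F

h, n = 128, 15
Hq = torch.randn(n, h)          # question hidden states; row j is h^q_j
hp_t = torch.randn(h)           # passage LSTM hidden state at step t

W_s = nn.Linear(h, h)           # Eq. 4: W_s h^p_t + b_s
W_c = nn.Linear(2 * h, h)       # Eq. 3: W_c [h^p_t ; c_t] + b_c

scores = Hq @ W_s(hp_t)         # Eq. 4: relevance of each question token to step t
a_t = F.softmax(scores, dim=0)  # Eq. 5: attention weights over the n question tokens
c_t = a_t @ Hq                  # Eqs. 6-7: attention-weighted average of H^q
h_tilde_t = torch.tanh(W_c(torch.cat([hp_t, c_t])))  # Eq. 3: combined state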
4.3 Attention Encoder
The purpose of this LSTM layer is to further distill the contextual information captured by the passage encoder. It is a variant of the match-LSTM layer, first proposed by Wang & Jiang (2016) for text entailment and later applied to question answering by the same authors [6]. We utilize a standard unidirectional LSTM layer taking as input all the hidden states of the Passage Encoder with Attention, as shown below:
$H^m = \overrightarrow{\mathrm{LSTM}}(\tilde{H}^p)$    (8)

where $\tilde{H}^p = \{\tilde{h}^p_1, \tilde{h}^p_2, \ldots, \tilde{h}^p_l\} \in \mathbb{R}^{h \times l}$ is the hidden state representation from the passage encoder.
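In code, this layer is simply another LSTM pass over the attention-weighted passage states; the following is a minimal PyTorch sketch of Equation (8) with assumed sizes.

# Minimal sketch of the attention encoder (Eq. 8): a second LSTM that
# re-reads the attention-weighted passage states. Sizes are assumptions.
import torch
import torch.nn as nn

h, l = 128, 85
Hp_tilde = torch.randn(1, l, h)   # outputs of the Passage Encoder with Attention
lstm = nn.LSTM(input_size=h, hidden_size=h, batch_first=True)
Hm, _ = lstm(Hp_tilde)            # Hm: (1, l, h), the distilled representation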
4.4 Decoder
We use an LSTM layer with global attention between the decoder and the Attention Encoder to generate the answer.
The purpose of this attention mechanism is to let the decoder "peek" at the relevant information encapsulating the
passage and the question as it generates the answer. In addition, to pass along the contextual information captured by
the encoders, we concatenate the last hidden states from all three encoders and feed the combined matrix as the initial
hidden state to the decoder. At each time step t, the decoder takes as input the embedding of the generated token from
the previous step, and uses the same attention mechanism described in the earlier section to derive the hidden state
representation $\tilde{h}^r_t \in \mathbb{R}^{3h}$:

$\tilde{h}^r_t = \overrightarrow{\mathrm{AttentionLSTM}}(r_{t-1}, H^m)$    (9)

where $r_{t-1} \in \mathbb{R}^d$ is the embedding of the word generated in the last time step. Then we apply matrix multiplication and softmax to generate the output vector $o_t \in \mathbb{R}^V$:

$o_t = \mathrm{softmax}(U \tilde{h}^r_t + b)$    (10)

where $V$ is the vocabulary size. $r_t$ is the word embedding corresponding to $o_t$.
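The following minimal PyTorch sketch shows one decoder step. The attention layers here are analogues of Equations (3) and (4) adapted to the $3h$-dimensional decoder state; the exact wiring, names, and sizes are illustrative assumptions, not our precise implementation.

# Minimal sketch of one decoder step (Eqs. 9-10); sizes are assumptions.
import torch
import torch.nn as nn
import torch.nn.functional as F

d, h, V, l = 300, 128, 20000, 85
embedding = nn.Embedding(V, d)
cell = nn.LSTMCell(input_size=d, hidden_size=3 * h)  # hidden seeded by the 3 encoders
W_s = nn.Linear(3 * h, h)            # scores the decoder state against H^m (Eq. 4 analogue)
W_c = nn.Linear(3 * h + h, 3 * h)    # combines decoder state and context (Eq. 3 analogue)
U = nn.Linear(3 * h, V)              # Eq. 10: projection to the vocabulary

Hm = torch.randn(l, h)               # attention-encoder hidden states (rows)
h_t = torch.randn(1, 3 * h)          # decoder hidden state (batch of 1)
s_t = torch.randn(1, 3 * h)          # decoder cell state
r_prev = torch.tensor([0])           # id of the token generated at the previous step

h_t, s_t = cell(embedding(r_prev), (h_t, s_t))   # one LSTM step on r_{t-1}
scores = Hm @ W_s(h_t.squeeze(0))                # one score per attention-encoder state
ctx = F.softmax(scores, dim=0) @ Hm              # context vector over H^m
h_tilde = torch.tanh(W_c(torch.cat([h_t.squeeze(0), ctx])))  # Eq. 9 output
o_t = F.softmax(U(h_tilde), dim=0)               # Eq. 10: distribution over V tokens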
            Description   Person   Numeric   Location   Entity   Full Data Set
BLEU-1          9.1        3.2       7.4       3.8       7.4         9.3
ROUGE-L        15.1        2.8      13.7       3.4       5.5        12.8

Table 1: Evaluation metrics by question type from the held-out validation set
5 Experiments and Discussion
5.1 Results
A variant of the model described earlier with a double-layer LSTM decoder achieved a ROUGE-L of 12.8 and a BLEU of 9.3 on the held-out validation set. While this result is not state-of-the-art, our model outperformed several benchmark models such as the Seq2Seq model with Memory Networks [1].
Figure 2 shows the training progression for our model. Loss declines steadily, but the model starts overfitting around the 10th epoch, where both evaluation metrics on the development set peak. Our classifier for selecting the most relevant context passage achieved an accuracy of 100% on both the development set and the held-out validation set.
We conducted over 40 runs for hyperparameter optimization, and found the following:
• Larger batch size generally performed better, with a batch size of 256 outperforming 64 (9.3 vs 6.4 BLEU)
• L2 loss outperformed cross entropy (9.3 vs 7.0 BLEU)
• Our model was sensitive to the learning rate, with evaluation metrics dropping precipitously when the rate approached the 0.01 range (vs. 0.001 or 0.0001)
5.2 Error Analysis
As illustrated by Table 1, our model was much more effective at generating description and numeric answers than person and location answers. Its relative weakness on people and locations most likely stems from the rarity of their names. The dataset contained a total of approximately 650,000 tokens. Because of memory restrictions, we used a vocabulary size of 20,000 during training, and the remaining words were cast to the <unknown> token. This practice led to lower performance on the person and location questions, which on average contain more uncommon tokens falling outside the vocabulary.
We also found that longer answers were much harder to predict. This is a well-known issue in text generation tasks: the farther into the sequence the decoder gets, the more diluted the original hidden state becomes, making long sequences of words very challenging to predict.
Lastly, our design decision on how to treat the <end> token in the generated answer impacted our error rate. When training, we masked the generated answers based on the lengths of the ground truth. Our hypothesis was that this would encourage our model to generate predictions of the same length as the target answers. However, because of this decision, the model was not penalized for generating (often irrelevant) tokens beyond the length of the target answer. This led to lower evaluation metrics on the held-out validation set.
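For concreteness, such masking can be applied per step on the training loss. The sketch below assumes PyTorch and, for simplicity, a cross-entropy objective (our best runs actually used an L2 loss, as noted in Section 5.1); shapes are illustrative assumptions.

# Minimal sketch of masking the per-step loss by ground-truth length.
import torch
import torch.nn.functional as F

V, T = 20000, 8                     # vocabulary size, max decoding steps
logits = torch.randn(T, V)          # decoder outputs for one example
target = torch.randint(0, V, (T,))  # ground-truth token ids
true_len = 5                        # actual length of the target answer

per_step = F.cross_entropy(logits, target, reduction="none")  # (T,)
mask = (torch.arange(T) < true_len).float()
loss = (per_step * mask).sum() / mask.sum()   # steps beyond true_len are ignored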
6 Conclusion
We see a few areas of improvement for future work. First, we would like to reduce the burden on the decoder softmax by limiting the vocabulary to only the tokens used in that particular query's question and passage; predicting the right token out of 20,000 options is challenging, especially when the model is required to get it right 20 to 30 times in a row. Second, we would like to train a language model to provide a warm start for our decoder, so the decoder does not have to learn how to generate sensible sequences of words in English while simultaneously responding to the question and context passages.
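As a sketch of the first idea, restricting the decoder softmax amounts to masking out logits for tokens absent from the current question and passage. The token ids below are illustrative assumptions.

# Minimal sketch of restricting the decoder softmax to in-query tokens.
import torch

V = 20000
logits = torch.randn(V)                          # full-vocabulary decoder logits
question_ids = torch.tensor([12, 845, 77])       # token ids in the question (assumed)
passage_ids = torch.tensor([77, 3001, 12, 59])   # token ids in the passage (assumed)
allowed = torch.unique(torch.cat([question_ids, passage_ids]))

mask = torch.full((V,), float("-inf"))
mask[allowed] = 0.0
restricted = torch.softmax(logits + mask, dim=0)  # mass only on allowed tokens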
References
[1] Tri Nguyen, Mir Rosenberg, Xia Song, Jianfeng Gao, Saurabh Tiwary, Rangan Majumder, and Li Deng. 2016. MS MARCO: A
Human Generated MAchine Reading COmprehension Dataset. In NIPS 2016.
[2] Karl Moritz Hermann, Tomáš Kočiský, Edward Grefenstette, Lasse Espeholt, Will Kay, Mustafa Suleyman, and Phil Blunsom. 2015. Teaching Machines to Read and Comprehend. In arXiv:1506.03340 [cs.CL].
[3] Rudolf Kadlec, Martin Schmid, Ondrej Bajgar, and Jan Kleindienst. 2016. Text Understanding With The Attention Sum Reader
Network. In ACL 2016.
[4] Felix Hill, Antoine Bordes, Sumit Chopra, and Jason Weston. 2015. The Goldilocks Principle: Reading Children’s Books with
Explicit Memory Representations. In arXiv:1511.02301 [cs.CL].
[5] Wenpeng Yin, Sebastian Ebert, and Hinrich Schütze. 2016. Attention-Based Convolutional Neural Network For Machine
Comprehension. In arXiv:1602.04341 [cs.CL].
[6] Shuohang Wang and Jing Jiang. 2016. Machine comprehension using Match-LSTM and answer pointer. Under review as a
conference paper at ICLR 2017.
[7] Volodymyr Mnih, Nicolas Heess, Alex Graves, and Koray Kavukcuoglu. 2014. Recurrent models of visual attention. In NIPS.
[8] Dzmitry Bahdanau, Kyunghyun Cho, and Yoshua Bengio. 2014. Neural Machine Translation by Jointly Learning to Align and Translate. In arXiv:1409.0473 [cs.CL].
[9] Minh-Thang Luong, Hieu Pham, and Christopher D. Manning. 2015. Effective Approaches to Attention-based Neural Machine
Translation. In arXiv:1508.04025 [cs.CL].
[10] Minjoon Seo, Aniruddha Kembhavi, Ali Farhadi, and Hannaneh Hajishirzi. 2017. Bidirectional Attention Flow for Machine
Comprehension. In Proceedings of ICLR 2017.
[11] Oriol Vinyals, Meire Fortunato, and Navdeep Jaitly. 2015. Pointer Networks. arXiv:1506.03134 [stat.ML].
[12] Sepp Hochreiter and Jürgen Schmidhuber. 1997. Long Short-Term Memory. Neural Computation 9.8 (1997): 1735-1780.
[13] Pranav Rajpurkar, Jian Zhang, Konstantin Lopyrev, and Percy Liang. 2016. SQuAD: 100,000+ Questions for Machine
Comprehension of Text. In Proceedings of EMNLP 2016.
[14] Yelong Shen, Po-Sen Huang, Jianfeng Gao, and Weizhu Chen. 2016. ReasoNet: Learning to Stop Reading in Machine Comprehension. In arXiv:1609.05284 [cs.LG].