Our project is about guessing the correct missing word in a given sentence. To guess the missing word we use two main methods: statistical language modeling and neural language modeling. A statistical language model depends on the frequency of relations between words; here we use a Markov chain. A neural language model uses artificial neural networks trained with deep learning; here we use BERT, the state-of-the-art language model released by Google.
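The statistical method can be sketched with a toy first-order Markov chain; the corpus, function names, and scoring scheme below are ours for illustration, not the project's actual implementation:

```python
from collections import Counter, defaultdict

def train_bigrams(corpus):
    """Count word-pair frequencies: a first-order Markov chain."""
    counts = defaultdict(Counter)
    for sentence in corpus:
        words = sentence.lower().split()
        for prev, nxt in zip(words, words[1:]):
            counts[prev][nxt] += 1
    return counts

def guess_missing(counts, left, right, vocab):
    """Score each candidate w by P(w | left) * P(right | w)."""
    best, best_score = None, 0.0
    for w in sorted(vocab):  # sorted for deterministic tie-breaking
        p_forward = counts[left][w] / max(1, sum(counts[left].values()))
        p_backward = counts[w][right] / max(1, sum(counts[w].values()))
        score = p_forward * p_backward
        if score > best_score:
            best, best_score = w, score
    return best

corpus = [
    "the cat sat on the mat",
    "the cat sat on the rug",
    "a dog ran on the mat",
]
counts = train_bigrams(corpus)
vocab = set(counts) | {w for c in counts.values() for w in c}
print(guess_missing(counts, "the", "sat", vocab))  # fills "the ___ sat" -> cat
```

A neural fill-in would instead query a masked language model such as BERT, which conditions on both sides of the blank at once rather than only on adjacent words.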
Sentiment Analysis in Myanmar Language Using Convolutional LSTM Neural Network
In recent years, social media use has been increasing among people in Myanmar, and writing reviews on social media pages about products, movies, and trips has also become popular. Moreover, most people look for review pages about a product they want to buy before deciding whether to buy it. Extracting useful reviews about products of interest is important but time-consuming. Sentiment analysis is one of the key processes for extracting useful product reviews. In this paper, a Convolutional LSTM neural network architecture is proposed for sentiment classification of cosmetic reviews written in the Myanmar language. The paper also builds a dataset of cosmetic reviews for deep learning and a sentiment lexicon in the Myanmar language.
Chunking means splitting sentences into tokens and then grouping the tokens in a meaningful way. Among high-performance chunking systems, transformer models have set the state-of-the-art benchmarks. Chunking as a task requires a large-scale, high-quality annotated corpus where each token carries a particular tag, similar to Named Entity Recognition tasks. These tags are later used in conjunction with pointer frameworks to find the final chunk. Solving this for a specific domain becomes highly costly in time and resources if a large, high-quality training set must be annotated manually. When the domain is specific and diverse, cold-starting becomes even harder because a large number of manually annotated queries is needed to cover all aspects. To overcome this problem, we applied a grammar-based text generation mechanism: instead of annotating individual sentences, we annotate grammar templates. We defined various templates corresponding to different grammar rules. To create a sentence, we used these templates along with the rules, where symbol or terminal values were chosen from the domain data catalog. This let us create a large number of annotated queries, which were used to train an ensemble transformer-based deep neural network model [24]. We found that grammar-based annotation solved domain-based chunking of input query sentences without any manual annotation, achieving a classification F1 score of 96.97% on tokens from out-of-template queries.
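The template-expansion idea can be sketched as follows; the templates, slot tags, and catalog below are hypothetical stand-ins for the paper's domain data catalog:

```python
import itertools
import re

# Hypothetical grammar templates: slots in braces are filled from a
# domain catalog, and every filled token inherits the slot name as its tag.
templates = [
    "show {PRODUCT} under {PRICE}",
    "find {COLOR} {PRODUCT}",
]
catalog = {
    "PRODUCT": ["shoes", "laptops"],
    "COLOR": ["red", "black"],
    "PRICE": ["$50", "$100"],
}

def expand(template):
    """Yield (tokens, tags) for every way of filling the template's slots."""
    parts = re.split(r"(\{\w+\})", template)
    slot_names = [p[1:-1] for p in parts if p.startswith("{")]
    for values in itertools.product(*(catalog[s] for s in slot_names)):
        tokens, tags, it = [], [], iter(values)
        for p in parts:
            if p.startswith("{"):
                for tok in next(it).split():
                    tokens.append(tok)
                    tags.append(p[1:-1])  # tag token with its slot name
            else:
                for tok in p.split():
                    tokens.append(tok)
                    tags.append("O")  # ordinary template word
        yield tokens, tags

data = [pair for t in templates for pair in expand(t)]
print(len(data), data[0])
```

Two templates and a handful of catalog values already yield eight fully tagged queries, which is the point: annotation effort grows with the number of templates, not the number of sentences.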
Sentence Validation by Statistical Language Modeling and Semantic Relations (IJCATR)
This paper deals with Sentence Validation, a sub-field of Natural Language Processing. It finds applications in many areas, as it deals with understanding and manipulating natural language (English in most cases). The effort is thus on understanding and extracting the important information delivered to the computer, making efficient human-computer interaction possible. Sentence Validation is approached in two ways: a statistical approach and a semantic approach. In both approaches, a database is trained with the help of sample sentences from the Brown corpus of NLTK. The statistical approach uses a trigram technique based on an N-gram Markov model with modified Kneser-Ney smoothing to handle zero probabilities. As another statistical test, tagging and chunking of sentences containing named entities is carried out using pre-defined grammar rules and semantic tree parsing, and the chunked sentences are fed into another database on which testing is carried out. Finally, semantic analysis is carried out by extracting entity-relation pairs, which are then tested. After the results of all three approaches are compiled, graphs are plotted and the variations are studied. Hence, a comparison of the three models is calculated and formulated. Graphs of the probabilities from the three approaches clearly demarcate them and throw light on the findings of the project.
An ongoing project on Natural Language Processing (using Python and the NLTK toolkit), which focuses on extracting the sentiment and polarity of a question and its title on www.stackoverflow.com. Based on these findings, it is verified whether the rules and guidelines imposed by the SO community on its users are strictly followed.
Word embedding, vector space model, language modelling, neural language model, Word2Vec, GloVe, fastText, ELMo, BERT, DistilBERT, RoBERTa, SBERT, Transformer, attention
THE ABILITY OF WORD EMBEDDINGS TO CAPTURE WORD SIMILARITIES
Distributed language representation has become the most widely used technique for language representation in various natural language processing tasks. Most of the natural language processing models that are based on deep learning techniques use already pre-trained distributed word representations, commonly called word embeddings. Determining the most qualitative word embeddings is of crucial importance for such models. However, selecting the appropriate word embeddings is a perplexing task since the projected embedding space is not intuitive to humans. In this paper, we explore different approaches for creating distributed word representations. We perform an intrinsic evaluation of several state-of-the-art word embedding methods. Their performance on capturing word similarities is analysed with existing benchmark datasets for word pairs similarities. The research in this paper conducts a correlation analysis between ground truth word similarities and similarities obtained by different word embedding methods.
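The intrinsic evaluation described here can be sketched with toy vectors; the embeddings, word pairs, and "human" scores below are invented for illustration:

```python
import math

def cosine(u, v):
    """Cosine similarity between two dense vectors."""
    dot = sum(a * b for a, b in zip(u, v))
    nu = math.sqrt(sum(a * a for a in u))
    nv = math.sqrt(sum(b * b for b in v))
    return dot / (nu * nv)

def spearman(xs, ys):
    """Spearman correlation as Pearson on ranks (assumes no ties)."""
    def ranks(vals):
        order = sorted(range(len(vals)), key=lambda i: vals[i])
        r = [0] * len(vals)
        for rank, i in enumerate(order):
            r[i] = rank
        return r
    rx, ry = ranks(xs), ranks(ys)
    n = len(xs)
    mx, my = sum(rx) / n, sum(ry) / n
    cov = sum((a - mx) * (b - my) for a, b in zip(rx, ry))
    sx = math.sqrt(sum((a - mx) ** 2 for a in rx))
    sy = math.sqrt(sum((b - my) ** 2 for b in ry))
    return cov / (sx * sy)

# Toy embeddings and a toy benchmark of human similarity judgments.
emb = {"cat": [1.0, 0.1], "dog": [0.9, 0.2], "car": [0.1, 1.0]}
pairs = [("cat", "dog"), ("cat", "car"), ("dog", "car")]
human = [9.0, 2.0, 2.5]
model = [cosine(emb[a], emb[b]) for a, b in pairs]
rho = spearman(model, human)
print(round(rho, 3))
```

Real evaluations follow the same recipe, just with pre-trained embeddings and benchmark pair lists such as WordSim-style datasets in place of these toys.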
Derric A. Alkis C
Abstract:
By using modern technology and machine learning on the comments left on a product, the customer gains a higher degree of confidence and the seller gains more information about the products and the customers' desires; the comments added later are viewed and evaluated, and the product is thereby rated as good or bad.
Intent Classifier with Facebook fastText
Facebook Developer Circle, Malang
22 February 2017
These are the slides for a Facebook Developer Circle meetup.
They are aimed at beginners.
The project re-implements the architecture of the paper Reasoning with Neural Tensor Networks for Knowledge Base Completion in the Torch framework, achieving similar accuracy with an elegant implementation in a modern language.
Below are some links for further details:
https://github.com/agarwal-shubham/Reasoning-Over-Knowledge-Base
http://darsh510.github.io/IREPROJ/
SYLLABLE-BASED NEURAL NAMED ENTITY RECOGNITION FOR MYANMAR LANGUAGE
Named Entity Recognition (NER) for the Myanmar language is essential to Myanmar natural language processing research. In this work, NER for Myanmar is treated as a sequence tagging problem, and the effectiveness of deep neural networks on Myanmar NER is investigated. Experiments apply deep neural network architectures to syllable-level Myanmar contexts. The very first manually annotated NER corpus for the Myanmar language is also constructed and proposed. In developing our in-house NER corpus, sentences from online news websites and sentences from the ALT-Parallel-Corpus are used. This ALT corpus is one part of the Asian Language Treebank (ALT) project under ASEAN IVO. This paper contributes the first evaluation of neural network models on the NER task for the Myanmar language. The experimental results show that these neural sequence models produce promising results compared to the baseline CRF model. Among the neural architectures, a bidirectional LSTM network with a CRF layer on top gives the highest F-score. This work also aims to discover the effectiveness of neural network approaches to Myanmar text processing and to promote further research on this understudied language.
A presentation from my thesis defense on text summarization. It discusses existing state-of-the-art models along with the effectiveness of AMR (Abstract Meaning Representation) for text summarization, and shows how AMRs can be used with seq2seq models. We also discuss other techniques such as BPE (Byte Pair Encoding) and their effectiveness for the task, and examine how data augmentation with POS tags and AMRs affects summarization with seq2seq learning.
EMPLOYING PIVOT LANGUAGE TECHNIQUE THROUGH STATISTICAL AND NEURAL MACHINE TRA... (IJNLC)
The quality of Neural Machine Translation (NMT) systems, like that of Statistical Machine Translation (SMT) systems, depends heavily on the size of the training data set, while for some language pairs high-quality parallel data are a scarce resource. To respond to this low-resource training data bottleneck, we employ the pivoting approach in both neural and statistical MT frameworks. In our experiments on Persian-Spanish, taken as an under-resourced translation task, we found that this method, in both frameworks, significantly improves translation quality compared to the standard direct translation approach.
Transformer models have taken over most Natural Language Inference tasks; in recent times they have beaten several benchmarks. Chunking means splitting sentences into tokens and then grouping them in a meaningful way. Chunking is a task that has gradually moved from POS tag-based statistical models to neural nets using language models such as LSTMs, bidirectional LSTMs, attention models, etc. Deep neural net models are deployed indirectly to classify tokens into the different tags defined for Named Entity Recognition tasks. These tags are later used in conjunction with pointer frameworks for the final chunking task. In our paper, we propose an ensemble model that combines a fine-tuned transformer model and a recurrent neural network model to predict tags and chunk substructures of a sentence. We analyzed the shortcomings of the transformer models in predicting different tags and then trained the BiLSTM+CNN accordingly to compensate for them.
The growth of interpreting calls for a more objective and automatic measurement. We hold the basic idea that 'translating means translating meaning', so we can assess interpretation quality by comparing the meaning of the interpreting output with the source input. That is, a translation unit of a 'chunk' called a Frame, which comes from frame semantics, and its components, called Frame Elements (FEs), which come from FrameNet, are proposed to explore their matching rate between target and source texts. A case study in this paper verifies the usability of semi-automatic graded semantic-scoring measurement for human simultaneous interpreting and shows how to use Frame and FE matches to score. Experimental results show that the semantic-scoring metrics have a significant correlation coefficient with human judgment.
GENERATING SUMMARIES USING SENTENCE COMPRESSION AND STATISTICAL MEASURES
In this paper, we propose a compression-based multi-document summarization technique that incorporates word bigram probability and a word co-occurrence measure. First, we implemented a graph-based technique to achieve sentence compression and information fusion. In the second step, we use hand-crafted, rule-based syntactic constraints to prune our compressed sentences. Finally, we use a probabilistic measure that exploits word co-occurrence within a sentence to obtain our summaries. The system can generate summaries for any user-defined compression rate.
Chunker Based Sentiment Analysis and Tense Classification for Nepali Text
The article presents Sentiment Analysis (SA) and tense classification using the skip-gram model for word-to-vector encoding of the Nepali language. The SA experiment for positive-negative classification is carried out in two ways. In the first experiment, the vector representation of each sentence is generated using the skip-gram model followed by Multi-Layer Perceptron (MLP) classification; an F1 score of 0.6486 is achieved for positive-negative classification with an overall accuracy of 68%. In the second experiment, verb chunks are extracted using a Nepali parser and a similar experiment is carried out on the verb chunks; an F1 score of 0.6779 is observed for positive-negative classification with an overall accuracy of 85%. Hence, chunker-based sentiment analysis proves better than sentence-based sentiment analysis. The paper also proposes using a skip-gram model to identify the tenses of Nepali sentences and verbs. In the third experiment, the vector representation of each verb chunk is generated using the skip-gram model followed by MLP classification, and the verb chunks give a very low overall accuracy of 53%. The fourth experiment, conducted for tense classification using whole sentences, results in an improved overall accuracy of 89%. Past tenses are identified and classified more accurately than other tenses. Hence, sentence-based tense classification proves better than verb-chunker-based tense classification.
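As a rough sketch of how skip-gram training data is formed before any classifier sees it (an English toy sentence stands in for Nepali text; the function name is ours):

```python
def skipgram_pairs(tokens, window=2):
    """Generate (target, context) pairs as in the skip-gram model."""
    pairs = []
    for i, target in enumerate(tokens):
        lo, hi = max(0, i - window), min(len(tokens), i + window + 1)
        for j in range(lo, hi):
            if j != i:  # a word is not its own context
                pairs.append((target, tokens[j]))
    return pairs

print(skipgram_pairs(["I", "like", "this", "movie"], window=1))
```

The embedding model is then trained to predict the context word from the target word, and the learned target vectors are what get fed to the MLP classifier.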
A Neural Probabilistic Language Model.pptx
Bengio, Yoshua, et al. "A neural probabilistic language model." Journal of machine learning research 3.Feb (2003): 1137-1155.
A goal of statistical language modeling is to learn the joint probability function of sequences of
words in a language. This is intrinsically difficult because of the curse of dimensionality: a word
sequence on which the model will be tested is likely to be different from all the word sequences seen
during training. Traditional but very successful approaches based on n-grams obtain generalization
by concatenating very short overlapping sequences seen in the training set. We propose to fight the
curse of dimensionality by learning a distributed representation for words which allows each
training sentence to inform the model about an exponential number of semantically neighboring
sentences. The model learns simultaneously (1) a distributed representation for each word along
with (2) the probability function for word sequences, expressed in terms of these representations.
Generalization is obtained because a sequence of words that has never been seen before gets high
probability if it is made of words that are similar (in the sense of having a nearby representation) to
words forming an already seen sentence. Training such large models (with millions of parameters)
within a reasonable time is itself a significant challenge. We report on experiments using neural
networks for the probability function, showing on two text corpora that the proposed approach
significantly improves on state-of-the-art n-gram models, and that the proposed approach allows to
take advantage of longer contexts.
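A minimal forward pass of this model might look like the following NumPy sketch; the dimensions are arbitrary, and the paper's optional direct word-to-output connections are omitted:

```python
import numpy as np

rng = np.random.default_rng(0)
V, m, h, n = 10, 4, 8, 3  # vocab size, embedding dim, hidden units, context length

C = rng.normal(scale=0.1, size=(V, m))      # shared word feature vectors
H = rng.normal(scale=0.1, size=(h, n * m))  # input-to-hidden weights
U = rng.normal(scale=0.1, size=(V, h))      # hidden-to-output weights
b, d = np.zeros(V), np.zeros(h)             # output and hidden biases

def nplm_probs(context):
    """P(w_t | w_{t-n}..w_{t-1}): tanh hidden layer over concatenated embeddings, then softmax."""
    x = C[context].reshape(-1)   # look up and concatenate the n context embeddings
    a = np.tanh(d + H @ x)       # hidden activations
    y = b + U @ a                # unnormalized scores for every word in the vocabulary
    e = np.exp(y - y.max())      # numerically stable softmax
    return e / e.sum()

p = nplm_probs([1, 5, 7])
print(p.shape, round(float(p.sum()), 6))  # (10,) 1.0
```

Training would backpropagate the log-likelihood through U, H, and crucially through the shared embedding table C, which is what lets one training sentence inform the model about semantically neighboring sentences.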
This paper presents machine translation based on machine learning, which learns a semantically correct corpus. A machine learning process based on a Quantum Neural Network (QNN) is used to recognize the corpus pattern in a realistic way. The system translates on the basis of knowledge gained during learning by entering pairs of sentences from the source and target languages; with the help of this training data, it translates the given text. The paper presents a study of a machine translation system which converts a source language to a target language using a quantum neural network. Rather than comparing words semantically, the QNN compares numerical tags, which is faster and more accurate. Here, the tagger tags the parts of sentences discretely, which helps to map bilingual sentences.
THE ABILITY OF WORD EMBEDDINGS TO CAPTURE WORD SIMILARITIESkevig
Distributed language representation has become the most widely used technique for language representation in various natural language processing tasks. Most of the natural language processing models that are based on deep learning techniques use already pre-trained distributed word representations, commonly called word embeddings. Determining the most qualitative word embeddings is of crucial importance for such models. However, selecting the appropriate word embeddings is a perplexing task since the projected embedding space is not intuitive to humans.In this paper, we explore different approaches for creating distributed word representations. We perform an intrinsic evaluation of several state-of-the-art word embedding methods. Their performance on capturing word similarities is analysed with existing benchmark datasets for word pairs similarities. The research in this paper conducts a correlation analysis between ground truth word similarities and similarities obtained by different word embedding methods.
Derric A. Alkis C
Abstract:
Delivering the customer to a high degree of confidence and the seller for more information about the products and the desire of customers through the use of modern technology and Machine Learning through comments left on the product to see and evaluate the comments added later and thus evaluate the product, whether good or bad.
Intent Classifier with Facebook fastText
Facebook Developer Circle, Malang
22 February 2017
This is slide for Facebook Developer Circle meetup.
This is for beginner.
The project re-implements the architecture of the paper Reasoning with Neural Tensor Networks for Knowledge Base Completion in Torch framework, achieving similar accuracy results with an elegant implementation in a modern language.
Below are some links for further details:
https://github.com/agarwal-shubham/Reasoning-Over-Knowledge-Base
http://darsh510.github.io/IREPROJ/
SYLLABLE-BASED NEURAL NAMED ENTITY RECOGNITION FOR MYANMAR LANGUAGEijnlc
Named Entity Recognition (NER) for Myanmar Language is essential to Myanmar natural language processing research work. In this work, NER for Myanmar language is treated as a sequence tagging problem and the effectiveness of deep neural networks on NER for Myanmar language has been investigated. Experiments are performed by applying deep neural network architectures on syllable level Myanmar contexts. Very first manually annotated NER corpus for Myanmar language is also constructed and proposed. In developing our in-house NER corpus, sentences from online news website and also sentences supported from ALT-Parallel-Corpus are also used. This ALT corpus is one part of the Asian Language Treebank (ALT) project under ASEAN IVO. This paper contributes the first evaluation of neural network models on NER task for Myanmar language. The experimental results show that those neural sequence models can produce promising results compared to the baseline CRF model. Among those neural architectures, bidirectional LSTM network added CRF layer above gives the highest F-score value. This work also aims to discover the effectiveness of neural network approaches to Myanmar textual processing as well as to promote further researches on this understudied language.
presentation from my thesis defense on text summarization, discusses already existing state of art models along with efficiency of AMR or Abstract Meaning Representation for text summarization, we see how we can use AMRs with seq2seq models. We also discuss other techniques such as BPE or Byte Pair Encoding and its effectiveness for the task. Also we see how data augmentation with POS tags and AMRs effect the summarization with s2s learning.
EMPLOYING PIVOT LANGUAGE TECHNIQUE THROUGH STATISTICAL AND NEURAL MACHINE TRA...ijnlc
The quality of Neural Machine Translation (NMT) systems like Statistical Machine Translation (SMT) systems, heavily depends on the size of training data set, while for some pairs of languages, high-quality parallel data are poor resources. In order to respond to this low-resourced training data bottleneck reality, we employ the pivoting approach in both neural MT and statistical MT frameworks. During our experiments on the Persian-Spanish, taken as an under-resourced translation task, we discovered that, the aforementioned method, in both frameworks, significantly improves the translation quality in comparison to the standard direct translation approach.
Transformer Models have taken over most of the Natural language Inference tasks. In recent
times they have proved to beat several benchmarks. Chunking means splitting the sentences into
tokens and then grouping them in a meaningful way. Chunking is a task that has gradually
moved from POS tag-based statistical models to neural nets using Language models such as
LSTM, Bidirectional LSTMs, attention models, etc. Deep neural net Models are deployed
indirectly for classifying tokens as different tags defined under Named Recognition Tasks. Later
these tags are used in conjunction with pointer frameworks for the final chunking task. In our
paper, we propose an Ensemble Model using a fine-tuned Transformer Model and a recurrent
neural network model together to predict tags and chunk substructures of a sentence. We
analyzed the shortcomings of the transformer models in predicting different tags and then
trained the BILSTM+CNN accordingly to compensate for the same.
Increasing interpreting needs a more objective and automatic measurement. We hold a basic idea that 'translating means translating meaning' in that we can assessment interpretation quality by comparing the
meaning of the interpreting output with the source input. That is, a translation unit of a 'chunk' named
Frame which comes from frame semantics and its components named Frame Elements (FEs) which comes
from Frame Net are proposed to explore their matching rate between target and source texts. A case study in this paper verifies the usability of semi-automatic graded semantic-scoring measurement for human
simultaneous interpreting and shows how to use frame and FE matches to score. Experiments results show that the semantic-scoring metrics have a significantly correlation coefficient with human judgment.
GENERATING SUMMARIES USING SENTENCE COMPRESSION AND STATISTICAL MEASURESijnlc
In this paper, we propose a compression based multi-document summarization technique by incorporating
word bigram probability and word co-occurrence measure. First we implemented a graph based technique
to achieve sentence compression and information fusion. In the second step, we use hand-crafted rule
based syntactic constraint to prune our compressed sentences. Finally we use probabilistic measure while
exploiting word co-occurrence within a sentence to obtain our summaries. The system can generate summaries for any user-defined compression rate.
Chunker Based Sentiment Analysis and Tense Classification for Nepali Textkevig
The article represents the Sentiment Analysis (SA) and Tense Classification using Skip gram model for the word to vector encoding on Nepali language. The experiment on SA for positive-negative classification is carried out in two ways. In the first experiment the vector representation of each sentence is generated by using Skip-gram model followed by the Multi-Layer Perceptron (MLP) classification and it is observed that the F1 score of 0.6486 is achieved for positive-negative classification with overall accuracy of 68%. Whereas in the second experiment the verb chunks are extracted using Nepali parser and carried out the similar experiment on the verb chunks. F1 scores of 0.6779 is observed for positive -negative classification with overall accuracy of 85%. Hence, Chunker based sentiment analysis is proven to be better than sentiment analysis using sentences. This paper also proposes using a skip-gram model to identify the tenses of Nepali sentences and verbs. In the third experiment, the vector representation of each sentence is generated by using Skip-gram model followed by the Multi-Layer Perceptron (MLP)classification and it is observed that verb chunks had very low overall accuracy of 53%. In the fourth experiment, conducted for Tense Classification using Sentences resulted in improved efficiency with overall accuracy of 89%. Past tenses were identified and classified more accurately than other tenses. Hence, sentence based tense classification is proven to be better than verb Chunker based sentiment analysis.
THE ABILITY OF WORD EMBEDDINGS TO CAPTURE WORD SIMILARITIES
Distributed language representation has become the most widely used technique for representing language in various natural language processing tasks. Most natural language processing models based on deep learning use pre-trained distributed word representations, commonly called word embeddings. Determining the most qualitative word embeddings is of crucial importance for such models. However, selecting the appropriate word embeddings is a perplexing task, since the projected embedding space is not intuitive to humans. In this paper, we explore different approaches for creating distributed word representations. We perform an intrinsic evaluation of several state-of-the-art word embedding methods, analysing their performance in capturing word similarities against existing benchmark datasets of word-pair similarities. The paper conducts a correlation analysis between ground-truth word similarities and similarities obtained by different word embedding methods.
A Neural Probabilistic Language Model.pptx
Bengio, Yoshua, et al. "A neural probabilistic language model." Journal of machine learning research 3.Feb (2003): 1137-1155.
A goal of statistical language modeling is to learn the joint probability function of sequences of
words in a language. This is intrinsically difficult because of the curse of dimensionality: a word
sequence on which the model will be tested is likely to be different from all the word sequences seen
during training. Traditional but very successful approaches based on n-grams obtain generalization
by concatenating very short overlapping sequences seen in the training set. We propose to fight the
curse of dimensionality by learning a distributed representation for words which allows each
training sentence to inform the model about an exponential number of semantically neighboring
sentences. The model learns simultaneously (1) a distributed representation for each word along
with (2) the probability function for word sequences, expressed in terms of these representations.
Generalization is obtained because a sequence of words that has never been seen before gets high
probability if it is made of words that are similar (in the sense of having a nearby representation) to
words forming an already seen sentence. Training such large models (with millions of parameters)
within a reasonable time is itself a significant challenge. We report on experiments using neural
networks for the probability function, showing on two text corpora that the proposed approach
significantly improves on state-of-the-art n-gram models, and that the proposed approach allows to
take advantage of longer contexts.
This paper presents machine translation based on machine learning, which learns from a semantically
correct corpus. A machine learning process based on a Quantum Neural Network (QNN) is used to
recognize corpus patterns realistically. The system is trained by entering pairs of sentences in the source
and target languages, and it translates the given text on the basis of the knowledge gained during this
training. The paper presents a study of a machine translation system which converts a source language to
a target language using a quantum neural network. Rather than comparing words semantically, the QNN
compares numerical tags, which is faster and more accurate. A tagger tags the parts of sentences
discretely, which helps to map bilingual sentences.
Word2vec on the Italian language: first experiments (Vincenzo Lomonaco)
Word2vec model and application by Mikolov et al. have attracted a great amount of attention in recent years. The vector representations of words learned by word2vec models have been proven to be able to carry semantic meanings and are useful in various NLP tasks. In this work I try to reproduce the previously obtained results for the English language and to explore the possibility of doing the same for the Italian language.
A Vietnamese Language Model Based on Recurrent Neural Network (Viet-Trung Tran)
Language modeling plays a critical role in many natural language processing (NLP) tasks such as text
prediction, machine translation and speech recognition. Traditional statistical language models (e.g.
n-gram models) can only offer words that have been seen before and cannot capture long word contexts.
Neural language models provide a promising way to overcome this shortcoming of statistical language
models. This paper investigates Recurrent Neural Network (RNN) language models for Vietnamese, at the
character and syllable levels. Experiments were conducted on a large dataset of 24M syllables,
constructed from 1,500 movie subtitles. The experimental results show that our RNN-based language
models yield reasonable performance on the movie subtitle dataset. Concretely, our models outperform
n-gram language models in terms of perplexity score.
TEXT ADVERTISEMENTS ANALYSIS USING CONVOLUTIONAL NEURAL NETWORKS
In this paper, we describe a Convolutional Neural Network (CNN) model developed for the
classification of advertisements. The method has been tested on texts in two languages (Arabic and
Slovak). The advertisements are short texts collected from classified-advertisement websites. We
developed a modified CNN model, implemented it, and explored further modifications, studying their
influence on the performance of the proposed network. The result is a functional model of the network,
its implementation in Java and Python, and an analysis of the model's results using different parameters
for the network and input data. The experimental results show that the developed CNN model is useful in
the domains of Arabic and Slovak short texts, mainly for the classification of advertisements.
Imran Sarwar Bajwa (2010), "Context Based Meaning Extraction by Means of Markov Logic", International Journal of Computer Theory and Engineering (IJCTE) 2(1), pp. 35-38, February 2010.
TEXTS CLASSIFICATION WITH THE USAGE OF NEURAL NETWORK BASED ON THE WORD2VEC’S...
Assigning a submitted text to one of several predetermined categories is required when dealing with
application-oriented texts. There are many different approaches to solving this problem, including
neural network algorithms. This article explores using neural networks to sort news articles by
category. Two word-vectorization algorithms are used: Bag of Words (BOW) and the word2vec
distributional semantic model. In this work the BOW model was applied to an FNN, whereas the word2vec
model was applied to a CNN. We measured classification accuracy when applying these methods to ad-text
datasets. The experimental results show that the two models achieve quite comparable accuracy. However,
the word2vec encoding used with the CNN produced results more relevant to the texts' semantics.
Moreover, the trained CNN based on the word2vec architecture produced a compact feature map in its last
convolutional layer, which can then be used for future text representation, i.e., using the CNN as a
text encoder and for transfer learning.
State-of-the-Art Text Classification using Deep Contextual Word Representations (Ausaf Ahmed)
Introduces a state-of-the-art text classifier that addresses language semantics and polysemy in Natural Language Processing tasks. Contextual representations of words are used to achieve a ~5% increase in metrics, outperforming existing models.
Beyond Word2Vec: Embedding Words and Phrases in Same Vector Space (Vijay Prakash Dwivedi)
This is the poster for the paper entitled the same presented at the 14th International Conference on Natural Language Processing (ICON), 2017 in Kolkata
Turkish language modeling using BERT
TURKISH LANGUAGE MODELING
Chaza Alkis, Abdurrahim Derric
Department of computer engineering
Yildiz Technical University, 34220 Istanbul, Türkiye
shaza.alqays@hotmail.com, abdelrahimdarrige@gmail.com
Abstract—Our project is about guessing the correct missing
word in a given sentence. To find or guess the missing word
we have two main methods: one of them is statistical language
modeling, while the other is neural language models.
Statistical language modeling depends on the frequency of the
relations between words, and here we use a Markov chain.
Neural language models use artificial neural networks and
deep learning; here we use BERT, the state of the art in
language modeling, provided by Google.
Keywords—Statistical Language Modelling, Neural Language
Models, Markov Chain, Artificial Neural Networks, Deep Learning, BERT.
I. INTRODUCTION
Our project is a new technique to guess the appropriate
word in a given sentence. To get a good result, we studied
several models and tested them on the Turkish language,
including statistical language modeling and neural
language models.
II. LANGUAGE MODELING
Language modeling is central to many important natural
language processing tasks.
III. STATISTICAL LANGUAGE MODELING
A statistical language model (SLM) is a probability
distribution over sequences of words.
The language model learns the probability of words
occurring based on examples of text. Simpler models may
condition on a short window of preceding words, while larger
models may work at the level of sentences or paragraphs.
Most commonly, language models work at the word level.
The language model can be developed and used
independently, for example to create new sequences of text
that appear to come from the set of documents.
Language modeling is an essential problem for a wide
range of natural language processing tasks. In a more
practical setting, language models are used at the front or
back end of a more sophisticated model for a task that
requires understanding the language.
Developing better language models often results in models
that perform better in the intended natural language
processing task. This is the motivation for developing better
and more accurate language models [1].
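As a toy illustration of using a language model independently to create new sequences of text that resemble a set of documents, a first-order word model can be estimated from a small corpus and then sampled. The corpus and function names here are illustrative, not taken from the paper:

```python
import random
from collections import defaultdict

# Toy corpus standing in for "the set of documents" (illustrative data only).
corpus = "the cat sat . the dog sat . the cat ran .".split()

# First-order model: for each word, the list of observed successors.
successors = defaultdict(list)
for cur, nxt in zip(corpus, corpus[1:]):
    successors[cur].append(nxt)

def generate(start, length, seed=0):
    """Sample a new word sequence whose local statistics match the corpus."""
    rng = random.Random(seed)
    out = [start]
    while len(out) < length:
        out.append(rng.choice(successors[out[-1]]))
    return " ".join(out)

print(generate("the", 4))
```

Every generated transition has been seen in the corpus, so the output "appears to come from" the documents even when the exact sentence never occurred.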
IV. NEURAL LANGUAGE MODELS
Recently, the use of neural networks in the development
of linguistic models has become so popular that it may now
be the preferred approach.
The use of neural networks in language modeling is often
called Neural Language Modeling, or NLM for short.
Neural network approaches achieve better results than classic
methods both as standalone language models and when the models
are incorporated into larger systems for challenging tasks
such as speech recognition and machine translation.
The main reason behind the improvements in performance
may be the ability of the method to generalize.
Specifically, a word embedding is adopted that uses a
real-valued vector to represent each word in a projected
vector space. This representation of words, learned from the
way the words are used, allows words with a similar meaning
to have a similar representation.
This generalization is something that is not easily achieved
with the word representations used in classical statistical
language models.
Furthermore, the distributed representation approach allows
the embedding size to scale gracefully with vocabulary
size. Classical methods, with one discrete representation
per word, fight the curse of dimensionality: larger and
larger vocabularies lead to longer and sparser
representations.
The neural network approach to language modeling can be
described using the three following model properties:
• Associate each word in the vocabulary with a
distributed word feature vector.
• Express the joint probability function of word
sequences in terms of the feature vectors of these
words in the sequence.
• Learn simultaneously the word feature vector and
the parameters of the probability function.
This represents a relatively simple model where both the
representation and the probability model are learned together
directly from raw text data.
Recently, neural-network-based approaches have begun to
consistently outperform classical statistical approaches.
V. MODELS STUDY
A. Markov chain
A Markov chain is a stochastic model describing a
sequence of possible events in which the probability of each
event depends only on the state attained in the previous
event.
More formally, a discrete-time Markov chain is a sequence of
random variables X1, X2, X3, ... that satisfies the Markov
property: the probability of moving from the current state
to the next state depends only on the current state.
In terms of probability distributions: given that the
system is in state xn at time n, the conditional distribution
of the state at the next instant, n + 1, is conditionally
independent of the states of the system at times 1,
2, ..., n - 1.
This can be written as follows:
Pr(Xn+1 = x | X1 = x1, X2 = x2, ..., Xn = xn) =
Pr(Xn+1 = x | Xn = xn)
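Applied to the project's missing-word task, the Markov property means a blank can be guessed from the immediately preceding word alone. A minimal sketch of a first-order word chain follows (toy English corpus for readability; the paper's experiments use Turkish text, and `guess_missing` is our own illustrative name):

```python
from collections import Counter, defaultdict

# Toy corpus; a real system would estimate these counts from a large dataset.
corpus = "i drink hot tea . i drink cold water . you drink hot tea .".split()

# First-order transition counts: how often `nxt` follows `cur`.
next_counts = defaultdict(Counter)
for cur, nxt in zip(corpus, corpus[1:]):
    next_counts[cur][nxt] += 1

def guess_missing(prev, top_k=1):
    """Guess the word after `prev` as its most frequent observed successor
    (the Markov property: only the current word matters)."""
    return [w for w, _ in next_counts[prev].most_common(top_k)]

print(guess_missing("drink"))  # 'hot' follows 'drink' twice, 'cold' once -> ['hot']
```

Returning `top_k` candidates is what makes the top-1 / top-5 comparison in the results section possible.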
1) Markov chain graph representation: Markov chains are
often represented using directed state diagrams. The nodes
represent the possible states of the random variable, while
the edges represent the probability that the system moves
from one state to another at the next time step.
For example, in weather forecasting there are three possible
states for the random variable Weather = {Sunny, Rainy,
Snowy}, and a possible Markov chain can be represented as
shown in Figure 1.
Figure 1 Markov chain graph representation
One of the main points to understand about Markov chains
is that they model the outcomes of a sequence of random
variables over time. The nodes in the graph above represent
the different weather conditions, and the edges between them
show the probability that the next random variable takes
each possible state, given the state of the current one.
Self-loops show the probability that the model remains
in its current state.
In the Markov chain above, the observed state of the current
random variable is Sunny. The probability that the random
variable is Sunny again at the next instant is 0.8; it may
instead be Rainy with probability 0.19 or Snowy with
probability 0.01.
2)Parameterization of Markov chains: Another way to
represent state transitions is to use a transition matrix.
The transition matrix, as the name implies, uses a tabular
representation of the transition probabilities.
The following table shows the transition matrix for the
Markov chain shown in Figure 1. Each value is the probability
of the system going from the state in the row to the state
in the column; see Table 1.
Table 1 Transition matrix
state  sunny  rainy  snowy
sunny  0.8    0.19   0.01
rainy  0.2    0.7    0.1
snowy  0.1    0.2    0.7
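With the transition matrix of Table 1 (rows normalised to sum to 1), multi-step forecasts reduce to repeated matrix-vector products. A plain-Python sketch:

```python
states = ["sunny", "rainy", "snowy"]
# Transition matrix from Table 1: P[i][j] = Pr(next = states[j] | current = states[i]).
P = [
    [0.8, 0.19, 0.01],  # sunny -> ...
    [0.2, 0.7,  0.1],   # rainy -> ...
    [0.1, 0.2,  0.7],   # snowy -> ...
]

def step(dist, P):
    """One step of the chain: new_dist[j] = sum_i dist[i] * P[i][j]."""
    return [sum(dist[i] * P[i][j] for i in range(len(P))) for j in range(len(P))]

# Distribution over the weather two days after an observed sunny day.
dist = [1.0, 0.0, 0.0]
for _ in range(2):
    dist = step(dist, P)
print([round(p, 3) for p in dist])  # -> [0.679, 0.287, 0.034]
```

Each row of P is itself a probability distribution, which is why the forecast distribution also sums to 1 after any number of steps.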
B. BERT
Bidirectional Encoder Representations from Transformers
(BERT) is a technique for NLP (Natural Language
Processing) pre-training developed by Google.
Modern NLP models based on deep learning benefit from much
larger amounts of data, improving when trained on millions,
or billions, of annotated examples. To help close this data
gap, researchers have developed a variety of techniques for
training general-purpose language models on the massive
amount of unannotated text on the web (known as
pre-training); BERT is one such technique.
1) Why BERT is different: BERT is the first unsupervised,
deeply bidirectional language representation, pre-trained
using only a plain text corpus.
For example, in the sentence "I accessed the bank
account", a one-directional contextual model would represent
"bank" based on "I accessed the" but not on "account".
BERT, however, represents "bank" using both its previous
and next context - "I accessed the ... account" - starting
from the very bottom of the deep neural network, making it
deeply bidirectional [2].
2) Masked language modeling: BERT has been pre-trained
on masked language modeling and next sentence prediction
(next sentence prediction is explained in the next section).
Classic language modeling predicts the next word given a
sequence of words. In masked language modeling, instead of
predicting every next token, a percentage of the input
tokens is masked at random and only those masked tokens
are predicted.
The masked words are not always replaced with the mask
token [MASK], because then the masked tokens would never
be seen before fine-tuning. Therefore,
• 15% of the tokens are chosen at random.
• 80% of the time tokens are actually replaced with
the token [MASK].
• 10% of the time tokens are replaced with a random
token.
• 10% of the time tokens are left unchanged.
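The masking recipe above can be sketched in plain Python. This is a simplified illustration; `mask_tokens`, the seed, and the toy sentence are our own, not from the BERT codebase:

```python
import random

def mask_tokens(tokens, vocab, mask_prob=0.15, seed=42):
    """Apply BERT-style masking: choose ~15% of positions; of those,
    80% -> [MASK], 10% -> a random token, 10% -> left unchanged.
    Returns the corrupted tokens and the prediction targets."""
    rng = random.Random(seed)
    out = list(tokens)
    targets = {}  # position -> original token the model must predict
    for i, tok in enumerate(tokens):
        if rng.random() < mask_prob:
            targets[i] = tok
            r = rng.random()
            if r < 0.8:
                out[i] = "[MASK]"
            elif r < 0.9:
                out[i] = rng.choice(vocab)
            # else: leave the token unchanged (but still predict it)
    return out, targets

tokens = "the man went to the store to buy milk".split()
corrupted, targets = mask_tokens(tokens, vocab=tokens)
```

Note that positions left unchanged by the 10% rule are still prediction targets, which is what forces the model to build a representation for every input token.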
3) Next sentence prediction: The model receives a pair of
sentences and predicts whether the second sentence actually
follows the first in the original text; if so, the pair is
labelled IsNext, otherwise NotNext. For example:
Input = [CLS] the man went to [MASK] store [SEP]
he bought a gallon [MASK] milk [SEP]
Label = IsNext
Input = [CLS] the man [MASK] to the store [SEP]
penguin [MASK] are flightless birds [SEP]
Label = NotNext
This task can easily be created from any monolingual
corpus. It is useful because many downstream tasks, such as
question answering and natural language inference, require
understanding the relationship between two sentences.
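Construction of such sentence pairs can be sketched as follows. This is simplified: for brevity the random "NotNext" sentence is drawn from the same small list, so it may occasionally coincide with the true next sentence, a case real preprocessing excludes:

```python
import random

def make_nsp_pairs(sentences, seed=0):
    """Build (sentence_a, sentence_b, label) training pairs: about half the
    time B is the true next sentence (IsNext), otherwise a randomly chosen
    sentence (NotNext)."""
    rng = random.Random(seed)
    pairs = []
    for i in range(len(sentences) - 1):
        if rng.random() < 0.5:
            pairs.append((sentences[i], sentences[i + 1], "IsNext"))
        else:
            pairs.append((sentences[i], rng.choice(sentences), "NotNext"))
    return pairs

docs = ["the man went to the store", "he bought a gallon of milk",
        "penguins are flightless birds"]
pairs = make_nsp_pairs(docs)
```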
4) Input text representation before feeding to BERT: The
input representation used by BERT can encode a single text
sentence as well as a pair of sentences (for example,
[Question, Answer]) in a single sequence of tokens.
• The first token of every input sequence is the
special classification token – [CLS]. This token is
used in classification tasks as an aggregate of the
entire sequence representation. It is ignored in
non-classification tasks.
• For single text sentence tasks, this [CLS] token is
followed by the WordPiece tokens and the separator
token – [SEP],
[CLS] my cat is very good [SEP]
• For sentence pair tasks, the WordPiece tokens of the
two sentences are separated by another [SEP] token.
This input sequence also ends with the [SEP] token,
[CLS] my cat is cute [SEP] he likes play ##ing [SEP]
• A segment embedding indicating whether a token belongs
to sentence A or sentence B is added to each token. Segment
embeddings are similar to token embeddings, but with a
vocabulary of size 2.
• A positional embedding is also added to each token
to indicate its position in the sequence.
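The [CLS]/[SEP]/segment/position scheme above can be assembled as a sketch in plain Python (toy tokens; a real BERT implementation additionally looks up the three embeddings for each id and sums them):

```python
def build_bert_input(sent_a, sent_b=None):
    """Assemble the token, segment-id, and position-id sequences
    described above for one sentence or a sentence pair."""
    tokens = ["[CLS]"] + sent_a + ["[SEP]"]
    segment_ids = [0] * len(tokens)          # sentence A -> segment 0
    if sent_b is not None:
        tokens += sent_b + ["[SEP]"]
        segment_ids += [1] * (len(sent_b) + 1)  # sentence B -> segment 1
    position_ids = list(range(len(tokens)))
    return tokens, segment_ids, position_ids

toks, segs, pos = build_bert_input("my cat is cute".split(),
                                   "he likes play ##ing".split())
print(toks)
# -> ['[CLS]', 'my', 'cat', 'is', 'cute', '[SEP]', 'he', 'likes', 'play', '##ing', '[SEP]']
```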
BERT uses WordPiece tokenization. The vocabulary is
initialized with all the individual characters of the
language, and then the most frequent / most likely
combinations of existing tokens are iteratively added to
the vocabulary.
Any word that does not occur in the vocabulary is broken
down into sub-words greedily. For example, if play, ##ing,
and ##ed are present in the vocabulary but playing and
played are OOV words, then they will be broken down into
play + ##ing and play + ##ed respectively (## is used to
mark sub-words). The maximum sequence length of the input
is 512 tokens [3].
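The greedy sub-word breakdown can be sketched with a longest-match-first loop (toy vocabulary; real WordPiece vocabularies are learned from data):

```python
def wordpiece(word, vocab):
    """Greedily split `word` into the longest sub-words found in `vocab`;
    continuation pieces carry the '##' prefix. Returns ['[UNK]'] if some
    part of the word cannot be matched at all."""
    pieces, start = [], 0
    while start < len(word):
        end = len(word)
        while end > start:
            piece = word[start:end]
            if start > 0:
                piece = "##" + piece   # mark a continuation sub-word
            if piece in vocab:
                pieces.append(piece)
                break
            end -= 1
        else:
            return ["[UNK]"]           # no sub-word matched at this position
        start = end
    return pieces

vocab = {"play", "##ing", "##ed", "book"}
print(wordpiece("playing", vocab))  # -> ['play', '##ing']
print(wordpiece("played", vocab))   # -> ['play', '##ed']
```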
VI. RESULTS ANALYSIS
A. Markov chain model: dataset size effect comparison
Here we compare the effect of the size of the dataset. We
notice that with a larger dataset there is a slight
improvement. This is an expected result, because some words
may not be found in a small dataset, and the larger the
dataset, the more words we find. The best effect is observed
at order 1, see Figure 2.
Figure 2 20K - 40K - 100K datasets comparison
B. Smoothing algorithms comparison
Our smoothing searches for a result by backing off from
third order to second order and then to first order. We
noticed that it had a good effect on the results, see
Figure 3.
Figure 3 Smoothed - Unsmoothed algorithms comparison
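The described back-off from third order to second to first can be sketched as follows (toy corpus; the paper does not specify its exact smoothing weights, so this sketch simply uses the longest context that has actually been seen):

```python
from collections import Counter

corpus = "we drink hot tea . you drink hot tea . we drink cold water .".split()

# Order-1..3 n-gram counts over the corpus.
ngrams = {n: Counter(zip(*[corpus[i:] for i in range(n)])) for n in (1, 2, 3)}

def backoff_guess(context):
    """Back off from order 3 to 2 to 1: predict the most frequent word
    following the longest history observed in the training data."""
    for n in (3, 2, 1):
        history = tuple(context[-(n - 1):]) if n > 1 else ()
        candidates = {g[-1]: c for g, c in ngrams[n].items() if g[:-1] == history}
        if candidates:
            return max(candidates, key=candidates.get)
    return None

print(backoff_guess(["you", "drink"]))  # trigram ('you','drink',?) was seen -> 'hot'
print(backoff_guess(["tea", "drink"]))  # unseen trigram context, backs off to bigrams -> 'hot'
```

The second call shows the benefit: an unseen third-order context still yields a guess instead of failing outright.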
C. BERT model results comparison
Here we compare BERT results across the 20K, 40K and 100K
datasets; we notice that the largest dataset size has the
strongest effect, see Figure 4
Figure 4 BERT results comparison
D. BERT vs Google Multilingual
Here we compare our BERT model with Google's multilingual
model. We notice that the multilingual model gives much
lower results than our BERT model and is unsuccessful at
finding the missing word, because it covers more than 100
languages and cannot focus on one language. Our BERT model
is trained on the Turkish language alone, so its ability to
link Turkish words and the meanings between sentences is
stronger; this is the reason for the big difference in
results, see Figure 5
Figure 5 BERT vs Multilingual
E. Comparison of statistical language modeling and neural
language model
Here we examine the effect of the training dataset size on
each model, and the accuracy of each, by comparing top-1
and top-5 results.
Comparing Markov and BERT (Figure 6), we find that BERT
gives higher results than the Markov chain as the dataset
size grows; in this comparison we use three datasets and
observe their effect.
Figure 6 BERT vs Markov Chain
VII. CONCLUSION
From our study and previous studies, we notice that
statistical language modeling, although considered an old
technique compared to BERT's deep learning model, still
gives good results.
We notice that BERT, although a deep learning model, did
not succeed much, because the language contains hundreds of
thousands of words; these words may be nouns or verbs with
different suffixes, and they can appear at different
positions in the sentence, which yields millions of
possibilities.
So everything from statistical language modeling to neural
language models achieves approximately 30 to 40 percent
correct guesses.
Based on the graphs extracted from our study, we see that
the size of the dataset greatly affects guessing accuracy,
so in the future a larger dataset and new techniques that
improve the computer's understanding of the language can be
used. In return, increasing the size of the dataset leads to
an increase in computation: for example, when the dataset
size was 100K, the running time was approximately 56 hours.
Assuming a million-item dataset, the computation would be
expected to stretch to months on current processors.
REFERENCES
[1] J. Brownlee. (2017) Gentle introduction to statistical language modeling and neural language models. [Online]. Available: https://machinelearningmastery.com/statistical-language-modeling-and-neural-language-models/
[2] J. Devlin and M.-W. Chang. (2018) Open sourcing BERT: State-of-the-art pre-training for natural language processing. [Online]. Available: https://ai.googleblog.com/2018/11/open-sourcing-bert-state-of-art-pre.html
[3] Y. Seth. (2019) BERT explained. [Online]. Available: https://yashuseth.blog/2019/06/12/bert-explained-faqs-understand-bert-working/