Language modeling plays a critical role in many natural language processing (NLP) tasks, such as text prediction, machine translation, and speech recognition. Traditional statistical language models (e.g., n-gram models) can only predict words that have been seen before and cannot capture long-range word context. Neural language models provide a promising way to overcome these shortcomings of statistical language models. This paper investigates Recurrent Neural Network (RNN) language models for Vietnamese at the character and syllable levels. Experiments were conducted on a large dataset of 24M syllables, constructed from 1,500 movie subtitles. The experimental results show that our RNN-based language models yield reasonable performance on the movie subtitle dataset; concretely, our models outperform n-gram language models in terms of perplexity.
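To make the setup concrete, below is a minimal character-level RNN language model sketch in PyTorch, together with the perplexity metric used for evaluation. This is an illustration under generic assumptions, not the authors' implementation: the GRU cell, dimensions, and class names are all hypothetical.

```python
import torch
import torch.nn as nn

class CharRNNLM(nn.Module):
    """Minimal character-level RNN language model (GRU variant)."""
    def __init__(self, vocab_size, embed_dim=64, hidden_dim=256):
        super().__init__()
        self.embed = nn.Embedding(vocab_size, embed_dim)
        self.rnn = nn.GRU(embed_dim, hidden_dim, batch_first=True)
        self.out = nn.Linear(hidden_dim, vocab_size)

    def forward(self, x, h=None):
        e = self.embed(x)              # (batch, seq, embed_dim)
        o, h = self.rnn(e, h)          # (batch, seq, hidden_dim)
        return self.out(o), h          # logits over the next character

def perplexity(model, x, y):
    """Perplexity = exp(mean cross-entropy) over held-out next-token targets."""
    logits, _ = model(x)
    loss = nn.functional.cross_entropy(
        logits.reshape(-1, logits.size(-1)), y.reshape(-1))
    return torch.exp(loss).item()
```

A syllable-level variant would differ only in tokenization: indices would range over a syllable vocabulary instead of a character vocabulary.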
Effective Approach for Disambiguating Chinese Polyphonic Ambiguity - IDES Editor
One of the difficult tasks in Natural Language Processing (NLP) is resolving the sense ambiguity of characters or words in text, such as polyphones, homonyms, and homographs. This paper addresses the ambiguity of Chinese polyphonic characters and an approach for disambiguating them. Three methods, dictionary matching, language models, and a voting scheme, are used to disambiguate the prediction of polyphones. Compared with the well-known MS Word 2007 and language models (LMs), our approach is superior to both for this task, and the final precision rate reaches 92.75%. Based on the proposed approaches, we have constructed an e-learning system that integrates several related Chinese transliteration functions.
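The abstract does not spell out the voting scheme, so the following is only a generic illustration of combining the three predictors by majority vote; the predictor outputs and candidate pronunciations are hypothetical placeholders.

```python
from collections import Counter

def vote(predictions):
    """Majority vote over candidate pronunciations; ties broken by first seen."""
    return Counter(predictions).most_common(1)[0][0]

# Hypothetical outputs of the three methods (dictionary matching, LM, voting
# inputs) for one polyphonic character in context:
candidates = ["hang2", "xing2", "hang2"]
print(vote(candidates))  # -> "hang2"
```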
This document discusses NLP's "ImageNet moment" with the emergence of transfer-learning approaches like BERT, ELMo, and GPT. It explains that these models were pretrained on large datasets and can now be downloaded and fine-tuned for specific tasks, similar to how pretrained ImageNet models revolutionized computer vision. The document also provides an overview of BERT, including its bidirectional Transformer architecture, pretraining tasks, and performance on tasks like GLUE and SQuAD.
BERT - Part 1 Learning Notes of Senthil Kumar - Senthil Kumar M
In this part 1 presentation, I have attempted to provide a '30,000-foot view' of BERT (Bidirectional Encoder Representations from Transformers), a state-of-the-art language model in NLP, with high-level technical explanations. I have attempted to collate useful information about BERT from various sources.
BERT: Bidirectional Encoder Representations from Transformers.
BERT is a model pretrained by Google for state-of-the-art NLP tasks.
BERT can take into account the syntactic and semantic meaning of text.
This document provides an overview of deep learning techniques for natural language processing (NLP). It discusses some of the challenges in language understanding like ambiguity and productivity. It then covers traditional ML approaches to NLP problems and how deep learning improves on these approaches. Some key deep learning techniques discussed include word embeddings, recursive neural networks, and language models. Word embeddings allow words with similar meanings to have similar vector representations, improving tasks like sentiment analysis. Recursive neural networks can model hierarchical structures like sentences. Language models assign probabilities to word sequences.
Representation Learning of Vectors of Words and Phrases - Felipe Moraes
A talk about representation learning using word vectors such as Word2Vec and Paragraph Vector. It also introduces neural network language models (NNLMs) and presents applications of NNLMs such as sentiment analysis and information retrieval.
- The document discusses neural word embeddings, which represent words as dense real-valued vectors in a continuous vector space. This allows words with similar meanings to have similar vector representations.
- It describes how neural network language models like skip-gram and CBOW can be used to efficiently learn these word embeddings from unlabeled text data in an unsupervised manner. Techniques like hierarchical softmax and negative sampling help reduce computational complexity (see the sketch after this list).
- The learned word embeddings show meaningful syntactic and semantic relationships between words and allow performing analogy and similarity tasks without any supervision during training.
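As referenced in the list above, here is a minimal sketch of training skip-gram embeddings with negative sampling using gensim. The toy corpus is invented, and meaningful analogy results only emerge on much larger corpora.

```python
from gensim.models import Word2Vec

sentences = [["king", "rules", "kingdom"],
             ["queen", "rules", "kingdom"],
             ["man", "walks"], ["woman", "walks"]]  # toy corpus

# sg=1 selects skip-gram; negative=5 enables negative sampling
# (hs=1 would switch to hierarchical softmax instead).
model = Word2Vec(sentences, vector_size=50, window=2,
                 sg=1, negative=5, min_count=1, epochs=50)

# Analogy query: king - man + woman ~= queen (on real corpora).
print(model.wv.most_similar(positive=["king", "woman"],
                            negative=["man"], topn=1))
```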
This document presents machine learning techniques to identify verb subcategorization frames from Czech language corpora. It compares three statistical techniques and shows they can discover new subcategorization frames and label dependents as arguments or adjuncts with 88% accuracy. It describes the task, relevant properties of Czech including free word order and case marking, and how the techniques are applied without assuming a predefined frame set.
Learning to understand phrases by embedding the dictionary - Roelof Pieters
The document describes a model that uses an RNN with LSTM cells to learn useful representations of phrases by mapping dictionary definitions to word embeddings, addressing the gap between lexical and phrasal semantics. The model is applied to two tasks: a reverse dictionary/concept finder that takes phrases as input and outputs words, and a general knowledge question answering system for crosswords. The RNN is trained on dictionary definitions to map phrases to target word embeddings, then tested on new input phrases.
This document discusses natural language processing and language models. It begins by explaining that natural language processing aims to give computers the ability to process human language in order to perform tasks like dialogue systems, machine translation, and question answering. It then discusses how language models assign probabilities to strings of text to determine if they are valid sentences. Specifically, it covers n-gram models, which use the previous n-1 words to predict the next, and how smoothing techniques are used to handle uncommon words. The document provides an overview of key concepts in natural language processing and language modeling.
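For concreteness, one standard instance of the smoothing mentioned here is the bigram model with add-one (Laplace) smoothing, where $C(\cdot)$ is a corpus count and $V$ the vocabulary size:

$$P(w_i \mid w_{i-1}) = \frac{C(w_{i-1} w_i) + 1}{C(w_{i-1}) + V}$$

The added counts guarantee a nonzero probability for bigrams never seen in training.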
This document discusses fine-tuning the BERT model with PyTorch and the Transformers library. It provides an overview of BERT, how it was trained, its special tokens, the Transformers library, preprocessing text for BERT, using the BertModel class, the approach to fine-tuning BERT for a task, creating a dataset and data loaders, and training and validating the model.
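A condensed sketch of that fine-tuning recipe with the Transformers library; the checkpoint, label count, toy batch, and learning rate below are illustrative stand-ins rather than the document's exact code.

```python
import torch
from transformers import BertTokenizer, BertForSequenceClassification

tokenizer = BertTokenizer.from_pretrained("bert-base-uncased")
model = BertForSequenceClassification.from_pretrained(
    "bert-base-uncased", num_labels=2)

# Preprocessing adds the special tokens ([CLS]/[SEP]), padding, and masks.
batch = tokenizer(["great movie", "terrible plot"],
                  padding=True, truncation=True, return_tensors="pt")
labels = torch.tensor([1, 0])

# One fine-tuning step: the forward pass returns a loss when labels are given.
optimizer = torch.optim.AdamW(model.parameters(), lr=2e-5)
loss = model(**batch, labels=labels).loss
loss.backward()
optimizer.step()
```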
BERT: Bidirectional Encoder Representations from Transformers - Liangqun Lu
BERT was developed by Google AI Language and came out Oct. 2018. It has achieved the best performance in many NLP tasks. So if you are interested in NLP, studying BERT is a good way to go.
Neural machine translation of rare words with subword units - Tae Hwan Jung
This paper proposes using subword units generated by byte-pair encoding (BPE) to address the open-vocabulary problem in neural machine translation. The paper finds that BPE segmentation outperforms a back-off dictionary baseline on two translation tasks, improving BLEU by up to 1.1 and CHRF by up to 1.3. BPE learns a joint encoding between source and target languages which increases consistency in segmentation compared to language-specific encodings, further improving translation of rare and unseen words.
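To show what BPE segmentation involves, here is a minimal sketch of the merge-learning loop in the spirit of Sennrich et al.; details of the reference implementation, such as end-of-word markers and the joint source-target vocabulary, are omitted.

```python
from collections import Counter

def apply_merge(word, pair):
    """Merge every adjacent occurrence of `pair` in a spaced symbol string."""
    symbols, out, i = word.split(), [], 0
    while i < len(symbols):
        if i + 1 < len(symbols) and (symbols[i], symbols[i + 1]) == pair:
            out.append(symbols[i] + symbols[i + 1]); i += 2
        else:
            out.append(symbols[i]); i += 1
    return " ".join(out)

def learn_bpe(vocab, num_merges):
    """vocab maps space-separated symbol sequences to word frequencies."""
    merges = []
    for _ in range(num_merges):
        pairs = Counter()
        for word, freq in vocab.items():
            symbols = word.split()
            for pair in zip(symbols, symbols[1:]):
                pairs[pair] += freq
        if not pairs:
            break
        best = max(pairs, key=pairs.get)  # most frequent adjacent pair
        merges.append(best)
        vocab = {apply_merge(w, best): f for w, f in vocab.items()}
    return merges

toy_vocab = {"l o w": 5, "l o w e r": 2, "n e w e s t": 6, "w i d e s t": 3}
print(learn_bpe(toy_vocab, 3))  # [('e', 's'), ('es', 't'), ('l', 'o')]
```

Rare words are then segmented by replaying the learned merges, so an unseen word falls back to known subword units instead of an UNK token.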
This document provides an overview of representation learning techniques for natural language processing (NLP). It begins with introductions to the speakers and objectives of the workshop, which is to provide a deep dive into state-of-the-art text representation techniques. The workshop is divided into modules on word vectors, sentence/paragraph/document vectors, and character vectors. The document provides background on why text representation is important for NLP, and discusses older techniques like one-hot encoding, bag-of-words, n-grams, and TF-IDF. It also introduces newer distributed representation techniques like word2vec's skip-gram and CBOW models, GloVe, and the use of neural networks for language modeling.
BERT is a language representation model that was pre-trained using two unsupervised prediction tasks: masked language modeling and next sentence prediction. It uses a multi-layer bidirectional Transformer encoder based on the original Transformer architecture. BERT achieved state-of-the-art results on a wide range of natural language processing tasks including question answering and language inference. Extensive experiments showed that both pre-training tasks, as well as a large amount of pre-training data and steps, were important for BERT to achieve its strong performance.
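The masked-language-modeling objective can be illustrated in a few lines with the Transformers pipeline API; the checkpoint choice is illustrative, and the snippet downloads weights on first run.

```python
from transformers import pipeline

fill = pipeline("fill-mask", model="bert-base-uncased")
for pred in fill("The capital of France is [MASK].", top_k=3):
    print(pred["token_str"], round(pred["score"], 3))
# The model restores the masked token from bidirectional context, e.g. "paris".
```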
Engineering Intelligent NLP Applications Using Deep Learning – Part 2 - Saurabh Kaushik
This document discusses how deep learning techniques can be applied to natural language processing tasks. It begins by explaining some of the limitations of traditional rule-based and machine learning approaches to NLP, such as the lack of semantic understanding and difficulty of feature engineering. Deep learning approaches can learn features automatically from large amounts of unlabeled text and better capture semantic and syntactic relationships between words. Recurrent neural networks are well-suited for NLP because they can model sequential data like text, and convolutional neural networks can learn hierarchical patterns in text.
The document discusses the use of deep neural networks and text mining. It provides an overview of key developments in deep learning for natural language processing, including word embeddings using Word2Vec, convolutional neural networks for modeling sentences and documents, and applications such as machine translation, relation classification and topic modeling. The document also discusses parameter tuning for deep learning models.
Word2Vec: Learning of word representations in a vector space - Di Mitri & Her... - Daniele Di Mitri
This document discusses the Word2Vec model for learning word representations. It outlines some limitations of classic NLP techniques, such as treating words as atomic units. Word2Vec uses a neural network model to learn vector representations of words in a way that captures semantic and syntactic relationships. Specifically, it describes the skip-gram and negative sampling techniques used to efficiently train the model on large amounts of unlabeled text data. Applications mentioned include machine translation and dimensionality reduction.
The paper presents a neural probabilistic language model that overcomes the curse of dimensionality in probabilistic language modeling. It develops a neural network model with distributed word representations as parameters to learn the probability of sequences. The model learns representations for each word and the probability function as a function of these representations using a hidden and softmax layer. This allows the model to estimate probabilities of unseen sequences during training by taking advantage of longer contexts through continuous representations.
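In the notation of that paper, with $x$ the concatenation of the learned feature vectors $C(w_{t-1}), \dots, C(w_{t-n+1})$ of the context words, the network computes unnormalized scores and a softmax:

$$y = b + Wx + U \tanh(d + Hx), \qquad P(w_t \mid w_{t-1}, \dots, w_{t-n+1}) = \frac{e^{y_{w_t}}}{\sum_i e^{y_i}}$$

Because similar words receive similar feature vectors, probability mass generalizes to word sequences never observed in training.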
Engineering Intelligent NLP Applications Using Deep Learning – Part 1 - Saurabh Kaushik
This document discusses natural language processing (NLP) and language modeling. It covers the basics of NLP including what NLP is, its common applications, and basic NLP processing steps like parsing. It also discusses word and sentence modeling in NLP, including word representations using techniques like bag-of-words, word embeddings, and language modeling approaches like n-grams, statistical modeling, and neural networks. The document focuses on introducing fundamental NLP concepts.
Visual-Semantic Embeddings: some thoughts on Language - Roelof Pieters
Language technology is rapidly evolving. A resurgence in the use of distributed semantic representations and word embeddings, combined with the rise of deep neural networks has led to new approaches and new state of the art results in many natural language processing tasks. One such exciting - and most recent - trend can be seen in multimodal approaches fusing techniques and models of natural language processing (NLP) with that of computer vision.
The talk aims to give an overview of the NLP part of this trend. It starts with a short overview of the challenges in creating deep networks for language, what makes for a "good" language model, and the specific requirements of semantic word spaces for multi-modal embeddings.
Analyzing individual neurons in pretrained language models - ken-ando
This document analyzes individual neurons in pretrained language models to understand what linguistic information they capture. It finds that:
1) Single neurons can focus on specific linguistic properties like parts of speech or semantic roles.
2) Lexical tasks like POS tagging rely on neurons in lower layers, while more complex syntax tasks use neurons from higher layers.
3) BERT spreads top neurons across all layers, while other models concentrate them in fewer layers, showing how knowledge is distributed.
[Paper Reading] Supervised Learning of Universal Sentence Representations fro... - Hiroki Shimanaka
This document summarizes the paper "Supervised Learning of Universal Sentence Representations from Natural Language Inference Data". It discusses how the researchers trained sentence embeddings using supervised data from the Stanford Natural Language Inference dataset. They tested several sentence encoder architectures and found that a BiLSTM network with max pooling produced the best performing universal sentence representations, outperforming prior unsupervised methods on 12 transfer tasks. The sentence representations learned from the natural language inference data consistently achieved state-of-the-art performance across multiple downstream tasks.
Arabic named entity recognition using deep learning approach - IJECEIAES
Most Arabic Named Entity Recognition (NER) systems depend massively on external resources and handmade feature engineering to achieve state-of-the-art results. To overcome such limitations, we propose in this paper a deep learning approach to tackle the Arabic NER task. We introduce a neural network architecture based on bidirectional Long Short-Term Memory (LSTM) and Conditional Random Fields (CRF) and experiment with various commonly used hyperparameters to assess their effect on the overall performance of our system. Our model takes two sources of information about words as input, pre-trained word embeddings and character-based representations, and eliminates the need for any task-specific knowledge or feature engineering. We obtained state-of-the-art results on the standard ANERcorp corpus with an F1 score of 90.6%.
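A skeletal PyTorch sketch of the emission side of such a tagger follows; a faithful reproduction would add the character-based representations and a CRF decoding layer (for example via the pytorch-crf package), and the dimensions here are illustrative.

```python
import torch
import torch.nn as nn

class BiLSTMTagger(nn.Module):
    """Word embeddings -> BiLSTM -> per-token NER tag scores (emissions)."""
    def __init__(self, vocab_size, num_tags, embed_dim=100, hidden_dim=128):
        super().__init__()
        self.embed = nn.Embedding(vocab_size, embed_dim)
        self.lstm = nn.LSTM(embed_dim, hidden_dim, batch_first=True,
                            bidirectional=True)
        self.proj = nn.Linear(2 * hidden_dim, num_tags)

    def forward(self, tokens):                # tokens: (batch, seq)
        h, _ = self.lstm(self.embed(tokens))  # (batch, seq, 2*hidden)
        return self.proj(h)                   # emission score per tag

emissions = BiLSTMTagger(vocab_size=5000, num_tags=9)(
    torch.randint(0, 5000, (2, 7)))
print(emissions.shape)  # torch.Size([2, 7, 9])
```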
Deep Learning Study Group @ Komachi Lab: "Learning Character-level Representations for Part-of-Sp... - Yuki Tomo
Presented at the Deep Learning Study Group @ Komachi Lab on 12/22: "Learning Character-level Representations for Part-of-Speech Tagging" by Cícero Nogueira dos Santos and Bianca Zadrozny.
Paper introduction: "Translating into Morphologically Rich Languages with Synthetic Phrases" by Victor Chahuneau, Eva Schlinger, Noah A. Smith, and Chris Dyer (EMNLP 2013).
The document provides information about an upcoming bootcamp on natural language processing (NLP) being conducted by Anuj Gupta. It discusses Anuj Gupta's background and experience in machine learning and NLP. The objective of the bootcamp is to provide a deep dive into state-of-the-art text representation techniques in NLP and help participants apply these techniques to solve their own NLP problems. The bootcamp will be very hands-on and cover topics like word vectors, sentence/paragraph vectors, and character vectors over two days through interactive Jupyter notebooks.
GPSInsights is a scalable framework for efficiently storing and mining massive volumes of real-time GPS data reported every second from millions of vehicles. The framework was developed to address challenges with analyzing this type of location data including its size, imprecise measurements, and need to integrate the data with maps during processing.
Keeping up to date & comparing journal apps: the Stockholm workshop 2016 - Guus van den Brekel
This document compares three journal apps: BrowZine, Docphin, and Read by QXMD. It outlines criteria for comparing the apps, including registration process, user interface, performance, access to full text, PDF viewing options, sharing abilities, notifications, search options, and support. The document then provides brief descriptions of each app and scores them based on the criteria. It also discusses tools for keeping up to date, including databases, RSS, and aggregation/curation platforms like Feedly, Flipboard, and Rebelmouse.
Our project is about guessing the correct missing word in a given sentence. To guess the missing word, we use two main methods: statistical language modeling and neural language modeling. Statistical language modeling depends on the frequency of relations between words; here we use a Markov chain. Neural language models use artificial neural networks and deep learning; here we use BERT, the state-of-the-art language model provided by Google.
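A toy sketch of the statistical side: with bigram counts from a corpus, the missing word can be guessed as the candidate that best links both neighbors. The corpus and scoring rule below are invented for illustration only.

```python
from collections import defaultdict

# Bigram counts estimated from a tiny illustrative corpus.
corpus = "the cat sat on the mat the dog sat on the rug".split()
bigram = defaultdict(int)
for a, b in zip(corpus, corpus[1:]):
    bigram[(a, b)] += 1

def guess_missing(left, right, vocab):
    """Score each candidate w by count(left, w) * count(w, right)."""
    return max(vocab, key=lambda w: bigram[(left, w)] * bigram[(w, right)])

print(guess_missing("the", "sat", set(corpus)))  # -> "cat" or "dog"
```

The BERT alternative replaces these counts with a masked-token prediction conditioned on the whole sentence context.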
SYLLABLE-BASED NEURAL NAMED ENTITY RECOGNITION FOR MYANMAR LANGUAGE - ijnlc
Named Entity Recognition (NER) for the Myanmar language is essential to Myanmar natural language processing research. In this work, NER for Myanmar is treated as a sequence tagging problem, and the effectiveness of deep neural networks on NER for Myanmar has been investigated. Experiments are performed by applying deep neural network architectures on syllable-level Myanmar contexts. The very first manually annotated NER corpus for the Myanmar language is also constructed and proposed. In developing our in-house NER corpus, sentences from online news websites as well as sentences from the ALT-Parallel-Corpus are used. This ALT corpus is one part of the Asian Language Treebank (ALT) project under ASEAN IVO. This paper contributes the first evaluation of neural network models on the NER task for the Myanmar language. The experimental results show that these neural sequence models can produce promising results compared to the baseline CRF model. Among the neural architectures, a bidirectional LSTM network with a CRF layer added on top gives the highest F-score. This work also aims to discover the effectiveness of neural network approaches for Myanmar text processing and to promote further research on this understudied language.
SENTIMENT ANALYSIS IN MYANMAR LANGUAGE USING CONVOLUTIONAL LSTM NEURAL NETWORK - ijnlc
In recent years, there has been increasing use of social media among people in Myanmar, and writing reviews on social media pages about products, movies, and trips has also become popular. Moreover, most people look for review pages about a product they want to buy before deciding whether to buy it. Extracting useful reviews about products of interest is important and time consuming. Sentiment analysis is one of the important processes for extracting useful reviews of products. In this paper, a Convolutional LSTM neural network architecture is proposed to analyse the sentiment classification of cosmetic reviews written in the Myanmar language. The paper also intends to build a cosmetic review dataset for deep learning and a sentiment lexicon in the Myanmar language.
Chunker Based Sentiment Analysis and Tense Classification for Nepali Text - kevig
The article presents Sentiment Analysis (SA) and tense classification using the skip-gram model for word-to-vector encoding of the Nepali language. The SA experiment for positive-negative classification is carried out in two ways. In the first experiment, the vector representation of each sentence is generated using the skip-gram model followed by Multi-Layer Perceptron (MLP) classification, and an F1 score of 0.6486 is achieved for positive-negative classification with an overall accuracy of 68%. In the second experiment, verb chunks are extracted using a Nepali parser and a similar experiment is carried out on the verb chunks; an F1 score of 0.6779 is observed for positive-negative classification with an overall accuracy of 85%. Hence, chunker-based sentiment analysis proves better than sentence-based sentiment analysis. The paper also proposes using the skip-gram model to identify the tenses of Nepali sentences and verbs. In the third experiment, the vector representation of each verb chunk is generated using the skip-gram model followed by MLP classification, and the verb chunks yield a very low overall accuracy of 53%. The fourth experiment, conducted for tense classification using sentences, results in improved efficiency with an overall accuracy of 89%; past tenses are identified and classified more accurately than other tenses. Hence, sentence-based tense classification proves better than verb-chunk-based tense classification.
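A compact sketch of the first experiment's pipeline, averaged skip-gram vectors fed to an MLP classifier, using gensim and scikit-learn; the English toy data stands in for the paper's Nepali corpus, which is not reproduced here.

```python
import numpy as np
from gensim.models import Word2Vec
from sklearn.neural_network import MLPClassifier

sents = [["good", "movie"], ["bad", "plot"],
         ["great", "acting"], ["poor", "story"]]
labels = [1, 0, 1, 0]  # positive / negative (toy data)

w2v = Word2Vec(sents, vector_size=25, sg=1, min_count=1, epochs=100)

def sent_vec(tokens):
    """Average the skip-gram vectors of a sentence's tokens."""
    return np.mean([w2v.wv[t] for t in tokens], axis=0)

X = np.stack([sent_vec(s) for s in sents])
clf = MLPClassifier(hidden_layer_sizes=(16,), max_iter=2000).fit(X, labels)
print(clf.predict([sent_vec(["good", "acting"])]))
```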
This document provides an overview of deep learning techniques for natural language processing. It begins with an introduction to distributed word representations like word2vec and GloVe. It then discusses methods for generating sentence embeddings, including paragraph vectors and recursive neural networks. Character-level models are presented as an alternative to word embeddings that can handle morphology and out-of-vocabulary words. Finally, some general deep learning approaches for NLP tasks like text generation and word sense disambiguation are briefly outlined.
The document discusses the evolution of language models from RNN to BERT. It begins with an overview of language models and how they estimate probabilities of word sequences. Neural language models using RNNs were developed to address limitations of N-gram models by allowing unlimited context, but training RNNs was difficult. ELMo used a bidirectional LSTM to learn deep contextual word representations in a two-stage training process. Transformers replaced RNNs with self-attention and were highly parallelizable. A milestone was BERT, which combined the benefits of ELMo (deep bidirectional context) and Transformers (attention mechanism).
Deep Learning for Natural Language Processing - ParrotAI
The document discusses deep learning approaches for natural language processing (NLP). It introduces NLP and common applications. Word representations like one-hot and distributed representations are covered, with a focus on Word2Vec models. Recurrent neural networks (RNNs) are described as useful for sequential language data, including variants like bidirectional RNNs and applications such as neural machine translation and sentiment analysis.
Deep Learning for Machine Translation - A dramatic turn of paradigm - MeetupDataScienceRoma
Presentation at the March session of the Machine Learning / Data Science Meetup of Rome: https://www.meetup.com/it-IT/Machine-Learning-Data-Science-Meetup/events/248063386/
Language Combinatorics: A Sentence Pattern Extraction Architecture Based on C... - Waqas Tariq
A \"sentence pattern\" in modern Natural Language Processing is often considered as a subsequent string of words (n-grams). However, in many branches of linguistics, like Pragmatics or Corpus Linguistics, it has been noticed that simple n-gram patterns are not sufficient to reveal the whole sophistication of grammar patterns. We present a language independent architecture for extracting from sentences more sophisticated patterns than n-grams. In this architecture a \"sentence pattern\" is considered as n-element ordered combination of sentence elements. Experiments showed that the method extracts significantly more frequent patterns than the usual n-gram approach.
An expert system for automatic reading of a text written in Standard Arabic - ijnlc
In this work we present our expert system for automatic reading (speech synthesis) of text written in Standard Arabic. The work is carried out in two main stages: the creation of the sound database, and the transformation of written text into speech (Text-To-Speech, TTS). This transformation is done first by a Phonetic Orthographical Transcription (POT) of any written Standard Arabic text, with the aim of transforming it into its corresponding phonetic sequence, and second by the generation of the voice signal corresponding to the transcribed chain. We lay out the design of the system, as well as the results obtained compared to other works on TTS for Standard Arabic.
Word2vec on the Italian language: first experiments - Vincenzo Lomonaco
The word2vec model and applications by Mikolov et al. have attracted a great amount of attention in recent years. The vector representations of words learned by word2vec models have been shown to carry semantic meaning and are useful in various NLP tasks. In this work I try to reproduce the previously obtained results for the English language and to explore the possibility of doing the same for the Italian language.
SYLLABLE-BASED SPEECH RECOGNITION SYSTEM FOR MYANMAR - ijcseit
The proposed system is a syllable-based Myanmar speech recognition system. There are three stages: feature extraction, phone recognition, and decoding. In feature extraction, the system transforms the input speech waveform into a sequence of acoustic feature vectors, each vector representing the information in a small time window of the signal. Then the likelihood of the observed feature vectors given linguistic units (words, phones, subparts of phones) is computed in the phone recognition stage. Finally, the decoding stage takes the Acoustic Model (AM), which consists of this sequence of acoustic likelihoods, plus a phonetic dictionary of word pronunciations, combined with the Language Model (LM). The system produces the most likely sequence of words as output. The system creates the language model for Myanmar using syllable segmentation and a syllable-based n-gram method.
This document summarizes a research paper on developing a syllable-based speech recognition system for the Myanmar language. The proposed system has three main components: feature extraction, phone recognition, and decoding. Feature extraction transforms speech into acoustic feature vectors. Phone recognition computes likelihoods of acoustic observations given linguistic units like phones. Decoding uses acoustic and language models to find the most likely sequence of words. The paper discusses building acoustic and language models for Myanmar. The acoustic model is trained using Hidden Markov Models and Gaussian mixture models. The language model is an n-gram model built using syllable segmentation of text. Developing the first speech recognition system for Myanmar poses technical challenges due to its tonal syllabic structure.
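The decoding step described in these summaries is the standard noisy-channel search, with the syllable n-gram model supplying the language-model prior:

$$\hat{W} = \arg\max_{W} P(O \mid W)\, P(W), \qquad P(W) \approx \prod_{i} P(s_i \mid s_{i-n+1}, \dots, s_{i-1})$$

where $O$ is the sequence of acoustic feature vectors and $s_1, \dots, s_m$ are the syllables of the candidate transcription $W$.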
IRJET- Survey on Generating Suggestions for Erroneous Part in a Sentence - IRJET Journal
This document discusses using deep learning approaches like long short-term memory (LSTM) neural networks to generate suggestions for erroneous parts of sentences in Indian languages. Indian languages pose unique challenges due to their morphological richness and structure differences from English. The document reviews natural language processing techniques like recurrent neural networks, convolutional neural networks, and LSTMs. It proposes using LSTMs to model sentence structure and generate possible corrections for errors in an unsupervised manner. The goal is to develop this technique for morphologically complex Indian languages like Malayalam.
A Vietnamese Language Model Based on Recurrent Neural Network
Viet-Trung Tran∗, Kiem-Hieu Nguyen∗, Duc-Hanh Bui†
Hanoi University of Science and Technology
∗Email: {trungtv,hieunk}@soict.hust.edu.vn
†Email: hanhbd.hedspi@gmail.com
Abstract—Language modeling plays a critical role in many natural language processing (NLP) tasks such as text prediction, machine translation, and speech recognition. Traditional statistical language models (e.g., n-gram models) can only offer words that have been seen before and cannot capture long word contexts. Neural language models provide a promising solution for overcoming these shortcomings of statistical language models. This paper investigates Recurrent Neural Network (RNN) language models for Vietnamese at the character and syllable levels. Experiments were conducted on a large dataset of 24M syllables, constructed from 1,500 movie subtitles. The experimental results show that our RNN-based language models yield reasonable performance on the movie subtitle dataset. Concretely, our models outperform n-gram language models in terms of perplexity.
I. INTRODUCTION
One of the fundamental tasks in NLP is statistical language modeling (SLM), formalized as inferring a probability distribution over word sequences. SLM has many applications such as machine translation, speech recognition, and text prediction [1], [2].
N-gram models have been the de facto standard for language modeling [3]. An n-gram model predicts the next word using the (n − 1) preceding words as context, p(w | w0...wn−2). This probability is estimated from the occurrence frequency of word sequences of length n in a large text collection. At n = 2, 3, 4, or 5, the model captures frequent word patterns as long as enough training text is provided. However, as n increases, it faces the curse of dimensionality, which manifests in two shortcomings:
• Firstly, given a context, it can only predict a word that has been “seen” with that context in the training texts (see the sketch after this list).
• Secondly, it can only “see” a few words back, i.e., the (n − 1) preceding words. A longer context, such as the previous sentences, cannot be considered when predicting the next word. This is not, intuitively, the way humans learn and capture context.
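To make the first shortcoming concrete, the following minimal Python sketch (an illustration with toy data, not any published implementation) estimates bigram probabilities by counting; a continuation never seen with a given context receives probability zero:

from collections import Counter

def train_bigram(corpus):
    """Count unigrams and bigrams over a list of tokenized sentences."""
    unigrams, bigrams = Counter(), Counter()
    for sent in corpus:
        for w1, w2 in zip(sent, sent[1:]):
            unigrams[w1] += 1
            bigrams[(w1, w2)] += 1
    return unigrams, bigrams

def p_mle(w, context, unigrams, bigrams):
    """Maximum-likelihood estimate of p(w | context); zero for unseen pairs."""
    if unigrams[context] == 0:
        return 0.0
    return bigrams[(context, w)] / unigrams[context]

corpus = [["con", "mèo", "đang", "đi"], ["con", "chó", "đang", "chạy"]]
unigrams, bigrams = train_bigram(corpus)
print(p_mle("chạy", "đang", unigrams, bigrams))  # 0.5: seen with this context
print(p_mle("ngủ", "đang", unigrams, bigrams))   # 0.0: never seen with this context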
Neural language models, i.e., neural network-based language models, successfully overcome these two shortcomings [4], [5].
For the first shortcoming, similarly to an n-gram model, a feed-forward neural network (FFNN) predicts the next word using the preceding (n − 1) words as context [4]. The idea is that the distributed word representations learned by the network capture word similarity. For instance, by capturing the similarity between ‘con chó’ (dog) and ‘con mèo’ (cat), ‘căn phòng’ (room) and ‘phòng ngủ’ (bedroom), ‘chạy’ (running) and ‘đi’ (walking), and so on, the model could generate the sentence “Một con chó đang chạy trong căn phòng” (A dog was running in a room) given the sentence “Con mèo đang đi trong phòng ngủ” (The cat is walking in the bedroom). Their model reportedly outperforms n-gram language models on benchmark datasets.
For the second shortcoming, although the advantage of recurrent neural networks (RNNs) over feed-forward neural networks is still under debate [6], RNNs explicitly allow learning from arbitrarily long context using a recurrent mechanism within the network [5], [7]. Long-term context is exploited particularly well via the long short-term memory architecture [8]. Experimental results reported in [5] show that an RNN-based language model outperforms its FFNN-based counterpart.
One may argue for the n-gram approach on the grounds of its simplicity and efficiency. But as hardware keeps evolving and efficient learning algorithms for neural networks such as [9] are proposed, neural language modeling is promising. In another direction, [10] proposes a multi-modality framework for jointly modeling images and texts in a unified neural language model.
In this work, we experiment with RNNs for Vietnamese language modeling. Our contributions are summarized as follows:
• Building an RNN-based syllable-level language model for
Vietnamese.
• Building an RNN-based character-level language model
for Vietnamese.
• Running extensive experiments on a 24M-syllable dataset constructed from 1,500 movie subtitles.
The rest of the paper is organized into four further sections: Section II discusses our main contributions of building RNN-based language models for Vietnamese; Section III describes our test bed and experimental results; related work is introduced in Section IV; and we conclude the paper with final remarks in Section V.
II. RNN-BASED LANGUAGE MODELING FOR VIETNAMESE
In this section, we first present RNN-based language modeling (Section II-A). We then briefly introduce some important characteristics of Vietnamese (Section II-B). Finally, we propose RNN-based language models at both the character and syllable levels for Vietnamese (Sections II-C and II-D).
A. RNN
Given a training sequence (x1, ..., xT), an RNN computes the output vectors (o1, ..., oT) by applying the softmax function to obtain the predictive distribution P(xt+1|x0x1...xt); training maximizes the total log probability of the observed sequence under this distribution. In a simple configuration, our RNN is constructed with three layers: an input layer x, a hidden layer s, and an output layer o. At time t, the input, output, and hidden state of the network are denoted x(t), o(t), and s(t), respectively. As the network is recurrent, the previous hidden state s(t − 1) is fed back into the hidden layer at time t. Mathematically, these layers are computed as follows:

s(t) = f(Ux(t) + Ws(t − 1)) (1)
o(t) = softmax(V s(t)) (2)

U, W, and V denote the parameters of our RNN to be learned. f is an activation function such as sigmoid, ReLU, or tanh. Our network and its unfolding in time are shown in Figure 1.
Figure 1. An RNN and its unfolding in time.
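As a minimal sketch of Equations (1) and (2) in NumPy, with toy dimensions (an illustration, not the TensorFlow implementation used in our experiments):

import numpy as np

def softmax(z):
    e = np.exp(z - z.max())
    return e / e.sum()

def rnn_forward(xs, U, W, V):
    """xs: list of one-hot input vectors x(t); returns the distributions o(t)."""
    s = np.zeros(W.shape[0])            # initial hidden state s(0)
    outputs = []
    for x in xs:
        s = np.tanh(U @ x + W @ s)      # Equation (1), with f = tanh
        outputs.append(softmax(V @ s))  # Equation (2)
    return outputs

vocab_size, h_len = 70, 16              # toy sizes
rng = np.random.default_rng(0)
U = rng.normal(scale=0.1, size=(h_len, vocab_size))
W = rng.normal(scale=0.1, size=(h_len, h_len))
V = rng.normal(scale=0.1, size=(vocab_size, h_len))
xs = [np.eye(vocab_size)[i] for i in (3, 14, 15)]   # toy one-hot sequence
print(rnn_forward(xs, U, W, V)[-1].shape)           # (70,): next-symbol distribution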
An RNN has the following characteristics:
• The hidden state s(t) can be seen as the memory of the network: s(t) captures information from all previous time steps. In practice, due to the complexity of the training algorithm, the computation of s(t) only takes into account a limited number of recent steps.
• An RNN differs from other types of neural networks in that it uses the same parameters across all steps. More precisely, (U, V, W) are shared and independent of the inputs. This reduces the number of parameters to learn.
As discussed in [11], training an RNN with gradient descent is practically difficult: the network struggles to learn long-term dependencies because the gradient blows up or vanishes exponentially over time. This is called the vanishing gradient problem. To address it, the hidden layer within our RNN is implemented with a so-called long short-term memory (LSTM) layout.
LSTM cells (in the hidden layer) are designed to be capable of learning long-term dependencies (Figure 2). They were first introduced in [12]. Each LSTM cell consists of three gates that control the information flowing into the cell and from the cell output back to the network. The forget gate decides whether the cell state at time (t − 1) is filtered out or not. The input gate regulates the inputs. Similarly, the output gate activates or deactivates the cell's outputs to the network (Figure 2).
Figure 2. An LSTM cell
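A single LSTM step can be sketched as follows, following the standard formulation of [12] with sigmoid gates; the weight-matrix names are illustrative:

import numpy as np

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

def lstm_step(x, h_prev, c_prev, Wf, Wi, Wo, Wc):
    """One LSTM step on the concatenated input z = [h(t-1); x(t)]."""
    z = np.concatenate([h_prev, x])
    f = sigmoid(Wf @ z)                    # forget gate: filter the old cell state
    i = sigmoid(Wi @ z)                    # input gate: regulate new information
    o = sigmoid(Wo @ z)                    # output gate: expose the cell to the network
    c = f * c_prev + i * np.tanh(Wc @ z)   # updated cell state
    h = o * np.tanh(c)                     # updated hidden state
    return h, c

rng = np.random.default_rng(0)
Wf, Wi, Wo, Wc = (rng.normal(size=(4, 7)) for _ in range(4))  # hidden size 4, input size 3
h, c = lstm_step(np.ones(3), np.zeros(4), np.zeros(4), Wf, Wi, Wo, Wc)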
Besides powering an RNN with LSTM cells, RNNs can also be stacked to form a deeper neural network. In this setting, the output of the first layer becomes the input of the second, and so on, as depicted in Figure 3. By going deeper, an RNN can capture more context and thus better model the sequence distribution [13]. Of course, this increases the number of model parameters and the training time, and requires more training data.
Figure 3. Stacking multiple networks
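Stacking can be sketched by feeding each layer's hidden-state sequence to the next layer as its input sequence (again a toy NumPy illustration):

import numpy as np

def rnn_layer(xs, U, W):
    """Run one recurrent layer; its hidden states feed the next layer."""
    s = np.zeros(W.shape[0])
    states = []
    for x in xs:
        s = np.tanh(U @ x + W @ s)
        states.append(s)
    return states

def stacked_rnn(xs, layers):
    """layers: list of (U, W) pairs; layer k's output is layer k+1's input."""
    for U, W in layers:
        xs = rnn_layer(xs, U, W)
    return xs

rng = np.random.default_rng(0)
xs = [rng.normal(size=5) for _ in range(3)]               # 3 input vectors of size 5
layers = [(rng.normal(size=(8, 5)), rng.normal(size=(8, 8))),
          (rng.normal(size=(8, 8)), rng.normal(size=(8, 8)))]
print(len(stacked_rnn(xs, layers)))                        # 3 hidden states, size 8 each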
B. Characteristics of Vietnamese language
In this section, we briefly describe some important characteristics of Vietnamese with regard to sub-word-level language modeling.

Vietnamese script is written in Latin characters (‘kí tự’ or ‘chữ’ in Vietnamese). There are 29 characters in the Vietnamese alphabet, including 12 vowels and 17 consonants, and five accent marks. An accent mark can be attached to a vowel to modify its tone, resulting in 60 additional ‘vowel’ variants (Table I [14]).
Characters are grouped into syllables (‘âm tiết’ in Vietnamese). Syllables, in turn, are grouped into words, which are the smallest meaningful units in a language. Words can be composed of a single syllable, e.g., ‘mua’ (to buy), or of multiple syllables, e.g., ‘nghiên cứu’ (research). Typically, words are composed of two syllables, accounting for 70% of the entries in a general-domain Vietnamese dictionary [15].

Initial char Variants
a [a, à, á, ả, ã, ạ, ă, ằ, ắ, ẳ, ẵ, ặ, â, ầ, ấ, ẩ, ẫ, ậ]
e [e, è, é, ẻ, ẽ, ẹ, ê, ề, ế, ể, ễ, ệ]
o [o, ò, ó, ỏ, õ, ọ, ô, ồ, ố, ổ, ỗ, ộ, ơ, ờ, ớ, ở, ỡ, ợ]
u [u, ù, ú, ủ, ũ, ụ, ư, ừ, ứ, ử, ữ, ự]
i [i, ì, í, ỉ, ĩ, ị]
y [y, ỳ, ý, ỷ, ỹ, ỵ]
Table I
VOWEL CHARACTERS AND VARIANTS IN VIETNAMESE ALPHABET.

Due to the complexity of Vietnamese morphology and lexicon, word segmentation is a fundamental yet challenging task in Vietnamese language processing [16]. In this paper, we leave word-level language modeling for further investigation and instead focus on sub-word levels, i.e., the character and syllable levels.
C. Character-Level Language Model
At the character level, our objective is to predict the next character given a sequence of previous characters. The input X is thus a character sequence, and x(t) ∈ X is the character at time t. Since the number of characters (the char-level vocabulary size Vchar) is small, we directly use one-hot encoding: each x(t) is a vector of size Vchar with the component corresponding to the character itself set to one and all other components set to zero.

The output o(t) at each time step t is likewise a vector of size Vchar. Specifically, each component of o(t) indicates the probability of generating that character given the previous context from 0 to t. Given h_len as the size of the hidden layer s (Section II-A), the domains of our learning parameters, consistent with Equations (1) and (2), are:

x(t) ∈ R^Vchar
o(t) ∈ R^Vchar
s(t) ∈ R^h_len
U ∈ R^(h_len × Vchar)
W ∈ R^(h_len × h_len)
V ∈ R^(Vchar × h_len)
(3)
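As an illustration, one-hot encoding at the character level can be sketched as below; the alphabet is truncated for brevity:

import numpy as np

alphabet = ['START', 'a', 'à', 'á', 'b', 'c']   # truncated illustrative alphabet
CHAR_TO_INDEX = {ch: i for i, ch in enumerate(alphabet)}
V_char = len(alphabet)

def one_hot(ch):
    """Vector of size V_char: one at the character's index, zero elsewhere."""
    x = np.zeros(V_char)
    x[CHAR_TO_INDEX[ch]] = 1.0
    return x

print(one_hot('à'))   # [0. 0. 1. 0. 0. 0.]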
D. Syllable-Level Language Model
At the syllable level, our objective is to predict the next syllable given a sequence of previous syllables. Therefore, x(t) is the one-hot encoding of the syllable at time t. We observe that the number of distinct Vietnamese syllables, Vsyl, is about 7,000. Given Vsyl and h_len as the sizes of the syllable vocabulary and the hidden layer s, respectively, the domains of our learning parameters are the same as in Equation 3, except with the much bigger Vsyl in place of Vchar.
III. EXPERIMENTS
A. Datasets
This work focuses on conversational data instead of general texts (e.g., from news articles). Therefore, we manually collected movie subtitles from 1,500 movies available on the Internet. Subtitles were contributed by volunteer translators from the original languages into Vietnamese. For instance, the three sequences {“How many bags can you carry?”, “If you don’t think for yourself”, “think for your mother!”} were extracted from the part shown in Figure 4.
132
00:07:33,920 --> 00:07:35,560
Chú có thể vác được bao nhiêu bao cơ chứ?
133
00:07:35,680 --> 00:07:36,760
Nếu chú không nghĩ đến bản thân...
134
00:07:36,880 --> 00:07:39,280
...thì cũng phải nghĩ đến mẹ ở nhà!
Figure 4. A sample original subtitle file with triples of (frame number, time offsets, and subtitle text).
We picked movie subtitles for two reasons. First, they are widely and publicly available over the Internet. Second, the language itself is close to conversational Vietnamese; generated by thousands of contributors, it barely implies any specific writing style.
The following steps were applied to clean the data:
• Tokenizing: Each sequence is tokenized into tokens using a regular expression processor. A token can be a syllable, a foreign word, or a punctuation mark.
• Detecting and removing incomplete translations: We could not expect high commitment from volunteer translators. In fact, we found many subtitles that were only partially translated. Most of the original subtitles were in English, and such untranslated text obviously generates noise when learning a language model for Vietnamese. In order to detect and remove such subtitles, we use Princeton WordNet [17] to extract an English vocabulary. We then scan each subtitle for tokens that can be found in this vocabulary. If half of all tokens match this criterion, the subtitle is considered untranslated. We also used the langdetect package (https://pypi.python.org/pypi/langdetect) to filter out sentences that are not in Vietnamese. Manual observation on a random subset of the removed subtitles shows that this heuristic filter works sufficiently well for our purpose (see the sketch after this list).
• Normalizing foreign words: Tokens detected via the English vocabulary were all assigned the value ‘OOV’.
• Adding a start symbol: Each sequence in our dataset is a subtitle displayed in a movie frame. As seen in conversational data, and particularly in movie subtitle data, a sequence is not necessarily a complete sentence or syntactic phrase in linguistics. Unlike general texts, this characteristic is challenging for language modeling. To signify the beginning of a sequence, we added a ‘START’ symbol ahead of it.
• Punctuation: We keep all punctuation marks as in the original subtitles.
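A minimal sketch of the untranslated-subtitle filter described above; the vocabulary here is a tiny stand-in for the WordNet-derived English vocabulary:

def is_untranslated(tokens, english_vocab):
    """Flag a subtitle as untranslated if at least half of its tokens are English."""
    if not tokens:
        return False
    matches = sum(1 for t in tokens if t.lower() in english_vocab)
    return matches >= len(tokens) / 2

english_vocab = {"how", "many", "bags", "can", "you", "carry"}
print(is_untranslated("How many bags can you carry".split(), english_vocab))   # True
print(is_untranslated("Chú có thể vác được bao nhiêu".split(), english_vocab)) # False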
Data for syllable models: Since the inputs to our RNNs are vectors, not strings, we create a mapping between syllables and indices via two tables, INDEX_TO_SYLLABLE and SYLLABLE_TO_INDEX. For example, the syllable ‘anh’ has index 1043, giving one entry in each of the two mapping tables. A training sample x looks like [0, 56, 78, 454], where 0 is the index of ‘START’.

Data for char models: Similarly to the syllable models, a mapping between characters and indices has to be created using two mapping tables, INDEX_TO_CHAR and CHAR_TO_INDEX. Again, 0 is the index of ‘START’.
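The mapping tables can be sketched as plain Python dictionaries; encoding then turns a tokenized sequence into an index list. The concrete indices below are illustrative:

def build_mappings(sequences):
    """Build INDEX_TO_SYLLABLE / SYLLABLE_TO_INDEX, fixing 'START' at index 0."""
    vocab = ['START'] + sorted({syl for seq in sequences for syl in seq})
    index_to_syllable = dict(enumerate(vocab))
    syllable_to_index = {syl: i for i, syl in index_to_syllable.items()}
    return index_to_syllable, syllable_to_index

def encode(seq, syllable_to_index):
    """Prepend START and map each syllable to its index."""
    return [0] + [syllable_to_index[s] for s in seq]

seqs = [["cô", "ấy", "là"], ["con", "mèo", "đang", "đi"]]
idx2syl, syl2idx = build_mappings(seqs)
print(encode(["cô", "ấy", "là"], syl2idx))   # e.g. [0, 2, 7, 4], depending on sort order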
B. Evaluation
We built four families of neural language models on the TensorFlow framework [18], varying the sequence length over the values 10, 20, and 40:
• Char-1-layer-40-seq: A simple char-level model with only one hidden layer and a sequence length of 40.
• Char-3-layer-40-seq: A more complicated char-level model with three hidden layers and a sequence length of 40.
• Syl-1-layer-[10,20,40]-seq: Three variants of a simple syllable-level model with one hidden layer and sequence lengths of 10, 20, and 40, respectively.
• Syl-3-layer-[10,20,40]-seq: Three variants of a deep syllable-level model with three hidden layers and sequence lengths of 10, 20, and 40, respectively.
BASELINE: We used a bi-gram language model as the baseline. Note that this is not the most accurate n-gram option; however, we selected it because it is easy to implement, parameter-free, and a relatively strong baseline. To handle unknown words, we used add-one (Laplace) smoothing.
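The add-one smoothed bi-gram probability can be sketched as follows: every bigram count is incremented by one, so unseen pairs keep a small non-zero probability:

from collections import Counter

def laplace_bigram_prob(w, context, unigrams, bigrams, vocab_size):
    """p(w | context) = (c(context, w) + 1) / (c(context) + |V|)."""
    return (bigrams[(context, w)] + 1) / (unigrams[context] + vocab_size)

unigrams = Counter({"đang": 2})
bigrams = Counter({("đang", "chạy"): 1})
print(laplace_bigram_prob("chạy", "đang", unigrams, bigrams, vocab_size=7000))  # seen pair
print(laplace_bigram_prob("ngủ", "đang", unigrams, bigrams, vocab_size=7000))   # unseen, non-zero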
Model PPL
Char-1-layer-40-seq 116
Char-3-layer-40-seq 109
Syl-1-layer-10-seq 85.6
Syl-3-layer-10-seq 89.1
Syl-1-layer-20-seq 83.8
Syl-3-layer-20-seq 95.5
Syl-1-layer-40-seq 99.7
Syl-3-layer-40-seq 95.7
Bi-gram 145
Table II
PERPLEXITY PER SYLLABLE ON OUR MODELS.
We measured perplexity, which is the inverse probability of the input sequence, normalized by the number of words. The perplexity is computed as in Equation 4:

PPL = ( ∏_{i=1}^{N} 1 / P(wi | w1...wi−1) )^{1/N} (4)
where N is the number of words and P(wi|w1...wi−1) is the probability of word wi given the previous sequence w1...wi−1.
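In practice, Equation 4 is computed in log space to avoid numerical underflow, since PPL = exp(−(1/N) Σ log P(wi|w1...wi−1)); a minimal sketch:

import math

def perplexity(probs):
    """probs: P(w_i | w_1...w_{i-1}) for each of the N words of the test sequence."""
    n = len(probs)
    log_sum = sum(math.log(p) for p in probs)
    return math.exp(-log_sum / n)

print(perplexity([0.25, 0.1, 0.5]))  # about 4.31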
As shown in Table II, all neural models significantly outperform the bi-gram baseline. Our best model, the one-layer syllable-based model trained with an RNN sequence length of 20, obtained a perplexity of 83.8. Note that the RNN sequence length signifies the maximum size of the unfolded recurrent network, i.e., the largest context used to generate a new character/syllable. The results show that increasing the context size to 40 in the syllable models does not help reduce overall perplexity. This could be due to the nature of our training corpus: it contains mostly short conversational sentences. We leave the investigation of this hypothesis as future work.
Intuitively, the character language models in our setting are not better than the syllable models, as they have to first predict multiple characters in order to form a syllable. Stacking a 3-layer RNN increases the number of learning parameters, giving the models more expressive power to predict new words. However, in the cases of Syl-3-layer-[10,20]-seq, the larger number of learning parameters hurts the perplexity scores.
We show in Figure 5 several predictions from our best-performing syllable model, i.e., Syl-1-layer-20-seq in Table II.
START cô
Suggestions 1: ấy, 2: OOV, 3:
——————————————
START cô ấy
Suggestions 1: , 2: bị, 3: đã
——————————————
START cô ấy là
Suggestions 1: OOV, 2: một, 3: người
——————————————
START cô ấy là bác
Suggestions 1: sĩ, 2: sỹ, 3:
——————————————
START cô ấy là bác sĩ
Suggestions 1: , 2: OOV, 3: của
——————————————
START cô ấy là bác sĩ tâm
Suggestions 1: thần, 2: lý, 3: lí
——————————————
START cô ấy là bác sĩ tâm lý
Suggestions 1: , 2: của, 3: OOV
Figure 5. Sample syllable predictions suggesting the 3 options with the highest probabilities for the sentence: "cô ấy là bác sĩ tâm lý" (she is a psychiatrist).
IV. RELATED WORK
This section discusses related work in language modeling for Vietnamese. Language modeling is an important NLP task. However, to our knowledge, only a handful of works focus on this task for Vietnamese [19], [20], [21].
In a pioneering work on language modeling for automatic speech recognition in Vietnamese, [19] applies n-gram statistical language modeling to a corpus of 10M sentences collected from Web documents. They report a perplexity per word of 252 for the best configuration. A priori and a posteriori methods for calculating perplexity over n-gram language models for Vietnamese are also discussed in [20].
Perhaps closer to our direction is the work presented in [21]. They investigate a neural language model at the syllable level. Experiments were conducted on a corpus of 76k sentences of Vietnamese text. As reported, a feed-forward neural language model achieves a perplexity of 121.
Word-level RNN language models can also be built for Vietnamese, provided word-segmented texts are available. Both word- and syllable-level language models have the drawback of learning new words: they need to be retrained upon adding new words to the vocabulary. A character-level language model, on the other hand, has no problem with new words, since words are constructed from a predefined set of characters. Hence, a character-level language model not only predicts the next word but also completes the current word. We also plan to investigate embedding vectors trained on a large text collection to better capture word semantics and to reduce vector size.
V. CONCLUSIONS
This paper presents neural language models for Vietnamese. We investigate RNNs at both the character and syllable levels. Experimental results show that the best configuration yields a perplexity of 83.8, which is a reasonable performance for a language model. All of our models outperform the n-gram language model in the experiments.
For future work, we plan to investigate RNNs at the word level. We anticipate that embedding representations using techniques like convolutional neural networks will also improve both the accuracy and efficiency of our approach.
ACKNOWLEDGMENT
This work is supported by Hanoi University of Science and Technology (HUST) under project No. T2016-PC-047.
REFERENCES
[1] R. Rosenfeld, “Two Decades Of Statistical Language Modeling: Where
Do We Go From Here?” in Proceedings of the IEEE, vol. 88, 2000.
[Online]. Available: http://citeseerx.ist.psu.edu/viewdoc/summary?doi=
10.1.1.29.463
[2] K. Yao, G. Zweig, M. Hwang, Y. Shi, and D. Yu,
“Recurrent neural networks for language understanding,”
in INTERSPEECH 2013, 14th Annual Conference of the
International Speech Communication Association, Lyon, France,
August 25-29, 2013, 2013, pp. 2524–2528. [Online]. Available:
http://www.isca-speech.org/archive/interspeech_2013/i13_2524.html
[3] S. F. Chen and J. Goodman, “An empirical study of smoothing
techniques for language modeling,” in Proceedings of the 34th
Annual Meeting on Association for Computational Linguistics, ser.
ACL ’96. Stroudsburg, PA, USA: Association for Computational
Linguistics, 1996, pp. 310–318. [Online]. Available: http://dx.doi.org/
10.3115/981863.981904
[4] Y. Bengio, R. Ducharme, P. Vincent, and C. Janvin, “A neural
probabilistic language model,” J. Mach. Learn. Res., vol. 3, pp.
1137–1155, Mar. 2003. [Online]. Available: http://dl.acm.org/citation.
cfm?id=944919.944966
[5] T. Mikolov, M. Karafiát, L. Burget, J. Černocký, and S. Khudanpur,
“Recurrent neural network based language model,” in INTERSPEECH
2010, 11th Annual Conference of the International Speech Communi-
cation Association, Makuhari, Chiba, Japan, September 26-30, 2010,
2010, pp. 1045–1048.
[6] M. Sundermeyer, I. Oparin, J.-L. Gauvain, B. Freiberg, R. Schlüter, and H. Ney, “Comparison of feedforward and recurrent neural
network language models,” in IEEE International Conference on Acous-
tics, Speech, and Signal Processing, Vancouver, Canada, May 2013, pp.
8430–8434.
[7] T. Mikolov and G. Zweig, “Context dependent recurrent neural network
language model,” in 2012 IEEE Spoken Language Technology Workshop
(SLT), Miami, FL, USA, December 2-5, 2012, 2012, pp. 234–239.
[Online]. Available: http://dx.doi.org/10.1109/SLT.2012.6424228
[8] M. Sundermeyer, R. Schlüter, and H. Ney, “LSTM neural networks for language modeling,” in
Interspeech, Portland, OR, USA, Sep. 2012, pp. 194–197.
[9] F. Morin and Y. Bengio, “Hierarchical probabilistic neural network
language model,” in Proceedings of the Tenth International Workshop
on Artificial Intelligence and Statistics, AISTATS 2005, Bridgetown,
Barbados, January 6-8, 2005, 2005. [Online]. Available: http:
//www.gatsby.ucl.ac.uk/aistats/fullpapers/208.pdf
[10] R. Kiros, R. Salakhutdinov, and R. S. Zemel, “Multimodal neural
language models,” in Proceedings of the 31th International Conference
on Machine Learning, ICML 2014, Beijing, China, 21-26 June 2014,
2014, pp. 595–603. [Online]. Available: http://jmlr.org/proceedings/
papers/v32/kiros14.html
[11] Y. Bengio, P. Simard, and P. Frasconi, “Learning long-term dependen-
cies with gradient descent is difficult,” IEEE Transactions on Neural
Networks, vol. 5, no. 2, pp. 157–166, 1994.
[12] S. Hochreiter and J. Schmidhuber, “Long short-term memory,” Neural
Comput., vol. 9, no. 8, pp. 1735–1780, Nov. 1997. [Online]. Available:
http://dx.doi.org/10.1162/neco.1997.9.8.1735
[13] R. Pascanu, C. Gulcehre, K. Cho, and Y. Bengio, “How to construct
deep recurrent neural networks.” CoRR, vol. abs/1312.6026, 2013.
[14] K. Nguyen and C. Ock, “Diacritics restoration in vietnamese: Letter
based vs. syllable based model,” in PRICAI 2010: Trends in Artificial
Intelligence, 11th Pacific Rim International Conference on Artificial
Intelligence, Daegu, Korea, August 30-September 2, 2010. Proceedings,
ser. Lecture Notes in Computer Science, B. Zhang and M. A. Orgun,
Eds., vol. 6230. Springer, 2010, pp. 631–636.
[15] D. Dinh, P. Pham, and Q. Ngo, “Some lexical issues in building
electronic vietnamese dictionary,” in Proceedings of PAPILLON-2003
Workshop on Multilingual Lexical Databases, 2003.
[16] H. P. Le, N. T. M. Huyền, A. Roussanaly, and H. T. Vinh, “A hybrid
approach to word segmentation of vietnamese texts,” in Language and
Automata Theory and Applications, Second International Conference,
LATA 2008, Tarragona, Spain, March 13-19, 2008. Revised Papers,
ser. Lecture Notes in Computer Science, C. Martín-Vide, F. Otto, and
H. Fernau, Eds., vol. 5196. Springer, 2008, pp. 240–249.
[17] G. A. Miller, “WordNet: A lexical database for English,” Commun.
ACM, vol. 38, no. 11, pp. 39–41, Nov. 1995. [Online]. Available:
http://doi.acm.org/10.1145/219717.219748
[18] M. Abadi, A. Agarwal, P. Barham, E. Brevdo, Z. Chen, C. Citro, G. S.
Corrado, A. Davis, J. Dean, M. Devin et al., “Tensorflow: Large-scale
machine learning on heterogeneous distributed systems,” arXiv preprint
arXiv:1603.04467, 2016.
[19] V. B. Le, B. Bigi, L. Besacier, and E. Castelli, “Using the web for fast
language model construction in minority languages,” in 8th European
Conference on Speech Communication and Technology, EUROSPEECH
2003 - INTERSPEECH 2003, Geneva, Switzerland, September 1-4,
2003. ISCA, 2003.
[20] L. Q. Ha, T. T. T. Van, H. T. Long, N. H. Tinh, N. N. Tham,
and L. T. Ngoc, “A Posteriori individual word language models
for vietnamese language,” IJCLCLP, vol. 15, no. 2, 2010. [Online].
Available: http://www.aclclp.org.tw/clclp/v15n2/v15n2a2.pdf
[21] A. Gandhe, F. Metze, and I. R. Lane, “Neural network language models
for low resource languages,” in INTERSPEECH 2014, 15th Annual
Conference of the International Speech Communication Association,
Singapore, September 14-18, 2014, H. Li, H. M. Meng, B. Ma, E. Chng,
and L. Xie, Eds. ISCA, 2014, pp. 2615–2619.