NLP Town's Yves Peirsman talks about how word embeddings, LSTMs/RNNs, attention, encoder-decoder architectures, and more have pushed NLP forward, including which challenges remain to be tackled (and some techniques to do just that).
Technical Development Workshop - Text Analytics with Python (Michelle Purnama)
In this workshop, we covered:
- How to code using Python on Jupyter Notebook
- What NLP (Natural Language Processing) is and how it is useful for analyzing the sentiment of a given text
- Hands-on coding exercises on how to use Python and its libraries to perform tokenization, stopwords removal, stemming and lemmatization, and POS tagging.
Parallel Corpora in (Machine) Translation: goals, issues and methodologies (Antonio Toral)
Parallel corpora play a central role in current approaches to machine and computer-assisted translation and also in any corpus-based study that involves original text and its translation. This talk motivates the use of parallel data, as well as its desired properties. It then introduces practical methodologies to automatically acquire and prepare parallel data for the task at hand. Finally, it glances at the neighbouring field of Translation Studies to assert that translations can differ to a great extent depending on the strategy followed by the translator, which might lead to the translation being more or less appropriate for its use in corpus-based studies.
Machine translation from English to Hindi (Rajat Jain)
Machine translation is a part of natural language processing. The suggested algorithm is a word-based algorithm. We have implemented translation from English to Hindi.
submitted by
Garvita Sharma,10103467,B3
Rajat Jain,10103571,B6
MORPHOLOGICAL ANALYZER USING THE BILSTM MODEL ONLY FOR JAPANESE HIRAGANA SENT... (kevig)
This study proposes a method to develop neural models of the morphological analyzer for Japanese Hiragana sentences using the Bi-LSTM CRF model. Morphological analysis is a technique that divides text data into words and assigns information such as parts of speech. In Japanese natural language processing systems, this technique plays an essential role in downstream applications because the Japanese language does not have word delimiters between words. Hiragana is a type of Japanese phonogramic character, used in texts for children or people who cannot read Chinese characters. Morphological analysis of Hiragana sentences is more difficult than that of ordinary Japanese sentences because less information is available for segmentation. For morphological analysis of Hiragana sentences, we demonstrated the effectiveness of fine-tuning using a model based on ordinary Japanese text and examined the influence of training data on texts of various genres.
Thomas Wolf "An Introduction to Transfer Learning and Hugging Face" (Fwdays)
In this talk I'll start by introducing the recent breakthroughs in NLP that resulted from the combination of Transfer Learning schemes and Transformer architectures. The second part of the talk will be dedicated to an introduction of the open-source tools released by Hugging Face, in particular our transformers, tokenizers, and NLP libraries as well as our distilled and pruned models.
A ROBUST THREE-STAGE HYBRID FRAMEWORK FOR ENGLISH TO BANGLA TRANSLITERATION (kevig)
Phonetic typing using the English alphabet has become widely popular nowadays for social media and chat services. As a result, text containing various English and Bangla words and phrases has become increasingly common. Existing transliteration tools display poor performance for such texts. This paper proposes a robust Three-stage Hybrid Transliteration (THT) framework that can transliterate both English words and phonetically typed Bangla words satisfactorily. This is achieved by adopting a hybrid approach of dictionary-based and rule-based techniques. Experimental results confirm the superiority of THT, as it significantly outperforms the benchmark transliteration tool.
Past, Present, and Future: Machine Translation & Natural Language Processing ... (John Tinsley)
This was a presentation given at the European Patent Office's annual Patent Information Conference in Madrid, Spain on November 10th, 2016.
In it, we give an overview of how machine translation works, latest advances in neural MT, and how this can be applied to patents and intellectual property content, not only for translations but also information extraction and other NLP applications.
EXTRACTING LINGUISTIC SPEECH PATTERNS OF JAPANESE FICTIONAL CHARACTERS USING ... (kevig)
This study extracted and analyzed the linguistic speech patterns that characterize Japanese anime or game characters. Conventional morphological analyzers, such as MeCab, segment words with high performance, but they are unable to segment broken expressions or utterance endings that are not listed in the dictionary, which often appear in lines of anime or game characters. To overcome this challenge, we propose segmenting lines of Japanese anime or game characters using subword units that were proposed mainly for deep learning, and extracting frequently occurring strings to obtain expressions that characterize their utterances. We analyzed the subword units weighted by TF/IDF according to gender, age, and each anime character, and show that they are linguistic speech patterns specific to each feature. Additionally, a classification experiment shows that the model with subword units outperformed that with the conventional method.
In these slides, the basic concepts of machine translation are described. MT challenges are presented, and rule-based and statistical MT are briefly described. Some notes on evaluation are included as well.
INTEGRATION OF PHONOTACTIC FEATURES FOR LANGUAGE IDENTIFICATION ON CODE-SWITC... (kevig)
In this paper, phoneme sequences are used as language information to perform code-switched language identification (LID). With the one-pass recognition system, the spoken sounds are converted into phonetically arranged sequences of sounds. The acoustic models are robust enough to handle multiple languages when emulating multiple hidden Markov models (HMMs). To determine the phoneme similarity among our target languages, we report two methods of phoneme mapping. Statistical phoneme-based bigram language models (LM) are integrated into speech decoding to eliminate possible phone mismatches. A supervised support vector machine (SVM) is used to learn to recognize the phonetic information of mixed-language speech based on recognized phone sequences. As the back-end decision is taken by an SVM, the likelihood scores of segments with monolingual phone occurrence are used to classify language identity. The speech corpus was tested on Sepedi and English, languages that are often mixed. Our system is evaluated by measuring both the ASR performance and the LID performance separately. The systems obtained a promising ASR accuracy with a data-driven phone merging approach modelled using 16 Gaussian mixtures per state. The proposed systems achieved acceptable ASR and LID accuracy on code-switched speech and monolingual speech segments, respectively.
Material of the 4th Intensive Summer school and collaborative workshop on Natural Language Processing (NAIST Franco-Thai Workshop 2010).
Bangkok, Thailand.
Natural language processing for requirements engineering: ICSE 2021 Technical... (alessio_ferrari)
These are the slides for the technical briefing given at ICSE 2021 by Alessio Ferrari, Liping Zhao, and Waad Alhoshan.
It covers RE tasks to which NLP is applied, an overview of a recent systematic mapping study on the topic, and a hands-on tutorial on using transfer learning for requirements classification.
Please find the links to the colab notebooks here:
https://colab.research.google.com/drive/158H-lEJE1pc-xHc1ISBAKGDHMt_eg4Gn?usp=sharing
https://colab.research.google.com/drive/1B_5ow3rvS0Qz1y-KyJtlMNnmgmx9w3kJ?usp=sharing
https://colab.research.google.com/drive/1Xrm0gNaa41YwlM5g2CRYYXcRvpbDnTRT?usp=sharing
Visual-Semantic Embeddings: some thoughts on Language (Roelof Pieters)
Language technology is rapidly evolving. A resurgence in the use of distributed semantic representations and word embeddings, combined with the rise of deep neural networks has led to new approaches and new state of the art results in many natural language processing tasks. One such exciting - and most recent - trend can be seen in multimodal approaches fusing techniques and models of natural language processing (NLP) with that of computer vision.
The talk is aimed at giving an overview of the NLP part of this trend. It starts with a short overview of the challenges in creating deep networks for language, as well as what makes for a “good” language model, and the specific requirements of semantic word spaces for multi-modal embeddings.
Automatic Grammatical Error Correction for ESL-Learners by SMT - Getting it r... (Marcin Junczys-Dowmunt)
Video recording of full talk: http://lectures.ms.mff.cuni.cz/view.php?rec=255
There has been an increasing interest in using statistical machine translation (SMT) for the task of Grammatical Error Correction (GEC) for English-as-a-Second-Language (ESL) learners. Two of the three highest-scoring systems of the CoNLL-2014 Shared Task were SMT-based. The currently highest-scoring result published for the CoNLL-2014 test set has been achieved by a system combination of the five best CoNLL-2014 submissions built with MEMT (a tool for MT system combination). In this talk, we demonstrate how a single SMT-based system can match and outperform the result of the mentioned combined system. Furthermore, this system outperforms any other published results (including our own CoNLL-2014 submission) for a single system by a margin of several percent F-score when the same training data is used. These results are achieved by adapting current state-of-the-art methods for phrase-based SMT specifically to the problem of GEC.
We report on the effects of:
- Parameter tuning for GEC
- Introducing GEC-specific dense and sparse features
- Using large-scale data
Over the last two years, the field of Natural Language Processing (NLP) has witnessed the emergence of transfer learning methods and architectures which significantly improved upon the state-of-the-art on pretty much every NLP task.
The wide availability and ease of integration of these transfer learning models are strong indicators that these methods will become a common tool in the NLP landscape as well as a major research direction.
In this talk, I'll present a quick overview of modern transfer learning methods in NLP and review examples and case studies on how these models can be integrated and adapted in downstream NLP tasks, focusing on open-source solutions.
Website: https://fwdays.com/event/data-science-fwdays-2019/review/transfer-learning-in-nlp
A Simple Explanation of the paper XLNet (https://arxiv.org/abs/1906.08237).
It would be helpful to get to grips with the concepts behind XLNet before you dive into the paper.
The NLP muppets revolution! @ Data Science London 2019
video: https://skillsmatter.com/skillscasts/13940-a-deep-dive-into-contextual-word-embeddings-and-understanding-what-nlp-models-learn
event: https://www.meetup.com/Data-Science-London/events/261483332/
Learning with limited labelled data in NLP: multi-task learning and beyond (Isabelle Augenstein)
When labelled training data for certain NLP tasks or languages is not readily available, different approaches exist to leverage other resources for the training of machine learning models. Those are commonly either instances from a related task or unlabelled data.
An approach that has been found to work particularly well when only limited training data is available is multi-task learning.
There, a model learns from examples of multiple related tasks at the same time by sharing hidden layers between tasks, and can therefore benefit from a larger overall number of training instances and improve the model's generalisation performance. In the related paradigm of semi-supervised learning, unlabelled data as well as labelled data for related tasks can be easily utilised by transferring labels from labelled instances to unlabelled ones in order to essentially extend the training dataset.
In this talk, I will present my recent and ongoing work in the space of learning with limited labelled data in NLP, including our NAACL 2018 papers 'Multi-task Learning of Pairwise Sequence Classification Tasks Over Disparate Label Spaces' [1] and 'From Phonology to Syntax: Unsupervised Linguistic Typology at Different Levels with Language Embeddings' [2].
[1] https://t.co/A5jHhFWrdw
[2] https://arxiv.org/abs/1802.09375
==========
Bio from my website http://isabelleaugenstein.github.io/index.html:
I have been a tenure-track assistant professor at the University of Copenhagen, Department of Computer Science since July 2017, affiliated with the CoAStAL NLP group, and I work in the general areas of Statistical Natural Language Processing and Machine Learning. My main research interests are weakly supervised and low-resource learning, with applications including information extraction, machine reading and fact checking.
Before starting a faculty position, I was a postdoctoral research associate in Sebastian Riedel's UCL Machine Reading group, mainly investigating machine reading from scientific articles. Prior to that, I was a Research Associate in the Sheffield NLP group, a PhD Student in the University of Sheffield Computer Science department, a Research Assistant at AIFB, Karlsruhe Institute of Technology and a Computational Linguistics undergraduate student at the Department of Computational Linguistics, Heidelberg University.
Here is a presentation I created quite a few years back for students on programming languages. I have updated it with some recent trends.
Moving to neural machine translation at Google - GoPro meetup (Chester Chen)
Main Talk: Google's Neural Machine Translation System and Research progress
Abstract: Neural Machine Translation (NMT) is an end-to-end learning approach for automated translation, with the potential to overcome many of the weaknesses of conventional phrase-based translation systems. In this talk, I will discuss the model architecture, word-piece design, the training algorithm, and how to make training and serving faster. I may also mention zero-shot translation in the multilingual model. In addition, I will cover how translation research has made continuous progress since last year.
Speaker: Xiaobing Liu
Xiaobing Liu is a Google Brain staff software engineer and machine learning researcher. In his work, Xiaobing focuses on TensorFlow and some key applications where TensorFlow can be applied to improve Google products, such as Google Search, Play recommendations, Google Translation, and Medical Brain. His research interests span from systems to the practice of machine learning. His research contributions have been successfully implemented in various commercial products at Tencent, Yahoo, and Google. He has served on the program committee for ACL 2017 and as session chair for AAAI 2017, and has publications at top conferences such as RecSys, NIPS, and ACL.
The best known natural language processing tool is GPT-3, from OpenAI, which uses AI and statistics to predict the next word in a sentence based on the preceding words. NLP practitioners call tools like this “language models,” and they can be used for simple analytics tasks, such as classifying documents and analyzing the sentiment in blocks of text, as well as more advanced tasks, such as answering questions and summarizing reports. Language models are already reshaping traditional text analytics, but GPT-3 was an especially pivotal language model because, at 10x larger than any previous model upon release, it was the first large language model, which enabled it to perform even more advanced tasks like programming and solving high school–level math problems. The latest version, called InstructGPT, has been fine-tuned by humans to generate responses that are much better aligned with human values and user intentions, and Google’s latest model shows further impressive breakthroughs on language and reasoning.
For businesses, the three areas where GPT-3 has appeared most promising are writing, coding, and discipline-specific reasoning. OpenAI, the Microsoft-funded creator of GPT-3, has developed a GPT-3-based language model intended to act as an assistant for programmers by generating code from natural language input. This tool, Codex, is already powering products like Copilot for Microsoft’s subsidiary GitHub and is capable of creating a basic video game simply by typing instructions. This transformative capability was already expected to change the nature of how programmers do their jobs, but models continue to improve — the latest from Google’s DeepMind AI lab, for example, demonstrates the critical thinking and logic skills necessary to outperform most humans in programming competitions.
Models like GPT-3 are considered to be foundation models — an emerging AI research area — which also work for other types of data such as images and video. Foundation models can even be trained on multiple forms of data at the same time, like OpenAI’s DALL·E 2, which is trained on language and images to generate high-resolution renderings of imaginary scenes or objects simply from text prompts. Due to their potential to transform the nature of cognitive work, economists expect that foundation models may affect every part of the economy and could lead to increases in economic growth similar to the industrial revolution.
Adjusting primitives for graph : SHORT REPORT / NOTES (Subhajit Sahu)
Graph algorithms, like PageRank. Compressed Sparse Row (CSR) is an adjacency-list based graph representation that is ...
Multiply with different modes (map)
1. Performance of sequential vs OpenMP-based vector multiply.
2. Comparing various launch configs for CUDA-based vector multiply.
Sum with different storage types (reduce)
1. Performance of vector element sum using float vs bfloat16 as the storage type.
Sum with different modes (reduce)
1. Performance of sequential vs OpenMP-based vector element sum.
2. Performance of memcpy-based vs in-place CUDA vector element sum.
3. Comparing various launch configs for CUDA-based vector element sum (memcpy).
4. Comparing various launch configs for CUDA-based vector element sum (in-place).
Sum with in-place strategies of CUDA mode (reduce)
1. Comparing various launch configs for CUDA-based vector element sum (in-place).
The Building Blocks of QuestDB, a Time Series Database (javier ramirez)
Talk delivered at Valencia Codes Meetup, June 2024.
Traditionally, databases have treated timestamps as just another data type. However, when performing real-time analytics, timestamps should be first-class citizens, and we need rich time semantics to get the most out of our data. We also need to deal with ever-growing datasets while staying performant, which is as fun as it sounds.
It is no wonder time-series databases are now more popular than ever before. Join me in this session to learn about the internal architecture and building blocks of QuestDB, an open-source time-series database designed for speed. We will also review a history of some of the changes we have gone through over the past two years to deal with late and unordered data, non-blocking writes, read replicas, and faster batch ingestion.
Adjusting OpenMP PageRank : SHORT REPORT / NOTES (Subhajit Sahu)
For massive graphs that fit in RAM, but not in GPU memory, it is possible to take advantage of a shared-memory system with multiple CPUs, each with multiple cores, to accelerate PageRank computation. If the NUMA architecture of the system is properly taken into account with good vertex partitioning, the speedup can be significant. To take steps in this direction, experiments are conducted to implement PageRank in OpenMP using two different approaches, uniform and hybrid. The uniform approach runs all primitives required for PageRank in OpenMP mode (with multiple threads). The hybrid approach, on the other hand, runs certain primitives in sequential mode (i.e., sumAt, multiply).
Analysis insight about a Flyball dog competition team's performance (roli9797)
Insights from my analysis of a Flyball dog competition team's performance over the last year. Find more: https://github.com/rolandnagy-ds/flyball_race_analysis/tree/main
Learn SQL from basic queries to advanced queries (manishkhaire30)
Dive into the world of data analysis with our comprehensive guide on mastering SQL! This presentation offers a practical approach to learning SQL, focusing on real-world applications and hands-on practice. Whether you're a beginner or looking to sharpen your skills, this guide provides the tools you need to extract, analyze, and interpret data effectively.
Key Highlights:
Foundations of SQL: Understand the basics of SQL, including data retrieval, filtering, and aggregation.
Advanced Queries: Learn to craft complex queries to uncover deep insights from your data.
Data Trends and Patterns: Discover how to identify and interpret trends and patterns in your datasets.
Practical Examples: Follow step-by-step examples to apply SQL techniques in real-world scenarios.
Actionable Insights: Gain the skills to derive actionable insights that drive informed decision-making.
Join us on this journey to enhance your data analysis capabilities and unlock the full potential of SQL. Perfect for data enthusiasts, analysts, and anyone eager to harness the power of data!
#DataAnalysis #SQL #LearningSQL #DataInsights #DataScience #Analytics
Levelwise PageRank with Loop-Based Dead End Handling Strategy : SHORT REPORT ... (Subhajit Sahu)
Abstract — Levelwise PageRank is an alternative method of PageRank computation which decomposes the input graph into a directed acyclic block-graph of strongly connected components, and processes them in topological order, one level at a time. This enables calculation of ranks in a distributed fashion without per-iteration communication, unlike the standard method where all vertices are processed in each iteration. It however comes with a precondition of the absence of dead ends in the input graph. Here, the native non-distributed performance of Levelwise PageRank was compared against Monolithic PageRank on a CPU as well as a GPU. To ensure a fair comparison, Monolithic PageRank was also performed on a graph where vertices were split by components. Results indicate that Levelwise PageRank is about as fast as Monolithic PageRank on the CPU, but quite a bit slower on the GPU. The slowdown on the GPU is likely caused by a large submission of small workloads, and is expected to be a non-issue when the computation is performed on massive graphs.
3. Deep Learning in NLP
A timeline of NLP paradigms:
● 1950s: Rule-based NLP: hand-written linguistic rules, knowledge models
● 1990: Statistical NLP: machine learning from data, many different models
● 2012: Deep Learning: comeback of neural networks, a unified framework for many problems
● 20??: ??
4. Deep Learning in NLP
● Basic models: the basic NLPer's toolkit
● Main advantages: why has DL become so popular in NLP?
● Beyond the hype: deeper dive & recent trends
5. Word embeddings
Words as atomic units vs. words as dense embeddings:
● "The movie has an excellent cast." (M)
● "I like the cover of the book." (B)
● "There were too many pages in the novel." (?)
With dense embeddings, related words such as book, novel and movie end up close together in the vector space.
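The intuition above can be made concrete with toy vectors. A minimal sketch in plain Python (the 3-dimensional embeddings below are made up for illustration; real embeddings have hundreds of dimensions and are learned from data):

```python
import math

def cosine(u, v):
    # Cosine similarity: dot product divided by the product of the norms.
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(a * a for a in v))
    return dot / (norm_u * norm_v)

# Hypothetical 3-d embeddings: book, novel and movie should end up near
# each other; an unrelated word like "banana" should not.
embeddings = {
    "book":   [0.90, 0.80, 0.10],
    "novel":  [0.85, 0.75, 0.15],
    "movie":  [0.70, 0.90, 0.20],
    "banana": [0.05, 0.10, 0.95],
}

print(cosine(embeddings["book"], embeddings["novel"]))   # close to 1
print(cosine(embeddings["book"], embeddings["banana"]))  # much smaller
```

With word-as-atomic-unit representations, "book" and "novel" are simply different symbols; with dense embeddings, their similarity is measurable.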
7. Models
Distinct models for different problems vs. a unified toolkit:
● Before: a separate pipeline per task (text classification, NER, MT, ...), each with its own preprocessing (PP) and model: SVM, CRF, LM + TM + decoder.
● Now: the same tasks (text classification, NER, MT, ...) handled by a single LSTM (or similar).
8. NLP Toolkit: LSTM for classification
Applications: text classification, language modelling
Diagram: the tokens "The movie was boring ." pass through embeddings into a stack of LSTM cells; a dense layer (weights and biases) on top produces the scores for positive / neutral / negative.
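The final dense-layer-plus-softmax step of the slide can be sketched in a few lines of plain Python (the weights, biases, and the 4-dimensional "LSTM state" below are invented for illustration; in a real model they are learned):

```python
import math

def softmax(scores):
    # Subtract the max for numerical stability before exponentiating.
    m = max(scores)
    exps = [math.exp(s - m) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]

def dense(state, weights, biases):
    # One score per class: dot(state, class_weights) + class_bias.
    return [sum(s * w for s, w in zip(state, ws)) + b
            for ws, b in zip(weights, biases)]

labels = ["positive", "neutral", "negative"]
state = [0.2, -1.3, 0.8, 0.5]          # toy final LSTM hidden state
weights = [[0.1, -0.4, 0.2, 0.3],      # toy per-class weight rows
           [0.0,  0.1, 0.1, 0.0],
           [-0.2, 0.6, -0.3, 0.1]]
biases = [0.1, 0.0, -0.1]

probs = softmax(dense(state, weights, biases))
prediction = labels[max(range(3), key=lambda i: probs[i])]
print(prediction, [round(p, 3) for p in probs])
```

The softmax turns the three raw scores into a probability distribution over the classes, and the class with the highest probability is the prediction.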
9. NLP Toolkit: inside the LSTM
Diagram: an LSTM cell with forget, input and output gates and tanh activations, processing the sequence "The movie was boring ...".
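The gate structure on the slide can be written out directly. A minimal single-unit sketch in plain Python (all weights are scalars chosen arbitrarily for illustration; real cells use learned weight matrices over the concatenation of the previous hidden state and the input):

```python
import math

def sigmoid(z):
    return 1.0 / (1.0 + math.exp(-z))

def lstm_step(x, h, c, p):
    """One LSTM step for a single unit.

    x: input, h: previous hidden state, c: previous cell state,
    p: dict of toy scalar weights/biases per gate.
    """
    f = sigmoid(p["wf_x"] * x + p["wf_h"] * h + p["bf"])    # forget gate
    i = sigmoid(p["wi_x"] * x + p["wi_h"] * h + p["bi"])    # input gate
    g = math.tanh(p["wg_x"] * x + p["wg_h"] * h + p["bg"])  # candidate (tanh)
    o = sigmoid(p["wo_x"] * x + p["wo_h"] * h + p["bo"])    # output gate
    c_new = f * c + i * g          # keep part of the old cell, add new info
    h_new = o * math.tanh(c_new)   # expose a gated view of the cell state
    return h_new, c_new

# Toy scalar parameters (invented for illustration).
params = {"wf_x": 0.5, "wf_h": 0.1, "bf": 0.0,
          "wi_x": 0.6, "wi_h": 0.2, "bi": 0.0,
          "wg_x": 0.9, "wg_h": 0.3, "bg": 0.0,
          "wo_x": 0.4, "wo_h": 0.1, "bo": 0.0}

h, c = 0.0, 0.0
for x in [1.0, -0.5, 0.8]:   # a toy input sequence
    h, c = lstm_step(x, h, c, params)
print(h, c)
```

The forget gate decides how much of the old cell state to keep, the input gate how much of the tanh candidate to add, and the output gate how much of the cell state to expose as the new hidden state.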
10. NLP Toolkit: LSTM for sequence labelling
Applications: named entity recognition
Diagram: the tokens "John lives in London ." pass through embeddings into a stack of LSTM cells; a dense layer (weights and biases) produces per-token logits, yielding the labels B-PER, O, O, B-LOC, O.
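On top of the per-token LSTM outputs, the labels on the slide are just an argmax over per-token logits. A toy sketch (the logits below are fabricated so the example is self-contained; in a real tagger they come from the dense layer):

```python
tags = ["B-PER", "B-LOC", "O"]

# Fabricated per-token logits for the sentence on the slide.
sentence = ["John", "lives", "in", "London", "."]
logits = [
    [2.5, 0.1, 0.3],   # John   -> B-PER
    [0.0, 0.2, 3.1],   # lives  -> O
    [0.1, 0.0, 2.8],   # in     -> O
    [0.3, 2.9, 0.4],   # London -> B-LOC
    [0.0, 0.1, 3.5],   # .      -> O
]

def decode(logits_per_token):
    # Greedy decoding: pick the highest-scoring tag for each token.
    return [tags[max(range(len(row)), key=row.__getitem__)]
            for row in logits_per_token]

for word, tag in zip(sentence, decode(logits)):
    print(word, tag)
```

Taggers that enforce label-sequence constraints (e.g. a CRF layer) replace this per-token argmax with a decoding step over the whole sequence.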
11. NLP Toolkit: Encoder-Decoder Architectures
Applications: machine translation, text summarization, dialogue modelling, etc.
Diagram: an encoder LSTM reads the source sentence "Je t' aime ." through source embeddings; a decoder LSTM with target embeddings then generates "I love you ." token by token until <END>.
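Stripped of the actual neural network, the control flow of the slide is: run the encoder over the source, then let the decoder emit tokens until <END>. A toy sketch where the "decoder" is a lookup table keyed on the previously emitted token (everything below is fabricated purely to show the loop structure):

```python
def encode(source_tokens):
    # Stand-in for the encoder LSTM: any summary of the source will do here.
    return tuple(source_tokens)

# Toy "decoder": maps the previously emitted token to the next one.
NEXT_TOKEN = {"<START>": "I", "I": "love", "love": "you", "you": ".", ".": "<END>"}

def decode(state, max_len=10):
    # Greedy decoding loop: feed each emitted token back in, stop at <END>.
    out, token = [], "<START>"
    while len(out) < max_len:
        token = NEXT_TOKEN[token]   # a real decoder would condition on `state` too
        if token == "<END>":
            break
        out.append(token)
    return out

state = encode(["Je", "t'", "aime", "."])
print(decode(state))
```

A real decoder conditions each step on the encoder state and the full decoding history; the essential shape, encode once and then loop until an end-of-sequence token, is the same.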
12. NLP Toolkit: Attention
Applications: machine translation, question answering, etc.
Diagram: the same encoder-decoder setup for "Je t' aime ." → "I love you .", with an attention mechanism that lets the decoder look back at all encoder states at each decoding step.
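The attention mechanism itself is a small computation: score each encoder state against the current decoder state, softmax the scores into weights, and take a weighted sum. A minimal dot-product-attention sketch in plain Python (the 2-d vectors are invented for illustration):

```python
import math

def softmax(scores):
    m = max(scores)
    exps = [math.exp(s - m) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]

def dot(u, v):
    return sum(a * b for a, b in zip(u, v))

def attend(decoder_state, encoder_states):
    # Dot-product attention: score each encoder state, normalize the
    # scores into weights, then form the weighted-sum context vector.
    weights = softmax([dot(decoder_state, h) for h in encoder_states])
    context = [sum(w * h[d] for w, h in zip(weights, encoder_states))
               for d in range(len(decoder_state))]
    return weights, context

# Toy encoder states (one per source token) and a toy decoder state.
encoder_states = [[0.9, 0.1], [0.2, 0.8], [0.7, 0.7], [0.1, 0.1]]
decoder_state = [1.0, 0.0]   # should attend mostly to the first state

weights, context = attend(decoder_state, encoder_states)
print([round(w, 3) for w in weights])
print([round(c, 3) for c in context])
```

The context vector is then fed into the decoder alongside its own state, so each target token can draw on the most relevant parts of the source.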
13. NLP under threat?
Deep learning models have taken NLP by storm, achieving superior results across many applications.
Many DL approaches do not model any linguistic knowledge.
They view language as a sequence of strings.
Is this the end of NLP as a separate discipline?
14. NLP under threat?
Deep learning models have taken NLP by storm, achieving superior results across many applications.
16. Language models
● Great performance when explicitly trained for the task: 99% correct
○ > 120,000 sentence starts, labelled with singular or plural.
○ 50-dimensional LSTM followed by logistic regression.
○ In > 95% of the cases, the last noun determines the number.
● Performance drop for generic language models: 93% correct
○ Worse than chance on cases where a noun of the “incorrect” number occurs between the subject and the verb
Linzen, Dupoux & Goldberg 2016, https://arxiv.org/pdf/1611.01368.pdf
17. Machine Translation
● NMT can behave strangely
● Problems for languages with a very different syntax, such as English and Chinese:
○ 25% of Chinese noun phrases are translated into discontinuous phrases in English
○ Chinese noun phrases are often translated twice
Li et al. 2017, https://arxiv.org/abs/1705.01020
20. Deep Learning in NLP
● Deep Learning produces great results on many tasks.
● But:
○ Race to the bottom on standard data sets:
■ Language models: Penn Treebank, WikiText-2
■ Machine Translation: WMT datasets
■ Question Answering: SQuAD
○ Its ignorance of linguistic structure is problematic in the evolution towards NLU
● So:
○ What do neural networks model?
○ How can we make them better?
21. Linguistic knowledge in MT
● What linguistic knowledge does MT model?
● Simple syntactic labels
○ Encoder output + logistic regression
■ Word-level output: part-of-speech
■ Sentence-level output: voice (active or passive), tense (past or present)
● Deep syntactic structure
○ Encoder output + decoder to predict parse trees
● Two benchmarks:
○ Upper bound: neural parser
○ Lower bound: English-to-English “MT” auto-encoder
Shi et al. 2016, https://www.isi.edu/natural-language/mt/emnlp16-nmt-grammar.pdf
24. Linguistic knowledge in MT
Solution 1: present the encoder with both syntactic and lexical information
Li et al. 2017, https://arxiv.org/abs/1705.01020
26. Linguistic knowledge in MT
Solution 2: combine MT with parsing in multi-task learning
Eriguchi et al. 2017
http://www.aclweb.org/anthology/P/P17/P17-2012.pdf
27. Linguistic knowledge in MT
Eriguchi et al. 2017
http://www.aclweb.org/anthology/P/P17/P17-2012.pdf
28. Linguistic knowledge in QA
● Most answers to questions are constituents in the sentence.
● Restricting our candidate answers to constituents reduces the search space.
● Instead of feeding the network flat sequences, we need to feed it syntax trees.
Xie and Xing 2017, http://www.aclweb.org/anthology/P/P17/P17-1129.pdf
29. Linguistic knowledge in QA
Xie and Xing 2017, http://www.aclweb.org/anthology/P/P17/P17-1129.pdf
30. Conclusions
● Deep Learning works great for NLP, but it is not a silver bullet.
● For simple tasks, simple string input may suffice, but for deeper natural language understanding likely not.
● To tackle this challenge, we need to:
○ Better understand what neural networks model,
○ Help them model more linguistic knowledge,
○ Combine language with other modalities.
yves@nlp.town