Thomas Delteil – Machine Learning Scientist @ AWS AI
tdelteil@amazon.com
8th March 2018
Recent advances in
Natural Language Processing
Objective
- NLP domain overview
- Traditional methods
- Word Embeddings (word2vec)
- Contextualized word embeddings (ELMo)
- Bidirectional Encoder Representations from Transformers (BERT)
- Generative Pre-Training 2 (GPT-2)
What is covered in NLP
Text classification
Language Modelling
P(w_t | w_{t-1}, w_{t-2}, …): "See you later […]" → alligator? today?
P(w_t | w_{t+1}, w_{t+2}, …): "[…] abhors a vacuum" → Nature? Fido?
Automatic Text Generation
http://botpoet.com
David Gascoyne:
The crow crooked on more beautiful and free,
He journeyed off into the quarter sea.
His radiant ribs girdled empty and very
least beautiful as dignified to see.
The smooth plain with its mirrors listens to the cliff
Like a basilisk eating flowers.
And the children, lost in the shadows of the catacombs,
Call to the mirrors for help:
“Strong-bow of salt, cutlass of memory,
Write on my map the name of every river.”
Natural Language Understanding
"Alexa, remind me to buy groceries after work"
Intent detection: Create Reminder
Slot filling: What · When · Where
Machine Translation
Sometimes, in the morning, I wonder whether AI bots will kill us all
時々、午前中に、AIボットが私たち全員を殺すのだろうか?
Text Summarization
A Neural Attention Model for Abstractive Sentence Summarization, Alexander M. Rush et al. 2015
Question Answering:
“Who was president when Barack Obama was born?”
John Fitzgerald Kennedy
Part of speech tagging
Sentence similarity
Commonsense Reasoning
Coreference Resolution
…
Classical Methods
Text representation:
Lexicon-based → quickly explodes with N >> 10,000 → text pre-processing
Text Pre-Processing
I'd love to drive again in the mountainous roads of Crete.
→ Normalization: I would love to drive again in the mountainous roads of crete.
→ Tokenization: I · would · love · to · drive · again · in · the · mountainous · roads · of · crete · .
→ Stop-word removal: would · love · drive · again · mountainous · roads · crete · .
→ Lemmatization: would · love · drive · again · mountain · road · crete · .
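A minimal sketch of this pipeline in plain Python. The stop-word list, contraction table, and lemma table are toy stand-ins; a library such as NLTK or spaCy would normally supply these resources:

```python
import re

STOP_WORDS = {"i", "to", "in", "the", "of"}            # toy list; real lists are larger
LEMMAS = {"mountainous": "mountain", "roads": "road"}  # toy table; real lemmatizers use rules
CONTRACTIONS = {"i'd": "i would"}

def preprocess(text):
    # Normalization: lowercase and expand contractions
    text = text.lower()
    for contraction, expansion in CONTRACTIONS.items():
        text = text.replace(contraction, expansion)
    # Tokenization: words and punctuation become separate tokens
    tokens = re.findall(r"\w+|[^\w\s]", text)
    # Stop-word removal
    tokens = [t for t in tokens if t not in STOP_WORDS]
    # Lemmatization: map each token to its lemma when known
    return [LEMMAS.get(t, t) for t in tokens]

print(preprocess("I'd love to drive again in the mountainous roads of Crete."))
# ['would', 'love', 'drive', 'again', 'mountain', 'road', 'crete', '.']
```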
Grapheme/Token representation: One-Hot encoding
Define each word as a vector.
I'd love to drive … → preprocessing → would · love · drive
With the dictionary {would, love, drive}:
would = [1, 0, 0]
love = [0, 1, 0]
drive = [0, 0, 1]
Sentence representation: Bag of words
Sum of the one-hot encoded word vectors.
I'd love to drive … → with the dictionary of size 3 above: [1, 1, 1]
If the dictionary size >> 1, the resulting vector is mostly zeros: very sparse!
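A minimal plain-Python sketch of one-hot encoding and the bag-of-words sum over the toy dictionary above:

```python
vocabulary = ["would", "love", "drive"]          # toy dictionary of size 3
word_to_index = {w: i for i, w in enumerate(vocabulary)}

def one_hot(word):
    vec = [0] * len(vocabulary)
    vec[word_to_index[word]] = 1
    return vec

def bag_of_words(tokens):
    # Sum of the one-hot vectors of the tokens present in the dictionary
    vec = [0] * len(vocabulary)
    for t in tokens:
        if t in word_to_index:
            vec[word_to_index[t]] += 1
    return vec

print(one_hot("love"))                           # [0, 1, 0]
print(bag_of_words(["would", "love", "drive"]))  # [1, 1, 1]
```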
TF*IDF
Term frequency * inverse document frequency
TF = (number of times the term appears in the document) / (total number of terms in the document)
IDF = ln( (number of documents) / (number of documents the term appears in) )
[Example TF*IDF document vector: mostly zeros, with a few non-zero weights (e.g. 2.3, 0.1, 8, 1.2, 0.5) for the terms that appear in the document]
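A minimal plain-Python sketch of the two formulas above over a toy corpus of tokenised documents:

```python
import math

def tf(term, doc):
    # Number of times the term appears in the doc / total number of terms
    return doc.count(term) / len(doc)

def idf(term, corpus):
    # ln(number of documents / number of documents the term appears in)
    n_containing = sum(1 for doc in corpus if term in doc)
    return math.log(len(corpus) / n_containing)

corpus = [
    ["would", "love", "drive", "again", "mountain", "road", "crete"],
    ["love", "road", "trip"],
    ["crete", "beach"],
]
doc = corpus[0]
print({t: tf(t, doc) * idf(t, corpus) for t in doc})
```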
Classifiers
SVM
MLP
Naïve Bayes
XGBoost
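These classifiers typically consume the TF*IDF vectors directly. A hedged sketch with scikit-learn, assuming it is installed; the texts and labels are made-up toy data:

```python
from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.pipeline import make_pipeline
from sklearn.svm import LinearSVC

# Toy data, purely illustrative
texts = ["I love this kindle", "easy to use", "not easy to use", "I need help"]
labels = ["pos", "pos", "neg", "neg"]

# TF*IDF features feeding a linear SVM classifier
model = make_pipeline(TfidfVectorizer(), LinearSVC())
model.fit(texts, labels)
print(model.predict(["this kindle is easy to love"]))
```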
Limitations: no semantic information
With one-hot encoding, every pair of distinct words is equidistant:
|| v_automobile – v_car ||_2 = || v_automobile – v_mountain ||_2 = √2
Ideally we would want:
|| v_automobile – v_car ||_2 ≈ 0
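A quick numpy check of this, with three toy one-hot vectors:

```python
import numpy as np

v_automobile, v_car, v_mountain = np.eye(3)       # three one-hot vectors
print(np.linalg.norm(v_automobile - v_car))       # 1.414... = sqrt(2)
print(np.linalg.norm(v_automobile - v_mountain))  # 1.414... = sqrt(2), same distance
```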
Word order matters
• Context dependent information
• The place of the word in the sentence matters
My kindle is easy to use,
I do not need help
I do need help, my kindle
is not easy to use
Better grapheme representation
Better context understanding
Word2vec: Efficient Estimation of Word Representations in Vector Space
Mikolov et al. 2013
Learn word embeddings:
Skip-gram: predict context given center word
Continuous Bag of Words (CBOW): predict center word given context
CBOW model
… The cake is a lie …
Context words at
t-2 and t-1
Context words at
t+1 and t+2
Word to predict at t
Estimate: P(w_t | w_{t-2}, w_{t-1}, w_{t+1}, w_{t+2})
Learning process: ℒ = −log P(w_t | w_{t-2}, w_{t-1}, w_{t+1}, w_{t+2})
source: https://lilianweng.github.io/lil-log/2017/10/15/learning-word-embedding.html
source: https://opensource.googleblog.com/2013/08/learning-meaning-behind-words.html
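A minimal numpy sketch of the CBOW forward pass and the loss above; the matrices are randomly initialised toy parameters, and real training would backpropagate through W_in and W_out:

```python
import numpy as np

V, N = 5000, 100                       # vocabulary size, embedding size
W_in = np.random.randn(V, N) * 0.01    # input (context) embeddings
W_out = np.random.randn(N, V) * 0.01   # output projection

def cbow_loss(context_ids, center_id):
    # Average the context word embeddings (the CBOW "bag")
    h = W_in[context_ids].mean(axis=0)  # shape (N,)
    scores = h @ W_out                  # shape (V,)
    probs = np.exp(scores - scores.max())
    probs /= probs.sum()                # softmax over the vocabulary
    return -np.log(probs[center_id])    # -log P(w_t | context)

# "... the cake is a lie ...": predict the center word from its 4 context words
print(cbow_loss(context_ids=[10, 42, 7, 99], center_id=3))
```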
Using Word Representations in Neural Networks
Amazon is amazing → indexing → 2910 · 79 · 1927 → lookup → W2910 · W79 · W1927 → Neural Layers → Output Layer
The embedding matrix [W1, W2, …, Wi, …, W|V|] has shape |V| × N.
{Wi} are the word embeddings: parameters that the neural network can modify through backpropagation. They can also be pre-trained.
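A minimal numpy sketch of the indexing-and-lookup step, with toy shapes and a random embedding matrix:

```python
import numpy as np

V, N = 10000, 300
embedding_matrix = np.random.randn(V, N)  # {W_i}: |V| x N, trainable or pre-trained

token_ids = [2910, 79, 1927]              # "Amazon is amazing" after indexing
embedded = embedding_matrix[token_ids]    # lookup: shape (3, N), one row per token
print(embedded.shape)
```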
Recurrent Neural Networks
Recurrent Neural Network: Language Modelling
Input: <BOS> Amazon is amazing → ids 1 · 2910 · 79 · 1927 → embeddings W1 · W2910 · W79 · W1927 (each of size N)
h_init → RNN → h0 → RNN → h1 → RNN → h2 → RNN → h3
Each hidden state goes through a projection (Proj) to a distribution over the vocabulary: P(w | h0), P(w | h1), P(w | h2), P(w | h3)
loss = −log P(w=Amazon | h0) − log P(w=is | h1) − log P(w=amazing | h2) − log P(w=<EOS> | h3)
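A minimal numpy sketch of this unrolled RNN language model and its loss; the vanilla-RNN cell, shapes, and random weights are toy stand-ins, with no bias terms or training loop:

```python
import numpy as np

V, N, H = 5000, 64, 128
W_e = np.random.randn(V, N) * 0.01      # embeddings
W_xh = np.random.randn(N, H) * 0.01     # input-to-hidden
W_hh = np.random.randn(H, H) * 0.01     # hidden-to-hidden
W_proj = np.random.randn(H, V) * 0.01   # projection to the vocabulary

def rnn_lm_loss(token_ids):
    h = np.zeros(H)                     # h_init
    loss = 0.0
    # Predict each token from the hidden state after the previous token
    for prev_id, next_id in zip(token_ids[:-1], token_ids[1:]):
        h = np.tanh(W_e[prev_id] @ W_xh + h @ W_hh)
        scores = h @ W_proj
        probs = np.exp(scores - scores.max())
        probs /= probs.sum()
        loss += -np.log(probs[next_id])  # -log P(w_next | h)
    return loss

# <BOS> Amazon is amazing <EOS> -> ids [1, 2910, 79, 1927, 2]
print(rnn_lm_loss([1, 2910, 79, 1927, 2]))
```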
Convolutional Neural Network for Text Classification
Source: Character-level Convolutional Networks for Text Classification, Zhang et al. 2015
[Figure: character embeddings convolved along the time dimension]
Convolutional Neural Network
Input: <BOS> Amazon is amazing <EOS> → ids 1 · 2910 · 79 · 1927 · 2 → embedding matrix of shape N × T with columns W1, W2910, W79, W1927, W2
A bank of filters slides along the time axis, producing feature maps with entries C_{i,t} (filter i at position t): C_{0,0}, C_{0,1}, C_{0,2}, …, C_{5,2}.
Stacking convolutional layers grows the receptive field and allows long-range dependencies.
The pooled features x0, x1, …, xn are multiplied by class weights (Wpos, Wneut, Wneg) and passed through a softmax:
Pos 92% · Neutral 8% · Neg 0%
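A minimal numpy sketch of one convolutional layer over token embeddings, followed by max-over-time pooling and a softmax classifier; shapes and weights are illustrative toys:

```python
import numpy as np

N, T, C, k = 300, 5, 6, 3             # embedding size, sequence length, filters, kernel width
X = np.random.randn(N, T)             # embedded sentence, one column per token
filters = np.random.randn(C, N, k)    # C filters, each spanning the full embedding depth

def conv1d(X, filters):
    # Slide each filter along the time axis, producing C_{i,t}
    C, _, k = filters.shape
    T = X.shape[1]
    out = np.zeros((C, T - k + 1))
    for i in range(C):
        for t in range(T - k + 1):
            out[i, t] = np.sum(filters[i] * X[:, t:t + k])
    return out

features = conv1d(X, filters)
pooled = features.max(axis=1)         # max-over-time pooling: one value per filter
W_cls = np.random.randn(C, 3)         # class weights for pos / neutral / neg
scores = pooled @ W_cls
probs = np.exp(scores - scores.max()); probs /= probs.sum()
print(dict(zip(["pos", "neutral", "neg"], probs.round(2))))
```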
Limitations
• Rare words are not well represented, or simply mapped to <UNK>
Half-way solutions:
• fastText: sum of subword embeddings (sketched below)
• Character n-grams
• Byte Pair Encoding (BPE)
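A minimal sketch of the fastText-style idea: represent a word as the sum of its character n-gram vectors. The hashing scheme and bucket count here are illustrative stand-ins (fastText itself uses an FNV hash, and Python's str hash is randomised across runs):

```python
import numpy as np

def char_ngrams(word, n=3):
    # fastText-style subwords: add boundary markers, then slide a window
    padded = f"<{word}>"
    return [padded[i:i + n] for i in range(len(padded) - n + 1)]

print(char_ngrams("where"))  # ['<wh', 'whe', 'her', 'ere', 're>']

# A rare word's vector can be approximated by the sum of its subword vectors
dim, buckets = 100, 2**20
subword_vectors = np.random.randn(buckets, dim) * 0.01
word_vec = sum(subword_vectors[hash(g) % buckets] for g in char_ngrams("mountainous"))
print(word_vec.shape)  # (100,)
```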
Limitations
Polysemy: the meaning of a word
• Java
• Python
depends on the context:
• I love travelling. I am going to explore Java.
https://en.wikipedia.org/wiki/Java
Limitations
Context can be bidirectional:
I went to the bank, to drop off some money
ELMo Embeddings (Peters et al. 18)
Contextualized word embeddings
Pre-training on bidirectional language modelling:
- Input tokens x_bos, x_1, x_2, …, x_n, x_eos pass through a character-CNN embedding layer (parameters Θ_e)
- A stack of forward LSTMs and a stack of backward LSTMs (parameters Θ_LSTM, one set per direction) produce hidden states h for every token at every layer
- A shared Softmax layer (parameters Θ_s) predicts the next and previous tokens y_1, y_2, y_3, …, y_n
ELMo Embeddings (Peters et al. 18)
Contextualized word embeddings
Fine-tuning: a task-specific neural network consumes representations R_1, R_2, R_3, …, R_n, each a learnt linear combination of the hidden states from the character-CNN embedding layer and the LSTM layers.
ELMo Embeddings (Peters et al. 18)
Pretraining on a language model: input sentence → ELMo → Softmax layer → output (probabilities over V)
Training on your task: input sentence → ELMo → your task NN → output
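A minimal numpy sketch of ELMo's learnt linear combination (the paper's softmax-normalised layer weights s and scaling factor γ) for a single token; shapes are toys:

```python
import numpy as np

L, dim = 3, 1024                      # layers (char-CNN + 2 biLSTM), representation size
h = np.random.randn(L, dim)           # hidden states of one token at each layer

# Task-specific learnt parameters: softmax-normalised layer weights s, scalar gamma
s_raw = np.random.randn(L)
s = np.exp(s_raw) / np.exp(s_raw).sum()
gamma = 1.0

R = gamma * (s[:, None] * h).sum(axis=0)  # ELMo representation of the token
print(R.shape)  # (1024,)
```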
BERT (Devlin et al. 18)
Bidirectional Encoder Representations from Transformers (BERT)
BERT (Devlin et al. 18)
Inspired by "Improving Language Understanding by Generative Pre-Training",
Radford et al. 2018 (GPT-1, OpenAI)
Based on the Transformer and its multi-head self-attention mechanism
Source: https://lilianweng.github.io/lil-log/2018/06/24/attention-attention.html
BERT (Devlin et al. 18)
Self-attention: "The apple is red, it is delicious"
Every token attends to every other token, so "it" can resolve to "apple".
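A minimal numpy sketch of single-head scaled dot-product self-attention; BERT stacks many such heads and layers. Shapes and projections are random toys:

```python
import numpy as np

def softmax(x, axis=-1):
    e = np.exp(x - x.max(axis=axis, keepdims=True))
    return e / e.sum(axis=axis, keepdims=True)

def self_attention(X, Wq, Wk, Wv):
    # Scaled dot-product attention over a sequence of token vectors X (T x d)
    Q, K, V = X @ Wq, X @ Wk, X @ Wv
    weights = softmax(Q @ K.T / np.sqrt(K.shape[-1]))  # how much each token attends to every other
    return weights @ V

T, d = 8, 64                          # "The apple is red , it is delicious"
X = np.random.randn(T, d)
Wq, Wk, Wv = (np.random.randn(d, d) * 0.1 for _ in range(3))
print(self_attention(X, Wq, Wk, Wv).shape)  # (8, 64)
```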
BERT (Devlin et al. 18)
BERT INPUT REPRESENTATION:
INPUT tokens → sum of WordPiece embeddings + sentence (segment) embeddings + position embeddings, all learned during the (pre)training process.
In pre-training, 15% of the input tokens are replaced by [MASK] (embedding E_MASK) for the masked LM task.
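A minimal numpy sketch of this input representation: sum three learned embedding tables, then mask roughly 15% of positions. The ids and shapes are illustrative, not the real WordPiece vocabulary:

```python
import numpy as np

V, S, P, dim = 30000, 2, 512, 768     # WordPieces, segments, positions, hidden size
wordpiece_emb = np.random.randn(V, dim) * 0.02
segment_emb = np.random.randn(S, dim) * 0.02
position_emb = np.random.randn(P, dim) * 0.02

token_ids = np.array([101, 2054, 2003, 102])  # toy ids; 101/102 stand in for [CLS]/[SEP]
segment_ids = np.array([0, 0, 0, 0])

# Input representation = WordPiece + segment + position embeddings (all learned)
X = (wordpiece_emb[token_ids]
     + segment_emb[segment_ids]
     + position_emb[np.arange(len(token_ids))])

# Masked-LM pre-training: mask ~15% of the input positions
mask = np.random.rand(len(token_ids)) < 0.15
print(X.shape, mask)
```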
BERT (Devlin et al. 18)
Training objectives in slightly modified BERT models for downstream tasks. (Image source: original paper)
Fine-tuning
For classification tasks: take the final hidden state of the [CLS] token, h_L^[CLS], and a small weight matrix W_CLS:
softmax(h_L^[CLS] · W_CLS)
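A minimal numpy sketch of that classification head; shapes are toys, and W_CLS is the only new parameter introduced for fine-tuning:

```python
import numpy as np

dim, n_classes = 768, 2
h_cls = np.random.randn(dim)                    # final hidden state of the [CLS] token
W_cls = np.random.randn(dim, n_classes) * 0.02  # small task-specific weight matrix

scores = h_cls @ W_cls
probs = np.exp(scores - scores.max()); probs /= probs.sum()
print(probs)  # softmax(h_L^[CLS] W_CLS)
```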
- No need for a custom neural network architecture for fine-tuning
BERT (Devlin et al. 18)
XLNet (Yang et al. 19)
XLNet: Generalized Autoregressive Pretraining for Language
Understanding
Problems with BERT:
1. The [MASK] token used in pre-training does not appear during fine-tuning
2. BERT predicts masked tokens independently of each other:
I went to [MASK] [MASK] and saw the [MASK] [MASK] [MASK].
XLNet (Yang et al. 19)
XLNet: Generalized Autoregressive Pretraining for Language
Understanding
Bidirectional context through autoregressive prediction of tokens in randomly permuted orders
OpenAI GPT-2 (Radford et al. 19)
Language Models are Unsupervised Multitask Learners
Source: original paper
Trained on the language modelling task
40 GB text corpus
1.5B parameters
OpenAI GPT-2 (Radford et al. 19)
Language Models are Unsupervised Multitask Learners
All the downstream language tasks are framed as predicting conditional
probabilities and there is no task-specific fine-tuning.
Zero-shot learning:
Summarization: P(w | "text to summarize" + "TL;DR: <?>")
Question Answering: P(w | "text" + "Q: … A: … Q: … A: <?>")
Machine Translation: P(w | "I like computers = J'aime les ordinateurs; I live in Vancouver = <?>")
Source: https://blog.openai.com/better-language-models/
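One way to reproduce this prompting pattern today is with the Hugging Face transformers library; this is an assumption for illustration, as the library, the "gpt2" checkpoint name, and the generation settings are not part of the original talk:

```python
# Requires: pip install transformers torch  (illustrative, not from the original talk)
from transformers import GPT2LMHeadModel, GPT2Tokenizer

tokenizer = GPT2Tokenizer.from_pretrained("gpt2")
model = GPT2LMHeadModel.from_pretrained("gpt2")

# Zero-shot summarization: append the "TL;DR:" cue and let the LM continue
prompt = "Some long article text to summarize. TL;DR:"
input_ids = tokenizer.encode(prompt, return_tensors="pt")
output = model.generate(input_ids, max_length=input_ids.shape[1] + 40)
print(tokenizer.decode(output[0]))
```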
Source: original paper
OpenAI GPT-2 (Radford et al. 19)
Language Models are Unsupervised Multitask Learners
Conclusion
- Count-based word representation (tf-idf)
- Learnt word representation (word2vec)
- Contextualized embeddings + custom network (ELMo)
- Sentence embeddings + fine-tuning (BERT, XLNet)
- Zero-shot transfer with a large language model (GPT-2)
Language representation
Task specific adaptation
References
Word embeddings:
LSA - Indexing by latent semantic analysis, Dumais et al. 1990
Word2Vec - Efficient Estimation of Word Representations in Vector Space, Mikolov et al. 2013
GloVe - GloVe: Global Vectors for Word Representation. Pennington et al. 2014
Subword embeddings
CNN character embedding layer - Character-Aware Neural Language Models, Kim et al. 2015
FastText - Enriching Word Vectors with Subword Information, Bojanowski et al. 2017
WordPiece - Google’s NMT System: Bridging the Gap between Human and Machine Translation, Wu et al. 2016
Contextualized embeddings
ELMo - Deep contextualized word representations, Peters et al. 2018
CoVe - Learned in Translation: Contextualized Word Vectors, McCann et al. 2017
Pre-trained deep learning architecture
Transformer - Attention Is All You Need, Vaswani et al. 2017
OpenAI GPT - Improving language understanding with unsupervised learning, Radford et al. 2018
BERT - BERT: Pre-training of Deep Bidirectional Transformers for Language Understanding, Devlin et al. 2018
OpenAI GPT-2 - Language Models are Unsupervised Multitask Learners, Radford et al. 2019
8th March 2018
Thank you!
tdelteil@amazon.com
github.com/ThomasDelteil
twitter.com/thdelteil
8th March 2018
GluonNLP Toolkit
Editor's Notes
1. Use a softmax cross-entropy loss: minimize −log p, the probability of the context word. This is an unsupervised learning process over large corpora.
2. Task 1: Masked language model (MLM). Task 2: Next sentence prediction. Note that the first token is always forced to be [CLS], a placeholder that will be used later for prediction in downstream tasks.