Transfer learning in NLP

What has changed and why is it important for business?
Jakub Nowacki, PhD
Lead Machine Learning Engineer @ Sotrender
Trainer @ Sages

Transfer Learning
https://medium.com/@pierre_guillou/understand-how-works-resnet-without-talking-about-residual-64698f157e0

Embeddings (Word2vec, FastText etc.)
https://towardsdatascience.com/word-embedding-with-word2vec-and-fasttext-a209c1d3e12c

So what is wrong with that?
[0.0, 0.0, …, 0.0]
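
One possible reading of the zero vector on this slide, sketched in Python: a static embedding table returns the same vector for a word regardless of its context, and has nothing to return for a word it never saw. The vocabulary, the 3-dimensional vectors and the `lookup` helper below are made up for illustration; falling back to a zero vector is one common convention, not the behaviour of any particular library.

```python
from typing import Dict, List

# Hypothetical toy embedding table (values are invented for illustration).
EMBEDDINGS: Dict[str, List[float]] = {
    "bank": [0.2, -0.1, 0.7],
    "river": [0.9, 0.3, -0.4],
    "money": [-0.5, 0.8, 0.1],
}
DIM = 3


def lookup(word: str) -> List[float]:
    """Return the static vector for a word, or a zero vector if it is unknown."""
    return EMBEDDINGS.get(word.lower(), [0.0] * DIM)


# The same vector comes back for "bank" in both phrases,
# although the meaning differs with the context.
river_bank = [lookup(w) for w in "river bank".split()]
money_bank = [lookup(w) for w in "money bank".split()]

# A word outside the vocabulary collapses to all zeros.
unknown = lookup("cyberbullying")
```

Here `river_bank[1]` and `money_bank[1]` are identical, and `unknown` carries no information at all, which is the gap contextualized embeddings aim to close.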

Contextualized word-embeddings
http://jalammar.github.io/illustrated-bert/

Language model
https://medium.com/@plusepsilon/the-bidirectional-language-model-1f3961d1fb27
"A statistical language model is a probability distribution over sequences of words. Given such a sequence, say of length m, it assigns a probability P(w_1, ..., w_m) to the whole sequence."
Wikipedia: https://en.wikipedia.org/wiki/Language_model
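
To make the definition concrete, here is a minimal sketch of a bigram language model in Python. It approximates P(w_1, ..., w_m) as a product of conditional probabilities P(w_i | w_{i-1}) estimated from counts. The three-sentence corpus, the `<s>`/`</s>` boundary tokens and the function name are assumptions for illustration; the models discussed in this deck (ELMo, BERT) are neural and far more expressive.

```python
from collections import Counter
from typing import List

# Hypothetical toy corpus with explicit sentence boundary tokens.
corpus = [
    ["<s>", "the", "cat", "sat", "</s>"],
    ["<s>", "the", "dog", "sat", "</s>"],
    ["<s>", "the", "cat", "ran", "</s>"],
]

context_counts: Counter = Counter()  # how often a word appears as a left context
bigram_counts: Counter = Counter()   # how often a (previous, current) pair appears
for sentence in corpus:
    for prev, cur in zip(sentence, sentence[1:]):
        context_counts[prev] += 1
        bigram_counts[(prev, cur)] += 1


def sequence_probability(words: List[str]) -> float:
    """Estimate P(w_1, ..., w_m) as a product of bigram probabilities."""
    tokens = ["<s>"] + words + ["</s>"]
    probability = 1.0
    for prev, cur in zip(tokens, tokens[1:]):
        if context_counts[prev] == 0:
            return 0.0  # unseen context: no estimate without smoothing
        probability *= bigram_counts[(prev, cur)] / context_counts[prev]
    return probability


p_seen = sequence_probability(["the", "cat", "sat"])
p_unseen = sequence_probability(["the", "dog", "ran"])
```

With this corpus `p_seen` works out to 1 × 2/3 × 1/2 × 1 = 1/3, while `p_unseen` is 0 because "dog ran" never occurs. Unsmoothed count-based models break down on unseen sequences in exactly this way.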

ELMo

LSTM vs Transformer
https://medium.com/inside-machine-learning/what-is-a-transformer-d07dd1fbec04
https://colah.github.io/posts/2015-08-Understanding-LSTMs/

BERT


NLP’s ImageNet moment
http://ruder.io/nlp-imagenet/

Cyberbullying
https://kidshelpline.com.au/teens/issues/cyberbullying

PolEval 2019 Cyberbullying
http://poleval.pl/tasks/task6
Precision = 0.5
Recall = 0.5522
F1-score = 0.5248 (balanced)
Accuracy = 0.866
🏆 4th place (theoretically, since we didn’t take part)
Model architecture (diagram): Document → Word embeddings (FastText) + Flair embeddings (forward) + Flair embeddings (backward) → Stacked embeddings → BiLSTM (with dropouts) → Linear → Harmful?
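
As a quick sanity check of the metrics on this slide, the reported F1-score can be recomputed from the reported precision and recall, since the balanced F1 is their harmonic mean. The sketch below uses only the numbers from the slide; the function name is an assumption.

```python
def f1_score(precision: float, recall: float) -> float:
    """Balanced F1: harmonic mean of precision and recall."""
    if precision + recall == 0:
        return 0.0
    return 2 * precision * recall / (precision + recall)


# Values reported on the slide.
precision = 0.5
recall = 0.5522

f1 = f1_score(precision, recall)
```

Rounded to four places, `f1` matches the 0.5248 on the slide. The accuracy of 0.866 is much higher than the F1, which is typical when the harmful class is much rarer than the non-harmful one.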

PolEval 2019 Cyberbullying
http://poleval.pl/tasks/task6
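@anonymized_account Czyżby Madryt brał przykład z Warszawy?
(English: "Is Madrid perhaps following Warsaw's example?")
@anonymized_account @anonymized_account No to Skończmy k**** z tym wersalem w j****** szczujni
(English, roughly: "Well then, let's f****** drop the niceties in this f****** hate-mongering outlet")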

The pros and cons
Shallow embeddings (Word2Vec, FastText etc.)
Pros:
• Easy to train
• Small
• A lot of existing models
Cons:
• Same embedding for different meaning
• May have issues with inflection
• May have issues with out-of-vocabulary (OOV) words

Contextualized Embeddings (ELMo, Flair etc.)
Pros:
• Embedding based on the context
• Moderate size and training speed
• Existing models
• No OOV problem
Cons:
• Require extra network architecture
• LSTMs are rather slow
• Should be used along with shallow embeddings

Transformer-based models (e.g. BERT etc.)
Pros:
• Task-agnostic model
• Can be used as embeddings or tuned
• Existing models
• Faster than LSTMs
• No OOV problem
Cons:
• Can be really large
• Hard to tune and even harder to train (TPUs almost a must)
• Multilingual versions are very large
https://lilianweng.github.io/lil-log/2019/01/31/generalized-language-models.html
