Pre-trained Language Models - from ELMo to GPT-2
Jiwon Kim
jiwon.kim@dpschool.io
Why ‘pre-trained language model’?
● Achieved great performance on a variety of language tasks.
● Similar to how ImageNet classification pre-training helps many vision tasks (*)
● Even better than in CV, it does not require labeled data for pre-training.
(*) Although He et al. (2018) recently found that pre-training might not be necessary for the image segmentation task.
The problem of previous embeddings
feat. Gensim, NLTK, Scikit-learn
Then, what’s the difference between
“I’m eating an apple”
and
“I have an Apple pencil”?
The Gradient - Sebastian Ruder
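A toy sketch of the problem, assuming gensim 4.x: a static word2vec model looks embeddings up by surface form only, so the fruit "apple" and the "Apple" pencil end up with one and the same vector.

```python
# Toy sketch (assumes gensim 4.x): a static word2vec model assigns "apple"
# a single vector, regardless of the context it appears in.
from gensim.models import Word2Vec

sentences = [
    ["i", "am", "eating", "an", "apple"],
    ["i", "have", "an", "apple", "pencil"],
]

model = Word2Vec(sentences, vector_size=50, min_count=1, epochs=100)

# Lookup is purely by surface form: the fruit and the product share one embedding.
print(model.wv["apple"])                        # a single 50-dim vector
print(model.wv.most_similar("apple", topn=3))   # neighbours, same for both senses
```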
Embeddings Dependent on Context
ELMo
ULMFiT
BERT
GPT-2
ELMo: Deep contextualized word
representations
AllenNLP
ELMo - core structure
ELMo - code
feat. TensorFlow Hub
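A minimal sketch of what this demo might look like, assuming TensorFlow 1.x and the public "elmo" module on TensorFlow Hub:

```python
# Minimal sketch (assumes TensorFlow 1.x and the tfhub.dev "elmo" module).
import tensorflow as tf
import tensorflow_hub as hub

# Load the pre-trained ELMo module; trainable=True lets the scalar mixing
# weights over the biLM layers be fine-tuned as well.
elmo = hub.Module("https://tfhub.dev/google/elmo/2", trainable=True)

sentences = ["I'm eating an apple", "I have an Apple pencil"]

# "elmo" output: contextual embeddings, shape [batch, max_seq_len, 1024].
embeddings = elmo(sentences, signature="default", as_dict=True)["elmo"]

with tf.Session() as sess:
    sess.run([tf.global_variables_initializer(), tf.tables_initializer()])
    vectors = sess.run(embeddings)
    print(vectors.shape)  # the two "apple"/"Apple" vectors now differ by context
```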
ULMFiT: Universal Language Model
Fine-tuning for Text Classification
fast.ai
ULMFiT - core structure
Transfer-learning!
ULMFiT - code
feat. fastai
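A minimal sketch of the ULMFiT recipe with fastai v1 (the API differs in fastai v2; the IMDB sample dataset and column layout are assumptions taken from the library's text tutorial):

```python
# Minimal sketch (assumes fastai v1 and its bundled IMDB sample).
from fastai.text import *

path = untar_data(URLs.IMDB_SAMPLE)

# Stage 2: fine-tune the pre-trained AWD-LSTM language model on the target corpus.
data_lm = TextLMDataBunch.from_csv(path, 'texts.csv')
learn_lm = language_model_learner(data_lm, AWD_LSTM, drop_mult=0.3)
learn_lm.fit_one_cycle(1, 1e-2)
learn_lm.save_encoder('ft_enc')

# Stage 3: fine-tune a classifier on top of the adapted encoder,
# with gradual unfreezing of the layer groups.
data_clas = TextClasDataBunch.from_csv(path, 'texts.csv', vocab=data_lm.train_ds.vocab)
learn_clf = text_classifier_learner(data_clas, AWD_LSTM, drop_mult=0.5)
learn_clf.load_encoder('ft_enc')
learn_clf.fit_one_cycle(1, 1e-2)
learn_clf.freeze_to(-2)                          # unfreeze one more layer group
learn_clf.fit_one_cycle(1, slice(5e-3/2., 5e-3))
```

Stage 1 (general-domain language model pre-training on Wikitext-103) comes for free as the pre-trained AWD_LSTM weights shipped with the library.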
BERT: Pre-training of Deep Bidirectional
Transformers for Language Understanding
Google
BERT - core structure
BERT - code
feat. huggingface
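A minimal sketch using the Hugging Face `transformers` package (an assumption; the original demo likely used its predecessor, pytorch-pretrained-bert):

```python
# Minimal sketch (assumes the Hugging Face `transformers` package).
import torch
from transformers import BertTokenizer, BertModel

tokenizer = BertTokenizer.from_pretrained("bert-base-uncased")
model = BertModel.from_pretrained("bert-base-uncased")
model.eval()

inputs = tokenizer("I have an Apple pencil", return_tensors="pt")

with torch.no_grad():
    outputs = model(**inputs)

# last_hidden_state: a contextual embedding for every WordPiece token,
# shape [1, seq_len, 768] for bert-base.
print(outputs.last_hidden_state.shape)
```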
“You know, we could just
throw more GPUs and data at
it.”
GPT-2: "Language Models are
Unsupervised Multitask Learners"
OpenAI
GPT-2 - core structure
GPT-2 - code
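A minimal sketch of sampling from GPT-2, assuming the Hugging Face `transformers` package (OpenAI's own release ships TensorFlow sampling scripts instead); the prompt is the unicorn prompt from OpenAI's published sample:

```python
# Minimal sketch (assumes the Hugging Face `transformers` package).
import torch
from transformers import GPT2LMHeadModel, GPT2Tokenizer

tokenizer = GPT2Tokenizer.from_pretrained("gpt2")
model = GPT2LMHeadModel.from_pretrained("gpt2")
model.eval()

prompt = "In a shocking finding, scientists discovered a herd of unicorns"
input_ids = tokenizer.encode(prompt, return_tensors="pt")

# Sample a continuation; top-k / top-p sampling keeps the text coherent but varied.
output_ids = model.generate(
    input_ids,
    max_length=100,
    do_sample=True,
    top_k=40,
    top_p=0.95,
    pad_token_id=tokenizer.eos_token_id,
)
print(tokenizer.decode(output_ids[0], skip_special_tokens=True))
```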
A text generation sample from OpenAI’s GPT-2 language model
Comparing the Models
What tasks can we do with these models?
1 = tricky to implement and not good accuracy; 5 = possible and easy

         SQuAD  NER  SRL  Sentiment  Coref  Text Generation
ELMo       4     3    3       3        3          2
ULMFiT     1     2    2       5        2          4
BERT       3     4    4       4        4          3
GPT-2      5     1    1       2        1          5


Editor's Notes

  • #3  Allowing us to experiment with increased training scale, up to our very limit.
  • #4 The idea is simple: learn a vector for each word, which is usually called word2vec.
  • #5 The two “apple” words refer to very different things, but they would still share the same word embedding vector.
  • #10 How does the ELMo embedding come about?
  • #13 The start of the era of transfer learning.