
Pre-trained language model



An explainer of NLP models based on pre-training.
Supported by DPS; https://digitalproductschool.io/

Published in: Technology

  1. Pre-trained Language Models - from ELMo to GPT-2. Jiwon Kim, jiwon.kim@dpschool.io
  2. Why 'pre-trained language models'? ● They achieve great performance on a variety of language tasks. ● This is similar to how ImageNet classification pre-training helps many vision tasks (*). ● Even better than in CV, pre-training does not require labeled data. (*) Although He et al. (2018) recently found that pre-training might not be necessary for the image segmentation task.
  3. The problem with previous embeddings (feat. Gensim, NLTK, scikit-learn)
  4. Then, what's the difference between "I'm eating an apple" and "I have an Apple pencil"?
  5. The Gradient - Sebastian Ruder
  6. Embeddings dependent on context
  7. ELMo, ULMFiT, BERT, GPT-2
  8. ELMo: "Deep contextualized word representations" - AllenNLP
  9. ELMo - core structure
  10. ELMo - code (feat. TensorFlow Hub)
  11. ULMFiT: "Universal Language Model Fine-tuning for Text Classification" - fast.ai
  12. ULMFiT - core structure: transfer learning!
  13. ULMFiT
  14. ULMFiT (feat. fastai)
  15. BERT: "Pre-training of Deep Bidirectional Transformers for Language Understanding" - Google
  16. BERT - core structure
  17. BERT - code
  18. BERT - code (feat. Hugging Face)
  19. "You know, we could just throw more GPUs and data at it."
  20. GPT-2: "Language Models are Unsupervised Multitask Learners" - OpenAI
  21. GPT-2 - core structure
  22. GPT-2 - code
  23. A text generation sample from OpenAI's GPT-2 language model
  24. Comparing the models: what tasks can we do with them?
  25. Task suitability, scored from 1 (tricky to implement, poor accuracy) to 5 (possible and easy):

      | Model  | SQuAD | NER | SRL | Sentiment | Coref | Text Generation |
      |--------|-------|-----|-----|-----------|-------|-----------------|
      | ELMo   |   4   |  3  |  3  |     3     |   3   |        2        |
      | ULMFiT |   1   |  2  |  2  |     5     |   2   |        4        |
      | BERT   |   3   |  4  |  4  |     4     |   4   |        3        |
      | GPT-2  |   5   |  1  |  1  |     2     |   1   |        5        |
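The "problem of previous embeddings" that the apple-sentence slides illustrate is that pre-ELMo embeddings assign one vector per word type, regardless of context. A minimal sketch with Gensim (toy corpus and hyperparameters are made up for illustration):

```python
from gensim.models import Word2Vec

# Two sentences where "apple" means very different things.
sentences = [
    ["i", "am", "eating", "an", "apple"],
    ["i", "have", "an", "apple", "pencil"],
]

# Train a tiny word2vec model on the toy corpus.
model = Word2Vec(sentences, vector_size=16, min_count=1, seed=0)

# A static embedding gives "apple" exactly one vector,
# no matter which sentence it appears in.
vec = model.wv["apple"]
print(vec.shape)  # (16,)
```

Contextual models like ELMo and BERT remove this limitation by computing a vector for each token occurrence from the whole sentence.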
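The ELMo code slide (feat. TensorFlow Hub) is not reproduced in this transcript; a sketch of the TF1-era TensorFlow Hub usage, assuming the published `elmo/2` module URL and its `default` signature (requires TF1 compatibility mode and a network download):

```python
import tensorflow.compat.v1 as tf
import tensorflow_hub as hub

tf.disable_eager_execution()

# Load the pre-trained ELMo module (downloads weights on first use).
elmo = hub.Module("https://tfhub.dev/google/elmo/2", trainable=False)

# Contextual embeddings: "apple" gets a different vector in each sentence.
embeddings = elmo(
    ["I'm eating an apple", "I have an Apple pencil"],
    signature="default",
    as_dict=True,
)["elmo"]  # shape: (batch, max_tokens, 1024)

with tf.Session() as sess:
    sess.run([tf.global_variables_initializer(), tf.tables_initializer()])
    vecs = sess.run(embeddings)
```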
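The ULMFiT slide (feat. fastai) likewise comes without code in this transcript. With the current fastai text API, stage one of ULMFiT (fine-tuning the pretrained AWD-LSTM language model on a target corpus) might look like the following; the dataset and API names are taken from fastai v2, not from the original slide, and the snippet downloads both the IMDB sample and pretrained weights:

```python
from fastai.text.all import *

# fastai's small IMDB sample: a CSV with 'label' and 'text' columns.
path = untar_data(URLs.IMDB_SAMPLE)
df = pd.read_csv(path / "texts.csv")

# is_lm=True builds a next-token-prediction (language modeling) task
# on the target corpus, starting from the pretrained AWD-LSTM.
dls_lm = TextDataLoaders.from_df(df, text_col="text", is_lm=True)
learn = language_model_learner(dls_lm, AWD_LSTM, metrics=[accuracy])
learn.fit_one_cycle(1, 1e-2)
```

The fine-tuned encoder would then be reused in a classifier learner, which is the transfer-learning step the "core structure" slide highlights.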
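The BERT code slide (feat. Hugging Face) can be approximated with today's `transformers` library; the checkpoint name `bert-base-uncased` is an assumption (the original slide likely used the older `pytorch-pretrained-bert` package):

```python
import torch
from transformers import AutoModel, AutoTokenizer

tokenizer = AutoTokenizer.from_pretrained("bert-base-uncased")
model = AutoModel.from_pretrained("bert-base-uncased")

# Encode the two "apple" sentences; BERT produces a context-dependent
# vector for every token, so the two "apple" vectors differ.
inputs = tokenizer(
    ["I'm eating an apple", "I have an Apple pencil"],
    padding=True,
    return_tensors="pt",
)
with torch.no_grad():
    outputs = model(**inputs)

hidden = outputs.last_hidden_state  # (batch, seq_len, 768) for bert-base
```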
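Similarly, for the GPT-2 code slide, `transformers` exposes the released small GPT-2 checkpoint (`gpt2`); a minimal greedy-decoding sketch (prompt text is made up):

```python
from transformers import GPT2LMHeadModel, GPT2Tokenizer

tokenizer = GPT2Tokenizer.from_pretrained("gpt2")
model = GPT2LMHeadModel.from_pretrained("gpt2")

# Greedy decoding: repeatedly append the single most likely next token.
input_ids = tokenizer("Pre-trained language models", return_tensors="pt").input_ids
output_ids = model.generate(input_ids, max_length=30, do_sample=False)

text = tokenizer.decode(output_ids[0])
print(text)  # the prompt, continued by GPT-2
```

Sampling (`do_sample=True` with a temperature) rather than greedy decoding is what produced the famous generation samples the deck refers to.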
