4. GPT-3: Introduction
https://arxiv.org/pdf/2005.14165.pdf
Recent NLP paradigm - fine-tuning a pre-trained LM on downstream tasks
- has led to substantial progress on many challenging NLP tasks
- entirely removes the need for task-specific architectures.
However,
- needs large task-specific datasets
- needs task-specific fine-tuning
Humans do not require large supervised datasets to learn.
A brief directive / a tiny number of demonstrations is often sufficient.
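This few-shot setup can be made concrete as a prompt: a brief directive followed by a handful of demonstrations, with no gradient updates at all. The sketch below uses GPT-2 through Hugging Face transformers as a stand-in (GPT-3's weights are not publicly available); the sentiment task, labels, and review texts are invented for illustration.

# Minimal sketch of few-shot ("in-context") prompting.
# GPT-2 stands in for GPT-3; the task and examples are invented.
from transformers import AutoTokenizer, AutoModelForCausalLM

tokenizer = AutoTokenizer.from_pretrained("gpt2")
model = AutoModelForCausalLM.from_pretrained("gpt2")

# A brief directive plus a tiny number of demonstrations -- the "learning"
# happens purely inside the prompt, without any fine-tuning.
prompt = (
    "Classify the review as positive or negative.\n"
    "Review: The plot was dull and predictable. Sentiment: negative\n"
    "Review: A moving, beautifully shot film. Sentiment: positive\n"
    "Review: I want those two hours of my life back. Sentiment:"
)

inputs = tokenizer(prompt, return_tensors="pt")
outputs = model.generate(
    **inputs,
    max_new_tokens=2,                     # only the label word is needed
    do_sample=False,                      # greedy decoding
    pad_token_id=tokenizer.eos_token_id,  # GPT-2 has no pad token
)
completion = tokenizer.decode(outputs[0][inputs["input_ids"].shape[1]:])
print(completion.strip())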
11. GPT-3: Summary
GPT-3 shows how powerful language models can be.
Drawbacks:
- It requires a gigantic LM to work well, making it unusable in most real-world settings.
- It does not scale beyond a few examples, since the context window of most LMs is limited to a few hundred tokens.
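The context-window drawback can be made concrete by counting tokens: every added demonstration eats into a fixed prompt budget. A rough sketch, assuming GPT-2's tokenizer and its 1024-token window as a stand-in (GPT-3's window is larger) and an invented demonstration text:

# Rough sketch: prompt length grows linearly with the number of in-context
# examples. The 1024-token limit is GPT-2's context window, used here as an
# assumed stand-in; the demonstration text is invented.
from transformers import AutoTokenizer

tokenizer = AutoTokenizer.from_pretrained("gpt2")
context_limit = 1024

directive = "Classify the review as positive or negative.\n"
demo = "Review: A moving, beautifully shot film. Sentiment: positive\n"

for k in (1, 4, 16, 64):
    prompt = directive + demo * k
    n_tokens = len(tokenizer(prompt)["input_ids"])
    print(f"{k:3d} demonstrations -> {n_tokens:5d} tokens "
          f"({'fits' if n_tokens <= context_limit else 'exceeds limit'})")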
13. PET: Semi-supervised Knowledge Distillation
1. Various patterns are used for fine-tuning language models (see the code sketch after this list).
2. The ensemble of fine-tuned language models annotates unlabeled data.
3. A classifier is trained on the resulting soft-labeled dataset.
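A rough sketch of steps 1 and 2: a pattern rewrites the input as a cloze question, a verbalizer maps each label to a single token, and the masked LM's probabilities at the mask position give a soft label. The fine-tuning loop and the full ensemble are omitted; the sentiment task, the pattern, and the verbalizer words are illustrative assumptions, not the paper's exact choices.

# Sketch of a single pattern-verbalizer pair (PVP) scoring one example.
# In PET, several such patterns are fine-tuned on the few labeled examples,
# then the ensemble's averaged probabilities become soft labels for unlabeled data.
import torch
from transformers import AutoTokenizer, AutoModelForMaskedLM

tokenizer = AutoTokenizer.from_pretrained("bert-base-uncased")
model = AutoModelForMaskedLM.from_pretrained("bert-base-uncased")

# Verbalizer: one token per class (illustrative choice).
verbalizer = {"positive": "great", "negative": "terrible"}

def pvp_soft_label(text):
    # Pattern: rewrite the input as a cloze question containing [MASK].
    cloze = f"{text} It was {tokenizer.mask_token}."
    inputs = tokenizer(cloze, return_tensors="pt")
    with torch.no_grad():
        logits = model(**inputs).logits
    # Position of the mask token in the sequence.
    mask_pos = (inputs["input_ids"][0] == tokenizer.mask_token_id).nonzero()[0].item()
    label_ids = [tokenizer.convert_tokens_to_ids(w) for w in verbalizer.values()]
    # Softmax over the verbalizer tokens only -> soft label distribution.
    probs = torch.softmax(logits[0, mask_pos, label_ids], dim=-1)
    return dict(zip(verbalizer.keys(), probs.tolist()))

print(pvp_soft_label("The plot was dull and predictable."))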
14. PET: Without Knowledge Distillation
Without the knowledge-distillation step on the unlabeled dataset, PET performs even better.
But the resulting ensemble is n*k times larger than the distilled model (k = number of PVPs, n = number of LMs per PVP); e.g., with k = 4 patterns and n = 3 models each, 12 fine-tuned LMs must be kept at inference instead of one distilled classifier.
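The distillation step this slide skips (step 3 on the previous slide) amounts to training one compact classifier on the ensemble's averaged soft labels, so only a single model has to be kept at inference. A minimal, self-contained sketch with random placeholder features and soft labels, using KL divergence as one common distillation loss:

# Sketch of the final distillation step: a single classifier is trained on
# soft labels produced by the PVP ensemble, replacing n*k fine-tuned LMs.
# Features and soft labels are random placeholders standing in for real
# sentence representations and ensemble outputs.
import torch
import torch.nn as nn

torch.manual_seed(0)
num_examples, feat_dim, num_classes = 256, 64, 2

features = torch.randn(num_examples, feat_dim)  # stand-in for encoded unlabeled texts
soft_labels = torch.softmax(torch.randn(num_examples, num_classes), dim=-1)  # stand-in for ensemble output

classifier = nn.Linear(feat_dim, num_classes)
optimizer = torch.optim.Adam(classifier.parameters(), lr=1e-3)
kl = nn.KLDivLoss(reduction="batchmean")

for epoch in range(10):
    log_probs = torch.log_softmax(classifier(features), dim=-1)
    loss = kl(log_probs, soft_labels)  # match the ensemble's distribution
    optimizer.zero_grad()
    loss.backward()
    optimizer.step()

print(f"final distillation loss: {loss.item():.4f}")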
19. Conclusion: Future of NLP
👍
- Unsupervised / semi-supervised learning
- No / few labeled data
- Text in, text out using a language model
👎
- Supervised learning
- Large amounts of labeled data
- Task-specific fine-tuning
21. Tool: Next Word Prediction
https://github.com/renatoviolin/next_word_prediction
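Independent of how the linked tool is implemented, the core operation of next-word prediction can be sketched as taking the top-k most likely continuation tokens under a language model. A minimal version using GPT-2 via transformers (the model choice and top_k value are assumptions, not necessarily what the tool uses):

# Minimal next-word prediction: rank the most likely next tokens under GPT-2.
# Model choice and top_k are illustrative; the linked tool may differ.
import torch
from transformers import AutoTokenizer, AutoModelForCausalLM

tokenizer = AutoTokenizer.from_pretrained("gpt2")
model = AutoModelForCausalLM.from_pretrained("gpt2")

def next_words(prefix, top_k=5):
    inputs = tokenizer(prefix, return_tensors="pt")
    with torch.no_grad():
        logits = model(**inputs).logits
    # Distribution over the vocabulary for the token following the prefix.
    probs = torch.softmax(logits[0, -1], dim=-1)
    top = torch.topk(probs, top_k)
    return [(tokenizer.decode(int(i)).strip(), p.item())
            for i, p in zip(top.indices, top.values)]

print(next_words("The future of NLP is"))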
22. Reference
Language Models are Few-Shot Learners
- https://arxiv.org/pdf/2005.14165.pdf
Exploiting Cloze Questions for Few Shot Text Classification and
Natural Language Inference
- https://arxiv.org/pdf/2001.07676.pdf
It’s Not Just Size That Matters: Small Language Models Are Also
Few-Shot Learners
- https://arxiv.org/pdf/2009.07118.pdf