‘Big models’
The success and pitfalls of Transformer models in NLP
Suzan Verberne | NOTaS | March 2023
Today’s talk
 Large Language Models
 BERT
 Huggingface
 Generative Pretrained Transformers (GPT)
 Challenges and problems
 Consequences for work and education
Suzan Verberne 2023
Large Language Models
Large Language Models
 Transformers: Attention is all you need (2017)
 Designed for sequence-to-sequence tasks (e.g. translation)
 Encoder-decoder architecture
Explanation of this paper: https://www.youtube.com/watch?v=iDulhoQ2pro
How it all started…
Large Language Models
Transformers are powerful because of
 long-distance relations between all words (attention)
 parallel processing instead of sequential processing
 unsupervised pre-training on HUGE amounts of data
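The attention mechanism in the first bullet can be sketched in a few lines. This is a toy, single-head version of scaled dot-product attention in pure Python (real transformers use multiple heads, learned projection matrices, and tensor libraries):

```python
import math

def softmax(scores):
    """Numerically stable softmax over a list of floats."""
    m = max(scores)
    exps = [math.exp(s - m) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]

def attention(queries, keys, values):
    """Scaled dot-product attention for a single head.
    Every query attends to every key, so relations between distant
    positions are modelled directly, without recurrence."""
    d_k = len(keys[0])
    outputs = []
    for q in queries:
        scores = [sum(qi * ki for qi, ki in zip(q, k)) / math.sqrt(d_k)
                  for k in keys]
        weights = softmax(scores)  # attention weights sum to 1
        outputs.append([sum(w * v[dim] for w, v in zip(weights, values))
                        for dim in range(len(values[0]))])
    return outputs
```

Because each position's scores against all other positions can be computed independently, the whole sequence is processed in parallel (the second bullet).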
Large Language Models
BERT (Bidirectional Encoder Representations from Transformers)
 An encoder-only transformer
 Input is text, output is embeddings
Next… some linguistics…
BERT is based on the distributional hypothesis
 The context of a word defines its meaning
 Words that occur in similar contexts tend to be similar
Harris, Z. (1954). “Distributional structure”. Word. 10 (23): 146–162
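The distributional hypothesis can be made concrete with a toy experiment: represent each word by the counts of words around it, and compare those vectors. (This is a deliberately minimal illustration; BERT learns far richer contextual representations.)

```python
from collections import Counter

def context_vector(corpus, word, window=2):
    """Count the words appearing within `window` positions of `word`."""
    vec = Counter()
    for sentence in corpus:
        tokens = sentence.split()
        for i, tok in enumerate(tokens):
            if tok == word:
                lo, hi = max(0, i - window), min(len(tokens), i + window + 1)
                for j in range(lo, hi):
                    if j != i:
                        vec[tokens[j]] += 1
    return vec

def cosine(u, v):
    """Cosine similarity between two sparse count vectors."""
    dot = sum(u[k] * v[k] for k in set(u) | set(v))
    norm_u = sum(x * x for x in u.values()) ** 0.5
    norm_v = sum(x * x for x in v.values()) ** 0.5
    return dot / (norm_u * norm_v) if norm_u and norm_v else 0.0
```

On a corpus like ["the cat drinks milk", "the dog drinks water", "the car needs fuel"], 'cat' ends up more similar to 'dog' than to 'car', exactly because they occur in similar contexts.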
Word
Embeddings
 BERT embeddings are learned from unlabelled data
 through a process called ‘masked language modelling’ with self-supervision
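The self-supervision idea can be sketched as follows: hide some tokens and ask the model to predict the hidden originals from context, so the labels come from the text itself. (A toy version; BERT's actual recipe masks ~15% of tokens and also sometimes keeps or randomly replaces a selected token.)

```python
import random

def mask_tokens(tokens, mask_prob=0.15, mask_token="[MASK]", seed=42):
    """Build a masked-language-modelling training example:
    hide a fraction of the tokens; the model must predict the
    hidden originals from the surrounding context."""
    rng = random.Random(seed)
    masked, targets = [], []
    for tok in tokens:
        if rng.random() < mask_prob:
            masked.append(mask_token)
            targets.append(tok)   # prediction target at this position
        else:
            masked.append(tok)
            targets.append(None)  # no loss on visible positions
    return masked, targets
```

No human annotation is needed, which is why pre-training can use HUGE amounts of raw text.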
BERT
 BERT is so powerful because it is used in a transfer learning setting
 Pre-training: learning embeddings from huge unlabeled data (self-supervised)
 Fine-tuning: learning the classification model from smaller labeled data (supervised) for any NLP task (e.g. sentiment, named entities)
Huggingface
But also because:
 The authors (from Google) open-sourced the model implementation
 and publicly released pretrained models (which are computationally expensive to pretrain from scratch)
 https://huggingface.co/ is the standard implementation package for training and applying Transformer models
 Currently over 150k models have been published on Huggingface
Huggingface
Working with Huggingface
 Take a pre-trained model
 Run ‘zero-shot’:
from transformers import pipeline
sentiment_pipeline = pipeline("sentiment-analysis")
data = ["I love you", "I hate you"]
output = sentiment_pipeline(data)
print(output)
[{'label': 'POSITIVE', 'score': 0.9998656511306763},
{'label': 'NEGATIVE', 'score': 0.9991129040718079}]
 Or fine-tune on your own data
Default model: distilbert-base-uncased-finetuned-sst-2-english
Generative Pretrained
Transformers (GPT)
GPT
 GPT is a decoder-only transformer model
 It does not have an encoder
 Instead: use the prompt to generate outputs
 A growing family of models since 2018: GPT-2, DialoGPT, GPT-3, GPT-3.5, ChatGPT, GPT-4
GPT-3
 GPT is trained to generate the most probable/plausible text
 Trained on crawled internet data, open-source books, and Wikipedia, sampled early 2022
 After each word, predict the most probable next word given all the previous words
 It will give you fluent text that looks very real
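The "predict the most probable next word" loop can be sketched with a toy lookup table (the probabilities here are made up for illustration; GPT replaces the table with a transformer conditioned on *all* previous words, and sampling strategies are usually less greedy than this):

```python
def generate(next_word_probs, start, max_len=8):
    """Greedy autoregressive decoding: repeatedly append the most
    probable next word given the text generated so far."""
    words = [start]
    while len(words) < max_len:
        candidates = next_word_probs.get(words[-1])
        if not candidates:
            break  # no continuation known for the last word
        words.append(max(candidates, key=candidates.get))
    return words

# Hypothetical probability table for illustration only:
probs = {"the": {"cat": 0.6, "dog": 0.4}, "cat": {"sat": 0.9, "ran": 0.1}}
```

Each step is locally plausible, which is why the output is so fluent, but nothing in the loop checks whether the resulting text is true.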
Few-shot
learning
Few-shot learning: learn from a small number of examples
'Old paradigm'
• pre-training
• fine-tuning with ~100s–1000s of training samples

'New paradigm'
• pre-training
• prompting with ~3–50 examples in the prompt
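The 'new paradigm' amounts to packing the labelled examples into the prompt itself. A minimal sketch, reusing the sentiment examples from the pipeline demo (the instruction text and label format are illustrative, not a prescribed standard):

```python
def build_few_shot_prompt(examples, query,
                          instruction="Classify the sentiment of each text."):
    """Turn a handful of labelled examples plus a new query into one
    prompt; the model is expected to continue the pattern."""
    parts = [instruction]
    for text, label in examples:
        parts.append(f"Text: {text}\nSentiment: {label}")
    parts.append(f"Text: {query}\nSentiment:")  # model fills in the label
    return "\n\n".join(parts)

prompt = build_few_shot_prompt(
    [("I love you", "positive"), ("I hate you", "negative")],
    "Not bad at all")
```

No gradient updates are involved: the ~3–50 examples steer the model purely through the input text.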
Few-shot learning with ChatGPT
ChatGPT
 ChatGPT =
 GPT-3.5
 + fine-tuning for conversations
 + reinforcement learning (from human feedback) for better answers
https://openai.com/blog/chatgpt
Why are LLMs so powerful?
 Because they are HUGE (many parameters)
 And trained on HUGE data
https://huggingface.co/blog/large-language-models
Challenges and
problems with LLMs
Challenges and problems
 Computational power
 Environmental footprint
 Heavy GPU computing required for training models
 Lengthy texts are challenging
 Low-resource languages
 Low-resource domains
 Closed models (‘OpenAI’) vs. open-source models
https://lessen-project.nl/ Together, the project partners will develop, implement and evaluate safe and transparent chat-based conversational AI agents based on state-of-the-art neural architectures. The focus is on lesser-resourced tasks, domains, and scenarios.
Challenges and problems
 Factuality / consistency
 The output is fluent but not always correct
 Hallucination
Challenges and problems
 Search engines allow us to verify the source of the information
 Interfaces to generative language models should do the same
Consequences for work
and education
Consequences for work and education
 Do not replace humans, but assist them to do their work better
 When the boring part of the work is done by computational models, the human can do the interesting part
 (think about graphic designers using generative models for creating images)
Consequences for work and education
 Computational methods can help humans (students)
 Search engines
 Spelling correction
 Grammarly
 … Generative language models?
 New regulations
 We have to stress the importance of sources
 and of writing your own texts (and code!)
 and carefully pick our homework assignments
Research
opportunities
Use generative models to
 develop tools
 (e.g. QA-systems, chatbots, summarizers)
 generate training data [1]
 The prompting can be engineered to be more effective
 study linguistic phenomena
 which errors does the model make?
 study social phenomena
 simulate communication (opinionated/political content) [2]
1. https://github.com/arian-askari/ChatGPT-RetrievalQA
2. Chris Congleton, Peter van der Putten, and Suzan Verberne. Tracing Political Positioning of Dutch
Newspapers. In: Disinformation in Open Online Media. MISDOOM 2022.
Final
recommendations
 Listen to the interview with Emily Bender
Find me: https://duckduckgo.com/?t=ffab&q=suzan+verberne&ia=web

 
Cytokines and their role in immune regulation.pptx
Cytokines and their role in immune regulation.pptxCytokines and their role in immune regulation.pptx
Cytokines and their role in immune regulation.pptx
 
mô tả các thí nghiệm về đánh giá tác động dòng khí hóa sau đốt
mô tả các thí nghiệm về đánh giá tác động dòng khí hóa sau đốtmô tả các thí nghiệm về đánh giá tác động dòng khí hóa sau đốt
mô tả các thí nghiệm về đánh giá tác động dòng khí hóa sau đốt
 
Deep Behavioral Phenotyping in Systems Neuroscience for Functional Atlasing a...
Deep Behavioral Phenotyping in Systems Neuroscience for Functional Atlasing a...Deep Behavioral Phenotyping in Systems Neuroscience for Functional Atlasing a...
Deep Behavioral Phenotyping in Systems Neuroscience for Functional Atlasing a...
 
20240520 Planning a Circuit Simulator in JavaScript.pptx
20240520 Planning a Circuit Simulator in JavaScript.pptx20240520 Planning a Circuit Simulator in JavaScript.pptx
20240520 Planning a Circuit Simulator in JavaScript.pptx
 
Chapter 12 - climate change and the energy crisis
Chapter 12 - climate change and the energy crisisChapter 12 - climate change and the energy crisis
Chapter 12 - climate change and the energy crisis
 
Nucleic Acid-its structural and functional complexity.
Nucleic Acid-its structural and functional complexity.Nucleic Acid-its structural and functional complexity.
Nucleic Acid-its structural and functional complexity.
 
SAR of Medicinal Chemistry 1st by dk.pdf
SAR of Medicinal Chemistry 1st by dk.pdfSAR of Medicinal Chemistry 1st by dk.pdf
SAR of Medicinal Chemistry 1st by dk.pdf
 
3D Hybrid PIC simulation of the plasma expansion (ISSS-14)
3D Hybrid PIC simulation of the plasma expansion (ISSS-14)3D Hybrid PIC simulation of the plasma expansion (ISSS-14)
3D Hybrid PIC simulation of the plasma expansion (ISSS-14)
 

‘Big models’: the success and pitfalls of Transformer models in natural language processing

  • 2. Today’s talk
 Large Language Models
 BERT
 Huggingface
 Generative Pretrained Transformers (GPT)
 Challenges and problems
 Consequences for work and education
Suzan Verberne 2023
  • 3. Large Language Models
  • 4. Large Language Models: how it all started…
 Transformers: Attention is all you need (2017)
 Designed for sequence-to-sequence tasks (e.g. translation)
 Encoder-decoder architecture
Explanation of this paper: https://www.youtube.com/watch?v=iDulhoQ2pro
  • 5. Large Language Models
Transformers are powerful because of
 the long-distance relations between all words (attention)
 parallel processing instead of sequential processing
 unsupervised pre-training on a HUGE amount of data
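The attention mechanism behind these long-distance relations can be illustrated with a minimal scaled dot-product attention in plain Python. This is a toy sketch of the core computation only (one query, no learned projections, no multiple heads), not the full Transformer implementation:

```python
import math

def softmax(xs):
    """Numerically stable softmax over a list of floats."""
    m = max(xs)
    exps = [math.exp(x - m) for x in xs]
    s = sum(exps)
    return [e / s for e in exps]

def attention(query, keys, values):
    """Scaled dot-product attention for one query position.

    Every position attends to every other position at once,
    which is what gives Transformers their long-distance
    word relations and their parallelism.
    """
    d = len(query)
    scores = [sum(q * k for q, k in zip(query, key)) / math.sqrt(d)
              for key in keys]
    weights = softmax(scores)  # attention weights sum to 1
    # Output: weighted average of the value vectors
    out = [sum(w * v[i] for w, v in zip(weights, values))
           for i in range(len(values[0]))]
    return out, weights

# Tiny example: a sequence of 3 positions with 2-dimensional vectors
keys = values = [[1.0, 0.0], [0.0, 1.0], [1.0, 1.0]]
out, weights = attention([1.0, 0.0], keys, values)
print(weights)  # positions whose keys match the query get more weight
```

In a real model the queries, keys, and values are learned linear projections of the token embeddings, and the computation runs for all positions in parallel as matrix multiplications.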
  • 6. Large Language Models: BERT (Bidirectional Encoder Representations from Transformers)
 An encoder-only transformer
 Input is text, output is embeddings
  • 7. Some linguistics…
BERT is based on the distributional hypothesis
 The context of a word defines its meaning
 Words that occur in similar contexts tend to be similar in meaning
Harris, Z. (1954). “Distributional structure”. Word. 10 (2–3): 146–162
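The distributional hypothesis can be made concrete with toy co-occurrence counts: words that appear with similar context words get similar count vectors, and vector similarity then approximates similarity in meaning. The counts below are invented for illustration, not taken from a real corpus:

```python
import math

def cosine(u, v):
    """Cosine similarity between two count vectors."""
    dot = sum(a * b for a, b in zip(u, v))
    norm = math.sqrt(sum(a * a for a in u)) * math.sqrt(sum(b * b for b in v))
    return dot / norm

# How often each word co-occurs with the context words
# [drink, hot, road, drive] (hypothetical counts)
tea    = [10, 8, 0, 0]
coffee = [9, 7, 1, 0]
car    = [0, 1, 8, 9]

print(cosine(tea, coffee))  # high: 'tea' and 'coffee' share contexts
print(cosine(tea, car))     # low: very different contexts
```

BERT's contextual embeddings are far richer than these static count vectors, but they rest on the same idea: similar contexts, similar representations.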
  • 8. Word embeddings
 BERT embeddings are learned from unlabelled data
 through a process called ‘masked language modelling’ with self-supervision
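The masked-language-modelling objective can be sketched as follows: randomly hide a fraction of the input tokens, and train the model to restore them, so the text itself supplies the labels. This is a simplified sketch of the masking step only (BERT masks roughly 15% of tokens and has some extra rules, such as sometimes keeping or randomly replacing a selected token, which are omitted here):

```python
import random

def mask_tokens(tokens, mask_prob=0.15, mask_token="[MASK]", seed=42):
    """Replace a random subset of tokens with [MASK].

    The original tokens at the masked positions become the
    training labels: the model must predict them back.
    """
    rng = random.Random(seed)
    masked, labels = [], {}
    for i, tok in enumerate(tokens):
        if rng.random() < mask_prob:
            masked.append(mask_token)
            labels[i] = tok  # self-supervised label
        else:
            masked.append(tok)
    return masked, labels

tokens = "the cat sat on the mat because it was tired".split()
masked, labels = mask_tokens(tokens)
print(masked)
print(labels)
```

Because no human annotation is needed, this objective scales to the huge unlabelled corpora that make pre-training work.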
  • 9. BERT
 BERT is so powerful because it is used in a transfer learning setting
 Pre-training: learning embeddings from huge unlabeled data (self-supervised)
 Fine-tuning: learning the classification model from smaller labeled data (supervised) for any NLP task (e.g. sentiment, named entities)
  • 10. Huggingface
But also because:
 The authors (from Google) open-sourced the model implementation
 and publicly released pretrained models (which are computationally expensive to pretrain from scratch)
 https://huggingface.co/ is the standard implementation package for training and applying Transformer models
 Currently over 150k models have been published on Huggingface
  • 13. Huggingface
Working with Huggingface
 Take a pre-trained model
 Run it ‘zero-shot’:

from transformers import pipeline
sentiment_pipeline = pipeline("sentiment-analysis")
data = ["I love you", "I hate you"]
output = sentiment_pipeline(data)
print(output)
# [{'label': 'POSITIVE', 'score': 0.9998656511306763},
#  {'label': 'NEGATIVE', 'score': 0.9991129040718079}]

(Default model: distilbert-base-uncased-finetuned-sst-2-english)
 Or fine-tune the model on your own data
  • 15. GPT
 GPT is a decoder-only transformer model
 It does not have an encoder
 Instead, it uses the prompt to generate outputs
 A growing family of models since 2018: GPT-2, DialoGPT, GPT-3, GPT-3.5, ChatGPT, GPT-4
  • 16. GPT-3
 GPT is trained to generate the most probable/plausible text
 Trained on crawled internet data, open-source books, and Wikipedia, sampled early 2022
 After each word, it predicts the most probable next word given all the previous words
 It will give you fluent text that looks very real
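The ‘predict the most probable next word’ loop can be illustrated with a toy model. In GPT the probabilities come from a huge neural network conditioned on all previous words; here a hypothetical probability table stands in for the network, but the greedy decoding loop has the same shape:

```python
# Toy next-word probabilities (a stand-in for the neural model;
# the words and numbers are invented for illustration)
next_word_probs = {
    "the": {"cat": 0.6, "dog": 0.4},
    "cat": {"sat": 0.7, "ran": 0.3},
    "sat": {"down": 0.9, "up": 0.1},
}

def generate(prompt, max_words=3):
    """Greedy decoding: repeatedly append the most probable next word."""
    words = prompt.split()
    for _ in range(max_words):
        probs = next_word_probs.get(words[-1])
        if probs is None:
            break  # no continuation known for this word
        words.append(max(probs, key=probs.get))
    return " ".join(words)

print(generate("the"))  # → "the cat sat down"
```

Note that the loop optimizes for probable text, not true text: fluency is built into the objective, factuality is not.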
  • 17. Few-shot learning
Few-shot learning: learning from a small number of examples
‘Old paradigm’:
 pre-training
 fine-tuning with ~100s-1000s of training samples
‘New paradigm’:
 pre-training
 prompting with ~3-50 examples in the prompt
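In the ‘new paradigm’, the labeled examples go into the prompt text itself instead of into gradient updates. A few-shot prompt can be assembled as a plain string; the template below (the "Review:"/"Sentiment:" labels and the example texts) is one possible design, not a fixed format:

```python
def few_shot_prompt(examples, query):
    """Build a few-shot prompt: labeled examples followed by the query."""
    lines = []
    for text, label in examples:
        lines.append(f"Review: {text}\nSentiment: {label}\n")
    # End with the unlabeled query; the model continues after "Sentiment:"
    lines.append(f"Review: {query}\nSentiment:")
    return "\n".join(lines)

examples = [
    ("I love this movie", "positive"),
    ("Terrible acting and a boring plot", "negative"),
    ("Great soundtrack", "positive"),
]
prompt = few_shot_prompt(examples, "The ending surprised me in a good way")
print(prompt)
```

The model is then asked to complete the prompt, and its continuation after the final "Sentiment:" is read off as the prediction.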
  • 19. ChatGPT
 ChatGPT =
 GPT-3.5
 + fine-tuning for conversations
 + reinforcement learning for better answers
https://openai.com/blog/chatgpt
  • 20. Why are LLMs so powerful?
 Because they are HUGE (many parameters)
 And trained on HUGE data
https://huggingface.co/blog/large-language-models
  • 21. Challenges and problems with LLMs
  • 22. Challenges and problems
 Computational power
 Environmental footprint
 Heavy GPU computing required for training models
 Lengthy texts are challenging
 Low-resource languages
 Low-resource domains
 Closed models (‘OpenAI’) vs open-source models
https://lessen-project.nl/: Together, the project partners will develop, implement and evaluate safe and transparent chat-based conversational AI agents based on state-of-the-art neural architectures. The focus is on lesser-resourced tasks, domains, and scenarios.
  • 23. Challenges and problems
 Factuality / consistency
 The output is fluent but not always correct
 Hallucination
  • 27. Challenges and problems
 Search engines allow us to verify the source of the information
 Interfaces to generative language models should do the same
  • 28. Consequences for work and education
  • 29. Consequences for work and education
 Do not replace humans, but assist them to do their work better
 When the boring part of the work is done by computational models, the human can do the interesting part
 (think of graphic designers using generative models for creating images)
  • 30. Consequences for work and education
 Computational methods can help humans (students):
 Search engines
 Spelling correction
 Grammarly
 …
Generative language models?
 New regulations
 We have to stress the importance of sources
 and of writing your own texts (and code!)
 and carefully pick our homework assignments
  • 31. Research opportunities
Use generative models to
 develop tools (e.g. QA systems, chatbots, summarizers)
 generate training data1 (the prompting can be engineered to be more effective)
 study linguistic phenomena: which errors does the model make?
 study social phenomena: simulate communication (opinionated / political content)2
1. https://github.com/arian-askari/ChatGPT-RetrievalQA
2. Chris Congleton, Peter van der Putten, and Suzan Verberne. Tracing Political Positioning of Dutch Newspapers. In: Disinformation in Open Online Media. MISDOOM 2022.
  • 32. Final recommendations
 Listen to the interview with Emily Bender
Find me: https://duckduckgo.com/?t=ffab&q=suzan+verberne&ia=web