‘Big models’
The success and pitfalls of Transformer models in NLP
Suzan Verberne | NOTaS | March 2023
Today’s talk
 Large Language Models
 BERT
 Huggingface
 Generative Pretrained Transformers (GPT)
 Challenges and problems
 Consequences for work and education
Suzan Verberne 2023
Large Language Models
Large Language Models
 Transformers: Attention is all you need (2017)
 Designed for sequence-to-sequence tasks (e.g. translation)
 Encoder-decoder architecture
Explanation of this paper: https://www.youtube.com/watch?v=iDulhoQ2pro
How it all started…
Large Language Models
Transformers are powerful because of
 long-distance relations between all words (attention)
 parallel processing instead of sequential processing
 unsupervised pre-training on HUGE amounts of data
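The attention mechanism in the first bullet can be sketched in a few lines. This is a toy, single-head version of scaled dot-product attention in pure Python (real transformers use multiple heads, learned projection matrices, and tensor libraries):

```python
import math

def softmax(scores):
    """Numerically stable softmax over a list of floats."""
    m = max(scores)
    exps = [math.exp(s - m) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]

def attention(queries, keys, values):
    """Scaled dot-product attention for a single head.
    Every query attends to every key, so relations between distant
    positions are modelled directly, without recurrence."""
    d_k = len(keys[0])
    outputs = []
    for q in queries:
        scores = [sum(qi * ki for qi, ki in zip(q, k)) / math.sqrt(d_k)
                  for k in keys]
        weights = softmax(scores)  # attention weights sum to 1
        outputs.append([sum(w * v[dim] for w, v in zip(weights, values))
                        for dim in range(len(values[0]))])
    return outputs
```

Because each position's scores against all other positions can be computed independently, the whole sequence is processed in parallel (the second bullet).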
Large Language Models
BERT (Bidirectional Encoder Representations from Transformers)
 An encoder-only transformer
 Input is text, output is embeddings
Next… some linguistics…
BERT is based on the distributional hypothesis
 The context of a word defines its meaning
 Words that occur in similar contexts tend to be similar
Harris, Z. (1954). “Distributional structure”. Word. 10 (23): 146–162
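The distributional hypothesis can be made concrete with a toy experiment: represent each word by the counts of words around it, and compare those vectors. (This is a deliberately minimal illustration; BERT learns far richer contextual representations.)

```python
from collections import Counter

def context_vector(corpus, word, window=2):
    """Count the words appearing within `window` positions of `word`."""
    vec = Counter()
    for sentence in corpus:
        tokens = sentence.split()
        for i, tok in enumerate(tokens):
            if tok == word:
                lo, hi = max(0, i - window), min(len(tokens), i + window + 1)
                for j in range(lo, hi):
                    if j != i:
                        vec[tokens[j]] += 1
    return vec

def cosine(u, v):
    """Cosine similarity between two sparse count vectors."""
    dot = sum(u[k] * v[k] for k in set(u) | set(v))
    norm_u = sum(x * x for x in u.values()) ** 0.5
    norm_v = sum(x * x for x in v.values()) ** 0.5
    return dot / (norm_u * norm_v) if norm_u and norm_v else 0.0
```

On a corpus like ["the cat drinks milk", "the dog drinks water", "the car needs fuel"], 'cat' ends up more similar to 'dog' than to 'car', exactly because they occur in similar contexts.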
Word
Embeddings
 BERT embeddings are learned from unlabelled data
 through a process called ‘masked language modelling’ with self-supervision
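The self-supervision idea can be sketched as follows: hide some tokens and ask the model to predict the hidden originals from context, so the labels come from the text itself. (A toy version; BERT's actual recipe masks ~15% of tokens and also sometimes keeps or randomly replaces a selected token.)

```python
import random

def mask_tokens(tokens, mask_prob=0.15, mask_token="[MASK]", seed=42):
    """Build a masked-language-modelling training example:
    hide a fraction of the tokens; the model must predict the
    hidden originals from the surrounding context."""
    rng = random.Random(seed)
    masked, targets = [], []
    for tok in tokens:
        if rng.random() < mask_prob:
            masked.append(mask_token)
            targets.append(tok)   # prediction target at this position
        else:
            masked.append(tok)
            targets.append(None)  # no loss on visible positions
    return masked, targets
```

No human annotation is needed, which is why pre-training can use HUGE amounts of raw text.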
BERT
 BERT is so powerful because it is used in a transfer learning setting
 Pre-training: learning embeddings from huge unlabeled data (self-supervised)
 Fine-tuning: learning the classification model from smaller labeled data (supervised) for any NLP task (e.g. sentiment, named entities)
Huggingface
But also because:
 The authors (from Google) open-sourced the model implementation
 and publicly released pretrained models (which are computationally expensive to pretrain from scratch)
 https://huggingface.co/ is the standard implementation package for training and applying Transformer models
 Currently over 150k models have been published on Huggingface
Huggingface
Working with Huggingface
 Take a pre-trained model
 Run ‘zero-shot’:
from transformers import pipeline
sentiment_pipeline = pipeline("sentiment-analysis")
data = ["I love you", "I hate you"]
output = sentiment_pipeline(data)
print(output)
[{'label': 'POSITIVE', 'score': 0.9998656511306763},
{'label': 'NEGATIVE', 'score': 0.9991129040718079}]
 Or fine-tune on your own data
Default model: distilbert-base-uncased-finetuned-sst-2-english
Generative Pretrained
Transformers (GPT)
GPT
 GPT is a decoder-only transformer model
 It does not have an encoder
 Instead: use the prompt to generate outputs
 A growing family of models since 2018: GPT-2, DialoGPT, GPT-3, GPT-3.5, ChatGPT, GPT-4
GPT-3
 GPT is trained to generate the most probable/plausible text
 Trained on crawled internet data, open-source books, and Wikipedia, sampled early 2022
 After each word, predict the most probable next word given all the previous words
 It will give you fluent text that looks very real
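The "predict the most probable next word" loop can be sketched with a toy lookup table (the probabilities here are made up for illustration; GPT replaces the table with a transformer conditioned on *all* previous words, and sampling strategies are usually less greedy than this):

```python
def generate(next_word_probs, start, max_len=8):
    """Greedy autoregressive decoding: repeatedly append the most
    probable next word given the text generated so far."""
    words = [start]
    while len(words) < max_len:
        candidates = next_word_probs.get(words[-1])
        if not candidates:
            break  # no continuation known for the last word
        words.append(max(candidates, key=candidates.get))
    return words

# Hypothetical probability table for illustration only:
probs = {"the": {"cat": 0.6, "dog": 0.4}, "cat": {"sat": 0.9, "ran": 0.1}}
```

Each step is locally plausible, which is why the output is so fluent, but nothing in the loop checks whether the resulting text is true.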
Few-shot
learning
Few-shot learning: learn from a small number of examples
'Old paradigm'
• pre-training
• fine-tuning with ~100s–1000s of training samples

'New paradigm'
• pre-training
• prompting with ~3–50 examples in the prompt
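The 'new paradigm' amounts to packing the labelled examples into the prompt itself. A minimal sketch, reusing the sentiment examples from the pipeline demo (the instruction text and label format are illustrative, not a prescribed standard):

```python
def build_few_shot_prompt(examples, query,
                          instruction="Classify the sentiment of each text."):
    """Turn a handful of labelled examples plus a new query into one
    prompt; the model is expected to continue the pattern."""
    parts = [instruction]
    for text, label in examples:
        parts.append(f"Text: {text}\nSentiment: {label}")
    parts.append(f"Text: {query}\nSentiment:")  # model fills in the label
    return "\n\n".join(parts)

prompt = build_few_shot_prompt(
    [("I love you", "positive"), ("I hate you", "negative")],
    "Not bad at all")
```

No gradient updates are involved: the ~3–50 examples steer the model purely through the input text.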
Few-shot learning with ChatGPT
ChatGPT
 ChatGPT =
 GPT-3.5
 + fine-tuning for conversations
 + reinforcement learning (from human feedback) for better answers
https://openai.com/blog/chatgpt
Why are LLMs so powerful?
 Because they are HUGE (many parameters)
 And trained on HUGE data
https://huggingface.co/blog/large-language-models
Challenges and
problems with LLMs
Challenges and problems
 Computational power
 Environmental footprint
 Heavy GPU computing required for training models
 Lengthy texts are challenging
 Low-resource languages
 Low-resource domains
 Closed models (‘OpenAI’) vs. open-source models
https://lessen-project.nl/ Together, the project partners will develop, implement and evaluate safe and transparent chat-based conversational AI agents based on state-of-the-art neural architectures. The focus is on lesser-resourced tasks, domains, and scenarios.
Challenges and problems
 Factuality / consistency
 The output is fluent but not always correct
 Hallucination
Challenges and problems
 Search engines allow us to verify the source of the information
 Interfaces to generative language models should do the same
Consequences for work
and education
Consequences for work and education
 Do not replace humans, but assist them to do their work better
 When the boring part of the work is done by computational models, the human can do the interesting part
 (think about graphic designers using generative models for creating images)
Consequences for work and education
 Computational methods can help humans (students)
 Search engines
 Spelling correction
 Grammarly
 … Generative language models?
 New regulations
 We have to stress the importance of sources
 and of writing your own texts (and code!)
 and carefully pick our homework assignments
Research
opportunities
Use generative models to
 develop tools
 (e.g. QA-systems, chatbots, summarizers)
 generate training data [1]
 The prompting can be engineered to be more effective
 study linguistic phenomena
 which errors does the model make?
 study social phenomena
 simulate communication (opinionated/political content) [2]
1. https://github.com/arian-askari/ChatGPT-RetrievalQA
2. Chris Congleton, Peter van der Putten, and Suzan Verberne. Tracing Political Positioning of Dutch
Newspapers. In: Disinformation in Open Online Media. MISDOOM 2022.
Final
recommendations
 Listen to the interview with Emily Bender
Find me: https://duckduckgo.com/?t=ffab&q=suzan+verberne&ia=web

 
Cytokines and their role in immune regulation.pptx
Cytokines and their role in immune regulation.pptxCytokines and their role in immune regulation.pptx
Cytokines and their role in immune regulation.pptx
 
mô tả các thí nghiệm về đánh giá tác động dòng khí hóa sau đốt
mô tả các thí nghiệm về đánh giá tác động dòng khí hóa sau đốtmô tả các thí nghiệm về đánh giá tác động dòng khí hóa sau đốt
mô tả các thí nghiệm về đánh giá tác động dòng khí hóa sau đốt
 
Deep Behavioral Phenotyping in Systems Neuroscience for Functional Atlasing a...
Deep Behavioral Phenotyping in Systems Neuroscience for Functional Atlasing a...Deep Behavioral Phenotyping in Systems Neuroscience for Functional Atlasing a...
Deep Behavioral Phenotyping in Systems Neuroscience for Functional Atlasing a...
 
20240520 Planning a Circuit Simulator in JavaScript.pptx
20240520 Planning a Circuit Simulator in JavaScript.pptx20240520 Planning a Circuit Simulator in JavaScript.pptx
20240520 Planning a Circuit Simulator in JavaScript.pptx
 
Chapter 12 - climate change and the energy crisis
Chapter 12 - climate change and the energy crisisChapter 12 - climate change and the energy crisis
Chapter 12 - climate change and the energy crisis
 
Nucleic Acid-its structural and functional complexity.
Nucleic Acid-its structural and functional complexity.Nucleic Acid-its structural and functional complexity.
Nucleic Acid-its structural and functional complexity.
 
SAR of Medicinal Chemistry 1st by dk.pdf
SAR of Medicinal Chemistry 1st by dk.pdfSAR of Medicinal Chemistry 1st by dk.pdf
SAR of Medicinal Chemistry 1st by dk.pdf
 
3D Hybrid PIC simulation of the plasma expansion (ISSS-14)
3D Hybrid PIC simulation of the plasma expansion (ISSS-14)3D Hybrid PIC simulation of the plasma expansion (ISSS-14)
3D Hybrid PIC simulation of the plasma expansion (ISSS-14)
 

‘Big models’: the success and pitfalls of Transformer models in natural language processing

  • 2. Today’s talk
 Large Language Models
 BERT
 Huggingface
 Generative Pretrained Transformers (GPT)
 Challenges and problems
 Consequences for work and education
Suzan Verberne 2023
  • 3. Large Language Models
  • 4. Large Language Models: how it all started…
 Transformers: Attention is all you need (2017)
 Designed for sequence-to-sequence tasks (e.g. translation)
 Encoder-decoder architecture
Explanation of this paper: https://www.youtube.com/watch?v=iDulhoQ2pro
  • 5. Large Language Models
Transformers are powerful because of
 the long-distance relations between all words (attention)
 parallel processing instead of sequential processing
 unsupervised pre-training on a HUGE amount of data
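The attention mechanism behind these long-distance relations can be illustrated with a minimal scaled dot-product attention in plain Python. This is a toy sketch of the core computation only (one query, no learned projections, no multiple heads), not the full Transformer implementation:

```python
import math

def softmax(xs):
    """Numerically stable softmax over a list of floats."""
    m = max(xs)
    exps = [math.exp(x - m) for x in xs]
    s = sum(exps)
    return [e / s for e in exps]

def attention(query, keys, values):
    """Scaled dot-product attention for one query position.

    Every position attends to every other position at once,
    which is what gives Transformers their long-distance
    word relations and their parallelism.
    """
    d = len(query)
    scores = [sum(q * k for q, k in zip(query, key)) / math.sqrt(d)
              for key in keys]
    weights = softmax(scores)  # attention weights sum to 1
    # Output: weighted average of the value vectors
    out = [sum(w * v[i] for w, v in zip(weights, values))
           for i in range(len(values[0]))]
    return out, weights

# Tiny example: a sequence of 3 positions with 2-dimensional vectors
keys = values = [[1.0, 0.0], [0.0, 1.0], [1.0, 1.0]]
out, weights = attention([1.0, 0.0], keys, values)
print(weights)  # positions whose keys match the query get more weight
```

In a real model the queries, keys, and values are learned linear projections of the token embeddings, and the computation runs for all positions in parallel as matrix multiplications.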
  • 6. Large Language Models: BERT (Bidirectional Encoder Representations from Transformers)
 An encoder-only transformer
 Input is text, output is embeddings
  • 7. Some linguistics…
BERT is based on the distributional hypothesis
 The context of a word defines its meaning
 Words that occur in similar contexts tend to be similar in meaning
Harris, Z. (1954). “Distributional structure”. Word. 10 (2–3): 146–162
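The distributional hypothesis can be made concrete with toy co-occurrence counts: words that appear with similar context words get similar count vectors, and vector similarity then approximates similarity in meaning. The counts below are invented for illustration, not taken from a real corpus:

```python
import math

def cosine(u, v):
    """Cosine similarity between two count vectors."""
    dot = sum(a * b for a, b in zip(u, v))
    norm = math.sqrt(sum(a * a for a in u)) * math.sqrt(sum(b * b for b in v))
    return dot / norm

# How often each word co-occurs with the context words
# [drink, hot, road, drive] (hypothetical counts)
tea    = [10, 8, 0, 0]
coffee = [9, 7, 1, 0]
car    = [0, 1, 8, 9]

print(cosine(tea, coffee))  # high: 'tea' and 'coffee' share contexts
print(cosine(tea, car))     # low: very different contexts
```

BERT's contextual embeddings are far richer than these static count vectors, but they rest on the same idea: similar contexts, similar representations.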
  • 8. Word embeddings
 BERT embeddings are learned from unlabelled data
 through a process called ‘masked language modelling’ with self-supervision
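The masked-language-modelling objective can be sketched as follows: randomly hide a fraction of the input tokens, and train the model to restore them, so the text itself supplies the labels. This is a simplified sketch of the masking step only (BERT masks roughly 15% of tokens and has some extra rules, such as sometimes keeping or randomly replacing a selected token, which are omitted here):

```python
import random

def mask_tokens(tokens, mask_prob=0.15, mask_token="[MASK]", seed=42):
    """Replace a random subset of tokens with [MASK].

    The original tokens at the masked positions become the
    training labels: the model must predict them back.
    """
    rng = random.Random(seed)
    masked, labels = [], {}
    for i, tok in enumerate(tokens):
        if rng.random() < mask_prob:
            masked.append(mask_token)
            labels[i] = tok  # self-supervised label
        else:
            masked.append(tok)
    return masked, labels

tokens = "the cat sat on the mat because it was tired".split()
masked, labels = mask_tokens(tokens)
print(masked)
print(labels)
```

Because no human annotation is needed, this objective scales to the huge unlabelled corpora that make pre-training work.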
  • 9. BERT
 BERT is so powerful because it is used in a transfer learning setting
 Pre-training: learning embeddings from huge unlabeled data (self-supervised)
 Fine-tuning: learning the classification model from smaller labeled data (supervised) for any NLP task (e.g. sentiment, named entities)
  • 10. Huggingface
But also because:
 The authors (from Google) open-sourced the model implementation
 and publicly released pretrained models (which are computationally expensive to pretrain from scratch)
 https://huggingface.co/ is the standard implementation package for training and applying Transformer models
 Currently over 150k models have been published on Huggingface
  • 13. Huggingface
Working with Huggingface
 Take a pre-trained model
 Run it ‘zero-shot’:

from transformers import pipeline
sentiment_pipeline = pipeline("sentiment-analysis")
data = ["I love you", "I hate you"]
output = sentiment_pipeline(data)
print(output)
# [{'label': 'POSITIVE', 'score': 0.9998656511306763},
#  {'label': 'NEGATIVE', 'score': 0.9991129040718079}]

(Default model: distilbert-base-uncased-finetuned-sst-2-english)
 Or fine-tune the model on your own data
  • 15. GPT
 GPT is a decoder-only transformer model
 It does not have an encoder
 Instead, it uses the prompt to generate outputs
 A growing family of models since 2018: GPT-2, DialoGPT, GPT-3, GPT-3.5, ChatGPT, GPT-4
  • 16. GPT-3
 GPT is trained to generate the most probable/plausible text
 Trained on crawled internet data, open-source books, and Wikipedia, sampled early 2022
 After each word, it predicts the most probable next word given all the previous words
 It will give you fluent text that looks very real
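The ‘predict the most probable next word’ loop can be illustrated with a toy model. In GPT the probabilities come from a huge neural network conditioned on all previous words; here a hypothetical probability table stands in for the network, but the greedy decoding loop has the same shape:

```python
# Toy next-word probabilities (a stand-in for the neural model;
# the words and numbers are invented for illustration)
next_word_probs = {
    "the": {"cat": 0.6, "dog": 0.4},
    "cat": {"sat": 0.7, "ran": 0.3},
    "sat": {"down": 0.9, "up": 0.1},
}

def generate(prompt, max_words=3):
    """Greedy decoding: repeatedly append the most probable next word."""
    words = prompt.split()
    for _ in range(max_words):
        probs = next_word_probs.get(words[-1])
        if probs is None:
            break  # no continuation known for this word
        words.append(max(probs, key=probs.get))
    return " ".join(words)

print(generate("the"))  # → "the cat sat down"
```

Note that the loop optimizes for probable text, not true text: fluency is built into the objective, factuality is not.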
  • 17. Few-shot learning
Few-shot learning: learning from a small number of examples
‘Old paradigm’:
 pre-training
 fine-tuning with ~100s-1000s of training samples
‘New paradigm’:
 pre-training
 prompting with ~3-50 examples in the prompt
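In the ‘new paradigm’, the labeled examples go into the prompt text itself instead of into gradient updates. A few-shot prompt can be assembled as a plain string; the template below (the "Review:"/"Sentiment:" labels and the example texts) is one possible design, not a fixed format:

```python
def few_shot_prompt(examples, query):
    """Build a few-shot prompt: labeled examples followed by the query."""
    lines = []
    for text, label in examples:
        lines.append(f"Review: {text}\nSentiment: {label}\n")
    # End with the unlabeled query; the model continues after "Sentiment:"
    lines.append(f"Review: {query}\nSentiment:")
    return "\n".join(lines)

examples = [
    ("I love this movie", "positive"),
    ("Terrible acting and a boring plot", "negative"),
    ("Great soundtrack", "positive"),
]
prompt = few_shot_prompt(examples, "The ending surprised me in a good way")
print(prompt)
```

The model is then asked to complete the prompt, and its continuation after the final "Sentiment:" is read off as the prediction.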
  • 19. ChatGPT
 ChatGPT =
 GPT-3.5
 + fine-tuning for conversations
 + reinforcement learning for better answers
https://openai.com/blog/chatgpt
  • 20. Why are LLMs so powerful?
 Because they are HUGE (many parameters)
 And trained on HUGE data
https://huggingface.co/blog/large-language-models
  • 21. Challenges and problems with LLMs
  • 22. Challenges and problems
 Computational power
 Environmental footprint
 Heavy GPU computing required for training models
 Lengthy texts are challenging
 Low-resource languages
 Low-resource domains
 Closed models (‘OpenAI’) vs open-source models
https://lessen-project.nl/: Together, the project partners will develop, implement and evaluate safe and transparent chat-based conversational AI agents based on state-of-the-art neural architectures. The focus is on lesser-resourced tasks, domains, and scenarios.
  • 23. Challenges and problems
 Factuality / consistency
 The output is fluent but not always correct
 Hallucination
  • 27. Challenges and problems
 Search engines allow us to verify the source of the information
 Interfaces to generative language models should do the same
  • 28. Consequences for work and education
  • 29. Consequences for work and education
 Do not replace humans, but assist them to do their work better
 When the boring part of the work is done by computational models, the human can do the interesting part
 (think of graphic designers using generative models for creating images)
  • 30. Consequences for work and education
 Computational methods can help humans (students):
 Search engines
 Spelling correction
 Grammarly
 …
Generative language models?
 New regulations
 We have to stress the importance of sources
 and of writing your own texts (and code!)
 and carefully pick our homework assignments
  • 31. Research opportunities
Use generative models to
 develop tools (e.g. QA systems, chatbots, summarizers)
 generate training data1 (the prompting can be engineered to be more effective)
 study linguistic phenomena: which errors does the model make?
 study social phenomena: simulate communication (opinionated / political content)2
1. https://github.com/arian-askari/ChatGPT-RetrievalQA
2. Chris Congleton, Peter van der Putten, and Suzan Verberne. Tracing Political Positioning of Dutch Newspapers. In: Disinformation in Open Online Media. MISDOOM 2022.
  • 32. Final recommendations
 Listen to the interview with Emily Bender
Find me: https://duckduckgo.com/?t=ffab&q=suzan+verberne&ia=web