‘Big models’
The success and pitfalls of Transformer models in NLP
Suzan Verberne | NOTaS | March 2023
Today’s talk
 Large Language Models
 BERT
 Huggingface
 Generative Pretrained Transformers (GPT)
 Challenges and problems
 Consequences for work and education
Suzan Verberne 2023
Large Language Models
How it all started…
 Transformers: Attention is all you need (2017)
 Designed for sequence-to-sequence tasks (e.g. translation)
 Encoder-decoder architecture
Explanation of this paper: https://www.youtube.com/watch?v=iDulhoQ2pro
Transformers are powerful because of
 the long-distance relations between all words (attention)
 parallel processing instead of sequential processing
 unsupervised pre-training on a HUGE amount of data
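The attention idea can be illustrated with a minimal sketch of scaled dot-product attention in plain Python. It is a simplification under stated assumptions: real Transformers apply learned projection matrices to obtain queries, keys and values and use several attention heads, whereas here the made-up token vectors are used directly for all three roles. The names `softmax`, `self_attention` and `tokens` are illustrative only.

```python
import math

def softmax(scores):
    """Turn raw scores into weights that are positive and sum to 1."""
    peak = max(scores)  # subtract the max for numerical stability
    exps = [math.exp(s - peak) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]

def self_attention(queries, keys, values):
    """Scaled dot-product attention over a whole sequence.

    Every position is compared with every other position, so the distance
    between two words does not matter. Each output row depends only on the
    inputs, not on other output rows, so the rows could be computed in
    parallel.
    """
    dim = len(keys[0])
    outputs = []
    for q in queries:
        # Similarity of this query with every key, scaled by sqrt(dim).
        scores = [sum(qi * ki for qi, ki in zip(q, k)) / math.sqrt(dim) for k in keys]
        weights = softmax(scores)
        # Weighted average of the value vectors.
        mixed = [sum(w * v[j] for w, v in zip(weights, values))
                 for j in range(len(values[0]))]
        outputs.append(mixed)
    return outputs

# Three made-up 2-dimensional token vectors (illustrative values only).
tokens = [[1.0, 0.0], [0.0, 1.0], [1.0, 1.0]]
attended = self_attention(tokens, tokens, tokens)
print(attended)
```

Each output vector is a weighted mix of all input vectors, which is how information from distant words reaches every position in a single step.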
BERT (Bidirectional Encoder Representations from Transformers)
 An encoder-only transformer
 Input is text, output is embeddings
Next… Some linguistics…
BERT is based on the distributional hypothesis
 The context of a word defines its meaning
 Words that occur in similar contexts tend to be similar
Harris, Z. (1954). “Distributional structure”. Word. 10 (2–3): 146–162
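The distributional hypothesis can be made concrete with a small count-based sketch: represent each word by the words that appear around it, then compare those context vectors. This is not how BERT computes embeddings; it is a toy illustration, and the corpus, the window size and the function names are assumptions made for the example.

```python
import math
from collections import Counter, defaultdict

def context_vectors(sentences, window=2):
    """Count, for every word, the words that occur within `window` positions."""
    vectors = defaultdict(Counter)
    for sentence in sentences:
        words = sentence.lower().split()
        for i, word in enumerate(words):
            lo, hi = max(0, i - window), min(len(words), i + window + 1)
            for j in range(lo, hi):
                if j != i:
                    vectors[word][words[j]] += 1
    return vectors

def cosine(a, b):
    """Cosine similarity between two sparse count vectors."""
    dot = sum(a[k] * b[k] for k in a if k in b)
    norm_a = math.sqrt(sum(v * v for v in a.values()))
    norm_b = math.sqrt(sum(v * v for v in b.values()))
    return dot / (norm_a * norm_b) if norm_a and norm_b else 0.0

# Tiny invented corpus: "cat" and "dog" share contexts, "fuel" does not.
corpus = [
    "the cat drinks milk",
    "the dog drinks water",
    "the cat chases the mouse",
    "the dog chases the ball",
    "the car needs fuel",
]
vectors = context_vectors(corpus)
print(cosine(vectors["cat"], vectors["dog"]))
print(cosine(vectors["cat"], vectors["fuel"]))
```

Words that occur in similar contexts ("cat", "dog") end up with a higher similarity than words that do not share contexts ("cat", "fuel").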
Word Embeddings
 BERT embeddings are learned from unlabelled data
 Through a process called ‘masked language modelling’ with self-supervision
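A minimal sketch of how masked language modelling creates its own training signal: hide some tokens and keep the hidden originals as the targets. This only shows the data preparation step, not the model. It is simplified (BERT's actual recipe also sometimes swaps in a random token or leaves the chosen token unchanged), and `mask_tokens`, `mask_rate` and `seed` are names chosen for this example.

```python
import random

MASK = "[MASK]"

def mask_tokens(tokens, mask_rate=0.15, seed=0):
    """Hide a random subset of tokens and remember what was hidden.

    Returns the masked sequence plus a {position: original token} dict.
    The dict is the training target, and it comes from the text itself,
    so no human labelling is needed (self-supervision).
    """
    rng = random.Random(seed)
    n_to_mask = max(1, round(len(tokens) * mask_rate))
    positions = sorted(rng.sample(range(len(tokens)), n_to_mask))
    masked = list(tokens)
    targets = {}
    for pos in positions:
        targets[pos] = masked[pos]
        masked[pos] = MASK
    return masked, targets

sentence = "the context of a word defines its meaning".split()
masked, targets = mask_tokens(sentence, mask_rate=0.25)
print(masked)   # the model's input
print(targets)  # what the model must predict at the masked positions
```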
BERT
 BERT is so powerful because it is used in a transfer learning setting
 Pre-training: learning embeddings from huge unlabeled data (self-supervised)
 Fine-tuning: learning the classification model from smaller labeled data (supervised) for any NLP task (e.g. sentiment, named entities)
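The two-stage idea can be sketched with a toy example: reuse fixed "pre-trained" word vectors and train only a small classifier on a handful of labelled sentences. Everything here is an assumption made for illustration: the vectors in `PRETRAINED` are invented, and the sketch keeps them frozen, whereas fine-tuning BERT typically updates the pre-trained weights as well.

```python
import math

# Hypothetical "pre-trained" word embeddings. In practice these would come
# from a model such as BERT; the values here are invented for illustration.
PRETRAINED = {
    "love": [0.9, 0.1], "great": [0.8, 0.2], "enjoy": [0.7, 0.1],
    "hate": [0.1, 0.9], "awful": [0.2, 0.8], "dislike": [0.1, 0.7],
}
UNKNOWN = [0.5, 0.5]  # neutral vector for words outside the vocabulary

def embed(text):
    """Average the frozen word vectors of a text (the pre-trained part)."""
    vecs = [PRETRAINED.get(w, UNKNOWN) for w in text.lower().split()]
    return [sum(v[d] for v in vecs) / len(vecs) for d in range(2)]

def train_classifier(examples, epochs=500, lr=1.0):
    """Fit a logistic-regression head on a handful of labelled examples."""
    weights, bias = [0.0, 0.0], 0.0
    for _ in range(epochs):
        for text, label in examples:
            x = embed(text)
            logit = sum(w * xi for w, xi in zip(weights, x)) + bias
            pred = 1 / (1 + math.exp(-logit))
            error = pred - label  # gradient of the log loss w.r.t. the logit
            weights = [w - lr * error * xi for w, xi in zip(weights, x)]
            bias -= lr * error
    return weights, bias

def predict(text, weights, bias):
    """Return 1 (positive) or 0 (negative) for a text."""
    x = embed(text)
    return int(sum(w * xi for w, xi in zip(weights, x)) + bias > 0)

# Only four labelled examples are needed because the embeddings already
# carry the relevant information.
labelled = [("i love it", 1), ("great film", 1), ("i hate it", 0), ("awful film", 0)]
weights, bias = train_classifier(labelled)
print(predict("i enjoy it", weights, bias), predict("i dislike it", weights, bias))
```

The classifier generalises to "enjoy" and "dislike", which never appeared in the labelled data, because their pre-trained vectors lie close to those of "love" and "hate".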
Huggingface
But also because:
 The authors (from Google) open-sourced the model implementation
 And publicly released pretrained models (which are computationally expensive to pretrain from scratch)
 https://huggingface.co/ is the standard implementation package for training and applying Transformer models
 Currently (March 2023) over 150k models have been published on Huggingface
Working with Huggingface
 Take a pre-trained model
 Run ‘zero-shot’:
from transformers import pipeline

# No model is specified, so the pipeline loads its default model:
# distilbert-base-uncased-finetuned-sst-2-english
sentiment_pipeline = pipeline("sentiment-analysis")
data = ["I love you", "I hate you"]
output = sentiment_pipeline(data)
print(output)
# Output:
# [{'label': 'POSITIVE', 'score': 0.9998656511306763},
#  {'label': 'NEGATIVE', 'score': 0.9991129040718079}]
 Or fine-tune on your own data
Generative Pretrained Transformers (GPT)
GPT
 GPT is a decoder-only transformer model
 It does not have an encoder
 Instead: use the prompt to generate outputs
 A growing family of models since 2018: GPT-2, DialoGPT, GPT-3, GPT-3.5, ChatGPT, GPT-4
GPT-3
 GPT is trained to generate the most probable/plausible text
 Trained on crawled internet data, open source books, Wikipedia, sampled early 2022
 After each word, predict the most probable next word given all the previous words
 It will give you fluent text that looks very real
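Next-word prediction can be sketched with a deliberately tiny stand-in for GPT: a bigram model that counts which word follows which and then always appends the most frequent continuation. It is an assumption-laden toy, since a bigram model looks only at the previous word while GPT conditions on all previous words, but the generation loop has the same shape. The training text and function names are invented for the example.

```python
from collections import Counter, defaultdict

def train_bigrams(text):
    """Count which word follows which in the training text."""
    words = text.lower().split()
    following = defaultdict(Counter)
    for current, nxt in zip(words, words[1:]):
        following[current][nxt] += 1
    return following

def generate(following, prompt, max_new_words=5):
    """Repeatedly append the most probable next word (greedy decoding)."""
    words = prompt.lower().split()
    for _ in range(max_new_words):
        candidates = following.get(words[-1])
        if not candidates:
            break  # the last word was never followed by anything in training
        words.append(candidates.most_common(1)[0][0])
    return " ".join(words)

training_text = "the cat sat on the mat . the cat sat on the sofa . the cat slept ."
model = train_bigrams(training_text)
print(generate(model, "the cat", max_new_words=3))
```

The output is fluent with respect to the training text, but nothing in the procedure checks whether it is true, which connects to the factuality problems discussed later in the talk.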
Few-shot learning
Few-shot learning: learn from a small number of examples
'Old paradigm'
• pre-training
• fine-tuning with ~100s-1000s training samples
'New paradigm'
• pre-training
• prompting with ~3-50 examples in the prompt
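In the 'new paradigm' the labelled examples go into the prompt instead of into a training run. A minimal sketch of assembling such a prompt follows; the `Text:` / `Sentiment:` layout, the instruction wording and the function name are assumptions for illustration, not a format prescribed by any particular model.

```python
def build_few_shot_prompt(instruction, examples, query):
    """Put a handful of labelled examples in the prompt, then the new input."""
    lines = [instruction, ""]
    for text, label in examples:
        lines += [f"Text: {text}", f"Sentiment: {label}", ""]
    # The final item has no label: the model is expected to continue the pattern.
    lines += [f"Text: {query}", "Sentiment:"]
    return "\n".join(lines)

examples = [
    ("I love you", "POSITIVE"),
    ("I hate you", "NEGATIVE"),
    ("What a wonderful day", "POSITIVE"),
]
prompt = build_few_shot_prompt(
    "Classify the sentiment of each text as POSITIVE or NEGATIVE.",
    examples,
    "This talk was great",
)
print(prompt)
```

The resulting string would be sent to a generative model as-is; no model weights are updated.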
Few-shot learning with ChatGPT
ChatGPT
 ChatGPT =
 GPT-3.5
 + finetuning for conversations
 + reinforcement learning for better answers
https://openai.com/blog/chatgpt
Why are LLMs so powerful?
 Because they are HUGE (many parameters)
 And trained on HUGE data
https://huggingface.co/blog/large-language-models
Challenges and problems with LLMs
 Computational power
 Environmental footprint
 Heavy GPU computing required for training models
 Lengthy texts are challenging
 Low resource languages
 Low resource domains
 Closed models (‘OpenAI’) vs open source models
https://lessen-project.nl/ Together, the project partners will develop, implement and evaluate state-of-the-art safe and transparent chat-based conversational AI agents based on state-of-the-art neural architectures. The focus is on lesser resourced tasks, domains, and scenarios.
 Factuality / consistency
 The output is fluent but not always correct
 Hallucination
 Search engines allow us to verify the source of the information
 Interfaces to generative language models should do the same
Consequences for work and education
 Do not replace humans, but assist them to do their work better
 When the boring part of the work is done by computational models, the human can do the interesting part
 (think about graphic designers using generative models for creating images)
 Computational methods can help humans (students)
 Search engines
 Spelling correction
 Grammarly
 … Generative language models?
 New regulations
 We have to stress the importance of sources
 and of writing your own texts (and code!)
 and carefully pick our homework assignments
Research opportunities
Use generative models to
 develop tools
 (e.g. QA-systems, chatbots, summarizers)
 generate training data [1]
 The prompting can be engineered to be more effective
 study linguistic phenomena
 which errors does the model make?
 study social phenomena
 simulate communication (opinionated / political content) [2]
[1] https://github.com/arian-askari/ChatGPT-RetrievalQA
[2] Chris Congleton, Peter van der Putten, and Suzan Verberne. Tracing Political Positioning of Dutch Newspapers. In: Disinformation in Open Online Media. MISDOOM 2022.
Final recommendations
 Listen to the interview with Emily Bender
Find me: https://duckduckgo.com/?t=ffab&q=suzan+verberne&ia=web

More Related Content

What's hot

Fine tune and deploy Hugging Face NLP models
Fine tune and deploy Hugging Face NLP modelsFine tune and deploy Hugging Face NLP models
Fine tune and deploy Hugging Face NLP modelsOVHcloud
 
Building NLP applications with Transformers
Building NLP applications with TransformersBuilding NLP applications with Transformers
Building NLP applications with TransformersJulien SIMON
 
A Comprehensive Review of Large Language Models for.pptx
A Comprehensive Review of Large Language Models for.pptxA Comprehensive Review of Large Language Models for.pptx
A Comprehensive Review of Large Language Models for.pptxSaiPragnaKancheti
 
Transformers, LLMs, and the Possibility of AGI
Transformers, LLMs, and the Possibility of AGITransformers, LLMs, and the Possibility of AGI
Transformers, LLMs, and the Possibility of AGISynaptonIncorporated
 
AI and ML Series - Introduction to Generative AI and LLMs - Session 1
AI and ML Series - Introduction to Generative AI and LLMs - Session 1AI and ML Series - Introduction to Generative AI and LLMs - Session 1
AI and ML Series - Introduction to Generative AI and LLMs - Session 1DianaGray10
 
How to fine-tune and develop your own large language model.pptx
How to fine-tune and develop your own large language model.pptxHow to fine-tune and develop your own large language model.pptx
How to fine-tune and develop your own large language model.pptxKnoldus Inc.
 
What Is GPT-3 And Why Is It Revolutionizing Artificial Intelligence?
What Is GPT-3 And Why Is It Revolutionizing Artificial Intelligence?What Is GPT-3 And Why Is It Revolutionizing Artificial Intelligence?
What Is GPT-3 And Why Is It Revolutionizing Artificial Intelligence?Bernard Marr
 
Large Language Models, No-Code, and Responsible AI - Trends in Applied NLP in...
Large Language Models, No-Code, and Responsible AI - Trends in Applied NLP in...Large Language Models, No-Code, and Responsible AI - Trends in Applied NLP in...
Large Language Models, No-Code, and Responsible AI - Trends in Applied NLP in...David Talby
 
Neural Language Generation Head to Toe
Neural Language Generation Head to Toe Neural Language Generation Head to Toe
Neural Language Generation Head to Toe Hady Elsahar
 
Large Language Models - From RNN to BERT
Large Language Models - From RNN to BERTLarge Language Models - From RNN to BERT
Large Language Models - From RNN to BERTATPowr
 
NLP using transformers
NLP using transformers NLP using transformers
NLP using transformers Arvind Devaraj
 
Implications of GPT-3
Implications of GPT-3Implications of GPT-3
Implications of GPT-3Raven Jiang
 
Reinventing Deep Learning
 with Hugging Face Transformers
Reinventing Deep Learning
 with Hugging Face TransformersReinventing Deep Learning
 with Hugging Face Transformers
Reinventing Deep Learning
 with Hugging Face TransformersJulien SIMON
 
Transformer Seq2Sqe Models: Concepts, Trends & Limitations (DLI)
Transformer Seq2Sqe Models: Concepts, Trends & Limitations (DLI)Transformer Seq2Sqe Models: Concepts, Trends & Limitations (DLI)
Transformer Seq2Sqe Models: Concepts, Trends & Limitations (DLI)Deep Learning Italia
 
LanGCHAIN Framework
LanGCHAIN FrameworkLanGCHAIN Framework
LanGCHAIN FrameworkKeymate.AI
 

What's hot (20)

Fine tune and deploy Hugging Face NLP models
Fine tune and deploy Hugging Face NLP modelsFine tune and deploy Hugging Face NLP models
Fine tune and deploy Hugging Face NLP models
 
Transformers AI PPT.pptx
Transformers AI PPT.pptxTransformers AI PPT.pptx
Transformers AI PPT.pptx
 
Bert
BertBert
Bert
 
Building NLP applications with Transformers
Building NLP applications with TransformersBuilding NLP applications with Transformers
Building NLP applications with Transformers
 
[Paper review] BERT
[Paper review] BERT[Paper review] BERT
[Paper review] BERT
 
A Comprehensive Review of Large Language Models for.pptx
A Comprehensive Review of Large Language Models for.pptxA Comprehensive Review of Large Language Models for.pptx
A Comprehensive Review of Large Language Models for.pptx
 
Transformers, LLMs, and the Possibility of AGI
Transformers, LLMs, and the Possibility of AGITransformers, LLMs, and the Possibility of AGI
Transformers, LLMs, and the Possibility of AGI
 
AI and ML Series - Introduction to Generative AI and LLMs - Session 1
AI and ML Series - Introduction to Generative AI and LLMs - Session 1AI and ML Series - Introduction to Generative AI and LLMs - Session 1
AI and ML Series - Introduction to Generative AI and LLMs - Session 1
 
How to fine-tune and develop your own large language model.pptx
How to fine-tune and develop your own large language model.pptxHow to fine-tune and develop your own large language model.pptx
How to fine-tune and develop your own large language model.pptx
 
What Is GPT-3 And Why Is It Revolutionizing Artificial Intelligence?
What Is GPT-3 And Why Is It Revolutionizing Artificial Intelligence?What Is GPT-3 And Why Is It Revolutionizing Artificial Intelligence?
What Is GPT-3 And Why Is It Revolutionizing Artificial Intelligence?
 
Large Language Models, No-Code, and Responsible AI - Trends in Applied NLP in...
Large Language Models, No-Code, and Responsible AI - Trends in Applied NLP in...Large Language Models, No-Code, and Responsible AI - Trends in Applied NLP in...
Large Language Models, No-Code, and Responsible AI - Trends in Applied NLP in...
 
Neural Language Generation Head to Toe
Neural Language Generation Head to Toe Neural Language Generation Head to Toe
Neural Language Generation Head to Toe
 
Large Language Models - From RNN to BERT
Large Language Models - From RNN to BERTLarge Language Models - From RNN to BERT
Large Language Models - From RNN to BERT
 
NLP using transformers
NLP using transformers NLP using transformers
NLP using transformers
 
gpt3_presentation.pdf
gpt3_presentation.pdfgpt3_presentation.pdf
gpt3_presentation.pdf
 
Implications of GPT-3
Implications of GPT-3Implications of GPT-3
Implications of GPT-3
 
Reinventing Deep Learning
 with Hugging Face Transformers
Reinventing Deep Learning
 with Hugging Face TransformersReinventing Deep Learning
 with Hugging Face Transformers
Reinventing Deep Learning
 with Hugging Face Transformers
 
Transformer Seq2Sqe Models: Concepts, Trends & Limitations (DLI)
Transformer Seq2Sqe Models: Concepts, Trends & Limitations (DLI)Transformer Seq2Sqe Models: Concepts, Trends & Limitations (DLI)
Transformer Seq2Sqe Models: Concepts, Trends & Limitations (DLI)
 
LanGCHAIN Framework
LanGCHAIN FrameworkLanGCHAIN Framework
LanGCHAIN Framework
 
OpenAI Chatgpt.pptx
OpenAI Chatgpt.pptxOpenAI Chatgpt.pptx
OpenAI Chatgpt.pptx
 

Similar to ‘Big models’: the success and pitfalls of Transformer models in natural language processing

TEXT GENERATION WITH GAN NETWORKS USING FEEDBACK SCORE
TEXT GENERATION WITH GAN NETWORKS USING FEEDBACK SCORETEXT GENERATION WITH GAN NETWORKS USING FEEDBACK SCORE
TEXT GENERATION WITH GAN NETWORKS USING FEEDBACK SCOREIJCI JOURNAL
 
Report finger print
Report finger printReport finger print
Report finger printEshaan Verma
 
Wise Document Translator Report
Wise Document Translator ReportWise Document Translator Report
Wise Document Translator ReportRaouf KESKES
 
Issues in the Design of a Code Generator.pptx
Issues in the Design of a Code Generator.pptxIssues in the Design of a Code Generator.pptx
Issues in the Design of a Code Generator.pptxSabbirHossen27
 
Research @ RELEASeD (presented at SATTOSE2013)
Research @ RELEASeD (presented at SATTOSE2013)Research @ RELEASeD (presented at SATTOSE2013)
Research @ RELEASeD (presented at SATTOSE2013)kim.mens
 
Natural Language Generation in the Wild
Natural Language Generation in the WildNatural Language Generation in the Wild
Natural Language Generation in the WildDaniel Beck
 
2019 05 11 Chicago Codecamp - Deep Learning for everyone? Challenge Accepted!
2019 05 11 Chicago Codecamp - Deep Learning for everyone? Challenge Accepted!2019 05 11 Chicago Codecamp - Deep Learning for everyone? Challenge Accepted!
2019 05 11 Chicago Codecamp - Deep Learning for everyone? Challenge Accepted!Bruno Capuano
 
Exploring Generating AI with Diffusion Models
Exploring Generating AI with Diffusion ModelsExploring Generating AI with Diffusion Models
Exploring Generating AI with Diffusion ModelsKonfHubTechConferenc
 
Generative AI at the edge.pdf
Generative AI at the edge.pdfGenerative AI at the edge.pdf
Generative AI at the edge.pdfQualcomm Research
 
Game Design as an Intro to Computer Science (Meaningful Play 2014)
Game Design as an Intro to Computer Science (Meaningful Play 2014)Game Design as an Intro to Computer Science (Meaningful Play 2014)
Game Design as an Intro to Computer Science (Meaningful Play 2014)marksuter
 
Device for text to speech production and to braille script
Device for text to speech production and to braille scriptDevice for text to speech production and to braille script
Device for text to speech production and to braille scriptIAEME Publication
 
Addressing open Machine Translation problems with Linked Data.
  Addressing open Machine Translation problems with Linked Data.  Addressing open Machine Translation problems with Linked Data.
Addressing open Machine Translation problems with Linked Data.DiegoMoussallem
 
2-Chapter Two-N-gram Language Models.ppt
2-Chapter Two-N-gram Language Models.ppt2-Chapter Two-N-gram Language Models.ppt
2-Chapter Two-N-gram Language Models.pptmilkesa13
 
Brightspace and Math Formulae: Making Friends - 2014 Brightspace Ignite Wisco...
Brightspace and Math Formulae: Making Friends - 2014 Brightspace Ignite Wisco...Brightspace and Math Formulae: Making Friends - 2014 Brightspace Ignite Wisco...
Brightspace and Math Formulae: Making Friends - 2014 Brightspace Ignite Wisco...D2L Barry
 
Ary Mouse for Image Processing
Ary Mouse for Image ProcessingAry Mouse for Image Processing
Ary Mouse for Image ProcessingIJERA Editor
 
Ary Mouse for Image Processing
Ary Mouse for Image ProcessingAry Mouse for Image Processing
Ary Mouse for Image ProcessingIJERA Editor
 

Similar to ‘Big models’: the success and pitfalls of Transformer models in natural language processing (20)

TEXT GENERATION WITH GAN NETWORKS USING FEEDBACK SCORE
TEXT GENERATION WITH GAN NETWORKS USING FEEDBACK SCORETEXT GENERATION WITH GAN NETWORKS USING FEEDBACK SCORE
TEXT GENERATION WITH GAN NETWORKS USING FEEDBACK SCORE
 
Demo day
Demo dayDemo day
Demo day
 
Report finger print
Report finger printReport finger print
Report finger print
 
Wise Document Translator Report
Wise Document Translator ReportWise Document Translator Report
Wise Document Translator Report
 
Issues in the Design of a Code Generator.pptx
Issues in the Design of a Code Generator.pptxIssues in the Design of a Code Generator.pptx
Issues in the Design of a Code Generator.pptx
 
Research @ RELEASeD (presented at SATTOSE2013)
Research @ RELEASeD (presented at SATTOSE2013)Research @ RELEASeD (presented at SATTOSE2013)
Research @ RELEASeD (presented at SATTOSE2013)
 
Natural Language Generation in the Wild
Natural Language Generation in the WildNatural Language Generation in the Wild
Natural Language Generation in the Wild
 
2019 05 11 Chicago Codecamp - Deep Learning for everyone? Challenge Accepted!
2019 05 11 Chicago Codecamp - Deep Learning for everyone? Challenge Accepted!2019 05 11 Chicago Codecamp - Deep Learning for everyone? Challenge Accepted!
2019 05 11 Chicago Codecamp - Deep Learning for everyone? Challenge Accepted!
 
2005_matzon
2005_matzon2005_matzon
2005_matzon
 
Thesis
ThesisThesis
Thesis
 
Exploring Generating AI with Diffusion Models
Exploring Generating AI with Diffusion ModelsExploring Generating AI with Diffusion Models
Exploring Generating AI with Diffusion Models
 
Generative AI at the edge.pdf
Generative AI at the edge.pdfGenerative AI at the edge.pdf
Generative AI at the edge.pdf
 
Game Design as an Intro to Computer Science (Meaningful Play 2014)
Game Design as an Intro to Computer Science (Meaningful Play 2014)Game Design as an Intro to Computer Science (Meaningful Play 2014)
Game Design as an Intro to Computer Science (Meaningful Play 2014)
 
Ase01.ppt
Ase01.pptAse01.ppt
Ase01.ppt
 
Device for text to speech production and to braille script
Device for text to speech production and to braille scriptDevice for text to speech production and to braille script
Device for text to speech production and to braille script
 
Addressing open Machine Translation problems with Linked Data.
  Addressing open Machine Translation problems with Linked Data.  Addressing open Machine Translation problems with Linked Data.
Addressing open Machine Translation problems with Linked Data.
 
2-Chapter Two-N-gram Language Models.ppt
2-Chapter Two-N-gram Language Models.ppt2-Chapter Two-N-gram Language Models.ppt
2-Chapter Two-N-gram Language Models.ppt
 
Brightspace and Math Formulae: Making Friends - 2014 Brightspace Ignite Wisco...
Brightspace and Math Formulae: Making Friends - 2014 Brightspace Ignite Wisco...Brightspace and Math Formulae: Making Friends - 2014 Brightspace Ignite Wisco...
Brightspace and Math Formulae: Making Friends - 2014 Brightspace Ignite Wisco...
 
Ary Mouse for Image Processing
Ary Mouse for Image ProcessingAry Mouse for Image Processing
Ary Mouse for Image Processing
 
Ary Mouse for Image Processing
Ary Mouse for Image ProcessingAry Mouse for Image Processing
Ary Mouse for Image Processing
 

More from Leiden University

Text mining for health knowledge discovery
Text mining for health knowledge discoveryText mining for health knowledge discovery
Text mining for health knowledge discoveryLeiden University
 
Text Mining for Lexicography
Text Mining for LexicographyText Mining for Lexicography
Text Mining for LexicographyLeiden University
 
'Het nieuwe zoeken' voor informatieprofessionals
'Het nieuwe zoeken' voor informatieprofessionals'Het nieuwe zoeken' voor informatieprofessionals
'Het nieuwe zoeken' voor informatieprofessionalsLeiden University
 
Automatische classificatie van teksten
Automatische classificatie van tekstenAutomatische classificatie van teksten
Automatische classificatie van tekstenLeiden University
 
Summarizing discussion threads
Summarizing discussion threadsSummarizing discussion threads
Summarizing discussion threadsLeiden University
 
Automatische classificatie van teksten
Automatische classificatie van tekstenAutomatische classificatie van teksten
Automatische classificatie van tekstenLeiden University
 
Leer je digitale klanten kennen: hoe zoeken ze en wat vinden ze?
Leer je digitale klanten kennen: hoe zoeken ze en wat vinden ze?Leer je digitale klanten kennen: hoe zoeken ze en wat vinden ze?
Leer je digitale klanten kennen: hoe zoeken ze en wat vinden ze?Leiden University
 
RemBench: A Digital Workbench for Rembrandt Research
RemBench: A Digital Workbench for Rembrandt ResearchRemBench: A Digital Workbench for Rembrandt Research
RemBench: A Digital Workbench for Rembrandt ResearchLeiden University
 
Collecting a dataset of information behaviour in context
Collecting a dataset of information behaviour in contextCollecting a dataset of information behaviour in context
Collecting a dataset of information behaviour in contextLeiden University
 
Search engines for the humanities that go beyond Google
Search engines for the humanities that go beyond GoogleSearch engines for the humanities that go beyond Google
Search engines for the humanities that go beyond GoogleLeiden University
 
Krijgen we ooit de beschikking over slimme zoektechnologie?
Krijgen we ooit de beschikking over slimme zoektechnologie?Krijgen we ooit de beschikking over slimme zoektechnologie?
Krijgen we ooit de beschikking over slimme zoektechnologie?Leiden University
 

More from Leiden University (14)

Text mining for health knowledge discovery
Text mining for health knowledge discoveryText mining for health knowledge discovery
Text mining for health knowledge discovery
 
Text Mining for Lexicography
Text Mining for LexicographyText Mining for Lexicography
Text Mining for Lexicography
 
'Het nieuwe zoeken' voor informatieprofessionals
'Het nieuwe zoeken' voor informatieprofessionals'Het nieuwe zoeken' voor informatieprofessionals
'Het nieuwe zoeken' voor informatieprofessionals
 
kanker.nl & Data Science
kanker.nl & Data Sciencekanker.nl & Data Science
kanker.nl & Data Science
 
Automatische classificatie van teksten
Automatische classificatie van tekstenAutomatische classificatie van teksten
Automatische classificatie van teksten
 
Tutorial on word2vec
Tutorial on word2vecTutorial on word2vec
Tutorial on word2vec
 
Computationeel denken
Computationeel denkenComputationeel denken
Computationeel denken
 
Summarizing discussion threads
Summarizing discussion threadsSummarizing discussion threads
Summarizing discussion threads
 
Automatische classificatie van teksten
Automatische classificatie van tekstenAutomatische classificatie van teksten
Automatische classificatie van teksten
 
Leer je digitale klanten kennen: hoe zoeken ze en wat vinden ze?
Leer je digitale klanten kennen: hoe zoeken ze en wat vinden ze?Leer je digitale klanten kennen: hoe zoeken ze en wat vinden ze?
Leer je digitale klanten kennen: hoe zoeken ze en wat vinden ze?
 
RemBench: A Digital Workbench for Rembrandt Research
RemBench: A Digital Workbench for Rembrandt ResearchRemBench: A Digital Workbench for Rembrandt Research
RemBench: A Digital Workbench for Rembrandt Research
 
Collecting a dataset of information behaviour in context
Collecting a dataset of information behaviour in contextCollecting a dataset of information behaviour in context
Collecting a dataset of information behaviour in context
 
Search engines for the humanities that go beyond Google
Search engines for the humanities that go beyond GoogleSearch engines for the humanities that go beyond Google
Search engines for the humanities that go beyond Google
 
Krijgen we ooit de beschikking over slimme zoektechnologie?
Krijgen we ooit de beschikking over slimme zoektechnologie?Krijgen we ooit de beschikking over slimme zoektechnologie?
Krijgen we ooit de beschikking over slimme zoektechnologie?
 

Recently uploaded

Thyroid Physiology_Dr.E. Muralinath_ Associate Professor
Thyroid Physiology_Dr.E. Muralinath_ Associate ProfessorThyroid Physiology_Dr.E. Muralinath_ Associate Professor
Thyroid Physiology_Dr.E. Muralinath_ Associate Professormuralinath2
 
Asymmetry in the atmosphere of the ultra-hot Jupiter WASP-76 b
Asymmetry in the atmosphere of the ultra-hot Jupiter WASP-76 bAsymmetry in the atmosphere of the ultra-hot Jupiter WASP-76 b
Asymmetry in the atmosphere of the ultra-hot Jupiter WASP-76 bSérgio Sacani
 
Factory Acceptance Test( FAT).pptx .
Factory Acceptance Test( FAT).pptx       .Factory Acceptance Test( FAT).pptx       .
Factory Acceptance Test( FAT).pptx .Poonam Aher Patil
 
Selaginella: features, morphology ,anatomy and reproduction.
Selaginella: features, morphology ,anatomy and reproduction.Selaginella: features, morphology ,anatomy and reproduction.
Selaginella: features, morphology ,anatomy and reproduction.Silpa
 
Cyathodium bryophyte: morphology, anatomy, reproduction etc.
Cyathodium bryophyte: morphology, anatomy, reproduction etc.Cyathodium bryophyte: morphology, anatomy, reproduction etc.
Cyathodium bryophyte: morphology, anatomy, reproduction etc.Silpa
 
Digital Dentistry.Digital Dentistryvv.pptx
Digital Dentistry.Digital Dentistryvv.pptxDigital Dentistry.Digital Dentistryvv.pptx
Digital Dentistry.Digital Dentistryvv.pptxMohamedFarag457087
 
Role of AI in seed science Predictive modelling and Beyond.pptx
Role of AI in seed science  Predictive modelling and  Beyond.pptxRole of AI in seed science  Predictive modelling and  Beyond.pptx
Role of AI in seed science Predictive modelling and Beyond.pptxArvind Kumar
 
Module for Grade 9 for Asynchronous/Distance learning
Module for Grade 9 for Asynchronous/Distance learningModule for Grade 9 for Asynchronous/Distance learning
Module for Grade 9 for Asynchronous/Distance learninglevieagacer
 
Biogenic Sulfur Gases as Biosignatures on Temperate Sub-Neptune Waterworlds
Biogenic Sulfur Gases as Biosignatures on Temperate Sub-Neptune WaterworldsBiogenic Sulfur Gases as Biosignatures on Temperate Sub-Neptune Waterworlds
Biogenic Sulfur Gases as Biosignatures on Temperate Sub-Neptune WaterworldsSérgio Sacani
 
Dr. E. Muralinath_ Blood indices_clinical aspects
Dr. E. Muralinath_ Blood indices_clinical  aspectsDr. E. Muralinath_ Blood indices_clinical  aspects
Dr. E. Muralinath_ Blood indices_clinical aspectsmuralinath2
 
Cyanide resistant respiration pathway.pptx
Cyanide resistant respiration pathway.pptxCyanide resistant respiration pathway.pptx
Cyanide resistant respiration pathway.pptxSilpa
 
300003-World Science Day For Peace And Development.pptx
300003-World Science Day For Peace And Development.pptx300003-World Science Day For Peace And Development.pptx
300003-World Science Day For Peace And Development.pptxryanrooker
 
Zoology 5th semester notes( Sumit_yadav).pdf
Zoology 5th semester notes( Sumit_yadav).pdfZoology 5th semester notes( Sumit_yadav).pdf
Zoology 5th semester notes( Sumit_yadav).pdfSumit Kumar yadav
 
development of diagnostic enzyme assay to detect leuser virus
development of diagnostic enzyme assay to detect leuser virusdevelopment of diagnostic enzyme assay to detect leuser virus
development of diagnostic enzyme assay to detect leuser virusNazaninKarimi6
 
Bhiwandi Bhiwandi ❤CALL GIRL 7870993772 ❤CALL GIRLS ESCORT SERVICE In Bhiwan...
Bhiwandi Bhiwandi ❤CALL GIRL 7870993772 ❤CALL GIRLS  ESCORT SERVICE In Bhiwan...Bhiwandi Bhiwandi ❤CALL GIRL 7870993772 ❤CALL GIRLS  ESCORT SERVICE In Bhiwan...
Bhiwandi Bhiwandi ❤CALL GIRL 7870993772 ❤CALL GIRLS ESCORT SERVICE In Bhiwan...Monika Rani
 
biology HL practice questions IB BIOLOGY
biology HL practice questions IB BIOLOGYbiology HL practice questions IB BIOLOGY
biology HL practice questions IB BIOLOGY1301aanya
 
Genetics and epigenetics of ADHD and comorbid conditions
Genetics and epigenetics of ADHD and comorbid conditionsGenetics and epigenetics of ADHD and comorbid conditions
Genetics and epigenetics of ADHD and comorbid conditionsbassianu17
 
PSYCHOSOCIAL NEEDS. in nursing II sem pptx
PSYCHOSOCIAL NEEDS. in nursing II sem pptxPSYCHOSOCIAL NEEDS. in nursing II sem pptx
PSYCHOSOCIAL NEEDS. in nursing II sem pptxSuji236384
 
Human & Veterinary Respiratory Physilogy_DR.E.Muralinath_Associate Professor....
Human & Veterinary Respiratory Physilogy_DR.E.Muralinath_Associate Professor....Human & Veterinary Respiratory Physilogy_DR.E.Muralinath_Associate Professor....
Human & Veterinary Respiratory Physilogy_DR.E.Muralinath_Associate Professor....muralinath2
 

Recently uploaded (20)

Thyroid Physiology_Dr.E. Muralinath_ Associate Professor
Thyroid Physiology_Dr.E. Muralinath_ Associate ProfessorThyroid Physiology_Dr.E. Muralinath_ Associate Professor
Thyroid Physiology_Dr.E. Muralinath_ Associate Professor
 
Asymmetry in the atmosphere of the ultra-hot Jupiter WASP-76 b
Asymmetry in the atmosphere of the ultra-hot Jupiter WASP-76 bAsymmetry in the atmosphere of the ultra-hot Jupiter WASP-76 b
Asymmetry in the atmosphere of the ultra-hot Jupiter WASP-76 b
 
Factory Acceptance Test( FAT).pptx .
Factory Acceptance Test( FAT).pptx       .Factory Acceptance Test( FAT).pptx       .
Factory Acceptance Test( FAT).pptx .
 
Selaginella: features, morphology ,anatomy and reproduction.
Selaginella: features, morphology ,anatomy and reproduction.Selaginella: features, morphology ,anatomy and reproduction.
Selaginella: features, morphology ,anatomy and reproduction.
 
Cyathodium bryophyte: morphology, anatomy, reproduction etc.

‘Big models’: the success and pitfalls of Transformer models in natural language processing

  • 2. Today’s talk
      • Large Language Models
      • BERT
      • Huggingface
      • Generative Pretrained Transformers (GPT)
      • Challenges and problems
      • Consequences for work and education
    Suzan Verberne 2023
  • 3. Large Language Models
  • 4. Large Language Models: how it all started…
      • Transformers: “Attention is all you need” (2017)
      • Designed for sequence-to-sequence tasks (e.g. translation)
      • Encoder-decoder architecture
    Explanation of this paper: https://www.youtube.com/watch?v=iDulhoQ2pro
  • 5. Large Language Models
    Transformers are powerful because of
      • long-distance relations between all words (attention)
      • parallel processing instead of sequential processing
      • unsupervised pre-training on a HUGE amount of data
  • 6. Large Language Models: next…
    BERT (Bidirectional Encoder Representations from Transformers)
      • An encoder-only transformer
      • Input is text, output is embeddings
  • 7. Some linguistics…
    BERT is based on the distributional hypothesis
      • The context of a word defines its meaning
      • Words that occur in similar contexts tend to have similar meanings
    Harris, Z. (1954). “Distributional structure”. Word, 10(2–3): 146–162.
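The distributional hypothesis can be illustrated with a minimal sketch that does not involve BERT at all: count which words occur near each word, then compare the resulting count vectors. The toy corpus, the window size and the function names `context_vectors` and `cosine` are assumptions made for this illustration, not part of the talk.

```python
from collections import Counter
from math import sqrt


def context_vectors(sentences, window=2):
    """For every word, count the words that appear within `window` positions of it."""
    vectors = {}
    for sentence in sentences:
        tokens = sentence.lower().split()
        for i, word in enumerate(tokens):
            lo, hi = max(0, i - window), min(len(tokens), i + window + 1)
            context = tokens[lo:i] + tokens[i + 1:hi]
            vectors.setdefault(word, Counter()).update(context)
    return vectors


def cosine(a, b):
    """Cosine similarity between two sparse count vectors."""
    dot = sum(a[k] * b[k] for k in a if k in b)
    norm = sqrt(sum(v * v for v in a.values())) * sqrt(sum(v * v for v in b.values()))
    return dot / norm if norm else 0.0


# Hypothetical toy corpus: "cat" and "dog" occur in similar contexts.
corpus = [
    "the cat drinks milk",
    "the dog drinks water",
    "the cat chases the mouse",
    "the dog chases the ball",
]
vectors = context_vectors(corpus)
sim_cat_dog = cosine(vectors["cat"], vectors["dog"])
sim_cat_milk = cosine(vectors["cat"], vectors["milk"])
print(sim_cat_dog > sim_cat_milk)
```

On this corpus "cat" ends up closer to "dog" than to "milk", because the first two share most of their neighbouring words. BERT learns dense, context-dependent embeddings rather than raw counts, so this only shows the underlying intuition.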
  • 8. Word Embeddings
      • BERT embeddings are learned from unlabelled data
      • Through a process called ‘masked language modelling’, with self-supervision
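Why masked language modelling needs no labels can be seen in a small sketch of the data preparation step: the text itself supplies the targets. This is a simplification under stated assumptions. The function name `mask_tokens` is hypothetical, tokens are whitespace-separated words rather than subword units, and every selected token is replaced by `[MASK]` (the BERT paper masks about 15% of tokens and sometimes substitutes a random or unchanged token instead).

```python
import random

MASK = "[MASK]"


def mask_tokens(tokens, mask_prob=0.15, seed=0):
    """Hide a random subset of tokens; the hidden originals become the training targets."""
    rng = random.Random(seed)
    masked, labels = [], []
    for token in tokens:
        if rng.random() < mask_prob:
            masked.append(MASK)
            labels.append(token)  # the model must recover this token
        else:
            masked.append(token)
            labels.append(None)  # position is not scored
    return masked, labels


tokens = "the context of a word defines its meaning".split()
masked, labels = mask_tokens(tokens, mask_prob=0.3, seed=1)
print(masked)
print(labels)
```

The model sees `masked` as input and is trained to predict the non-`None` entries of `labels`, so any unlabelled text can serve as training data.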
  • 9. BERT
      • BERT is so powerful because it is used in a transfer learning setting
          • Pre-training: learning embeddings from huge unlabeled data (self-supervised)
          • Fine-tuning: learning the classification model from smaller labeled data (supervised) for any NLP task (e.g. sentiment, named entities)
  • 10. Huggingface
    But also because:
      • The authors (from Google) open-sourced the model implementation
      • And publicly released pretrained models (which are computationally expensive to pretrain from scratch)
      • https://huggingface.co/ is the standard implementation package for training and applying Transformer models
      • Currently over 150k models have been published on Huggingface
  • 13. Huggingface: working with Huggingface
      • Take a pre-trained model
      • Run ‘zero-shot’:

            from transformers import pipeline

            # No model is named here, so the pipeline falls back to its default model.
            sentiment_pipeline = pipeline("sentiment-analysis")
            data = ["I love you", "I hate you"]
            output = sentiment_pipeline(data)
            print(output)

        Output:

            [{'label': 'POSITIVE', 'score': 0.9998656511306763}, {'label': 'NEGATIVE', 'score': 0.9991129040718079}]

      • Or fine-tune on your own data
    Default model: distilbert-base-uncased-finetuned-sst-2-english
  • 15. GPT
      • GPT is a decoder-only transformer model
      • It does not have an encoder
      • Instead: use the prompt to generate outputs
      • A growing family of models since 2018: GPT-2, DialoGPT, GPT-3, GPT-3.5, ChatGPT, GPT-4
  • 16. GPT-3
      • GPT is trained to generate the most probable/plausible text
      • Trained on crawled internet data, open source books, Wikipedia, sampled early 2022
      • After each word, predict the most probable next word given all the previous words
      • It will give you fluent text that looks very real
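The generation loop described on this slide (predict the most probable next word, append it, repeat) can be sketched with a toy model. The bigram counter below is a deliberately crude stand-in and an assumption of this example: it conditions only on the single previous word, whereas GPT conditions on all previous words through a Transformer decoder. The names `train_bigram` and `generate` and the corpus are hypothetical.

```python
from collections import Counter, defaultdict


def train_bigram(text):
    """Count how often each word follows each other word."""
    tokens = text.lower().split()
    counts = defaultdict(Counter)
    for prev, nxt in zip(tokens, tokens[1:]):
        counts[prev][nxt] += 1
    return counts


def generate(counts, prompt, max_new_tokens=5):
    """Greedy decoding: repeatedly append the most frequent follower of the last word."""
    tokens = prompt.lower().split()
    for _ in range(max_new_tokens):
        followers = counts.get(tokens[-1])
        if not followers:
            break  # the last word was never seen with a follower
        tokens.append(followers.most_common(1)[0][0])
    return " ".join(tokens)


corpus = "the model predicts the next word and the next word follows the prompt"
counts = train_bigram(corpus)
text = generate(counts, "the", max_new_tokens=2)
print(text)  # the next word
```

Even this tiny model produces locally plausible continuations without any notion of whether they are true.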
  • 17. Few-shot learning
    Few-shot learning: learn from a small number of examples
    ‘Old paradigm’
      • pre-training
      • fine-tuning with ~100s-1000s of training samples
    ‘New paradigm’
      • pre-training
      • prompting with ~3-50 examples in the prompt
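In the ‘new paradigm’ the labelled examples go into the prompt text instead of into a training run. A minimal sketch of how such a prompt could be assembled is given below. The `Text:` / `Sentiment:` template, the instruction wording and the function name `build_few_shot_prompt` are assumptions for illustration, and the call that sends the prompt to a generative model is left out.

```python
def build_few_shot_prompt(instruction, examples, query):
    """Put labelled examples in front of the new input; the model completes the last label."""
    lines = [instruction, ""]
    for text, label in examples:
        lines.append(f"Text: {text}")
        lines.append(f"Sentiment: {label}")
        lines.append("")
    lines.append(f"Text: {query}")
    lines.append("Sentiment:")
    return "\n".join(lines)


# Hypothetical examples; a real prompt would typically hold ~3-50 of them.
examples = [
    ("I love you", "POSITIVE"),
    ("I hate you", "NEGATIVE"),
    ("This talk was inspiring", "POSITIVE"),
]
prompt = build_few_shot_prompt(
    "Classify the sentiment of each text as POSITIVE or NEGATIVE.",
    examples,
    "The slides were hard to read",
)
print(prompt)
```

No model weights are updated here: changing the task means changing the instruction and the examples, not retraining.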
  • 19. ChatGPT
      • ChatGPT =
          • GPT-3.5
          • + finetuning for conversations
          • + reinforcement learning for better answers
    https://openai.com/blog/chatgpt
  • 20. Why are LLMs so powerful?
      • Because they are HUGE (many parameters)
      • And trained on HUGE data
    https://huggingface.co/blog/large-language-models
  • 21. Challenges and problems with LLMs
  • 22. Challenges and problems
      • Computational power
      • Environmental footprint
      • Heavy GPU computing required for training models
      • Lengthy texts are challenging
      • Low resource languages
      • Low resource domains
      • Closed models (‘OpenAI’) vs open source models
    https://lessen-project.nl/: “Together, the project partners will develop, implement and evaluate state-of-the-art safe and transparent chat-based conversational AI agents based on state-of-the-art neural architectures. The focus is on lesser resourced tasks, domains, and scenarios.”
  • 23. Challenges and problems
      • Factuality / consistency
      • The output is fluent but not always correct
      • Hallucination
  • 27. Challenges and problems
      • Search engines allow us to verify the source of the information
      • Interfaces to generative language models should do the same
  • 28. Consequences for work and education
  • 29. Consequences for work and education
      • Do not replace humans, but assist them to do their work better
      • When the boring part of the work is done by computational models, the human can do the interesting part
      • (think about graphic designers using generative models for creating images)
  • 30. Consequences for work and education
      • Computational methods can help humans (students)
          • Search engines
          • Spelling correction
          • Grammarly
          • … Generative language models?
      • New regulations
          • We have to stress the importance of sources
          • and of writing your own texts (and code!)
          • and carefully pick our homework assignments
  • 31. Research opportunities
    Use generative models to
      • develop tools
          • (e.g. QA-systems, chatbots, summarizers)
      • generate training data [1]
          • The prompting can be engineered to be more effective
      • study linguistic phenomena
          • which errors does the model make?
      • study social phenomena
          • simulate communication (opinionated/political content) [2]
    [1] https://github.com/arian-askari/ChatGPT-RetrievalQA
    [2] Chris Congleton, Peter van der Putten, and Suzan Verberne. Tracing Political Positioning of Dutch Newspapers. In: Disinformation in Open Online Media. MISDOOM 2022.
  • 32. Final recommendations
      • Listen to the interview with Emily Bender
    Find me: https://duckduckgo.com/?t=ffab&q=suzan+verberne&ia=web