Laure talked about one of the hottest topics in the community at the moment, the ChatGPT phenomenon: how to supervise a PhD thesis in NLP in the age of Large Language Models (LLMs)?
How to supervise a thesis in NLP in the ChatGPT era? By Laure Soulier
1. How to supervise a PhD in NLP in the ChatGPT era?
WiMLDS
September 27th, 2023
Laure Soulier
2. Who am I?
Associate professor at Sorbonne University - MLIA team in the ISIR lab
Research interests:
- Information retrieval and NLP
- Deep learning, representation learning
- Language models
Supervision:
- 3 defended theses
- 6 ongoing theses
- 1 postdoctoral researcher per year
- 2-3 master's interns per year
Conversational search & neural ranking models
Data-to-text generation
Language grounding
3. Why this topic?
→ The ChatGPT craze
- 1 million users in 5 days
- 173 million active users in April 2023
[Figure: number of publications per year from 2015 to 2023 mentioning "language models" vs. "large language models"; y-axis: 0-40,000]
→ Emergence of large language models
→ Things are moving faster and faster in the research community
(statistics extracted from Google Scholar)
A Survey of Large Language Models, Zhao et al., 2023
4. Who is this talk for?
→ Colleagues: opening up a debate
- What to expect from Ph.D. students
- How to "survive"
5. Who is this talk for?
→ (Future) Ph.D. students
- What to expect from your advisors
- How to "survive"
6. Who is this talk for?
→ Industrial partners
- How to collaborate with Ph.D. students during a CIFRE
- Identifying what Ph.D. students are good at
7. Who is this talk for?
→ Curious people
- What does a thesis look like?
8. Outline of the talk
➜ Overview of LLMs
➜ The impact of recent advances in LLMs on NLP use cases
This talk is based on my own experience and reflects only my views, not those of my colleagues.
You might have different opinions or different experiences.
Feel free to share them in the Q&A session or during the cocktail!
Conversational search
Data-to-text generation
9. (Large) Language Models
Given a sequence of items w_1, w_2, …, w_{t-1}, what is the probability of the next item w_t?
P(w_t | w_1, w_2, …, w_{t-1})
Example: "A salad is composed of …" → a (large) language model assigns:
- Lettuce: probability 0.9
- Tomatoes: probability 0.85
- Corn: probability 0.6
- Ice cream: probability 0.001
- …
Principle:
- Modeling the probability of sequences w_1, w_2, …, w_T
- Items may be words, characters, character n-grams, word pieces, etc.
Semantics, word representation and latent space: each item is mapped to a vector, so that related words (salad, lettuce, tomatoes, corn) end up close in the latent space while unrelated ones (ice cream) end up far apart.
- Salad = (0.3, 0.2, 0.45, -0.1, -0.3)
- Lettuce = (0.2, 0.1, 0.38, -0.5, -0.4)
- …
- Ice cream = (-0.9, -0.3, -0.5, 0.8, 0.7)
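To make the latent-space intuition concrete, a small sketch computing cosine similarity between the toy vectors above (the vectors are the illustrative ones from the slide, not real embeddings):

```python
import math

# Toy embeddings from the slide; purely illustrative values.
embeddings = {
    "salad": (0.3, 0.2, 0.45, -0.1, -0.3),
    "lettuce": (0.2, 0.1, 0.38, -0.5, -0.4),
    "ice cream": (-0.9, -0.3, -0.5, 0.8, 0.7),
}

def cosine(u, v):
    """Cosine similarity: near 1 for similar directions, -1 for opposite."""
    dot = sum(a * b for a, b in zip(u, v))
    norm = math.sqrt(sum(a * a for a in u)) * math.sqrt(sum(b * b for b in v))
    return dot / norm

print(cosine(embeddings["salad"], embeddings["lettuce"]))    # high: related words
print(cosine(embeddings["salad"], embeddings["ice cream"]))  # negative: unrelated
```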
10. (Large) Language Models
Transformer networks (2017): an encoder-decoder neural network with:
- About 65M parameters
- Successive feed-forward blocks
- Parallel heads
… that estimates contextual representations of items with self-attention, e.g. distinguishing Washington (the city) from Washington (the man).
(Vaswani et al., 2017)
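A minimal sketch of scaled dot-product self-attention, the core operation of the Transformer; the dimensions and random inputs are illustrative, and NumPy is assumed:

```python
import numpy as np

def self_attention(X, Wq, Wk, Wv):
    """Scaled dot-product self-attention over a sequence of item vectors X.

    Each output row is a context-aware mixture of the value vectors, which
    is how "Washington" can get a different representation depending on
    its surrounding words.
    """
    Q, K, V = X @ Wq, X @ Wk, X @ Wv
    scores = Q @ K.T / np.sqrt(K.shape[-1])          # pairwise relevance
    weights = np.exp(scores - scores.max(axis=-1, keepdims=True))
    weights /= weights.sum(axis=-1, keepdims=True)   # softmax over the sequence
    return weights @ V                               # contextual representations

rng = np.random.default_rng(0)
seq_len, d_model = 4, 8                              # illustrative sizes
X = rng.normal(size=(seq_len, d_model))              # 4 items, 8-dim embeddings
Wq, Wk, Wv = (rng.normal(size=(d_model, d_model)) for _ in range(3))
print(self_attention(X, Wq, Wk, Wv).shape)           # (4, 8)
```

A full Transformer runs several such heads in parallel and stacks them with the feed-forward blocks mentioned above.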
12. Large Language Models: interesting properties
➜ Prompting
➜ Prompt: an instruction explicitly expressing what is expected
➜ Challenge: writing a good prompt (task, context, expected output, …)
➜ Implication: everything is generation
Example (from Thomas Gerald, 2023):
Prompt: "Translate this sentence into French: 'the sun shines'"
Output: "Le soleil brille"
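A minimal sketch of prompting in code, assuming the Hugging Face transformers library; flan-t5-small is an illustrative choice of instruction-tuned model, not the one used in the talk:

```python
from transformers import pipeline

# An instruction-tuned model follows the task expressed in the prompt.
generator = pipeline("text2text-generation", model="google/flan-t5-small")

prompt = "Translate this sentence into French: the sun shines"
print(generator(prompt)[0]["generated_text"])  # e.g. "Le soleil brille"
```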
13. Large Language Models: interesting properties
➜ In-context learning
• Learning from examples mentioned in the prompt
• Without fine-tuning of the model
Multimodal few-shot learning with frozen language models, Tsimpoukelli et al. 2021
13
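A sketch of what in-context learning looks like in practice: the task examples live in the prompt itself and the model's weights are never updated (the task and examples are illustrative):

```python
# Few-shot prompt: demonstrations are provided in the prompt itself;
# the model stays frozen and no gradient update ever happens.
few_shot_prompt = """\
English: good morning -> French: bonjour
English: thank you -> French: merci
English: the sun shines -> French:"""

# Any text-generation model can complete this, e.g. with the pipeline above:
# print(generator(few_shot_prompt)[0]["generated_text"])
```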
14. Large Language Models: interesting properties
1. Language model: general knowledge
2. Adaptation to a new task with fine-tuning
[Diagram: an encoder-decoder language model is pretrained on a massive corpus for word prediction, sentence completion, … (e.g. "It's raining [MASK]" → predicted word); the pretrained language model is then fine-tuned on your (small) data with expected targets (e.g. cat vs. dog), amounting to e.g. 3% of the corpus, to obtain an adapted language model]
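A hedged sketch of the fine-tuning step, assuming the Hugging Face transformers and PyTorch APIs; the model name, labels, and data are illustrative placeholders:

```python
import torch
from transformers import AutoModelForSequenceClassification, AutoTokenizer

# Pretrained language model (general knowledge) + a small classification head.
model_name = "distilbert-base-uncased"  # illustrative choice
tokenizer = AutoTokenizer.from_pretrained(model_name)
model = AutoModelForSequenceClassification.from_pretrained(model_name, num_labels=2)

# Your (small) data with expected targets, e.g. cat vs. dog.
texts = ["a small furry animal that purrs", "a loyal animal that barks"]
labels = torch.tensor([0, 1])  # 0 = cat, 1 = dog
batch = tokenizer(texts, padding=True, truncation=True, return_tensors="pt")

# A few gradient steps adapt the pretrained weights to the new task.
optimizer = torch.optim.AdamW(model.parameters(), lr=5e-5)
model.train()
for _ in range(3):  # tiny illustrative loop
    out = model(**batch, labels=labels)
    out.loss.backward()
    optimizer.step()
    optimizer.zero_grad()
```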
16. Use case on conversational search
Introduction
→ Replacing or augmenting IR systems to perform search sessions in natural language
Objectives [Radlinski and Craswell 2017, Culpepper et al. 2018]
17. Use case on conversational search
Initial definition of the research project (2017, Pierre Erbacher's thesis):
→ Understanding users' information needs
- What we need: capturing the semantics of words; leveraging the conversation context
- What current LLMs do: word representations; prompting*
→ Retrieving documents according to the conversation context
- What we need: matching contextual information needs with documents; leveraging users' feedback
- What current LLMs do: word representations; neural ranking models
→ Generating a response according to the retrieved documents
- What we need: synthesizing document content into a structured response
- What current LLMs do: text generation; prompting*
18. Use case on conversational search
Proactive information systems with clarifying questions
First strategy: thinking about the next step (2018-2019)
→ Multi-turn clarification framework and analysis of its impact on retrieval effectiveness [Erbacher et al., SIGIR 2021]
Contributions
→ What existed:
- Small human-annotated datasets
- Single-turn interaction datasets
Except that…
19. Use case on conversational search
How to react? Which strategy?
- Stop your thesis? Change the thesis subject?
- Change task?
- Since GPT-3 and ChatGPT are not open-source, design an open-source model?
- … What else?
20. Use case on conversational search
Second strategy: leveraging existing models (2023)
→ Generating new conversational search sessions from IR datasets (sketched below):
- An LLM fine-tuned with the prompt "Query: q Facet: f" to generate clarifying questions
- An LLM fine-tuned with the prompt "Query: q Intent: i Question: cq" to generate a yes/no user answer
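A hedged sketch of what such fine-tuning data could look like; the field names follow the prompt formats above, while the example record and helper function are purely illustrative:

```python
# Building (prompt, target) pairs for fine-tuning, following the first
# prompt format above. The example IR record is illustrative.
ir_records = [
    {"query": "jaguar", "facet": "animal",
     "clarifying_question": "Are you interested in the animal jaguar?"},
]

def to_training_pair(record):
    """Format one IR record into a (prompt, target) fine-tuning pair."""
    prompt = f"Query: {record['query']} Facet: {record['facet']}"
    target = record["clarifying_question"]
    return prompt, target

print(to_training_pair(ir_records[0]))
# ('Query: jaguar Facet: animal', 'Are you interested in the animal jaguar?')
```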
21. Use case on conversational search
Second strategy: leveraging existing models (2023)
→ Beyond Toolformer: learning when an LLM should search
[Figure: Toolformer vs. our approach (Erbacher et al., under submission)]
Toolformer: Language Models Can Teach Themselves to Use Tools, Schick et al., 2023
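For intuition, Toolformer-style models emit tool calls inline in the generated text. A minimal, purely illustrative sketch of detecting and executing such a search call; the tag format and search function are assumptions, not the authors' implementation:

```python
import re

def run_search(query):
    """Stand-in for a real retrieval system; illustrative only."""
    return f"<top documents for '{query}'>"

def execute_tool_calls(text):
    """Replace inline [SEARCH("...")] calls, emitted by the model when it
    decides it needs to search, with the retrieved results."""
    pattern = r'\[SEARCH\("([^"]+)"\)\]'
    return re.sub(pattern, lambda m: run_search(m.group(1)), text)

generation = 'The capital of France is [SEARCH("capital of France")] Paris.'
print(execute_tool_calls(generation))
```

The research question here is when the model should emit such a call at all, rather than answering from its parameters alone.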
23. Conclusion - Discussion
What has changed in a thesis?
→ Huge competition
→ Big players, huge number of (un)submitted papers
→ Big GPU clusters (but we have Jean Zay!!!!)
→ Collaborative projects between Ph.D. students (and advisors)
→ Faster reaction to new literature
→ More experiments
→ Not a 3-year project anymore
→ Adapting the research project to ongoing innovations
24. Conclusion - Discussion
Wrap-up for future and current Ph.D. students
→ Don't be afraid! You are not the only one facing the tornado
→ No pressure: you don't have to create version 10 of the Transformer; it is always possible to find a good idea
→ You are learning valuable knowledge and skills
→ It might be difficult to design effective models, but you are learning a methodology and accumulating knowledge on the best LLMs
→ Be passionate!
25. Thank you for your attention
@LaureSoulier
laure-soulier-18829948
https://pages.isir.upmc.fr/soulier/