Large language models (LLMs) are in the spotlight. Laypeople are aware of LLMs such as OpenAI's ChatGPT and Google's Gemini and use them on a daily basis. While companies are exploring new business opportunities, researchers have gained access to an unprecedented scientific playground that allows for fast experimentation with limited resources and immediate results. In this talk, using concrete examples from requirements engineering, I will put forward several research opportunities enabled by the advent of LLMs. I will show how LLMs, as a key example of modern AI, unlock research topics that were deemed too challenging until recently. Then, I will critically discuss the perils we face when planning, conducting, and reporting credible research results following a rigorous scientific approach. This talk will stress the inherent tension between the exciting affordances offered by this new technology, which include the ability to generate non-factual outputs (fiction), and our role and societal responsibility as information scientists.
Information science research with large language models: between science and fiction
1. Information science research with large language models: between science and fiction
Fabiano Dalpiaz
Requirements Engineering Lab
Utrecht University, the Netherlands
May 15, 2024
f.dalpiaz@uu.nl @FabianoDalpiaz fabianodalpiaz
2. 1. Large Language Models
©2024 Fabiano Dalpiaz
ChatGPT, depicted by ChatGPT 4.0 + DALL-E
6. LLMs in information science research
⚠ LLM use disclaimers?
• "drafted by ChatGPT – rephrased by Quillbot – images by MidJourney – prompts in Appendix A"?
⚠ Legal and ethical implications
⚠ Quoting ≠ paraphrasing
What’s ahead?
👉 Dedicated conference tracks about LLMs
👉 Exciting avenues for research!
7. LLMs in Software Engineering research
A. Fan, B. Gokkaya, M. Harman, M. Lyubarskiy, S. Sengupta, S. Yoo, J. M. Zhang. "Large Language Models for Software Engineering: Survey and Open Problems." arXiv:2310.03533, 2023
ICSE’24 main track
8. How are YOU using LLMs in YOUR research?
9. Key Message 1: Accept the Evolution
Large Language Models are here
• They are, and will be, changing our lives: as citizens, as researchers, as educators
• They can assist us in science fiction tasks
10. 2. Credibility in (information) science research
11. IS research in the small – simplified illustration
Research idea → Conceptual framework → Artifact construction → Validation / evaluation → Paper writing → Peer review → Publication → Literature
12. Credibility in information science research
PhD student Elize: "Interesting, this seems a breakthrough. But… how can I trust what the authors claim?"
13. How do YOU assess the credibility of a paper?
14. Threats to credibility – the idea
Jim, the reviewer: "That idea is wrong in the first place!"
15. Threats to credibility – the idea
Jim, the reviewer: "That idea is wrong in the first place!"
This is invalid criticism in science!
16. Threats to credibility – the conceptual framework
Jim, the reviewer: "It builds on a rejected theory"; "It proposes a theory that hasn't been tested yet"
17. Threats to credibility – the constructed artifact
Jim, the reviewer: "Simplistic, partially implemented"; "It conflicts with the conceptual framework"
18. Threats to credibility – validation / evaluation
Jim, the reviewer:
• The evaluation is too small
• Mislabeled: is it a case study or an experiment?
• The experimental design is flawed
• Too few subjects
• The research questions are not clear
• The metrics do not match the RQs
• Missing threats to validity
• Wrong statistical tests
• Ethical approval missing
• The source code is not available
• No replication package
• Won't generalize
• Too small an improvement over the SotA
• …
19. Threats to credibility – the written paper
Jim, the reviewer: "This claim is factually wrong"; "The sentence is ambiguous"
20. Threats to credibility – peer reviewing / publication
Jim, the reviewer: "Renowned authors = good?"
21. Threats to credibility – peer reviewing / publication
Jim, the reader: "Prestigious venue = good?"; "Never heard of this journal = bad?"
22. Threats to credibility – literature
Excerpt from an example paper: "We propose tool Z that can be used to classify requirements automatically, distinguishing functional from quality requirements. […] Dalpiaz et al. [22] showed that their ML-based approach has accuracy of 95%. […] The performance of Z is superior to that of Dalpiaz et al. [22]."
Jim, the reader: "I can't find a link to tool Z…"; "On which dataset was the 95% accuracy obtained?"; "What does it mean for Z to be superior?"
24. Credibility in research: open science badges
https://www.acm.org/publications/policies/artifact-review-and-badging-current
• Artifacts evaluated – functional: "Work as intended"
• Artifacts evaluated – reusable: functional + very carefully documented + well structured
• Artifacts available: publicly accessible in an archival repository (with a DOI)
• Results reproduced: another team obtained the same results with the artifacts provided by the original authors
• Results replicated: another team obtained the same results without the author-supplied artifacts
25. Problem solved? How about LLMs being USED in the research cycle?
26. LLMs are already being used! (a few examples)
• Literature review generator: jenni.ai
• Originality checker: originality.ai
• Writing assistant: quillbot.com
• The one-size-fits-all ChatGPT
• Code generation: Copilot
27. Will the use of LLMs affect research CREDIBILITY?
29. Key Message 2: Responsibility as Information Scientists
LLMs in IS Research: they can be used for many tasks, and we are using them!
What is up to us? Deliver research that can be trusted, and discern credible results.
30. 3. Deep dive on NLP tools in Requirements Engineering (NLP4RE)
31. Background theory: Refinement in RE
K. Pohl. "The three dimensions of requirements engineering: a framework and its applications." Information Systems 19(3): 243-258, 1994.
(Figure: Pohl's three RE dimensions. Specification: opaque → fair → complete; Representation: informal → semi-formal → formal; Agreement: personal view → common view. The refinement path in practice leads from the initial RE input to the desired RE output; RE research, including NLP4RE tools, supports this path.)
32. How do NLP4RE tools work?
Processing text is particularly suitable for LLMs!
33. Four categories of NLP4RE tools
1. Find defects / deviations from good practice
2. Generate models from NL requirements
3. Infer trace links between NL requirements and other artifacts
4. Identify key abstractions from NL documents
D. M. Berry, R. Gacitua, P. Sawyer, and S. F. Tjong. "The case for dumb requirements engineering tools." In Proceedings of REFSQ, pp. 211-217, 2012.
34. Tools in NLP4RE (2021-2022, before LLMs)
L. Zhao, W. Alhoshan, A. Ferrari, K. J. Letsholo, M. A. Ajagbe, E.-V. Chioasca, and R. T. Batista-Navarro. "Natural Language Processing (NLP) for Requirements Engineering: A Systematic Mapping Study." ACM Computing Surveys 54(3), 2022.
35. Case: F/Q Requirements Classification
• Seminal classification problem that aims at identifying NFRs (or Qualities)
• Two classes: Functional and Quality
• Dozens of tools in the literature
• Keyword-based, ML & DL classifiers, zero- and few-shot learning…
36. Automated classification via ML
A classification algorithm:
1. Builds a model M that accurately describes the items in a labeled dataset D (e.g., Req 1: F, Req 2: F, Req 3: Q, Req 4: Q, Req 5: F,Q, …)
2. Given an unseen, unlabeled dataset D', predicts (accurately) the labels of the items in D'; the predictions can then be compared with the real labels (e.g., Req XY is predicted as Q but is really F)
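The two steps above can be sketched in a few lines of Python. This is a deliberately tiny illustration, not any tool from the talk: the four-requirement dataset and the word-overlap "model" are invented for the example, whereas real NLP4RE studies train ML/DL classifiers on corpora such as PROMISE NFR.

```python
from collections import Counter

def train(labeled):
    """Step 1: build a per-label bag-of-words 'model' M from labeled dataset D."""
    model = {}
    for text, label in labeled:
        model.setdefault(label, Counter()).update(text.lower().split())
    return model

def predict(model, text):
    """Step 2: label an unseen requirement by word overlap with each label's bag."""
    words = text.lower().split()
    return max(model, key=lambda lab: sum(model[lab][w] for w in words))

# Invented toy dataset D with Functional (F) and Quality (Q) requirements
D = [
    ("The system shall export reports as PDF", "F"),
    ("Users shall be able to search products by name", "F"),
    ("The system shall respond within 2 seconds", "Q"),
    ("The interface shall be easy to use for novice users", "Q"),
]
M = train(D)
print(predict(M, "The system shall search reports by name"))  # functional-style wording
print(predict(M, "Pages shall load within 3 seconds"))        # quality-style wording
```

Comparing such predictions against the real labels on an unseen dataset D' is exactly what the evaluation steps later in the talk (confusion matrix, overfitting, statistical tests) systematize.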
37. An example of classification in NLP4RE
Feature engineering is key: it determines which information the classifier should combine to construct the model.
38. Classification with LLMs
• No feature engineering needed! Immediate results via prompting:
  • Zero-shot learning
  • Few-shot learning (a few labelled examples in the prompt)
• Better results via fine-tuning:
  • Re-train the LLM with a labelled dataset
  • This combines the LLM's knowledge with the domain-specific task
(Figure: an XXL general-purpose dataset yields a pre-trained LLM; fine-tuning on a domain-specific, labelled dataset yields a fine-tuned LLM.)
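A minimal sketch of few-shot classification via prompting, under the assumption that some chat-completion API is available: `call_llm` is a hypothetical stand-in (it is not a real library function), and the two labelled examples are invented. The point is only how the prompt is assembled.

```python
# Invented few-shot examples (text, label) placed in the prompt
FEW_SHOT = [
    ("The system shall export reports as PDF", "F"),
    ("The system shall respond within 2 seconds", "Q"),
]

def build_prompt(requirement):
    """Assemble a few-shot prompt: instruction, labelled examples, then the query."""
    lines = ["Classify each requirement as F (functional) or Q (quality)."]
    for text, label in FEW_SHOT:  # a few labelled examples in the prompt
        lines.append(f"Requirement: {text}\nLabel: {label}")
    lines.append(f"Requirement: {requirement}\nLabel:")
    return "\n\n".join(lines)

def classify(requirement, call_llm):
    """call_llm(prompt) -> completion string; strip it to the predicted label."""
    return call_llm(build_prompt(requirement)).strip()

print(build_prompt("Users shall be able to search products by name"))
```

Dropping the `FEW_SHOT` loop turns this into zero-shot prompting; fine-tuning instead bakes the labelled examples into the model weights rather than the prompt.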
39. Credible research?
Iris, the req. analyst: "I need to find quality requirements in 3,000+ requirements from 10 projects… This paper does it automatically with great results! Will I obtain the same performance on my unlabeled data?"
41. Evaluating Classifiers in SE Research (ECSER)
• ECSER focuses on Treatment Validation
• Treatment = a classifier
• Two macro phases
• Treatment design is beyond the scope of ECSER
D. Dell'Anna, F. Basak Aydemir, F. Dalpiaz. "Evaluating classifiers in SE research: The ECSER pipeline and two replication studies." Empirical Software Engineering 28(1): 3, 2023.
42. ECSER’s highlight #1: data and models
(Figure: the data is split into Training, Validation, and Test sets; the test set is used in step S5.)
43. ECSER’s highlight #2: p-fold cross-validation
• In SE, data originates from different projects
• p-fold cross-validation extends k-fold cross-validation with per-project splits (as opposed to random splits):
1. Given a set P of projects, take a subset S ⊂ P to train a model
2. Test the model on the remaining P \ S
3. Take another subset S' of the same size as S
4. Train the model on S'
5. Test the model on P \ S'
6. …
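The per-project splitting above can be sketched as a small generator. The three project names are invented placeholders; with a training size of |P| − 1 projects, each project is held out as the test set exactly once.

```python
from itertools import combinations

def p_fold_splits(projects, train_size):
    """Yield (train_projects, test_projects): train on S ⊂ P, test on P \\ S."""
    P = set(projects)
    for S in combinations(sorted(P), train_size):
        yield set(S), P - set(S)

# Illustrative project names; real studies would use e.g. PROMISE, Dronology, ...
folds = list(p_fold_splits(["dronology", "duap", "promise"], train_size=2))
for train_p, test_p in folds:
    print(sorted(train_p), "->", sorted(test_p))
```

Unlike a random k-fold split, no requirement from a test project ever leaks into training, which is what makes the performance estimate meaningful across projects.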
44. ECSER’s highlight #3: the confusion matrix
• It provides transparency: it allows us to derive all the metrics and to inspect the results
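As a sketch of why the confusion matrix is enough to derive the metrics, here is a small function computing the standard ones for a binary classifier; the four counts are invented for illustration.

```python
def metrics(tp, fp, fn, tn):
    """Derive standard metrics from the four cells of a binary confusion matrix."""
    precision = tp / (tp + fp)
    recall = tp / (tp + fn)
    f1 = 2 * precision * recall / (precision + recall)
    accuracy = (tp + tn) / (tp + fp + fn + tn)
    return precision, recall, f1, accuracy

# Invented counts, e.g. for the isQuality classifier on one project
p, r, f1, acc = metrics(tp=40, fp=10, fn=20, tn=30)
print(f"precision={p:.2f} recall={r:.2f} F1={f1:.2f} accuracy={acc:.2f}")
```

Reporting the matrix itself (rather than only F1) lets a reader recompute any metric and spot, for instance, a heavy false-negative skew that a single aggregate number would hide.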
45. ECSER’s highlight #4: overfitting and degradation
• Two metrics to analyze performance differences depending on the data splits (training, validation, and test sets):
Overfitting = Test – Training
Degradation = Test – Validation
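The two deltas are simple subtractions over a per-split score such as F1; the numbers below are invented for illustration.

```python
# Illustrative F1 scores per data split (invented numbers)
f1 = {"training": 0.95, "validation": 0.80, "test": 0.72}

overfitting = f1["test"] - f1["training"]    # Test - Training
degradation = f1["test"] - f1["validation"]  # Test - Validation
print(f"overfitting={overfitting:+.2f} degradation={degradation:+.2f}")
```

A large negative overfitting value signals a model that memorized the training set; a large negative degradation signals that the validation set did not represent the test data well.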
46. ECSER’s highlight #5: statistical tests
• Which significance test should be used?
• Not only the p-value: also report the effect size!
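As one concrete effect-size measure that pairs well with non-parametric tests, here is Cliff's delta in pure Python. This is a generic sketch, not code from ECSER, and the per-project scores are invented.

```python
def cliffs_delta(xs, ys):
    """Cliff's delta in [-1, 1]: fraction of (x, y) pairs with x > y minus x < y."""
    gt = sum(1 for x in xs for y in ys if x > y)
    lt = sum(1 for x in xs for y in ys if x < y)
    return (gt - lt) / (len(xs) * len(ys))

# Invented per-project F1 scores for two classifiers
norbert = [0.81, 0.78, 0.84, 0.75, 0.80]
km500 = [0.70, 0.74, 0.69, 0.77, 0.72]
print(cliffs_delta(norbert, km500))
```

A significance test answers "is the difference real?", while the effect size answers "is it large enough to matter?"; with small samples (few projects), a difference can be practically large yet fail to reach significance.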
47. Credible research?
Iris, the req. analyst: "I need to find quality requirements in 3,000+ requirements from 10 projects… This paper does it automatically with great results! Will I obtain the same performance on my unlabeled data?"
Luckily, someone applied ECSER!
49. S1. Evaluation method and data splitting
@2024 Fabiano Dalpiaz
49
} Most of the literature uses PROMISE NFR
} 625 requirements that pertain to 15 student projects
} Generally, the studies only perform validation, no testing
} Our choices
} Three algorithms (see previous slide)
} No hyper-parameter tuning (validation, S3-S4)
} Two binary classifiers: isFunctional and isQuality
[Figure: Training / Validation / Test splits]
50. S2 & S5. Training and testing the model
} Training is performed on PROMISE NFR
} Testing is performed on the remaining datasets
} Test on Dronology, then test on DUAP, …
} Calculate the arithmetic mean across the test datasets
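The train-once, test-on-each-dataset loop reduces to something like the following (the dataset names are from the talk; the scores are invented for illustration):

```python
# Per-dataset F1 after training on PROMISE NFR (invented scores);
# in practice, extend the dict with the remaining test datasets
f1_per_dataset = {"Dronology": 0.70, "DUAP": 0.62}

mean_f1 = sum(f1_per_dataset.values()) / len(f1_per_dataset)
```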
51. S6. Reporting the confusion matrix
} This is simply a presentation of the raw results…
} But some aspects already stand out!
52. S7-S8. Performance and overfitting
} For simplicity, let’s examine F1 here
} km500 fits the training set best
} norbert has the best performance on the test set
} ling17 has the smallest overfitting
53. S9. ROC Plot (for isFunctional)
} norbert is the best for most projects
} ling17 tends to lead to more false positives
} km500 tends to lead to more false negatives
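Each classifier/project pair becomes one point in ROC space, and the observations above map directly onto coordinates (a minimal sketch with invented counts):

```python
def roc_point(tp, fp, fn, tn):
    """Return (FPR, TPR) for one confusion matrix. More false positives
    pushes the point right (ling17's tendency); more false negatives
    pushes it down (km500's tendency)."""
    fpr = fp / (fp + tn)
    tpr = tp / (tp + fn)
    return fpr, tpr

# Illustrative counts for one project (not real results)
fpr, tpr = roc_point(tp=45, fp=5, fn=15, tn=35)
```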
54. S10. Statistical tests
} Is one of these classifiers significantly better?
} The results are mixed
} Yes, for km500 vs. norbert in the isFunctional case
} Almost never for isQuality
55. Results from the first application of ECSER
} We confirm that norbert outperforms both ling17 and km500 on unseen data
} But not in a statistical sense (small sample size?)
} The “losers” still have good properties:
} ling17 has the smallest overfitting
} km500 fits the training data best
56. Credible research? Under certain assumptions
F. Dalpiaz, D. Dell'Anna, F.B. Aydemir, S. Çevikol: Requirements Classification with Interpretable Machine Learning and Dependency Parsing. RE 2019: 142-152
Iris, the requirements analyst: “Will I obtain the same performance on my unlabeled data?”
“Only if my data resembles PROMISE!”
57. Key Message 3: Assess your results properly!
The ECSER pipeline:
• Provides guidelines for evaluating classifiers
• Is a step-by-step tool
ECSER’s application:
• Confirms some results
• Clarifies and refutes others
60. LLM-Assisted RE: A Vision
RE version 1.1
} Non-disruptive improvements in all activities where some automation currently takes place
} Classification
} Model derivation
} Defect identification
} Traceability
RE version 2.0
} Key focus on elicitation
} Breakthrough: automated analysis of conversations
} RE is mainly a human-centered activity
62. Elicitation: the root of (all) NL requirements
[Figure: during elicitation, the requirements analyst draws on requirements conversations, own ideas, budget / project constraints, design decisions, and domain-specific documentation to produce a specification]
63. Timeliness: why researching conversations now?
} Increased remote work and collaboration
} Automated transcription
64. (Requirements) conversations vs. specifications
} 2+ parties (here Analyst and Stakeholder)
} Informal: no “shall” statements, user stories, or glossary
} Relevant information may be sparse
} Includes persuasion, uncertainty, misunderstandings
65. The many layers of (requirements) conversations
} Turns and utterance units as atomic entities
} Cross-speaker interaction defines the meaning
} The purpose of a conversation spans multiple turns
Traum, David R., and Elizabeth A. Hinkelman: “Conversation acts in task-oriented spoken dialogue.” Computational Intelligence 8.3 (1992): 575-599
66. Tools for Conversational RE: Two Examples
Tjerk Spijkman, Fabiano Dalpiaz, and Sjaak Brinkkemper: “Back to the Roots: Linking User Stories to Requirements Elicitation Conversations”. Proceedings of RE 2022
Tjerk Spijkman, Xavier de Bondt, Fabiano Dalpiaz, and Sjaak Brinkkemper: “Summarization of Elicitation Conversations to Locate Requirements-Relevant Information”. Proceedings of REFSQ 2023
67. Trace2Conv: Key Idea
} Supports backward, pre-RS traceability
} A largely overlooked area of research
} Aims to find information that provides additional context to a requirement
[Figure: Trace2Conv links the specification back to the requirements conversations and to the analyst’s other inputs: own ideas, budget / project constraints, design decisions, and domain-specific documentation]
68. Trace2Conv pre-LLMs
“As a vendor user, I can use the password forgotten functionality whenever I forgot or want to reset my password, so that I always have a way to create a new password”
70. Trace2Conv with LLMs
} Expectations
} Complex pre-processing will be unnecessary
} Simple prompts will be able to match requirements to speaker turns well
} Limitations
} Limits on the number of tokens
71. Summarizing a transcript: ReConSum
} Trigger: long recorded conversations, spanning multiple hours
} Can we help the analyst explore the transcript by summarizing it?
Step #1: Identify the questions
Step #2: Filter by question relevance
Step #3: Label by relevance type
72. How to identify the questions? (Step #1)
} Based on sequences of POS tags: wh-, yes/no, and tag questions
} Based on pre-trained DistilBERT (deep learning)
} Combination: an utterance is a question if either approach says so
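A toy version of the combination rule: a crude stand-in for the POS-tag rules, OR-ed with any deep-learning classifier (everything here is a simplification; the real approach uses POS tag sequences and a fine-tuned DistilBERT, not word lists):

```python
WH_WORDS = {"what", "which", "who", "whom", "when", "where", "why", "how"}
AUXILIARIES = {"do", "does", "did", "is", "are", "can", "could", "will", "would", "should"}

def pos_heuristic(utterance):
    """Crude stand-in for the POS-tag rules: wh-questions,
    yes/no questions (auxiliary-first), and a trailing '?'."""
    tokens = utterance.lower().rstrip("?!. ").split()
    if not tokens:
        return False
    return (utterance.rstrip().endswith("?")
            or tokens[0] in WH_WORDS
            or tokens[0] in AUXILIARIES)

def is_question(utterance, dl_classifier):
    # Combination rule from the slide: a question if either approach says so
    return pos_heuristic(utterance) or dl_classifier(utterance)
```

The union trades some precision for recall, which matches the numbers reported two slides later.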
73. How to filter relevant questions? (Step #2)
} TF-IDF can be used to rank questions with domain-specific words
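A bare-bones version of the ranking idea (naive whitespace tokenization and smoothed IDF; all names and example data are ours, not from ReConSum):

```python
import math
from collections import Counter

def rank_by_tfidf(questions, corpus_docs):
    """Rank candidate questions by the summed TF-IDF weight of their
    words, so questions dense in domain-specific (rare-in-corpus)
    terms rank higher than questions made of common words."""
    docs = [doc.lower().split() for doc in corpus_docs]
    n = len(docs)

    def idf(word):
        df = sum(1 for d in docs if word in d)
        return math.log((n + 1) / (df + 1)) + 1  # smoothed IDF

    def score(question):
        tf = Counter(question.lower().split())
        return sum(count * idf(word) for word, count in tf.items())

    return sorted(questions, key=score, reverse=True)

# Invented mini-corpus and candidate questions
corpus = ["the system shall log events",
          "the user can log in",
          "drone telemetry must stream"]
questions = ["does drone telemetry stream", "can the user log"]
ranked = rank_by_tfidf(questions, corpus)
```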
74. Do our steps #1 and #2 work? (pre-LLM)
Step #1: Question identification
- Deep learning gives the best results
- Even better when combining the approaches

Approach | Precision | Recall | F1-Score
Speech Acts (DL) | 81.8% | 91.7% | 86.5%
Part-of-Speech tags | 69.7% | 77.4% | 73.4%
Combination | 76.8% | 95.8% | 85.3%

Step #2: Relevance detection
- The combined pipeline achieves an F1-score around 67%
- [back to ECSER] error propagation from Step #1

Approach | Precision | Recall | F1-Score
Speech Acts (DL) | 64.4% | 70.3% | 67.2%
Part-of-Speech tags | 53.8% | 62.4% | 57.8%
Combination | 55.7% | 81.7% | 65.7%

We expect LLMs to improve the results, but this should be assessed rigorously (see ECSER)
75. Ongoing tool: distilling domain models
} ChatGPT 4.0 prompts
} Guidelines from Blaha and Rumbaugh
} Combine the transcripts with the model’s own knowledge
76. Key challenge ahead in Conversational RE?
Lack of metrics and gold standards!
77. Key Message 4: New avenues unlocked, but…
Conversational RE:
• Opens new avenues for the RE discipline
• LLMs will be an enabler
What are the perils?
• No gold standards
• Unknown metrics
• Rigor is necessary!
79. Take-home messages
Large language models:
} are here and can do science-fiction stuff
} are changing our job as researchers
} need rigorous reporting (ECSER as an example)
} unlock uncharted territories (e.g., conversational RE)
83. Thank you for listening! Questions?
f.dalpiaz@uu.nl @FabianoDalpiaz fabianodalpiaz
Special credits to
- F. Başak Aydemir
- Davide Dell’Anna
- Xavier de Bondt
- Tjerk Spijkman
- Sjaak Brinkkemper