Natural Language
Requirements Processing:
from Research to Practice
Alessio Ferrari 

CNR-ISTI, Pisa, Italy

alessio.ferrari@isti.cnr.it

http://alessiofer.wixsite.com/alessioferrari 

Twitter: @alessferra
Objectives
• Stimulate your curiosity
• Show that some things can be really easy to do at home

• Show that some things can easily become very complicated 

(don’t do that at home!)

• For practitioners and researchers

• Some parts are tutorial-like

• No, this is not about deep learning
NLP and Requirements Engineering
Natural Language
Processing (NLP)
Technologies enabling extraction and manipulation of information
from natural language (NL) - English, Italian, Swedish, etc.
Language Technology (Dan Jurafsky)
Mostly solved:
• Spam detection (✓ “Let’s go to Agra!” / ✗ “Buy V1AGRA …”)
• Part-of-speech (POS) tagging (“Colorless green ideas sleep furiously.” = ADJ ADJ NOUN VERB ADV)
• Named entity recognition (NER) (“Einstein met with UN officials in Princeton” = PERSON, ORG, LOC)
Making good progress:
• Sentiment analysis (“Best roast chicken in San Francisco!” / “The waiter ignored us for 20 minutes.”)
• Coreference resolution (“Carter told Mubarak he shouldn’t run again.”)
• Word sense disambiguation (WSD) (“I need new batteries for my mouse.”)
• Parsing (“I can see Alcatraz from the window!”)
• Machine translation (MT) (“The 13th Shanghai International Film Festival…”)
• Information extraction (IE) (“You’re invited to our dinner party, Friday May 27 at 8:30” = add Party, May 27)
Still really hard:
• Question answering (QA) (“Q. How effective is ibuprofen in reducing fever in patients with acute febrile illness?”)
• Paraphrase (“XYZ acquired ABC yesterday” / “ABC has been taken over by XYZ”)
• Summarization (“The Dow Jones is up”, “The S&P500 jumped”, “Housing prices rose” = “Economy is good”)
• Dialog (“Where is Citizen Kane playing in SF?” / “Castro Theatre at 7:30. Do you want a ticket?”)
From the slides of
D. Jurafsky and C. Manning, 2012
Natural Language
Processing (NLP)
Technologies enabling extraction and manipulation of information
from natural language (NL) - English, Italian, Swedish, etc.
From my slides, 2018
(a couple of weeks ago)
Part-of-Speech Tagging
VERB, NOUN, ADJECTIVE
Word-sense Disambiguation
Machine Translation
Information Extraction
Dialogue
Parsing
Sentiment Analysis
Coreference Resolution
Spam Detection
Question-Answering
Mostly Solved Making Good Progress
Paraphrase
Named Entity Recognition
PERSON, LOCATION
Summarization
What Happened?
• Large Amount of Data
• Computational Power (GPU)
• Deep Neural Networks
• Competitions (Shared Tasks)
Rule-based vs
Machine Learning NLP
if (good or fantastic) in sent then sent.sentiment = positive
else if (bad or terrible) in sent then sent.sentiment = negative
else sent.sentiment = neutral
Rule-based
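The rule on this slide is pseudo-code. A minimal runnable sketch of the same idea in Python could look as follows; the word sets and the function name `rule_based_sentiment` are illustrative assumptions, not part of any library.

```python
# Keyword lists taken from the rule on the slide
POSITIVE = {"good", "fantastic"}
NEGATIVE = {"bad", "terrible"}

def rule_based_sentiment(sentence):
    """Return 'positive', 'negative' or 'neutral' using hand-written keyword rules."""
    words = set(sentence.lower().split())
    if words & POSITIVE:
        return "positive"
    if words & NEGATIVE:
        return "negative"
    return "neutral"

for s in ["we had good food", "terrible experience", "dirty place"]:
    print(s, "=", rule_based_sentiment(s))
```

Note that "dirty place" comes out as neutral, because no rule mentions "dirty": somebody has to write every rule by hand, while a supervised approach would learn such cases from labelled examples.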
we had good food = positive
terrible experience = negative
dirty place = negative
Supervised Machine Learning
we had good food
nice meal
terrible experience
dirty place
Unsupervised Machine Learning
terrible experience
dirty place
we had good food
nice meal
I have to teach the computer
how to “understand” the text
NL Requirement
• Jackson and Zave: Condition over phenomena of the environment that we want to make
true by developing the system
• Lamsweerde: Goal under the responsibility of a single agent of the software-to-be
• ISO/IEC/IEEE 29148 Standard: Statement which translates or expresses a need and its
associated constraints and conditions
• Wikipedia: Singular documented physical or functional Need that a particular design,
product or process aims to satisfy

• No agreed INTENSIONAL definition

• Some confusion on the types of requirements (e.g., user, system, software, business,
functional, non-functional), the concept of specification, etc.

• So, let us give some EXAMPLES, and give an EXTENSIONAL definition
NL Requirement
As a user, I want to share pictures, so that my friends will see them
If track data at least to the location where the relevant MA ends are
not available on-board, the MA shall be rejected
The voucher numbers are system
generated and created with unique
identification numbers with security
protocols in-built. The created unique
numbers are then printed out in the form
of bar-codes, which will complement (or
stuck on the voucher) the voucher. […]
User Story
One Sentence - High
Unstructured
When MA_received = FALSE and T_speed > 0 and MA_time > 15, then T_brake = 1
One Sentence - Low
Actor Student
Success Scenario 1. Student selects “List”
2. System displays available courses
3. Student selects one of the courses
Structured - Use Case
NL Requirement
It would be nice to have a way to search
my previous messages by keyword
User’s Feedback
Application does not create a new item when clicking the
SAVE button while creating a new item. Steps to
reproduce:
1) Login into the application
2) Pressed button New Item
3) Filled the information for the new item
4) Clicked on Save button
5) Seen an error page “ADA121 Exception: value error”
Bug Report
NL Requirement
• In this talk, an NL requirement is generally a chunk of text
in a requirements document

• A requirements document contains information to be
used for the development of a system

• Except in some cases, we do not deal with users’
feedback or bug reports
Why are NL Requirements
so Special?
• Let us compare a NL Requirements Corpus (PURE, ~80
documents) with a generic corpus (Brown)
Token: the, user, sets, the, input, parameters
Lexical Word: user, sets, input, parameters
Most Frequent Words
• Requirements use a more restricted vocabulary (about half the size of
the vocabulary of the generic texts in Brown)

• Requirements have longer sentences

• Requirements use computer science terminology that is
common to different documents (system, data)

• Requirements use domain-specific expressions (NPAC, TCS, etc.) 

• 62% of the lexical words used in PURE do not
appear in Brown
This suggests that NLP tools trained on
generic texts may need to be tailored
for requirements
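To make the comparison concrete, here is a small sketch of how tokens, lexical words, and the share of lexical words missing from a reference corpus could be computed. The stop-word list and the two toy texts are assumptions made for illustration; they are not PURE or Brown, and the numbers they produce say nothing about the real corpora.

```python
import re

# Toy stop-word list (assumption): everything else counts as a lexical word
STOPWORDS = {"the", "a", "an", "of", "to", "and", "or", "in", "is", "are", "shall"}

def tokens(text):
    """Lower-cased word tokens of a text."""
    return re.findall(r"[a-z0-9_]+", text.lower())

def lexical_words(text):
    """Tokens that are not stop words."""
    return [t for t in tokens(text) if t not in STOPWORDS]

def share_not_in_reference(domain_text, reference_text):
    """Fraction of the domain's lexical vocabulary that never occurs in the reference text."""
    domain_vocab = set(lexical_words(domain_text))
    reference_vocab = set(lexical_words(reference_text))
    return len(domain_vocab - reference_vocab) / len(domain_vocab)

requirements = "The user sets the input parameters. The system shall store the input data."
generic = "The user of the library reads a book and sets the table."

print(tokens("The user sets the input parameters"))
print(lexical_words("The user sets the input parameters"))
print(share_not_in_reference(requirements, generic))
```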
NL Requirements Tasks
DEFECT DETECTION
CATEGORISATION
TRACING
EQUIVALENT REQUIREMENTS
GLOSSARY EXTRACTION
MODEL SYNTHESIS
RETRIEVAL
Natural
Language
Requirements
Document
REGULATORY COMPLIANCE
USERS' FEEDBACK
ANALYSIS
Categorisation
Large requirements set
user interface
communication
security
usability
availability
braking
speed control
flight balance
functional
categories
non-functional
categories
fine-grained topics
Apportionment
Retrieval
Retrieval
Procurement
Documents
Existing
Requirements
Bid
Tailored
Product
Customer’s
Requirements
Tracing
High-level
Requirements
Low-level
Requirements
Design
Code
Architecture
Process Artefacts
Refactoring
and Impact Analysis
External
Assessment
Equivalent Requirements
Large requirements set
Equivalent
Requirements
Requirements
Analyst
Defect Detection
ambiguous
vague
weak verb
passive form
Requirements
Analyst
Glossary Extraction
train
automatic train protection
automatic train supervision
track circuit
balise
Domain-specific
Relevant Terms
Requirements
Document
Glossary
Categorisation
Model Generation
Model Synthesis
Early Requirements /
User Stories
train
track circuit
High-level
Model
Detailed Requirements
Problem
Scoping
Analysis
Detailed Model
(also Feature Model)
Documentation
Visual models provide
a more comprehensive view
on requirements
Regulatory Compliance
Regulations
Requirements
Rights
Obligations
Actors
Actions
Resources
Constraints
Privacy Policy Regulations
Sensitive Data
Ambiguity
Users’ Feedback Analysis
It would be nice to have a way to search
my previous messages by keyword
Large amount of
User’s Feedback
This app is amazing
When I press back, it crashes
Requirement
It would be nice to have a way to search
my previous messages by keyword
This app is amazing
Opinion
When I press back, it crashes
Bug
Refactoring
Update
Observations
• Most RE problems could be solved top-down
• I can enforce tracing when writing requirements

• I can use constrained natural languages to improve quality

• I can tag classes in advance

• I can write a glossary in advance

• Unfortunately, this does not happen in practice, and that’s why we need NLP

• We need NLP also to recover from errors when RE problems are
addressed top-down by fallible humans
Where are we Today?
DEFECT DETECTION
CATEGORISATION
TRACING
EQUIVALENT REQUIREMENTS
MODEL SYNTHESIS
GLOSSARY EXTRACTION
RETRIEVAL
REGULATORY COMPLIANCE
USERS' FEEDBACK
ANALYSIS
Mostly Solved Making Good Progress Still Very Hard
Basic Support Sub-Tasks
• There is no time to explore all possible tasks

• However, there are basic sub-tasks that are useful for
most of the tasks

• Information extraction: extraction of relevant parts of the text

• Similarity computation: estimating relatedness
Information Extraction
Similarity Computation
Information Extraction
Information Extraction
with GATE
• Information Retrieval (IR): pulls documents from large corpora 

• Information Extraction (IE): retrieves structured information from large corpora

• IR returns documents containing the relevant information (normally fast)

• IE returns precise and structured information (can be slow)

• GATE (General Architecture for Text Engineering, https://gate.ac.uk) supports IE

• Potential Usage
• Entity, Events, Relation extraction

• Annotate documents for machine learning

• Ambiguity detection in requirements with rule-based approaches
Definitions
• Document = text + annotations + features 

• Corpus = collection of documents

• Linguistic information in documents is encoded in the form of
annotations (like coloured mark-ups)

• Annotations have features, each with its own type and value

• EXAMPLE
• Annotation: Sentence Length

• Feature 1: Length in Characters, Value 1 = 100
• Feature 2: Length in Tokens, Value 2 = 15
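The definitions above can be sketched as plain data structures. This is not GATE's API: the class names, field names, and feature keys below are assumptions chosen to mirror the slide.

```python
from dataclasses import dataclass, field

@dataclass
class Annotation:
    type: str        # e.g. "SentenceLength", "Token", "Lookup"
    start: int       # character offset where the annotated span starts
    end: int         # character offset where the span ends (exclusive)
    features: dict = field(default_factory=dict)  # feature name -> value

@dataclass
class Document:
    text: str
    annotations: list = field(default_factory=list)

text = "The user sets the input parameters."
doc = Document(text)

# One annotation with two features, as in the example on the slide
doc.annotations.append(
    Annotation("SentenceLength", 0, len(text),
               {"length_in_characters": len(text),
                "length_in_tokens": len(text.split())})
)

corpus = [doc]  # a corpus is a collection of documents
print(doc.annotations[0].features)
```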
Processing Resources (PR)
• PR are algorithms that make NLP easy

• ANNIE English Tokeniser - identify tokens (words, numbers, etc.)

• ANNIE Sentence Splitter - identify sentence boundaries

• ANNIE Gazetteer - identify specific tokens in a list

• ANNIE POS Tagger - identify part-of-speech (POS), like noun,
adjective, etc.

• JAPE Transducers - user-defined annotations based on regular
expressions over annotations

• ANNIE collects all the algorithms and runs them in a PIPELINE
Pipeline
Text
Tokenizer
Sentence
Splitter
Gazetteer
POS
Tagger
Jape
Transducer
Document
ANNIE English Tokenizer
Produces Token annotations
ANNIE Sentence Splitter
Produces Sentence annotations
Gazetteer
Gazetteer Produces Lookup annotations
ANNIE POS Tagger
Modifies Token annotations with a new feature:
category = NN, JJ, VB, etc.
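The pipeline idea can be imitated in a few lines of Python: each processing resource reads the document and adds annotations for the next one. This is only a sketch and not ANNIE; the regular expressions are naive (a decimal such as 3.5 would be split into two sentences), the gazetteer list is hypothetical, and the POS tagger is left out.

```python
import re

GAZETTEER = {"shall", "may"}  # hypothetical list of words to look up

def annotate(doc, ann_type, start, end, **features):
    doc["annotations"].append(
        {"type": ann_type, "start": start, "end": end, "features": features})

def tokeniser(doc):
    # Words/numbers and single punctuation marks become Token annotations
    for m in re.finditer(r"\w+|[^\w\s]", doc["text"]):
        annotate(doc, "Token", m.start(), m.end(), string=m.group())

def sentence_splitter(doc):
    # A sentence ends at '.', '!' or '?'; the next one starts after the whitespace
    text, start = doc["text"], 0
    for m in re.finditer(r"[.!?]", text):
        annotate(doc, "Sentence", start, m.end())
        start = m.end()
        while start < len(text) and text[start].isspace():
            start += 1

def gazetteer(doc):
    # Works on the Token annotations produced earlier in the pipeline
    toks = [a for a in doc["annotations"] if a["type"] == "Token"]
    for tok in toks:
        if tok["features"]["string"].lower() in GAZETTEER:
            annotate(doc, "Lookup", tok["start"], tok["end"])

def run_pipeline(text, resources):
    doc = {"text": text, "annotations": []}
    for resource in resources:
        resource(doc)
    return doc

doc = run_pipeline("The system shall store the data. The user may export it.",
                   [tokeniser, sentence_splitter, gazetteer])
counts = {}
for a in doc["annotations"]:
    counts[a["type"]] = counts.get(a["type"], 0) + 1
print(counts)
```

The order matters: the gazetteer step relies on the Token annotations, just as the later resources in the pipeline rely on the earlier ones.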
JAPE Transducers
• Gazetteer lists are designed for annotating simple, regular features

• Even identifying simple patterns like e-mails is impossible with a
Gazetteer

• What is JAPE?

• JAPE provides pattern matching in GATE 

• Each JAPE rule consists of:

• LHS which contains patterns to match

• RHS which details the annotations (and optionally features) to be
created
• I want to find all the occurrences of the term “level”
followed by a number (level 1, level 2, etc.)
ERTMS level 2 shall be backward compatible
with ERTMS level 1
• Adding features and values
ERTMS level 2 shall be backward compatible
with ERTMS level 1
Level {number = 2}
Annotation Feature Value
Level {number = 1}
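For readers without GATE at hand, the same "left-hand side matches, right-hand side creates an annotation" idea can be sketched with a Python regular expression. This is an analogy, not JAPE: the function name and the dictionary layout are assumptions.

```python
import re

def annotate_levels(text):
    """Create a 'Level' annotation, with a 'number' feature, for each 'level N'."""
    annotations = []
    # Pattern to match: the word "level" followed by a number
    for m in re.finditer(r"\blevel\s+(\d+)\b", text, flags=re.IGNORECASE):
        # What to create: annotation type, span, and feature = value
        annotations.append({"type": "Level",
                            "start": m.start(), "end": m.end(),
                            "features": {"number": int(m.group(1))}})
    return annotations

req = "ERTMS level 2 shall be backward compatible with ERTMS level 1"
levels = annotate_levels(req)
for ann in levels:
    print(req[ann["start"]:ann["end"]], ann["features"])
```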
Defect Detection as an Information
Extraction Problem
Redundant
Implementation-dependent Inconsistent
Incomplete Multiple
Unfeasible
Non-traceable
Non-verifiable
Requirement
Ambiguous
Ambiguity in RE (from Berry, Kamsties and Krieger, 2003)
• Property of an expression that can be interpreted in multiple ways

• Vagueness: the sentence admits borderline cases (e.g., Avoid long C functions)

• Generality: the sentence/term needs to be specified more (e.g., The interface shall be coded in Java)

• Lexical ambiguity: a term has different, unrelated vocabulary meanings (e.g., bank)

• Syntactic ambiguity: the sentence has more than one syntax tree (e.g., Structured approaches and tools)

• Semantic ambiguity: the sentence can be translated into more than one logic expression (e.g., All lights have a switch)

• Pragmatic ambiguity: the meaning depends on the context – other sentences, domain knowledge, common-sense, viewpoint
Berry, D., Kamsties, E. and M. Krieger. From Contract Drafting to Software Specification: Linguistic Sources of Ambiguity. University of Waterloo. 2003
Vagueness
• Vagueness may occur when I have adjectives and (modal)
adverbs (so I can use a POS Tagger, but it gives many
false positive cases)

• I can use lists of pre-defined vague terms, and include
them in a Gazetteer (still, a lot of false positives)

• This is the “dumb” approach, often recommended for
defect detection: it is easy to discard false positives
CTRL + F
adaptability, additionally, adequate, aggregate, also, ancillary, arbitrary, appropriate, as
appropriate, available, as far as, at last, as few as possible, as little as possible, as many
as possible, as much as possible, as required, as well as, bad, both, but, but also, but
not limited to, capable of, capable to, capability of, capability, common, correctly,
consistent, contemporary, convenient, credible, custom, customary, default, definable,
easily, easy, effective, efficient, episodic, equitable, equitably, eventually, exist, exists,
expeditiously, fast, fair, fairly, finally, frequently, full, general, generic, good, high-level,
impartially, infrequently, insignificant, intermediate, interactive, in terms of, less,
lightweight, logical, low-level, maximum, minimum, more, mutually-agreed,
mutually-exclusive, mutually-inclusive, near, necessary, neutral, not only, only, on the
fly, particular, physical, powerful, practical, prompt, provided, quickly, random, recent,
regardless of, relevant, respective, robust, routine, sufficiently, sequential, significant,
simple, specific, strong, there, there is, transient, transparent, timely, undefinable,
understandable, unless, unnecessary, useful, various, varying
List of Vague Terms (from Tjong and Berry’s SREE)
If the logical AND between the two input sensors is 1…
The system shall implement a logical sequence of steps for…
Tjong, S. F., & Berry, D. M. (2013). The design of SREE – a prototype potential ambiguity finder for requirements specifications and lessons learned. In International Working Conference on Requirements Engineering: Foundation for Software Quality (pp. 80-95). Springer Berlin Heidelberg.
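The "CTRL + F" approach is easy to reproduce. The sketch below looks up a handful of terms taken from the SREE list in a requirement; the subset chosen and the function name are my assumptions, and a real check would load the whole list.

```python
import re

# A small subset of the vague terms listed on the slide
VAGUE_TERMS = ["adequate", "as far as", "as much as possible", "easy",
               "fast", "logical", "relevant", "timely"]

# Longer terms first, so multi-word entries win over shorter overlapping ones
_pattern = re.compile(
    r"\b(" + "|".join(re.escape(t) for t in sorted(VAGUE_TERMS, key=len, reverse=True)) + r")\b",
    re.IGNORECASE)

def find_vague_terms(requirement):
    """Return the vague terms found in the requirement, in order of appearance."""
    return [m.group(1).lower() for m in _pattern.finditer(requirement)]

for req in ["If the logical AND between the two input sensors is 1",
            "The system shall implement a logical sequence of steps",
            "The response shall be fast and adequate"]:
    print(find_vague_terms(req), "<-", req)
```

The first requirement shows a typical false positive: "logical AND" is a precise technical expression, yet the lookup flags it exactly like the genuinely vague "logical sequence of steps".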
Syntactic Ambiguity
(Coordination)
• Detect all sentences that include a potential coordination
ambiguity, i.e., an ambiguity due to and/or.

• Example: The system shall produce the speed profile
plot or the data-log and the legend.

• The system shall produce [the speed profile plot or
the data-log] and the legend.

• The system shall produce the speed profile plot or
[the data-log and the legend].
Coordination Ambiguities - Jape Rule 0
Retrieve all occurrences of And / Or
Phase: MatchAndOr
Input: Token
Options: control = appelt
//Operator ==~ returns only whole string matches
//Operator (?i) tells that the matching is case insensitive
//Operator | is the logical "or" operator
Rule: checkAndOr
Priority: 1
(
{Token.string ==~ "(?i)and"} |
{Token.string ==~ "(?i)or"}
):coord
-->
:coord.AndOr = {}
A. Ferrari (ISTI) Requirements Engineering 36 / 60
Retrieve all occurrences of
AND / OR
Jape Rule 1
The system shall produce the speed profile plot or the data-log and the legend
The system shall produce a sound alarm and a visual alarm
Coordination Ambiguities - Jape Rule 1
Annotate all the sequences of And/Or in the same sentence
Phase: MatchCoordinationSequences
Input: Split AndOr
Options: control = appelt
//Note that, having Split among the input Annotations allows
//us to identify sequences of And/Or in the same sentence
//The ’+’ operator means "one or more occurrences"
//The ’*’ operator means "zero or more occurrences"
//The ’?’ operator means "zero or one occurrences"
Rule: checkCoordinationSequences
Priority: 1
(
{AndOr}
({AndOr})+
):coordSequence
-->
:coordSequence.AndOrSequence = {}
Jape Rule 2
Annotate AND / OR in the
same sentence
The system shall produce a sound alarm and a visual alarm
The system shall produce the speed profile plot or the data-log and the legend
Jape Rule 3
Annotate all the sentences with And/Or sequences
Phase: MatchCoordinationSentences
Input: Sentence AndOrSequence
Options: control = appelt
//A "contains" B: searches for
//annotations A that contain annotations B
Rule: checkCoordinationSentences
Priority: 1
(
{Sentence contains AndOrSequence}
):coordSentence
-->
:coordSentence.AndOrSentence = {}
The system shall produce the speed profile plot or the data-log and the legend
The system shall produce a sound alarm and a visual alarm
Annotate sentence with
sequences of AND / OR
The system shall produce the plot or the data-log, and the legend
Commas matter!
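The three JAPE rules boil down to: find and/or, look for two or more of them in the same sentence, flag that sentence. A rough Python equivalent is sketched below; the sentence splitting is naive and the function names are assumptions.

```python
import re

def split_sentences(text):
    """Naive sentence splitter: break after '.', '!' or '?' followed by whitespace."""
    return [s.strip() for s in re.split(r"(?<=[.!?])\s+", text) if s.strip()]

def coordination_tokens(sentence):
    """All occurrences of 'and' / 'or' as whole words, case-insensitive."""
    return re.findall(r"\b(?:and|or)\b", sentence, flags=re.IGNORECASE)

def flag_coordination_sentences(text):
    """Sentences with a sequence of two or more and/or are potentially ambiguous."""
    return [s for s in split_sentences(text) if len(coordination_tokens(s)) >= 2]

text = ("The system shall produce the speed profile plot or the data-log and the legend. "
        "The system shall produce a sound alarm and a visual alarm.")
flagged = flag_coordination_sentences(text)
print(flagged)
```

This sketch ignores punctuation, so "The system shall produce the plot or the data-log, and the legend" is flagged as well, even though the comma already tells the reader how to group the items.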
Your Own, Personal
Requirements Assistant
Experience: a Railway Company
requirement fragments (i.e., contiguous sequences of tokens in the requirement)
that match the pattern. In Table 2 we report the patterns in a compact version.
The JAPE implementation of the patterns, together with the discard-patterns
that will be introduced in Sect. 3.3, is available in our public repository (footnote 1).
Below, we describe the defect classes addressed by each pattern.
Table 2: Pattern adopted for each defect class.
Defect Class: Pattern
Anaphoric ambiguity: P_ANA = (NP)(NP)+ (Split)[0,1] (Token.POS == PP | Token.POS =~ PR*)
Coordination ambiguity: P_CO1 = ((Token)+ (Token.string == AND | OR))[2]
Coordination ambiguity: P_CO2 = (Token.POS == JJ) (Token.POS == NN | NNS) (Token.string == AND | OR) (Token.POS == NN | NNS)
Vague terms: P_VAG = (Token.string ∈ Vague)
Modal adverbs: P_ADV = (Token.POS == RB | RBR), (Token.string =~ "[.]*ly$")
Passive voice: P_PV = (AUXVERB)(NOT)?(Token.POS == RB | RBR)? (Token.POS == VBN)
Excessive length: P_LEN = Sentence.len > 60
Missing condition: P_MC = (IF)(Token, !Token.kind == punctuation)* (Token.kind == punctuation)(!(ELSE | OTHERWISE))
Missing unit of measurement: P_MU1 = (NUMBER)((Token)[0, 1](NUMBER))?(!MEASUREMENT)
Missing unit of measurement: P_MU2 = (NUMBER)((Token)[0, 1](NUMBER))?(!PERCENT)
Missing reference: P_MR = (Token.string == “Ref”)(Token.string == “.”) (SpaceToken)?(NUMBER)
Undefined term: P_UT = (Token.kind == word, Token.orth == mixedCaps)
1 https://github.com/ISTI-FMT/QUARS_plus_plus
Domain
Experts
Table 4: Discard patterns.
Defect Class: Discard Pattern
Anaphoric ambiguity: D_ANA = ((Token.POS == PP | Token.POS =~ PR*) within IT SHALL BE POSSIBLE)
Vague terms: D_VAG1 = (P_VAG, Token.string ==~ “(?i)sound” | “(?i)light”, Token.POS == NN | NNS)
Vague terms: D_VAG2 = (P_VAG within IT SHALL BE POSSIBLE)
Vague terms: D_VAG3 = (P_VAG within StopPhrases_Vague)
Modal adverbs: D_ADV1 = (Token.string ==~ “(?i)manually” | “(?i)automatically”)
Modal adverbs: D_ADV2 = (P_ADV within INFORMATION PURPOSES ONLY)
Undefined term: D_UT = (P_UT contains KnownAcronym)
3.4 SREE Patterns
The tool SREE (Tjong and Berry, 2013) is a defect detection tool for NL requirements
that is oriented to achieve 100% recall for the defects in its scope, […]
Defect-detection patterns
Discard patterns
A lot of false
positives
1800
requirements
Adaptation to the language of the company
Essential to involve domain experts
Ferrari, A., Gori, G., Rosadini, B., Trotta, I., Bacherini, S., Fantechi, A., & Gnesi, S. (2018).
Detecting requirements defects with NLP patterns: an industrial experience
in the railway domain. Empirical Software Engineering, 1-50.
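The interplay between defect-detection patterns and discard patterns can be sketched in a few lines. The example covers modal adverbs only and is a strong simplification: the paper's pattern checks the POS tag (RB | RBR) together with the "ly" suffix, while this sketch checks the suffix alone, so a word like "apply" would be wrongly flagged. The function names are assumptions.

```python
import re

# Detection: any word ending in "ly" (stand-in for POS == RB plus the ly$ check)
ADVERB = re.compile(r"\b\w+ly\b", re.IGNORECASE)

# Discard: adverbs that the domain experts consider acceptable
DISCARD = {"manually", "automatically"}

def detect_modal_adverbs(requirement):
    """Candidate defects: every match of the detection pattern."""
    return [m.group().lower() for m in ADVERB.finditer(requirement)]

def apply_discard(candidates):
    """Remove the candidates covered by a discard pattern."""
    return [c for c in candidates if c not in DISCARD]

for req in ["The driver shall manually reset the system",
            "The system shall respond quickly"]:
    candidates = detect_modal_adverbs(req)
    print(req, "->", candidates, "->", apply_discard(candidates))
```

The discard list is where the adaptation to the company's language happens: it is the domain experts who know that "manually" is harmless in their requirements.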
Glossary Extraction
Domain-specific Terms
• Requirements typically include domain-specific terms, and
sometimes project-specific ones (which may be easier to extract thanks to
naming conventions, e.g., m_balise_group)

• Domain-specific terms may be single or multi-word
train
automatic train protection
automatic train supervision
track circuit
balise
abdominal hysterectomy
abdomen lymph nodes
continuous passive motion machine
administrative law judge
affirmative defense
just compensation
trial
Term Extraction
• We evaluate how much a word is independent from other
words

• If a word always occurs with different words, it is likely to
be an independent term
• Example: The automatic train supervision platform dispatches the vehicles,
while the system for automatic train protection brakes the vehicle in case of danger.
Term Extraction
• If a word often occurs with the same words, it is likely to
be part of a multi-word term
• Example: The automatic train supervision platform dispatches the vehicles,
while the system for automatic train protection brakes the vehicle in case of danger.
C/NC-Value
Linguistic Analysis
POS Tagging Filters Stoplist
great, numerous,
several, year…
Noun, Adj
Candidate
Strings
C-Value
automatic train
protection,
track…
RANKED
Candidate
Strings
NC-Value
computes
termhood
considers term-context words
RE-RANKED
Candidate
Strings
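As a rough illustration of the C-Value step, the sketch below scores candidate strings following the formula as I recall it from Frantzi et al.: a candidate that is never nested gets log2(length) × frequency, while a nested candidate has the average frequency of the longer candidates containing it subtracted first. The candidate frequencies are made up, the linguistic filters are assumed to have run already, and the NC-Value re-ranking is not covered.

```python
import math

def is_nested(short, long):
    """True if `short` occurs as a contiguous word sequence inside `long`."""
    n = len(short)
    return any(long[i:i + n] == short for i in range(len(long) - n + 1))

def c_value(freq):
    """freq maps candidate terms (tuples of words) to their frequency."""
    scores = {}
    for a, f_a in freq.items():
        containers = [b for b in freq if len(b) > len(a) and is_nested(a, b)]
        if containers:
            # Discount the occurrences explained by longer candidate terms
            avg_container_freq = sum(freq[b] for b in containers) / len(containers)
            scores[a] = math.log2(len(a)) * (f_a - avg_container_freq)
        else:
            scores[a] = math.log2(len(a)) * f_a
    return scores

# Hypothetical candidate strings and frequencies
freq = {
    ("automatic", "train", "protection"): 3,
    ("automatic", "train"): 5,
    ("train", "protection"): 3,
}
scores = c_value(freq)
for term in sorted(scores, key=scores.get, reverse=True):
    print(" ".join(term), round(scores[term], 3))
```

With log2 of the length, single-word candidates always score zero; some variants of the measure adjust the length factor to avoid this.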
Contrastive Analysis
• Extracted terms might be domain-generic or domain-
specific

• With contrastive analysis, terms are further ranked
according to their domain-specificity
Contrastive Analysis
• A contrastive corpus is a set of domain-generic documents (e.g.,
newspapers)

• Terms are extracted from the contrastive corpus

• The terms found in the requirements are compared with the terms of the
contrastive corpus

• If a term is less frequent in the contrastive corpus, it is considered as
a domain-specific term

• If a term is more frequent in the contrastive corpus, it is considered
as a domain-generic term
• A rank is associated to each term according to its domain-specificity
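A very simple way to turn this comparison into a rank is the ratio between the relative frequency of a term in the requirements and in the contrastive corpus. This is a sketch under that assumption; it is not the ranking function used by Text2Knowledge or by Bonin et al., and the counts below are invented.

```python
def relative_frequencies(term_counts):
    total = sum(term_counts.values())
    return {term: count / total for term, count in term_counts.items()}

def domain_specificity(domain_counts, contrastive_counts, smoothing=1e-6):
    """Higher score = relatively more frequent in the domain than in the contrastive corpus."""
    dom = relative_frequencies(domain_counts)
    con = relative_frequencies(contrastive_counts)
    # Smoothing avoids a division by zero for terms absent from the contrastive corpus
    return {term: dom[term] / (con.get(term, 0.0) + smoothing) for term in dom}

# Hypothetical term counts
domain_counts = {"automatic train protection": 12, "track circuit": 8,
                 "system": 30, "year": 2}
contrastive_counts = {"system": 40, "year": 120, "government": 60}

scores = domain_specificity(domain_counts, contrastive_counts)
ranking = sorted(scores, key=scores.get, reverse=True)
print(ranking)
```

Terms that never occur in the contrastive corpus end up at the top of the rank, while generic words such as "year" sink to the bottom.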
Contrastive Analysis
Requirements → C-NC Value Ranking (domain-specific and domain-generic terms mixed)
Contrastive Corpus → C-NC Value Ranking (domain-generic terms)
Both rankings → Contrastive Analysis → Domain-specific Terms vs Domain-generic Terms
Experience: Product Lines
Step 3: Commonality Candidates Identification
Automatic Train Protection
Automatic Train Supervision
Interlocking
...
NetTrack
Region ATP
...
CCTV
...
CCTV
...
Smartlock
Region ATP
...
Airlink
...
A. Ferrari, et al. (ISTI-CNR, ILC-CNR) Mining Commonalities and Variabilities 25 / 36
A Global Feature Diagram (excerpt)
ATP
Onboard
CBTC
ATP IXL ATS
IXL
Controllable
ATP
Wayside
IXL
Pure
ATP
Simple
ATP
IXL
ATS
Router
ATS
Simple
ATP
Controller
Ferrari, A., Spagnolo, G. O., & Dell'Orletta, F. (2013). Mining commonalities and variabilities from natural language documents.
In Proceedings of the 17th International Software Product Line Conference (pp. 116-120). ACM.
Nasr, S. B., Bécan, G., Acher, M., Ferreira Filho, J. B., Sannier, N., Baudry, B., & Davril, J. M. (2017). Automated extraction of product comparison matrices from informal
product descriptions. Journal of Systems and Software, 124, 82-103.
• Term Extraction: TerMine, http://www.nactem.ac.uk/software/termine/

• Contrastive Analysis: Text2Knowledge, http://www.italianlp.it/demo/t2k-text-to-knowledge/

• Further Readings:
• Frantzi, Katerina, Sophia Ananiadou, and Hideki Mima. "Automatic
recognition of multi-word terms: the C-value/NC-value method."
International journal on digital libraries 3.2: 115-130, 2000.

• Bonin, Francesca, et al. "A contrastive approach to multi-word
term extraction from domain corpora." Proceedings of the 7th
International Conference on Language Resources and Evaluation.
2010.
Similarity Computation
Many Reasons for Similarity
The system shall support user authentication
Before accessing the system, the user
shall be authenticated
Equivalence
Authentication shall be performed by means of iris recognition
Refinement
If the authentication through iris recognition fails,
the system shall ask the user login information
Relatedness
If the authentication through iris recognition fails,
the system shall authenticate the user through fingerprint recognition
Inconsistency
Lexical and Semantic
Similarity
The system shall support user authentication
The student shall enter login and password
Semantically
Similar
Lexically
Different
Vector Representation
Term: student system support user authentication enter login password
Req 1: 0 1 1 1 1 0 0 0 (The system shall support user authentication)
Req 2: 1 0 0 0 0 1 1 1 (The student shall enter login and password)
• Each sentence is represented as a vector of numbers

• Each component of the vector is a term in the complete set of terms

• The component is 1 if the term occurs in the requirement, 0 otherwise
(other weighting schemes can be used, e.g., TF/IDF, to emphasise rare terms)
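A minimal sketch of this binary representation follows. The vocabulary is copied from the slide; the stop-word set and the function names are assumptions.

```python
# Vocabulary in the same order as on the slide
VOCABULARY = ["student", "system", "support", "user",
              "authentication", "enter", "login", "password"]

STOPWORDS = {"the", "shall", "and"}  # assumption: words left out of the vocabulary

def terms(requirement):
    return {w for w in requirement.lower().split() if w not in STOPWORDS}

def to_vector(requirement, vocabulary=VOCABULARY):
    """1 if the vocabulary term occurs in the requirement, 0 otherwise."""
    present = terms(requirement)
    return [1 if term in present else 0 for term in vocabulary]

v1 = to_vector("The system shall support user authentication")
v2 = to_vector("The student shall enter login and password")
print(v1)
print(v2)
```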
Similarity Metrics
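Dice: $\mathrm{Dice}(A, B) = \frac{2 \times |A \cap B|}{|A| + |B|}$
Jaccard: $\mathrm{Jaccard}(A, B) = \frac{|A \cap B|}{|A| + |B| - |A \cap B|}$
Cosine: $\cos(\vec{a}, \vec{b}) = \frac{\vec{a} \cdot \vec{b}}{\|\vec{a}\| \times \|\vec{b}\|}$
(A, B: sets of terms of the two requirements; $\vec{a}$, $\vec{b}$: their vectors)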
The system shall support 

user authentication
User authentication shall be performed 

through fingerprint
cos > 0
Angle between the two
requirements vectors
the cosine is greater than zero when
the angle is lower than 90 degrees
cos = 0
The system shall support 

user authentication
Response time is 100 ms
the cosine is zero when
the vectors are orthogonal
Vectors have one component
for each word in the vocabulary
Term: student system support user authentication enter login password
Req 1: 0 1 1 1 1 0 0 0 (The system shall support user authentication)
Req 2: 1 0 0 0 0 1 1 1 (The student shall enter login and password)
cos = 0
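The three metrics are short enough to write by hand for binary vectors. In this sketch the vocabulary for the fingerprint example and the choice of ignoring words such as "shall", "be", and "through" are my assumptions.

```python
import math

def dot(a, b):
    return sum(x * y for x, y in zip(a, b))

def dice(a, b):
    # For binary vectors, dot(a, b) counts the shared terms
    return 2 * dot(a, b) / (sum(a) + sum(b))

def jaccard(a, b):
    shared = dot(a, b)
    return shared / (sum(a) + sum(b) - shared)

def cosine(a, b):
    return dot(a, b) / (math.sqrt(dot(a, a)) * math.sqrt(dot(b, b)))

# Vocabulary: system, support, user, authentication, performed, fingerprint
r1 = [1, 1, 1, 1, 0, 0]  # The system shall support user authentication
r2 = [0, 0, 1, 1, 1, 1]  # User authentication shall be performed through fingerprint
print(dice(r1, r2), jaccard(r1, r2), cosine(r1, r2))

# Vocabulary: student, system, support, user, authentication, enter, login, password
s1 = [0, 1, 1, 1, 1, 0, 0, 0]  # The system shall support user authentication
s2 = [1, 0, 0, 0, 0, 1, 1, 1]  # The student shall enter login and password
print(cosine(s1, s2))
```

The second pair shares no term, so every metric is zero although the two requirements are semantically related: this is the limitation of purely lexical similarity.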
Word Embeddings
• I want to enrich the semantic representation of words

• Avoid the problem of lexical similarity = 0

• I want compact vector representations (avoid sparse vectors)

• In 2013, Mikolov et al. introduced Skip-gram with negative
sampling (SGNS), the most common word embeddings algorithm
• Implemented in the package word2vec
• Enhances similarity computation, and is useful for any task in
which I want to represent the semantics of words
Word Embeddings: Idea
• For a human, the meaning of a term is given by the
mental (experiential) context of that term

• But a human has many senses to create meaning
DOG
Word Embeddings: Idea
• To let a system associate meaning to a term, I can consider
only the textual context

• Distributional hypothesis (Harris, 1954): the meaning of a
word is given by the company it keeps (in a set of documents)
The dog is a man’s best friend …
… then I went to walk with the dog
my dog does not bite, but…
…the dog barked too much
Documents Meaning
dog
friend bark
walk
bite
Word Embeddings: Details
• To produce the word embedding vectors, a fake task is performed based on the
input text, and the word embeddings are produced as a by-product of the task
Source Text
{requirements, are}
{requirements, conditions}
{are, requirements}
{are, conditions}
{are, over}
Word Pairs
(Training Samples)
window size = 2 (context considered)
{conditions, are}
{…}
• A neural network (NN) is trained for the fake task, and the
word embeddings are the hidden layer of the trained NN
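The training samples of the fake task can be generated with a few lines of Python. The source sentence used here is an assumption (it is not readable in the slide), chosen because it yields the word pairs listed above; the function name is hypothetical.

```python
def skipgram_pairs(tokens, window=2):
    """Pair every word with the words at most `window` positions before or after it."""
    pairs = []
    for i, target in enumerate(tokens):
        lo = max(0, i - window)
        hi = min(len(tokens), i + window + 1)
        for j in range(lo, hi):
            if j != i:
                pairs.append((target, tokens[j]))
    return pairs

source = "requirements are conditions over phenomena".split()
pairs = skipgram_pairs(source, window=2)
for pair in pairs[:5]:
    print(pair)
```

Each pair is one training sample: the first word is the input of the network and the second word is the expected output.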
Word Embeddings: Details
• Given the word pairs such as
{requirements, are}, requirements
is the input, and the expected output is are

• The weights in the hidden layer of the neural
network are incrementally adjusted
• At the end of the training, given a word, the output vector is a probability
distribution over the vocabulary (how likely each word is to appear in the context)

• The word embeddings are the hidden
layer of the NN
Input layer: one-hot vector for “requirements” (0 0 0 1 0 0 0 0), vector len = |V| (vocabulary size)
Hidden layer: e1 e2 e3 … eL, vector len = L (chosen by the user)
Output layer: p1 p2 p3 p4 p5 … pV, vector len = |V| (e.g., probability that the context word is “are”, probability that the context word is “conditions”)
Word Embedding
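To see why the hidden layer is the embedding, here is a toy forward pass in plain Python. The weights are random and untrained, the vocabulary is tiny, and the output uses a full softmax; all of this is an assumption made for illustration, and the actual word2vec training (with negative sampling) is not shown.

```python
import math
import random

random.seed(0)
vocabulary = ["requirements", "are", "conditions", "over", "phenomena"]
V = len(vocabulary)  # vocabulary size
L = 3                # embedding length, chosen by the user

# Input-to-hidden weights (V x L) and hidden-to-output weights (L x V)
W_in = [[random.uniform(-0.5, 0.5) for _ in range(L)] for _ in range(V)]
W_out = [[random.uniform(-0.5, 0.5) for _ in range(V)] for _ in range(L)]

def embedding(word):
    # Multiplying a one-hot vector by W_in just selects one row: the word embedding
    return W_in[vocabulary.index(word)]

def context_probabilities(word):
    e = embedding(word)
    scores = [sum(e[k] * W_out[k][j] for k in range(L)) for j in range(V)]
    top = max(scores)  # subtract the maximum for numerical stability
    exps = [math.exp(s - top) for s in scores]
    total = sum(exps)
    return [x / total for x in exps]

probs = context_probabilities("requirements")
print(embedding("requirements"))
print(dict(zip(vocabulary, probs)))
```

Training would adjust `W_in` and `W_out` so that the probabilities of the observed context words go up; afterwards only `W_in` is kept.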
Word Embeddings: Details
• The nice property of word embeddings is that vectors of
related words are closer than vectors of unrelated words
requirement
constraint
dog
NOTE 1: the components of the vectors
here do not mean anything
NOTE 2: I have a vector for each word and
not for each sentence
Semantic Similarity with
Word Embeddings
• Given a sentence, I can produce the word embedding for
each word
• Word embeddings are vectors, so I can combine the word
embeddings with typical vector operations (e.g., average
vector) to represent requirements

• I can use the previous similarity measures (normally,
cosine similarity)

• More refined measures exist (Word Mover’s Distance, Word
Centroid Similarity, etc.)
Semantic Similarity
import gensim
import nltk  # word_tokenize needs the NLTK "punkt" tokenizer data
import numpy as np

req_1 = "The system shall support user authentication"
req_2 = "The student shall enter login and password"

def req_vector(req, vectors):
    # Average the embeddings of the tokens found in the model vocabulary
    tokens = nltk.tokenize.word_tokenize(req)
    return np.mean([vectors[t] for t in tokens if t in vectors], axis=0)

def cosine_similarity(v1, v2):
    return np.dot(v1, v2) / (np.linalg.norm(v1) * np.linalg.norm(v2))

if __name__ == '__main__':
    # Pre-trained Google News vectors in word2vec binary format
    mdl = gensim.models.KeyedVectors.load_word2vec_format(
        './model/GoogleNews-vectors-negative300.bin', binary=True)
    print(cosine_similarity(req_vector(req_1, mdl), req_vector(req_2, mdl)))
Domain-specific Word
Embeddings
• Pre-trained word embeddings exist that are trained on
large amount of generic texts

• In requirements, the meaning highly depends on the
domain that I am considering

• Generic texts are different from requirements, so word
embeddings may not represent the actual meaning
intended in the requirements
Domain-specific Meaning
and Relatedness
CODE: source, program, software
CODE: convention, dsm, identifier
MACHINE: computation, instruction
MACHINE: spirometer, oximeter, respirator
Wikipedia Crawling
Domain-specific Portals → word2vec → Domain-specific word embeddings
Each portal includes all the Wiki pages related to a certain domain
Domain-specific
Requirements Similarity
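import os
import nltk
import numpy as np
from gensim.models import Word2Vec

MODEL_PATH = "./model"  # placeholder: folder that holds the domain-specific models
req_1 = "The system shall support user authentication"
req_2 = "The student shall enter login and password"

def cosine_similarity(u, v):
    return np.dot(u, v) / (np.linalg.norm(u) * np.linalg.norm(v))

if __name__ == '__main__':
    # Load the Computer Science model (gensim 4.x API: the vectors live in mdl.wv)
    mdl = Word2Vec.load(os.path.join(MODEL_PATH, "Computer_Science_D_2.bin"))
    tok_req_1 = nltk.tokenize.word_tokenize(req_1)
    vect_req_1 = [mdl.wv[t] for t in tok_req_1 if t in mdl.wv.key_to_index]
    v_req_1 = np.mean(vect_req_1, axis=0)
    tok_req_2 = nltk.tokenize.word_tokenize(req_2)
    vect_req_2 = [mdl.wv[t] for t in tok_req_2 if t in mdl.wv.key_to_index]
    v_req_2 = np.mean(vect_req_2, axis=0)
    print(cosine_similarity(v_req_1, v_req_2))
0.53966196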
• I can use these domain-specific word embeddings
to represent my domain-specific requirements
Domain-specific Word
Embeddings
• Let’s look at the neighbouring word vectors in the
different domains
for MODEL_NAME in MODEL_LIST:
    mdl = Word2Vec.load(os.path.join(MODEL_PATH, MODEL_NAME))
    print(MODEL_NAME[:-8], " ", mdl.wv.most_similar("code"))
Sports [(u'rule', 0.7007173895835876), (u'regulation'), (u'definition'), (u'guideline')…
Computer_Science [(u'compiled', 0.6769073605537415), (u'bytecode'), (u'executable'), (u'assembly'…
Medicine [(u'nomenclature', 0.7324844002723694), (u'listing'), (u'taxonomy'), (u'atc'),…
for MODEL_NAME in MODEL_LIST:
    mdl = Word2Vec.load(os.path.join(MODEL_PATH, MODEL_NAME))
    print(MODEL_NAME[:-8], " ", mdl.wv.most_similar("database"))
Computer_Science [(u'dbms', 0.7940356135368347), (u'rdbms'), (u'nosql'), (u'relational')…
Literature [(u'internet', 0.9139115810394287), (u'web'), (u'streaming'), (u'librivox')…
Mechanical_Engineering [(u'documentation', 0.8548465967178345), (u'online'), (u'internet')…
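What `most_similar` does can be sketched by hand: rank the other words of a model by cosine similarity to the query word. The two "domain models" below are invented two-component toy vectors, not the trained models used above, and the function only mimics the behaviour; it is not gensim's implementation.

```python
import math

def cosine(u, v):
    dot = sum(a * b for a, b in zip(u, v))
    return dot / (math.sqrt(sum(a * a for a in u)) * math.sqrt(sum(b * b for b in v)))

def most_similar(word, emb, topn=2):
    """Rank the other words of one domain by cosine similarity to `word`."""
    scores = [(other, cosine(emb[word], vec))
              for other, vec in emb.items() if other != word]
    return sorted(scores, key=lambda pair: pair[1], reverse=True)[:topn]

# Toy domain-specific vectors (assumption): "code" sits near different words
domains = {
    "Computer_Science": {"code": [0.9, 0.1], "bytecode": [0.8, 0.2],
                         "compiled": [0.7, 0.3], "rule": [0.1, 0.9]},
    "Sports": {"code": [0.1, 0.9], "rule": [0.2, 0.8],
               "regulation": [0.3, 0.7], "bytecode": [0.9, 0.1]},
}

for name, emb in domains.items():
    print(name, [w for w, _ in most_similar("code", emb)])
```

The same word gets different neighbours depending on the domain it was trained on.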
Experience
Experimental Results – Crawled Documents
Table: Number of pages for each domain.
Domain                          Pages    Words      Vocabulary
Computer Science (CS)           10,000   3,985,740  104,907
Electronic Engineering (EEN)     8,568   4,576,917  100,272
Mechanical Engineering (MEN)     7,267   4,459,961   95,466
Medicine (MED)                  10,000   5,470,284  150,617
Literature (LIT)                10,000   5,558,470  242,386
Sports (SPO)                    10,000   5,725,688  165,814
Technical engineering domains have a more restricted vocabulary
A. Ferrari, et al. (ISTI-CNR), Domain-specific Ambiguities
Compute the potential for ambiguity between different domains and Computer Science
Crawled Documents → Bring to the same vector space → Compare domains
Ferrari, A., Donati, B., & Gnesi, S. (2017). Detecting Domain-specific Ambiguities: an NLP Approach based on
Wikipedia Crawling and Word Embeddings. In 2017 IEEE 25th International Requirements Engineering
Conference Workshops (REW) (pp. 393-399). IEEE.
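[Figure: comparison of term meanings between Computer Science and the other domains]
• Each line is associated with a domain
• Each radius is a term
• If a point is close to the center, the meaning is very different
• “window” has a similar meaning in CS and EEN, and a different meaning in the other domains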
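A minimal sketch of the "compare domains" step only, assuming the vectors of the two domains have already been brought to the same vector space (how that is done is not shown here). A term whose two vectors have a low cosine similarity is a candidate for domain-specific ambiguity. The vectors are invented toy values and `ambiguity_ranking` is a hypothetical helper, not the code of the cited paper.

```python
import math

def cosine(u, v):
    dot = sum(a * b for a, b in zip(u, v))
    return dot / (math.sqrt(sum(a * a for a in u)) * math.sqrt(sum(b * b for b in v)))

def ambiguity_ranking(cs_vectors, other_vectors):
    """Rank shared terms from most different to most similar in meaning."""
    shared = sorted(set(cs_vectors) & set(other_vectors))
    scored = [(t, cosine(cs_vectors[t], other_vectors[t])) for t in shared]
    return sorted(scored, key=lambda pair: pair[1])

# Toy vectors (assumption): already aligned to one common space
cs = {"window": [0.9, 0.1], "code": [0.8, 0.3], "user": [0.5, 0.5]}
een = {"window": [0.85, 0.2], "code": [0.7, 0.4], "user": [0.5, 0.6]}
sports = {"window": [0.1, 0.9], "code": [0.2, 0.9], "user": [0.5, 0.6]}

for name, other in [("EEN", een), ("Sports", sports)]:
    ranking = ambiguity_ranking(cs, other)
    print("CS vs", name, [(t, round(s, 2)) for t, s in ranking])
```

In the figure, a low score like this corresponds to a point close to the center.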
• GenSim (similarity, and working with embeddings): https://radimrehurek.com/gensim/

• Pre-trained word embeddings:

• from word2vec: https://code.google.com/archive/p/word2vec/ 

• from GloVe: http://nlp.stanford.edu/projects/glove/

• from fastText (2018, also multi-lingual): https://fasttext.cc

• Domain-specific word embeddings: https://github.com/alessioferrari/Domain-specific-ambiguity 

• Further Readings:
• Mikolov, T., Sutskever, I., Chen, K., Corrado, G. S., & Dean, J. (2013). Distributed representations of
words and phrases and their compositionality. In Advances in neural information processing
systems (pp. 3111-3119).

• Baroni, M., Dinu, G., & Kruszewski, G. (2014). Don't count, predict! A systematic comparison of
context-counting vs. context-predicting semantic vectors. In Proceedings of the 52nd Annual
Meeting of the Association for Computational Linguistics (Volume 1: Long Papers) (Vol. 1, pp.
238-247).

• http://mccormickml.com/2016/04/19/word2vec-tutorial-the-skip-gram-model/
Open Issues
(from the NLP4RE Workshop)
Data
Train supervised machine learning algorithms
Validate rule-based and unsupervised algorithms
Generalisation through different domains
Experiment replication
Requirements are confidential
Annotations require domain-knowledge
Case studies are often the best option!
Better Data
Data quality impacts performance
Linguistic quality (no grammatical errors)
Annotation quality (expert annotators)
No Bias! (annotate in advance)
Need for tools that learn on-the-job
Very hard to involve enough experts
Very hard to make them work without showing them a tool
Bad requirements are realistic!
Some tasks are inherently hard to perform in advance
Validation Metrics and
Workflows
We normally use information retrieval measures for RE tools
RE tasks are often compositions of tasks (e.g., model generation)
Errors made by a tool can have different impacts, depending on the context
The context is given by the task, the process, the user
In general, it is safe to avoid false negatives…
Avoiding false negatives leads to false positives
Too many false positives means that the tool does not do its job
Try the tool in the field!
Competitions!
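The information retrieval measures mentioned above can be sketched as follows. The requirement identifiers are invented; the F-beta score with beta greater than 1 weights recall more than precision, which is one way to encode that false negatives cost more than false positives.

```python
def precision_recall_fbeta(predicted, actual, beta=1.0):
    """Precision, recall and F-beta for a set of flagged items."""
    predicted, actual = set(predicted), set(actual)
    tp = len(predicted & actual)  # true positives
    precision = tp / len(predicted) if predicted else 0.0
    recall = tp / len(actual) if actual else 0.0
    if precision == 0.0 and recall == 0.0:
        return precision, recall, 0.0
    b2 = beta ** 2
    fbeta = (1 + b2) * precision * recall / (b2 * precision + recall)
    return precision, recall, fbeta

# Toy example (assumption): 4 truly defective requirements, 7 flagged by a tool
actual = {"R1", "R4", "R7", "R9"}
flagged = {"R1", "R2", "R3", "R4", "R5", "R7", "R9"}

p, r, f1 = precision_recall_fbeta(flagged, actual)
_, _, f2 = precision_recall_fbeta(flagged, actual, beta=2.0)
print("precision:", round(p, 3), "recall:", round(r, 3))
print("F1:", round(f1, 3), "F2:", round(f2, 3))
```

The tool misses nothing (no false negatives) but pays with three false positives; whether that trade-off is acceptable depends on the task, the process and the user.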
Domain-specificity
Different domains speak different languages
Domain-adaptation is key
Domain-specific resources (ontologies) are needed
Different terms but also different business rules
Need to automate ontology-building
Domain-specific resources require
support from domain experts
Issues of tacit knowledge
Language Issues
Most of the available resources are in English
Most of the NLP research is for English
Requirements are written in different languages
Machine translation can be effective only for certain tasks
(e.g., similarity)
Don’t forget rule-based techniques!
Human-in-the-loop
Clearly separate human and machine tasks
NLP tools do not replace humans
NLP tools empower domain experts
NLP tools cannot do everything
We need the support of domain experts
to build NLP tools for RE
Process changes when a tool is used
People tend to rely on the tool
Tools can have a learning effect
Players’ Cooperation
[Figure: four players linked by the exchanges listed below]
Players: RE Researchers; Vendors of Requirements Management Tools; NLP Researchers; Industry (Users)
• Support for hard NLP tasks
• RE awareness
• Provide pluggable solutions*
• Clarify NLP capabilities, principles and needs (i.e., expert support)
• Support for scoping the discipline
*NLP technologies are GPL, RE tools are not!
NLP Technologies and
Resources
• Extract information from text: General Architecture for Text Engineering (GATE): https://gate.ac.uk 

• Perform NLP fine-grained analyses: 

• Python Natural Language Toolkit (NLTK): https://www.nltk.org 

• TextBlob (high-level API to NLTK): https://textblob.readthedocs.io/

• GenSim (for similarity): https://radimrehurek.com/gensim/ 

• Stanford CoreNLP (Java): https://stanfordnlp.github.io/CoreNLP/ 

• SpaCy (designed for speed): https://spacy.io 

• Dive into machine learning and deep learning:
• WEKA (user-friendly, several algorithms): https://www.cs.waikato.ac.nz/ml/weka/
• TensorFlow: https://www.tensorflow.org 

• Keras (high-level API to TensorFlow): https://keras.io
Selected Publications
(with trends based on my opinion)


Sultanov, H., & Hayes, J. H. (2013, July). Application of reinforcement learning to requirements engineering: requirements tracing. In Requirements Engineering
Conference (RE), 2013 21st IEEE International (pp. 52-61). IEEE. 

Gervasi, V., & Zowghi, D. (2014). Supporting traceability through affinity mining. In Requirements Engineering Conference (RE), 2014 IEEE 22nd International (pp.
143-152). IEEE.
Borg, M., Runeson, P., & Ardö, A. (2014). Recovering from a decade: a systematic mapping of information retrieval approaches to software traceability. Empirical
Software Engineering, 19(6), 1565-1616.
Mahmoud, A., & Niu, N. (2015). On the role of semantics in automated requirements tracing. Requirements Engineering, 20(3), 281-300.
Guo, J., Cheng, J., & Cleland-Huang, J. (2017). Semantically enhanced software traceability using deep learning techniques. In Proceedings of the 39th International
Conference on Software Engineering (pp. 3-14). IEEE Press.
Hübner, P., & Paech, B. (2018). Evaluation of Techniques to Detect Wrong Interaction Based Trace Links. In International Working Conference on Requirements
Engineering: Foundation for Software Quality (pp. 75-91). Springer, Cham.
Tracing
Ferrari, A., Dell’Orletta, F., Esuli, A., Gervasi, V., & Gnesi, S. (2017). Natural Language Requirements Processing: A 4D Vision. IEEE Software, 34(6), 28-35.
General Introduction
Casamayor, A., Godoy, D., & Campo, M. (2010). Identification of non-functional requirements in textual specifications: A semi-supervised learning approach.
Information and Software Technology, 52(4), 436-445.
Casamayor, A., Godoy, D., & Campo, M. (2012). Functional grouping of natural language requirements for assistance in architectural software design. Knowledge-
Based Systems, 30, 78-86.
Knauss, E., & Ott, D. (2014). (Semi-) automatic Categorization of Natural Language Requirements. In International Working Conference on Requirements Engineering: Foundation for Software Quality (pp. 39-54). Springer International Publishing.
Kurtanović, Z., & Maalej, W. (2017). Automatically Classifying Functional and Non-functional Requirements Using Supervised Machine Learning. In Requirements
Engineering Conference (RE), 2017 IEEE 25th International (pp. 490-495). IEEE.
Categorisation
Tjong, S. F., & Berry, D. M. (2013). The design of SREE – a prototype potential ambiguity finder for requirements specifications and lessons learned. In International Working Conference on Requirements Engineering: Foundation for Software Quality (pp. 80-95). Springer Berlin Heidelberg.

Arora, C., Sabetzadeh, M., Briand, L., & Zimmer, F. (2015). Automated checking of conformance to requirements templates using natural language processing. IEEE
transactions on Software Engineering, 41(10), 944-968. 

Femmer, H., Fernández, D. M., Wagner, S., & Eder, S. (2017). Rapid quality assurance with requirements smells. Journal of Systems and Software, 123, 190-213.

Ferrari, A., Gori, G., Rosadini, B., Trotta, I., Bacherini, S., Fantechi, A., & Gnesi, S. (2018). Detecting requirements defects with NLP patterns: an industrial experience
in the railway domain. Empirical Software Engineering, 1-50.
Defect Detection
Falessi, D., Cantone, G., & Canfora, G. (2013). Empirical principles and an industrial case study in retrieving equivalent requirements via natural language processing
techniques. IEEE Transactions on Software Engineering, 39(1), 18-44.
Equivalent Requirements
Goldin, L., & Berry, D. M. (1997). AbstFinder, a prototype natural language text abstraction finder for use in requirements elicitation. ASE, 4(4), 375-412.
Gacitua, R., Sawyer, P., & Gervasi, V. (2011). Relevance-based abstraction identification: technique and evaluation. Requirements Engineering, 16(3), 251.
Bakar, N. H., Kasirun, Z. M., & Salleh, N. (2015). Feature extraction approaches from natural language requirements for reuse in software product lines: A systematic
literature review. Journal of Systems and Software, 106, 132-149.
Quirchmayr, T., Paech, B., Kohl, R., & Karey, H. (2017). Semi-automatic software feature-relevant information extraction from natural language user manuals. In
International Working Conference on Requirements Engineering: Foundation for Software Quality (pp. 255-272). Springer, Cham.
Glossary Extraction
Yue, T., Briand, L. C., & Labiche, Y. (2011). A systematic review of transformation approaches between user requirements and analysis models. Requirements
Engineering, 16(2), 75-99.
Yue, T., Briand, L. C., & Labiche, Y. (2015). aToucan: an automated framework to derive UML analysis models from use case models. ACM Transactions on Software
Engineering and Methodology (TOSEM), 24(3), 13.
Lucassen, G., Robeer, M., Dalpiaz, F., van der Werf, J. M. E., & Brinkkemper, S. (2017). Extracting conceptual models from user stories with Visual Narrator.
Requirements Engineering, 22(3), 339-358.
Model Synthesis
Chen, N., Lin, J., Hoi, S. C., Xiao, X., & Zhang, B. (2014). AR-miner: mining informative reviews for developers from mobile app marketplace. In Proceedings of the 36th
International Conference on Software Engineering (pp. 767-778). ACM.
Maalej, W., & Nabil, H. (2015, August). Bug report, feature request, or simply praise? on automatically classifying app reviews. In Requirements Engineering
Conference (RE), 2015 IEEE 23rd International (pp. 116-125). IEEE.
Guzman, E., Alkadhi, R., & Seyff, N. (2016). A needle in a haystack: What do twitter users say about software?. In Requirements Engineering Conference (RE),
2016 IEEE 24th International (pp. 96-105). IEEE.
Maalej, W., Nayebi, M., Johann, T., & Ruhe, G. (2016). Toward data-driven requirements engineering. IEEE Software, 33(1), 48-54.
Martin, W., Sarro, F., Jia, Y., Zhang, Y., & Harman, M. (2017). A survey of app store analysis for software engineering. IEEE transactions on software engineering, 43(9),
817-847.
Groen, E. C., Seyff, N., Ali, R., Dalpiaz, F., Doerr, J., Guzman, E., ... & Stade, M. (2017). The crowd in requirements engineering: The landscape and challenges.
IEEE software, 34(2), 44-52.
Users’ Feedback Analysis
Breaux, T. D., Vail, M. W., & Anton, A. I. (2006). Towards regulatory compliance: Extracting rights and obligations to align requirements with regulations. In
Requirements Engineering, 14th IEEE International Conference (pp. 49-58). IEEE.
Cleland-Huang, J., Czauderna, A., Gibiec, M., & Emenecker, J. (2010). A machine learning approach for tracing regulatory codes to product specific requirements. In
Proceedings of the 32nd ACM/IEEE International Conference on Software Engineering-Volume 1 (pp. 155-164). ACM.
Massey, A. K., Rutledge, R. L., Antón, A. I., & Swire, P. P. (2014). Identifying and classifying ambiguity for regulatory requirements. In Requirements Engineering
Conference (RE), 2014 IEEE 22nd International (pp. 83-92). IEEE.
Hosseini, M. B., Breaux, T. D., & Niu, J. (2018). Inferring Ontology Fragments from Semantic Role Typing of Lexical Variants. In International Working Conference on
Requirements Engineering: Foundation for Software Quality (pp. 39-56). Springer, Cham.
Regulatory Compliance
Natt och Dag, J., Gervasi, V., Brinkkemper, S., & Regnell, B. (2004). Speeding up requirements management in a product software company: Linking customer
wishes to product requirements through linguistic engineering. In Requirements Engineering Conference, 2004. Proceedings. 12th IEEE International (pp. 283-294).
IEEE.
Dumitru, H., Gibiec, M., Hariri, N., Cleland-Huang, J., Mobasher, B., Castro-Herrera, C., & Mirakhorli, M. (2011). On-demand feature recommendations derived from
mining public product descriptions. In Proceedings of the 33rd International Conference on Software Engineering (pp. 181-190). ACM.
Retrieval
Berry, D., Gacitua, R., Sawyer, P., & Tjong, S. F. (2012). The case for dumb requirements engineering tools. In International Working Conference on Requirements
Engineering: Foundation for Software Quality (pp. 211-217). Springer, Berlin, Heidelberg.
Berry, D. M. (2017). Evaluation of Tools for Hairy Requirements and Software Engineering Tasks. In 2017 IEEE 25th International Requirements Engineering
Conference Workshops (REW) (pp. 284-291). IEEE.
Berry, D. M., Cleland-Huang, J., Ferrari, A., Maalej, W., Mylopoulos, J., & Zowghi, D. (2017). Panel: Context-Dependent Evaluation of Tools for NL RE Tasks: Recall
vs. Precision, and Beyond. In Requirements Engineering Conference (RE), 2017 IEEE 25th International (pp. 570-573). IEEE.
Tool Evaluation
Venues
nlp4re'18
Questions

Russian Escort Service in Delhi 11k Hotel Foreigner Russian Call Girls in Delhikauryashika82
 
Presentation by Andreas Schleicher Tackling the School Absenteeism Crisis 30 ...
Presentation by Andreas Schleicher Tackling the School Absenteeism Crisis 30 ...Presentation by Andreas Schleicher Tackling the School Absenteeism Crisis 30 ...
Presentation by Andreas Schleicher Tackling the School Absenteeism Crisis 30 ...EduSkills OECD
 
Accessible design: Minimum effort, maximum impact
Accessible design: Minimum effort, maximum impactAccessible design: Minimum effort, maximum impact
Accessible design: Minimum effort, maximum impactdawncurless
 
Unit-IV- Pharma. Marketing Channels.pptx
Unit-IV- Pharma. Marketing Channels.pptxUnit-IV- Pharma. Marketing Channels.pptx
Unit-IV- Pharma. Marketing Channels.pptxVishalSingh1417
 
Explore beautiful and ugly buildings. Mathematics helps us create beautiful d...
Explore beautiful and ugly buildings. Mathematics helps us create beautiful d...Explore beautiful and ugly buildings. Mathematics helps us create beautiful d...
Explore beautiful and ugly buildings. Mathematics helps us create beautiful d...christianmathematics
 

Recently uploaded (20)

Making and Justifying Mathematical Decisions.pdf
Making and Justifying Mathematical Decisions.pdfMaking and Justifying Mathematical Decisions.pdf
Making and Justifying Mathematical Decisions.pdf
 
Mattingly "AI & Prompt Design: The Basics of Prompt Design"
Mattingly "AI & Prompt Design: The Basics of Prompt Design"Mattingly "AI & Prompt Design: The Basics of Prompt Design"
Mattingly "AI & Prompt Design: The Basics of Prompt Design"
 
The basics of sentences session 2pptx copy.pptx
The basics of sentences session 2pptx copy.pptxThe basics of sentences session 2pptx copy.pptx
The basics of sentences session 2pptx copy.pptx
 
Unit-IV; Professional Sales Representative (PSR).pptx
Unit-IV; Professional Sales Representative (PSR).pptxUnit-IV; Professional Sales Representative (PSR).pptx
Unit-IV; Professional Sales Representative (PSR).pptx
 
How to Give a Domain for a Field in Odoo 17
How to Give a Domain for a Field in Odoo 17How to Give a Domain for a Field in Odoo 17
How to Give a Domain for a Field in Odoo 17
 
psychiatric nursing HISTORY COLLECTION .docx
psychiatric  nursing HISTORY  COLLECTION  .docxpsychiatric  nursing HISTORY  COLLECTION  .docx
psychiatric nursing HISTORY COLLECTION .docx
 
Gardella_PRCampaignConclusion Pitch Letter
Gardella_PRCampaignConclusion Pitch LetterGardella_PRCampaignConclusion Pitch Letter
Gardella_PRCampaignConclusion Pitch Letter
 
Grant Readiness 101 TechSoup and Remy Consulting
Grant Readiness 101 TechSoup and Remy ConsultingGrant Readiness 101 TechSoup and Remy Consulting
Grant Readiness 101 TechSoup and Remy Consulting
 
INDIA QUIZ 2024 RLAC DELHI UNIVERSITY.pptx
INDIA QUIZ 2024 RLAC DELHI UNIVERSITY.pptxINDIA QUIZ 2024 RLAC DELHI UNIVERSITY.pptx
INDIA QUIZ 2024 RLAC DELHI UNIVERSITY.pptx
 
Web & Social Media Analytics Previous Year Question Paper.pdf
Web & Social Media Analytics Previous Year Question Paper.pdfWeb & Social Media Analytics Previous Year Question Paper.pdf
Web & Social Media Analytics Previous Year Question Paper.pdf
 
SOCIAL AND HISTORICAL CONTEXT - LFTVD.pptx
SOCIAL AND HISTORICAL CONTEXT - LFTVD.pptxSOCIAL AND HISTORICAL CONTEXT - LFTVD.pptx
SOCIAL AND HISTORICAL CONTEXT - LFTVD.pptx
 
SECOND SEMESTER TOPIC COVERAGE SY 2023-2024 Trends, Networks, and Critical Th...
SECOND SEMESTER TOPIC COVERAGE SY 2023-2024 Trends, Networks, and Critical Th...SECOND SEMESTER TOPIC COVERAGE SY 2023-2024 Trends, Networks, and Critical Th...
SECOND SEMESTER TOPIC COVERAGE SY 2023-2024 Trends, Networks, and Critical Th...
 
Advance Mobile Application Development class 07
Advance Mobile Application Development class 07Advance Mobile Application Development class 07
Advance Mobile Application Development class 07
 
Russian Escort Service in Delhi 11k Hotel Foreigner Russian Call Girls in Delhi
Russian Escort Service in Delhi 11k Hotel Foreigner Russian Call Girls in DelhiRussian Escort Service in Delhi 11k Hotel Foreigner Russian Call Girls in Delhi
Russian Escort Service in Delhi 11k Hotel Foreigner Russian Call Girls in Delhi
 
Presentation by Andreas Schleicher Tackling the School Absenteeism Crisis 30 ...
Presentation by Andreas Schleicher Tackling the School Absenteeism Crisis 30 ...Presentation by Andreas Schleicher Tackling the School Absenteeism Crisis 30 ...
Presentation by Andreas Schleicher Tackling the School Absenteeism Crisis 30 ...
 
Accessible design: Minimum effort, maximum impact
Accessible design: Minimum effort, maximum impactAccessible design: Minimum effort, maximum impact
Accessible design: Minimum effort, maximum impact
 
Unit-IV- Pharma. Marketing Channels.pptx
Unit-IV- Pharma. Marketing Channels.pptxUnit-IV- Pharma. Marketing Channels.pptx
Unit-IV- Pharma. Marketing Channels.pptx
 
Explore beautiful and ugly buildings. Mathematics helps us create beautiful d...
Explore beautiful and ugly buildings. Mathematics helps us create beautiful d...Explore beautiful and ugly buildings. Mathematics helps us create beautiful d...
Explore beautiful and ugly buildings. Mathematics helps us create beautiful d...
 
Código Creativo y Arte de Software | Unidad 1
Código Creativo y Arte de Software | Unidad 1Código Creativo y Arte de Software | Unidad 1
Código Creativo y Arte de Software | Unidad 1
 
Mattingly "AI & Prompt Design: Structured Data, Assistants, & RAG"
Mattingly "AI & Prompt Design: Structured Data, Assistants, & RAG"Mattingly "AI & Prompt Design: Structured Data, Assistants, & RAG"
Mattingly "AI & Prompt Design: Structured Data, Assistants, & RAG"
 

Natural Language Processing (NLP) for Requirements Engineering (RE): an Overview

  • 1. Natural Language Requirements Processing: from Research to Practice Alessio Ferrari CNR-ISTI, Pisa, Italy alessio.ferrari@isti.cnr.it http://alessiofer.wixsite.com/alessioferrari Twitter: @alessferra
  • 2. Objectives • Stimulate your curiosity • Show that some things can be really easy to do at home • Show that some things can easily become very complicated (don’t do that at home!) • For practitioners and researchers • Some parts are tutorial-like • No, this is not about deep learning
  • 3. NLP and Requirements Engineering
  • 4. Natural Language Processing (NLP) Technologies enabling extraction and manipulation of information from natural language (NL) - English, Italian, Swedish, etc. Language Technology (Dan Jurafsky) • Mostly solved: Spam detection (Let’s go to Agra! ✓ / Buy V1AGRA … ✗); Part-of-speech (POS) tagging (Colorless green ideas sleep furiously: ADJ ADJ NOUN VERB ADV); Named entity recognition (NER) (Einstein met with UN officials in Princeton: PERSON, ORG, LOC) • Making good progress: Sentiment analysis (Best roast chicken in San Francisco! / The waiter ignored us for 20 minutes.); Coreference resolution (Carter told Mubarak he shouldn’t run again.); Word sense disambiguation (WSD) (I need new batteries for my mouse.); Parsing (I can see Alcatraz from the window!); Machine translation (MT) (The 13th Shanghai International Film Festival…); Information extraction (IE) (You’re invited to our dinner party, Friday May 27 at 8:30: Party, May 27, add) • Still really hard: Question answering (QA) (Q. How effective is ibuprofen in reducing fever in patients with acute febrile illness?); Paraphrase (XYZ acquired ABC yesterday / ABC has been taken over by XYZ); Summarization (The Dow Jones is up, The S&P500 jumped, Housing prices rose: Economy is good); Dialog (Where is Citizen Kane playing in SF? Castro Theatre at 7:30. Do you want a ticket?) From the slides of D. Jurafsky and C. Manning, 2012
  • 5. Natural Language Processing (NLP) Technologies enabling extraction and manipulation of information from natural language (NL) - English, Italian, Swedish, etc. From my slides, 2018 (a couple of weeks ago) Part-of-Speech Tagging VERB, NOUN, ADJECTIVE Word-sense Disambiguation Machine Translation Information Extraction Dialogue Parsing Sentiment Analysis Coreference Resolution Spam Detection Question-Answering Mostly Solved Making Good Progress Paraphrase Named Entity Recognition PERSON, LOCATION Summarization
  • 6. What Happened? Large Amount of Data Computational Power (GPU) Deep Neural Networks Competitions (Shared Tasks)
  • 7. Rule-based vs Machine Learning NLP • Rule-based: if (good or fantastic) in sent then sent.sentiment = positive; else if (bad or terrible) in sent then sent.sentiment = negative; else sent.sentiment = neutral • Supervised Machine Learning: learns from labelled examples (we had good food = positive; terrible experience = negative; dirty place = negative) • Unsupervised Machine Learning: groups unlabelled sentences into clusters (we had good food, nice meal / terrible experience, dirty place) • I have to teach the computer how to “understand” the text
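The rule-based branch of this slide can be sketched in a few lines of Python. This is only an illustration of the slide's pseudocode: the function name `rule_based_sentiment` and the two word sets are assumptions made here, not part of the talk.

```python
# Keyword lists taken from the slide's pseudocode.
POSITIVE = {"good", "fantastic"}
NEGATIVE = {"bad", "terrible"}


def rule_based_sentiment(sentence: str) -> str:
    """Label a sentence by looking for hand-picked keywords (hypothetical helper)."""
    words = set(sentence.lower().split())
    if words & POSITIVE:
        return "positive"
    if words & NEGATIVE:
        return "negative"
    return "neutral"


for text in ["we had good food", "terrible experience", "dirty place"]:
    print(text, "=", rule_based_sentiment(text))
```

Note that "dirty place" comes out as neutral: the rules know nothing about "dirty". A supervised learner could pick this up from the labelled example on the slide, which is the contrast the slide draws.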
  • 8. NL Requirement • Jackson and Zave: Condition over phenomena of the environment that we want to make true by developing the system • Lamsweerde: Goal under the responsibility of a single agent of the software-to-be • ISO/IEC/IEEE 29148 Standard: Statement which translates or expresses a need and its associated constraints and conditions • Wikipedia: Singular documented physical or functional Need that a particular design, product or process aims to satisfy • No agreed INTENSIONAL definition • Some confusion on the types of requirements (e.g., user, system, software, business, functional, non-functional), the concept of specification, etc. • So, let us give some EXAMPLES, and give an EXTENSIONAL definition
  • 9. NL Requirement • User Story: As a user, I want to share pictures, so that my friends will see them • One Sentence - High: If track data at least to the location where the relevant MA ends are not available on-board, the MA shall be rejected • Unstructured: The voucher numbers are system generated and created with unique identification numbers with security protocols in-built. The created unique numbers are then printed out in the form of bar-codes, which will complement (or stuck on the voucher) the voucher. […] • One Sentence - Low: When MA_received = FALSE and T_speed > 0 and MA_time > 15, then T_brake = 1 • Structured - Use Case: Actor: Student. Success Scenario: 1. Student selects “List” 2. System displays available courses 3. Student selects one of the courses
  • 10. NL Requirement • User’s Feedback: It would be nice to have a way to search my previous messages by keyword • Bug Report: Application does not create a new item when clicking the SAVE button while creating a new item. Steps to reproduce: 1) Login into the application 2) Pressed button New Item 3) Filled the information for the new item 4) Clicked on Save button 5) Seen an error page “ADA121 Exception: value error”
  • 11. NL Requirement • In this talk, a NL requirement is generally a chunk of text in a requirements document • A requirements document contains information to be used for the development of a system • Except in some cases, we do not deal with users’ feedback or bug reports
  • 12. Why are NL Requirements so Special? • Let us compare a NL Requirements Corpus (PURE, ~80 documents) with a generic corpus (Brown) Token: the, user, sets, the, input, parameters Lexical Word: user, sets, input, parameters
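The token versus lexical word distinction on this slide can be made concrete with a small sketch. The function-word list below is a tiny illustrative assumption; a real corpus comparison such as PURE versus Brown would rely on a full stop-word list or a POS tagger.

```python
import re

# Tiny illustrative list of function words (an assumption, not from the talk).
FUNCTION_WORDS = {"the", "a", "an", "of", "to", "in", "and", "or", "is", "are"}


def tokens(text: str) -> list:
    """Lower-cased word tokens, keeping repetitions."""
    return re.findall(r"[a-z0-9_]+", text.lower())


def lexical_words(text: str) -> list:
    """Tokens that carry content, i.e. tokens that are not function words."""
    return [t for t in tokens(text) if t not in FUNCTION_WORDS]


sentence = "The user sets the input parameters"
print(tokens(sentence))
print(lexical_words(sentence))
```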
  • 14. • Requirements use a more restricted vocabulary (about half that of the generic texts in Brown) • Requirements have longer sentences • Requirements use computer science terminology that is common to different documents (system, data) • Requirements use domain-specific expressions (NPAC, TCS, etc.) • 62% of the lexical words used in PURE do not appear in Brown • This suggests that NLP tools trained on generic texts may need to be tailored for requirements
  • 15. NL Requirements Tasks DEFECT DETECTION CATEGORISATION TRACING EQUIVALENT REQUIREMENTS GLOSSARY EXTRACTION MODEL SYNTHESIS = RETRIEVAL Natural Language Requirements Document REGULATORY COMPLIANCE USERS' FEEDBACK ANALYSIS
  • 16. Categorisation Large requirements set user interface communication security usability availability braking speed control flight balance functional categories non-functional categories fine-grained topics Apportionment Retrieval
  • 19. Equivalent Requirements Large requirements set Equivalent Requirements Requirements Analyst
  • 21. Glossary Extraction • Requirements Document → Domain-specific Relevant Terms (train, automatic train protection, automatic train supervision, track circuit, balise) → Glossary • Also supports: Categorisation, Model Generation
  • 22. Model Synthesis Early Requirements / User Stories train track circuit High-level Model Detailed Requirements Problem Scoping Analysis Detailed Model (also Feature Model) Documentation Visual models provide a more comprehensive view on requirements
  • 24. Users’ Feedback Analysis • Large amount of Users’ Feedback: “It would be nice to have a way to search my previous messages by keyword”, “This app is amazing”, “When I press back, it crashes” • Requirement: It would be nice to have a way to search my previous messages by keyword • Opinion: This app is amazing • Bug: When I press back, it crashes • Refactoring Update
  • 25. Observations • Most RE problems could be solved top-down • I can enforce tracing when writing requirements • I can use constrained natural languages to improve quality • I can tag classes in advance • I can write a glossary in advance • Unfortunately, this does not happen; that is why we need NLP • We also need NLP to recover from errors when RE problems are addressed top-down by fallible humans
  • 26. Where are we Today? DEFECT DETECTION CATEGORISATION TRACING EQUIVALENT REQUIREMENTS MODEL SYNTHESIS = GLOSSARY EXTRACTION RETRIEVAL REGULATORY COMPLIANCE USERS' FEEDBACK ANALYSIS Mostly Solved Making Good Progress Still Very Hard
  • 27. Basic Support Sub-Tasks • There is no time to explore all possible tasks • However, there are basic sub-tasks that are useful for most of the tasks • Information extraction: extraction of relevant parts of the text • Similarity computation: estimating relatedness
  • 28. Information Extraction DEFECT DETECTION CATEGORISATION TRACING EQUIVALENT REQUIREMENTS GLOSSARY EXTRACTION MODEL SYNTHESIS = RETRIEVAL Natural Language Requirements Document REGULATORY COMPLIANCE USERS' FEEDBACK ANALYSIS
  • 29. Similarity Computation DEFECT DETECTION CATEGORISATION TRACING EQUIVALENT REQUIREMENTS GLOSSARY EXTRACTION MODEL SYNTHESIS = RETRIEVAL Natural Language Requirements Document REGULATORY COMPLIANCE USERS' FEEDBACK ANALYSIS
  • 31. Information Extraction with GATE • Information Retrieval (IR): pulls documents from large corpora • Information Extraction (IE): retrieves structured information from large corpora • IR returns documents containing the relevant information (normally fast) • IE returns precise and structured information (can be slow) • GATE (General Architecture for Text Engineering, https://gate.ac.uk) supports IE • Potential Usage • Entity, Events, Relation extraction • Annotate documents for machine learning • Ambiguity detection in requirements with rule-based approaches
  • 32. Definitions • Document = text + annotations + features • Corpus = collection of documents • Linguistic information in documents is encoded in the form of annotations (like coloured mark-ups) • Annotations have features with relative types and values • EXAMPLE • Annotation: Sentence Length • Feature 1: Length in Characters, Value 1 = 100 • Feature 2: Length in Tokens, Value 2 = 15
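The document/annotation/feature model above can be mirrored with two small data classes. This is a conceptual sketch only: the class and field names are assumptions chosen for illustration and do not reproduce GATE's actual Java API.

```python
from dataclasses import dataclass, field


@dataclass
class Annotation:
    """A typed span over the text, carrying named features (hypothetical model)."""
    type: str
    start: int
    end: int
    features: dict = field(default_factory=dict)


@dataclass
class Document:
    """Document = text + annotations (each annotation holds its own features)."""
    text: str
    annotations: list = field(default_factory=list)


def annotate_sentence_length(doc: Document) -> Annotation:
    """Add one 'SentenceLength' annotation covering the whole text."""
    ann = Annotation(
        type="SentenceLength",
        start=0,
        end=len(doc.text),
        features={
            "length_in_characters": len(doc.text),
            "length_in_tokens": len(doc.text.split()),  # whitespace tokens only
        },
    )
    doc.annotations.append(ann)
    return ann


doc = Document("The MA shall be rejected")
print(annotate_sentence_length(doc))
```

A corpus, in this picture, is simply a list of such `Document` objects.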
  • 34. Processing Resources (PR) • PRs are algorithms that make NLP easy • ANNIE English Tokeniser - identifies tokens (words, numbers, etc.) • ANNIE Sentence Splitter - identifies sentence boundaries • ANNIE Gazetteer - identifies specific tokens in a list • ANNIE POS Tagger - identifies the part-of-speech (POS), like noun, adjective, etc. • JAPE Transducers - user-defined annotations based on regular expressions over annotations • ANNIE collects all the algorithms and runs them in a PIPELINE
  • 36. ANNIE English Tokenizer Produces Token annotations
  • 37. ANNIE Sentence Splitter Produces Sentence annotations
  • 40. ANNIE POS Tagger Modifies Token annotations with a new feature: category = NN, JJ, VB, etc.
  • 41. JAPE Transducers • Gazetteer lists are designed for annotating simple, regular features • Even identifying simple patterns like e-mails is impossible with a Gazetteer • What is JAPE • JAPE provides pattern matching in GATE • Each JAPE rule consists of: • LHS which contains patterns to match • RHS which details the annotations (and optionally features) to be created
  • 42. • I want to find all the occurrences of the term “level” followed by a number (level 1, level 2, etc.) ERTMS level 2 shall be backward compatible with ERTMS level 1
  • 43. • Adding features and values • ERTMS level 2 shall be backward compatible with ERTMS level 1 • Annotation: Level; Feature: number; Value: 2 or 1 • Resulting annotations: Level {number = 2}, Level {number = 1}
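Outside GATE, the same "level followed by a number" pattern can be approximated with a plain regular expression. The sketch below is not JAPE and works on raw characters rather than on Token annotations; the dictionary layout used for the result is an assumption made for illustration.

```python
import re

# "level" followed by whitespace and a number, case-insensitive.
LEVEL = re.compile(r"\blevel\s+(\d+)\b", re.IGNORECASE)


def find_levels(text: str) -> list:
    """Return one Level-like annotation per match, with a 'number' feature."""
    return [
        {
            "type": "Level",
            "start": m.start(),
            "end": m.end(),
            "features": {"number": int(m.group(1))},
        }
        for m in LEVEL.finditer(text)
    ]


requirement = "ERTMS level 2 shall be backward compatible with ERTMS level 1"
for annotation in find_levels(requirement):
    print(annotation)
```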
  • 44. Defect Detection as an Information Extraction Problem
  • 46. Ambiguity in RE (from Berry, Kamsties and Krieger, 2003) • Property of an expression of being interpreted in multiple ways • Vagueness: the sentence admits borderline cases (e.g., Avoid long C functions) • Generality: the sentence/term needs to be specified more (e.g., The interface shall be coded in Java) • Lexical ambiguity: term has different unrelated vocabulary meanings (e.g., bank) • Syntactic ambiguity: sentence has more than one syntax tree (e.g., Structured approaches and tools) • Semantic ambiguity: sentence can be translated into more than one logic expression (e.g., All lights have a switch) • Pragmatic ambiguity: the meaning depends on the context – other sentences, domain knowledge, common-sense, viewpoint • Reference: Berry, D., Kamsties, E. and M. Krieger. From Contract Drafting to Software Specification: Linguistic Sources of Ambiguity. University of Waterloo. 2003
  • 47. Vagueness • Vagueness may occur when I have adjectives and (modal) adverbs (so I can use a POS Tagger, but it gives many false positive cases) • I can use lists of pre-defined vague terms, and include them in a Gazetteer (still, a lot of false positives) • This is the “dumb” approach, often recommended for defect detection: it is easy to discard false positives CTRL + F
  • 49. List of Vague Terms (from Tjong and Berry’s SREE) • adaptability, additionally, adequate, aggregate, also, ancillary, arbitrary, appropriate, as appropriate, available, as far as, at last, as few as possible, as little as possible, as many as possible, as much as possible, as required, as well as, bad, both, but, but also, but not limited to, capable of, capable to, capability of, capability, common, correctly, consistent, contemporary, convenient, credible, custom, customary, default, definable, easily, easy, effective, efficient, episodic, equitable, equitably, eventually, exist, exists, expeditiously, fast, fair, fairly, finally, frequently, full, general, generic, good, high-level, impartially, infrequently, insignificant, intermediate, interactive, in terms of, less, lightweight, logical, low-level, maximum, minimum, more, mutually-agreed, mutually-exclusive, mutually-inclusive, near, necessary, neutral, not only, only, on the fly, particular, physical, powerful, practical, prompt, provided, quickly, random, recent, regardless of, relevant, respective, robust, routine, sufficiently, sequential, significant, simple, specific, strong, there, there is, transient, transparent, timely, undefinable, understandable, unless, unnecessary, useful, various, varying • Examples matching the term “logical”: If the logical AND between the two input sensors is 1… / The system shall implement a logical sequence of steps for… • Reference: Tjong, S. F., & Berry, D. M. (2013). The design of SREE – a prototype potential ambiguity finder for requirements specifications and lessons learned. In International Working Conference on Requirements Engineering: Foundation for Software Quality (pp. 80-95). Springer Berlin Heidelberg.
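A gazetteer-style lookup over such a list can be sketched as follows. Only a handful of terms from the list are included, and the function name `find_vague_terms` is an assumption; the point is to show how blunt the lookup is, not to reproduce SREE.

```python
import re

# A small subset of the vague-term list above, for illustration only.
VAGUE_TERMS = ["adequate", "appropriate", "as far as", "as required",
               "easily", "fast", "logical", "relevant", "timely"]

# Longer terms first, so multi-word entries win over their sub-strings.
_PATTERN = re.compile(
    r"\b(" + "|".join(re.escape(t) for t in sorted(VAGUE_TERMS, key=len, reverse=True)) + r")\b",
    re.IGNORECASE,
)


def find_vague_terms(requirement: str) -> list:
    """Return every listed vague term found in the requirement, in order."""
    return [m.group(1).lower() for m in _PATTERN.finditer(requirement)]


print(find_vague_terms("If the logical AND between the two input sensors is 1"))
print(find_vague_terms("The system shall implement a logical sequence of steps"))
```

Both sentences are flagged for "logical", although in the first one the word names a precise Boolean operation. This is the kind of false positive that a human reviewer, or a discard rule, has to filter out.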
  • 50. Syntactic Ambiguity (Coordination) • Detect all sentences that include a potential coordination ambiguity, i.e., an ambiguity due to the combination of and/or • Example: The system shall produce the speed profile plot or the data-log and the legend. • Reading 1: The system shall produce [the speed profile plot or the data-log] and the legend. • Reading 2: The system shall produce the speed profile plot or [the data-log and the legend].
  • 51. Coordination Ambiguities - Jape Rule 1: Retrieve all occurrences of AND / OR

    Phase: MatchAndOr
    Input: Token
    Options: control = appelt
    //Operator ==~ returns only whole string matches
    //Operator (?i) tells that the matching is case insensitive
    //Operator | is the logical "or" operator
    Rule: checkAndOr
    Priority: 1
    (
      {Token.string ==~ "(?i)and"} |
      {Token.string ==~ "(?i)or"}
    ):coord
    -->
    :coord.AndOr = {}

    Example sentences: The system shall produce the speed profile plot or the data-log and the legend / The system shall produce a sound alarm and a visual alarm
  • 52. Coordination Ambiguities - Jape Rule 2: Annotate all the sequences of AND / OR in the same sentence

    Phase: MatchCoordinationSequences
    Input: Split AndOr
    Options: control = appelt
    //Note that, having Split among the input Annotations allows
    //us to identify sequences of And/Or in the same sentence
    //The '+' operator means "one or more occurrences"
    //The '*' operator means "zero or more occurrences"
    //The '?' operator means "zero or one occurrences"
    Rule: checkCoordinationSequences
    Priority: 1
    (
      {AndOr} ({AndOr})+
    ):coordSequence
    -->
    :coordSequence.AndOrSequence = {}

    Example sentences: The system shall produce a sound alarm and a visual alarm / The system shall produce the speed profile plot or the data-log and the legend
  • 53. Coordination Ambiguities - Jape Rule 3: Annotate all the sentences with sequences of AND / OR

    Phase: MatchCoordinationSentences
    Input: Sentence AndOrSequence
    Options: control = appelt
    //A "contains" B: searches for
    //annotations A that contain annotations B
    Rule: checkCoordinationSentences
    Priority: 1
    (
      {Sentence contains AndOrSequence}
    ):coordSentence
    -->
    :coordSentence.AndOrSentence = {}

    Example sentences: The system shall produce the speed profile plot or the data-log and the legend / The system shall produce a sound alarm and a visual alarm
    Commas matter! Compare: The system shall produce the plot or the data-log, and the legend
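The three JAPE phases above (find and/or, find sequences within a sentence, flag the sentence) can be approximated in plain Python. This is a rough sketch under simplifying assumptions: sentences are split on end punctuation with a regular expression rather than by the ANNIE Sentence Splitter, and the helper names are hypothetical.

```python
import re


def split_sentences(text: str) -> list:
    """Naive sentence splitter: break after '.', '!' or '?' followed by whitespace."""
    return [s.strip() for s in re.split(r"(?<=[.!?])\s+", text) if s.strip()]


def coordinators(sentence: str) -> list:
    """Phase 1: every occurrence of 'and' / 'or' as a whole word, case-insensitive."""
    return [w.lower() for w in re.findall(r"\b(and|or)\b", sentence, flags=re.IGNORECASE)]


def potentially_ambiguous(text: str) -> list:
    """Phases 2 and 3: keep sentences holding two or more coordinators."""
    return [s for s in split_sentences(text) if len(coordinators(s)) >= 2]


spec = ("The system shall produce the speed profile plot or the data-log and the legend. "
        "The system shall produce a sound alarm and a visual alarm.")
print(potentially_ambiguous(spec))
```

Only the first sentence is reported, since the second contains a single "and". Like the JAPE rules, the sketch ignores commas, so a sentence such as "the plot or the data-log, and the legend" would still be flagged even though the comma resolves the ambiguity for a human reader.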
  • 55. Experience: a Railway Company • Defect-detection patterns • Discard patterns • A lot of false positives • 1800 requirements • Adaptation to the language of the company • Essential to involve domain experts • The JAPE implementation of the patterns, together with the discard patterns, is available in a public repository: https://github.com/ISTI-FMT/QUARS_plus_plus

    Table 2: Pattern adopted for each defect class.
    Anaphoric ambiguity: P_ANA = (NP)(NP)+ (Split)[0,1] (Token.POS == PP | Token.POS =~ PR*)
    Coordination ambiguity: P_CO1 = ((Token)+ (Token.string == AND | OR)) [2]; P_CO2 = (Token.POS == JJ) (Token.POS == NN | NNS) (Token.string == AND | OR) (Token.POS == NN | NNS)
    Vague terms: P_VAG = (Token.string ∈ Vague)
    Modal adverbs: P_ADV = (Token.POS == RB | RBR), (Token.string =~ "[.]*ly$")
    Passive voice: P_PV = (AUXVERB)(NOT)?(Token.POS == RB | RBR)? (Token.POS == VBN)
    Excessive length: P_LEN = Sentence.len > 60
    Missing condition: P_MC = (IF)(Token, !Token.kind == punctuation)* (Token.kind == punctuation)(!(ELSE | OTHERWISE))
    Missing unit of measurement: P_MU1 = (NUMBER)((Token)[0, 1](NUMBER))?(!MEASUREMENT); P_MU2 = (NUMBER)((Token)[0, 1](NUMBER))?(!PERCENT)
    Missing reference: P_MR = (Token.string == "Ref")(Token.string == ".") (SpaceToken)?(NUMBER)
    Undefined term: P_UT = (Token.kind == word, Token.orth == mixedCaps)

    Table 4: Discard patterns.
    Anaphoric ambiguity: D_ANA = ((Token.POS == PP | Token.POS =~ PR*) within IT SHALL BE POSSIBLE)
    Vague terms: D_VAG1 = (P_VAG, Token.string ==~ "(?i)sound" | "(?i)light", Token.POS == NN | NNS); D_VAG2 = (P_VAG within IT SHALL BE POSSIBLE); D_VAG3 = (P_VAG within StopPhrasesVague)
    Modal adverbs: D_ADV1 = (Token.string ==~ "(?i)manually" | "(?i)automatically"); D_ADV2 = (P_ADV within INFORMATION PURPOSES ONLY)
    Undefined term: D_UT = (P_UT contains KnownAcronym)

    Reference: Ferrari, A., Gori, G., Rosadini, B., Trotta, I., Bacherini, S., Fantechi, A., & Gnesi, S. (2018). Detecting requirements defects with NLP patterns: an industrial experience in the railway domain. Empirical Software Engineering, 1-50.
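The interplay between a detection pattern and a discard pattern can be illustrated with the modal-adverb case from the tables. The sketch below is a loose approximation, not the published JAPE rules: it has no POS tagger, so it treats any word ending in "ly" as a candidate, and it assumes that the excessive-length threshold of 60 counts whitespace-separated tokens.

```python
import re

# Adverbs the discard pattern accepts as harmless in this domain.
DISCARD = {"manually", "automatically"}


def ly_candidates(requirement: str) -> list:
    """Detection step: every word ending in 'ly' (no POS check, so over-inclusive)."""
    return [w.lower() for w in re.findall(r"\b[A-Za-z]+ly\b", requirement)]


def modal_adverbs(requirement: str) -> list:
    """Detection followed by discard: drop candidates covered by the discard list."""
    return [w for w in ly_candidates(requirement) if w not in DISCARD]


def too_long(requirement: str, max_tokens: int = 60) -> bool:
    """Excessive length check; the unit (tokens) is an assumption."""
    return len(requirement.split()) > max_tokens


req = "The driver shall manually and quickly reset the system"
print(ly_candidates(req))
print(modal_adverbs(req))
```

Without the POS constraint, words such as "only" or "apply" would also be caught, which illustrates why the published patterns combine string tests with POS tags and why discard lists had to be adapted with domain experts.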
  • 57. Domain-specific Terms • Requirements typically include domain-specific terms, and sometimes project-specific ones (these may be easier to extract thanks to naming conventions, e.g., m_balise_group) • Domain-specific terms may be single or multi-word • Examples: train, automatic train protection, automatic train supervision, track circuit, balise; abdominal hysterectomy, abdomen, lymph nodes, continuous passive motion machine; administrative law judge, affirmative defense, just compensation, trial
  • 58. Term Extraction • We evaluate how much a word is independent from other words • If a word always occurs with different words, it is likely to be an independent term • Example: The automatic train supervision platform dispatches the vehicles, while the system for automatic train protection brakes the vehicle in case of danger.
  • 59. Term Extraction • If a word often occurs with the same words, it is likely to be part of a multi-word term • Example: The automatic train supervision platform dispatches the vehicles, while the system for automatic train protection brakes the vehicle in case of danger.
  • 60. C/NC-Value • Linguistic Analysis: POS Tagging • Filters: POS filter (Noun, Adj) and Stoplist (great, numerous, several, year…) → Candidate Strings • C-Value: computes termhood → RANKED Candidate Strings (automatic train protection, track…) • NC-Value: considers term-context words → RE-RANKED Candidate Strings
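The C-value step can be sketched as below, following the formula of Frantzi et al. (2000): for a candidate a with frequency f(a) and length |a| in words, C-value(a) = log2|a| · f(a) when a is not nested in a longer candidate, and log2|a| · (f(a) − average frequency of the longer candidates containing a) otherwise. The frequencies in the example are invented for illustration, candidates are represented as word tuples by assumption, and the NC-value re-ranking is not covered.

```python
import math


def is_nested(shorter: tuple, longer: tuple) -> bool:
    """True if 'shorter' appears as a contiguous word sequence inside 'longer'."""
    n = len(shorter)
    return len(longer) > n and any(
        longer[i:i + n] == shorter for i in range(len(longer) - n + 1)
    )


def c_value(freq: dict) -> dict:
    """Compute the C-value for each candidate (word tuple -> corpus frequency)."""
    scores = {}
    for a, f_a in freq.items():
        containers = [b for b in freq if is_nested(a, b)]
        if containers:
            # Discount occurrences explained by longer candidate terms.
            adjusted = f_a - sum(freq[b] for b in containers) / len(containers)
        else:
            adjusted = f_a
        scores[a] = math.log2(len(a)) * adjusted
    return scores


# Invented frequencies, for illustration only.
candidates = {
    ("automatic", "train", "protection"): 4,
    ("train", "protection"): 5,
    ("automatic", "train"): 6,
}
for term, score in sorted(c_value(candidates).items(), key=lambda kv: -kv[1]):
    print(" ".join(term), round(score, 2))
```

Because log2(1) = 0, single-word candidates always score zero under this formula, which reflects the method's focus on multi-word terms.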
  • 61. Contrastive Analysis • Extracted terms might be domain-generic or domain-specific • With contrastive analysis, terms are further ranked according to their domain-specificity
  • 62. Contrastive Analysis • A contrastive corpus is a set of domain-generic documents (e.g., newspapers) • Terms are extracted from the contrastive corpus • The terms found in the requirements are compared with the terms of the contrastive corpus • If a term is less frequent in the contrastive corpus, it is considered as a domain-specific term • If a term is more frequent in the contrastive corpus, it is considered as a domain-generic term • A rank is associated to each term according to its domain-specificity
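The comparison described above can be illustrated with a simple relative-frequency ratio. This ratio, the add-one smoothing, and the invented counts are all assumptions made for the sketch; the contrastive function used by Bonin et al. and Text2Knowledge is more elaborate and is not reproduced here.

```python
def domain_specificity(domain_freq: dict, contrast_freq: dict) -> list:
    """Rank terms by relative frequency in the requirements over the contrastive corpus."""
    d_total = sum(domain_freq.values())
    c_total = sum(contrast_freq.values())
    ranking = []
    for term, f in domain_freq.items():
        rel_domain = f / d_total
        # Add-one smoothing so unseen terms do not cause a division by zero.
        rel_contrast = (contrast_freq.get(term, 0) + 1) / (c_total + 1)
        ranking.append((term, rel_domain / rel_contrast))
    return sorted(ranking, key=lambda pair: pair[1], reverse=True)


# Invented term frequencies, for illustration only.
requirements_terms = {"track circuit": 8, "system": 10, "year": 2}
newspaper_terms = {"system": 50, "year": 200, "government": 250}

for term, score in domain_specificity(requirements_terms, newspaper_terms):
    print(term, round(score, 2))
```

Terms at the top of the ranking ("track circuit") behave as domain-specific; terms at the bottom ("year") behave as domain-generic.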
  • 63. Contrastive Analysis • Requirements → C-NC Value Ranking • Contrastive Corpus → C-NC Value Ranking • Both rankings → Contrastive Analysis → Domain-specific Terms and Domain-generic Terms
  • 64. Experience: Product Lines • Step 3: Commonality Candidates Identification: Automatic Train Protection, Automatic Train Supervision, Interlocking ... NetTrack Region ATP ... CCTV ... CCTV ... Smartlock Region ATP ... Airlink ... • A Global Feature Diagram (excerpt): ATP Onboard, CBTC, ATP, IXL, ATS, IXL Controllable, ATP Wayside, IXL Pure, ATP Simple, ATP IXL, ATS Router, ATS Simple, ATP Controller • References: Ferrari, A., Spagnolo, G. O., & Dell'Orletta, F. (2013). Mining commonalities and variabilities from natural language documents. In Proceedings of the 17th International Software Product Line Conference (pp. 116-120). ACM. Nasr, S. B., Bécan, G., Acher, M., Ferreira Filho, J. B., Sannier, N., Baudry, B., & Davril, J. M. (2017). Automated extraction of product comparison matrices from informal product descriptions. Journal of Systems and Software, 124, 82-103.
  • 65. • Term Extraction: TerMine, http://www.nactem.ac.uk/software/ termine/ • Contrastive Analysis: Text2Knowledge, http://www.italianlp.it/demo/ t2k-text-to-knowledge/ • Further Readings: • Frantzi, Katerina, Sophia Ananiadou, and Hideki Mima. "Automatic recognition of multi-word terms: the C-value/NC-value method." International journal on digital libraries 3.2: 115-130, 2000. • Bonin, Francesca, et al. "A contrastive approach to multi-word term extraction from domain corpora." Proceedings of the 7th International Conference on Language Resources and Evaluation. 2010.
  • 67. Many Reasons for Similarity • Equivalence: “The system shall support user authentication” / “Before accessing the system, the user shall be authenticated” • Refinement: “Authentication shall be performed by means of iris recognition” • Relatedness: “If the authentication through iris recognition fails, the system shall ask the user login information” • Inconsistency: “If the authentication through iris recognition fails, the system shall authenticate the user through fingerprint recognition”
  • 68. Lexical and Semantic Similarity • “The system shall support user authentication” and “The student shall enter login and password” are Semantically Similar but Lexically Different
  • 69. Vector Representation • Each sentence is represented as a vector of numbers • Each component of the vector corresponds to a term in the complete set of terms • The component is 1 if the term occurs in the requirement, 0 otherwise (other weighting schemes can be used, e.g., TF/IDF, to emphasise rare terms)
      Terms: student, system, support, user, authentication, enter, login, password
      The system shall support user authentication: 0 1 1 1 1 0 0 0
      The student shall enter login and password:   1 0 0 0 0 1 1 1
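A minimal sketch of this binary representation in plain Python follows. The stop-word list and the whitespace tokenizer are assumptions made to keep the example short; the vocabulary is built in order of first occurrence, so the component order differs from the one on the slide, which does not affect any similarity measure.

```python
STOPWORDS = {"the", "shall", "and"}  # assumption: minimal stop list for this example

def tokenize(req):
    """Lowercase, split on whitespace, and drop stop words."""
    return [w for w in req.lower().split() if w not in STOPWORDS]

def build_vocabulary(reqs):
    """Collect the complete set of terms, in order of first occurrence."""
    vocab = []
    for r in reqs:
        for w in tokenize(r):
            if w not in vocab:
                vocab.append(w)
    return vocab

def binary_vector(req, vocab):
    """1 if the term occurs in the requirement, 0 otherwise."""
    words = set(tokenize(req))
    return [1 if t in words else 0 for t in vocab]

req_1 = "The system shall support user authentication"
req_2 = "The student shall enter login and password"

vocab = build_vocabulary([req_1, req_2])
v1 = binary_vector(req_1, vocab)
v2 = binary_vector(req_2, vocab)
print(vocab)
print(v1)
print(v2)
```

The two vectors share no non-zero component, which is why purely lexical measures consider these two requirements unrelated.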
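  • 70. Similarity Metrics ($A$ and $B$ are the sets of terms of the two requirements, $\mathbf{a}$ and $\mathbf{b}$ their vectors) • Dice: $\frac{2\,|A \cap B|}{|A| + |B|}$ • Jaccard: $\frac{|A \cap B|}{|A| + |B| - |A \cap B|}$ • Cosine: $\frac{\mathbf{a} \cdot \mathbf{b}}{\|\mathbf{a}\| \, \|\mathbf{b}\|}$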
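As a small worked example, the three metrics can be computed in plain Python using their standard definitions. The term lists below are an assumption: they are what remains of two of the example requirements after removing stop words such as "the", "shall", "be" and "through".

```python
import math

def dice(a, b):
    """Dice coefficient on two collections of terms."""
    a, b = set(a), set(b)
    return 2 * len(a & b) / (len(a) + len(b))

def jaccard(a, b):
    """Jaccard index: shared terms over all distinct terms."""
    a, b = set(a), set(b)
    return len(a & b) / len(a | b)

def cosine(v1, v2):
    """Cosine of the angle between two numeric vectors."""
    dot = sum(x * y for x, y in zip(v1, v2))
    norm = math.sqrt(sum(x * x for x in v1)) * math.sqrt(sum(y * y for y in v2))
    return dot / norm if norm else 0.0

# "The system shall support user authentication"
terms_1 = ["system", "support", "user", "authentication"]
# "User authentication shall be performed through fingerprint"
terms_2 = ["user", "authentication", "performed", "fingerprint"]

# Binary vectors over the joint vocabulary, needed for the cosine
vocab = sorted(set(terms_1) | set(terms_2))
v1 = [1 if t in terms_1 else 0 for t in vocab]
v2 = [1 if t in terms_2 else 0 for t in vocab]

print("Dice:   ", dice(terms_1, terms_2))
print("Jaccard:", jaccard(terms_1, terms_2))
print("Cosine: ", cosine(v1, v2))
```

With two shared terms out of four per requirement, Dice and cosine both give 0.5, while Jaccard is lower (2 shared terms over 6 distinct terms) because it normalises by the union.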
  • 71. Angle between the two requirements vectors • cos > 0: “The system shall support user authentication” and “User authentication shall be performed through fingerprint”; the cosine is greater than zero when the angle is lower than 90 degrees • cos = 0: “The system shall support user authentication” and “Response time is 100 ms”; the cosine is zero when the vectors are orthogonal • Vectors have one component for each word in the vocabulary
  • 72. Similarity of the two example requirements: 0
      Terms: student, system, support, user, authentication, enter, login, password
      The system shall support user authentication: 0 1 1 1 1 0 0 0
      The student shall enter login and password:   1 0 0 0 0 1 1 1
  • 73. Word Embeddings • I want to enrich the semantic representation of words • Avoid the problem of lexical similarity = 0 • I want compact vector representations (avoid sparse vectors) • In 2013, Mikolov et al. introduced Skip-gram with negative sampling (SGNS), the most common word embedding algorithm • Implemented in the package word2vec • Word embeddings enhance similarity computation, and are useful for any task in which I want to represent the semantics of words
  • 74. Word Embeddings: Idea • For a human, the meaning of a term is given by the mental (experiential) context of that term • But a human has many senses to create meaning DOG
  • 75. Word Embeddings: Idea • To let a system associate meaning to a term, I can consider only the textual context • Distributional hypothesis (Harris, 1954): the meaning of a word is given by the company it keeps (in a set of documents) • Documents: “The dog is a man’s best friend …”, “… then I went to walk with the dog”, “my dog does not bite, but…”, “…the dog barked too much” • Meaning of dog: friend, bark, walk, bite
  • 76. Word Embeddings: Details • To produce the word embedding vectors, a fake task is performed based on the input text, and the word embeddings are produced as a by-product of the task • Source Text → Word Pairs (Training Samples), window size = 2 (context considered): {requirements, are} {requirements, conditions} {are, requirements} {are, conditions} {are, over} {conditions, are} {…} • A neural network (NN) is trained for the fake task, and the word embeddings are the hidden layer of the trained NN
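The generation of the training samples can be sketched as follows. The source sentence is an assumption, chosen only because it is consistent with the word pairs listed on the slide; the function name `skipgram_pairs` is likewise hypothetical.

```python
def skipgram_pairs(tokens, window=2):
    """Return (input word, context word) pairs for every position in the text.

    For each word, the context is made of up to `window` words on its left
    and up to `window` words on its right.
    """
    pairs = []
    for i, target in enumerate(tokens):
        lo = max(0, i - window)
        hi = min(len(tokens), i + window + 1)
        for j in range(lo, hi):
            if j != i:
                pairs.append((target, tokens[j]))
    return pairs

# Assumed source text (not shown in the transcript)
tokens = "requirements are conditions over phenomena".split()

pairs = skipgram_pairs(tokens, window=2)
for p in pairs:
    print(p)
```

The first pairs printed are (requirements, are), (requirements, conditions), (are, requirements), (are, conditions) and (are, over), matching the training samples listed above.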
  • 77. Word Embeddings: Details • Given a word pair such as {requirements, are}, requirements is the input, and the expected output is are • The weights in the hidden layer of the neural network are incrementally adjusted • At the end of the training, given a word, the output vector is a probability distribution • The word embeddings are the hidden layer of the NN • Input: one-hot vector for “requirements” (0 0 0 1 0 0 0 0), vector len = |V| (vocabulary size) • Hidden layer: Word Embedding (e1 e2 e3 … eL), vector len = L (chosen by the user) • Output: (p1 p2 p3 p4 p5 … pV), vector len = |V|, e.g., the probability that the context word is “are”, the probability that the context word is “conditions”
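The shape of this network can be illustrated with a forward pass in plain Python. This is a sketch under several assumptions: the weights are random and untrained, the vocabulary is a toy one, the embedding length L = 3 is arbitrary, and a full softmax is used for readability instead of the negative sampling used by SGNS.

```python
import math
import random

random.seed(0)

vocab = ["requirements", "are", "conditions", "over", "phenomena"]  # toy vocabulary
V = len(vocab)  # vocabulary size |V|
L = 3           # embedding length, chosen by the user

# Untrained weights: input-to-hidden (|V| x L) and hidden-to-output (L x |V|)
W_in = [[random.uniform(-0.5, 0.5) for _ in range(L)] for _ in range(V)]
W_out = [[random.uniform(-0.5, 0.5) for _ in range(V)] for _ in range(L)]

def one_hot(word):
    """Input vector of length |V| with a single 1 at the position of the word."""
    return [1 if w == word else 0 for w in vocab]

def softmax(scores):
    """Turn arbitrary scores into a probability distribution."""
    m = max(scores)
    exps = [math.exp(s - m) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]

def forward(word):
    x = one_hot(word)
    # Hidden layer: multiplying a one-hot vector by W_in selects one row of W_in
    h = [sum(x[i] * W_in[i][k] for i in range(V)) for k in range(L)]
    # Output layer: one score per vocabulary word, normalised with softmax
    scores = [sum(h[k] * W_out[k][j] for k in range(L)) for j in range(V)]
    return h, softmax(scores)

embedding, probs = forward("requirements")
print("embedding of 'requirements':", embedding)
print("P(context = 'are'):", probs[vocab.index("are")])
```

Training would adjust `W_in` and `W_out` so that the probabilities of the observed context words increase; after training, the row of `W_in` selected by a word is its embedding.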
  • 78. Word Embeddings: Details • The nice property of word embeddings is that vectors of related words are closer than vectors of unrelated words (e.g., requirement and constraint are close to each other, dog is far from both) • NOTE 1: the components of the vectors here do not mean anything • NOTE 2: I have a vector for each word and not for each sentence
  • 79. Semantic Similarity with Word Embeddings • Given a sentence, I can produce the word embedding for each word • Word embeddings are vectors, so I can combine the word embeddings with typical vector operations (e.g., average vector) to represent requirements • I can use the previous similarity measures (normally, cosine similarity) • More refined measures exist (Word Mover's Distance, Word Centroid Similarity, etc.)
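The averaging approach can be tried without downloading any model. In the sketch below the 3-dimensional embeddings are hypothetical values invented for illustration (real embeddings typically have hundreds of dimensions and come from a trained model), and words missing from the dictionary are simply skipped.

```python
import math

# Hypothetical toy embeddings, invented for this example
toy_embeddings = {
    "system":         [0.50, 0.30, 0.20],
    "user":           [0.80, 0.20, 0.10],
    "authentication": [0.70, 0.60, 0.00],
    "student":        [0.60, 0.10, 0.40],
    "login":          [0.75, 0.55, 0.05],
    "password":       [0.70, 0.50, 0.10],
    "response":       [0.00, 0.20, 0.90],
    "time":           [0.10, 0.00, 0.80],
}

def average_vector(tokens, embeddings):
    """Represent a requirement as the average of its known word vectors."""
    vectors = [embeddings[t] for t in tokens if t in embeddings]
    return [sum(component) / len(vectors) for component in zip(*vectors)]

def cosine(v1, v2):
    dot = sum(x * y for x, y in zip(v1, v2))
    return dot / (math.sqrt(sum(x * x for x in v1)) * math.sqrt(sum(y * y for y in v2)))

req_1 = ["system", "support", "user", "authentication"]
req_2 = ["student", "enter", "login", "password"]
req_3 = ["response", "time"]

sim_12 = cosine(average_vector(req_1, toy_embeddings), average_vector(req_2, toy_embeddings))
sim_13 = cosine(average_vector(req_1, toy_embeddings), average_vector(req_3, toy_embeddings))
print("authentication vs. login/password:", round(sim_12, 3))
print("authentication vs. response time: ", round(sim_13, 3))
```

Although the first two requirements share no word, their averaged vectors are close, whereas the unrelated requirement about response time gets a much lower score.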
  • 80. Semantic Similarity
      import gensim
      import nltk  # word_tokenize needs the NLTK "punkt" tokenizer data
      import numpy as np

      req_1 = "The system shall support user authentication"
      req_2 = "The student shall enter login and password"

      def average_vector(mdl, req):
          # Average the embeddings of the tokens that are in the model vocabulary
          tokens = nltk.tokenize.word_tokenize(req)
          return np.mean([mdl[t] for t in tokens if t in mdl], axis=0)

      def cosine_similarity(v1, v2):
          return np.dot(v1, v2) / (np.linalg.norm(v1) * np.linalg.norm(v2))

      if __name__ == '__main__':
          # Pre-trained vectors are loaded as KeyedVectors in recent gensim versions
          mdl = gensim.models.KeyedVectors.load_word2vec_format(
              './model/GoogleNews-vectors-negative300.bin', binary=True)
          v_req_1 = average_vector(mdl, req_1)
          v_req_2 = average_vector(mdl, req_2)
          print(cosine_similarity(v_req_1, v_req_2))
  • 81. Domain-specific Word Embeddings • Pre-trained word embeddings exist that are trained on large amounts of generic texts • In requirements, the meaning highly depends on the domain that I am considering • Generic texts are different from requirements, so word embeddings may not represent the actual meaning intended in the requirements
  • 83. Wikipedia Domain-specific Portals (each portal includes all the Wiki pages related to a certain domain) → Crawling → word2vec → Domain-specific word embeddings
  • 84. Domain-specific Requirements Similarity
      import os
      import nltk  # word_tokenize needs the NLTK "punkt" tokenizer data
      import numpy as np
      from gensim.models import Word2Vec

      MODEL_PATH = "./model"  # assumption: folder containing the domain-specific models

      req_1 = "The system shall support user authentication"
      req_2 = "The student shall enter login and password"

      def average_vector(mdl, req):
          # Average the embeddings of the tokens that are in the model vocabulary
          tokens = nltk.tokenize.word_tokenize(req)
          return np.mean([mdl.wv[t] for t in tokens if t in mdl.wv], axis=0)

      def cosine_similarity(v1, v2):
          return np.dot(v1, v2) / (np.linalg.norm(v1) * np.linalg.norm(v2))

      if __name__ == '__main__':
          mdl = Word2Vec.load(os.path.join(MODEL_PATH, "Computer_Science_D_2.bin"))
          v_req_1 = average_vector(mdl, req_1)
          v_req_2 = average_vector(mdl, req_2)
          print(cosine_similarity(v_req_1, v_req_2))

      0.53966196
    • I can use these domain-specific word embeddings to represent my domain-specific requirements
  • 85. Domain-specific Word Embeddings • Let’s look at the neighbouring word vectors in the different domains
      import os
      from gensim.models import Word2Vec

      # MODEL_PATH and MODEL_LIST (file names of the trained models) are defined elsewhere
      for MODEL_NAME in MODEL_LIST:
          mdl = Word2Vec.load(os.path.join(MODEL_PATH, MODEL_NAME))
          print(MODEL_NAME[:-8], " ", mdl.wv.most_similar("code"))

      Sports [(u'rule', 0.7007173895835876), (u'regulation'), (u'definition'), (u'guideline')…
      Computer_Science [(u'compiled', 0.6769073605537415), (u'bytecode'), (u'executable'), (u'assembly'…
      Medicine [(u'nomenclature', 0.7324844002723694), (u'listing'), (u'taxonomy'), (u'atc'),…

          # same loop, querying a different word
          print(MODEL_NAME[:-8], " ", mdl.wv.most_similar("database"))

      Computer_Science [(u'dbms', 0.7940356135368347), (u'rdbms'), (u'nosql'), (u'relational')…
      Literature [(u'internet', 0.9139115810394287), (u'web'), (u'streaming'), (u'librivox')…
      Mechanical_Engineering [(u'documentation', 0.8548465967178345), (u'online'), (u'internet')…
  • 86. Experience • Experimental Results: Crawled Documents
      Table: Number of pages for each domain.
      Domain                        Pages    Words      Vocabulary
      Computer Science (CS)         10,000   3,985,740  104,907
      Electronic Engineering (EEN)   8,568   4,576,917  100,272
      Mechanical Engineering (MEN)   7,267   4,459,961   95,466
      Medicine (MED)                10,000   5,470,284  150,617
      Literature (LIT)              10,000   5,558,470  242,386
      Sports (SPO)                  10,000   5,725,688  165,814
      Technical engineering domains have a more restricted vocabulary (A. Ferrari, et al. (ISTI-CNR), Domain-specific Ambiguities)
    • Compute the potential for ambiguity between different domains and Computer Science: Crawled Documents → Bring to the same vector space → Compare domains
    • Ferrari, A., Donati, B., & Gnesi, S. (2017). Detecting Domain-specific Ambiguities: an NLP Approach based on Wikipedia Crawling and Word Embeddings. In 2017 IEEE 25th International Requirements Engineering Conference Workshops (REW) (pp. 393-399). IEEE.
  • 87. [Radar chart] • Each line is associated with a domain • Each radius is a term • If a point is close to the center, the meaning is very different • “window” has a similar meaning in CS and EEN, and a different meaning in the other domains
  • 88. • GenSim (similarity, and working with embeddings): https://radimrehurek.com/gensim/ • Pre-trained word embeddings: • from word2vec: https://code.google.com/archive/p/word2vec/ • from GloVe: http://nlp.stanford.edu/projects/glove/ • from fastText (2018, also multi-lingual): https://fasttext.cc • Domain-specific word embeddings: https://github.com/alessioferrari/Domain-specific-ambiguity • Further Readings: • Mikolov, T., Sutskever, I., Chen, K., Corrado, G. S., & Dean, J. (2013). Distributed representations of words and phrases and their compositionality. In Advances in neural information processing systems (pp. 3111-3119). • Baroni, M., Dinu, G., & Kruszewski, G. (2014). Don't count, predict! A systematic comparison of context-counting vs. context-predicting semantic vectors. In Proceedings of the 52nd Annual Meeting of the Association for Computational Linguistics (Volume 1: Long Papers) (Vol. 1, pp. 238-247). • http://mccormickml.com/2016/04/19/word2vec-tutorial-the-skip-gram-model/
  • 89. Open Issues (from the NLP4RE Workshop)
  • 90. Data • Train supervised machine learning algorithms • Validate rule-based and unsupervised algorithms • Generalisation through different domains • Experiment replication • Requirements are confidential • Annotations require domain-knowledge • Case studies are often the best option!
  • 91. Better Data • Data quality impacts performance • Linguistic quality (no grammatical errors) • Annotation quality (expert annotators) • No Bias! (annotate in advance) • Need for tools that learn on-the-job • Very hard to involve enough experts • Very hard to make them work without showing them a tool • Bad requirements are realistic! • Some tasks are inherently hard to perform in advance
  • 92. Validation Metrics and Workflows • We normally use information retrieval measures for RE tools • RE tasks are often compositions of tasks (e.g., model generation) • Errors made by a tool can have different impacts, depending on the context • The context is given by the task, the process, the user • In general, it is safe to avoid false negatives… • Avoiding false negatives leads to false positives • Too many false positives means that the tool does not do its job • Try the tool in the field! • Competitions!
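The information retrieval measures mentioned above, and the trade-off between false negatives and false positives, can be made concrete with a short sketch. The requirement identifiers and the gold standard are hypothetical; F-beta is the standard weighted harmonic mean of precision and recall, where beta > 1 gives more weight to recall, i.e., to avoiding false negatives.

```python
def precision_recall(predicted, relevant):
    """Precision and recall of a set of tool outputs against a gold standard."""
    predicted, relevant = set(predicted), set(relevant)
    true_positives = len(predicted & relevant)
    precision = true_positives / len(predicted) if predicted else 0.0
    recall = true_positives / len(relevant) if relevant else 0.0
    return precision, recall

def f_beta(precision, recall, beta=1.0):
    """Weighted harmonic mean of precision and recall."""
    if precision == 0.0 and recall == 0.0:
        return 0.0
    b2 = beta * beta
    return (1 + b2) * precision * recall / (b2 * precision + recall)

# Hypothetical example: requirements flagged as defective by a tool,
# compared with the defects identified by domain experts
flagged = {"R1", "R2", "R3", "R4", "R5"}
defective = {"R1", "R2", "R7"}

p, r = precision_recall(flagged, defective)
f1 = f_beta(p, r, beta=1.0)
f2 = f_beta(p, r, beta=2.0)
print(f"precision={p:.2f} recall={r:.2f} F1={f1:.2f} F2={f2:.2f}")
```

In this example the tool misses one defect (R7, a false negative) and raises three false alarms; since recall is higher than precision, the recall-oriented F2 is higher than F1. Which weighting is appropriate depends on the task, the process and the user.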
  • 93. Domain-specificity • Different domains speak different languages • Domain-adaptation is key • Domain-specific resources (ontologies) are needed • Different terms but also different business rules • Need to automate ontology-building • Domain-specific resources require support from domain experts • Issues of tacit knowledge
  • 94. Language Issues • Most of the available resources are in English • Most of the NLP research is for English • Requirements are written in different languages • Machine translation can be effective only for certain tasks (e.g., similarity) • Don’t forget rule-based techniques!
  • 95. Human-in-the-loop • Clearly separate human and machine tasks • NLP tools do not replace humans • NLP tools empower domain experts • NLP tools cannot do everything • We need the support of domain experts to build NLP tools for RE • Process changes when a tool is used • People tend to rely on the tool • Tools can have a learning effect
  • 96. Players’ Cooperation • Players: RE Researchers, Vendors of Requirements Management Tools, NLP Researchers, Industry (Users) • Support for hard NLP tasks • RE awareness • Provide pluggable solutions* • Clarify NLP capabilities, principles and needs (i.e., expert support) • Support for scoping the discipline • *NLP technologies are GPL, RE tools are not!
  • 97. NLP Technologies and Resources • Extract information from text: General Architecture for Text Engineering (GATE): https://gate.ac.uk • Perform NLP fine-grained analyses: • Python Natural Language Toolkit (NLTK): https://www.nltk.org • TextBlob (high-level API to NLTK): https://textblob.readthedocs.io/ • GenSim (for similarity): https://radimrehurek.com/gensim/ • Stanford CoreNLP (Java): https://stanfordnlp.github.io/CoreNLP/ • SpaCy (designed for speed): https://spacy.io • Dive into machine learning and deep learning: • WEKA (user-friendly, several algorithms): https://www.cs.waikato.ac.nz/ml/weka/ • TensorFlow: https://www.tensorflow.org • Keras (high-level API to TensorFlow): https://keras.io
  • 98. Selected Publications (with trends based on my opinion)
    • Tracing:
      Sultanov, H., & Hayes, J. H. (2013, July). Application of reinforcement learning to requirements engineering: requirements tracing. In Requirements Engineering Conference (RE), 2013 21st IEEE International (pp. 52-61). IEEE.
      Gervasi, V., & Zowghi, D. (2014). Supporting traceability through affinity mining. In Requirements Engineering Conference (RE), 2014 IEEE 22nd International (pp. 143-152). IEEE.
      Borg, M., Runeson, P., & Ardö, A. (2014). Recovering from a decade: a systematic mapping of information retrieval approaches to software traceability. Empirical Software Engineering, 19(6), 1565-1616.
      Mahmoud, A., & Niu, N. (2015). On the role of semantics in automated requirements tracing. Requirements Engineering, 20(3), 281-300.
      Guo, J., Cheng, J., & Cleland-Huang, J. (2017). Semantically enhanced software traceability using deep learning techniques. In Proceedings of the 39th International Conference on Software Engineering (pp. 3-14). IEEE Press.
      Hübner, P., & Paech, B. (2018). Evaluation of Techniques to Detect Wrong Interaction Based Trace Links. In International Working Conference on Requirements Engineering: Foundation for Software Quality (pp. 75-91). Springer, Cham.
    • General Introduction:
      Ferrari, A., Dell’Orletta, F., Esuli, A., Gervasi, V., & Gnesi, S. (2017). Natural Language Requirements Processing: A 4D Vision. IEEE Software, 34(6), 28-35.
    • Categorisation:
      Casamayor, A., Godoy, D., & Campo, M. (2010). Identification of non-functional requirements in textual specifications: A semi-supervised learning approach. Information and Software Technology, 52(4), 436-445.
      Casamayor, A., Godoy, D., & Campo, M. (2012). Functional grouping of natural language requirements for assistance in architectural software design. Knowledge-Based Systems, 30, 78-86.
      Knauss, E., & Ott, D. (2014). (Semi-) automatic Categorization of Natural Language Requirements. In International Working Conference on Requirements Engineering: Foundation for Software Quality (pp. 39-54). Springer International Publishing.
      Kurtanović, Z., & Maalej, W. (2017). Automatically Classifying Functional and Non-functional Requirements Using Supervised Machine Learning. In Requirements Engineering Conference (RE), 2017 IEEE 25th International (pp. 490-495). IEEE.
  • 99. Selected Publications (continued)
    • Defect Detection:
      Tjong, S. F., & Berry, D. M. (2013). The design of SREE – a prototype potential ambiguity finder for requirements specifications and lessons learned. In International Working Conference on Requirements Engineering: Foundation for Software Quality (pp. 80-95). Springer Berlin Heidelberg.
      Arora, C., Sabetzadeh, M., Briand, L., & Zimmer, F. (2015). Automated checking of conformance to requirements templates using natural language processing. IEEE Transactions on Software Engineering, 41(10), 944-968.
      Femmer, H., Fernández, D. M., Wagner, S., & Eder, S. (2017). Rapid quality assurance with requirements smells. Journal of Systems and Software, 123, 190-213.
      Ferrari, A., Gori, G., Rosadini, B., Trotta, I., Bacherini, S., Fantechi, A., & Gnesi, S. (2018). Detecting requirements defects with NLP patterns: an industrial experience in the railway domain. Empirical Software Engineering, 1-50.
    • Equivalent Requirements:
      Falessi, D., Cantone, G., & Canfora, G. (2013). Empirical principles and an industrial case study in retrieving equivalent requirements via natural language processing techniques. IEEE Transactions on Software Engineering, 39(1), 18-44.
    • Glossary Extraction:
      Goldin, L., & Berry, D. M. (1997). AbstFinder, a prototype natural language text abstraction finder for use in requirements elicitation. ASE, 4(4), 375-412.
      Gacitua, R., Sawyer, P., & Gervasi, V. (2011). Relevance-based abstraction identification: technique and evaluation. Requirements Engineering, 16(3), 251.
      Bakar, N. H., Kasirun, Z. M., & Salleh, N. (2015). Feature extraction approaches from natural language requirements for reuse in software product lines: A systematic literature review. Journal of Systems and Software, 106, 132-149.
      Quirchmayr, T., Paech, B., Kohl, R., & Karey, H. (2017). Semi-automatic software feature-relevant information extraction from natural language user manuals. In International Working Conference on Requirements Engineering: Foundation for Software Quality (pp. 255-272). Springer, Cham.
    • Model Synthesis:
      Yue, T., Briand, L. C., & Labiche, Y. (2011). A systematic review of transformation approaches between user requirements and analysis models. Requirements Engineering, 16(2), 75-99.
      Yue, T., Briand, L. C., & Labiche, Y. (2015). aToucan: an automated framework to derive UML analysis models from use case models. ACM Transactions on Software Engineering and Methodology (TOSEM), 24(3), 13.
      Lucassen, G., Robeer, M., Dalpiaz, F., van der Werf, J. M. E., & Brinkkemper, S. (2017). Extracting conceptual models from user stories with Visual Narrator. Requirements Engineering, 22(3), 339-358.
  • 100. Selected Publications (continued)
    • Users’ Feedback Analysis:
      Chen, N., Lin, J., Hoi, S. C., Xiao, X., & Zhang, B. (2014). AR-miner: mining informative reviews for developers from mobile app marketplace. In Proceedings of the 36th International Conference on Software Engineering (pp. 767-778). ACM.
      Maalej, W., & Nabil, H. (2015, August). Bug report, feature request, or simply praise? on automatically classifying app reviews. In Requirements Engineering Conference (RE), 2015 IEEE 23rd International (pp. 116-125). IEEE.
      Guzman, E., Alkadhi, R., & Seyff, N. (2016). A needle in a haystack: What do twitter users say about software?. In Requirements Engineering Conference (RE), 2016 IEEE 24th International (pp. 96-105). IEEE.
      Maalej, W., Nayebi, M., Johann, T., & Ruhe, G. (2016). Toward data-driven requirements engineering. IEEE Software, 33(1), 48-54.
      Martin, W., Sarro, F., Jia, Y., Zhang, Y., & Harman, M. (2017). A survey of app store analysis for software engineering. IEEE Transactions on Software Engineering, 43(9), 817-847.
      Groen, E. C., Seyff, N., Ali, R., Dalpiaz, F., Doerr, J., Guzman, E., ... & Stade, M. (2017). The crowd in requirements engineering: The landscape and challenges. IEEE Software, 34(2), 44-52.
    • Regulatory Compliance:
      Breaux, T. D., Vail, M. W., & Anton, A. I. (2006). Towards regulatory compliance: Extracting rights and obligations to align requirements with regulations. In Requirements Engineering, 14th IEEE International Conference (pp. 49-58). IEEE.
      Cleland-Huang, J., Czauderna, A., Gibiec, M., & Emenecker, J. (2010). A machine learning approach for tracing regulatory codes to product specific requirements. In Proceedings of the 32nd ACM/IEEE International Conference on Software Engineering-Volume 1 (pp. 155-164). ACM.
      Massey, A. K., Rutledge, R. L., Antón, A. I., & Swire, P. P. (2014). Identifying and classifying ambiguity for regulatory requirements. In Requirements Engineering Conference (RE), 2014 IEEE 22nd International (pp. 83-92). IEEE.
      Hosseini, M. B., Breaux, T. D., & Niu, J. (2018). Inferring Ontology Fragments from Semantic Role Typing of Lexical Variants. In International Working Conference on Requirements Engineering: Foundation for Software Quality (pp. 39-56). Springer, Cham.
  • 101. Selected Publications (continued)
    • Retrieval:
      Natt och Dag, J., Gervasi, V., Brinkkemper, S., & Regnell, B. (2004). Speeding up requirements management in a product software company: Linking customer wishes to product requirements through linguistic engineering. In Requirements Engineering Conference, 2004. Proceedings. 12th IEEE International (pp. 283-294). IEEE.
      Dumitru, H., Gibiec, M., Hariri, N., Cleland-Huang, J., Mobasher, B., Castro-Herrera, C., & Mirakhorli, M. (2011). On-demand feature recommendations derived from mining public product descriptions. In Proceedings of the 33rd International Conference on Software Engineering (pp. 181-190). ACM.
    • Tool Evaluation:
      Berry, D., Gacitua, R., Sawyer, P., & Tjong, S. F. (2012). The case for dumb requirements engineering tools. In International Working Conference on Requirements Engineering: Foundation for Software Quality (pp. 211-217). Springer, Berlin, Heidelberg.
      Berry, D. M. (2017). Evaluation of Tools for Hairy Requirements and Software Engineering Tasks. In 2017 IEEE 25th International Requirements Engineering Conference Workshops (REW) (pp. 284-291). IEEE.
      Berry, D. M., Cleland-Huang, J., Ferrari, A., Maalej, W., Mylopoulos, J., & Zowghi, D. (2017). Panel: Context-Dependent Evaluation of Tools for NL RE Tasks: Recall vs. Precision, and Beyond. In Requirements Engineering Conference (RE), 2017 IEEE 25th International (pp. 570-573). IEEE.