Natural Language Processing (NLP) for Requirements Engineering (RE): an Overview

Natural Language
Requirements Processing:
from Research to Practice
Alessio Ferrari

CNR-ISTI, Pisa, Italy

alessio.ferrari@isti.cnr.it

http://alessiofer.wixsite.com/alessioferrari

Twitter: @alessferra

Objectives
• Stimulate your curiosity
• Show that some things can be really easy to do at home

• Show that some things can easily become very complicated  
(don’t do that at home!)

• For practitioners and researchers

• Some parts are tutorial-like

• No, this is not about deep learning

NLP and Requirements Engineering

Natural Language
Processing (NLP)
Technologies enabling extraction and manipulation of information
from natural language (NL) - English, Italian, Swedish, etc.
Language Technology
Mostly solved:
• Spam detection: "Let's go to Agra!" ✓ / "Buy V1AGRA …" ✗
• Part-of-speech (POS) tagging: Colorless (ADJ) green (ADJ) ideas (NOUN) sleep (VERB) furiously (ADV).
• Named entity recognition (NER): Einstein (PERSON) met with UN (ORG) officials in Princeton (LOC)
Making good progress:
• Sentiment analysis: "Best roast chicken in San Francisco!" / "The waiter ignored us for 20 minutes."
• Coreference resolution: "Carter told Mubarak he shouldn't run again."
• Word sense disambiguation (WSD): "I need new batteries for my mouse."
• Parsing: "I can see Alcatraz from the window!"
• Machine translation (MT): "The 13th Shanghai International Film Festival…"
• Information extraction (IE): "You're invited to our dinner party, Friday May 27 at 8:30" → Party, May 27, add
Still really hard:
• Question answering (QA): "Q. How effective is ibuprofen in reducing fever in patients with acute febrile illness?"
• Paraphrase: "XYZ acquired ABC yesterday" / "ABC has been taken over by XYZ"
• Summarization: "The Dow Jones is up", "The S&P500 jumped", "Housing prices rose" → "Economy is good"
• Dialog: "Where is Citizen Kane playing in SF?" / "Castro Theatre at 7:30. Do you want a ticket?"
From the slides of
D. Jurafsky and C. Manning, 2012

Natural Language
Processing (NLP)
Technologies enabling extraction and manipulation of information
from natural language (NL) - English, Italian, Swedish, etc.
From my slides, 2018
(a couple of weeks ago)
Part-of-Speech Tagging
VERB, NOUN, ADJECTIVE
Word-sense Disambiguation
Machine Translation
Information Extraction
Dialogue
Parsing
Sentiment Analysis
Coreference Resolution
Spam Detection
Question-Answering
Mostly Solved Making Good Progress
Paraphrase
Named Entity Recognition
PERSON, LOCATION
Summarization

What Happened?
• Large Amount of Data
• Computational Power (GPU)
• Deep Neural Networks
• Competitions (Shared Tasks)

Rule-based vs
Machine Learning NLP
Rule-based
if (good or fantastic) in sent then sent.sentiment = positive
else if (bad or terrible) in sent then sent.sentiment = negative
else sent.sentiment = neutral
Supervised Machine Learning (labelled examples)
we had good food = positive
terrible experience = negative
dirty place = negative
Unsupervised Machine Learning (unlabelled examples, grouped by the algorithm)
Input: we had good food; nice meal; terrible experience; dirty place
Group 1: terrible experience; dirty place
Group 2: we had good food; nice meal
I have to teach the computer
how to “understand” the text
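
The contrast between the rule-based and the supervised approach can be sketched in a few lines of Python. This is only a toy illustration: the keyword sets come from the slide, while the word-count "learner" (`train`, `predict`) is a hypothetical stand-in for a real classifier.

```python
from collections import Counter

POSITIVE = {"good", "fantastic"}
NEGATIVE = {"bad", "terrible"}

def rule_based_sentiment(sentence):
    """Rule-based: hand-written keyword rules, as in the slide."""
    words = set(sentence.lower().split())
    if words & POSITIVE:
        return "positive"
    if words & NEGATIVE:
        return "negative"
    return "neutral"

def train(labelled):
    """Supervised: count how often each word occurs under each label."""
    counts = {}
    for sentence, label in labelled:
        counts.setdefault(label, Counter()).update(sentence.lower().split())
    return counts

def predict(counts, sentence):
    """Pick the label whose training words overlap most with the sentence."""
    words = sentence.lower().split()
    scores = {label: sum(c[w] for w in words) for label, c in counts.items()}
    best = max(scores, key=scores.get)
    return best if scores[best] > 0 else "neutral"

training_set = [
    ("we had good food", "positive"),
    ("terrible experience", "negative"),
    ("dirty place", "negative"),
]
model = train(training_set)

print(rule_based_sentiment("dirty place"))  # the rules know nothing about "dirty"
print(predict(model, "a dirty room"))       # the learned model has seen "dirty"
```

The rules only cover the words I thought of in advance, while the learned model covers whatever appeared in the labelled examples.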

NL Requirement
• Jackson and Zave: Condition over phenomena of the environment that we want to make
true by developing the system
• Lamsweerde: Goal under the responsibility of a single agent of the software-to-be
• ISO/IEC/IEEE 29148 Standard: Statement which translates or expresses a need and its
associated constraints and conditions
• Wikipedia: Singular documented physical or functional Need that a particular design,
product or process aims to satisfy

• No agreed INTENSIONAL definition

• Some confusion on the types of requirements (e.g., user, system, software, business,
functional, non-functional), the concept of specification, etc.

• So, let us give some EXAMPLES, and give an EXTENSIONAL definition

NL Requirement
User Story
As a user, I want to share pictures, so that my friends will see them

One Sentence - High
If track data at least to the location where the relevant MA ends are
not available on-board, the MA shall be rejected

Unstructured
The voucher numbers are system generated and created with unique
identification numbers with security protocols in-built. The created unique
numbers are then printed out in the form of bar-codes, which will complement (or
stuck on the voucher) the voucher. […]

One Sentence - Low
When MA_received = FALSE and T_speed > 0 and MA_time > 15, then T_brake = 1

Structured - Use Case
Actor: Student
Success Scenario:
1. Student selects “List”
2. System displays available courses
3. Student selects one of the courses

NL Requirement
User’s Feedback
It would be nice to have a way to search
my previous messages by keyword

Bug Report
Application does not create a new item when clicking the
SAVE button while creating a new item. Steps to
reproduce:
1) Login into the application
2) Pressed button New Item
3) Filled the information for the new item
4) Clicked on Save button
5) Seen an error page “ADA121 Exception: value error”

NL Requirement
• In this talk, a NL requirement is generally a chunk of text
in a requirements document

• A requirements document contains information to be
used for the development of a system

• Except in some cases, we do not deal with users’
feedback or bug reports

Why are NL Requirements
so Special?
• Let us compare a NL Requirements Corpus (PURE, ~80
documents) with a generic corpus (Brown)
Token: the, user, sets, the, input, parameters
Lexical Word: user, sets, input, parameters

• Requirements use a more restricted vocabulary (about half the size of the
vocabulary of the generic texts in Brown)

• Requirements have longer sentences

• Requirements use computer science terminology that is
common to different documents (system, data)

• Requirements use domain-specific expressions (NPAC, TCS, etc.)

• 62% of the lexical words used in PURE do not
appear in Brown
This suggests that NLP tools trained on
generic texts may need to be tailored
for requirements
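
A comparison of this kind can be sketched as follows. The two mini-corpora and the stoplist are made up for illustration (they are not PURE or Brown), and `share_missing_from` is a hypothetical helper name.

```python
import re

# Tiny illustrative stoplist; a real analysis would use a full function-word list.
STOPWORDS = {"the", "a", "an", "of", "to", "in", "and", "is", "on", "shall"}

def tokens(text):
    """All word-like tokens, lower-cased."""
    return re.findall(r"[a-z0-9_]+", text.lower())

def lexical_words(text):
    """Tokens that are not function words."""
    return [t for t in tokens(text) if t not in STOPWORDS]

def share_missing_from(reference_text, target_text):
    """Fraction of the target's lexical vocabulary absent from the reference."""
    target_vocab = set(lexical_words(target_text))
    reference_vocab = set(lexical_words(reference_text))
    return len(target_vocab - reference_vocab) / len(target_vocab)

# Made-up stand-ins for a requirements corpus and a generic corpus.
requirements = "The user sets the input parameters. The TCS shall log the input data."
generic = "The user of the library sets a book on the table and reads the data."

print(tokens("The user sets the input parameters"))
print(lexical_words("The user sets the input parameters"))
print(round(share_missing_from(generic, requirements), 2))
```

On real corpora the same ratio gives the share of requirements vocabulary that a tool trained on generic text has never seen.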

NL Requirements Tasks
Tasks on a Natural Language Requirements Document:
• DEFECT DETECTION
• CATEGORISATION
• TRACING
• EQUIVALENT REQUIREMENTS
• GLOSSARY EXTRACTION
• MODEL SYNTHESIS
• RETRIEVAL
• REGULATORY COMPLIANCE
• USERS' FEEDBACK ANALYSIS

Categorisation
Large requirements set
user interface
communication
security
usability
availability
braking
speed control
flight balance
functional
categories
non-functional
categories
fine-grained topics
Apportionment
Retrieval

Retrieval
Procurement
Documents
Existing
Requirements
Bid
Tailored
Product
Customer’s
Requirements

Tracing
High-level
Requirements
Low-level
Requirements
Design
Code
Architecture
Process Artefacts
Refactoring
and Impact Analysis
External
Assessment

Equivalent Requirements
Large requirements set
Equivalent
Requirements
Requirements
Analyst

Defect Detection
ambiguous
vague
weak verb
passive form
Requirements
Analyst

Glossary Extraction
train
automatic train protection
automatic train supervision
track circuit
balise
Domain-specific
Relevant Terms
Requirements
Document
Glossary
Categorisation
Model Generation

Model Synthesis
Early Requirements /
User Stories
train
track circuit
High-level
Model
Detailed Requirements
Problem
Scoping
Analysis
Detailed Model
(also Feature Model)
Documentation
Visual models provide
a more comprehensive view
on requirements

Regulatory Compliance
Regulations
Requirements
Rights
Obligations
Actors
Actions
Resources
Constraints
Privacy Policy Regulations
Sensitive Data
Ambiguity

Users’ Feedback Analysis
Large amount of
User’s Feedback
This app is amazing
When I press back, it crashes
Requirement
This app is amazing
Opinion
When I press back, it crashes
Bug
Refactoring
Update

Observations
• Most RE problems could be solved top-down
• I can enforce tracing when writing requirements

• I can use constrained natural languages to improve quality

• I can tag classes in advance

• I can write a glossary in advance

• Unfortunately, this does not happen; that is why we need NLP

• We need NLP also to recover from errors when RE problems are
addressed top-down by fallible humans

Where are we Today?
Tasks placed on the scale:
• DEFECT DETECTION
• CATEGORISATION
• TRACING
• MODEL SYNTHESIS
• GLOSSARY EXTRACTION
• RETRIEVAL
• USERS' FEEDBACK ANALYSIS
Scale: Mostly Solved, Making Good Progress, Still Very Hard

Basic Support Sub-Tasks
• There is no time to explore all possible tasks

• However, there are basic sub-tasks that are useful for
most of the tasks

• Information extraction: extraction of relevant parts  
of the text

• Similarity computation: estimating relatedness


Similarity Computation
[Figure: the NL requirements tasks diagram, repeated]

Information Extraction with GATE
• Information Retrieval (IR): pulls documents from large corpora

• Information Extraction (IE): retrieves structured information from large corpora

• IR returns documents containing the relevant information (normally fast)

• IE returns precise and structured information (can be slow)

• GATE (General Architecture for Text Engineering, https://gate.ac.uk) supports IE

• Potential Usage
• Entity, Events, Relation extraction

• Annotate documents for machine learning

• Ambiguity detection in requirements with rule-based approaches

Definitions
• Document = text + annotations + features

• Corpus = collection of documents

• Linguistic information in documents is encoded in the form of
annotations (like coloured mark-ups)

• Annotations have features, each with its own type and value

• EXAMPLE
• Annotation: Sentence Length

• Feature 1: Length in Characters, Value 1 = 100
• Feature 2: Length in Tokens, Value 2 = 15
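
These definitions map naturally onto a small data model. The sketch below is plain Python, not the GATE API: the class and field names are my own, and token counting by whitespace is a simplification.

```python
from dataclasses import dataclass, field

@dataclass
class Annotation:
    type: str      # e.g. "Sentence", "Token"
    start: int     # character offset where the annotation begins
    end: int       # character offset where it ends (exclusive)
    features: dict = field(default_factory=dict)

@dataclass
class Document:
    text: str
    annotations: list = field(default_factory=list)
    features: dict = field(default_factory=dict)

text = "The system shall stop the train."
doc = Document(text=text, features={"source": "example"})
doc.annotations.append(
    Annotation(
        "Sentence", 0, len(text),
        {"length_in_characters": len(text),
         "length_in_tokens": len(text.split())},  # whitespace tokens only
    )
)
corpus = [doc]  # a corpus is simply a collection of documents
print(doc.annotations[0].features)
```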

Processing Resources (PR)
• PRs are ready-made algorithms that make NLP easy

• ANNIE English Tokeniser - identifies tokens (words, numbers, etc.)

• ANNIE Sentence Splitter - identifies sentence boundaries

• ANNIE Gazetteer - identifies specific tokens that appear in a list

• ANNIE POS Tagger - identifies the part of speech (POS) of each token, like noun,
adjective, etc.

• JAPE Transducers - create user-defined annotations based on regular
expressions over annotations

• ANNIE collects all these algorithms and runs them in a PIPELINE

Pipeline
Text → Tokenizer → Sentence Splitter → Gazetteer → POS Tagger → JAPE Transducer → Document
ANNIE English Tokenizer
Produces Token annotations

ANNIE Sentence Splitter
Produces Sentence annotations

Gazetteer Produces Lookup annotations

ANNIE POS Tagger
Modifies Token annotations with a new feature:
category = NN, JJ, VB, etc.
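
The idea of a pipeline, where each stage adds annotations that later stages can use, can be imitated in a few lines. This is a toy analogue and not GATE code: the regular expressions, the gazetteer entries and the mini POS lexicon are all assumptions made for the example.

```python
import re

GAZETTEER_ENTRIES = {"train", "balise"}  # hypothetical list
POS_LEXICON = {"the": "DT", "train": "NN", "balise": "NN",
               "shall": "MD", "stop": "VB"}  # hypothetical mini lexicon

def tokenizer(doc):
    """Adds Token annotations (words and punctuation marks)."""
    doc["Token"] = [{"start": m.start(), "end": m.end(), "string": m.group()}
                    for m in re.finditer(r"\w+|[^\w\s]", doc["text"])]

def sentence_splitter(doc):
    """Adds Sentence annotations (naive split on ., ! and ?)."""
    doc["Sentence"] = [{"start": m.start(), "end": m.end()}
                       for m in re.finditer(r"[^.!?]+[.!?]?", doc["text"])]

def gazetteer(doc):
    """Adds Lookup annotations for tokens found in the list."""
    doc["Lookup"] = [t for t in doc["Token"]
                     if t["string"].lower() in GAZETTEER_ENTRIES]

def pos_tagger(doc):
    """Adds a 'category' feature to each existing Token annotation."""
    for t in doc["Token"]:
        t["category"] = POS_LEXICON.get(t["string"].lower(), "UNK")

PIPELINE = [tokenizer, sentence_splitter, gazetteer, pos_tagger]

doc = {"text": "The train shall stop. The balise is read."}
for stage in PIPELINE:
    stage(doc)  # the order matters: later stages read earlier annotations

print([(t["string"], t["category"]) for t in doc["Token"][:4]])
```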

JAPE Transducers
• Gazetteer lists are designed for annotating simple, regular features

• Even identifying simple patterns like e-mails is impossible with a
Gazetteer

• What is JAPE?

• JAPE provides pattern matching in GATE

• Each JAPE rule consists of:

• LHS which contains patterns to match

• RHS which details the annotations (and optionally features) to be
created

• I want to find all the occurrences of the term “level”
followed by a number (level 1, level 2, etc.)
ERTMS level 2 shall be backward compatible
with ERTMS level 1

• Adding features and values
ERTMS level 2 shall be backward compatible
with ERTMS level 1
Level {number = 2}
Annotation Feature Value
Level {number = 1}
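
Outside GATE, the same "level followed by a number" pattern can be approximated with a regular expression. This is a Python analogue of the idea rather than a JAPE rule; the dictionary layout for annotations is an assumption of the sketch.

```python
import re

# "level" followed by whitespace and digits, matched case-insensitively.
LEVEL = re.compile(r"\blevel\s+(\d+)\b", re.IGNORECASE)

def annotate_levels(text):
    """Return one Level annotation per match, with a 'number' feature."""
    return [{"type": "Level", "start": m.start(), "end": m.end(),
             "features": {"number": int(m.group(1))}}
            for m in LEVEL.finditer(text)]

req = "ERTMS level 2 shall be backward compatible with ERTMS level 1"
for ann in annotate_levels(req):
    print(ann["type"], ann["features"])
```

The pattern plays the role of the LHS, and the dictionary that is built plays the role of the RHS.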

Defect Detection as an Information
Extraction Problem

Requirement
• Ambiguous
• Redundant
• Implementation-dependent
• Inconsistent
• Incomplete
• Multiple
• Unfeasible
• Non-traceable
• Non-verifiable

Ambiguity in RE (from Berry, Kamsties and Krieger, 2003)
• Property of an expression of being interpreted in multiple ways

• Vagueness: the sentence admits borderline cases  
(e.g., Avoid long C functions)

• Generality: the sentence/term needs to be specified more  
(e.g., The interface shall be coded in Java)

• Lexical ambiguity: term has different unrelated vocabulary meanings
(e.g., bank)

• Syntactic ambiguity: sentence has more than one syntax tree  
(e.g., Structured approaches and tools)

• Semantic ambiguity: sentence can be translated into more than one
logic expression (e.g., All lights have a switch)

• Pragmatic ambiguity: the meaning depends on the context – other
sentences, domain knowledge, common-sense, viewpoint
Berry, D., Kamsties, E. and M. Krieger. From Contract Drafting to Software Specification: Linguistic Sources of Ambiguity. University of Waterloo. 2003

Vagueness
• Vagueness may occur when I have adjectives and (modal)
adverbs (so I can use a POS Tagger, but it gives many
false positive cases)

• I can use lists of pre-defined vague terms, and include
them in a Gazetteer (still, a lot of false positives)

• This is the “dumb” approach, often recommended for
defect detection: it is easy to discard false positives
CTRL + F

adaptability, additionally, adequate, aggregate, also, ancillary, arbitrary, appropriate, as
appropriate, available, as far as, at last, as few as possible, as little as possible, as many
as possible, as much as possible, as required, as well as, bad, both, but, but also, but
not limited to, capable of, capable to, capability of, capability, common, correctly,
consistent, contemporary, convenient, credible, custom, customary, default, definable,
easily, easy, effective, efficient, episodic, equitable, equitably, eventually, exist, exists,
expeditiously, fast, fair, fairly, finally, frequently, full, general, generic, good, high-level,
impartially, infrequently, insignificant, intermediate, interactive, in terms of, less,
lightweight, logical, low-level, maximum, minimum, more, mutually-agreed,
mutually-exclusive, mutually-inclusive, near, necessary, neutral, not only, only, on the
fly, particular, physical, powerful, practical, prompt, provided, quickly, random, recent,
regardless of, relevant, respective, robust, routine, sufficiently, sequential, significant,
simple, specific, strong, there, there is, transient, transparent, timely, undefinable,
understandable, unless, unnecessary, useful, various, varying
List of Vague Terms (from Tjong and Berry’s SREE)
If the logical AND between the two input sensors is 1…
The system shall implement a logical sequence of steps for…
Tjong, S. F., & Berry, D. M. (2013). The design of SREE – a prototype potential ambiguity finder for requirements specifications and lessons learned. In International Working Conference on Requirements Engineering: Foundation for Software Quality (pp. 80-95). Springer Berlin Heidelberg.
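
The list-based ("CTRL + F") approach is easy to try at home. The sketch below uses only a handful of terms taken from the list above; the function name and the matching strategy are my own assumptions, not the SREE implementation.

```python
import re

# A small excerpt of the vague-term list, including multi-word entries.
VAGUE_TERMS = ["adequate", "as far as", "as much as possible",
               "easy", "fast", "logical", "relevant"]

# Longest terms first, so that multi-word entries are tried before shorter ones.
_PATTERN = re.compile(
    r"\b(" + "|".join(re.escape(t) for t in
                      sorted(VAGUE_TERMS, key=len, reverse=True)) + r")\b",
    re.IGNORECASE,
)

def find_vague_terms(requirement):
    """Return the vague terms found in the requirement, in order."""
    return [m.group(1).lower() for m in _PATTERN.finditer(requirement)]

# "logical" is flagged in both sentences, although only the second is vague.
print(find_vague_terms("If the logical AND between the two input sensors is 1"))
print(find_vague_terms("The system shall implement a logical sequence of steps"))
```

Both example sentences are flagged, which is exactly the false-positive behaviour discussed above: the analyst has to discard the first hit by hand.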

Syntactic Ambiguity
(Coordination)
• Detect all sentences that include a potential coordination
ambiguity, i.e., an ambiguity due to the combined use of and/or.

• Example: The system shall produce the speed profile
plot or the data-log and the legend.

• The system shall produce [the speed profile plot or
the data-log] and the legend.

• The system shall produce the speed profile plot or
[the data-log and the legend].

Coordination Ambiguities - Jape Rule 0
Retrieve all occurrences of And / Or
Phase: MatchAndOr
Input: Token
Options: control = appelt
//Operator ==~ returns only whole string matches
//Operator (?i) tells that the matching is case insensitive
//Operator | is the logical "or" operator
Rule: checkAndOr
Priority: 1
(
{Token.string ==~ "(?i)and"} |
{Token.string ==~ "(?i)or"}
):coord
-->
:coord.AndOr = {}
A. Ferrari (ISTI) Requirements Engineering 36 / 60
Retrieve all occurrences of
AND / OR
Jape Rule 1
The system shall produce the speed profile plot or the data-log and the legend
The system shall produce a sound alarm and a visual alarm

Coordination Ambiguities - Jape Rule 1
Annotate all the sequences of And/Or in the same sentence
Phase: MatchCoordinationSequences
Input: Split AndOr
//Note that, having Split among the input Annotations allows
//us to identify sequences of And/Or in the same sentence
//The '+' operator means "one or more occurrences"
//The '*' operator means "zero or more occurrences"
//The '?' operator means "zero or one occurrences"
Rule: checkCoordinationSequences
Priority: 1
(
{AndOr}
({AndOr})+
):coordSequence
-->
:coordSequence.AndOrSequence = {}
Jape Rule 2
Annotate AND / OR in the
same sentence

Jape Rule 3
Annotate all the sentences with And/Or sequences
Phase: MatchCoordinationSentences
Input: Sentence AndOrSequence
//A "contains" B: searches for
//annotations A that contain annotations B
Rule: checkCoordinationSentences
Priority: 1
(
{Sentence contains AndOrSequence}
):coordSentence
-->
:coordSentence.AndOrSentence = {}
Annotate sentence with
sequences of AND / OR
The system shall produce the plot or the data-log, and the legend
Commas matter!
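
The overall effect of the three rules (flag every sentence with two or more and/or) can be reproduced with a short Python sketch. It is an approximation of the idea and not a translation of the JAPE code; the sentence splitter is deliberately naive, and commas are not taken into account.

```python
import re

def split_sentences(text):
    """Naive sentence splitter: break after ., ! or ? followed by whitespace."""
    return [s.strip() for s in re.split(r"(?<=[.!?])\s+", text) if s.strip()]

def coordination_candidates(text):
    """Return the sentences that contain at least two and/or coordinators."""
    flagged = []
    for sentence in split_sentences(text):
        coordinators = re.findall(r"\b(?:and|or)\b", sentence, flags=re.IGNORECASE)
        if len(coordinators) >= 2:
            flagged.append(sentence)
    return flagged

text = ("The system shall produce the speed profile plot or the data-log "
        "and the legend. The system shall produce a sound alarm and a visual alarm.")
for sentence in coordination_candidates(text):
    print(sentence)
```

Only the first sentence is reported, since the second contains a single coordinator.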

Your Own, Personal
Requirements Assistant

Experience: a Railway Company
[…] requirement fragments (i.e., contiguous sequences of tokens in the requirement)
that match the pattern. In Table 2 we report the patterns in a compact version.
The JAPE implementation of the patterns, together with the discard-patterns
that will be introduced in Sect. 3.3, is available in our public repository
(https://github.com/ISTI-FMT/QUARS_plus_plus).
Below, we describe the defect classes addressed by each pattern.

Table 2: Pattern adopted for each defect class.
Anaphoric ambiguity:
    P_ANA = (NP)(NP)+ (Split)[0,1] (Token.POS == PP | Token.POS =~ PR*)
Coordination ambiguity:
    P_CO1 = ((Token)+ (Token.string == AND | OR)) [2]
    P_CO2 = (Token.POS == JJ) (Token.POS == NN | NNS) (Token.string == AND | OR) (Token.POS == NN | NNS)
Vague terms:
    P_VAG = (Token.string ∈ Vague)
Modal adverbs:
    P_ADV = (Token.POS == RB | RBR), (Token.string =~ "[.]*ly$")
Passive voice:
    P_PV = (AUXVERB)(NOT)?(Token.POS == RB | RBR)?(Token.POS == VBN)
Excessive length:
    P_LEN = Sentence.len > 60
Missing condition:
    P_MC = (IF)(Token, !Token.kind == punctuation)* (Token.kind == punctuation)(!(ELSE | OTHERWISE))
Missing unit of measurement:
    P_MU1 = (NUMBER)((Token)[0,1](NUMBER))?(!MEASUREMENT)
    P_MU2 = (NUMBER)((Token)[0,1](NUMBER))?(!PERCENT)
Missing reference:
    P_MR = (Token.string == "Ref")(Token.string == ".")(SpaceToken)?(NUMBER)
Undefined term:
    P_UT = (Token.kind == word, Token.orth == mixedCaps)

Domain Experts
Table 4: Discard patterns.
Anaphoric ambiguity:
    D_ANA = ((Token.POS == PP | Token.POS =~ PR*) within IT SHALL BE POSSIBLE)
Vague terms:
    D_VAG1 = (P_VAG, Token.string ==~ "(?i)sound" | "(?i)light", Token.POS == NN | NNS)
    D_VAG2 = (P_VAG within IT SHALL BE POSSIBLE)
    D_VAG3 = (P_VAG within StopPhrasesVague)
Modal adverbs:
    D_ADV1 = (Token.string ==~ "(?i)manually" | "(?i)automatically")
    D_ADV2 = (P_ADV within INFORMATION PURPOSES ONLY)
Undefined term:
    D_UT = (P_UT contains KnownAcronym)

3.4 SREE Patterns
The tool SREE (Tjong and Berry, 2013) is a defect detection tool for NL requirements
that is oriented to achieve 100% recall for the defects in its scope, […]
Defect-detection patterns
Discard patterns
A lot of false
positives
1800
requirements
Adaptation to the language of the company
Essential to involve domain experts
Ferrari, A., Gori, G., Rosadini, B., Trotta, I., Bacherini, S., Fantechi, A., & Gnesi, S. (2018).
Detecting requirements defects with NLP patterns: an industrial experience
in the railway domain. Empirical Software Engineering, 1-50.
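
The interplay between a defect-detection pattern and a discard pattern can be illustrated with the modal-adverb case. The sketch below is a rough Python approximation, not the JAPE implementation from the paper: it has no POS tagger, so the "-ly" suffix alone stands in for the adverb check, and the function names are hypothetical.

```python
import re

# Adverbs that the discard pattern treats as acceptable in requirements.
DISCARD_ADVERBS = {"manually", "automatically"}

def tokens(requirement):
    """Alphabetic tokens only; enough for this illustration."""
    return re.findall(r"[A-Za-z_]+", requirement)

def modal_adverb_candidates(requirement):
    """Detection step: every token ending in 'ly' (no POS check here)."""
    return [t for t in tokens(requirement) if t.lower().endswith("ly")]

def modal_adverb_defects(requirement):
    """Discard step: drop the candidates that match the discard list."""
    return [t for t in modal_adverb_candidates(requirement)
            if t.lower() not in DISCARD_ADVERBS]

req = ("The driver shall manually release the brake "
       "and the system shall possibly notify it")
print(modal_adverb_candidates(req))
print(modal_adverb_defects(req))
```

The discard list is where the adaptation to the language of the company takes place, which is why domain experts need to be involved.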

Domain-specific Terms
• Requirements typically include domain-specific terms, and
sometimes project-specific ones (the latter may be easier to extract thanks to
naming conventions, e.g., m_balise_group)

• Domain-specific terms may be single or multi-word
train
automatic train protection
automatic train supervision
track circuit
balise
abdominal hysterectomy
abdomen lymph nodes
continuous passive motion machine
administrative law judge
affirmative defense
just compensation
trial

Term Extraction
• We evaluate how much a word is independent from other
words

• If a word always occurs with different words, it is likely to
be an independent term
• Example: The automatic train supervision platform  
dispatches the vehicles, while the system for  
automatic train protection brakes the vehicle  
in case of danger.

Term Extraction
• If a word often occurs with the same words, it is likely to
be part of a multi-word term
• Example: The automatic train supervision platform  
dispatches the vehicles, while the system for  
automatic train protection brakes the vehicle  
in case of danger.

C/NC-Value
1. Linguistic Analysis: POS Tagging, Filters (Noun, Adj), Stoplist (great, numerous, several, year…) → Candidate Strings
2. C-Value: computes termhood → RANKED Candidate Strings (automatic train protection, track…)
3. NC-Value: considers term-context words → RE-RANKED Candidate Strings
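
The C-value step can be sketched by applying the published formula directly: for a candidate a of |a| words with frequency f(a), C-value(a) = log2|a| · f(a) if a is not nested in longer candidates, and log2|a| · (f(a) − average frequency of the longer candidates containing a) otherwise. The code below is a simplified illustration under that reading, not the incremental algorithm of the original paper; the candidate list and its frequencies are invented.

```python
import math

def contains(longer, shorter):
    """True if 'shorter' occurs as a contiguous sub-sequence of 'longer'."""
    n = len(shorter)
    return len(longer) > n and any(
        longer[i:i + n] == shorter for i in range(len(longer) - n + 1))

def c_value(candidates):
    """candidates maps a term (tuple of words) to its frequency in the corpus."""
    scores = {}
    for term, freq in candidates.items():
        # Frequencies of the longer candidates in which this term is nested.
        nesting = [f for other, f in candidates.items() if contains(other, term)]
        if nesting:
            freq = freq - sum(nesting) / len(nesting)
        scores[term] = math.log2(len(term)) * freq
    return scores

# Invented frequencies, for illustration only.
candidates = {
    ("automatic", "train", "protection"): 4,
    ("automatic", "train", "supervision"): 3,
    ("automatic", "train"): 7,
    ("track", "circuit"): 5,
}
scores = c_value(candidates)
for term in sorted(scores, key=scores.get, reverse=True):
    print(" ".join(term), round(scores[term], 2))
```

Here "automatic train" is ranked last: it is frequent, but it always appears inside longer terms, so it is unlikely to be a term on its own. Note that with this formula single-word candidates score zero, since log2(1) = 0.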

Contrastive Analysis
• Extracted terms might be domain-generic or domain-specific

• With contrastive analysis, terms are further ranked
according to their domain-specificity

• A contrastive corpus is a set of domain-generic documents (e.g.,
newspapers)

• Terms are extracted from the contrastive corpus

• The terms found in the requirements are compared with the terms of the
contrastive corpus

• If a term is less frequent in the contrastive corpus, it is considered as
a domain-specific term

• If a term is more frequent in the contrastive corpus, it is considered
as a domain-generic term
• A rank is associated to each term according to its domain-specificity

Requirements → C-NC Value Ranking → ranked terms
Contrastive Corpus → C-NC Value Ranking → ranked terms
Contrastive Analysis compares the two rankings and separates
Domain-specific Terms from Domain-generic Terms
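
A minimal way to rank terms by domain-specificity is to compare their relative frequencies in the two corpora. The ratio used below is a simple stand-in chosen for this sketch, not the ranking function of the tools cited in this talk, and the counts are invented.

```python
def relative_freq(term, counts):
    """Frequency of the term divided by the total number of term occurrences."""
    total = sum(counts.values())
    return counts.get(term, 0) / total if total else 0.0

def domain_specificity(term, domain_counts, contrastive_counts, smoothing=1e-6):
    """Ratio of relative frequencies; above 1 means more typical of the domain."""
    return (relative_freq(term, domain_counts) + smoothing) / (
        relative_freq(term, contrastive_counts) + smoothing)

# Invented term counts for a requirements corpus and a newspaper-like corpus.
domain_counts = {"track circuit": 12, "system": 30, "balise": 8}
contrastive_counts = {"system": 140, "government": 55, "track circuit": 1, "year": 4}

ranking = sorted(
    domain_counts,
    key=lambda t: domain_specificity(t, domain_counts, contrastive_counts),
    reverse=True,
)
print(ranking)
```

Terms that never occur in the contrastive corpus end up at the top, and terms that are relatively more frequent there (score below 1) are treated as domain-generic.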

Experience: Product Lines
Step 3: Commonality Candidates Identification
Automatic Train Protection
Automatic Train Supervision
Interlocking
...
NetTrack
Region ATP
...
CCTV
...
CCTV
...
Smartlock
Region ATP
...
Airlink
...
A. Ferrari, et al. (ISTI-CNR, ILC-CNR) Mining Commonalities and Variabilities 25 / 36
A Global Feature Diagram (excerpt)
ATP
Onboard
CBTC
ATP IXL ATS
IXL
Controllable
ATP
Wayside
IXL
Pure
ATP
Simple
ATP
IXL
ATS
Router
ATS
Simple
ATP
Controller
Ferrari, A., Spagnolo, G. O., & Dell'Orletta, F. (2013). Mining commonalities and variabilities from natural language documents.
In Proceedings of the 17th International Software Product Line Conference (pp. 116-120). ACM.
Nasr, S. B., Bécan, G., Acher, M., Ferreira Filho, J. B., Sannier, N., Baudry, B., & Davril, J. M. (2017). Automated extraction of product comparison matrices from informal
product descriptions. Journal of Systems and Software, 124, 82-103.

• Term Extraction: TerMine, http://www.nactem.ac.uk/software/termine/

• Contrastive Analysis: Text2Knowledge, http://www.italianlp.it/demo/t2k-text-to-knowledge/

• Further Readings:
• Frantzi, Katerina, Sophia Ananiadou, and Hideki Mima. "Automatic
recognition of multi-word terms: the C-value/NC-value method."
International journal on digital libraries 3.2: 115-130, 2000.

• Bonin, Francesca, et al. "A contrastive approach to multi-word
term extraction from domain corpora." Proceedings of the 7th
International Conference on Language Resources and Evaluation.
2010.

Many Reasons for Similarity
The system shall support user authentication

Equivalence
Before accessing the system, the user
shall be authenticated

Refinement
Authentication shall be performed by means of iris recognition

Relatedness
If the authentication through iris recognition fails,
the system shall ask the user login information

Inconsistency
If the authentication through iris recognition fails,
the system shall authenticate the user through fingerprint recognition

Lexical and Semantic
Similarity
The student shall enter login and password
Semantically
Similar
Lexically
Different

Vector Representation
student  system  support  user  authentication  enter  login  password
0        1       1        1     1               0      0      0
1        0       0        0     0               1      1      1
• Each sentence is represented as a vector of numbers

• Each component of the vector corresponds to a term in the complete set of terms

• The component is 1 if the term occurs in the requirement, 0 otherwise
(other weighting schemes can be used, e.g., TF-IDF, to emphasise rare terms)

Similarity Metrics
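With A and B the sets of terms (or the vectors) of the two requirements:
Dice: $\frac{2\,|A \cap B|}{|A| + |B|}$
Jaccard: $\frac{|A \cap B|}{|A| + |B| - |A \cap B|}$
Cosine: $\frac{A \cdot B}{\lVert A \rVert \times \lVert B \rVert}$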

cos > 0
"The system shall support user authentication"
"User authentication shall be performed through fingerprint"
The cosine is greater than zero when the angle between the two
requirements vectors is lower than 90 degrees

cos = 0
"The system shall support user authentication"
"Response time is 100 ms"
The cosine is zero when the vectors are orthogonal

Vectors have one component for each word in the vocabulary

student  system  support  user  authentication  enter  login  password
0        1       1        1     1               0      0      0
1        0       0        0     0               1      1      1
cos = 0
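
The vector representation and the three metrics can be reproduced with plain Python. This is a minimal sketch: the vocabulary is the one from the slide, vectors are binary, and tokenisation is a simple whitespace split.

```python
import math

def binary_vector(requirement, vocabulary):
    """1 if the vocabulary term occurs in the requirement, 0 otherwise."""
    words = set(requirement.lower().split())
    return [1 if term in words else 0 for term in vocabulary]

def dice(a, b):
    shared = sum(x * y for x, y in zip(a, b))
    return 2 * shared / (sum(a) + sum(b))

def jaccard(a, b):
    shared = sum(x * y for x, y in zip(a, b))
    return shared / (sum(a) + sum(b) - shared)

def cosine(a, b):
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    return dot / (norm_a * norm_b)

vocabulary = ["student", "system", "support", "user",
              "authentication", "enter", "login", "password"]
v1 = binary_vector("The system shall support user authentication", vocabulary)
v2 = binary_vector("The student shall enter login and password", vocabulary)
v3 = binary_vector("User authentication shall be performed through fingerprint",
                   vocabulary)

print(v1, v2)
print(cosine(v1, v2))  # no shared terms: orthogonal vectors
print(cosine(v1, v3), dice(v1, v3), jaccard(v1, v3))
```

The first pair shares no terms, so every metric is zero even though the two requirements are semantically related; this is the limitation that word embeddings address.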

Word Embeddings
• I want to enrich the semantic representation of words

• Avoid the problem of lexical similarity = 0

• I want compact vector representations (avoid sparse vectors)

• In 2013, Mikolov et al. introduced Skip-gram with negative
sampling (SGNS), the most common word embeddings algorithm
• Implemented in the package word2vec
• It enhances similarity computation, and it is useful for any task in
which I want to represent the semantics of words

Word Embeddings: Idea
• For a human, the meaning of a term is given by the
mental (experiential) context of that term

• But a human has many senses to create meaning
DOG

Word Embeddings: Idea
• To let a system associate meaning to a term, I can consider
only the textual context

• Distributional hypothesis (Harris, 1954): the meaning of a
word is given by the company it keeps (in a set of documents)
The dog is a man’s best friend …
… then I went to walk with the dog
my dog does not bite, but…
…the dog barked too much
Documents Meaning
dog
friend
bark
walk
bite

Word Embeddings: Details
• To produce the word embedding vectors, a fake task is performed based on the
input text, and the word embeddings are produced as a by-product of the task
Source Text
{requirements, are}
{requirements, conditions}
{are, requirements}
{are, conditions}
{are, over}
Word Pairs
(Training Samples)
window size = 2 (context considered)
{conditions, are}
{…}
• A neural network (NN) is trained for the fake task, and the
word embeddings are the hidden layer of the trained NN

• Given the word pairs such as
{requirements, are}, requirements
is the input, and the expected output is are

• The weights in the hidden layer of the neural
network are incrementally adjusted
• At the end of the training, given a word,  
the output vector is a probability
distribution

• The word embeddings are the hidden
layer of the NN
Input: one-hot vector for "requirements", e.g., (0, 0, 0, 1, 0, 0, 0, 0); vector len = |V| (vocabulary size)
Output: (p1, p2, p3, p4, p5, …, pV); vector len = |V|; one component is the probability that the
context word is "are", another is the probability that the context word is "conditions"
Hidden layer: (e1, e2, e3, …, eL); vector len = L (chosen by the user)
Word Embedding
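
The generation of the training samples for the fake task can be sketched as follows. The source sentence is an assumption made for this example (chosen so that it yields the pairs listed above), and `skipgram_pairs` is a hypothetical helper, not a word2vec function.

```python
def skipgram_pairs(tokens, window=2):
    """Return (input word, context word) pairs within the given window."""
    pairs = []
    for i, target in enumerate(tokens):
        lo = max(0, i - window)
        hi = min(len(tokens), i + window + 1)
        pairs.extend((target, tokens[j]) for j in range(lo, hi) if j != i)
    return pairs

# Assumed source text, for illustration only.
source = "requirements are conditions over phenomena".split()
pairs = skipgram_pairs(source, window=2)
for pair in pairs[:5]:
    print(pair)
```

Each pair is one training sample: the first word is fed to the network as a one-hot vector, and the second is the expected context word.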

• The nice property of word embeddings is that vectors of
related words are closer than vectors of unrelated words
requirement
constraint
dog
NOTE 1: the components of the vectors
here do not mean anything
NOTE 2: I have a vector for each word and
not for each sentence

Semantic Similarity with
Word Embeddings
• Given a sentence, I can produce the word embedding for
each word
• Word embeddings are vectors, so I can combine the word
embeddings with typical vector operations (e.g., average
vector) to represent requirements

• I can use the previous similarity measures (normally,
cosine similarity)

• More refined measures exist (Word Mover's Distance, Word
Centroid Similarity, etc.)

Semantic Similarity
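import nltk
import numpy as np
from gensim.models import KeyedVectors

req_1 = "The system shall support user authentication"
req_2 = "The student shall enter login and password"

def req_vector(req, mdl):
    # Average the embeddings of the tokens that the model knows
    tokens = nltk.tokenize.word_tokenize(req)
    return np.mean([mdl[t] for t in tokens if t in mdl], axis=0)

def cosine_similarity(u, v):
    return float(np.dot(u, v) / (np.linalg.norm(u) * np.linalg.norm(v)))

if __name__ == '__main__':
    # The pre-trained vectors must be downloaded separately into ./model/
    mdl = KeyedVectors.load_word2vec_format('./model/GoogleNews-vectors-negative300.bin', binary=True)
    v_req_1, v_req_2 = req_vector(req_1, mdl), req_vector(req_2, mdl)
    print(cosine_similarity(v_req_1, v_req_2))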

Domain-specific Word
Embeddings
• Pre-trained word embeddings exist that are trained on
large amounts of generic texts

• In requirements, the meaning highly depends on the
domain that I am considering

• Generic texts are different from requirements, so word
embeddings may not represent the actual meaning
intended in the requirements

Domain-specific Meaning
and Relatedness
CODE
source
program
software
CODE
convention
dsm
identifier
MACHINE
computation
instruction
MACHINE spirometer
oximeter
respirator

Wikipedia Crawling
word2vec
Domain-specific
Portals
Each portal includes all the Wiki pages
related to a certain domain
Domain-specific
word embeddings

Domain-specific
Requirements Similarity
req_1 = "The system shall support user authentication"
req_2 = "The student shall enter login and password"
if __name__ == '__main__':
    mdl = Word2Vec.load(os.path.join(MODEL_PATH, "Computer_Science_D_2.bin"))
    # v_req_1 and v_req_2 are the averaged requirement vectors, computed from mdl.wv
    print(cosine_similarity(v_req_1, v_req_2))
0.53966196
• I can use these domain-specific word embeddings  
to represent my domain-specific requirements

Domain-specific Word
Embeddings
• Let’s look at the neighbouring word vectors in the
different domains
for MODEL_NAME in MODEL_LIST:
    mdl = Word2Vec.load(os.path.join(MODEL_PATH, MODEL_NAME))
    # MODEL_NAME[:-8] strips the file-name suffix (e.g., "_D_2.bin")
    print(MODEL_NAME[:-8], " ", mdl.wv.most_similar("code"))
Sports [('rule', 0.7007173895835876), ('regulation'), ('definition'), ('guideline')…
Computer_Science [('compiled', 0.6769073605537415), ('bytecode'), ('executable'), ('assembly')…
Medicine [('nomenclature', 0.7324844002723694), ('listing'), ('taxonomy'), ('atc'),…
    print(MODEL_NAME[:-8], " ", mdl.wv.most_similar("database"))
Computer_Science [('dbms', 0.7940356135368347), ('rdbms'), ('nosql'), ('relational')…
Literature [('internet', 0.9139115810394287), ('web'), ('streaming'), ('librivox')…
Mechanical_Engineering [('documentation', 0.8548465967178345), ('online'), ('internet')…

Experience
Experimental Results – Crawled Documents
Table: Number of pages for each domain.
Domain                         Pages    Words      Vocabulary
Computer Science (CS)          10,000   3,985,740  104,907
Electronic Engineering (EEN)   8,568    4,576,917  100,272
Mechanical Engineering (MEN)   7,267    4,459,961  95,466
Medicine (MED)                 10,000   5,470,284  150,617
Literature (LIT)               10,000   5,558,470  242,386
Sports (SPO)                   10,000   5,725,688  165,814
Technical engineering domains have a more restricted vocabulary
A. Ferrari, et al. (ISTI-CNR) Domain-speciﬁc Ambiguities 17 / 26
Compute the potential for ambiguity
between different domains and Computer Science
Crawled Documents
Bring to the
same vector space
Compare domains
Ferrari, A., Donati, B., & Gnesi, S. (2017). Detecting Domain-specific Ambiguities: an NLP Approach based on
Wikipedia Crawling and Word Embeddings. In 2017 IEEE 25th International Requirements Engineering
Conference Workshops (REW) (pp. 393-399). IEEE.

“window” has a similar meaning
in CS and EEN, different in other domains
each line is associated to a domain
each radius is a term
if a point is close to the center, it means the meaning is very different

• GenSim (similarity, and working with embeddings): https://radimrehurek.com/gensim/

• Pre-trained word embeddings:

• from word2vec: https://code.google.com/archive/p/word2vec/

• from GloVe: http://nlp.stanford.edu/projects/glove/

• from fastText (2018, also multi-lingual): https://fasttext.cc

• Domain-specific word embeddings: https://github.com/alessioferrari/Domain-specific-ambiguity

• Further Readings:
• Mikolov, T., Sutskever, I., Chen, K., Corrado, G. S., & Dean, J. (2013). Distributed representations of
words and phrases and their compositionality. In Advances in neural information processing
systems (pp. 3111-3119).

• Baroni, M., Dinu, G., & Kruszewski, G. (2014). Don't count, predict! A systematic comparison of
context-counting vs. context-predicting semantic vectors. In Proceedings of the 52nd Annual
Meeting of the Association for Computational Linguistics (Volume 1: Long Papers) (Vol. 1, pp.
238-247).

• http://mccormickml.com/2016/04/19/word2vec-tutorial-the-skip-gram-model/

Open Issues
(from the NLP4RE Workshop)

Data
Train supervised machine learning algorithms
Validate rule-based and unsupervised algorithms
Generalisation through different domains
Experiment replication
Requirements are confidential
Annotations require domain-knowledge
Case studies are often the best option!

Better Data
Data quality impacts performance
Linguistic quality (no grammatical errors)
Annotation quality (expert annotators)
No Bias! (annotate in advance)
Need for tools that learn on-the-job
Very hard to involve enough experts
Very hard to make them work without showing them a tool
Bad requirements are realistic!
Some tasks are inherently hard to perform in advance

Validation Metrics and Workflows
We normally use information retrieval measures for RE tools
RE tasks are often compositions of several tasks (e.g., model generation)
Errors made by a tool can have different impacts, depending on the context
The context is given by the task, the process, the user
In general, it is safe to avoid false negatives…
Avoiding false negatives leads to false positives
Too many false positives means that the tool does not do its job
Try the tool in the field!
Competitions!
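
The information retrieval measures mentioned above are precision, recall, and the F-measure. A minimal sketch follows, assuming the tool output has already been compared with a gold standard and counted as true positives, false positives, and false negatives. The counts and the choice of beta = 2 are illustrative assumptions: a beta above 1 weights recall more than precision, which reflects the preference for avoiding false negatives.

```python
def precision_recall_fbeta(tp, fp, fn, beta=1.0):
    """Return (precision, recall, F-beta) from raw counts."""
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    if precision == 0.0 and recall == 0.0:
        return precision, recall, 0.0
    b2 = beta ** 2
    f_beta = (1 + b2) * precision * recall / (b2 * precision + recall)
    return precision, recall, f_beta

# Hypothetical counts: the tool flags 20 requirements, 8 of them are real
# defects, and it misses 2 real defects.
tp, fp, fn = 8, 12, 2

p, r, f1 = precision_recall_fbeta(tp, fp, fn, beta=1.0)
_, _, f2 = precision_recall_fbeta(tp, fp, fn, beta=2.0)
print(f"precision={p:.2f} recall={r:.2f} F1={f1:.2f} F2={f2:.2f}")
```

Here the tool misses few defects but most of its warnings are false alarms. Whether that trade-off is acceptable depends on the task, the process, and the user, so the score does not replace a field trial.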

Domain-specificity
Different domains speak different languages
Domain-adaptation is key
Domain-specific resources (ontologies) are needed
Different terms but also different business rules
Need to automate ontology-building
Domain-specific resources require support from domain experts
Issues of tacit knowledge

Language Issues
Most of the available resources are in English
Most of the NLP research is for English
Requirements are written in different languages
Machine translation can be effective, but only for certain tasks (e.g., similarity)
Don’t forget rule-based techniques!

Human-in-the-loop
Clearly separate human and machine tasks
NLP tools do not replace humans
NLP tools empower domain experts
NLP tools cannot do everything
We need the support of domain experts to build NLP tools for RE
Process changes when a tool is used
People tend to rely on the tool
Tools can have a learning effect

Players’ Cooperation
Players: RE Researchers; Vendors of Requirements Management Tools; NLP Researchers; Industry (Users)
Expected contributions (labels from the diagram):
Support for hard NLP tasks
RE awareness
Provide pluggable solutions*
Clarify NLP capabilities, principles and needs (i.e., expert support)
Support for scoping the discipline
*NLP technologies are GPL, RE tools are not!

NLP Technologies and Resources
• Extract information from text: General Architecture for Text Engineering (GATE): https://gate.ac.uk

• Perform fine-grained NLP analyses:

• Python Natural Language Toolkit (NLTK): https://www.nltk.org

• TextBlob (high-level API to NLTK): https://textblob.readthedocs.io/

• GenSim (for similarity): https://radimrehurek.com/gensim/

• Stanford CoreNLP (Java): https://stanfordnlp.github.io/CoreNLP/

• SpaCy (designed for speed): https://spacy.io

• Dive into machine learning and deep learning:
• WEKA (user-friendly, several algorithms): https://www.cs.waikato.ac.nz/ml/weka/
• TensorFlow: https://www.tensorflow.org

• Keras (high-level API to TensorFlow): https://keras.io

Selected Publications
(with trends based on my opinion)
 
Tracing
Sultanov, H., & Hayes, J. H. (2013, July). Application of reinforcement learning to requirements engineering: requirements tracing. In Requirements Engineering Conference (RE), 2013 21st IEEE International (pp. 52-61). IEEE.
Gervasi, V., & Zowghi, D. (2014). Supporting traceability through affinity mining. In Requirements Engineering Conference (RE), 2014 IEEE 22nd International (pp. 143-152). IEEE.
Borg, M., Runeson, P., & Ardö, A. (2014). Recovering from a decade: a systematic mapping of information retrieval approaches to software traceability. Empirical Software Engineering, 19(6), 1565-1616.
Mahmoud, A., & Niu, N. (2015). On the role of semantics in automated requirements tracing. Requirements Engineering, 20(3), 281-300.
Guo, J., Cheng, J., & Cleland-Huang, J. (2017). Semantically enhanced software traceability using deep learning techniques. In Proceedings of the 39th International Conference on Software Engineering (pp. 3-14). IEEE Press.
Hübner, P., & Paech, B. (2018). Evaluation of Techniques to Detect Wrong Interaction Based Trace Links. In International Working Conference on Requirements Engineering: Foundation for Software Quality (pp. 75-91). Springer, Cham.

General Introduction
Ferrari, A., Dell’Orletta, F., Esuli, A., Gervasi, V., & Gnesi, S. (2017). Natural Language Requirements Processing: A 4D Vision. IEEE Software, 34(6), 28-35.

Categorisation
Casamayor, A., Godoy, D., & Campo, M. (2010). Identification of non-functional requirements in textual specifications: A semi-supervised learning approach. Information and Software Technology, 52(4), 436-445.
Casamayor, A., Godoy, D., & Campo, M. (2012). Functional grouping of natural language requirements for assistance in architectural software design. Knowledge-Based Systems, 30, 78-86.
Knauss, E., & Ott, D. (2014). (Semi-) automatic Categorization of Natural Language Requirements. In International Working Conference on Requirements Engineering: Foundation for Software Quality (pp. 39-54). Springer International Publishing.
Kurtanović, Z., & Maalej, W. (2017). Automatically Classifying Functional and Non-functional Requirements Using Supervised Machine Learning. In Requirements Engineering Conference (RE), 2017 IEEE 25th International (pp. 490-495). IEEE.

Defect Detection
Tjong, S. F., & Berry, D. M. (2013). The design of SREE – a prototype potential ambiguity finder for requirements specifications and lessons learned. In International Working Conference on Requirements Engineering: Foundation for Software Quality (pp. 80-95). Springer Berlin Heidelberg.
Arora, C., Sabetzadeh, M., Briand, L., & Zimmer, F. (2015). Automated checking of conformance to requirements templates using natural language processing. IEEE Transactions on Software Engineering, 41(10), 944-968.
Femmer, H., Fernández, D. M., Wagner, S., & Eder, S. (2017). Rapid quality assurance with requirements smells. Journal of Systems and Software, 123, 190-213.
Ferrari, A., Gori, G., Rosadini, B., Trotta, I., Bacherini, S., Fantechi, A., & Gnesi, S. (2018). Detecting requirements defects with NLP patterns: an industrial experience in the railway domain. Empirical Software Engineering, 1-50.

Equivalent Requirements
Falessi, D., Cantone, G., & Canfora, G. (2013). Empirical principles and an industrial case study in retrieving equivalent requirements via natural language processing techniques. IEEE Transactions on Software Engineering, 39(1), 18-44.

Glossary Extraction
Goldin, L., & Berry, D. M. (1997). AbstFinder, a prototype natural language text abstraction finder for use in requirements elicitation. ASE, 4(4), 375-412.
Gacitua, R., Sawyer, P., & Gervasi, V. (2011). Relevance-based abstraction identification: technique and evaluation. Requirements Engineering, 16(3), 251.
Bakar, N. H., Kasirun, Z. M., & Salleh, N. (2015). Feature extraction approaches from natural language requirements for reuse in software product lines: A systematic literature review. Journal of Systems and Software, 106, 132-149.
Quirchmayr, T., Paech, B., Kohl, R., & Karey, H. (2017). Semi-automatic software feature-relevant information extraction from natural language user manuals. In International Working Conference on Requirements Engineering: Foundation for Software Quality (pp. 255-272). Springer, Cham.

Model Synthesis
Yue, T., Briand, L. C., & Labiche, Y. (2011). A systematic review of transformation approaches between user requirements and analysis models. Requirements Engineering, 16(2), 75-99.
Yue, T., Briand, L. C., & Labiche, Y. (2015). aToucan: an automated framework to derive UML analysis models from use case models. ACM Transactions on Software Engineering and Methodology (TOSEM), 24(3), 13.
Lucassen, G., Robeer, M., Dalpiaz, F., van der Werf, J. M. E., & Brinkkemper, S. (2017). Extracting conceptual models from user stories with Visual Narrator. Requirements Engineering, 22(3), 339-358.

Users’ Feedback Analysis
Chen, N., Lin, J., Hoi, S. C., Xiao, X., & Zhang, B. (2014). AR-miner: mining informative reviews for developers from mobile app marketplace. In Proceedings of the 36th International Conference on Software Engineering (pp. 767-778). ACM.
Maalej, W., & Nabil, H. (2015, August). Bug report, feature request, or simply praise? On automatically classifying app reviews. In Requirements Engineering Conference (RE), 2015 IEEE 23rd International (pp. 116-125). IEEE.
Guzman, E., Alkadhi, R., & Seyff, N. (2016). A needle in a haystack: What do twitter users say about software?. In Requirements Engineering Conference (RE), 2016 IEEE 24th International (pp. 96-105). IEEE.
Maalej, W., Nayebi, M., Johann, T., & Ruhe, G. (2016). Toward data-driven requirements engineering. IEEE Software, 33(1), 48-54.
Martin, W., Sarro, F., Jia, Y., Zhang, Y., & Harman, M. (2017). A survey of app store analysis for software engineering. IEEE Transactions on Software Engineering, 43(9), 817-847.
Groen, E. C., Seyff, N., Ali, R., Dalpiaz, F., Doerr, J., Guzman, E., ... & Stade, M. (2017). The crowd in requirements engineering: The landscape and challenges. IEEE Software, 34(2), 44-52.

Regulatory Compliance
Breaux, T. D., Vail, M. W., & Anton, A. I. (2006). Towards regulatory compliance: Extracting rights and obligations to align requirements with regulations. In Requirements Engineering, 14th IEEE International Conference (pp. 49-58). IEEE.
Cleland-Huang, J., Czauderna, A., Gibiec, M., & Emenecker, J. (2010). A machine learning approach for tracing regulatory codes to product specific requirements. In Proceedings of the 32nd ACM/IEEE International Conference on Software Engineering-Volume 1 (pp. 155-164). ACM.
Massey, A. K., Rutledge, R. L., Antón, A. I., & Swire, P. P. (2014). Identifying and classifying ambiguity for regulatory requirements. In Requirements Engineering Conference (RE), 2014 IEEE 22nd International (pp. 83-92). IEEE.
Hosseini, M. B., Breaux, T. D., & Niu, J. (2018). Inferring Ontology Fragments from Semantic Role Typing of Lexical Variants. In International Working Conference on Requirements Engineering: Foundation for Software Quality (pp. 39-56). Springer, Cham.

Retrieval
Natt och Dag, J., Gervasi, V., Brinkkemper, S., & Regnell, B. (2004). Speeding up requirements management in a product software company: Linking customer wishes to product requirements through linguistic engineering. In Requirements Engineering Conference, 2004. Proceedings. 12th IEEE International (pp. 283-294). IEEE.
Dumitru, H., Gibiec, M., Hariri, N., Cleland-Huang, J., Mobasher, B., Castro-Herrera, C., & Mirakhorli, M. (2011). On-demand feature recommendations derived from mining public product descriptions. In Proceedings of the 33rd International Conference on Software Engineering (pp. 181-190). ACM.

Tool Evaluation
Berry, D., Gacitua, R., Sawyer, P., & Tjong, S. F. (2012). The case for dumb requirements engineering tools. In International Working Conference on Requirements Engineering: Foundation for Software Quality (pp. 211-217). Springer, Berlin, Heidelberg.
Berry, D. M. (2017). Evaluation of Tools for Hairy Requirements and Software Engineering Tasks. In 2017 IEEE 25th International Requirements Engineering Conference Workshops (REW) (pp. 284-291). IEEE.
Berry, D. M., Cleland-Huang, J., Ferrari, A., Maalej, W., Mylopoulos, J., & Zowghi, D. (2017). Panel: Context-Dependent Evaluation of Tools for NL RE Tasks: Recall vs. Precision, and Beyond. In Requirements Engineering Conference (RE), 2017 IEEE 25th International (pp. 570-573). IEEE.
