Can Deep Learning Solve the Sentiment Analysis Problem? - Mark Cieliebak
Sentiment analysis appears to be one of the easier tasks in the realm of text analytics: given a text such as a tweet or product review, decide whether it expresses a positive or negative opinion. This task is almost trivial for humans, but it turns out to be a true challenge for automated systems. In fact, state-of-the-art sentiment analysis tools are wrong on approximately 4 out of 10 documents.
Current sentiment analysis tools are rule-based, feature-based, or combinations of both. However, recent research uses deep learning on very large sets of documents.
In this talk, we will explain the intrinsic difficulties of automated sentiment analysis; present existing solution approaches and their performance; describe an architecture for a deep learning system; and explore whether deep learning can improve sentiment analysis accuracy.
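The rule-based approaches the talk contrasts with deep learning can be surprisingly small. As an illustrative sketch (not any specific tool's implementation; the lexicons and negation handling are invented for the example), a lexicon-based scorer looks like this:

```python
def lexicon_sentiment(tokens, positive, negative):
    """Minimal lexicon-based sentiment scorer with naive negation handling:
    a negation word flips the polarity of the word that follows it."""
    score, negate = 0, False
    for tok in tokens:
        tok = tok.lower()
        if tok in {"not", "no", "never"}:
            negate = True
            continue
        # +1 for a positive-lexicon hit, -1 for a negative one, 0 otherwise.
        polarity = (tok in positive) - (tok in negative)
        score += -polarity if negate else polarity
        negate = False
    return "positive" if score > 0 else "negative" if score < 0 else "neutral"
```

Such a scorer fails on sarcasm, long-range negation, and domain-specific vocabulary, which is exactly the accuracy gap the talk attributes to automated systems.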
Presentation of "Challenges in Transfer Learning in NLP" from the Madrid Natural Language Processing Meetup, May 2019.
https://www.meetup.com/es-ES/Madrid-Natural-Language-Processing-meetup/
Practical related work in repository: https://github.com/laraolmos/madrid-nlp-meetup
Unsupervised Software-Specific Morphological Forms Inference from Informal Di... - Chunyang Chen
The paper was accepted at ICSE'17 and TSE'19. https://se-thesaurus.appspot.com/ https://pypi.org/project/DomainThesaurus/ Informal discussions on social platforms (e.g., Stack Overflow) accumulate a large body of programming knowledge in natural language text. Natural language processing (NLP) techniques can be exploited to harvest this knowledge base for software engineering tasks. To make effective use of NLP techniques, a consistent vocabulary is essential. Unfortunately, the same concepts are often intentionally or accidentally mentioned in many different morphological forms in informal discussions, such as abbreviations, synonyms and misspellings. Existing techniques to deal with such morphological forms are either designed for general English or rely predominantly on domain-specific lexical rules. A thesaurus of software-specific terms and commonly used morphological forms is desirable for normalizing software engineering text, but very difficult to build manually. In this work, we propose an automatic approach to build such a thesaurus. Our approach identifies software-specific terms by contrasting software-specific and general corpora, and infers morphological forms of software-specific terms by combining distributed word semantics, domain-specific lexical rules and transformations, and graph analysis of morphological relations. We evaluate the coverage and accuracy of the resulting thesaurus against community-curated lists of software-specific terms, abbreviations and synonyms. We also manually examine the correctness of the identified abbreviations and synonyms in our thesaurus. We demonstrate the usefulness of our thesaurus in a case study of normalizing questions from Stack Overflow and CodeProject.
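The corpus-contrast step for identifying software-specific terms can be illustrated with a toy frequency-ratio heuristic. This is a hypothetical simplification, not the paper's actual method; the thresholds and function names are invented for the example:

```python
from collections import Counter

def domain_specific_terms(domain_tokens, general_tokens,
                          min_count=5, ratio_threshold=10.0):
    """Flag a term as domain-specific when its relative frequency in the
    domain corpus greatly exceeds its frequency in a general corpus."""
    domain_counts = Counter(domain_tokens)
    general_counts = Counter(general_tokens)
    d_total, g_total = len(domain_tokens), len(general_tokens)
    terms = []
    for term, count in domain_counts.items():
        if count < min_count:          # ignore rare, noisy terms
            continue
        domain_freq = count / d_total
        # Add-one smoothing so terms unseen in the general corpus
        # don't cause a division by zero.
        general_freq = (general_counts[term] + 1) / (g_total + 1)
        if domain_freq / general_freq >= ratio_threshold:
            terms.append(term)
    return terms
```

A real pipeline would then feed these candidate terms into the morphological-form inference stage (word embeddings, lexical rules, graph analysis) described above.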
PhD thesis defense.
This manuscript describes a methodology designed and implemented to realise the recommendation of vocabularies based on the content of a given website. The goal of the proposed approach is to generate vocabularies by reusing existing schemas. The automatic recommendation helps turn websites into self-described web entities in the Web of Data, understandable by both humans and machines. In this direction, the implemented approach is wrapped within a broader methodology for turning a website into a machine-understandable node, using technologies developed in the scope of the Semantic Web vision. Transforming a website into a machine-understandable entity is the first step required on the website's side to narrow the gap with web agents and enable structured content consumption without implementing an Application Programming Interface (API) that would provide read-write functionality. The motivation of the thesis stems from the fact that, in most cases, the data provided via an API is already presented on the corresponding website.
A fundamental goal of search engines is to identify, given a query, documents that have relevant text. This is intrinsically difficult because the query and the document may use different vocabulary, or the document may contain query words without being relevant. We investigate neural word embeddings as a source of evidence in document ranking. We train a word2vec embedding model on a large unlabelled query corpus, but in contrast to how the model is commonly used, we retain both the input and the output projections, allowing us to leverage both the embedding spaces to derive richer distributional relationships. During ranking we map the query words into the input space and the document words into the output space, and compute a query-document relevance score by aggregating the cosine similarities across all the query-document word pairs.
We postulate that the proposed Dual Embedding Space Model (DESM) captures evidence on whether a document is about a query term in addition to what is modelled by traditional term-frequency based approaches. Our experiments show that the DESM can re-rank top documents returned by a commercial Web search engine, like Bing, better than a term-matching based signal like TF-IDF. However, when ranking a larger set of candidate documents, we find the embeddings-based approach is prone to false positives, retrieving documents that are only loosely related to the query. We demonstrate that this problem can be solved effectively by ranking based on a linear mixture of the DESM and the word counting features.
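The scoring scheme described above can be sketched in a few lines. This is an illustrative reimplementation based only on the description here, with all names invented for the example; the document is represented by the centroid of its normalized OUT-space vectors, and the query words are mapped into the IN space:

```python
import numpy as np

def desm_score(query_terms, doc_terms, in_embeddings, out_embeddings):
    """Toy DESM relevance score: mean cosine similarity between each
    query word's IN vector and the centroid of the document's OUT vectors.
    Embedding dicts map word -> numpy vector."""
    def normalize(v):
        return v / np.linalg.norm(v)

    doc_vecs = [normalize(out_embeddings[w])
                for w in doc_terms if w in out_embeddings]
    if not doc_vecs:
        return 0.0
    doc_centroid = normalize(np.mean(doc_vecs, axis=0))
    sims = [normalize(in_embeddings[w]) @ doc_centroid
            for w in query_terms if w in in_embeddings]
    return float(np.mean(sims)) if sims else 0.0
```

Following the abstract's last point, a production ranker would not use this score alone but mix it linearly with a term-matching feature such as TF-IDF or BM25 to suppress the false positives that pure embedding similarity produces.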
Detecting Good Practices and Pitfalls when Publishing Vocabularies on the Web - María Poveda Villalón
The uptake of Linked Data (LD) has promoted the proliferation of datasets and their associated ontologies, which bring semantics to the data being published. These ontologies should be evaluated at different stages, both during their development and at publication. As important as correctly modelling the part of the world to be captured in an ontology is publishing, sharing and facilitating the (re)use of the obtained model. In this paper, 11 evaluation characteristics with respect to publishing, sharing and facilitating reuse are proposed. In particular, 6 good practices and 5 pitfalls are presented, together with their associated detection methods. In addition, a grid-based rating system is generated, showing the results of analysing the vocabularies gathered in the LOV repository. Both contributions, the set of evaluation characteristics and the grid system, could be useful for ontologists wishing to reuse existing LD vocabularies or to check the one being built.
Composing a scientific workflow from scratch may be time-consuming, even if the scientist is fully aware of the semantics, the inputs, and the outputs of the expected workflow.
Reusing existing services and parts from already composed workflows can help reduce the total workflow composition time. However, matching the semantics and the inputs and outputs of these reusable components manually is not an easy task, especially when there are hundreds of such components available. Even if components are annotated with information on the semantics of their inputs and outputs, the complex nature of the semantic languages may make manual component selection even harder. In this paper, we propose a Case-Based Reasoning (CBR) approach to assist the composition of workflows based on the characteristics of the inputs and the outputs of the reusable workflow components, facilitating user exploitation of existing services and workflows during workflow composition. The architecture can also be extended to utilize the semantics of the various components, improving the precision of the identified reusable components.
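The input/output matching that the CBR approach automates can be illustrated with a simple overlap heuristic. This is a hypothetical sketch, not the paper's retrieval mechanism; component names and type labels are invented:

```python
def match_components(required_inputs, required_outputs, components):
    """Rank reusable workflow components by how well their declared
    input/output types overlap (Jaccard) with what the user needs.
    `components` is a list of (name, inputs, outputs) tuples."""
    def jaccard(a, b):
        a, b = set(a), set(b)
        return len(a & b) / len(a | b) if a | b else 0.0

    scored = []
    for name, inputs, outputs in components:
        score = (jaccard(inputs, required_inputs)
                 + jaccard(outputs, required_outputs)) / 2
        scored.append((score, name))
    # Best-matching components first; drop components with no overlap.
    return [name for score, name in sorted(scored, reverse=True) if score > 0]
```

A CBR system would additionally retain previously composed workflows as cases and adapt the best-matching case, rather than ranking isolated components.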
Using Semantic and Domain-based Information in CLIR Systems - Mauro Dragoni
Cross-Language Information Retrieval (CLIR) systems extend classic information retrieval mechanisms for allowing users to query across languages, i.e., to retrieve documents written in languages different from the language used for query formulation.
In this paper, we present a CLIR system exploiting multilingual ontologies to enrich document representations with multilingual semantic information during the indexing phase and to map query fragments to concepts during the retrieval phase.
This system has been applied to a domain-specific document collection, and the contribution of the ontologies to the CLIR system has been evaluated in conjunction with the use of both the Microsoft Bing and Google Translate translation services.
Results demonstrate that the use of domain-specific resources leads to a significant improvement of CLIR system performance.
Word Tagging with Foundational Ontology Classes - Andre Freitas
Semantic annotation is fundamental to deal with large-scale lexical information, mapping the information to an enumerable set of categories over which rules and algorithms can be applied, and foundational ontology classes can be used as a formal set of categories for such tasks. A previous alignment between WordNet noun synsets and DOLCE provided a starting point for ontology-based annotation, but in NLP tasks verbs are also of substantial importance. This work presents an extension to the WordNet-DOLCE noun mapping, aligning verbs according to their links to nouns denoting perdurants, transferring to the verb the DOLCE class assigned to the noun that best represents that verb's occurrence. To evaluate the usefulness of this resource, we implemented a foundational ontology-based semantic annotation framework that assigns a high-level foundational category to each word or phrase in a text, and compared it to a similar annotation tool, obtaining an increase of 9.05% in accuracy.
Word sense disambiguation is an essential, yet very difficult, task in natural language processing. While several other NLP tasks, such as POS tagging, can provide more than fairly good results (highly accurate, with almost a 100% rate of successfully labeled words), disambiguation is far from achieving such performance. However, we will demonstrate the need for word sense disambiguation in computing lexical chains on a special kind of text (chats) using a WordNet-based approach. In addition, we will try to identify the bottlenecks (mostly with respect to accuracy) in such an approach and provide possible improvements.
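Lexical-chain construction of the kind described above can be sketched as a greedy grouping procedure. The relatedness predicate is abstracted out here so the example stays self-contained; a real system would back it with WordNet synset and hypernym links, which is where disambiguation becomes necessary:

```python
def build_lexical_chains(words, related):
    """Greedy lexical-chain construction sketch: append each word to the
    first existing chain containing a related word, else start a new chain.
    `related(a, b)` is any word-relatedness predicate (e.g. backed by
    WordNet synset links in a real implementation)."""
    chains = []
    for word in words:
        for chain in chains:
            if any(related(word, member) for member in chain):
                chain.append(word)
                break
        else:
            chains.append([word])  # no related chain found
    return chains
```

Because a polysemous word can plausibly attach to several chains, the choice of which sense (and hence which chain) to use is exactly the disambiguation bottleneck the abstract points to.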
Powering NLU Engine with Apache Spark to Communicate with World - Rahul Kumar, Databricks
Building natural language processing engines is complex work. It requires an architecture where many algorithms, data processing, and data storage techniques are glued together to solve a single most important problem: how to understand a user's query, in the form of text, voice, or visual gestures, quickly and effectively, and respond with zero error. Identifying the best tools available in the market, and knowing how to fit these tools and libraries into our pipelines, gives a great edge in building these systems.
In this talk, I will give a deep look at how we use Apache Spark in our NLU engine to solve a complex problem with ease. At the end of the talk, attendees will get a high-level architecture overview of our NLU engine, a short TensorFlow introduction, and an idea of how they can use Apache Spark in ML and deep learning systems.
Neural Models for Information Retrieval - Bhaskar Mitra
In the last few years, neural representation learning approaches have achieved very good performance on many natural language processing (NLP) tasks, such as language modelling and machine translation. This suggests that neural models may also yield significant performance improvements on information retrieval (IR) tasks, such as relevance ranking, addressing the query-document vocabulary mismatch problem by using semantic rather than lexical matching. IR tasks, however, are fundamentally different from NLP tasks leading to new challenges and opportunities for existing neural representation learning approaches for text.
In this talk, I will present my recent work on neural IR models. We begin with a discussion on learning good representations of text for retrieval. I will present visual intuitions about how different embeddings spaces capture different relationships between items, and their usefulness to different types of IR tasks. The second part of this talk is focused on the applications of deep neural architectures to the document ranking task.
Question Answering over Linked Data - Reasoning Issues - Michael Petychakis
Question answering systems play a vital role in search engine optimization. Natural language processing methods are typically applied in QA systems to interpret the user's question, and several steps are followed to transform questions into query form so that a precise answer can be retrieved. This presentation analyzes diverse question answering systems based on semantic web technologies and ontologies with different query formats. It ends by addressing various reasoning alternatives.
May Marketo Masterclass, London MUG, May 22 2024 - Adele Miller
Can't make Adobe Summit in Vegas? No sweat because the EMEA Marketo Engage Champions are coming to London to share their Summit sessions, insights and more!
This is a MUG with a twist you don't want to miss.
AI Pilot Review: The World's First Virtual Assistant Marketing Suite - Google
https://sumonreview.com/ai-pilot-review/
AI Pilot Review: Key Features
✅Deploy AI expert bots in Any Niche With Just A Click
✅With one keyword, generate complete funnels, websites, landing pages, and more.
✅More than 85 AI features are included in the AI pilot.
✅No setup or configuration; use your voice (like Siri) to do whatever you want.
✅You Can Use AI Pilot To Create your version of AI Pilot And Charge People For It…
✅ZERO Manual Work With AI Pilot. Never write, Design, Or Code Again.
✅ZERO Limits On Features Or Usages
✅Use Our AI-powered Traffic To Get Hundreds Of Customers
✅No Complicated Setup: Get Up And Running In 2 Minutes
✅99.99% Up-Time Guaranteed
✅30 Days Money-Back Guarantee
✅ZERO Upfront Cost
See My Other Reviews Article:
(1) TubeTrivia AI Review: https://sumonreview.com/tubetrivia-ai-review
(2) SocioWave Review: https://sumonreview.com/sociowave-review
(3) AI Partner & Profit Review: https://sumonreview.com/ai-partner-profit-review
(4) AI Ebook Suite Review: https://sumonreview.com/ai-ebook-suite-review
Code reviews are vital for ensuring good code quality. They serve as one of our last lines of defense against bugs and subpar code reaching production.
Yet, they often turn into annoying tasks riddled with frustration, hostility, unclear feedback and a lack of standards. How can we improve this crucial process?
In this session we will cover:
- The Art of Effective Code Reviews
- Streamlining the Review Process
- Elevating Reviews with Automated Tools
By the end of this presentation, you'll have the knowledge of how to organize and improve your code review process.
AI Genie Review: World's First Open AI WordPress Website Creator - Google
https://sumonreview.com/ai-genie-review
AI Genie Review: Key Features
✅Creates Limitless Real-Time Unique Content, auto-publishing Posts, Pages & Images directly from Chat GPT & Open AI on WordPress in any Niche
✅First & Only Google Bard Approved Software That Publishes 100% Original, SEO Friendly Content using Open AI
✅Publish Automated Posts and Pages using AI Genie directly on Your website
✅50 DFY Websites Included Without Adding Any Images, Content Or Doing Anything Yourself
✅Integrated Chat GPT Bot gives Instant Answers on Your Website to Visitors
✅Just Enter the title, and your Content for Pages and Posts will be ready on your website
✅Automatically insert visually appealing images into posts based on keywords and titles.
✅Choose the temperature of the content and control its randomness.
✅Control the length of the content to be generated.
✅Never Worry About Paying Huge Money Monthly To Top Content Creation Platforms
✅100% Easy-to-Use, Newbie-Friendly Technology
✅30-Days Money-Back Guarantee
See My Other Reviews Article:
(1) TubeTrivia AI Review: https://sumonreview.com/tubetrivia-ai-review
(2) SocioWave Review: https://sumonreview.com/sociowave-review
(3) AI Partner & Profit Review: https://sumonreview.com/ai-partner-profit-review
(4) AI Ebook Suite Review: https://sumonreview.com/ai-ebook-suite-review
Globus Compute with IRI Workflows - GlobusWorld 2024 - Globus
As part of the DOE Integrated Research Infrastructure (IRI) program, NERSC at Lawrence Berkeley National Lab and ALCF at Argonne National Lab are working closely with General Atomics on accelerating the computing requirements of the DIII-D experiment. As part of this work, the team is investigating ways to speed up the time to solution for many different parts of the DIII-D workflow, including how jobs are run on HPC systems. One of these routes is looking at Globus Compute as a way to replace the current method for managing tasks; we describe a brief proof of concept showing how Globus Compute could help to schedule jobs and serve as a tool to connect compute at different facilities.
Mobile App Development Company In Noida | Drona Infotech
Looking for a reliable mobile app development company in Noida? Look no further than Drona Infotech. We specialize in creating customized apps for your business needs.
Visit Us For : https://www.dronainfotech.com/mobile-application-development/
GraphSummit Paris - The Art of the Possible with Graph Technology - Neo4j
Sudhir Hasbe, Chief Product Officer, Neo4j
Join us as we explore breakthrough innovations enabled by interconnected data and AI. Discover firsthand how organizations use relationships in data to uncover contextual insights and solve our most pressing challenges – from optimizing supply chains, detecting fraud, and improving customer experiences to accelerating drug discoveries.
Top Features to Include in Your Winzo Clone App for Business Growth - rickgrimesss22
Discover the essential features to incorporate in your Winzo clone app to boost business growth, enhance user engagement, and drive revenue. Learn how to create a compelling gaming experience that stands out in the competitive market.
Large Language Models and the End of ProgrammingMatt Welsh
Talk by Matt Welsh at Craft Conference 2024 on the impact that Large Language Models will have on the future of software development. In this talk, I discuss the ways in which LLMs will impact the software industry, from replacing human software developers with AI, to replacing conventional software with models that perform reasoning, computation, and problem-solving.
Unleash Unlimited Potential with One-Time Purchase
BoxLang is more than just a language; it's a community. By choosing a Visionary License, you're not just investing in your success, you're actively contributing to the ongoing development and support of BoxLang.
Zoom is a comprehensive platform designed to connect individuals and teams efficiently. With its user-friendly interface and powerful features, Zoom has become a go-to solution for virtual communication and collaboration. It offers a range of tools, including virtual meetings, team chat, VoIP phone systems, online whiteboards, and AI companions, to streamline workflows and enhance productivity.
Providing Globus Services to Users of JASMIN for Environmental Data AnalysisGlobus
JASMIN is the UK’s high-performance data analysis platform for environmental science, operated by STFC on behalf of the UK Natural Environment Research Council (NERC). In addition to its role in hosting the CEDA Archive (NERC’s long-term repository for climate, atmospheric science & Earth observation data in the UK), JASMIN provides a collaborative platform to a community of around 2,000 scientists in the UK and beyond, providing nearly 400 environmental science projects with working space, compute resources and tools to facilitate their work. High-performance data transfer into and out of JASMIN has always been a key feature, with many scientists bringing model outputs from supercomputers elsewhere in the UK, to analyse against observational or other model data in the CEDA Archive. A growing number of JASMIN users are now realising the benefits of using the Globus service to provide reliable and efficient data movement and other tasks in this and other contexts. Further use cases involve long-distance (intercontinental) transfers to and from JASMIN, and collecting results from a mobile atmospheric radar system, pushing data to JASMIN via a lightweight Globus deployment. We provide details of how Globus fits into our current infrastructure, our experience of the recent migration to GCSv5.4, and of our interest in developing use of the wider ecosystem of Globus services for the benefit of our user community.
OpenMetadata Community Meeting - 5th June 2024OpenMetadata
The OpenMetadata Community Meeting was held on June 5th, 2024. In this meeting, we discussed about the data quality capabilities that are integrated with the Incident Manager, providing a complete solution to handle your data observability needs. Watch the end-to-end demo of the data quality features.
* How to run your own data quality framework
* What is the performance impact of running data quality frameworks
* How to run the test cases in your own ETL pipelines
* How the Incident Manager is integrated
* Get notified with alerts when test cases fail
Watch the meeting recording here - https://www.youtube.com/watch?v=UbNOje0kf6E
Graspan: A Big Data System for Big Code AnalysisAftab Hussain
We built a disk-based parallel graph system, Graspan, that uses a novel edge-pair centric computation model to compute dynamic transitive closures on very large program graphs.
We implement context-sensitive pointer/alias and dataflow analyses on Graspan. An evaluation of these analyses on large codebases such as Linux shows that their Graspan implementations scale to millions of lines of code and are much simpler than their original implementations.
These analyses were used to augment the existing checkers; these augmented checkers found 132 new NULL pointer bugs and 1308 unnecessary NULL tests in Linux 4.4.0-rc5, PostgreSQL 8.3.9, and Apache httpd 2.2.18.
- Accepted in ASPLOS ‘17, Xi’an, China.
- Featured in the tutorial, Systemized Program Analyses: A Big Data Perspective on Static Analysis Scalability, ASPLOS ‘17.
- Invited for presentation at SoCal PLS ‘16.
- Invited for poster presentation at PLDI SRC ‘16.
E-commerce Application Development Company.pdfHornet Dynamics
Your business can reach new heights with our assistance as we design solutions that are specifically appropriate for your goals and vision. Our eCommerce application solutions can digitally coordinate all retail operations processes to meet the demands of the marketplace while maintaining business continuity.
Climate Science Flows: Enabling Petabyte-Scale Climate Analysis with the Eart...Globus
The Earth System Grid Federation (ESGF) is a global network of data servers that archives and distributes the planet’s largest collection of Earth system model output for thousands of climate and environmental scientists worldwide. Many of these petabyte-scale data archives are located in proximity to large high-performance computing (HPC) or cloud computing resources, but the primary workflow for data users consists of transferring data, and applying computations on a different system. As a part of the ESGF 2.0 US project (funded by the United States Department of Energy Office of Science), we developed pre-defined data workflows, which can be run on-demand, capable of applying many data reduction and data analysis to the large ESGF data archives, transferring only the resultant analysis (ex. visualizations, smaller data files). In this talk, we will showcase a few of these workflows, highlighting how Globus Flows can be used for petabyte-scale climate analysis.
Climate Science Flows: Enabling Petabyte-Scale Climate Analysis with the Eart...
Using Domain-specific Corpora for Improved Handling of Ambiguity in Requirements
1. Using Domain-specific Corpora for Improved Handling of Ambiguity in Requirements
Saad Ezzini, Sallam Abualhaija, Chetan Arora, Mehrdad Sabetzadeh*, Lionel Briand*
{saad.ezzini, sallam.abualhaija}@uni.lu
Software Verification & Validation (SVV) Lab, University of Luxembourg, Luxembourg
*Also with University of Ottawa, Canada
May 25, 2021
3. Text: “Display categorized instructions and documentation”
Readers: Reader 1 and Reader 2: “I don’t get it! This is ambiguous.”
Acknowledged ambiguity (referred to as ACK)
Coordination Ambiguity (CA)
4. Text: “Categorize images with tags”
Readers: Reader 1 and Reader 2, each with their own interpretation (“My interpretation: …”)
Unacknowledged ambiguity (referred to as UNACK)
Prepositional-phrase Attachment Ambiguity (PAA)
5. Motivation
• Ambiguity in natural-language requirements can lead to misunderstandings and inconsistencies
• Requirements use domain-specific vocabulary
• Coordination Ambiguity (CA) and Prepositional-Phrase Attachment Ambiguity (PAA) are prevalent in requirements
• PAA is underexplored
6. Existing Work

Papers | Ambiguity type | Solution | Domain-specific corpus | Evaluation of UNACK ambiguity
RE: Ferrari and Esuli, 2019 (ASE’19); Toews and Hollan, 2019 (REFSQW’19); Jain et al., 2020 (REFSQW’20) | Lexical | Detection | Yes | No
RE: Yang et al., 2010 (ASE’10) | CA | Detection | No | No
NLP: Chantree et al., 2005 (RANLP’05); De Roeck, 2007 (RANLP’07); Agirre et al., 2008 (ACL’08); Calvo and Gelbukh, 2003 (CIARP’03); Pantel and Lin, 2000 (ACL’00) | PAA | Interpretation | No | No
NLP: Nakov and Hearst, 2005 (HLT’05) | CA & PAA | Interpretation | No | No
Our Work (ICSE’21) | CA & PAA | Detection & Interpretation | Yes | Yes
7. Contributions
• Detection of ambiguous requirements
• Automated text interpretations
• Standalone domain-specific corpus generation method: fully automated, no labelled data is needed
Significant improvement in accuracy: +33% in detection and +16% in interpretation; detecting ~90% of the UNACK ambiguity.
shorturl.at/bxyHU
10. Domain-specific Corpus Generation
Example requirement: “The satellite-navigation system will provide the accuracy monitoring necessary for civil navigation.”
[Figure: the domain keyword “satellite-navigation system” from the requirements document is matched to the Wikipedia article “Satellite navigation”; the corpus is then grown from that category, its sub-categories (e.g., “Automated navigation systems”, “Geocaching”), and neighboring categories (e.g., “Radio navigation”, “Satellite”).]
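The category-expansion idea on this slide can be sketched as a small breadth-first traversal. This is a minimal sketch only: it uses a toy in-memory category graph instead of the live Wikipedia API, and all category and article names besides those on the slide are invented for illustration.

```python
from collections import deque

# Toy category graph standing in for Wikipedia's category hierarchy.
# Article titles such as "GPS" and "Autopilot" are illustrative only.
SUBCATEGORIES = {
    "Satellite navigation": ["Automated navigation systems", "Geocaching"],
}
NEIGHBORING = {
    "Satellite navigation": ["Radio navigation", "Satellite"],
}
ARTICLES = {
    "Satellite navigation": ["GPS", "Galileo (satellite navigation)"],
    "Automated navigation systems": ["Autopilot"],
    "Geocaching": ["Geocaching"],
    "Radio navigation": ["Radio navigation"],
    "Satellite": ["Satellite"],
}

def build_corpus(seed_category, max_depth=1):
    """Collect article titles from the seed category, its sub-categories
    (up to max_depth), and its neighboring categories."""
    corpus, seen = [], set()
    queue = deque([(seed_category, 0)])
    # Neighboring categories join the frontier alongside the seed.
    for neighbor in NEIGHBORING.get(seed_category, []):
        queue.append((neighbor, 0))
    while queue:
        category, depth = queue.popleft()
        if category in seen:
            continue
        seen.add(category)
        corpus.extend(ARTICLES.get(category, []))
        if depth < max_depth:
            for sub in SUBCATEGORIES.get(category, []):
                queue.append((sub, depth + 1))
    return corpus

print(build_corpus("Satellite navigation"))
```

In a real deployment the dictionaries would be replaced by calls to the Wikipedia API, but the traversal logic stays the same.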
11. Pattern Matching
[Figure: pipeline from the Requirements Document through Preprocessing, Pattern Matching (against a pattern list), Application of Heuristics (using WordNet and the Wikipedia-based domain-specific corpus), and Ambiguity Handling, yielding the final output: coordination and pp-attachment phrases classified as ambiguous or unambiguous, with interpretations.]
• A total of 39 structural patterns: 27 are collected from the NLP and RE literature, and 12 are enhanced
• Examples:
 o “LEO (noun) satellites (noun) and (conjunction) terminals (noun)”
 o “categorize (verb) outages (noun) with (preposition) standard (adjective) discrete (adjective) tags (noun)”
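Structural patterns like the two examples above can be matched as regular expressions over part-of-speech tag sequences. The sketch below is an assumption about one possible implementation: the tags are supplied by hand rather than by a real POS tagger, and the two pattern strings are illustrative simplifications of the slide's examples.

```python
import re

# Illustrative structural patterns over POS-tag sequences.
PATTERNS = {
    # coordination: noun + conjunction + noun, e.g. "satellites and terminals"
    "coordination": r"NOUN CONJ NOUN",
    # pp-attachment: verb + noun + preposition + (adjectives) + noun,
    # e.g. "categorize outages with standard discrete tags"
    "pp_attachment": r"VERB NOUN PREP (ADJ )*NOUN",
}

def match_patterns(tagged_tokens):
    """Return the names of structural patterns whose POS sequence
    occurs in the tagged phrase."""
    tag_string = " ".join(tag for _, tag in tagged_tokens)
    return [name for name, pat in PATTERNS.items()
            if re.search(pat, tag_string)]

# Hand-tagged tokens stand in for a real POS tagger's output.
phrase = [("categorize", "VERB"), ("outages", "NOUN"),
          ("with", "PREP"), ("standard", "ADJ"),
          ("discrete", "ADJ"), ("tags", "NOUN")]
print(match_patterns(phrase))  # the pp-attachment pattern fires
```

Phrases that match no pattern are filtered out before the heuristics on the next slides are applied.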
12. Application of Heuristics
[Figure: the processing pipeline repeated from the previous slide.]
• Two heuristics are novel, and eight are consolidated and optimized from the existing NLP & RE literature (Chantree et al., 2005; Kilgarriff, 2003; Yang et al., 2010; Okumura and Muraki, 1994; Agirre et al., 2008; Calvo and Gelbukh, 2003)

Type | CA | PAA
Corpus-based | Coordination frequency; Collocation frequency | Preposition co-occurrence frequency; Prepositional-phrase co-occurrence frequency
Semantics-based | Distributional similarity; Semantic-class enrichment | Semantic similarity
Syntax-based | Coordination and pp-attachment syntactic analysis (applies to both)
Morphology-based | Suffix matching | -
13. Examples of Corpus-based Heuristics
CA: the frequency of the modifier occurring with the closest conjunct
 (1) “project manager and designer”
 (2) “technical configuration and installation”
PAA: the frequency of the preposition occurring with the preceding verb versus with the preceding noun
 (1) “provide the user with a valid option”
 (2) “maximize the resilience of the system”
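The corpus-based PAA heuristic above can be sketched as a simple frequency comparison. This is a minimal, assumed implementation: the tiny corpus and the two-token co-occurrence window are invented for illustration; in the approach, the counts would come from the Wikipedia-derived domain-specific corpus.

```python
# A tiny stand-in corpus; the sentences are invented for illustration.
CORPUS = (
    "provide users with options . "
    "provide operators with alerts . "
    "the user with administrator rights logs in ."
).split()

def cooccurrence(word, prep, window=2):
    """Count how often `prep` appears within `window` tokens after `word`."""
    return sum(
        1
        for i, tok in enumerate(CORPUS)
        if tok == word and prep in CORPUS[i + 1:i + 1 + window]
    )

def paa_attachment(verb, noun, prep):
    """Corpus-based PAA heuristic: attach the prepositional phrase to
    whichever of verb/noun co-occurs more often with the preposition."""
    verb_freq = cooccurrence(verb, prep)
    noun_freq = cooccurrence(noun, prep)
    if verb_freq > noun_freq:
        return "verb attachment"
    if noun_freq > verb_freq:
        return "noun attachment"
    return "ambiguous"

print(paa_attachment("provide", "user", "with"))  # "verb attachment"
```

With these toy counts, “with” follows “provide” more often than “user”, so the prepositional phrase is attached to the verb, as in example (1) above.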
16. Document Collection
• We evaluate our approach on 20 RDs (requirements documents) with ~5,000 requirements from 7 diverse domains
• ~25% of the requirements contain coordination or pp-attachment phrases
• Our dataset was annotated by two external annotators
• ~62% of the phrases deemed ambiguous have unacknowledged ambiguity
17. RQ1. What configuration of our approach yields the most accurate results for ambiguity handling?

Corpus | Patterns | Heuristics | CA P (%) | CA R (%) | PAA P (%) | PAA R (%)
Domain-specific corpus | Collected + enhanced | Optimized | 79.8 | 87.9 | 80.3 | 90.1
British National Corpus (baseline) | - | - | 51.6 | 63.8 | 53.3 | 59.6
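The P and R columns in the table are the standard precision and recall measures. As a reminder of how they are computed, here is a two-line sketch; the counts below are illustrative only (not taken from the paper's evaluation), chosen so the percentages land near the domain-specific-corpus CA figures.

```python
def precision_recall(tp, fp, fn):
    """Precision = TP / (TP + FP); recall = TP / (TP + FN)."""
    return tp / (tp + fp), tp / (tp + fn)

# Illustrative counts only: 80 true positives, 20 false positives,
# 11 false negatives.
p, r = precision_recall(80, 20, 11)
print(f"P = {p:.1%}, R = {r:.1%}")  # P = 80.0%, R = 87.9%
```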
18. RQ2. How effective is our approach at detecting unacknowledged ambiguity?
• We compare the phrases deemed ambiguous by our approach against the phrases whose interpretation had disagreements in our ground truth
• Our approach can detect about 87% of the CA phrases and 91% of the PAA phrases that have unacknowledged ambiguity