Using Domain-specific Corpora for
Improved Handling of Ambiguity in
Requirements
Saad Ezzini, Sallam Abualhaija, Chetan Arora,
Mehrdad Sabetzadeh*, Lionel Briand*
{saad.ezzini, sallam.abualhaija}@uni.lu
University of Luxembourg, Luxembourg
Also with *University of Ottawa, Canada
May 25, 2021
Introduction
Coordination Ambiguity (CA)
Text: “Display categorized instructions and documentation”
Readers (Reader 1, Reader 2): “I don’t get it!” / “This is ambiguous.”
Acknowledged (referred to as ACK)
Prepositional-phrase Attachment Ambiguity (PAA)
Text: “Categorize images with tags”
Readers (Reader 1, Reader 2): each reader states “My interpretation:” followed by a different reading of the text
Unacknowledged (referred to as UNACK)
Motivation
• Ambiguity in natural-language requirements can lead to misunderstandings and inconsistencies
• Requirements use domain-specific vocabulary
• Coordination Ambiguity (CA) and Prepositional-phrase Attachment Ambiguity (PAA) are prevalent in requirements
• PAA is underexplored
Existing Work
| Area | Papers | Ambiguity type | Solution | Domain-specific corpus | Evaluation of UNACK ambiguity |
|---|---|---|---|---|---|
| RE | Ferrari and Esuli, 2019 (ASE’19); Toews and Hollan, 2019 (REFSQW’19); Jain et al., 2020 (REFSQW’20) | Lexical | Detection | Yes | No |
| RE | Yang et al., 2010 (ASE’10) | CA | Detection | No | No |
| NLP | Chantree et al., 2005 (RANLP’05); DeRoeck, 2007 (RANLP’07); Agirre et al., 2008 (ACL’08); Calvo and Gelbukh, 2003 (CIARP’03); Pantel and Lin, 2000 (ACL’00) | PAA | Interpretation | No | No |
| NLP | Nakov and Hearst (HLT’05) | CA & PAA | Interpretation | No | No |
| | Our Work (ICSE’21) | CA & PAA | Detection & Interpretation | Yes | Yes |
Contributions
• Detection of ambiguous requirements
• Automated text interpretations
• Standalone domain-specific corpus generation method
• Fully automated
• No labelled data is needed
• Significant improvement in accuracy: +33% in detection and +16% in interpretation
• Detecting ~90% of the UNACK ambiguity
shorturl.at/bxyHU
Approach
Overview
[Figure: pipeline overview. Inputs: Requirements Document, Wikipedia Articles, Pattern list, WordNet. Steps: Preprocessing, Domain-specific Corpus Generation, Pattern Matching, Application of Heuristics, Ambiguity Handling. Intermediate artifacts: coordination & pp-attachment phrases; phrases that match patterns; phrases with interpretations. Final Output: phrases marked Ambiguous or Unambiguous.]
Domain-specific Corpus Generation
Example requirement: “The satellite-navigation system will provide the accuracy monitoring necessary for civil navigation.”
[Figure: the keyword “satellite-navigation system” is looked up in Wikipedia. Labels: Matching article (Satellite navigation), Category, Sub-categories, Neighboring Categories. Other titles shown: Automated navigation systems, Geocaching, Radio navigation, Satellite.]
Pattern Matching
• A total of 39 structural patterns: 27 are collected from the NLP and RE literature, and 12 are enhanced
• Examples:
o “LEO (noun) satellites (noun) and (conjunction) terminals (noun)”
o “categorize (verb) outages (noun) with (preposition) standard (adjective) discrete (adjective) tags (noun)”
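As a rough illustration of how a structural pattern can be checked against a phrase, the sketch below treats a pattern as a sequence of coarse part-of-speech tags and scans pre-tagged tokens for a contiguous match. The tag names, the two patterns, and the function `find_matches` are assumptions made for this example; the slide does not list the 39 patterns or say how they are encoded.

```python
from typing import List, Sequence, Tuple

Tagged = Tuple[str, str]  # (word, coarse part-of-speech tag)

# Two hypothetical patterns written as tag sequences. They mirror the two
# slide examples only; they are not taken from the actual pattern list.
CA_PATTERN = ("NOUN", "NOUN", "CONJ", "NOUN")
PAA_PATTERN = ("VERB", "NOUN", "PREP", "ADJ", "ADJ", "NOUN")


def find_matches(tagged: Sequence[Tagged], pattern: Sequence[str]) -> List[str]:
    """Return every contiguous word span whose tag sequence equals `pattern`."""
    n = len(pattern)
    matches = []
    for start in range(len(tagged) - n + 1):
        window = tagged[start:start + n]
        if tuple(tag for _, tag in window) == tuple(pattern):
            matches.append(" ".join(word for word, _ in window))
    return matches


# Tokens are tagged by hand here; a real pipeline would use a POS tagger.
ca_example = [("LEO", "NOUN"), ("satellites", "NOUN"),
              ("and", "CONJ"), ("terminals", "NOUN")]
paa_example = [("categorize", "VERB"), ("outages", "NOUN"), ("with", "PREP"),
               ("standard", "ADJ"), ("discrete", "ADJ"), ("tags", "NOUN")]

print(find_matches(ca_example, CA_PATTERN))
print(find_matches(paa_example, PAA_PATTERN))
```

Encoding patterns as plain tag tuples keeps the matcher independent of any particular NLP library; only the tagging step would need to change.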
Application of Heuristics
• Two heuristics are novel, and eight are consolidated and optimized from the existing NLP & RE literature (Chantree et al., 2005; Kilgarriff, 2003; Yang et al., 2010; Okumura and Muraki, 1994; Agirre et al., 2008; Calvo and Gelbukh, 2003)

| Type | CA | PAA |
|---|---|---|
| Corpus-based | Coordination frequency | Preposition co-occurrence frequency |
| Corpus-based | Collocation frequency | Prepositional-phrase co-occurrence frequency |
| Semantics-based | Distributional similarity | Semantic-class enrichment |
| Semantics-based | Semantic similarity | |
| Syntax-based | Coordination and pp-attachment syntactic analysis (both CA and PAA) | |
| Morphology-based | Suffix matching | - |
Examples of Corpus-based Heuristics
CA
The frequency of the modifier occurring with the closest conjunct
(1) “project manager and designer”
(2) “technical configuration and installation”
PAA
The frequency of the preposition occurring with the preceding verb versus with the preceding noun
(1) “provide the user with a valid option”
(2) “maximize the resilience of the system”
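The PAA heuristic above can be sketched as a simple counting comparison. The code below is a minimal illustration under stated assumptions: the toy corpus is invented for the example, co-occurrence is approximated as "the preposition appears within a few tokens after the head word", and the names `cooccurrence` and `attach` are hypothetical. It is not the implementation used in the approach.

```python
from typing import List, Optional

# Invented toy corpus standing in for a domain-specific corpus.
TOY_CORPUS = [
    "the service shall provide operators with alerts",
    "we provide customers with support",
    "the resilience of the network is high",
    "engineers measure the resilience of the system",
]


def cooccurrence(corpus: List[str], head: str, prep: str, window: int = 3) -> int:
    """Count occurrences of `head` followed by `prep` within `window` tokens."""
    count = 0
    for sentence in corpus:
        tokens = sentence.lower().split()
        for i, tok in enumerate(tokens):
            if tok == head and prep in tokens[i + 1:i + 1 + window]:
                count += 1
    return count


def attach(corpus: List[str], verb: str, noun: str, prep: str) -> Optional[str]:
    """Return 'verb' or 'noun' for the more frequent pairing, None on a tie."""
    verb_count = cooccurrence(corpus, verb, prep)
    noun_count = cooccurrence(corpus, noun, prep)
    if verb_count > noun_count:
        return "verb"
    if noun_count > verb_count:
        return "noun"
    return None  # the heuristic cannot decide from this corpus


# "provide the user with a valid option"
print(attach(TOY_CORPUS, "provide", "user", "with"))
# "maximize the resilience of the system"
print(attach(TOY_CORPUS, "maximize", "resilience", "of"))
```

On this toy corpus the first phrase leans toward verb attachment and the second toward noun attachment; with a different corpus the counts, and therefore the outcome, may differ.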
Ambiguity Handling
Input: coordination & pp-attachment phrases
• Ambiguous: phrases that match patterns AND are not interpretable
• Unambiguous: phrases that don’t match patterns OR are interpretable
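The decision rule on this slide can be written as a single boolean condition. The sketch below is only a restatement of that rule; the function name `classify` and its two boolean inputs are assumptions, and how "interpretable" is decided by the heuristics is outside its scope.

```python
def classify(matches_pattern: bool, interpretable: bool) -> str:
    """Label a phrase following the AND/OR rule shown on the slide."""
    # Ambiguous only when a pattern matches AND no interpretation is found;
    # every other combination falls under the OR branch (unambiguous).
    if matches_pattern and not interpretable:
        return "Ambiguous"
    return "Unambiguous"


for matches in (True, False):
    for interp in (True, False):
        print(matches, interp, classify(matches, interp))
```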
Empirical Evaluation
Document collection
• We evaluate our approach on 20 requirements documents (RDs) with ~5000 requirements from 7 diverse domains
• ~25% of the requirements contain coordination or pp-attachment phrases
• Our dataset was annotated by two external annotators
• ~62% of the phrases deemed ambiguous have unacknowledged ambiguity
RQ1. What configuration of our approach yields the most accurate results for ambiguity handling?
P = precision, R = recall.

| Corpus | Patterns | Heuristics | CA P (%) | CA R (%) | PAA P (%) | PAA R (%) |
|---|---|---|---|---|---|---|
| Domain-Specific Corpus | Collected + enhanced | optimized | 79.8 | 87.9 | 80.3 | 90.1 |
| British National Corpus (Baseline) | | | 51.6 | 63.8 | 53.3 | 59.6 |
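For readers less familiar with the two metrics reported above, the sketch below shows how precision and recall are typically computed when predicted ambiguous phrases are compared with a ground truth. The phrase identifiers are invented for illustration and are unrelated to the evaluation data; `precision_recall` is a hypothetical helper.

```python
from typing import Set, Tuple


def precision_recall(predicted: Set[str], truth: Set[str]) -> Tuple[float, float]:
    """Compute precision and recall of `predicted` against `truth`."""
    true_positives = len(predicted & truth)
    precision = true_positives / len(predicted) if predicted else 0.0
    recall = true_positives / len(truth) if truth else 0.0
    return precision, recall


# Hypothetical phrase IDs: four flagged by a tool, five ambiguous per annotators.
predicted = {"p1", "p2", "p3", "p4"}
truth = {"p2", "p3", "p4", "p5", "p6"}

p, r = precision_recall(predicted, truth)
print(f"P = {p:.1%}, R = {r:.1%}")
```

Here three of the four flagged phrases are correct (precision 75%) and three of the five truly ambiguous phrases are found (recall 60%).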
RQ2. How effective is our approach at detecting unacknowledged ambiguity?
• Comparing the phrases deemed ambiguous by our approach against the phrases that had disagreements on the interpretation in our ground truth
• Our approach can detect about 87% of the CA phrases and 91% of the PAA phrases that have unacknowledged ambiguity
Conclusion
BoxLang: Review our Visionary Licenses of 2024BoxLang: Review our Visionary Licenses of 2024
BoxLang: Review our Visionary Licenses of 2024
 
LORRAINE ANDREI_LEQUIGAN_HOW TO USE ZOOM
LORRAINE ANDREI_LEQUIGAN_HOW TO USE ZOOMLORRAINE ANDREI_LEQUIGAN_HOW TO USE ZOOM
LORRAINE ANDREI_LEQUIGAN_HOW TO USE ZOOM
 
Providing Globus Services to Users of JASMIN for Environmental Data Analysis
Providing Globus Services to Users of JASMIN for Environmental Data AnalysisProviding Globus Services to Users of JASMIN for Environmental Data Analysis
Providing Globus Services to Users of JASMIN for Environmental Data Analysis
 
Orion Context Broker introduction 20240604
Orion Context Broker introduction 20240604Orion Context Broker introduction 20240604
Orion Context Broker introduction 20240604
 
OpenMetadata Community Meeting - 5th June 2024
OpenMetadata Community Meeting - 5th June 2024OpenMetadata Community Meeting - 5th June 2024
OpenMetadata Community Meeting - 5th June 2024
 
Graspan: A Big Data System for Big Code Analysis
Graspan: A Big Data System for Big Code AnalysisGraspan: A Big Data System for Big Code Analysis
Graspan: A Big Data System for Big Code Analysis
 
Text-Summarization-of-Breaking-News-Using-Fine-tuning-BART-Model.pptx
Text-Summarization-of-Breaking-News-Using-Fine-tuning-BART-Model.pptxText-Summarization-of-Breaking-News-Using-Fine-tuning-BART-Model.pptx
Text-Summarization-of-Breaking-News-Using-Fine-tuning-BART-Model.pptx
 
E-commerce Application Development Company.pdf
E-commerce Application Development Company.pdfE-commerce Application Development Company.pdf
E-commerce Application Development Company.pdf
 
Climate Science Flows: Enabling Petabyte-Scale Climate Analysis with the Eart...
Climate Science Flows: Enabling Petabyte-Scale Climate Analysis with the Eart...Climate Science Flows: Enabling Petabyte-Scale Climate Analysis with the Eart...
Climate Science Flows: Enabling Petabyte-Scale Climate Analysis with the Eart...
 

Using Domain-specific Corpora for Improved Handling of Ambiguity in Requirements

  • 1. Using Domain-specific Corpora for Improved Handling of Ambiguity in Requirements
      Saad Ezzini, Sallam Abualhaija, Chetan Arora, Mehrdad Sabetzadeh*, Lionel Briand*
      {saad.ezzini, sallam.abualhaija}@uni.lu
      Software verification & validation, University of Luxembourg, Luxembourg
      *Also with University of Ottawa, Canada
      May 25, 2021
  • 3. Coordination Ambiguity (CA)
      • Text: “Display categorized instructions and documentation”
      • Readers 1 and 2: “I don’t get it! This is ambiguous.”
      • The ambiguity is acknowledged (referred to as ACK)
  • 4. Prepositional-phrase Attachment Ambiguity (PAA)
      • Text: “Categorize images with tags”
      • Readers 1 and 2 each give their own interpretation (“My interpretation: ...”)
      • The ambiguity is unacknowledged (referred to as UNACK)
  • 5. Motivation
      • Ambiguity in natural-language requirements can lead to misunderstandings and inconsistencies
      • Requirements use domain-specific vocabulary
      • Coordination Ambiguity (CA) and Prepositional-Phrase Attachment Ambiguity (PAA) are prevalent in requirements
      • PAA is underexplored
  • 6. Existing Work
      Table columns: papers; ambiguity type; solution; domain-specific corpus; evaluation of UNACK ambiguity
      • RE
          o Ferrari and Esuli, 2019 (ASE’19); Toews and Hollan, 2019 (REFSQW’19); Jain et al., 2020 (REFSQW’20): Lexical; Detection; Yes; No
          o Yang et al., 2010 (ASE’10): CA; Detection; No; No
      • NLP
          o Chantree et al., 2005 (RANLP’05); DeRoeck, 2007 (RANLP’07); Agirre et al., 2008 (ACL’08); Calvo and Gelbukh, 2003 (CIARP’03); Pantel and Lin, 2000 (ACL’00): PAA; Interpretation; No; No
          o Nakov and Hearst (HLT’05): CA & PAA; Interpretation; No; No
      • Our Work (ICSE’21): CA & PAA; Detection & Interpretation; Yes; Yes
  • 7. Contributions
      • Detection of ambiguous requirements
      • Automated text interpretations
      • Standalone domain-specific corpus generation method
          o Fully automated
          o No labelled data is needed
      • Significant improvement in accuracy: +33% in detection and +16% in interpretation
      • Detecting ~90% of the UNACK ambiguity
      • shorturl.at/bxyHU
  • 9. Overview (pipeline diagram)
      • Steps: Preprocessing; Pattern Matching; Application of Heuristics; Ambiguity Handling; Domain-specific Corpus Generation
      • Inputs and resources: Requirements Document; Wikipedia Articles; Pattern list; WordNet
      • Intermediate results: coordination & pp-attachment phrases; phrases that match patterns; phrases with interpretations
      • Final Output: Ambiguous / Unambiguous
  • 10. Domain-specific Corpus Generation
      • Example from a Requirements Document: “The satellite-navigation system will provide the accuracy monitoring necessary for civil navigation.”
      • Keyword: “satellite-navigation system”; matching article: Satellite navigation
      • Diagram elements: Wikipedia Category; Sub-categories; Neighboring Categories
      • Labels shown in the diagram: Satellite navigation; Automated navigation systems; Geocaching; Radio navigation; Satellite
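The idea of expanding from a matching article's category into its sub-categories can be sketched as a bounded breadth-first walk. This is only an illustration: the category graph below is invented toy data (not actual Wikipedia content), the function name `collect_articles` and the `max_depth` parameter are hypothetical, and neighboring categories are not modelled.

```python
from collections import deque


def collect_articles(seed_category, subcategories, members, max_depth=1):
    """Breadth-first walk over a category graph, gathering article titles."""
    seen = {seed_category}
    queue = deque([(seed_category, 0)])
    articles = []
    while queue:
        category, depth = queue.popleft()
        # Add the articles filed directly under this category, without duplicates.
        for title in members.get(category, []):
            if title not in articles:
                articles.append(title)
        # Descend into sub-categories only while under the depth limit.
        if depth < max_depth:
            for child in subcategories.get(category, []):
                if child not in seen:
                    seen.add(child)
                    queue.append((child, depth + 1))
    return articles


# Toy graph, invented for illustration only (assumption, not Wikipedia data).
subcategories = {
    "navigation": ["satellite-navigation", "radio-navigation"],
    "satellite-navigation": ["geocaching"],
}
members = {
    "navigation": ["Automated navigation systems"],
    "satellite-navigation": ["Satellite navigation"],
    "radio-navigation": ["Radio navigation"],
    "geocaching": ["Geocaching"],
}

corpus_titles = collect_articles("navigation", subcategories, members, max_depth=1)
```

Capping the depth keeps the corpus close to the domain of the seed keyword; a larger `max_depth` pulls in more distant categories.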
  • 11. Pattern Matching
      • A total of 39 structural patterns: 27 are collected from the NLP and RE literature, and 12 are enhanced
      • Examples:
          o “LEO (noun) satellites (noun) and (conjunction) terminals (noun)”
          o “categorize (verb) outages (noun) with (preposition) standard (adjective) discrete (adjective) tags (noun)”
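A minimal sketch of structural pattern matching over part-of-speech tags, assuming the phrases have already been tagged. The two patterns below are derived only from the two example phrases on this slide; they are not the actual list of 39 patterns, and the tag names and the function `find_matches` are hypothetical.

```python
# Illustrative patterns only (assumption): one per example phrase on the slide.
PATTERNS = {
    "CA": ["NOUN", "NOUN", "CONJ", "NOUN"],
    "PAA": ["VERB", "NOUN", "PREP", "ADJ", "ADJ", "NOUN"],
}


def find_matches(tagged):
    """Return (ambiguity type, phrase) for every window whose tags equal a pattern."""
    tags = [tag for _, tag in tagged]
    hits = []
    for label, pattern in PATTERNS.items():
        n = len(pattern)
        # Slide a window of the pattern's length over the tag sequence.
        for i in range(len(tags) - n + 1):
            if tags[i:i + n] == pattern:
                phrase = " ".join(word for word, _ in tagged[i:i + n])
                hits.append((label, phrase))
    return hits


# Hand-tagged tokens; a real pipeline would obtain tags from a POS tagger.
ca_tokens = [("LEO", "NOUN"), ("satellites", "NOUN"),
             ("and", "CONJ"), ("terminals", "NOUN")]
paa_tokens = [("categorize", "VERB"), ("outages", "NOUN"), ("with", "PREP"),
              ("standard", "ADJ"), ("discrete", "ADJ"), ("tags", "NOUN")]

ca_hits = find_matches(ca_tokens)
paa_hits = find_matches(paa_tokens)
```

Exact tag-sequence comparison is the simplest choice here; patterns with optional or repeated elements would need a more flexible matcher.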
  • 12. Application of Heuristics
      • Two heuristics are novel, and eight are consolidated and optimized from the existing NLP & RE literature (Chantree et al., 2005; Kilgarriff, 2003; Yang et al., 2010; Okumura and Muraki, 1994; Agirre et al., 2008; Calvo and Gelbukh, 2003)
      • Heuristics by type:
          o Corpus-based: CA: coordination frequency, collocation frequency; PAA: preposition co-occurrence frequency, prepositional-phrase co-occurrence frequency
          o Semantics-based: CA: distributional similarity, semantic similarity; PAA: semantic-class enrichment
          o Syntax-based: CA and PAA: coordination and pp-attachment syntactic analysis
          o Morphology-based: CA: suffix matching; PAA: -
  • 13. Examples on Corpus-based Heuristics
      • CA: the frequency of the modifier occurring with the closest conjunct
          o (1) “project manager and designer”
          o (2) “technical configuration and installation”
      • PAA: the frequency of the preposition occurring with the preceding verb versus with the preceding noun
          o (1) “provide the user with a valid option”
          o (2) “maximize the resilience of the system”
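The PAA heuristic above compares how often the preposition occurs with the preceding verb versus the preceding noun. A rough sketch, assuming plain adjacent-word counts over a tiny invented corpus (the real approach uses a domain-specific corpus, and its exact counting and thresholds are not given on the slide); `bigram_counts` and `attach_preposition` are hypothetical names.

```python
from collections import Counter


def bigram_counts(sentences):
    """Count adjacent lower-cased word pairs across the corpus."""
    counts = Counter()
    for sentence in sentences:
        tokens = sentence.lower().split()
        counts.update(zip(tokens, tokens[1:]))
    return counts


def attach_preposition(verb, noun, prep, counts):
    """Pick the attachment site seen more often with the preposition.

    Returns "verb", "noun", or None when the counts give no preference.
    """
    verb_freq = counts[(verb, prep)]  # Counter yields 0 for unseen pairs
    noun_freq = counts[(noun, prep)]
    if verb_freq > noun_freq:
        return "verb"
    if noun_freq > verb_freq:
        return "noun"
    return None


# Invented toy corpus (assumption), standing in for a domain-specific corpus.
toy_corpus = [
    "the resilience of the network improved",
    "engineers measure resilience of the link",
    "operators maximize throughput",
]
counts = bigram_counts(toy_corpus)

# "maximize the resilience of the system": does "of" go with the verb or the noun?
decision = attach_preposition("maximize", "resilience", "of", counts)
```

Returning `None` on a tie leaves the phrase uninterpreted by this heuristic, so another heuristic (or the ambiguity-handling step) can decide.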
  • 14. Ambiguity Handling
      • Input: coordination & pp-attachment phrases
      • Ambiguous: phrases that match patterns AND are not interpretable
      • Unambiguous: phrases that don’t match patterns OR are interpretable
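Read as a decision rule, the AND/OR labels on this slide can be sketched as below. This assumes the labels pair up so that a phrase is ambiguous only when it matches a pattern and no heuristic yields an interpretation; the function name `classify` is hypothetical.

```python
def classify(matches_pattern: bool, interpretable: bool) -> str:
    """Label a coordination or pp-attachment phrase.

    Ambiguous: matches a pattern AND is not interpretable.
    Unambiguous: does not match a pattern OR is interpretable.
    """
    if matches_pattern and not interpretable:
        return "ambiguous"
    return "unambiguous"


# All four input combinations, to show the two conditions are complementary.
labels = {
    (matches, interp): classify(matches, interp)
    for matches in (True, False)
    for interp in (True, False)
}
```

The two conditions are logical complements of each other, so every phrase receives exactly one label.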
  • 16. Document collection
      • We evaluate our approach on 20 requirements documents (RDs) with ~5000 requirements from 7 diverse domains
      • ~25% of the requirements contain coordination or pp-attachment phrases
      • Our dataset was annotated by two external annotators
      • ~62% of the phrases deemed ambiguous have unacknowledged ambiguity
  • 17. RQ1. What configuration of our approach yields the most accurate results for ambiguity handling?
      Precision (P) and recall (R) per corpus; patterns: collected + enhanced; heuristics: optimized
      • Domain-specific corpus: CA: P = 79.8%, R = 87.9%; PAA: P = 80.3%, R = 90.1%
      • British National Corpus (baseline): CA: P = 51.6%, R = 63.8%; PAA: P = 53.3%, R = 59.6%
  • 18. RQ2. How effective is our approach at detecting unacknowledged ambiguity?
      • Comparing the phrases deemed ambiguous by our approach against the phrases that had disagreements on the interpretation in our ground truth
      • Our approach can detect about 87% of the CA phrases and 91% of the PAA phrases that have unacknowledged ambiguity
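The comparison described for RQ2 amounts to a recall computation: the share of phrases with annotator disagreement that the approach also flags as ambiguous. A small sketch with made-up phrase identifiers (not the study's data); `unack_recall` is a hypothetical helper.

```python
def unack_recall(flagged, unack):
    """Fraction of unacknowledged-ambiguity phrases that were flagged as ambiguous."""
    if not unack:
        raise ValueError("no unacknowledged-ambiguity phrases to evaluate against")
    return len(flagged & unack) / len(unack)


# Made-up identifiers for illustration only (assumption).
flagged_by_approach = {"p1", "p2", "p4"}          # phrases the approach deems ambiguous
annotator_disagreement = {"p1", "p2", "p3", "p5"}  # ground-truth UNACK phrases

recall = unack_recall(flagged_by_approach, annotator_disagreement)
```

Phrases flagged by the approach but absent from the disagreement set (here `p4`) do not affect this measure; they would count against precision instead.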