Extracting Domain Models from Natural-Language Requirements: Approach and Industrial Evaluation

Chetan Arora, Mehrdad Sabetzadeh, Lionel Briand, Frank Zimmer
Interdisciplinary Centre for Security, Reliability and Trust
Luxembourg
October 6, 2016

Domain Models
A domain model is a visual representation of the conceptual entities or real-world objects in a domain of interest.
[Figure: example domain model for a satellite system. Concepts shown: Satellite, Ground Station, S&T Station, Data Ground Station, Feeder Link, Satellite Control Centre. Example association: "transfers user requests to" (multiplicity 1 to 1). Example attribute: location.]
Model elements highlighted on the slide:
• Concepts
• Aggregations
• Associations
• Generalizations
• Attributes

Context
[Figure, labelled "Ideal": Requirements Analysts, NL Requirements Document, and Domain Model (Classes A to D with a 1 to * Relation). Activities shown: Specify Requirements, Build Domain Model.]

Context
[Figure, labelled "Practice": Requirements Analysts, NL Requirements Document, and Domain Model (Classes A to D with a 1 to * Relation). Activities shown: Build Domain Model, Specify Requirements.]

Problem Definition
• Manually building domain models is laborious
• Automated support is required for building domain models

State of the Art
• Multiple approaches exist for extracting domain models or similar variants from requirements using extraction rules
• The majority assume a specific structure, e.g., restricted NL
• Extraction of direct relations, (mostly) at the level of words
• Little empirical insight into real requirements

Concepts
The simulator shall maintain the scheduled sessions,
the active session and also the list of sessions that
have been already handled.
• All subjects in the requirements are concepts
• All recurring nouns in the requirements are concepts
[Slide labels: Noun Phrases; Filtering]
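
The two heuristics for concepts can be sketched in a few lines of Python. This is a minimal illustration, not the authors' implementation: the function name `candidate_concepts`, the `min_count` threshold, and the hand-annotated subjects and lemmatized nouns are all assumptions; a real pipeline would obtain them from an NLP parser.

```python
from collections import Counter

def candidate_concepts(requirements, min_count=2):
    """Collect subjects and recurring nouns as candidate concepts.

    `requirements` is a list of dicts with a hand-annotated 'subject' and
    a list of lemmatized 'nouns' (an assumed input format).
    """
    subjects = {r["subject"] for r in requirements}
    counts = Counter(noun for r in requirements for noun in r["nouns"])
    recurring = {noun for noun, n in counts.items() if n >= min_count}
    return sorted(subjects | recurring)

# Hand annotations for two requirements quoted on the slides.
requirements = [
    {"subject": "simulator",
     "nouns": ["simulator", "session", "session", "list", "session"]},
    {"subject": "system operator",
     "nouns": ["system", "operator", "simulator", "configuration"]},
]
concepts = candidate_concepts(requirements)
print(concepts)
```

Here "session" and "simulator" qualify as recurring nouns, and "system operator" qualifies only because it is a subject.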

Associations
The simulator shall maintain the scheduled sessions,
the active session and also the list of sessions that
have been already handled.
[Figure: Simulator "maintain" Scheduled Session, multiplicity 1 to *]
Direct Relations
All transitive verbs are associations.
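
A rough sketch of the transitive-verb rule, assuming dependencies are available as (relation, head, dependent) triples. The triple format and the name `extract_associations` are illustrative assumptions, and the triples are written by hand rather than produced by a parser.

```python
def extract_associations(dependencies):
    """Pair each verb's subject with its direct objects.

    Returns (source, label, target) triples, one per direct object.
    """
    subjects, objects = {}, {}
    for relation, head, dependent in dependencies:
        if relation == "nsubj":
            subjects[head] = dependent
        elif relation == "dobj":
            objects.setdefault(head, []).append(dependent)
    return [(subjects[verb], verb, obj)
            for verb in subjects
            for obj in objects.get(verb, [])]

# Hand-written dependencies for the "maintain" example.
deps = [
    ("nsubj", "maintain", "Simulator"),
    ("dobj", "maintain", "Scheduled Session"),
]
associations = extract_associations(deps)
print(associations)
```

A verb with a subject but no direct object (an intransitive use) yields no association.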

Aggregations
The library contains books.
[Figure: aggregation between Library and Book]
Specific Patterns
Requirements with patterns such as “contains”, “made up of”, “include”, […]
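
Pattern-based rules of this kind can be approximated with a regular expression. The sketch below is an assumption-laden toy: it only handles sentences of the form "The X contains Y.", the name `extract_aggregation` is hypothetical, and the singularization (dropping a trailing "s") is deliberately naive.

```python
import re

# Covers only the patterns named on the slide, in their simplest form.
AGGREGATION_PATTERN = re.compile(
    r"^The (?P<whole>\w+) (?:contains?|includes?|is made up of) (?P<part>\w+)\.$",
    re.IGNORECASE,
)

def extract_aggregation(sentence):
    """Return (whole, part) if the sentence matches an aggregation pattern."""
    match = AGGREGATION_PATTERN.match(sentence)
    if match is None:
        return None
    part = match.group("part")
    if part.endswith("s"):  # naive singularization, for illustration only
        part = part[:-1]
    return (match.group("whole").capitalize(), part.capitalize())

print(extract_aggregation("The library contains books."))
```

Because the rule depends on specific wording, it stays silent on requirements that express containment in any other way.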

Attributes
Book’s title
• Genitive cases: NP’s NP, NP of NP
[Figure: class Book with attribute "title"]
Aggregations ???
system’s component
[Figure: System and Component]
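
The genitive forms can be detected with two small regular expressions. This sketch (the name `genitive_pairs` and the single-word noun phrases are simplifying assumptions) only finds owner and owned pairs; as the "system’s component" example suggests, the syntax alone does not say whether a pair is an attribute or an aggregation.

```python
import re

# Single-word noun phrases only; both straight and curly apostrophes.
GENITIVE = re.compile(r"(?P<owner>\w+)['’]s (?P<owned>\w+)")
OF_FORM = re.compile(r"(?P<owned>\w+) of (?:the )?(?P<owner>\w+)")

def genitive_pairs(text):
    """Return (owner, owned) pairs for "NP's NP" and "NP of NP" forms."""
    pairs = []
    for pattern in (GENITIVE, OF_FORM):
        for match in pattern.finditer(text):
            pairs.append((match.group("owner").capitalize(),
                          match.group("owned").lower()))
    return pairs

print(genitive_pairs("Book’s title"))
print(genitive_pairs("system’s component"))
```

Both inputs produce the same kind of pair, which is exactly why a further decision step is needed to separate attributes from aggregations.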

Approach
[Figure: processing pipeline with three steps.
1. Process Requirements Statements: NL Requirements in; Phrasal Structure and Dependencies out
2. Lift Dependencies to Semantic Units: Phrase-level Dependencies out
3. Construct Domain Model: uses Extraction Rules; Domain Model out (Classes A to D with a 1 to * Relation)]

Grammatical Dependencies
The system operator shall initialize the simulator configuration.
Dependencies: nsubj(initialize, operator), dobj(initialize, configuration)
[Figure: Operator "initialize" Configuration]

Lift Dependencies to Semantic Units
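The system operator shall initialize the simulator configuration.
Word level: nsubj(initialize, operator), dobj(initialize, configuration)
[Figure: Operator "initialize" Configuration]
Phrase level: nsubj(initialize, system operator), dobj(initialize, simulator configuration)
[Figure: System Operator "initialize" Simulator Configuration]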
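
The lifting step can be illustrated as a lookup from each head word to the noun phrase that contains it. This is a simplified sketch, not the approach's actual algorithm: `lift_dependencies` is a hypothetical name, the noun phrases are supplied by hand, and taking the last word of a phrase as its head is an assumption that holds for these examples only.

```python
def lift_dependencies(dependencies, noun_phrases):
    """Replace each dependent word with the noun phrase it heads.

    Assumes the head of a noun phrase is its last word.
    """
    head_to_phrase = {phrase.split()[-1].lower(): phrase
                      for phrase in noun_phrases}
    return [(relation, head, head_to_phrase.get(dependent.lower(), dependent))
            for relation, head, dependent in dependencies]

word_level = [
    ("nsubj", "initialize", "operator"),
    ("dobj", "initialize", "configuration"),
]
phrases = ["System Operator", "Simulator Configuration"]
phrase_level = lift_dependencies(word_level, phrases)
print(phrase_level)
```

Words that head no known phrase are passed through unchanged, so the function never loses a dependency.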

New Rule - N1
The system operator shall be able to initialize the simulator configuration, and to edit the existing configuration.
[Figure: System Operator "able" ????]
[Figure: System Operator "initialize" Simulator Configuration; System Operator "edit" Existing Configuration]

Link Paths
The simulator shall send log messages to the database via the monitoring interface.
[Figure: three extracted relations]
• Simulator "send" Log Message
• Simulator "send log message to" Database
• Simulator "send log message to database via" Monitoring Interface
IR-Domain
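
The three relations on the link-paths slide follow a simple pattern: each prepositional phrase extends the label of the previous relation. A minimal sketch, assuming the verb, direct object, and prepositional phrases have already been identified (the name `link_paths` and the input format are hypothetical):

```python
def link_paths(subject, verb, direct_object, prep_phrases):
    """Build one relation per object, extending the label along the path."""
    relations = [(subject, verb, direct_object)]
    label = f"{verb} {direct_object.lower()}"
    for preposition, noun_phrase in prep_phrases:
        label = f"{label} {preposition}"
        relations.append((subject, label, noun_phrase))
        # Fold the phrase into the label for the next hop on the path.
        label = f"{label} {noun_phrase.lower()}"
    return relations

relations = link_paths(
    "Simulator", "send", "Log Message",
    [("to", "Database"), ("via", "Monitoring Interface")],
)
for relation in relations:
    print(relation)
```

With no prepositional phrases, the result is just the direct relation, so the same function covers the transitive-verb case.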

RQ1: How frequently are different extraction rules triggered?
[Figure: four requirements documents, shown as a group of 3 case studies and 1 case study, with 380, 158, 138, and 110 requirements]

RQ1: How frequently are different extraction rules triggered?
• Pattern-based rules were never or seldom triggered
• Generic rules were triggered most often, e.g., transitive verbs, genitive cases, and all new rules (including link paths)

RQ2: How useful is our approach?
[Figure: 1 case study, 50 requirements, 213 relations]
• Interview survey
• Correctness and relevance of each relation
• Missing relations in each requirement

Correctness (%) - 90% (avg.)
[Chart: correctness per extraction rule. Existing rules: E1 to E6, some marked "Specific Pattern". New rules: N1, N2, N3, LP.]

Observed Reasons for Incorrectness
• NLP mistakes
• Wrong relation extracted

Relevance (%) - 36% (avg.)
[Chart: relevance per extraction rule. Existing rules: E1 to E6, some marked "Specific Pattern". New rules: N1, N2, N3, LP.]

Observed Reasons for Irrelevance
[Figure: TCP/IP Protocol and SNMP, labelled "Common Knowledge"]

Missing Relations
• 8% of relations missed
• 92% of relevant relations extracted

Conclusion
• Our extensions are of practical significance for domain model extraction
• The empirical evaluation (in industrial settings) provides insights into the usefulness of existing rules
• An important observation about automated model extractors: relevance
• Future work: look into ways to improve relevance
