Software Verification & Validation (SVV)
Extracting Domain Models from Natural-Language Requirements: Approach and Industrial Evaluation
Chetan Arora, Mehrdad Sabetzadeh, Lionel Briand, Frank Zimmer
Interdisciplinary Centre for Security, Reliability and Trust
Luxembourg
October 6, 2016
Domain Models
A domain model is a visual representation of conceptual entities or real-world objects in a domain of interest.
Elements of a domain model:
• Concepts
• Aggregations
• Associations
• Generalizations
• Attributes
[Figure: example domain model. Labels recovered from the figure: Satellite, Ground Station, S&T Station, Feeder Link, Data Ground Station, Satellite Control Centre, attribute "location", and an association "transfers user requests to" with multiplicities 1 and 1.]
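The elements listed above can be held in a small data structure. The following is a minimal sketch, not the authors' implementation: the class names `Concept`, `Relation`, and `DomainModel` are hypothetical, and the sample entries (Book, Library, Simulator) are borrowed from the examples used on later slides.

```python
from dataclasses import dataclass, field
from typing import Dict, List


@dataclass
class Concept:
    name: str
    attributes: List[str] = field(default_factory=list)


@dataclass
class Relation:
    kind: str  # "association", "aggregation", or "generalization"
    source: str
    target: str
    label: str = ""


@dataclass
class DomainModel:
    concepts: Dict[str, Concept] = field(default_factory=dict)
    relations: List[Relation] = field(default_factory=list)

    def add_concept(self, name: str, attributes=()) -> Concept:
        # Reuse an existing concept so repeated mentions do not create duplicates.
        concept = self.concepts.setdefault(name, Concept(name))
        for attribute in attributes:
            if attribute not in concept.attributes:
                concept.attributes.append(attribute)
        return concept

    def add_relation(self, kind: str, source: str, target: str, label: str = "") -> Relation:
        # Both endpoints are registered as concepts before the relation is stored.
        self.add_concept(source)
        self.add_concept(target)
        relation = Relation(kind, source, target, label)
        self.relations.append(relation)
        return relation


model = DomainModel()
model.add_concept("Book", ["title"])
model.add_relation("aggregation", "Library", "Book")
model.add_relation("association", "Simulator", "Scheduled Session", label="maintain")
```

Keeping concepts in a dictionary keyed by name is one simple way to merge mentions of the same concept coming from different requirements.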
Context
[Figure: ideal workflow. Labels: Requirements Analysts, NL Requirements Document, Domain Model (Class A to Class D, relation with multiplicities 1 and *). Steps: Build Domain Model, Specify Requirements.]
Context
[Figure: workflow in practice. Same labels. Steps: Specify Requirements, Build Domain Model.]
Problem Definition
• Manually building domain models is laborious
• Automated support is required for building domain models
State of the Art
• Multiple approaches exist for extracting domain models, or similar variants, from requirements using extraction rules
• The majority assume a specific structure, e.g., restricted NL
• They extract direct relations, (mostly) at the level of words
• Little empirical insight into real requirements
Existing Extraction Rules
Concepts
The simulator shall maintain the scheduled sessions,
the active session and also the list of sessions that
have been already handled.
• All subjects in the requirements are concepts
• All recurring nouns in the requirements are concepts
[Figure labels: Noun Phrases, Filtering]
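The two concept rules above can be sketched as follows. This is an illustration under stated assumptions: the subject and the lemmatized nouns are annotated by hand (a real pipeline would obtain them from an NLP parser), the function name `extract_concepts` is hypothetical, and "recurring" is taken to mean "appears more than once".

```python
from collections import Counter


def extract_concepts(requirements):
    """Apply the two rules to hand-annotated requirements.

    Each requirement is a dict with a 'subject' string and a 'nouns' list
    (lemmatized nouns, one entry per occurrence).
    """
    # Rule: all subjects are concepts.
    subjects = {req["subject"] for req in requirements}
    # Rule: all recurring nouns are concepts (assumed threshold: more than once).
    counts = Counter(noun for req in requirements for noun in req["nouns"])
    recurring = {noun for noun, count in counts.items() if count > 1}
    return sorted(subjects | recurring)


# Hand annotation of the example requirement from the slide.
requirement = {
    "subject": "simulator",
    "nouns": ["simulator", "session", "session", "list", "session"],
}
concepts = extract_concepts([requirement])
```

In this annotation "list" occurs once and is not a subject, so it is filtered out, while "session" recurs and "simulator" is the subject.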
Associations
The simulator shall maintain the scheduled sessions,
the active session and also the list of sessions that
have been already handled.
[Figure: association "maintain" from Simulator to Scheduled Session, with multiplicities 1 and *]
Direct Relations
All transitive verbs are associations.
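The transitive-verb rule can be illustrated on word-level dependency triples. A minimal sketch, assuming the dependencies are written by hand as `(relation, head, dependent)` triples with Stanford-style labels; `transitive_associations` is a hypothetical helper, not part of any parser's API.

```python
def transitive_associations(dependencies):
    """Return (subject, verb, object) for every verb with both nsubj and dobj."""
    subjects, objects = {}, {}
    for relation, head, dependent in dependencies:
        if relation == "nsubj":
            subjects.setdefault(head, []).append(dependent)
        elif relation == "dobj":
            objects.setdefault(head, []).append(dependent)
    # A verb counts as transitive here when it governs a subject and a direct object.
    return [
        (subject, verb, obj)
        for verb in subjects
        if verb in objects
        for subject in subjects[verb]
        for obj in objects[verb]
    ]


# Hand-written dependencies for "The simulator shall maintain the scheduled sessions".
deps = [
    ("nsubj", "maintain", "simulator"),
    ("aux", "maintain", "shall"),
    ("dobj", "maintain", "sessions"),
]
associations = transitive_associations(deps)
```

The result relates single words ("simulator", "sessions"), which is the word-level limitation noted for existing approaches.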
Aggregations
The library contains books.
[Figure: aggregation between Library and Book]
Specific Patterns
Requirements with patterns such as “contains”, “made up of”, “include”, […]
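A pattern-based rule of this kind can be approximated with a regular expression. The sketch below is an assumption-laden illustration: it covers only the three patterns named on the slide, expects a simple "The X contains Y." sentence shape, and does not singularize nouns; `find_aggregation` is a hypothetical name.

```python
import re

# Whole, pattern verb, optional article, part. Only simple sentences are handled.
AGGREGATION_PATTERN = re.compile(
    r"^the (?P<whole>[\w ]+?) "
    r"(?:contains?|is made up of|are made up of|includes?) "
    r"(?:the |a |an )?(?P<part>[\w ]+?)\.?$",
    re.IGNORECASE,
)


def find_aggregation(sentence):
    """Return (whole, part) if the sentence matches a pattern, else None."""
    match = AGGREGATION_PATTERN.match(sentence.strip())
    return (match["whole"], match["part"]) if match else None


result = find_aggregation("The library contains books.")
```

Because the rule fires only on these fixed surface patterns, a requirement phrased differently yields no aggregation at all.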
Attributes
Book’s title
• Genitive cases: NP’s NP, NP of NP
[Figure: class Book with attribute "title"]
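The genitive rule can be sketched with two regular expressions, one per surface form. This is a simplification that assumes single-word noun phrases; the names `GENITIVE_S`, `OF_FORM`, and `genitive_attributes` are hypothetical.

```python
import re

# "NP's NP": owner, apostrophe (straight or curly), "s", attribute.
GENITIVE_S = re.compile(r"(?P<owner>\w+)['’]s (?P<attr>\w+)")
# "NP of NP": attribute, "of", optional article, owner.
OF_FORM = re.compile(r"(?P<attr>\w+) of (?:the |a |an )?(?P<owner>\w+)")


def genitive_attributes(text):
    """Return (owner, attribute) pairs for both genitive forms."""
    pairs = [(m["owner"], m["attr"]) for m in GENITIVE_S.finditer(text)]
    pairs += [(m["owner"], m["attr"]) for m in OF_FORM.finditer(text)]
    return pairs


pairs = genitive_attributes("Book’s title")
```

The same rule applied to "system’s component" returns `("system", "component")`, which illustrates why a purely syntactic genitive rule cannot tell an attribute from an aggregation.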
Aggregations ???
system’s component
[Figure: System and Component]
Are they even useful?
Our Contributions
Approach
[Figure: approach pipeline. NL Requirements -> Process Requirements Statements -> Phrasal Structure, Dependencies -> Lift Dependencies to Semantic Units -> Phrase-level Dependencies -> Construct Domain Model (with Extraction Rules) -> Domain Model (Class A to Class D, relation with multiplicities 1 and *)]
Grammatical Dependencies
The system operator shall initialize the simulator configuration.
[Figure: dependencies nsubj and dobj on the verb "initialize" give the word-level relation "initialize" from Operator to Configuration]
Lift Dependencies to Semantic Units
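The system operator shall initialize the simulator configuration.
[Figure: at word level, nsubj and dobj give the relation "initialize" from Operator to Configuration; lifted to noun phrases, the same dependencies give "initialize" from System Operator to Simulator Configuration]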
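Lifting can be pictured as replacing each word in a dependency by the phrase that contains it. A minimal sketch, assuming the word-to-phrase mapping is given by hand (in practice it would come from the phrasal structure produced by a parser); `lift_dependencies` and `phrase_of` are hypothetical names.

```python
def lift_dependencies(dependencies, phrase_of):
    """Replace head and dependent words by their enclosing phrase, if any."""
    return [
        (relation, phrase_of.get(head, head), phrase_of.get(dependent, dependent))
        for relation, head, dependent in dependencies
    ]


# Word-level dependencies for the example sentence (hand-written).
word_deps = [
    ("nsubj", "initialize", "operator"),
    ("dobj", "initialize", "configuration"),
]
# Each head noun mapped to the noun phrase it belongs to.
phrase_of = {
    "operator": "System Operator",
    "configuration": "Simulator Configuration",
}
lifted = lift_dependencies(word_deps, phrase_of)
```

Words without an entry in the mapping, such as the verb, are left unchanged, so the extraction rules can then run on the lifted triples.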
Approach
[Figure: the approach pipeline, repeated from the earlier Approach slide]
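New Rule - N1
The system operator shall be able to initialize the simulator configuration, and to edit the existing configuration.
[Figure: before the new rule, relation "able" from System Operator to an unknown target (????)]
[Figure: with rule N1, relation "initialize" from System Operator to Simulator Configuration, and relation "edit" from System Operator to Existing Configuration]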
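The slide does not spell out how rule N1 is defined. One plausible reading, sketched below purely as an assumption, is that the rule follows the clausal complement of "able" and its conjoined verbs to reach the real actions. The dependency labels (`xcomp`, `conj`) and the triples are hand-written for this example, and `relations_via_complements` is a hypothetical name.

```python
def relations_via_complements(dependencies):
    """Attach a subject to the verbs reached through xcomp and conj links."""
    subjects, complements, conjuncts, objects = {}, {}, {}, {}
    for relation, head, dependent in dependencies:
        if relation == "nsubj":
            subjects[head] = dependent
        elif relation == "xcomp":
            complements.setdefault(head, []).append(dependent)
        elif relation == "conj":
            conjuncts.setdefault(head, []).append(dependent)
        elif relation == "dobj":
            objects.setdefault(head, []).append(dependent)

    relations = []
    for governor, subject in subjects.items():
        for verb in complements.get(governor, []):
            # The complement verb and its conjuncts share the same subject.
            for action in [verb] + conjuncts.get(verb, []):
                for obj in objects.get(action, []):
                    relations.append((subject, action, obj))
    return relations


# Phrase-level dependencies assumed for the example requirement.
deps = [
    ("nsubj", "able", "System Operator"),
    ("xcomp", "able", "initialize"),
    ("dobj", "initialize", "Simulator Configuration"),
    ("conj", "initialize", "edit"),
    ("dobj", "edit", "Existing Configuration"),
]
relations = relations_via_complements(deps)
```

Under these assumptions the sketch yields the two relations shown on the slide instead of a dangling relation on "able".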
Link Paths
The simulator shall send log messages to the database via the monitoring interface.
[Figure: extracted relations]
• "send" from Simulator to Log Message
• "send log message to" from Simulator to Database
• "send log message to database via" from Simulator to Monitoring Interface
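The three relations in the example suggest that a link path grows by appending each prepositional phrase to the label of the previous relation. The sketch below reproduces that behaviour under the assumption that the verb, direct object, and prepositional phrases have already been identified; `link_path_relations` is a hypothetical name.

```python
def link_path_relations(subject, verb, direct_object, prepositional_phrases):
    """Build one relation per object along the path, with cumulative labels."""
    relations = [(subject, verb, direct_object)]
    label = f"{verb} {direct_object.lower()}"
    for preposition, noun_phrase in prepositional_phrases:
        # The label ends with the preposition; the noun phrase is the target.
        label = f"{label} {preposition}"
        relations.append((subject, label, noun_phrase))
        # The target is then folded into the label for the next step.
        label = f"{label} {noun_phrase.lower()}"
    return relations


relations = link_path_relations(
    "Simulator",
    "send",
    "Log Message",
    [("to", "Database"), ("via", "Monitoring Interface")],
)
```

Each relation keeps the context accumulated so far, which a direct, word-level rule would lose.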
IR-Domain
RQ1: How frequently are different extraction rules triggered?
[Figure: four case studies, shown as a group of 3 case studies and 1 case study, with 380, 158, 138, and 110 requirements]
RQ1: How frequently are different extraction rules triggered?
• Pattern-based rules were never or seldom triggered
• Generic rules were triggered most often, e.g., transitive verbs, genitive cases, and all new rules (including link paths)
RQ2: How useful is our approach?
1 case study: 50 requirements, 213 relations
• Interview survey
• Correctness and relevance of each relation
• Missing relations in each requirement
Correctness (%) - 90% (avg.)
[Figure: correctness per extraction rule. Existing rules: E1 to E6, including specific-pattern rules. New rules: N1 to N3 and LP.]
Observed Reasons for Incorrectness
• NLP mistakes
• Wrong relation extracted
Relevance (%) - 36% (avg.)
[Figure: relevance per extraction rule. Existing rules: E1 to E6, including specific-pattern rules. New rules: N1 to N3 and LP.]
Observed Reasons for Irrelevance
[Figure: TCP/IP, SNMP, and Protocol, labeled "Common Knowledge"]
Missing Relations
• 8% of relations missed
• 92% of relevant relations extracted
Conclusion
• Our extensions are of practical significance for domain model extraction
• The empirical evaluation (in industrial settings) provides insights into the usefulness of existing rules
• An important observation about automated model extractors: relevance
• Future work: look into ways to improve relevance
