Wcre11b.ppt

Requirements Traceability for Object Oriented
Systems by Partitioning Source Code
WCRE 2011, Limerick, Ireland
Nasir Ali, Yann-Gaël Guéhéneuc, and Giuliano Antoniol

Requirements Traceability
Requirements traceability is defined as “the
ability to describe and follow the life of a
requirement, in both a forwards and backwards
direction” [Gotel, 1994]

What’s Requirements Traceability Good For?
Program Comprehension
Discover what code must change to handle a
new requirement
Aid in determining whether a specification is
completely implemented

IR-based Approaches
• Vector Space Model (Antoniol et al. 2002)
• Latent Semantic Indexing (Marcus and Maletic, 2003)
• Jensen Shannon Divergence (Abadi et al. 2008)
• Latent Dirichlet Allocation (Asuncion, 2010)

Problem in IR-based Approaches
Requirement

Goal
• Reduce manual effort required to verify false-positive links
• Increase F-measure

Coparvo - COde PARtitioning and VOting
1. Partitioning source code
2. Defining experts
3. Link recovery and expert voting

Partitioning Source Code
Class Name
Method Name
Variable Name
Comments
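
The four partitions above can be illustrated with a minimal sketch. It assumes Java-like source text and uses naive regular expressions rather than a real parser, so control structures such as `if (...) {` or statements such as `return x;` would be misclassified. The function name `partition` and the sample class `MailSender` are hypothetical and are not taken from the Coparvo tool.

```python
import re

# Hypothetical Java-like input used only for illustration.
JAVA_SOURCE = '''
/** Sends mail through an SMTP server. */
public class MailSender {
    private String serverName;  // host to connect to

    public void sendMessage(String body) {
        int retryCount = 3;
    }
}
'''

COMMENT_PATTERN = r"/\*.*?\*/|//[^\n]*"


def partition(source):
    """Split Java-like source text into four rough partitions."""
    # Comments: block comments (/* ... */) and line comments (// ...).
    comments = re.findall(COMMENT_PATTERN, source, flags=re.DOTALL)
    # Remove comments so that their words do not leak into the other partitions.
    code = re.sub(COMMENT_PATTERN, " ", source, flags=re.DOTALL)
    # Class names: the identifier that follows the keyword "class".
    class_names = re.findall(r"\bclass\s+(\w+)", code)
    # Method names: an identifier followed by a parameter list and "{".
    method_names = re.findall(r"\b(\w+)\s*\([^)]*\)\s*\{", code)
    # Variable names: "<type> <name>" followed by ';', '=', ',' or ')'.
    variable_names = re.findall(
        r"\b[A-Za-z_]\w*\s+([A-Za-z_]\w*)\s*(?=[;=,)])", code
    )
    return {
        "class_names": class_names,
        "method_names": method_names,
        "variable_names": variable_names,
        "comments": comments,
    }


partitions = partition(JAVA_SOURCE)
for name, items in partitions.items():
    print(name, items)
```

Each partition can then be treated as a separate document when it is compared with the requirements.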

Defining Experts
Class Name A
Class Name B
Class Name C
Class Name D
------------------------------------
Merged Class Names
The same step is performed for method names, variable names, comments, and requirements.

Defining Experts (Cont.)
Merged Requirements: Requirement 1, ………., Requirement N
------------------------------------
Merged Class Names: 20%
Merged Method Names: 70%
Merged Variable Names: 40%
Merged Comments: 60%

Defining Experts (Cont.)
Method Names: 70%
Comments: 60%
Variable Names: 40%
Class Names: 20%
Extreme Cases:
• 5% difference in two experts
• 95% difference in two experts

Link Recovery and Expert Voting
Requirements: Email client must support pop3 integration……….
------------------------------------
Class A
Method Names of Class A
Comments of Class A
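
The slides do not spell out the voting rule, so the following is only a sketch under stated assumptions: each expert (a source code partition of Class A) reports a similarity to the requirement, and a candidate link is kept when at least `min_votes` experts report a similarity above `threshold`. The function `vote`, both parameters, and the scores are hypothetical and are not measured values.

```python
def vote(expert_scores, min_votes=2, threshold=0.0):
    """Decide whether a candidate requirement-to-class link is kept.

    expert_scores maps an expert (partition) name to the similarity that
    this partition of the class has with the requirement.
    Returns (accepted, supporters).
    """
    # An expert votes for the link when its similarity exceeds the threshold.
    supporters = [
        name for name, score in expert_scores.items() if score > threshold
    ]
    # Assumed rule: accept the link when enough experts support it.
    return len(supporters) >= min_votes, supporters


# Hypothetical similarities between "Email client must support pop3 ..."
# and the partitions of Class A.
scores_for_class_a = {
    "method names": 0.42,
    "comments": 0.17,
    "variable names": 0.0,
}

accepted, supporters = vote(scores_for_class_a)
print(accepted, supporters)
```

Raising `min_votes` or `threshold` in this sketch trades recall for precision: fewer links survive, but each surviving link is backed by more evidence.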

Case Studies
• Goal: Investigate the effectiveness of Coparvo in
improving the accuracy of VSM and reducing the
effort required to manually discard false-positive
links
• Quality focus: Ability to recover traceability links
between requirements and source code
• Context: Recovering requirements traceability
links of three open-source programs, Pooka, SIP,
and iTrust

Research Questions
RQ1: How does Coparvo help to find valuable partitions of
source code that help in recovering traceability links?
RQ2: How much does Coparvo help to reduce the effort required
to manually verify recovered traceability links?
RQ3: How does the F-measure value of the traceability links
recovered by Coparvo compare with a traditional VSM-based
approach?

Datasets
SIP Communicator: Voice over IP and instant messenger
Pooka: An email client
iTrust: Medical application

                    Pooka     SIP Communicator   iTrust
Version             2.0       1.0                10
Number of Classes   298       1,771              526
Number of Methods   20,868    31,502             3,404
LOC                 244K      487K               19K

IR Quality Measures
$$F = \frac{2 \times \mathit{Precision} \times \mathit{Recall}}{\mathit{Precision} + \mathit{Recall}}$$
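
As a worked example of the F-measure, the sketch below computes precision, recall, and F for a set of recovered links against an oracle. It assumes the usual definitions (precision is the share of recovered links that are correct, recall is the share of oracle links that were recovered). The function name and the link identifiers are hypothetical.

```python
def precision_recall_f(recovered, oracle):
    """Return (precision, recall, F-measure) for recovered links vs. an oracle."""
    recovered, oracle = set(recovered), set(oracle)
    correct = recovered & oracle
    precision = len(correct) / len(recovered) if recovered else 0.0
    recall = len(correct) / len(oracle) if oracle else 0.0
    if precision + recall == 0:
        return precision, recall, 0.0
    # Harmonic mean of precision and recall.
    f_measure = 2 * precision * recall / (precision + recall)
    return precision, recall, f_measure


# Hypothetical links written as (requirement, class) pairs.
recovered_links = {("R1", "A"), ("R1", "B"), ("R2", "C"), ("R3", "D")}
oracle_links = {("R1", "A"), ("R2", "C"), ("R2", "E")}

precision, recall, f_measure = precision_recall_f(recovered_links, oracle_links)
print(precision, recall, f_measure)
```

Here two of the four recovered links are correct and two of the three oracle links are found, so precision is 1/2, recall is 2/3, and F is 4/7.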

Source Code Partitions
1. Class name
2. Method name
3. Variable name
4. Comments

Text Preprocessing
• Filter (#43@$)
• Stop words (the, is, an….)
• Stemmer (attachment, attached -> attach)
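
The three preprocessing steps can be sketched as follows. This is a minimal illustration, not the pipeline used in the study: the stop-word list is a tiny sample, and `toy_stem` is a hypothetical suffix stripper standing in for a real stemmer such as Porter's.

```python
import re

STOP_WORDS = {"the", "is", "an", "a", "of", "to"}  # tiny illustrative list
SUFFIXES = ("ment", "ing", "ed", "s")  # toy stand-in for a real stemmer


def toy_stem(word):
    """Strip the first matching suffix, keeping a stem of at least 3 letters."""
    for suffix in SUFFIXES:
        if word.endswith(suffix) and len(word) - len(suffix) >= 3:
            return word[: -len(suffix)]
    return word


def preprocess(text):
    # 1. Filter: keep only alphabetic runs, dropping digits and symbols.
    tokens = re.findall(r"[a-z]+", text.lower())
    # 2. Stop words: remove very common words that carry little meaning.
    tokens = [t for t in tokens if t not in STOP_WORDS]
    # 3. Stemmer: reduce each remaining word to a common root form.
    return [toy_stem(t) for t in tokens]


terms = preprocess("The attachment is attached #43@$")
print(terms)
```

With this input, both "attachment" and "attached" end up as the same term, which is what lets a requirement and a source code partition match despite different word forms.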

Information Retrieval (IR) Methods
• Vector Space Model (VSM)
– Each document, d, is represented by a vector of ranks of
the terms in the vocabulary:
$v_d = [r_d(w_1), r_d(w_2), \ldots, r_d(w_{|V|})]$
– The query is similarly represented by a vector
– The similarity between the query and document is the
cosine of the angle between their respective vectors
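
The cosine similarity described above can be sketched in a few lines. The slides do not say how the rank $r_d(w)$ is computed, so this example assumes raw term frequency; TF-IDF weighting is a common alternative. The function names and the sample tokens are hypothetical.

```python
import math
from collections import Counter


def term_vector(tokens, vocabulary):
    """Represent a token list as term frequencies over a fixed vocabulary."""
    counts = Counter(tokens)
    return [counts[word] for word in vocabulary]


def cosine(u, v):
    """Cosine of the angle between two equal-length vectors."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    if norm_u == 0 or norm_v == 0:
        return 0.0  # an empty document is similar to nothing
    return dot / (norm_u * norm_v)


# Hypothetical preprocessed requirement (query) and source code partition.
query_tokens = ["email", "client", "support", "pop"]
document_tokens = ["pop", "client", "connect", "pop"]

vocabulary = sorted(set(query_tokens) | set(document_tokens))
similarity = cosine(
    term_vector(query_tokens, vocabulary),
    term_vector(document_tokens, vocabulary),
)
print(round(similarity, 3))
```

A link between the requirement and the document is typically proposed when this similarity exceeds a chosen threshold.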

Defining Expert
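[Bar chart: values for the CN, MN, VN, and Cmt partitions (class names, method names, variable names, comments) on Pooka, SIP, and iTrust; vertical axis from 0 to 60]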

Voting vs. Combination
• Can we use only different combinations
of source code partitions to create
requirements traceability links?
• How much does a combination of source code
partitions improve the F-measure?

Statistical Tests
Non-parametric test – Mann-Whitney test
F-measure   Pooka     SIP Comm.   iTrust
P-value     p<0.01    p<0.01      p<0.01

Effort Analysis
[Bar chart comparing VSM and Coparvo; vertical axis from 0 to 90,000]

Effort Analysis (F-Measure)
[Bar chart comparing the F-measure of VSM and Coparvo; vertical axis from 0 to 14]

RQ Answers
RQ1: Combinations of partitions, or single source-code partitions,
sometimes also provide better results than Coparvo
RQ2: Using different sources of information reduces
experts’ effort by up to 83%
RQ3: Partitioning source code and using the partitions as
experts for voting yields better accuracy

Threats to Validity
• External validity:
• We analyzed only three systems
• Different source code size
• Construct validity:
• The two researchers built both oracles
• Oracles were validated by the other two experts
• iTrust oracle was developed by developer(s)
• Conclusion validity: Non-parametric test
• Tool is online at www.factrace.net

Ongoing work
More IR approaches
Empirical study
Threshold
