Requirements traceability ensures that source
code is consistent with documentation and that all requirements
have been implemented. During software evolution, as features
are added, removed, or modified, the code drifts away from its
original requirements. Thus, traceability recovery approaches
become necessary to re-establish the traceability relations
between requirements and source code.
This paper presents an approach (Coparvo) complementary
to existing traceability recovery approaches for object-oriented
programs. Coparvo reduces the false-positive links recovered by
traditional traceability recovery processes, thus reducing the
manual validation effort.
Coparvo assumes that information extracted from different
entities (e.g., class names, comments, class variables, or method
signatures) constitutes different information sources; these sources
may have different levels of reliability in requirements traceability,
and each information source may act as a different expert
recommending traceability links.
We applied Coparvo to three data sets, Pooka, SIP Communicator,
and iTrust, to filter out false-positive links recovered
via an information retrieval approach, i.e., the vector space model.
The results show that Coparvo significantly improves the
accuracy of the recovered links and also reduces up to
1. Requirements Traceability for Object Oriented
Systems by Partitioning Source Code
WCRE 2011, Limerick, Ireland
Nasir Ali, Yann-Gaël Guéhéneuc, and Giuliano Antoniol
2. Requirements Traceability
Requirements traceability is defined as “the
ability to describe and follow the life of a
requirement, in both a forwards and backwards
direction” [Gotel, 1994]
3. What’s Requirements Traceability Good For?
Program Comprehension
Discover what code must change to handle a
new requirement
Aid in determining whether a specification is
completely implemented
4. IR-based Approaches
• Vector Space Model (Antoniol et al. 2002)
• Latent Semantic Indexing (Marcus and Maletic, 2003)
• Jensen Shannon Divergence (Abadi et al. 2008)
• Latent Dirichlet Allocation (Asuncion, 2010)
9. Defining Experts
Individual class names: Class Name A, Class Name B, Class Name C, Class Name D
Merged Class Names
------------------------------------
Class Name A
Class Name B
Class Name C
Class Name D
The same step was performed for method names, variable names, comments, and requirements.
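The merging step on this slide can be sketched as follows. This is a minimal illustration, not the authors' implementation: it assumes the text of each class has already been extracted, and the `ClassInfo` container, the `merge_partitions` function, and the sample class names are all hypothetical.

```python
from dataclasses import dataclass, field


@dataclass
class ClassInfo:
    """Hypothetical container for text already extracted from one class."""
    class_name: str
    method_names: list[str] = field(default_factory=list)
    variable_names: list[str] = field(default_factory=list)
    comments: list[str] = field(default_factory=list)


def merge_partitions(classes: list[ClassInfo]) -> dict[str, str]:
    """Merge each kind of entity across all classes into one document per partition."""
    return {
        "class_names": " ".join(c.class_name for c in classes),
        "method_names": " ".join(m for c in classes for m in c.method_names),
        "variable_names": " ".join(v for c in classes for v in c.variable_names),
        "comments": " ".join(t for c in classes for t in c.comments),
    }


# Made-up sample data for illustration only.
classes = [
    ClassInfo("MailFolder", ["openFolder", "fetchMessages"], ["folderName"], ["Opens a pop3 folder"]),
    ClassInfo("MessageViewer", ["showMessage"], ["currentMessage"], ["Displays one email"]),
]
merged = merge_partitions(classes)
print(merged["class_names"])
```

Each merged document then stands for one information source (one "expert") that can be compared against the merged requirements.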
11. Defining Experts (Cont.)
Method Name
Comments
70%
60%
Variable Names
Class Names
40%
20%
Extreme Cases:
• 5% difference between two experts
• 95% difference between two experts
12. Link Recovery and Expert Voting
Class A Requirements
------------------------------------
Email client must support pop3 integration…
Method Names of Class A
Comments of Class A
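The idea of experts voting on a candidate link can be illustrated with a small sketch. The slides do not state the exact voting rule, so the rule below (keep a link when at least `min_votes` experts report a similarity above `threshold`) is an assumption made purely for illustration, as are the function name and the sample scores.

```python
def vote_on_link(expert_scores: dict[str, float], min_votes: int = 2, threshold: float = 0.0) -> bool:
    """Keep a candidate link if enough experts report similarity above the threshold.

    The rule, `min_votes`, and `threshold` are illustrative assumptions,
    not the voting rule used by Coparvo.
    """
    votes = sum(1 for score in expert_scores.values() if score > threshold)
    return votes >= min_votes


# Hypothetical similarity scores between one requirement and the partitions of two classes.
scores_class_a = {"method_names": 0.42, "comments": 0.31, "variable_names": 0.0, "class_names": 0.0}
scores_class_b = {"method_names": 0.05, "comments": 0.0, "variable_names": 0.0, "class_names": 0.0}

print(vote_on_link(scores_class_a))  # two experts vote for the link
print(vote_on_link(scores_class_b))  # only one expert votes for the link
```

Under this assumed rule, the link to Class A is kept and the link to Class B is filtered out as a likely false positive.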
13. Case Studies
• Goal: Investigate the effectiveness of Coparvo in
improving the accuracy of VSM and reducing the
effort required to manually discard false-positive
links
• Quality focus: Ability to recover traceability links
between requirements and source code
• Context: Recovering requirements traceability
links of three open-source programs, Pooka, SIP,
and iTrust
14. Research Questions
R01: How does Coparvo help to find valuable partitions of
source code that help in recovering traceability links?
R02: How much does Coparvo help to reduce the effort required
to manually verify recovered traceability links?
R03: How does the F-measure value of the traceability links
recovered by Coparvo compare with a traditional VSM-
based approach?
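For reference, the F-measure mentioned in R03 is computed from precision and recall against an oracle of true links. The sketch below assumes the balanced F-measure (F1), which the slides do not specify; the function name and the example links are hypothetical.

```python
def precision_recall_f1(
    recovered: set[tuple[str, str]], oracle: set[tuple[str, str]]
) -> tuple[float, float, float]:
    """Compare recovered (requirement, class) links against an oracle of true links."""
    correct = len(recovered & oracle)
    precision = correct / len(recovered) if recovered else 0.0
    recall = correct / len(oracle) if oracle else 0.0
    # Balanced F-measure: harmonic mean of precision and recall.
    f1 = 2 * precision * recall / (precision + recall) if precision + recall else 0.0
    return precision, recall, f1


# Made-up links for illustration only.
oracle = {("R1", "A"), ("R1", "B"), ("R2", "C")}
recovered = {("R1", "A"), ("R2", "C"), ("R2", "D"), ("R3", "A")}

precision, recall, f1 = precision_recall_f1(recovered, oracle)
print(f"precision={precision:.2f} recall={recall:.2f} f1={f1:.2f}")
```

Filtering out false-positive links raises precision; the F-measure shows whether that gain outweighs any true links lost in the process.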
15. Datasets
SIP Communicator: Voice over IP and instant messenger
Pooka: An email Client
iTrust: Medical Application
                    Pooka     SIP Communicator   iTrust
Version             2.0       1.0                10
Number of Classes   298       1,771              526
Number of Methods   20,868    31,502             3,404
LOC                 244K      487K               19K
18. Text Preprocessing
• Filter (#43@$)
• Stop words (the, is, an, …)
• Stemmer (attachment, attached -> attach)
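The three preprocessing steps can be sketched in a few lines. This is only an illustration under simplifying assumptions: the stop-word list is a tiny sample, and `naive_stem` is a hypothetical suffix stripper standing in for a real stemmer, whose choice the slide does not name.

```python
import re

STOP_WORDS = {"the", "is", "an", "a", "of", "to"}  # tiny illustrative list
SUFFIXES = ("ment", "ing", "ed", "s")  # naive suffix list, not a real stemming algorithm


def naive_stem(word: str) -> str:
    """Strip the first matching suffix, keeping at least three characters."""
    for suffix in SUFFIXES:
        if word.endswith(suffix) and len(word) - len(suffix) >= 3:
            return word[: -len(suffix)]
    return word


def preprocess(text: str) -> list[str]:
    """Filter non-alphabetic characters, drop stop words, and stem the rest."""
    tokens = re.findall(r"[a-z]+", text.lower())  # drops digits and symbols such as #43@$
    return [naive_stem(t) for t in tokens if t not in STOP_WORDS]


print(preprocess("The attachment is attached #43@$"))
```

Both "attachment" and "attached" reduce to "attach", so they count as the same term when requirements and source code are compared.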
19. Information Retrieval (IR) Methods
• Vector Space Model (VSM)
– Each document, d, is represented by a vector of ranks of
the terms in the vocabulary:
v_d = [r_d(w_1), r_d(w_2), …, r_d(w_|V|)]
– The query is similarly represented by a vector
– The similarity between the query and document is the
cosine of the angle between their respective vectors
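The cosine similarity described above can be sketched as follows. As a simplifying assumption, the term vectors here hold raw term frequencies; the weighting actually used on the slide (the r_d values) may differ, for example tf-idf. The function names and sample tokens are hypothetical.

```python
import math
from collections import Counter


def term_vector(tokens: list[str], vocabulary: list[str]) -> list[float]:
    """Represent a token list as a vector of term frequencies over the vocabulary."""
    counts = Counter(tokens)
    return [float(counts[w]) for w in vocabulary]


def cosine_similarity(u: list[float], v: list[float]) -> float:
    """Cosine of the angle between two vectors; 0.0 if either vector is all zeros."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    if norm_u == 0 or norm_v == 0:
        return 0.0
    return dot / (norm_u * norm_v)


# Made-up, already preprocessed tokens: a requirement (query) and a source code document.
query = ["email", "client", "support", "pop3"]
document = ["pop3", "store", "connect", "email", "pop3"]
vocabulary = sorted(set(query) | set(document))

sim = cosine_similarity(term_vector(query, vocabulary), term_vector(document, vocabulary))
print(round(sim, 3))
```

A similarity of 0 means the requirement and the document share no terms; values closer to 1 indicate a stronger candidate link.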
24. Voting vs. Combination
• Can we use only different combinations
of source code partitions to create
requirements traceability links?
• How much does a combination of source code
partitions improve the F-measure?
31. RQ Answers
R01: Combinations of partitions or single source-code partitions
sometimes also provide better results than Coparvo
R02: Using different sources of information reduces
experts’ effort by up to 83%
R03: Partitioning source code and using the partitions as
experts for voting yields better accuracy
32. Threats to Validity
• External validity:
• We analyzed only three systems
• Different source code size
• Construct validity:
• Two researchers built both oracles
• The oracles were validated by two other experts
• The iTrust oracle was developed by its developer(s)
• Conclusion validity: Non-parametric test
• Tool is online at www.factrace.net