Trust-based Requirements Traceability
ICPC 2011, Kingston
Nasir Ali, Yann-Gaël Guéhéneuc, and Giuliano Antoniol
Requirements Traceability
• Requirements traceability is defined as “the
ability to describe and follow the life of a
requirement, in both a forwards and
backwards direction” (Gotel, 1994)
What’s Requirements Traceability Good For?
• Program comprehension
• Discover what code needs to change to handle a new requirement
• Aid in determining whether a specification is completely implemented
IR-based Approaches
• Vector Space Model (Antoniol et al. 2002)
• Latent Semantic Indexing (Marcus and Maletic, 2003)
• Jensen Shannon Divergence (Abadi et al. 2008)
• Latent Dirichlet Allocation (Asuncion, 2010)
Problem!
[Figure: Requirements; Similarity: 0.000132%]
Goal
• Mine software repositories to improve the recovery of traceability links
• Use software repository links to improve an expert's trust in an automatically recovered link
Inspiration
• Web trust model (Palmer, 2000, McKnight, 2002)
• Initial Trust
• Reputation Trust
How do we trust?
How do we trust?
Other Approaches
Trustrace
Example: Requirements to Code Links
Example: Requirements to SVN Links (1)
Example: Requirements to SVN Links (2)
Example: Re-weighting All Links
Case Studies
                     Pooka        SIP Communicator
Version              2.0          1.0
Number of Classes    298          1,771
Number of Methods    20,868       31,502
LOC                  244K         487K
SVN History          2000-2010    2005-2010

Pooka: an email client
SIP Communicator: a voice-over-IP and instant messenger
Hypotheses
H01: There is no statistically significant difference in the precision of the recovered traceability links when using Trustrace or a VSM-based approach
H02: There is no statistically significant difference in the recall of the recovered traceability links when using Trustrace or a VSM-based approach
IR Quality Measures
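The formulas on this slide did not survive extraction. As a minimal sketch, assuming the standard IR definitions (precision is the share of recovered links that are correct, recall is the share of oracle links that were recovered), the two measures could be computed as below. The link names and the function name precision_recall are made up for illustration.

```python
def precision_recall(recovered, oracle):
    """Return (precision, recall) of recovered links against an oracle.

    Both arguments are collections of (requirement, class) pairs.
    """
    recovered, oracle = set(recovered), set(oracle)
    correct = recovered & oracle
    precision = len(correct) / len(recovered) if recovered else 0.0
    recall = len(correct) / len(oracle) if oracle else 0.0
    return precision, recall


# Hypothetical links, not taken from the Pooka or SIP Communicator oracles.
recovered_links = {
    ("R1", "Mail.java"),
    ("R1", "Folder.java"),
    ("R2", "Smtp.java"),
    ("R3", "Ui.java"),
}
oracle_links = {
    ("R1", "Mail.java"),
    ("R2", "Smtp.java"),
    ("R2", "Auth.java"),
}

precision, recall = precision_recall(recovered_links, oracle_links)
print(f"precision={precision:.2f} recall={recall:.2f}")
```

Two of the four recovered links are in the oracle, and two of the three oracle links were recovered.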
Identifiers / Commit Messages Extraction
Extraction
• Class Names
• Method Names
• Variable Names
• Comments
• SVN Commit Messages
• SVN Commit File Names
SVN Logs Preprocessing
We extract CVS/SVN commits and discard those that:
1. Are tagged as "delete"
2. Do not concern source code (e.g., commits that change only manual pages or documentation)
3. Have messages of two words or fewer
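The three discard rules above can be sketched as a simple filter. This is an illustration under stated assumptions, not the authors' tool: the Commit record, its field names, and the choice of ".java" as the only source-code extension are all hypothetical.

```python
from dataclasses import dataclass

# Assumption: only .java files count as source code.
SOURCE_EXTENSIONS = (".java",)


@dataclass
class Commit:
    """Hypothetical commit record; field names are illustrative."""
    action: str    # e.g. "add", "modify", "delete"
    files: list    # paths touched by the commit
    message: str   # commit log message


def keep_commit(commit):
    """Apply the three discard rules from the slide."""
    if commit.action == "delete":                      # rule 1
        return False
    if not any(f.endswith(SOURCE_EXTENSIONS) for f in commit.files):
        return False                                   # rule 2
    if len(commit.message.split()) <= 2:               # rule 3
        return False
    return True


commits = [
    Commit("modify", ["src/Mail.java"], "Fix attachment handling in mail viewer"),
    Commit("delete", ["src/Old.java"], "Remove the obsolete old mail class"),
    Commit("modify", ["docs/manual.txt"], "Update the user manual for release"),
    Commit("modify", ["src/Smtp.java"], "bug fix"),
]
kept = [c for c in commits if keep_commit(c)]
print([c.message for c in kept])
```

Only the first commit survives: the second is a deletion, the third touches documentation only, and the fourth has a two-word message.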
Text Preprocessing
• Filter out special characters and digits (e.g., #43@$)
• Remove stop words (the, is, an, ...)
• Stem the remaining words (attachment -> attach)
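A minimal sketch of the three preprocessing steps, under stated assumptions: the stop-word list is a tiny made-up sample, and toy_stem is a naive suffix stripper standing in for a real stemmer such as Porter's. The slides do not say which stemmer or stop-word list was used.

```python
import re

# Assumption: a tiny illustrative stop-word list, not the one used in the study.
STOP_WORDS = {"the", "is", "an", "a", "of", "to", "and"}

# Assumption: a naive suffix list standing in for a real stemming algorithm.
SUFFIXES = ("ments", "ment", "ing", "ed", "s")


def toy_stem(word):
    """Strip the first matching suffix, keeping at least three characters."""
    for suffix in SUFFIXES:
        if word.endswith(suffix) and len(word) - len(suffix) >= 3:
            return word[: -len(suffix)]
    return word


def preprocess(text):
    """Filter non-letters, drop stop words, then stem."""
    tokens = re.findall(r"[a-z]+", text.lower())   # keeps runs of letters only
    return [toy_stem(t) for t in tokens if t not in STOP_WORDS]


terms = preprocess("The attachment #43@$ is an email file")
print(terms)
```

The same function would be applied to requirements, source-code identifiers, and commit messages so that all documents share one vocabulary.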
Information Retrieval (IR) Methods
• Vector Space Model (VSM) (Salton et al., 1975)
– Each document, d, is represented by a vector of ranks of the terms in the vocabulary:
  v_d = [r_d(w_1), r_d(w_2), ..., r_d(w_{|V|})]
– The query is similarly represented by a vector
– The similarity between the query and document is the cosine of the angle between their respective vectors
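The cosine similarity described above can be sketched as follows. As an assumption for illustration, the rank r_d(w) of a term is taken to be its raw frequency in the document; the slides do not state the weighting scheme, and TF-IDF is a common alternative. The term lists are made up.

```python
import math
from collections import Counter


def to_vector(terms, vocabulary):
    """Build v_d = [r_d(w_1), ..., r_d(w_|V|)] using raw term frequency."""
    counts = Counter(terms)
    return [counts[w] for w in vocabulary]


def cosine(u, v):
    """Cosine of the angle between two equal-length vectors."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    if norm_u == 0 or norm_v == 0:
        return 0.0
    return dot / (norm_u * norm_v)


# Hypothetical preprocessed documents.
requirement = ["send", "email", "attach"]           # the query
source_class = ["attach", "file", "email", "email"]
unrelated = ["render", "window"]

vocabulary = sorted(set(requirement) | set(source_class) | set(unrelated))
query_vec = to_vector(requirement, vocabulary)

sim_related = cosine(query_vec, to_vector(source_class, vocabulary))
sim_unrelated = cosine(query_vec, to_vector(unrelated, vocabulary))
print(round(sim_related, 3), sim_unrelated)
```

A document that shares terms with the requirement scores above zero, while one with no shared terms scores exactly zero.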
Pooka’s Results
SIP Results
Statistical Tests
Precision
            VSM      Trustrace    p-value
Pooka       42.28    54.35        p < 0.01
SIP Com.    14.23    25.13        p < 0.01
Pooka Results
SIP Results
Statistical Tests
Recall
            VSM      Trustrace    p-value
Pooka       11.14    12.6         p > 0.7
SIP Com.    13.42    16.63        p > 0.5
Discussion
• Using different sources of information reduces an expert's effort by up to 50%
• Using temporal information with IR-based approaches yields better results
• The results tend to improve as the SVN commit log size increases
• Trustrace also improves LSI results, at k=50 for Pooka and k=200 for SIP
Threats to Validity
• External validity:
– We analyzed only two systems
• Construct validity:
– Two researchers built both oracles
– The oracles were validated by two other experts
• Internal validity: different λ values may lead to different results
• Reliability validity: a replication package is available online at www.ptidej.net
• The tool is online at www.factrace.net
Ongoing work
• More IR approaches and datasets
• Empirical study
• Including other friends (bug reports, etc.)
• Determine heuristics to identify the best λ
Summary
• A similarity value alone is not enough to trust a link
• Other sources of information are required to increase trust in a link
