An Assessment of Test-Driven Reuse: Promises and Pitfalls
1. An Assessment of Test-Driven Reuse
Promises and Pitfalls
Mehrdad Nurolahzade
Robert J. Walker
Frank Maurer
{mnurolah, walker, maurer}@ucalgary.ca
University of Calgary
13th International Conference on Software Reuse
Pisa, Italy
20 June 2013
2. Test-Driven Development (TDD)
[Diagram: an existing system with Features A–G; the test for a new feature is written before the new feature itself is implemented]
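To make the TDD step in the diagram concrete, here is a minimal sketch assuming a hypothetical StringUtils.capitalize() feature; in TDD the test is authored first, and the implementation is added only to make it pass.

```java
import static org.junit.Assert.assertEquals;
import org.junit.Test;

// Hypothetical TDD example: the test for capitalize() is written before the
// feature exists; the StringUtils below is only the minimum needed to pass it.
public class CapitalizeTest {
    @Test
    public void capitalizesFirstLetter() {
        assertEquals("Hello", StringUtils.capitalize("hello"));
        assertEquals("", StringUtils.capitalize(""));
    }
}

class StringUtils {
    static String capitalize(String s) {
        if (s.isEmpty()) return s;
        return Character.toUpperCase(s.charAt(0)) + s.substring(1);
    }
}
```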
3. Test-Driven Development Reuse (TDR)
Locating and reusing source code through the provision of test cases describing the behavior of the feature of interest.
4. [Screenshot of the Stack Overflow question "Java Ordered Map"]
Source: http://stackoverflow.com/questions/663374/java-ordered-map
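As an illustration of what TDR takes as input, below is a minimal sketch of the test a developer might write as a query for this ordered-map task. The class and method names are hypothetical, and LinkedHashMap is used only so the sketch compiles and passes on its own; a TDR tool would be expected to locate such a candidate in its repository instead.

```java
import static org.junit.Assert.assertEquals;
import java.util.ArrayList;
import java.util.Arrays;
import java.util.LinkedHashMap;
import java.util.Map;
import org.junit.Test;

// Hypothetical TDR query: the test specifies the behavior wanted in the
// Stack Overflow task (a map whose iteration order is its insertion order).
// In a TDR tool the developer would write this against a placeholder type and
// let the tool bind it to a repository candidate.
public class OrderedMapTest {
    @Test
    public void iterationFollowsInsertionOrder() {
        Map<String, Integer> map = new LinkedHashMap<>();
        map.put("c", 3);
        map.put("a", 1);
        map.put("b", 2);
        assertEquals(Arrays.asList("c", "a", "b"), new ArrayList<>(map.keySet()));
    }
}
```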
5. TDR Process
[Diagram: ➊ the developer writes a test case for the desired feature; ➋ the test case is used to query a source code repository; ➌ candidate features (?) are exercised against the test; ➍ passing candidates (✔) are recommended and failing ones (✖) are discarded]
6. Related Work
Behavior Sampling (Podgurski and Pierce)
Test-Driven Reuse
– Code Conjurer (Hummel et al.)
– CodeGenie (Lemos et al.)
– S6 (Reiss)
[Annotation: a few hundred executables vs. millions of source files]
7. TDR Process: Current Tools
[Diagram: inside a test-driven reuse tool, an interface is extracted from the test case and used to search the repository; potential candidates are transformed, compiled, and tested, and the passing results are displayed to the developer]
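A minimal sketch of that pipeline under assumed, hypothetical component interfaces; actual tools such as Code Conjurer, CodeGenie, and S6 implement each stage quite differently, and interface extraction from the test case is elided here.

```java
import java.util.ArrayList;
import java.util.List;

// Sketch of the Extract -> Search -> Transform -> Compile -> Test -> Display
// pipeline; all component interfaces are hypothetical.
public class TdrPipeline {
    interface Repository { List<String> search(String interfaceSignature); }
    interface Transformer { String adaptToInterface(String candidateSource, String interfaceSignature); }
    interface Compiler { Class<?> compile(String source); }                    // returns null on failure
    interface TestRunner { boolean passes(Class<?> candidate, String testCaseSource); }

    private final Repository repository;
    private final Transformer transformer;
    private final Compiler compiler;
    private final TestRunner testRunner;

    TdrPipeline(Repository r, Transformer t, Compiler c, TestRunner runner) {
        this.repository = r; this.transformer = t; this.compiler = c; this.testRunner = runner;
    }

    public List<String> recommend(String testCaseSource, String interfaceSignature) {
        List<String> results = new ArrayList<>();
        for (String candidate : repository.search(interfaceSignature)) {            // Search
            String adapted = transformer.adaptToInterface(candidate, interfaceSignature); // Transform
            Class<?> compiled = compiler.compile(adapted);                           // Compile
            if (compiled != null && testRunner.passes(compiled, testCaseSource)) {   // Test
                results.add(adapted);
            }
        }
        return results;                                                              // Display to developer
    }
}
```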
8. Study Motivation
• Are evaluation functions realistic?
• Why can arbitrary functions not be found?
• What attributes of evaluation functions make them retrievable by TDR tools?
9. Experiment Overview
• 10 realistic TDR tasks
– Developer forums
– Programming tutorials
– Example source code catalogs
• Solutions were confirmed to be in the tool
repositories.
• Top 10 recommendations were qualitatively
analyzed.
10. Measures
• Relevance: the extent to which the desired features are satisfied
• Effort: the amount of work anticipated to adapt a recommendation
• Why not quantitative measures?
– Precision vs. accuracy
– How poor the results are vs. why they are poor
12. External Validation
• Have the subjective scores been assigned fairly?
• A random subset (12 out of 109 recommendations)
• 5 participants
• Example and guidelines
• Inter-rater reliability (Spearman’s ρ)
Participant   P1     P2     P3     P4     P5
Relevance     0.86   0.89   0.93   0.84   0.81
Effort        0.72   0.75   0.82   0.74   0.72
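For reference, a minimal sketch of how such agreement figures can be computed: Spearman's ρ via the standard rank-difference formula, assuming no tied ranks (tied scores would require average ranks instead).

```java
// Spearman's rho for two raters' scores over the same items,
// using rho = 1 - 6 * sum(d_i^2) / (n * (n^2 - 1)); valid without ties.
public final class SpearmanRho {
    static double rho(double[] raterA, double[] raterB) {
        int n = raterA.length;
        double[] ranksA = ranks(raterA);
        double[] ranksB = ranks(raterB);
        double sumSquaredDiff = 0.0;
        for (int i = 0; i < n; i++) {
            double d = ranksA[i] - ranksB[i];
            sumSquaredDiff += d * d;
        }
        return 1.0 - (6.0 * sumSquaredDiff) / (n * (n * n - 1.0));
    }

    // Rank of each value within its array (1 = smallest); O(n^2) but simple.
    private static double[] ranks(double[] values) {
        double[] result = new double[values.length];
        for (int i = 0; i < values.length; i++) {
            int rank = 1;
            for (double other : values) {
                if (other < values[i]) rank++;
            }
            result[i] = rank;
        }
        return result;
    }
}
```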
13. [Screenshot of the Stack Overflow question "Java Ordered Map"]
Source: http://stackoverflow.com/questions/663374/java-ordered-map
14. Why TDR Tools Do NOT Work
• TDR filters candidates based on lexical and syntactic similarity.
• It therefore expects the developer to provide precise input.
• But a developer may not always provide precise input (illustrated below).
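A hypothetical illustration of the mismatch: the query interface and the repository candidate below describe the same behavior, yet they share almost no identifiers, so a lexical/syntactic matcher is likely to miss the candidate.

```java
import java.util.Stack;

// Same behavior, disjoint vocabulary: a lexical/syntactic filter keyed on the
// query's identifiers ("Reverser", "reverse", "input") will not rank the
// repository candidate ("TextFlipper", "flipCharacters", "text") highly.
public class LexicalMismatch {
    // What the developer's query test asks for.
    interface Reverser {
        String reverse(String input);
    }

    // What actually sits in the repository: equivalent behavior,
    // but no overlapping class, method, or parameter names.
    static class TextFlipper {
        String flipCharacters(String text) {
            Stack<Character> stack = new Stack<>();
            for (char c : text.toCharArray()) stack.push(c);
            StringBuilder out = new StringBuilder();
            while (!stack.isEmpty()) out.append(stack.pop());
            return out.toString();
        }
    }
}
```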
16. Why TDR Tools Do NOT Work
• TDR evaluation tasks seek common functions
– Easy to retrieve
• Variants expose the same interface
– But behave differently
• TDR fails at retrieving variants of common
functions.
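A small illustration of the "same interface, different behavior" point, reusing the ordered-map example: both standard-library candidates below expose java.util.Map, so interface matching cannot separate them, but only one behaves as the query test requires.

```java
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;
import java.util.TreeMap;

// Two candidates with identical interfaces but different behavior:
// only the LinkedHashMap variant satisfies an insertion-order test.
public class InterfaceVsBehavior {
    public static void main(String[] args) {
        Map<String, Integer> insertionOrdered = new LinkedHashMap<>();
        Map<String, Integer> keySorted = new TreeMap<>();
        for (Map<String, Integer> map : List.of(insertionOrdered, keySorted)) {
            map.put("c", 3);
            map.put("a", 1);
            map.put("b", 2);
        }
        System.out.println(insertionOrdered.keySet()); // [c, a, b] (insertion order)
        System.out.println(keySorted.keySet());        // [a, b, c] (sorted order)
    }
}
```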
17. Why TDR Tools Do NOT Work
• Compiling and running retrieved source code
– Often does not work as intended (sketched below)
• Difficult to automate
– Resolving dependencies
– Provision of runtime environment or resources
– Adaptation
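A minimal sketch of the compilation step using the standard javax.tools API; the candidate file name and classpath are hypothetical, and this is exactly where unresolved dependencies or missing runtime resources typically cause candidates to be rejected.

```java
import javax.tools.JavaCompiler;
import javax.tools.ToolProvider;

// Compiling a retrieved candidate: any unresolved import or missing dependency
// makes the run fail, so many otherwise relevant candidates are lost here.
public class CompileCandidate {
    public static void main(String[] args) {
        JavaCompiler compiler = ToolProvider.getSystemJavaCompiler();
        // Returns 0 on success, nonzero on failure.
        int status = compiler.run(null, null, null,
                "-classpath", "candidate-deps/*",   // hypothetical: dependencies must already be present
                "RetrievedCandidate.java");         // hypothetical retrieved source file
        System.out.println(status == 0 ? "candidate compiled" : "candidate rejected: compilation failed");
    }
}
```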
19. Implications for Future Research
• Leveraging contextual facts in the test case (see the annotated sketch after this list)
– Helper types
– Constraints: pre- and post-conditions
– Data flow
– Control flow
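An annotated sketch of a query test, with comments marking the kinds of contextual facts a future TDR tool could exploit; InterestCalculator and its accrue() method are hypothetical, and the stand-in implementation exists only so the sketch is self-contained.

```java
import static org.junit.Assert.assertEquals;
import static org.junit.Assert.assertTrue;
import java.math.BigDecimal;
import org.junit.Test;

public class InterestCalculatorTest {
    @Test
    public void accruesSimpleInterest() {
        // Helper types: BigDecimal in the setup constrains candidate signatures.
        BigDecimal principal = new BigDecimal("1000.00");

        // Precondition encoded in the test data: a positive principal and rate.
        InterestCalculator calculator = new InterestCalculator(new BigDecimal("0.05"));

        // Data flow: the value produced by accrue() feeds directly into the assertions.
        BigDecimal balance = calculator.accrue(principal, 2);

        // Postconditions expressed as assertions: the balance grew, to exactly 1100.00.
        assertTrue(balance.compareTo(principal) > 0);
        assertEquals(0, balance.compareTo(new BigDecimal("1100.00")));
    }

    // Stand-in implementation so the sketch compiles and passes; in TDR the
    // tool would bind the test to a repository candidate instead.
    static class InterestCalculator {
        private final BigDecimal rate;
        InterestCalculator(BigDecimal rate) { this.rate = rate; }
        BigDecimal accrue(BigDecimal principal, int years) {
            return principal.add(principal.multiply(rate).multiply(BigDecimal.valueOf(years)));
        }
    }
}
```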
20. Implications for Future Research
• Pipeline similarity matching
• Multi-indexing
[Diagram: functions f1–f3 placed in a single big index vs. in multiple separate indexes (Index1, Index2, Index3)]
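A minimal sketch of the multi-indexing idea, under assumed facet names: each facet of a function (identifier tokens, a type signature, and a coarse behavioral fingerprint) gets its own index, and a query consults them in a pipeline, narrowing the candidate set at each stage.

```java
import java.util.HashMap;
import java.util.HashSet;
import java.util.List;
import java.util.Map;
import java.util.Set;

// Three facet indexes instead of one big index; facet choices are hypothetical.
public class MultiIndex {
    private final Map<String, Set<String>> tokenIndex = new HashMap<>();     // Index1: name tokens
    private final Map<String, Set<String>> signatureIndex = new HashMap<>(); // Index2: type signature
    private final Map<String, Set<String>> behaviorIndex = new HashMap<>();  // Index3: I/O fingerprint

    public void add(String functionId, List<String> tokens, String signature, String fingerprint) {
        for (String token : tokens) {
            tokenIndex.computeIfAbsent(token, k -> new HashSet<>()).add(functionId);
        }
        signatureIndex.computeIfAbsent(signature, k -> new HashSet<>()).add(functionId);
        behaviorIndex.computeIfAbsent(fingerprint, k -> new HashSet<>()).add(functionId);
    }

    // Pipeline similarity matching: start from one facet, intersect with the others.
    public Set<String> query(String token, String signature, String fingerprint) {
        Set<String> candidates = new HashSet<>(tokenIndex.getOrDefault(token, Set.of()));
        candidates.retainAll(signatureIndex.getOrDefault(signature, Set.of()));
        candidates.retainAll(behaviorIndex.getOrDefault(fingerprint, Set.of()));
        return candidates;
    }
}
```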
21. Conclusion
• Test cases can express a variety of structural and semantic information that TDR can utilize.
• Current TDR prototypes put too much
emphasis on lexical and syntactic similarity.
• Future TDR research should leverage other
aspects of test cases.