
A Large-Scale Empirical Comparison of Static and Dynamic Test Case Prioritization Techniques



The large body of existing research in Test Case Prioritization (TCP) techniques can be broadly classified into two categories: dynamic techniques (that rely on run-time execution information) and static techniques (that operate directly on source and test code). Absent from this current body of work is a comprehensive study aimed at understanding and evaluating the static approaches and comparing them to dynamic approaches on a large set of projects.
In this work, we perform the first extensive study aimed at empirically evaluating four static TCP techniques, comparing them with state-of-research dynamic TCP techniques at different test-case granularities (e.g., method and class level) in terms of effectiveness, efficiency, and similarity of faults detected. This study was performed on 30 real-world Java programs encompassing 431 KLoC. In terms of effectiveness, we find that the static call-graph-based technique outperforms the other static techniques at test-class level, but the topic-model-based technique performs better at test-method level. In terms of efficiency, the static call-graph-based technique is also the most efficient of the static techniques. When examining the similarity of faults detected by the four static techniques compared to the four dynamic ones, we find that, on average, the faults uncovered by these two groups of techniques are quite dissimilar, with the top 10% of test cases agreeing on only ~25%-30% of detected faults. This prompts further research into the severity/importance of faults uncovered by these techniques, and into the potential for combining static and dynamic information for more effective approaches.




  1. FSE'16, Seattle, WA, Wednesday, November 16th, 2016. Qi Luo, Kevin Moran, & Denys Poshyvanyk. College of William & Mary - SEMERU - Department of Computer Science. A Large-Scale Empirical Comparison of Static and Dynamic Test Case Prioritization Techniques
  2-4. Regression Testing & Test Case Prioritization - Software Regressions: a program evolves through versions v1.0, v1.2, v2.0, … vN.
  5-10. Regression Testing & Test Case Prioritization: a regression test suite (t1, t2, t3, t4) is run against the evolving versions v1.0, v1.2, v2.0, … vN.
  11-22. Regression Testing & Test Case Prioritization (v1.2): the number of faults found is plotted against the number of tests executed for a given test ordering of t1, t2, t3, t4. The blue ordering runs 1) t1, 2) t2, 3) t3, 4) t4; the red ordering runs 1) t3, 2) t1, 3) t2, 4) t4.
  23-26. APFD: Average Percentage of Faults Detected. The blue ordering (t1, t2, t3, t4) achieves APFD = 54%, while the red ordering (t3, t1, t2, t4) achieves APFD = 96%. The red ordering of test cases outperforms the blue ordering in terms of APFD; the main goal of TCP is to prioritize test cases so as to maximize APFD.
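The slides report APFD values for the two orderings without showing the metric's definition. For reference, the standard APFD definition over a suite of n tests and m faults, where TF_i is the position of the first test that reveals fault i, is: APFD = 1 - (TF_1 + TF_2 + … + TF_m) / (n × m) + 1 / (2n). Below is a minimal Python sketch of that computation; the fault matrix in the usage example is hypothetical (it is not the data behind the 54%/96% values above), and the sketch assumes every fault is detected by at least one test.

```python
def apfd(order, faults_detected):
    """Average Percentage of Faults Detected for one prioritized ordering.

    order: list of test names in prioritized order.
    faults_detected: dict mapping each test name to the set of fault ids it reveals.
    """
    n = len(order)
    all_faults = set().union(*faults_detected.values())
    m = len(all_faults)
    first_pos = {}  # fault id -> 1-based position of the first test revealing it
    for pos, test in enumerate(order, start=1):
        for fault in faults_detected[test]:
            first_pos.setdefault(fault, pos)
    return 1 - sum(first_pos[f] for f in all_faults) / (n * m) + 1 / (2 * n)

# Hypothetical example: putting the fault-revealing test t3 first raises APFD.
detects = {"t1": {1}, "t2": {2}, "t3": {1, 2, 3, 4}, "t4": set()}
print(apfd(["t1", "t2", "t3", "t4"], detects))  # 0.5625
print(apfd(["t3", "t1", "t2", "t4"], detects))  # 0.875
```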
  27-33. Study Motivation: • Static test case prioritization techniques (TCPs) have generally not been compared to dynamic techniques • The number and size of subject programs in previous studies are generally small • Efficiency is generally not considered by previous studies • Some past studies do not consider test-case granularity • The similarity of detected faults is not well studied. The main goal of this study is to compare the effectiveness, efficiency, and similarity of detected faults for static and dynamic TCP techniques.
  34-40. Experimental Settings: • 30 real-world open-source Java programs - latest version [1] • 9 TCP techniques (4 dynamic, 5 static) • Faults seeded using mutation (PIT with default operators) • 500 randomly seeded faults per program • 100 faulty versions (5 faults per version) • Existing JUnit test suites included with each subject. [1] Yafeng Lu, Yiling Lou, Shiyang Cheng, Lingming Zhang, Dan Hao, Yangfan Zhou, and Lu Zhang. How does regression test prioritization perform in real-world software evolution? In Proceedings of the 38th International Conference on Software Engineering (ICSE '16), 2016.
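The sketch below illustrates how 500 seeded mutants per subject could be grouped into 100 faulty versions of 5 faults each, as described on this slide; sampling the mutants into disjoint random groups is an assumption made for illustration rather than the study's exact procedure.

```python
import random

def make_faulty_versions(mutant_ids, n_versions=100, faults_per_version=5, seed=0):
    """Group seeded mutants into faulty versions (100 versions x 5 faults = 500)."""
    rng = random.Random(seed)
    ids = list(mutant_ids)
    assert len(ids) >= n_versions * faults_per_version, "not enough seeded mutants"
    rng.shuffle(ids)
    return [ids[i * faults_per_version:(i + 1) * faults_per_version]
            for i in range(n_versions)]
```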
  41-47. Test Case Prioritization Techniques: Dynamic vs. Static. Dynamic: • Data collection: data that must be collected at runtime (i.e., dynamically) • Info type: typically uses test-coverage information • Efficiency: can be expensive due to instrumentation and collection of data. Static: • Data collection: analyzes source and test code (i.e., static information) • Info type: typically uses diversity metrics for prioritization • Efficiency: typically more efficient than dynamic techniques.
  48. Dynamic Techniques
  49-52. Dynamic Test Case Prioritization Techniques - Greedy (Total & Additional), Adaptive Random, Search-Based. Greedy: execute the test suite (t1 … t5), collect code coverage information for each test (t1 … tN), then apply the total or additional greedy strategy to produce a prioritized test suite (e.g., t3, t1, t4, t2, t5).
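A minimal sketch of the two greedy strategies over per-test coverage sets follows; the choice of coverage unit, the tie-breaking, and the reset behaviour of the additional strategy are illustrative assumptions.

```python
def greedy_total(coverage):
    """Total strategy: rank tests by how much code each covers on its own."""
    return sorted(coverage, key=lambda t: len(coverage[t]), reverse=True)

def greedy_additional(coverage):
    """Additional strategy: repeatedly pick the test covering the most
    not-yet-covered code; reset once everything has been covered."""
    full = set().union(*coverage.values())
    remaining, uncovered, order = set(coverage), set(full), []
    while remaining:
        best = max(remaining, key=lambda t: len(coverage[t] & uncovered))
        if not coverage[best] & uncovered:
            if uncovered == full:              # leftover tests cover nothing at all
                order.extend(sorted(remaining))
                break
            uncovered = set(full)              # everything covered once: reset
            continue
        order.append(best)
        uncovered -= coverage[best]
        remaining.remove(best)
    return order

# coverage: test name -> set of covered code units (e.g., methods)
coverage = {"t1": {"a"}, "t2": {"a", "b"}, "t3": {"a", "b", "c"}, "t4": {"d"}, "t5": {"c"}}
print(greedy_total(coverage))       # ['t3', 't2', 't1', 't4', 't5']
print(greedy_additional(coverage))  # ['t3', 't4', 't2', 't5', 't1']
```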
  53-60. Adaptive Random: execute the test suite and collect code coverage information for each test, randomly select a test (e.g., t4) for the prioritized suite, calculate pair-wise distances between the coverage of the already-selected tests and the remaining tests (t3, t1, t2, t5: 0.4, 0.1, 0.7, 0.9), select the next test using the average, minimum, or maximum distance, and continue the process until the prioritized test suite (e.g., t3, t1, t4, t2, t5) is complete.
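A small sketch of that adaptive random loop is given below; the Jaccard coverage distance, the candidate-set size, and the max-of-minimum-distances selection rule are illustrative assumptions (the study evaluates average, minimum, and maximum variants).

```python
import random

def coverage_distance(a, b):
    """Jaccard distance between two coverage sets (one possible choice of metric)."""
    union = a | b
    return 0.0 if not union else 1 - len(a & b) / len(union)

def adaptive_random(coverage, candidates=5, seed=0):
    """Pick a random first test, then repeatedly choose, from a random candidate
    set, the test whose coverage is farthest from the tests already prioritized."""
    rng = random.Random(seed)
    remaining = list(coverage)
    order = [remaining.pop(rng.randrange(len(remaining)))]
    while remaining:
        cand = rng.sample(remaining, min(candidates, len(remaining)))
        best = max(cand, key=lambda t: min(coverage_distance(coverage[t], coverage[s])
                                           for s in order))
        order.append(best)
        remaining.remove(best)
    return order
```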
  61-62. Search-Based: execute the test suite, collect code coverage information for each test, then apply a search-based genetic algorithm (GA) to produce a prioritized test suite (e.g., t3, t1, t4, t2, t5).
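The sketch below shows a much-simplified evolutionary search over test-order permutations; the fitness function (how quickly an ordering accumulates coverage), truncation selection, and swap mutation are assumptions for illustration, not the study's exact GA configuration, and it assumes at least two tests with non-empty total coverage.

```python
import random

def coverage_auc(order, coverage):
    """Fitness: how quickly an ordering accumulates coverage (0..1, higher is better)."""
    total = len(set().union(*coverage.values()))
    covered, score = set(), 0.0
    for t in order:
        covered |= coverage[t]
        score += len(covered) / total
    return score / len(order)

def search_based(coverage, pop_size=20, generations=50, seed=0):
    """Evolve permutations of the test suite, keeping the two best per generation
    and creating children by swap mutation of good parents."""
    rng = random.Random(seed)
    tests = list(coverage)
    pop = [rng.sample(tests, len(tests)) for _ in range(pop_size)]
    for _ in range(generations):
        pop.sort(key=lambda o: coverage_auc(o, coverage), reverse=True)
        children = pop[:2]                                   # elitism
        while len(children) < pop_size:
            child = rng.choice(pop[: pop_size // 2])[:]      # truncation selection
            i, j = rng.sample(range(len(child)), 2)          # swap mutation
            child[i], child[j] = child[j], child[i]
            children.append(child)
        pop = children
    return max(pop, key=lambda o: coverage_auc(o, coverage))
```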
  63. Static Techniques
  64-67. Static Test Case Prioritization Techniques - Call Graph, String Distance, Topic Model. Call Graph: perform static call-graph analysis of the test suite (t1 … t5), build a static call graph for each test (t1 … tN), then apply the total or additional strategy to produce a prioritized test suite (e.g., t3, t1, t4, t2, t5).
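A compact sketch of the call-graph-based approach follows; representing the static call graph as an adjacency dict and ranking tests by the number of statically reachable methods corresponds to the total variant, while the additional variant would greedily maximize newly reached methods, as in the dynamic additional strategy. The names and graph shape are illustrative.

```python
def reachable(call_graph, entry_points):
    """Methods transitively reachable from a test's entry points in a static call graph."""
    seen, stack = set(), list(entry_points)
    while stack:
        m = stack.pop()
        if m not in seen:
            seen.add(m)
            stack.extend(call_graph.get(m, ()))
    return seen

def call_graph_total(test_entry_points, call_graph):
    """TPcg-tot sketch: rank tests by how many methods their static call graphs reach."""
    reach = {t: reachable(call_graph, eps) for t, eps in test_entry_points.items()}
    return sorted(reach, key=lambda t: len(reach[t]), reverse=True)

# call_graph: caller -> callees; test_entry_points: test -> methods it calls directly
call_graph = {"a": ["b", "c"], "b": ["d"], "c": [], "d": []}
tests = {"t1": ["a"], "t2": ["c"], "t3": ["b"]}
print(call_graph_total(tests, call_graph))  # ['t1', 't3', 't2']
```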
  68-71. String Distance: perform static string analysis of the test suite to obtain string representations of the test code (t1 … tN), then order tests to maximize diversity, producing a prioritized test suite (e.g., t3, t1, t4, t2, t5).
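A rough sketch of string-based diversity prioritization is shown below; the difflib-based distance and the farthest-first rule (maximize the minimum distance to already-selected tests) are illustrative stand-ins for the string distances used in the literature. The quadratic number of pairwise comparisons in this loop is also why such techniques slow down as suites grow (see RQ4 later).

```python
import difflib

def string_distance(a, b):
    """1 minus the similarity ratio of two test-code strings (illustrative metric)."""
    return 1.0 - difflib.SequenceMatcher(None, a, b).ratio()

def string_diversity_order(test_code):
    """TPstr sketch: farthest-first ordering maximizing the minimum string distance
    between the next test and the tests already prioritized."""
    remaining = dict(test_code)
    first = max(remaining, key=lambda t: len(remaining[t]))  # assumption: start from the longest test
    order = [first]
    del remaining[first]
    while remaining:
        nxt = max(remaining, key=lambda t: min(string_distance(remaining[t], test_code[s])
                                               for s in order))
        order.append(nxt)
        del remaining[nxt]
    return order
```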
  72-75. Topic Model: perform static string analysis of the test suite to build semantic topic models over the test code (e.g., Topic 1: t1-0.2, t2-0.4, t3-0.9, t4-0.1, t5-0.2; Topic 2: t1-0.5, t2-0.2, t3-0.6, t4-0.3, t5-0.1; Topic 3: t1-0.2, t2-0.9, t3-0.8, t4-0.3, t5-0.4), then order tests to maximize diversity, producing a prioritized test suite (e.g., t3, t1, t4, t2, t5).
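A minimal sketch of the diversity step over topic vectors follows, using the per-topic weights shown on the slide; the Manhattan distance and the farthest-first selection are illustrative assumptions (in the actual technique the topic model is inferred from the test code, e.g., with LDA).

```python
def manhattan(u, v):
    return sum(abs(a - b) for a, b in zip(u, v))

def topic_diversity_order(topic_vectors):
    """TPtopic sketch: order tests so that highly prioritized tests cover dissimilar topics."""
    remaining = dict(topic_vectors)
    first = max(remaining, key=lambda t: sum(remaining[t]))  # assumption: start from the broadest test
    order = [first]
    del remaining[first]
    while remaining:
        nxt = max(remaining, key=lambda t: min(manhattan(remaining[t], topic_vectors[s])
                                               for s in order))
        order.append(nxt)
        del remaining[nxt]
    return order

# Topic-membership vectors (Topic 1, Topic 2, Topic 3) as sketched on the slide
vectors = {"t1": [0.2, 0.5, 0.2], "t2": [0.4, 0.2, 0.9], "t3": [0.9, 0.6, 0.8],
           "t4": [0.1, 0.3, 0.3], "t5": [0.2, 0.1, 0.4]}
print(topic_diversity_order(vectors))
```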
  76-80. Research Questions: • RQ1: Effectiveness? • RQ2: Granularity? • RQ3: Similarity? • RQ4: Efficiency?
  81-90. Results: RQ1 & RQ2 - Effectiveness & Granularity. APFD values across all subjects at differing granularities:

  Granularity | Metric | TPcg-tot | TPcg-add | TPstr | TPtopic-r | TPtopic-m | TPtotal | TPadd | TPart | TPsearch | ANOVA p-value
  Test-class  | Avg    | 0.782 | 0.793 | 0.769 | 0.680 | 0.747 | 0.748 | 0.789 | 0.659 | 0.786 | 1.86E-07
  Test-class  | HSD    | A     | A     | A     | BC    | AB    | AB    | A     | C     | A     |
  Test-method | Avg    | 0.768 | 0.816 | 0.819 | 0.783 | 0.828 | 0.794 | 0.898 | 0.795 | 0.883 | 1.69E-13
  Test-method | HSD    | D     | CD    | CD    | CD    | BC    | CD    | A     | CD    | AB    |

  Results for RQ1 & RQ2: • There is a statistically significant difference between the effectiveness of the studied techniques at differing granularities • Most effective at test-class level: TPcg-add • Most effective at test-method level: TPadd • Static techniques are generally more effective at test-class level • Dynamic techniques are generally more effective at test-method level • Method-level granularity performs better than class-level
  91-98. Test Case Prioritization - Fault Similarity: for a version vX.x, two test orderings can achieve the same APFD yet detect different faults. The blue ordering 1) t1, 2) t2, 3) t3, 4) t4 and the red ordering 1) t5, 2) t6, 3) t7, 4) t8 both achieve APFD = 96% on the # Faults Found vs. # Tests Executed chart, which raises the question of how similar the faults (F1 … F5) detected by the two orderings actually are.
  99-105. Test Case Prioritization - Fault Similarity: for each technique, take the top 10%, 20%, and 30% of the prioritized test suite and encode the faults it detects as a binary vector (Technique 1: 010010100010…, 010011100011…, 010011110011…; Technique 2: 101001100010…, 101001111010…, 101011111011…). The fault sets of two techniques are then compared with the Jaccard coefficient: J(T_i^A, T_i^B) = |T_i^A ∩ T_i^B| / |T_i^A ∪ T_i^B|, where T_i^A and T_i^B are the sets of faults detected by the top i% of test cases prioritized by techniques A and B.
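A small sketch of that comparison is shown below; the helper names, the cut-off fraction, and the placeholder variables in the usage comment are hypothetical, and the fault-detection data would come from the mutation analysis described earlier.

```python
def faults_in_top(order, faults_detected, fraction):
    """Union of the faults detected by the top `fraction` of a prioritized suite."""
    k = max(1, round(len(order) * fraction))
    found = set()
    for t in order[:k]:
        found |= faults_detected[t]
    return found

def jaccard(a, b):
    """Jaccard coefficient of two fault sets."""
    return len(a & b) / len(a | b) if (a | b) else 1.0

# e.g., similarity of the faults found by the top 10% of two techniques' orderings:
# sim = jaccard(faults_in_top(order_a, detects, 0.10),
#               faults_in_top(order_b, detects, 0.10))
```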
  106-111. Results: RQ3 - Fault Similarity. [Figures: average Jaccard similarity of faults detected across all subjects, at test-class level and at test-method level.] Results for RQ3: • The TCP techniques tend to uncover dissimilar faults for the most highly prioritized test cases • Different subjects exhibited different degrees of similarity
  112-116. Results: RQ4 - Efficiency of Static Techniques. Execution cost in seconds for the static TCP techniques (test-class level / test-method level):

  Technique | Pre-Processing: Avg | Min | Max | Sum | Test Prioritization: Avg | Min | Max | Sum
  TPcg-tot  | 4.66/4.66 | 1.69/1.69 | 11.95/11.95 | 139.69/139.69 | 0.21/0.50 | 0/0 | 3.10/10.58 | 6.22/15.00
  TPcg-add  | 4.66/4.66 | 1.69/1.69 | 11.95/11.95 | 139.69/139.69 | 0.19/0.82 | 0/0 | 2.87/19.98 | 5.70/24.57
  TPstr     | 0.40/0.41 | 0.06/0.08 | 2.95/2.41   | 12.05/12.35   | 3.57/1,960.20 | 0.02/0.02 | 67.13/57,134.30 | 107.25/58,805.94
  TPtopic-r | 0.50/0.53 | 0.11/0.11 | 3.19/3.74   | 14.94/15.86   | 0.14/1,362.52 | 0/0.02    | 1.38/40,594.66  | 4.10/40,875.75
  TPtopic-m | 0.72/4.28 | 0.13/0.22 | 6.18/50.01  | 21.66/128.48  | 0.16/373.52   | 0/0.09    | 1.68/10,925.26  | 4.83/11,205.73

  Results for RQ4: • At test-method level, TPcg-tot and TPcg-add are the most efficient • TPstr, TPtopic-r, and TPtopic-m tend to take significantly longer as the number of test cases increases • All TCP techniques share similar efficiency at class level
  117-125. Impact and Directions for Future Research. Key findings: • Test granularity impacts TCP effectiveness • The performance of TCP techniques varies across different subject programs • For the most highly prioritized test cases, the faults uncovered are dissimilar. Directions for future work: • New TCP techniques should evaluate different test granularities and fault similarity • Further investigation into how program characteristics impact TCP performance • Future studies might consider mutation fault severity/importance
  126. Thank You! Kevin Moran kpmoran@cs.wm.edu, Qi Luo qluo@cs.wm.edu, Denys Poshyvanyk denys@cs.wm.edu. SEMERU. Replication Data & Online Appendix: http://www.cs.wm.edu/semeru/data/FSE16-TCPSTUDY/ Acknowledgements: thanks to Lingming Zhang for lending his expertise; thanks to Chris H. for slide examples.
  127. Discussion Questions: • Can you think of any other types of information that could be used to prioritize test cases? • The study concluded that TCP techniques tend to uncover dissimilar faults. How can we make this information actionable?
