SBST 2015 – 3rd Tool Competition for Java JUnit Test Tools

  1. 3rd Java Unit Testing Tool Competition
     Tanja E.J. Vos, Urko Rueda
     Universidad Politecnica de Valencia
     http://sbstcontest.dsic.upv.es/
     8th International Workshop on Search-Based Software Testing (SBST), at the 37th IEEE International Conference on Software Engineering (ICSE 2015)
  2. A (3rd) tool competition… WHY?
     §  Competition between different types of automated unit testing tools (evolutionary, guided/random, dynamic)
     §  Task: generate regression JUnit tests for a given, unknown set of classes
     §  Score takes into account:
        –  Effectiveness: instruction coverage, branch coverage, mutation coverage
        –  Efficiency: time to prepare, generate and execute
     §  Allows comparison between different approaches
     §  Helps developers:
        –  Improve their tools
        –  Guide future developments
  3. WHO were the participants (alphabetical order)
     ¢  Commercial Tool (CT): anonymous, dynamic approach; deployment and configuration for the competition done by UPV
     ¢  EvoSuite: G. Fraser, A. Arcuri; evolutionary/search-based, static analysis
     ¢  EvoSuite-MOSA: A. Panichella, P. Tonella, F.M. Kifetew, A. Panico; evolutionary
     ¢  GRT: L. Ma, C. Artho, C. Zhang; guided random, static analysis
     ¢  jTexPert: A. Sakti; guided random, static analysis
     ¢  T3: W. Prasetya; random testing, pair-wise testing, …
  4. WHAT were the baselines
     ¢  Baseline: Randoop (random testing)
     ¢  Baseline: Manual
        •  3 testers (professional tester + researcher + PhD student)
        •  “Write unit tests for the given classes! Take as much time as you think is necessary.”
        •  Measured the time to get familiar with each class and the time to write the tests
  5. HOW do we compare them
     §  Instruction coverage
     §  Branch coverage
     §  Mutation coverage
     §  Time for generation of tests
     §  Execution time
     §  Preparation time

     The benchmark function assigns to each run of a test tool T a score, computed as the weighted sum over the measured variables:

     $$\mathrm{score}_T := \sum_{\text{class}} \Big( \omega_i \cdot \mathit{cov}_i(\text{class}) + \omega_b \cdot \mathit{cov}_b(\text{class}) + \omega_m \cdot \mathit{cov}_m(\text{class}) \Big) \;-\; \omega_t \cdot \Big( t_{\text{prep}} + \sum_{\text{class}} \big[ t_{\text{gen}}(\text{class}) + t_{\text{exec}}(\text{class}) \big] \Big)$$

     where cov_i, cov_b, and cov_m are the achieved instruction, branch, and mutation coverage, and the weights are ω_i = 1, ω_b = 2, ω_m = 4, ω_t = 1.
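To make the weighting concrete, here is a hypothetical single-CUT example (all numbers are assumed; coverage values are fractions):

$$\mathrm{score}_T = 1 \cdot 0.80 + 2 \cdot 0.65 + 4 \cdot 0.50 - 1 \cdot (t_{\text{prep}} + t_{\text{gen}} + t_{\text{exec}}) = 4.10 - t_{\text{total}}$$

Mutation coverage carries the largest weight (ω_m = 4), so each point of mutation coverage contributes four times as much as a point of instruction coverage, while the time term penalizes slow preparation, generation, and execution.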
  6. HOW do we execute them
     [Figure: competition execution framework. For each tool T1 … TN, RUN TOOL generates test cases for the CUTs; the framework then COMPILEs and EXECUTEs them and measures performance, M1 (coverage, via JaCoCo) and M2 (mutation, via PiTest); an AGGREGATOR combines the measurements into the tool's SCORE.]
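In code form, the figure boils down to a simple per-tool loop. The following Java sketch is purely illustrative: every type and method name (ExecutionFramework, measureCoverage, lastTimings, …) is a hypothetical stand-in for the real framework components; only the weights and the JaCoCo/PiTest roles come from the slides.

```java
import java.util.List;

/** Minimal, hypothetical sketch of the competition execution framework loop. */
public class ExecutionFramework {
    record Coverage(double instr, double branch) {}  // M1, e.g. from JaCoCo
    record Mutation(double killed) {}                // M2, e.g. from PiTest
    record Timings(double gen, double exec) {}

    public static void main(String[] args) {
        List<String> tools = List.of("T1", "T2");                    // T1 … TN
        List<String> cuts  = List.of("org.example.A", "org.example.B");
        for (String tool : tools) {
            double prep  = runToolPreparation(tool);                 // t_prep, once per tool
            double score = -1.0 * prep;                              // omega_t = 1
            for (String cut : cuts) {
                runTool(tool, cut);                                  // RUN TOOL: generate JUnit tests
                compileTests(cut);                                   // COMPILE
                Coverage cov = measureCoverage(cut);                 // EXECUTE + M1
                Mutation mut = measureMutation(cut);                 // M2
                Timings  t   = lastTimings();
                score += 1 * cov.instr() + 2 * cov.branch() + 4 * mut.killed()
                       - 1 * (t.gen() + t.exec());                   // weights from slide 5
            }
            System.out.printf("%s -> %.2f%n", tool, score);          // AGGREGATOR / SCORE
        }
    }

    // Stubs standing in for the real compile/execute/measure steps:
    static double runToolPreparation(String tool) { return 0.0; }
    static void runTool(String tool, String cut) {}
    static void compileTests(String cut) {}
    static Coverage measureCoverage(String cut) { return new Coverage(0.8, 0.6); }
    static Mutation measureMutation(String cut) { return new Mutation(0.5); }
    static Timings lastTimings() { return new Timings(0.1, 0.1); }
}
```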
  7. HOW to implement RUN TOOL
     [Fig. 2, "Benchmark Automation Protocol" (Bauersfeld, Vos, Lakhotia): in the preparation phase, the benchmark framework sends "BENCHMARK" together with the src path, bin path, classpath, and the classpath for JUnit compilation, and the tool answers "READY". In the test-case loop, the framework sends the name of each CUT; the tool generates a test file in ./temp/testcases and answers "READY", after which the framework compiles, executes, and measures.]
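A tool-side wrapper for this protocol could look like the sketch below. The "BENCHMARK" and "READY" messages, the path fields, and the ./temp/testcases output directory come from the slide; the line-based stdin/stdout transport, the exact field order, and the generateTests() hook are assumptions.

```java
import java.io.BufferedReader;
import java.io.InputStreamReader;

/** Hypothetical tool-side "run tool" wrapper for the benchmark automation protocol. */
public class RunTool {
    public static void main(String[] args) throws Exception {
        BufferedReader in = new BufferedReader(new InputStreamReader(System.in));

        // Preparation phase: the framework announces the benchmark and its paths.
        String handshake = in.readLine();   // expected: "BENCHMARK"
        String srcPath   = in.readLine();   // source path of the project under test
        String binPath   = in.readLine();   // path to the compiled classes
        String classPath = in.readLine();   // classpath for JUnit compilation
        // ... tool-specific setup (configuration, instrumentation) would go here ...
        System.out.println("READY");        // signal that preparation is finished

        // Test-case loop: one CUT name per line until the framework closes stdin.
        String cut;
        while ((cut = in.readLine()) != null) {
            generateTests(cut);             // write JUnit files into ./temp/testcases
            System.out.println("READY");    // framework now compiles + executes + measures
        }
    }

    /** Placeholder for the competing tool's actual test generator. */
    static void generateTests(String cutName) {
        // e.g. invoke the tool programmatically for cutName
    }
}
```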
  8. WHAT were the Benchmark Classes
     ¢  Same as the 2nd competition (but nobody knew ;-))
     ¢  Java open source libraries
     ¢  9 projects (async http client, eclipse checkstyle, gdata client, guava, hibernate, java machine learning library, Java wikipedia library, scribe, twitter4j)
     ¢  Sources: Google Code, GitHub, Sourceforge.net
     ¢  7 classes per project → total of 63 classes
     ¢  Packages with the highest value for the Afferent Coupling metric (AFC)
        •  AFC determines the number of classes from other packages that depend on classes in the current package.
        •  Selects “popular” classes within a project.
     ¢  Classes with the highest Nested Block Depth (NBD), illustrated in the sketch after this slide
        •  NBD determines the maximal depth of nested statements such as if-else constructs, loops and exception handlers.
        •  Selects complex classes for which it is difficult to achieve high branch coverage.
     ¢  No exclusions: abstract, small, large, file constructors…
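As a quick illustration of the NBD metric (hypothetical code, not one of the benchmark classes), the method below has a Nested Block Depth of 3: its deepest statement sits inside an if, a loop, and an exception handler.

```java
/** Hypothetical example illustrating Nested Block Depth (NBD). */
class NbdExample {
    // NBD = 3: the division is nested inside if -> for -> try.
    int firstReciprocalTimes100(int[] xs) {
        if (xs != null) {                       // depth 1
            for (int x : xs) {                  // depth 2
                try {                           // depth 3
                    return 100 / x;
                } catch (ArithmeticException e) {
                    // x == 0: skip and try the next element
                }
            }
        }
        return 0;
    }
}
```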
  9. Results
     6 runs (to account for indeterminism caused by tools and classes)
     [Full results table; not meant to be readable here, it just shows that the work was done.]
  10. Results per class
      [Results-per-class table, part 1; not meant to be readable here, it just shows that the work was done.]
  11. Results per class (continued)
      [Results-per-class table, part 2; not meant to be readable here, it just shows that the work was done.]
  12. And the winner is…….
      210.45  Manual
      203.73  GRT (1)
      190.64  EvoSuite (2)
      189.22  MOSA-EvoSuite (3)
      186.15  T3 (4)
      159.16  jTexPert (5)
       93.45  Randoop
       65.5   CT (6)
  13. Combined strength
      Table IV: combined strength of the contesting tools.

      |                          | tools            | tools + humans |
      |--------------------------|------------------|----------------|
      | Average cov_i            | 78.0 %           | 84.9 %         |
      | Average cov_b            | 64.7 %           | 70.1 %         |
      | Average cov_m            | 60.3 %           | 69.4 %         |
      | # CUTs with cov_b = 100% | 6                | 7              |
      | # CUTs with cov_b ≥ 80%  | 31               | 34             |
      | CUTs with cov_i ≤ 10%    | {43, 45, 49, 61} | {45}           |
      | CUTs with cov_i ≤ 5%     | {45, 61}         | {45}           |
      | SCORE                    | 266.7            | 277.8          |

      Adding the manual baseline to the tool-generated suites improves every average coverage figure and raises the combined score from 266.7 to 277.8.
  14. Future Editions
      ¢  More classes
      ¢  Need testers for the manual baseline (any volunteers? ;-))
      ¢  More participants!!
      ¢  Score: the participants will have a lot to say ;-)
      ¢  Tool library dependencies appearing as CUTs (the known Guava library problems)
  15. Contact
      §  Tanja E. J. Vos
      §  email: tvos@pros.upv.es
      §  twitter/skype: tanja_vos
      §  web: http://staq.dsic.upv.es/
      §  phone: +34 690 917 971
