Csmr2013 presentation

My presentation at the 2013 European Conference on Software Maintenance and Reengineering (CSMR 2013)

  1. Change-Based Test Selection in the Presence of Developer Tests
     Quinten David Soetens, Serge Demeyer, Andy Zaidman
     Buongiorno, my name is Quinten, and I'm from the University of Antwerp. I will be presenting a technique that we investigated to reduce the size of a test suite.
  2. Test Suites Grow
     As a software system grows, so does its test suite, and test suites can grow very large indeed! We talked to a couple of companies in industry, and they confirmed that this is a relevant problem for them. Why?
  3. Takes Hours to Run
     Because large test suites lead to inefficient testing: it takes too long to run all the tests. One company we talked to mentioned that their tests take up to 13 hours to run. They start the tests in the evening, and when they come back in the morning for their daily stand-up Scrum meeting, the tests are still running.
  4. This in turn leads to delays: delays in executing the tests as well as delays in updating the test cases. The result is reduced test coverage and longer feedback cycles. It takes a developer a lot longer to find out whether his code was good or not.
  5. Run Tests in Parallel
     One solution to this problem could be to run the tests in parallel to save time. For instance, another company we talked to had tests that ran for 8 hours. Their solution was to run the tests in parallel on 16 different machines, effectively reducing the runtime of their test suite from 8 hours to half an hour. In my opinion, that is still a long time to wait, especially for a developer who just wants to check whether his code is OK.
  6. Which tests should I run when changing this method?
     In light of this, developers face a problem: which test(s) should they run when changing a particular part of a system? Currently, developers rely on their own gut feeling, common knowledge in the company, or the expert knowledge of a colleague to select a subset of tests that could be relevant for the code they are working on. Tool support for this task is therefore desirable. We need to find which tests are relevant for a particular change, and we can do this when we have recorded the fine-grained changes made during development.
  7. ChEOPSJ
     [Architecture diagram; labels: Applications, TestSelection, Model, Change Recorders, Logger, Distiller, Change Distiller, SVNKit]
     Reference: Quinten David Soetens and Serge Demeyer, "ChEOPSJ: Change-Based Test Optimization", in Proceedings of the 16th European Conference on Software Maintenance and Reengineering (CSMR 2012).
     This approach was implemented in a tool called ChEOPSJ, which I presented at last year's CSMR.
  8. ChEOPSJ
     [Architecture diagram, as on the previous slide]
     We consider changes made to the source code as first-class objects: tangible entities that we can analyze and manipulate. Basically, ChEOPSJ is a tool that records changes in the background while you are programming. To work with real-world cases, it can also recover changes from source code repositories. Once a change model is instantiated for a system, we can analyze it and run different applications on it (for now, only the test selection application).
  9. First-Class Change Objects
     Changes act on source code (FAMIX) entities (e.g., AddClassChange, AddMethodChange, etc.).
     For instance, adding a new class results in an AddClassChange.
  10. First-Class Change Objects
      Changes have structural dependencies (e.g., AddMethod -> AddClass -> AddPackage, etc.).
      We can also define dependencies between these changes. For instance, adding a method to a class requires the class to be added first. Therefore, there is a dependency between the AddMethodChange and the AddClassChange.
  11. First-Class Change Objects
      Traceability via dependencies between changes to test code and changes to program code.
      It is these dependencies that we can use to find relevant tests. Since tests are also source code, we can find a series of dependencies between the test code and the program code. These dependencies form a live traceability link between the two, and using these links we can select the relevant tests for a particular change.
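A minimal sketch of the selection idea on slides 9 to 11, under heavily simplifying assumptions: every change records the entity it acts on and the changes it depends on, and a test class counts as relevant for an entity when one of its test-code changes transitively depends on a change to that entity. The types `Kind`, `Change` and `TestSelection` are hypothetical illustrations, not ChEOPSJ's actual (FAMIX-based) model or API.

```java
import java.util.*;

/** Hypothetical, simplified change kinds; ChEOPSJ's real model is FAMIX-based and richer. */
enum Kind { ADD_PACKAGE, ADD_CLASS, ADD_METHOD, ADD_INVOCATION }

/**
 * A first-class change object: its kind, the entity it acts on, and the earlier changes it
 * structurally depends on. testClass is null for program-code changes and names the
 * enclosing test class for test-code changes.
 */
record Change(String id, Kind kind, String entity, String testClass, List<Change> dependsOn) {}

final class TestSelection {
    /** Test classes with at least one change that transitively depends on a change to `entity`. */
    static Set<String> relevantTests(Collection<Change> model, String entity) {
        Set<String> result = new TreeSet<>();
        for (Change c : model) {
            if (c.testClass() != null && dependsOnEntity(c, entity, new HashSet<>())) {
                result.add(c.testClass());
            }
        }
        return result;
    }

    // Depth-first walk along dependency edges; `seen` avoids revisiting shared ancestors.
    private static boolean dependsOnEntity(Change c, String entity, Set<String> seen) {
        if (!seen.add(c.id())) return false;
        for (Change dep : c.dependsOn()) {
            if (dep.entity().equals(entity) || dependsOnEntity(dep, entity, seen)) return true;
        }
        return false;
    }

    public static void main(String[] args) {
        Change pkg = new Change("c1", Kind.ADD_PACKAGE, "shop", null, List.of());
        Change cart = new Change("c2", Kind.ADD_CLASS, "shop.Cart", null, List.of(pkg));
        Change add = new Change("c3", Kind.ADD_METHOD, "shop.Cart.add()", null, List.of(cart));
        Change total = new Change("c4", Kind.ADD_METHOD, "shop.Cart.total()", null, List.of(cart));
        Change test = new Change("c5", Kind.ADD_CLASS, "shop.CartTest", "shop.CartTest", List.of(pkg));
        Change testAdd = new Change("c6", Kind.ADD_METHOD, "shop.CartTest.testAdd()", "shop.CartTest", List.of(test));
        // The test method calls Cart.add(), so this invocation change depends on the AddMethod change of add().
        Change call = new Change("c7", Kind.ADD_INVOCATION, "shop.CartTest.testAdd()->add()", "shop.CartTest",
                List.of(testAdd, add));
        List<Change> model = List.of(pkg, cart, add, total, test, testAdd, call);
        System.out.println(relevantTests(model, "shop.Cart.add()"));   // [shop.CartTest]
        System.out.println(relevantTests(model, "shop.Cart.total()")); // []
    }
}
```

In this toy model, changing `total()` selects no test because no test-code change depends on it, while changing `add()` selects `CartTest`.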
  12. Research Questions
      Compare the test subset against "retest all": Size Reduction? Quality? Accuracy?
      We evaluated our approach on two open-source cases: Cruisecontrol and PMD. For each class, we searched for the relevant test classes (using the changes in that class). We could then compare the resulting subset(s) of tests against the entire (larger) test suite on three criteria. First, how much did we actually reduce the test suite? Second, what was the quality of the reduced test suite: is it the same or worse? We used a metric called mutation coverage to gauge the quality of a set of tests. Finally, we looked at the accuracy of our approach, i.e., its precision and recall. The first question is: when we reduce the test suite to a subset of tests, how much did we actually reduce it?
  13. Size Reduction?
      Cruisecontrol: 295 tests reduced to 1 test. PMD: 215 tests reduced to 1 test.
      For 54% (Cruisecontrol) and 44% (PMD) of the classes, we found only 1 relevant test. For 11% and 20% of the classes there were 2 relevant tests, 13% and 10% of the classes had 3 relevant tests, and for 21% and 26% there were 4 or more relevant tests.
      Cruisecontrol: 1 test (54.5%), 2 tests (11.4%), 3 tests (13.1%), 4 or more tests (21.0%)
      PMD: 1 test (44.0%), 2 tests (19.9%), 3 tests (9.9%), 4 or more tests (26.2%)
  14. Size Reduction?
      Cruisecontrol: 295 tests reduced to 2 tests. PMD: 215 tests reduced to 2 tests.
  15. Size Reduction?
      Cruisecontrol: 295 tests reduced to 3 tests. PMD: 215 tests reduced to 3 tests.
  16. Size Reduction?
      Cruisecontrol: 295 tests reduced to 4 or more tests (max = 22). PMD: 215 tests reduced to 4 or more tests (max = 37).
  17. Test Suites Grow
      As such, we can reduce ALL the tests...
  18. Size Reduction?
      ...to a handful of tests: 80 to 90% of the classes had up to 5 relevant tests!
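The size-reduction figures on slides 13 to 18 come down to simple counting. A minimal sketch, assuming a hypothetical map from each class to the number of tests selected for it; the example counts are invented and are not the study's data. Classes with no selected test are skipped here, which is an assumption, since the slides do not say how such classes were treated.

```java
import java.util.*;

final class SizeReduction {
    /** Fraction of the full suite that no longer has to run, e.g. 1 of 295 tests gives about 0.997. */
    static double reduction(int selectedTests, int suiteSize) {
        return 1.0 - (double) selectedTests / suiteSize;
    }

    /** Percentage of classes per bucket (1, 2, 3, >=4 selected tests), as on the slides. */
    static Map<String, Double> distribution(Map<String, Integer> selectedPerClass) {
        Map<String, Integer> counts = new LinkedHashMap<>();
        for (String bucket : List.of("1", "2", "3", ">=4")) counts.put(bucket, 0);
        int classes = 0;
        for (int n : selectedPerClass.values()) {
            if (n < 1) continue; // assumption: classes without any selected test are ignored
            counts.merge(n >= 4 ? ">=4" : Integer.toString(n), 1, Integer::sum);
            classes++;
        }
        Map<String, Double> percentages = new LinkedHashMap<>();
        for (Map.Entry<String, Integer> e : counts.entrySet()) {
            percentages.put(e.getKey(), classes == 0 ? 0.0 : 100.0 * e.getValue() / classes);
        }
        return percentages;
    }

    public static void main(String[] args) {
        // Hypothetical per-class selection sizes, not data from the study.
        Map<String, Integer> selected = Map.of("Cart", 1, "Order", 1, "Invoice", 2, "Shop", 5);
        System.out.println(distribution(selected)); // {1=50.0, 2=25.0, 3=0.0, >=4=25.0}
        System.out.printf("%.1f%%%n", 100 * reduction(1, 295)); // 99.7%
    }
}
```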
  19. Research Questions
      Compare the test subset against "retest all": Size Reduction? Quality? Accuracy?
      The next question was: does the quality of the reduced test sets remain the same, or is it worse than retest all?
  20. Quality? Mutation Testing
      [Slide image: a Java source file (class SuffixTree in package engine) labelled "SOURCE CODE", with the caption "All Tests Pass"; image credit: http://pitest.org]
      To assess the quality of a set of tests, we used mutation testing. In short, this means inserting a fault into the code and checking whether your test set fails (the mutation is killed) or not (the mutation survived). We used PIT to do this automatically. We start with a green test suite (i.e., all tests pass).
  21. Quality? Mutation Testing
      [Slide image: the same source file, with the caption "Introduce Mutant + Rerun Tests"]
      After inserting a mutation, we run the tests. If the tests still pass, we say that the mutation survived. This is BAD, because you introduced a bug into your system and the tests did not catch it.
  22. Quality? Mutation Testing
      [Slide image: the same source file, with the captions "All Tests Pass" and "Mutation Survived"]
  23. Quality? Mutation Testing
      [Slide image: the same source file, with the caption "Introduce Mutant + Rerun Tests"]
      After inserting another mutation, we run the tests again. Now some of the tests fail, so we say that this mutation was killed. This is GOOD.
  24. Quality? Mutation Testing
      [Slide image: the same source file, with the captions "Some Tests Fail" and "Mutation Killed"]
  25. Quality? Mutation Testing
      [Slide image: the same source file, with the caption "Repeat For All Possible Mutations"]
      We do this for all mutations and obtain a metric, mutation coverage: the percentage of mutants killed out of the total number of mutants introduced. We can use this metric to gauge the quality of a set of tests. We now want to see whether, for a particular class, the quality remains the same when only a reduced set of tests is used.
  26. Quality? Mutation Testing
      [Slide image: the same source file, with the caption "Repeat For All Possible Mutations"]
      Mutation Coverage = # Mutants Killed / # Mutants Introduced
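The loop on slides 20 to 26 can be illustrated with a toy example. This is not how PIT works (PIT mutates compiled bytecode and runs real JUnit tests); it is a minimal sketch assuming the "program" is a single `IntBinaryOperator`, each mutant is a hand-written variant of it, and each test is a predicate that passes or fails. It also compares the full suite with a subset, mirroring the question asked on the next slide.

```java
import java.util.*;
import java.util.function.IntBinaryOperator;
import java.util.function.Predicate;

final class MutationCoverage {
    /** A mutant is killed if at least one test in the suite fails against it. */
    static boolean killed(IntBinaryOperator mutant, List<Predicate<IntBinaryOperator>> tests) {
        for (Predicate<IntBinaryOperator> test : tests) {
            if (!test.test(mutant)) return true;
        }
        return false;
    }

    /** Mutation coverage = #mutants killed / #mutants introduced. */
    static double coverage(List<IntBinaryOperator> mutants, List<Predicate<IntBinaryOperator>> tests) {
        long killedCount = mutants.stream().filter(m -> killed(m, tests)).count();
        return (double) killedCount / mutants.size();
    }

    public static void main(String[] args) {
        IntBinaryOperator original = (a, b) -> a + b; // toy program under test: addition
        List<IntBinaryOperator> mutants = List.of(
                (a, b) -> a - b,  // operator replaced
                (a, b) -> a * b,  // operator replaced
                (a, b) -> a);     // second operand dropped
        Predicate<IntBinaryOperator> t1 = f -> f.applyAsInt(2, 2) == 4;
        Predicate<IntBinaryOperator> t2 = f -> f.applyAsInt(3, 0) == 3;
        Predicate<IntBinaryOperator> t3 = f -> f.applyAsInt(1, 5) == 6;
        List<Predicate<IntBinaryOperator>> fullSuite = List.of(t1, t2, t3);
        List<Predicate<IntBinaryOperator>> subset = List.of(t1);
        // Mutation testing starts from a green suite: every test must pass on the original.
        if (!fullSuite.stream().allMatch(t -> t.test(original))) throw new IllegalStateException("suite not green");
        System.out.println(coverage(mutants, fullSuite)); // 1.0
        System.out.println(coverage(mutants, subset));    // 0.6666666666666666 (a * b survives 2, 2)
    }
}
```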
  27. Quality?
      Cruisecontrol: 88% equal mutation coverage. PMD: 50% equal mutation coverage.
      In 88% (Cruisecontrol) and 50% (PMD) of the inspected classes, the mutation coverage remained the same (i.e., the quality of the reduced test set is equal to that of the full test suite). In 12% (Cruisecontrol) and 50% (PMD) of the classes, however, the mutation coverage is worse, and then the question arises:
  28. Quality?
      Cruisecontrol: 88% equal mutation coverage. PMD: 50% equal mutation coverage. How much worse is the mutation coverage?
      How much worse is the mutation coverage in these cases?
  29. Quality?
      [Chart, one panel per system (Cruisecontrol and PMD): percentage of more surviving mutants (vertical axis, 0 to 100) against total number of mutants (horizontal axis)]
      So we looked at those test subsets where more mutants survived than with retest all. The difference varies from a couple of percent to a hundred percent more surviving mutants. However, we need to take into account the total number of mutants introduced, which is what is shown here. The vertical axis shows the percentage of additional surviving mutants, so lower is better. The horizontal axis shows the total number of mutants introduced, which puts some of the data points in perspective.
  30. Quality?
      [Same chart as the previous slide]
      For Cruisecontrol, for instance, there is one point where 100% of the introduced mutants survived the subset but were caught by retest all. However, put in perspective, this is out of a total of only 3 mutants!
  31. Quality?
      [Same chart as the previous slide]
      The more worrisome data points in Cruisecontrol are the two in the middle: there, a relatively high number of mutants was introduced, and quite a few of them survived the subset of tests while they did not survive the full test suite.
  32. Quality?
      [Same chart as the previous slide]
      PMD performs a lot worse. As we can see, these data points show high numbers of mutants that survive the subset but not the full suite.
  33. Quality?
      [Same chart, annotated: on average 12% more mutants survive for Cruisecontrol and 24% more for PMD (weighted averages)]
      Still, on average 12% (Cruisecontrol) and 24% (PMD) more mutants survive; this is a weighted average, using the total number of mutants as weights. In short, the closer the data points are to the axes, the better. So our approach up to now is good, but it is not perfect: we do miss some relevant tests.
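Slide 33 weights each class by its total number of mutants, so a class with very few mutants (like the 100% point out of only 3 mutants on slide 30) barely moves the average. A minimal sketch of that weighted mean; the `DataPoint` record and the sample values are hypothetical, not the study's data.

```java
import java.util.List;

final class WeightedSurvival {
    /** One class: percentage of additional surviving mutants and the total number of mutants introduced. */
    record DataPoint(double percentMoreSurviving, int totalMutants) {}

    /** Sum(p_i * w_i) / Sum(w_i), with w_i = total number of mutants of class i. */
    static double weightedAverage(List<DataPoint> points) {
        double weightedSum = 0;
        long totalWeight = 0;
        for (DataPoint p : points) {
            weightedSum += p.percentMoreSurviving() * p.totalMutants();
            totalWeight += p.totalMutants();
        }
        return totalWeight == 0 ? 0.0 : weightedSum / totalWeight;
    }

    public static void main(String[] args) {
        // Hypothetical data points: a 100% outlier on only 3 mutants has little weight.
        List<DataPoint> points = List.of(
                new DataPoint(100.0, 3),
                new DataPoint(10.0, 120),
                new DataPoint(5.0, 77));
        System.out.println(weightedAverage(points)); // 9.425 (the unweighted mean would be about 38.3)
    }
}
```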
  34. Research Questions
      Compare the test subset against "retest all": Size Reduction? Quality? Accuracy?
      This leads us to the next question: what are our precision and recall? That is, how many of the selected tests are really relevant (precision), and how many of the really relevant tests are selected (recall)? To measure precision and recall, we need some kind of oracle that tells us which tests are actually relevant for each class.
  35. Accuracy? Dynamic Analysis
      ∀ t ∈ Tests: execute t; ∀ m: method invoked during the run of t: t is a relevant test for m
      We used a dynamic analysis as the oracle. In short, we wrote a simple aspect in AspectJ that, during the execution of a test, records which methods were invoked. We can then say that the test is relevant for those methods. Using these results, we could compare against our static analysis of the changes...
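The oracle rule on slide 35 inverts execution traces: each test maps to the methods invoked while it runs, and each such method gets that test as a relevant test. The talk used an AspectJ aspect to record the traces; the sketch below leaves the recording out and assumes the traces are already available as a map from test name to invoked method names (all names hypothetical), showing only the inversion step.

```java
import java.util.*;

final class DynamicOracle {
    /** Inverts test -> invoked methods into method -> relevant tests (sorted for stable output). */
    static Map<String, Set<String>> relevantTestsPerMethod(Map<String, Set<String>> invokedByTest) {
        Map<String, Set<String>> oracle = new TreeMap<>();
        invokedByTest.forEach((test, methods) -> {
            for (String m : methods) {
                oracle.computeIfAbsent(m, k -> new TreeSet<>()).add(test);
            }
        });
        return oracle;
    }

    public static void main(String[] args) {
        // Hypothetical traces, as an instrumentation aspect might have recorded them.
        Map<String, Set<String>> traces = Map.of(
                "CartTest.testAdd", Set.of("Cart.add", "Cart.size"),
                "CartTest.testTotal", Set.of("Cart.add", "Cart.total"));
        System.out.println(relevantTestsPerMethod(traces));
        // {Cart.add=[CartTest.testAdd, CartTest.testTotal], Cart.size=[CartTest.testAdd], Cart.total=[CartTest.testTotal]}
    }
}
```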
  36. Accuracy?
      [Pie charts of per-class precision and recall for Cruisecontrol and PMD, binned as [0, 0.25[, [0.25, 0.5[, [0.5, 0.75[, [0.75, 1[ and 1. Average precision: 0.88 and 0.83. Average recall: 0.77 and 0.58.]
      For both Cruisecontrol and PMD we find high precision values (on average 0.88 and 0.83, respectively), which means that most of the tests we selected in the subsets were in fact relevant. The recall values are somewhat lower, especially for PMD, with averages of 0.77 and 0.58. This means that some of the actually relevant tests were not selected by our tool, which was also apparent from the mutation testing results. But is this really bad?
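Per class, precision and recall compare the statically selected tests with the tests the dynamic oracle marks as relevant. A minimal sketch with hypothetical test names; the slides do not say how an empty selection or an empty oracle set was handled, so returning `NaN` (undefined) in those cases is an assumption.

```java
import java.util.*;

final class Accuracy {
    /** |selected and relevant| / |selected|: how many selected tests are really relevant. */
    static double precision(Set<String> selected, Set<String> relevant) {
        return selected.isEmpty() ? Double.NaN : (double) intersectionSize(selected, relevant) / selected.size();
    }

    /** |selected and relevant| / |relevant|: how many really relevant tests were selected. */
    static double recall(Set<String> selected, Set<String> relevant) {
        return relevant.isEmpty() ? Double.NaN : (double) intersectionSize(selected, relevant) / relevant.size();
    }

    private static int intersectionSize(Set<String> a, Set<String> b) {
        Set<String> common = new HashSet<>(a);
        common.retainAll(b);
        return common.size();
    }

    public static void main(String[] args) {
        Set<String> selected = Set.of("CartTest", "OrderTest");             // from change-based selection
        Set<String> relevant = Set.of("CartTest", "OrderTest", "ShopTest"); // from the dynamic oracle
        System.out.println(precision(selected, relevant)); // 1.0: everything selected is relevant
        System.out.println(recall(selected, relevant));    // 0.6666666666666666: ShopTest was missed
    }
}
```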
  37. Let us look back at our individual developer. He is making changes to a software system and wants to test his code. When he gets tool support saying "these are the relevant tests for your changes", he becomes more confident about his code. He will test more often and get shorter feedback cycles. The selected subset is not safe, as it occasionally misses a few relevant tests; however, it is adequate, especially since the complete test suite will be executed as part of the integration build anyway.
  38. What's next? We need to do some more work on this, basically polishing the approach (trying to improve recall, probably at the cost of precision) and seeing how it performs on industrial cases. We also want to look at other applications of change-centric software development. One thing we are currently investigating is whether we can detect patterns in the set of changes: either predefined patterns, such as refactorings, checking whether we can identify those; or plain frequent pattern mining on a set of changes, without knowing in advance what kind of patterns we might uncover. Another application is that successful changes on one branch of a piece of software might be reapplied on other branches of that system (bug fixes?).
  39. Future Directions
      • Reducing Test Runtime
          • Polishing of the Approach (& Implementation)
          • More (Industrial) Cases
      • Detect Change Patterns
          • Identify Refactorings
          • Recurring sequences of changes
      • Reapplying changes
          • bug fixes
          • design improvements
          • API evolution
  40. To wrap up: we were looking for a way to find relevant tests for small changes to the software. We found that our technique could reduce the test suite to a handful of tests (up to 5 tests in 80-90% of the cases). We found that in 50% (PMD) to 88% (Cruisecontrol) of the classes, the reduced test suites had the same mutation coverage (quality) as the full test suite, and the test sets with worse mutation coverage were actually not that bad. Finally, we found really good precision but lower recall, meaning that we did in fact miss some relevant tests. However, as mentioned, this is not a very big problem, since the full test suite will be executed as part of the integration build anyway.
