
Mining Cause Effect Chains from Version Archives - ISSRE 2011

Software reliability is determined by software changes. How do these changes relate to each other? By analyzing the impacted method definitions and usages, we determine dependencies between changes, resulting in a change genealogy that captures how earlier changes enable and cause later ones. Model checking this genealogy reveals temporal process patterns that encode key features of the software process: “Whenever class A is changed, its test case is later updated as well.” Such patterns can be validated automatically: In an evaluation of four open source histories, our prototype would recommend pending activities with a precision of 60–72%.

  1. 1. Mining Cause-Effect-Chains from Version Histories. Kim Herzig & Andreas Zeller, Saarland University, Germany. Funded by Faculty Grant
  2. 2. Cause Effect Chain Initial Change
  3. 3. Cause Effect Chain Initial Change
  4. 4. Cause Effect Chain Initial Change
  5. 5. Cause Effect Chain Initial Change. What is the long-term impact of the initial change, and can we predict it?
  6. 6. What does this have to do with reliability?
  7. 7. What does this have to do with reliability?
  8. 8. Web-Application Development • Changing live systems • One large repository • Direct impact on functionality, stability,
  9. 9. Web-Application Development • Changing live systems • One large repository • Direct impact on functionality, stability,
  10. 10. Cause Effect chain Initial Change
  11. 11. Cause Effect chain Initial Change C1 Cn dependency
  12. 12. Change Genealogies • Change C2 depends on change C1: change C2 can only be applied after applying C1. • (1) Analyzing source code changes (not every revision can be compiled) • (2) Extracting method definitions and method calls from changes (added, modified, deleted) • (3) Computing dependencies between method definition and call changes (e.g. call depends on previous definition) • [1] Capturing the long-term impact of changes, Herzig, ICSE ’10 Ph.D. Symposium
  13. 13. Change Genealogies [figure: changes C1 to C5 on a time axis, touching Files 1 to 4. The changes add (+) or delete (-) method definitions such as int A.foo(int), float A.foo(float) and int B.bar(int), and calls such as d = A.foo(5d), x = B.bar(5), B.bar(5), x = A.foo(5f), d = A.foo(d) and e = A.foo(-1f)]
  14. 14. Change Genealogies [figure: changes C1 to C5 on a time axis, touching Files 1 to 4. The changes add (+) or delete (-) method definitions such as int A.foo(int), float A.foo(float) and int B.bar(int), and calls such as d = A.foo(5d), x = B.bar(5), B.bar(5), x = A.foo(5f), d = A.foo(d) and e = A.foo(-1f)]
  15. 15. Change Genealogies • Graph structure ‣ models structural dependencies • Directed & acyclic ‣ future cannot influence past • 2-dimensional (time & space) ‣ vertex annotation: changed files, bug
  16. 16. Change Genealogies • Graph structure ‣ models structural dependencies • Directed & acyclic ‣ future cannot influence past • 2-dimensional (time & space) ‣ vertex annotation: changed files, bug • Allows the use of formal methods!
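The three steps on the Change Genealogies slides can be sketched in a few lines of Python. This is only an illustration, not the authors' implementation: the `Change` record, the three dependency rules, and the sample history are assumptions made for the example, loosely modelled on the `A.foo` / `B.bar` figure.

```python
from dataclasses import dataclass, field


@dataclass
class Change:
    """One change, reduced to the method-level facts it touches (hypothetical model)."""
    cid: str
    added_defs: set = field(default_factory=set)
    deleted_defs: set = field(default_factory=set)
    added_calls: set = field(default_factory=set)
    deleted_calls: set = field(default_factory=set)


def build_genealogy(changes):
    """Return dependency edges as (later_change, earlier_change) pairs.

    `changes` must be sorted by commit time, oldest first.
    """
    last_def = {}   # method signature -> change that last added its definition
    last_call = {}  # method signature -> change that last added a call to it
    edges = set()
    for ch in changes:
        # Assumed rule 1: an added call depends on the change that defined the callee.
        # Assumed rule 2: deleting a definition depends on the change that added it.
        for sig in ch.added_calls | ch.deleted_defs:
            if sig in last_def:
                edges.add((ch.cid, last_def[sig]))
        # Assumed rule 3: deleting a call depends on the change that added the call.
        for sig in ch.deleted_calls:
            if sig in last_call:
                edges.add((ch.cid, last_call[sig]))
        # Update the bookkeeping only after resolving dependencies,
        # so a change never depends on itself.
        for sig in ch.deleted_defs:
            last_def.pop(sig, None)
        for sig in ch.added_defs:
            last_def[sig] = ch.cid
        for sig in ch.added_calls:
            last_call[sig] = ch.cid
    return edges


# Hypothetical history, oldest change first.
history = [
    Change("C1", added_defs={"A.foo(int)"}),
    Change("C2", added_defs={"B.bar(int)"}, added_calls={"A.foo(int)"}),
    Change("C3", added_calls={"B.bar(int)"}),
    Change("C4", deleted_calls={"B.bar(int)"}, added_calls={"A.foo(int)"}),
]
edges = build_genealogy(history)

# Every edge points from a later change to an earlier one, so the graph is acyclic.
order = {ch.cid: i for i, ch in enumerate(history)}
assert all(order[later] > order[earlier] for later, earlier in edges)
```

Because dependencies are only ever resolved against earlier changes, the "future cannot influence past" property holds by construction.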
  17. 17. Long Term Couplings (example of genealogy usage) • Known from Amazon
  18. 18. Long Term Couplings (example of genealogy usage) • Known from Amazon • developers who changed this artifact also changed ... [2] Mining Version Histories to Guide Software Changes, Zimmermann, Weißgerber, Diehl, Zeller, ICSE 04
  19. 19. Long Term Couplings (example of genealogy usage) • Known from Amazon • developers who changed this artifact also changed ... [2] Mining Version Histories to Guide Software Changes, Zimmermann, Weißgerber, Diehl, Zeller, ICSE 04 • this file got frequently changed by ... [3] Codebook: Discovering and Exploiting Relationships in Software Repositories, Begel, Phang, Zimmermann, FSE 10
  20. 20. Long Term Couplings (example of genealogy usage) • Known from Amazon • developers who changed this artifact also changed ... (limited in time!) [2] Mining Version Histories to Guide Software Changes, Zimmermann, Weißgerber, Diehl, Zeller, ICSE 04 • this file got frequently changed by ... (limited in space!) [3] Codebook: Discovering and Exploiting Relationships in Software Repositories, Begel, Phang, Zimmermann, FSE 10
  21. 21. Long Term Couplings (example of genealogy usage) • developers who changed this artifact also changed ... (limited in time!) [2] Mining Version Histories to Guide Software Changes, Zimmermann, Weißgerber, Diehl, Zeller, ICSE 04 • this file got frequently changed by ... (limited in space!) [3] Codebook: Discovering and Exploiting Relationships in Software Repositories, Begel, Phang, Zimmermann, FSE 10 • changing this artifact always eventually causes ... [4] Using multivariate time series and association rules to detect logical change coupling: an empirical study, G. Canfora, M. Ceccarelli, L. Cerulo, and M. Di Penta, ICSM 2010.
  22. 22. Long Term Couplings (example of genealogy usage) • developers who changed this artifact also changed ... (limited in time!) [2] Mining Version Histories to Guide Software Changes, Zimmermann, Weißgerber, Diehl, Zeller, ICSE 04 • this file got frequently changed by ... (limited in space!) [3] Codebook: Discovering and Exploiting Relationships in Software Repositories, Begel, Phang, Zimmermann, FSE 10 • changing this artifact always eventually causes ... (does not consider structural dependencies!) [4] Using multivariate time series and association rules to detect logical change coupling: an empirical study, G. Canfora, M. Ceccarelli, L. Cerulo, and M. Di Penta, ICSM 2010.
  23. 23. Long Term Couplings (example of genealogy usage) • developers who changed this artifact also changed ... (limited in time!) [2] Mining Version Histories to Guide Software Changes, Zimmermann, Weißgerber, Diehl, Zeller, ICSE 04 • this file got frequently changed by ... (limited in space!) [3] Codebook: Discovering and Exploiting Relationships in Software Repositories, Begel, Phang, Zimmermann, FSE 10 • changing this artifact always eventually (within time window) causes ... (does not consider structural dependencies!) [4] Using multivariate time series and association rules to detect logical change coupling: an empirical study, G. Canfora, M. Ceccarelli, L. Cerulo, and M. Di Penta, ICSM 2010.
  24. 24. Long Term Couplings (example of genealogy usage) • changing this artifact always eventually causes ... • $a_i \Rightarrow \mathrm{EF}\, a_j$ • Computation Tree Logic (CTL)
  25. 25. Long Term Couplings (example of genealogy usage) • changing this artifact always eventually (within time window) causes ... • $a_i \Rightarrow \mathrm{EF}\, a_j$ • Computation Tree Logic (CTL)
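To make the rule "a_i implies EF a_j" concrete, here is a minimal sketch of how it could be checked on a small genealogy. It is not the model checker used in the talk. The graph, the file names, and the choice to check the rule in every state labelled with a_i are assumptions for illustration. EF is read in the usual CTL sense: the property holds in the current state or in some state reachable from it.

```python
def holds_ef(successors, labels, state, prop):
    """Return True if `EF prop` holds in `state` (reachability search)."""
    stack, seen = [state], set()
    while stack:
        s = stack.pop()
        if s in seen:
            continue
        seen.add(s)
        if prop in labels.get(s, set()):
            return True
        stack.extend(successors.get(s, ()))
    return False


def rule_holds(successors, labels, a_i, a_j):
    """Check `a_i => EF a_j` in every state labelled with a_i.

    The rule holds vacuously if no state is labelled with a_i.
    """
    return all(
        holds_ef(successors, labels, s, a_j)
        for s, props in labels.items()
        if a_i in props
    )


# Hypothetical genealogy: edges point from a change to later, dependent changes.
successors = {"C1": ["C2", "C3"], "C2": ["C4"], "C3": [], "C4": []}
labels = {
    "C1": {"A.java"},
    "C2": {"B.java"},
    "C3": {"C.java"},
    "C4": {"ATest.java"},
}

print(rule_holds(successors, labels, "A.java", "ATest.java"))  # True
print(rule_holds(successors, labels, "C.java", "ATest.java"))  # False
```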
  26. 26. Model Checking Genealogies • Extract change genealogy from version archive
  27. 27. Model Checking Genealogies • Extract change genealogy from version archive • Extract valid CTL rules using model checking
  28. 28. Model Checking Genealogies • Extract change genealogy from version archive • Extract valid CTL rules using model checking • Transform frequently occurring rules into recommendations: (1) $a_1 \Rightarrow \mathrm{EF}\, a_2$ (2) $a_1 \Rightarrow \mathrm{EF}\,(a_2 \wedge a_3)$ (3) $a_1 \Rightarrow \mathrm{AG}\,(a_2 \Rightarrow \mathrm{EF}\, a_3)$ ...
  29. 29. Recommendation Generation • Changes C1, C2, C3 on a time axis, within the time window
  30. 30. Recommendation Generation • Changes C1, C2, C3 on a time axis, within the time window • Extract subgraph and add final state: S, C2, C3
  31. 31. Recommendation Generation • Changes C1, C2, C3 on a time axis, within the time window • Extract subgraph and add final state: S, C2, C3, F
  32. 32. Recommendation Generation • Changes C1, C2, C3 on a time axis, within the time window • Extract subgraph and add final state: S, C2, C3, F • Change labels with names of corresponding changed files: {F1, F2}, {F2}, {F1, F3}, F
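The subgraph extraction on the Recommendation Generation slides could look roughly like the sketch below. The function name `kripke_for`, the day-based timestamps, the shape of the example graph, and the self-loop on the final state `F` are assumptions. The self-loop is the usual way to keep a Kripke structure's transition relation total; the slides only say that a final state is added.

```python
def kripke_for(change, children, commit_day, files, window_days):
    """Build (states, transitions, labels) for the sub-genealogy rooted at `change`.

    Only descendants committed within `window_days` of `change` are kept.
    """
    start = commit_day[change]
    states, stack = set(), [change]
    while stack:
        c = stack.pop()
        if c in states or commit_day[c] - start > window_days:
            continue
        states.add(c)
        stack.extend(children.get(c, ()))

    transitions = {}
    for c in states:
        kept = [d for d in children.get(c, ()) if d in states]
        # States without a successor inside the window lead to the final state F.
        transitions[c] = kept or ["F"]
    transitions["F"] = ["F"]  # assumed self-loop on the final state

    # Label every state with the files changed by the corresponding change.
    labels = {c: set(files[c]) for c in states}
    labels["F"] = set()
    return states | {"F"}, transitions, labels


# Hypothetical data: `children` maps a change to the later changes depending on it.
children = {"C1": ["C2", "C3"], "C2": ["C3"], "C3": ["C9"]}
commit_day = {"C1": 0, "C2": 3, "C3": 7, "C9": 40}
files = {"C1": ["F1", "F2"], "C2": ["F2"], "C3": ["F1", "F3"], "C9": ["F4"]}

states, transitions, labels = kripke_for("C1", children, commit_day, files, 16)
print(sorted(states))   # ['C1', 'C2', 'C3', 'F']
print(transitions["C3"])  # ['F']
```

Change C9 falls outside the 16-day window and is dropped, so C3 becomes the last real state before `F`.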
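  33. 33. Using CTL Templates • Kripke structure with state labels {F1, F2}, {F2}, {F1, F3}, F
  34. 34. Using CTL Templates • Kripke structure with state labels {F1, F2}, {F2}, {F1, F3}, F • CTL templates: $\mathrm{EF}\, F_x$; $\mathrm{EF}\,(F_x \wedge F_y)$; $(\mathrm{EF}\, F_x) \wedge (\mathrm{EF}\, F_y)$; $\mathrm{AG}\,(F_x \Rightarrow \mathrm{EF}\, F_y)$
  35. 35. Using CTL Templates • Kripke structure with state labels {F1, F2}, {F2}, {F1, F3}, F • CTL templates: $\mathrm{EF}\, F_x$ ✔; $\mathrm{EF}\,(F_x \wedge F_y)$; $(\mathrm{EF}\, F_x) \wedge (\mathrm{EF}\, F_y)$; $\mathrm{AG}\,(F_x \Rightarrow \mathrm{EF}\, F_y)$
  36. 36. Using CTL Templates • Kripke structure with state labels {F1, F2}, {F2}, {F1, F3}, F • CTL templates: $\mathrm{EF}\, F_x$ ✔; $\mathrm{EF}\,(F_x \wedge F_y)$; $(\mathrm{EF}\, F_x) \wedge (\mathrm{EF}\, F_y)$; $\mathrm{AG}\,(F_x \Rightarrow \mathrm{EF}\, F_y)$ • Recommendations: ... $F_1 \Rightarrow \mathrm{EF}\, F_3$; $F_2 \Rightarrow \mathrm{EF}\, F_3$ ...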
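The four templates can be evaluated directly on such a small structure. The sketch below is an illustration under stated assumptions, not the talk's model checker: the chain-shaped transitions are a guess at the slide figure, and conclusions are restricted to files the start change did not touch itself. With those assumptions, the first template yields exactly the two recommendations shown on the slide.

```python
def reachable(trans, start):
    """All states reachable from `start`, including `start` itself."""
    seen, stack = set(), [start]
    while stack:
        s = stack.pop()
        if s not in seen:
            seen.add(s)
            stack.extend(trans[s])
    return seen


def ef(trans, labels, s, x):
    """EF Fx: some reachable state is labelled with file x."""
    return any(x in labels[u] for u in reachable(trans, s))


def ef_both(trans, labels, s, x, y):
    """EF (Fx ∧ Fy): some reachable state is labelled with both files."""
    return any(x in labels[u] and y in labels[u] for u in reachable(trans, s))


def ef_each(trans, labels, s, x, y):
    """(EF Fx) ∧ (EF Fy): both files are reachable, possibly in different states."""
    return ef(trans, labels, s, x) and ef(trans, labels, s, y)


def ag_implies_ef(trans, labels, s, x, y):
    """AG (Fx ⇒ EF Fy): wherever x is changed, y is changed then or later."""
    return all(
        x not in labels[u] or ef(trans, labels, u, y)
        for u in reachable(trans, s)
    )


# Kripke structure from the slides; the chain-shaped transitions are an assumption.
trans = {"S": ["C2"], "C2": ["C3"], "C3": ["F"], "F": ["F"]}
labels = {"S": {"F1", "F2"}, "C2": {"F2"}, "C3": {"F1", "F3"}, "F": set()}

# Assumption: conclusions range over files the start change did not touch itself.
candidates = set().union(*labels.values()) - labels["S"]
recommendations = sorted(
    (premise, "EF", x)
    for premise in labels["S"]
    for x in candidates
    if ef(trans, labels, "S", x)
)
print(recommendations)  # [('F1', 'EF', 'F3'), ('F2', 'EF', 'F3')]
```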
  37. 37. Recommendation Ranking • Rule form: <premise> ⇒ <implication> • $\mathrm{confidence}(F) = \frac{\mathrm{support}(F)}{\#\,\text{times premise true}}$ • $\mathrm{support}(F) = \#\,\text{Kripke structures in which formula } F \text{ evaluated true}$
  38. 38. Conditional Rules • Genealogy vertices annotated with change properties ‣ bug fix, big change, modified definition, authors, dependency types, ...
  39. 39. Conditional Rules • Genealogy vertices annotated with change properties ‣ bug fix, big change, modified definition, authors, dependency types, ... • $a_i \wedge \text{“is fix”} \Rightarrow \mathrm{EF}\, a_j$
  40. 40. Conditional Rules • Genealogy vertices annotated with change properties ‣ bug fix, big change, modified definition, authors, dependency types, ... • $a_i \wedge \text{“is fix”} \Rightarrow \mathrm{EF}\, a_j$ • Certain rules only become important under certain conditions!
  41. 41. Recommendations: Examples from JRuby • $\text{VariableCompiler} \Rightarrow \mathrm{EF}\ \text{StandardASMCompiler}$ ‣ Never changed together, support=20, confidence=0.8
  42. 42. Recommendations: Examples from JRuby • $\text{VariableCompiler} \Rightarrow \mathrm{EF}\ \text{StandardASMCompiler}$ ‣ Never changed together, support=20, confidence=0.8 • $\text{MainTestSuite} \Rightarrow (\mathrm{EF}\ \text{RubyObject})$
  43. 43. Recommendations: Examples from JRuby • $\text{VariableCompiler} \Rightarrow \mathrm{EF}\ \text{StandardASMCompiler}$ ‣ Never changed together, support=20, confidence=0.8 • $\text{MainTestSuite} \Rightarrow (\mathrm{EF}\ \text{RubyObject})$ • $\text{RubyIO} \Rightarrow ((\mathrm{EF}\ \text{RubyStructure}) \wedge (\mathrm{EF}\ \text{Visibility}))$ ‣ support=20, confidence=0.5 / if bug fix: confidence=0.8
  44. 44. Experimental Setup • Project history: 10% training phase
  45. 45. Experimental Setup • Project history: 10% training phase • Use the top 3 ranked recommendations (ranked by confidence, support) to predict files that will change in time window • support > 2, confidence ≥ 0.5
  46. 46. Experimental Setup • Project history: 10% training phase • Use the top 3 ranked recommendations (ranked by confidence, support) to predict files that will change in time window • support > 2, confidence ≥ 0.5 • Use formulas for further training
  47. 47. Experimental Setup • Project history: training phase • Use the top 3 ranked recommendations (ranked by confidence, support) to predict files that will change in time window • support > 2, confidence ≥ 0.5 • Use formulas for further training
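The filtering and ranking step can be sketched as follows, using the thresholds from the slides (support > 2, confidence ≥ 0.5, top 3) and the confidence definition from the Recommendation Ranking slide. The function name, the rule strings, and all counts are made up for illustration.

```python
def rank_rules(stats, min_confidence=0.5, top_n=3):
    """Filter and rank rules.

    `stats` maps a rule to (support, times_premise_true), where support is the
    number of Kripke structures in which the rule's formula evaluated true.
    """
    scored = []
    for rule, (support, premise_true) in stats.items():
        if premise_true == 0:
            continue  # premise never observed, confidence undefined
        confidence = support / premise_true
        if support > 2 and confidence >= min_confidence:
            scored.append((confidence, support, rule))
    # Rank by confidence first, then by support (both descending).
    scored.sort(key=lambda t: (-t[0], -t[1], t[2]))
    return [rule for _, _, rule in scored[:top_n]]


# Hypothetical counts for rules whose premise is a change to F1.
stats = {
    "F1 => EF F3": (20, 25),  # confidence 0.80
    "F1 => EF F4": (2, 2),    # confidence 1.00, but support too low
    "F1 => EF F5": (10, 20),  # confidence 0.50
    "F1 => EF F6": (9, 10),   # confidence 0.90
    "F1 => EF F7": (4, 10),   # confidence 0.40, below threshold
    "F1 => EF F8": (6, 8),    # confidence 0.75
}
top3 = rank_rules(stats)
print(top3)  # ['F1 => EF F6', 'F1 => EF F3', 'F1 => EF F8']
```

The files named in the conclusions of the three selected rules would then be the predicted changes for the time window.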
  48. 48. Benchmark Model • Constantly predicts the three top most changed files to change again!
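A baseline like this takes only a few lines. The sketch below assumes that "most changed" is counted as the number of commits touching a file. The commit data is hypothetical, and whether the original benchmark recomputes its counts as history progresses is not stated on the slide.

```python
from collections import Counter


def benchmark_prediction(commits, k=3):
    """Return the k files that were changed in the most commits so far."""
    counts = Counter(f for commit in commits for f in set(commit))
    return [f for f, _ in counts.most_common(k)]


# Hypothetical commit history: each commit is the list of files it changed.
commits = [
    ["A", "B"],
    ["A", "C"],
    ["A", "B", "D"],
    ["B", "C"],
    ["A"],
]
prediction = benchmark_prediction(commits)
print(prediction)  # ['A', 'B', 'C']
```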
  49. 49. Precision of Recommendations • $\text{precision} = \frac{\#\text{true positives}}{\#\text{true positives} + \#\text{false positives}}$ • True positives: file changes predicted and applied within time window • False positives: file changes predicted but not applied within time window
  50. 50. Precision of Recommendations • $\text{recall} = \frac{\#\text{true positives}}{\#\text{true positives} + \#\text{false negatives}}$ • False negatives: file changes not predicted but applied within time window
  51. 51. Precision of Recommendations • $\text{recall} = \frac{\#\text{true positives}}{\#\text{true positives} + \#\text{false negatives}}$ • False negatives: file changes not predicted but applied within time window • How much of a system's future evolution can be predicted from the past?
  52. 52. Precision of Recommendations • $\text{recall} = \frac{\#\text{true positives}}{\#\text{true positives} + \#\text{false negatives}}$ • False negatives: file changes not predicted but applied within time window • How much of a system's future evolution can be predicted from the past? • Very low by nature! This measure does not make sense here!
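Both measures follow directly from the set of predicted files and the set of files actually changed within the time window. In the sketch below, the file names are hypothetical. The example also shows why recall is low by nature when only three files are predicted but many files change.

```python
def precision_recall(predicted, changed):
    """Compute (precision, recall) for one prediction.

    predicted: files predicted to change within the time window
    changed:   files actually changed within the time window
    """
    predicted, changed = set(predicted), set(changed)
    tp = len(predicted & changed)   # predicted and applied
    fp = len(predicted - changed)   # predicted but not applied
    fn = len(changed - predicted)   # applied but not predicted
    precision = tp / (tp + fp) if predicted else None
    recall = tp / (tp + fn) if changed else None
    return precision, recall


# Hypothetical window: three predictions, eight files actually changed.
precision, recall = precision_recall(
    predicted={"A", "B", "C"},
    changed={"A", "B", "D", "E", "F", "G", "H", "I"},
)
print(round(precision, 2), recall)  # 0.67 0.25
```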
  53. 53. Project Details
                       ArgoUML    Jaxen      JRuby      XStream
      history          12 years   9 years    9 years    7 years
      #transactions    16,481     1,353      11,060     1,683
      #authors         50         20         66         11
      #files           16,658     9,831      15,029     1,188
      #LTC >0.7        94         10         99         19
      #vertices        8,716      1,330      11,055     1,680
      ∅ out degree     8          7          10         5
      time window      16 days    9 days     8 days     4 days
      time window = median gap between vertex and youngest child
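The time window definition can be read as the sketch below. It assumes that "youngest child" means the most recently committed dependent change, that vertices without children are ignored, and that timestamps are given in days; the graph and dates are hypothetical.

```python
from statistics import median


def time_window(children, commit_day):
    """Median gap (in days) between a vertex and its youngest child."""
    gaps = [
        max(commit_day[c] for c in kids) - commit_day[v]
        for v, kids in children.items()
        if kids  # vertices without children contribute no gap
    ]
    return median(gaps)


# Hypothetical genealogy: `children` maps a change to the changes depending on it.
children = {"C1": ["C2", "C3"], "C2": ["C4"], "C3": ["C4"], "C4": []}
commit_day = {"C1": 0, "C2": 3, "C3": 7, "C4": 12}

window = time_window(children, commit_day)
print(window)  # 7
```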
  54. 54. How Precise are our Predictions? [bar chart: precision of CTL recommendations vs. benchmark for ArgoUML, Jaxen, JRuby, XStream; bar values between 0.3 and 0.7]
  55. 55. How Precise are our Predictions? [bar charts: precision and avg. rank of highest hit, CTL vs. benchmark, for ArgoUML, Jaxen, JRuby, XStream; precision values between 0.3 and 0.7, rank values between 1.2 and 2.4]
  56. 56. Including Inner-Commit Rules [bar charts: precision and avg. rank of highest hit, with inner rules vs. w/o inner rules vs. benchmark, for ArgoUML, Jaxen, JRuby, XStream; precision values between 0.3 and 0.7, rank values between 1.2 and 2.4]
  57. 57. % Commits all Predictions True [bar chart: w/o inner-commit rules vs. with inner-commit rules for ArgoUML, Jaxen, JRuby, XStream; bar values between 43.8 and 68.8]
