Impact Analysis - ImpactScale: Quantifying Change Impact to Predict Faults in Large Software Systems


Paper: ImpactScale: Quantifying Change Impact to Predict Faults in Large Software Systems

Authors: Kenichi Kobayashi, Akihiko Matsuo, Katsuro Inoue, Yasuhiro Hayase, Manabu Kamimura and Toshiaki Yoshino

Session: Research Track 2: Impact Analysis

Transcript

  • 1. ImpactScale: Quantifying Change Impact to Predict Faults in Large Software Systems
    Kenichi Kobayashi (Fujitsu Laboratories), Akihiko Matsuo (Fujitsu Laboratories), Katsuro Inoue (Osaka University), Yasuhiro Hayase (University of Tsukuba), Manabu Kamimura (Fujitsu Laboratories), Toshiaki Yoshino (Fujitsu)
  • 2. Overview
    1. Background and Goal
    2. Definition of ImpactScale
    3. Measuring ImpactScale in Real Systems
    4. Fault Prediction and Evaluation
    5. Summary
    ICSM2011 @ Williamsburg, 2011-09-27. Copyright 2011 FUJITSU LABORATORIES LIMITED
  • 3. Background
    Fault prediction in maintenance is a difficult task, and predictive performance with product metrics alone is not sufficient. (Product metrics are extracted from software artifacts such as source code.)
    Therefore, process metrics such as code churn and logical coupling have been combined with product metrics. (Process metrics are extracted from the software process, e.g., change histories.)
    Practitioners' point of view: in enterprise maintenance settings, however, documents, change histories, bug reports, and specialists' knowledge are often lost, out of date, or unusable.
  • 4. Goals
    Problem: process metrics cannot always be obtained.
    Motivation: achieve high predictive performance using only product metrics extractable from source code.
    Goals: define a new product metric and show its effectiveness.
  • 5. Basic Idea
    Software dependency is one of the factors behind faults that survive even after release; missed fixes often arise from implicit dependencies.
    We assumed that change impact analysis can extract such implicit dependencies. Change impact analysis is a technique to find the areas affected when some part of the software is changed; its weakness is high computational cost.
    However, we need not find the affected areas themselves; we only need their scale.
    Hypothesis: a metric that quantifies the scale of change impact, ImpactScale (abbrev. IS), can improve the performance of fault prediction.
  • 6. Overview (agenda; next section: 2. Definition of ImpactScale)
  • 7. Overview of ImpactScale Definition
    Dependencies are extracted from the target software, and a Propagation Graph of code nodes and data nodes is built.
    Propagation model: probabilistic propagation and relation-sensitive propagation.
    ImpactScale is the sum of all quantities of change impact, e.g., the quantity propagated from C to A when C is changed.
  • 8. Propagation Graph (1)
    Step 1: build a dependency graph extracted from the target software.
    Code nodes: module, class, function, source code. Data nodes: DB table, global variable.
    Dependency edges carry a relation type: CALL, READ, WRITE.
  • 9. Propagation Graph (2)
    Step 2: add reverse edges to the dependency graph to build the Propagation Graph.
    Change impact analysis for ImpactScale is performed on the Propagation Graph.
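The two-step construction above can be sketched as follows. This is an illustrative reading, not the authors' implementation; the node names (A, B, C, T1) and the edge list are hypothetical.

```python
# Illustrative sketch: build a propagation graph by adding a reverse edge
# for every typed dependency edge (relations: CALL, READ, WRITE).
from collections import defaultdict

# Hypothetical dependency edges: (source, relation, target).
dependencies = [
    ("A", "CALL", "B"),
    ("B", "READ", "T1"),   # T1 is a data node (e.g., a DB table)
    ("C", "WRITE", "T1"),
]

def build_propagation_graph(deps):
    """Adjacency map: node -> list of (relation, neighbor, is_reverse)."""
    graph = defaultdict(list)
    for src, rel, dst in deps:
        graph[src].append((rel, dst, False))  # original dependency edge
        graph[dst].append((rel, src, True))   # added reverse edge
    return graph

graph = build_propagation_graph(dependencies)
# T1 can now propagate impact back to its readers and writers:
print(sorted(graph["T1"]))  # [('READ', 'B', True), ('WRITE', 'C', True)]
```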
  • 10. Probabilistic Propagation
    We assume that change impact propagates probabilistically from one node to another, as in ripple-effect studies [Haney72] [Tsantalis05] [Sharafat07].
    The quantity of change impact from the source node is multiplied by the propagation probability at each edge (e.g., ×0.5 per hop).
    In this presentation, the propagation probability is always 0.5.
  • 11. Relation-sensitive Propagation
    To avoid overestimation, we use context information to eliminate unlikely propagation; as minimal context information (in terms of computational time), we use an edge's relation type.
    Cut rules determine whether propagation from the current node to its next node is cut, referring to the relation types of the previous and next edges.
    We call such controlled propagation relation-sensitive propagation. Its computational complexity is low in practice.
  • 12. Example of Cut Rules
    Cut Rule 1: while finding callees, do not find callers.
    Cut Rule 2: while finding callers, do not find callees.
    Cut Rule 3: do not propagate beyond READ edges.
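Putting slides 10 to 12 together, a simplified ImpactScale computation might look like the sketch below. This is one possible reading of the definition, not the paper's exact algorithm; the cut-rule encoding and the example graph are illustrative.

```python
# Simplified sketch of ImpactScale: propagate a quantity of change impact
# (multiplied by 0.5 per hop) over the propagation graph, applying cut
# rules based on the previous and next edge, with a maximum path length.
from collections import defaultdict

P = 0.5  # propagation probability per edge, as in the presentation

def build_graph(deps):
    """Propagation graph: forward edges plus added reverse edges."""
    graph = defaultdict(list)
    for src, rel, dst in deps:
        graph[src].append((rel, dst, False))  # forward edge
        graph[dst].append((rel, src, True))   # reverse edge
    return graph

def cut(prev, nxt):
    """Cut rules over (relation, is_reverse) pairs, per slide 12:
    rules 1/2: do not mix callee-finding and caller-finding;
    rule 3: do not propagate beyond READ edges."""
    if prev is None:
        return False
    prev_rel, prev_rev = prev
    next_rel, next_rev = nxt
    if prev_rel == "CALL" and next_rel == "CALL" and prev_rev != next_rev:
        return True   # rules 1 and 2
    if prev_rel == "READ":
        return True   # rule 3
    return False

def impact_scale(graph, start, max_depth=10):
    """Sum the quantity of impact reaching every node on every allowed path."""
    total = 0.0
    def walk(node, prev, quantity, depth, visited):
        nonlocal total
        if depth == max_depth:
            return
        for rel, nxt, rev in graph[node]:
            if nxt in visited or cut(prev, (rel, rev)):
                continue
            total += quantity * P
            walk(nxt, (rel, rev), quantity * P, depth + 1, visited | {nxt})
    walk(start, None, 1.0, 0, {start})
    return total

graph = build_graph([("A", "CALL", "B"), ("B", "CALL", "C"), ("D", "CALL", "B")])
print(impact_scale(graph, "A"))  # 0.5 (A to B) + 0.25 (B to C) = 0.75; B to D is cut
```

Note how Cut Rule 2 keeps the callee traversal from A from jumping sideways to the unrelated caller D, which is the overestimation the relation-sensitive rules are meant to avoid.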
  • 13. Overview (agenda; next section: 3. Measuring ImpactScale in Real Systems)
  • 14. Data Sets for Evaluations
    Two enterprise accounting systems in different companies:

    Data Set  #Modules  Total LOC  #Faults  #Faulty Modules  Faulty Module Rate  Fault-Collected Term
    DS1       5.8k      1.6M       269      215              3.7%                40 months
    DS2       7.6k      3.7M       250      208              2.7%                40 months

    Common properties: language COBOL; age over 20 years.
    Collected metrics: 7 existing metrics (LOC, WMC, MaxVG, Sections, Calls, Fan-in, Fan-out) and ImpactScale.
  • 15-21. Real Example of Calculating ImpactScale (shown step by step across slides 15 to 21)
    DS1, #modules 5.8k. Each square-shaped group of modules is a sub-system.
  • 22. Measurement Results
    The distribution of ImpactScale is long-tailed (histogram binned in steps of 50).

    Data Set  Mean IS  Max IS
    DS1       86.0     2989.6
    DS2       156.5    3338.2

    Calculation time is practically short: about 10 sec for DS1 and about 30 sec for DS2.
    A spike in the distribution indicates a system-wide dispatcher or a symptom of a bad smell.
  • 23. ImpactScale and Faults
    The first 20% of modules (by ImpactScale) contain 48.8% of the faults; IS correlates highly with faults.
    (Figure: modules and database tables grouped into 10 quantiles of ImpactScale, from high to low.)
  • 24. Overview (agenda; next section: 4. Fault Prediction and Evaluation)
  • 25. Overview of Evaluations
    Evaluation procedure: 100 iterations of random sub-sampling validation.
    Evaluations:
    RQ1: Does adding ImpactScale to existing product metrics improve predictive performance? (Predicting faulty or not faulty; effort-aware fault prediction.)
    RQ2: Comparison between ImpactScale and network measures.
    RQ3: Validating the definition of ImpactScale.
  • 26. Predicting Faulty or Not Faulty
    Faults are predicted using logistic regression. MET = model without ImpactScale; MET+IS = model with ImpactScale.

    DS1: Precision 0.148 → 0.168 (+0.020); Recall 0.315 → 0.392 (+0.077); F1 0.200 → 0.234 (+0.034)
    DS2: Precision 0.139 → 0.162 (+0.023); Recall 0.253 → 0.334 (+0.081); F1 0.177 → 0.216 (+0.039)

    All improvements are significant by Wilcoxon's signed-rank test; adding IS improves every measure, supporting a YES answer to RQ1.
    Practitioners' point of view: these precision/recall/F1 evaluations are not very useful in practice, because in maintenance the modules estimated as faulty tend to be large. In DS2, the top 10% of modules by estimated fault risk account for 24% of the LOC, which is not effort-effective.
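A minimal sketch of this evaluation style, on synthetic data with plain NumPy (the actual study uses the seven product metrics listed on slide 14 and 100-fold random sub-sampling): fit logistic regression with and without an ImpactScale-like feature and compare precision, recall, and F1. All feature names and data here are made up.

```python
import numpy as np

rng = np.random.default_rng(0)
n = 4000
log_loc = rng.normal(5, 1, n)               # stand-in metric: log LOC
fan_out = rng.poisson(3, n).astype(float)   # stand-in metric: fan-out
log_is = rng.normal(3, 1.2, n)              # hypothetical log ImpactScale
# Synthetic ground truth: fault odds grow with size and impact scale.
true_logit = -7 + 0.6 * log_loc + 0.1 * fan_out + 0.8 * log_is
y = (rng.random(n) < 1 / (1 + np.exp(-true_logit))).astype(float)

def sigmoid(z):
    return 1 / (1 + np.exp(-np.clip(z, -30, 30)))

def fit_logreg(X, y, steps=5000, lr=0.05):
    """Plain batch gradient descent on the logistic loss."""
    X = np.column_stack([np.ones(len(X)), X])  # intercept column
    w = np.zeros(X.shape[1])
    for _ in range(steps):
        w -= lr * X.T @ (sigmoid(X @ w) - y) / len(y)
    return w

def prf1(X, y, w):
    X = np.column_stack([np.ones(len(X)), X])
    pred = sigmoid(X @ w) > 0.5
    tp = np.sum(pred & (y == 1))
    fp = np.sum(pred & (y == 0))
    fn = np.sum(~pred & (y == 1))
    prec = tp / max(tp + fp, 1)
    rec = tp / max(tp + fn, 1)
    f1 = 2 * prec * rec / max(prec + rec, 1e-9)
    return prec, rec, f1

X_met = np.column_stack([log_loc, fan_out])
X_met_is = np.column_stack([log_loc, fan_out, log_is])
tr, te = slice(0, 2000), slice(2000, None)
for name, X in [("MET", X_met), ("MET+IS", X_met_is)]:
    w = fit_logreg(X[tr], y[tr])
    p, r, f1 = prf1(X[te], y[te], w)
    print(f"{name}: precision={p:.3f} recall={r:.3f} F1={f1:.3f}")
```

On this synthetic data the extra feature typically helps because the fault labels were generated from it; the study establishes the same pattern empirically on real systems.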
  • 27. Effort-aware Fault Prediction Model
    Problem: in maintenance, modules estimated as faulty tend to be large, and a large module needs large effort to review or test.
    Practitioners' opinion: "Budget and schedule are very demanding. We want to find more faults with less effort." Effort-effectiveness is therefore our main concern.
    We use the effort-aware model [Arisholm06] [Menzies10] [Mende10], which prioritizes modules in decreasing order of relative risk, #errors(x) / Effort(x), to maximize effort-effectiveness.
    Poisson regression is used to learn the relative risk.
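The effort-aware ranking can be sketched as below, on synthetic data. The paper learns relative risk with Poisson regression; here a noisy estimate stands in for it, and LOC is the effort proxy. ddr10 and the area under the effort-based lift curve are then read off as on slides 28 and 29.

```python
# Sketch: rank modules by relative risk = predicted faults / effort,
# then compute ddr10 (faults detected within the first 10% of effort)
# and the area under the effort-based cumulative lift curve.
import numpy as np

rng = np.random.default_rng(1)
n = 500
loc = rng.lognormal(5, 1, n)                         # effort proxy per module
faults = rng.poisson(0.1 + 0.4 * loc / loc.mean())   # synthetic true fault counts
pred = faults + rng.normal(0, 0.3, n)                # stand-in risk estimate

order = np.argsort(-(pred / loc))                    # highest risk per LOC first
cum_effort = np.cumsum(loc[order]) / loc.sum()
cum_faults = np.cumsum(faults[order]) / max(faults.sum(), 1)

ddr10 = cum_faults[np.searchsorted(cum_effort, 0.10)]
# Area under the lift curve via the rectangle rule.
widths = np.diff(np.concatenate([[0.0], cum_effort]))
auc = float(np.sum(widths * cum_faults))
print(f"ddr10={ddr10:.3f}  AUC={auc:.3f}")
```

Sorting by predicted faults per unit effort, rather than by predicted faults alone, is what keeps large-but-mediocre-risk modules from dominating the top of the inspection list.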
  • 28. Results of Effort-aware Evaluation (DS1)
    Effort-based cumulative lift chart of DS1 (x-axis: effort as LOC inspected; y-axis: faults detected; curves: Optimal, DS1-MET, DS1-MET+IS).
    AUC is the area under the lift curve and shows overall predictive performance; higher is better.
    ddr10 is the detected-defect rate within the first 10% of effort, i.e., predictive performance under limited effort; higher is better.
    Practitioners' point of view: in maintenance, budget, schedule, and effort are always limited, so ddr10 is the more important measure.

    Measure  DS1-MET  DS1-MET+IS  Improvement by IS
    AUC      0.635    0.680       +0.045
    ddr10    0.186    0.296       ×1.60

    All improvements are significant by Wilcoxon's signed-rank test.
  • 29. Results of Effort-aware Evaluation (DS1 and DS2)
    RQ1 (Does adding ImpactScale to existing product metrics improve predictive performance?) is YES.

    Measure  DS1-MET  DS1-MET+IS  Improvement  DS2-MET  DS2-MET+IS  Improvement
    AUC      0.635    0.680       +0.045       0.669    0.714       +0.045
    ddr10    0.186    0.296       ×1.60        0.225    0.343       ×1.53

    All improvements are significant by Wilcoxon's signed-rank test.
  • 30. Comparison with Network Measures
    Recently, [Zimmermann et al., ICSE08] applied social network analysis (SNA) to a software dependency graph representing relationships between the binary modules of software systems.
    Over 50 network measures were used, for example: in/out degrees, network diameter, closeness, and eigenvector centrality (a.k.a. PageRank).
    They and some replication studies [Tosun09] [Nguyen10] reported that these measures work well in some cases.
    RQ2: "Does adding ImpactScale to existing product metrics and network measures improve predictive performance?"
  • 31. ImpactScale vs. Network Measures
    Hierarchical model comparison based on the effort-aware model: starting from the model with existing metrics, we add ImpactScale and then network measures, and also network measures and then ImpactScale.
    Adding ImpactScale improves performance in both orders. Models are learned using principal component Poisson regression.
    All improvements and deteriorations are significant by Wilcoxon's signed-rank test (*: P<0.05, **: P<0.01, unmarked: P<0.001).
  • 32. ImpactScale vs. Network Measures (continued)
    RQ2 ("Does adding ImpactScale to existing product metrics and network measures improve predictive performance?") is YES.
  • 33. Validating ImpactScale
    RQ3: Is considering distant nodes meaningful?
    Test method: compare models built with ImpactScale variants whose maximum path-finding distance is limited (limits 1 to 10).
    In both DS1 and DS2, ddr10 grows as the limit increases; the "limit = 1" variant is almost fan-in + fan-out.
    Answer: YES.
  • 34. Overview (agenda; next section: 5. Summary)
  • 35. Summary of Evaluations
    RQ1: Does adding ImpactScale to existing product metrics improve predictive performance? YES.
    RQ2: Does adding ImpactScale to existing product metrics and network measures improve predictive performance? YES.
    RQ3: Is considering distant nodes meaningful? YES.
    Hypothesis (a metric that quantifies the scale of change impact can improve the performance of fault prediction): TRUE.
  • 36. Threats to Validity
    Language: ImpactScale has no language-specific feature, but the evaluations were done only on COBOL systems, and COBOL differs considerably from other languages.
    Application domain: the evaluated systems are only in the accounting business domain.
    Call graph analysis: the impact of dynamic dispatching (e.g., polymorphism and reflection) is not assessed.
  • 37. Conclusion
    We defined a new product metric quantifying change impact, called ImpactScale, featuring probabilistic propagation, relation-sensitive propagation, and practical computation time even for large-scale software systems.
    We evaluated its predictive performance on enterprise systems: adding ImpactScale improves performance by over 1.5 times in the first 10% of effort (LOC).
    Additional finding: considering distant nodes in the dependency graph is meaningful for fault prediction.
  • 38. Future Work
    Extending supported languages: Java, C, C++.
    Expanding use cases: rapid risk assessment, watching for violations of modularity, measuring software decay.
  • 39. Thank you! Kenichi Kobayashi, Fujitsu Labs