ICSM10a.ppt

Transcript

  • 1. Physical and Conceptual Identifier Dispersion: Measures and Relation to Fault Proneness
      Venera Arnaoudova, Laleh Eshkevari, Rocco Oliveto, Yann-Gaël Guéhéneuc, Giuliano Antoniol
      SOCCER Lab. – DGIGL, École Polytechnique de Montréal, Qc, Canada
      SE@SA Lab – DMI, University of Salerno, Salerno, Italy
      Ptidej Team – DGIGL, École Polytechnique de Montréal, Qc, Canada
      September 15, 2010
      SOCCER: SOftware Cost-effective Change and Evolution Research Lab
      SE@SA: Software Engineering @ SAlerno
      Ptidej: Pattern Trace Identification, Detection, and Enhancement in Java
  • 2. Outline
      Introduction
      Our study
      Dispersion measures
      Our study - refined
      Case study
        RQ1 – Metric Relevance
        RQ2 – Relation to Faults
      Conclusions and future work
  • 3. Introduction
      Fault identification
        size (e.g., [Gyimóthy et al., 2005])
        cohesion (e.g., [Liu et al., 2009])
        coupling (e.g., [Marcus et al., 2008])
        number of changes (e.g., [Zimmermann et al., 2007])
      Importance of linguistic information
        program comprehension (e.g., [Takang et al., 1996, Deissenboeck and Pizka, 2006, Haiduc and Marcus, 2008, Binkley et al., 2009])
        code quality (e.g., [Marcus et al., 2008, Poshyvanyk and Marcus, 2006, Butler et al., 2009])
  • 4. Our study
      Term dispersion
      We are interested in studying the relation between term dispersion and the quality of the source code.
        term: basic component of identifiers
        dispersion: the way terms are scattered among different entities (attributes and methods)
        quality: absence of faults
      Example: What is the impact of using getRelativePath, returnAbsolutePath, and setPath as method names on the fault proneness of those methods?
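The example on slide 4 presupposes that identifiers can be broken into terms. The slides do not say how the authors split identifiers, so the following is only a minimal sketch that assumes camelCase and under_score conventions; the function name `split_identifier` is hypothetical.

```python
import re


def split_identifier(identifier: str) -> list[str]:
    """Split a camelCase or under_score identifier into lowercase terms.

    Assumption: terms are delimited by underscores, by case changes, and by
    digit runs. This is an illustrative rule, not the authors' tokenizer.
    """
    terms = []
    # First cut on underscores and any other non-alphanumeric characters.
    for part in re.split(r"[_\W]+", identifier):
        # Then cut on case changes: acronyms, capitalised words,
        # lowercase runs, and digit runs each become one term.
        terms.extend(re.findall(r"[A-Z]+(?![a-z])|[A-Z]?[a-z]+|\d+", part))
    return [term.lower() for term in terms]


# The three method names used as the example on the slide.
methods = ["getRelativePath", "returnAbsolutePath", "setPath"]
terms_by_method = {name: split_identifier(name) for name in methods}
```

Under this splitting rule the term `path` occurs in all three methods, which is the kind of scattering the study sets out to measure.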
  • 5. Dispersion measures (1/3)
      Physical dispersion - Entropy
      [Figure: terms (fee, foo, bar) plotted against entities (E1 to E5), with the entropy of each term shown alongside.]
      A circle indicates the occurrences of a term in an entity. The larger the circle, the higher the number of occurrences.
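Slide 5 names entropy as the measure of physical dispersion but gives no formula. The sketch below assumes the standard Shannon entropy (base 2) over a term's occurrence counts, normalised to a distribution across entities; the function name and the sample counts are hypothetical.

```python
import math


def term_entropy(occurrences: dict[str, int]) -> float:
    """Shannon entropy (in bits) of one term's occurrences across entities.

    `occurrences` maps an entity name to the number of times the term
    appears in it. Assumption: counts are normalised into probabilities.
    """
    total = sum(occurrences.values())
    if total == 0:
        return 0.0
    entropy = 0.0
    for count in occurrences.values():
        if count > 0:
            p = count / total
            entropy -= p * math.log2(p)
    return entropy


# Hypothetical counts: one term confined to a single entity,
# another spread evenly over four entities.
concentrated = {"E1": 6}
scattered = {"E1": 2, "E2": 2, "E3": 2, "E4": 2}
```

A term confined to one entity has zero entropy, and entropy grows as the occurrences are spread more evenly over more entities.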
  • 6. Dispersion measures (2/3)
      Conceptual dispersion - Context Coverage
      [Figure: entities E1 to E5 grouped into entity contexts C1 to C4.]
      Entity contexts are identified taking into account the terms contained in the entities.
      [Figure: terms (fee, foo, bar) plotted against contexts (C1 to C4), with the context coverage of each term shown alongside.]
      A star indicates that the term appears in the particular context.
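Slide 6 describes context coverage only pictorially. As a rough sketch, assume the entity contexts have already been identified and that a term's coverage is the fraction of contexts in which it appears. Both the definition and the entity-to-context assignment below are assumptions made for illustration, not the authors' stated method.

```python
def context_coverage(
    term: str,
    entity_terms: dict[str, set[str]],
    entity_context: dict[str, str],
) -> float:
    """Fraction of all contexts that contain at least one entity using `term`.

    Assumption: contexts are given as a mapping from entity to context label.
    """
    all_contexts = set(entity_context.values())
    if not all_contexts:
        return 0.0
    covered = {
        entity_context[entity]
        for entity, terms in entity_terms.items()
        if term in terms
    }
    return len(covered) / len(all_contexts)


# Hypothetical data: five entities grouped into four contexts.
entity_context = {"E1": "C1", "E2": "C1", "E3": "C3", "E4": "C2", "E5": "C4"}
entity_terms = {
    "E1": {"foo", "bar"},
    "E2": {"foo"},
    "E3": {"bar", "fee"},
    "E4": {"bar"},
    "E5": {"bar", "fee"},
}
```

With this data `foo` stays inside a single context, while `bar` appears in every context and so has the highest coverage.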
  • 7–17. Dispersion measures (3/3)
      Aggregated metric - numHEHCC
      [Figure: a plane spanned by Entropy (H) and Context Coverage (CC), divided into four regions by the thresholds th_H and th_CC.]
        Low H, low CC: used in few identifiers, used in similar contexts.
        High H, low CC: used in many identifiers, used in similar contexts.
        Low H, high CC: used in few identifiers, used in different contexts.
        High H, high CC: used in many identifiers, used in different contexts (the region marked "!").
      For each entity, numHEHCC counts the number of such terms, i.e., terms with high entropy and high context coverage.
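The counting rule stated on the slide can be sketched as follows. The slides do not say how th_H and th_CC are chosen, whether the comparison is strict, or whether repeated terms count once, so the strict inequality, the counting of distinct terms, and all sample values below are assumptions.

```python
def num_hehcc(
    entity_terms: dict[str, list[str]],
    entropy: dict[str, float],
    coverage: dict[str, float],
    th_h: float,
    th_cc: float,
) -> dict[str, int]:
    """Count, per entity, the distinct terms above both thresholds.

    Assumptions: "high" means strictly greater than the threshold, and a
    term is counted once per entity regardless of how often it occurs.
    """
    counts = {}
    for entity, terms in entity_terms.items():
        counts[entity] = sum(
            1
            for term in set(terms)
            if entropy[term] > th_h and coverage[term] > th_cc
        )
    return counts


# Hypothetical per-term scores and thresholds.
entropy = {"get": 2.5, "path": 2.1, "relative": 0.0}
coverage = {"get": 0.9, "path": 0.3, "relative": 0.1}
entity_terms = {
    "getRelativePath": ["get", "relative", "path"],
    "relativeOffset": ["relative"],
}
result = num_hehcc(entity_terms, entropy, coverage, th_h=2.0, th_cc=0.5)
```

Here only `get` exceeds both thresholds, so `getRelativePath` scores 1 and `relativeOffset` scores 0.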
  • 18. Our study - refined (1/2)
      Research question 1
      RQ1 – Metric Relevance: Does numHEHCC capture characteristics different from size?
      Our belief: Yes, it does, although we expect some overlap.
      To this end, we verify the following:
        1. To what extent do numHEHCC and size vary together?
        2. Can size explain numHEHCC?
        3. Does numHEHCC bring additional information to size for fault explanation?
  • 19. Our study - refined (2/2)
      Research question 2
      RQ2 – Relation to Faults: Do term entropy and context coverage help to explain the presence of faults in an entity?
      Our belief: Yes, they do!
      How?
        1. Estimate the risk of being faulty when entities contain terms with high entropy and high context coverage.
  • 20. Objects
      ArgoUML v0.16 – a UML modeling CASE tool.
      Rhino v1.4R3 – a JavaScript/ECMAScript interpreter and compiler.
      Program   LOC     # Entities  # Terms
      ArgoUML   97,946  12,423      2,517
      Rhino     18,163  1,624       949
      We consider both methods and attributes as entities.
  • 21. Case study: RQ1 – Metric Relevance (1/3)
      Results for RQ1 – Metric Relevance
      To what extent do numHEHCC and size vary together?
      Correlation between numHEHCC and LOC: ArgoUML: 40%, Rhino: 43%.
      [Figure: numHEHCC plotted against LOC.]
  • 22. Case study: RQ1 – Metric Relevance (2/3)
      Results for RQ1 – Metric Relevance
      Can size explain numHEHCC?
      ArgoUML: 17%, Rhino: 19%.
      [Figure: Composition of numHEHCC.]
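Slides 21 and 22 report how strongly numHEHCC and LOC vary together and how much of numHEHCC size can explain, but they do not name the statistics behind the percentages. One common way to quantify both, shown here purely as an assumption, is the Pearson correlation coefficient r and, for a simple linear fit, the explained share of variance r squared. The sample values are hypothetical.

```python
import math


def pearson(xs: list[float], ys: list[float]) -> float:
    """Pearson correlation coefficient between two equally long samples."""
    n = len(xs)
    if n != len(ys) or n < 2:
        raise ValueError("need two samples of equal length, at least 2 points")
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var_x = sum((x - mean_x) ** 2 for x in xs)
    var_y = sum((y - mean_y) ** 2 for y in ys)
    if var_x == 0 or var_y == 0:
        raise ValueError("correlation is undefined for a constant sample")
    return cov / math.sqrt(var_x * var_y)


def explained_variance(xs: list[float], ys: list[float]) -> float:
    """Share of variance in ys explained by a simple linear fit on xs."""
    return pearson(xs, ys) ** 2


# Hypothetical per-entity measurements (not taken from the study).
loc = [10.0, 25.0, 40.0, 80.0, 120.0]
hehcc = [1.0, 0.0, 3.0, 2.0, 5.0]
r = pearson(loc, hehcc)
```

A moderate r with a low r squared would be in line with the authors' expectation of some overlap between numHEHCC and size without one being reducible to the other.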
  • 23. Case study: RQ1 – Metric Relevance (3/3)
      Results for RQ1 – Metric Relevance (cont'd)
      Does numHEHCC bring additional information to size for fault explanation?
      Model      Variables     Coefficients  p-values
      M_ArgoUML  Intercept     -1.688e+00    2e-16
                 LOC           7.703e-03     8.34e-10
                 numHEHCC      7.490e-02     1.42e-05
                 LOC:numHEHCC  -2.819e-04    0.000211
      M_Rhino    Intercept     -4.9625130    2e-16
                 LOC           0.0041486     0.17100
                 numHEHCC      0.2446853     0.00310
                 LOC:numHEHCC  -0.0004976    0.29788
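The table lists an intercept, two main effects, and an interaction term (LOC:numHEHCC). The slides do not name the model family; the sketch below assumes a logistic regression of fault presence and plugs in the M_ArgoUML coefficients to show how such a model would turn LOC and numHEHCC into a fault probability. The function name and the input values are hypothetical.

```python
import math

# Coefficients copied from the M_ArgoUML rows of the table.
ARGOUML = {
    "intercept": -1.688e00,
    "loc": 7.703e-03,
    "num_hehcc": 7.490e-02,
    "interaction": -2.819e-04,
}


def fault_probability(loc: float, num_hehcc: float, coef: dict = ARGOUML) -> float:
    """Predicted fault probability under an ASSUMED logistic model.

    Linear predictor: intercept + b1*LOC + b2*numHEHCC + b3*LOC*numHEHCC.
    """
    z = (
        coef["intercept"]
        + coef["loc"] * loc
        + coef["num_hehcc"] * num_hehcc
        + coef["interaction"] * loc * num_hehcc
    )
    return 1.0 / (1.0 + math.exp(-z))


# Hypothetical entity: 100 lines of code, with and without flagged terms.
without_terms = fault_probability(loc=100, num_hehcc=0)
with_terms = fault_probability(loc=100, num_hehcc=5)
```

Under this assumed model the negative interaction coefficient means that the contribution of each additional flagged term, 0.0749 - 0.0002819 * LOC, shrinks as the entity grows and reaches zero at roughly 266 LOC.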
  • 24–28. Case study: Results for RQ2 – Relation to Faults (1/1)
      The risk of being faulty when entities contain terms with high entropy and high context coverage.
      [Figure: the set of all entities, with a subset labelled "10% of the entities" and "numHEHCC".]
      Risk of being faulty? ArgoUML: 2 x higher, Rhino: 6 x higher.
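The slide reports the risk as "2 x higher" and "6 x higher" without saying whether this is a ratio of risks or of odds. The sketch below computes both from a two-group fault count so the difference is visible. The counts are invented for illustration and are not the study's data.

```python
def relative_risk(
    faulty_flagged: int, total_flagged: int, faulty_rest: int, total_rest: int
) -> float:
    """Ratio of the fault rate among flagged entities to the rate among the rest."""
    if total_flagged <= 0 or total_rest <= 0:
        raise ValueError("both groups must be non-empty")
    risk_rest = faulty_rest / total_rest
    if risk_rest == 0:
        raise ValueError("fault rate in the reference group is zero")
    return (faulty_flagged / total_flagged) / risk_rest


def odds_ratio(
    faulty_flagged: int, total_flagged: int, faulty_rest: int, total_rest: int
) -> float:
    """Ratio of the odds of being faulty in the flagged group to the odds in the rest."""
    clean_flagged = total_flagged - faulty_flagged
    clean_rest = total_rest - faulty_rest
    if clean_flagged == 0 or faulty_rest == 0:
        raise ValueError("odds ratio is undefined for these counts")
    return (faulty_flagged * clean_rest) / (clean_flagged * faulty_rest)


# Hypothetical counts: 20 of 100 flagged entities are faulty,
# against 10 of 100 among the remaining entities.
rr = relative_risk(20, 100, 10, 100)
oratio = odds_ratio(20, 100, 10, 100)
```

The two measures coincide only when faults are rare, so a figure such as "2 x higher" should be read together with the measure that produced it.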
  • 29. Conclusions and future work
      Conclusions
        Entropy and context coverage, together, capture characteristics different from size!
        Entropy and context coverage, together, help to explain the presence of faults in entities!
      Future directions
        Replicate the study on other systems.
        Use entropy and context coverage to suggest refactorings.
        Study the impact of lexicon evolution on entropy and context coverage.
  • 30. Thank you!
      Questions?
  • 31–33. References
      Binkley, D., Davis, M., Lawrie, D., and Morrell, C. (2009). To CamelCase or Under_score. In Proceedings of 17th IEEE International Conference on Program Comprehension. IEEE CS Press.
      Butler, S., Wermelinger, M., Yu, Y., and Sharp, H. (2009). Relating identifier naming flaws and code quality: An empirical study. In Proceedings of the 16th Working Conference on Reverse Engineering, pages 31–35. IEEE CS Press.
      Deissenboeck, F. and Pizka, M. (2006). Concise and consistent naming. Software Quality Journal, 14(3):261–282.
      Gyimóthy, T., Ferenc, R., and Siket, I. (2005). Empirical validation of object-oriented metrics on open source software for fault prediction. IEEE Transactions on Software Engineering, 31(10):897–910.
      Haiduc, S. and Marcus, A. (2008). On the use of domain terms in source code. In Proceedings of 16th IEEE International Conference on Program Comprehension, pages 113–122. IEEE CS Press.
      Liu, Y., Poshyvanyk, D., Ferenc, R., Gyimóthy, T., and Chrisochoides, N. (2009). Modelling class cohesion as mixtures of latent topics. In Proceedings of 25th IEEE International Conference on Software Maintenance, pages 233–242, Edmonton, Canada. IEEE CS Press.
      Marcus, A., Poshyvanyk, D., and Ferenc, R. (2008). Using the conceptual cohesion of classes for fault prediction in object-oriented systems. IEEE Transactions on Software Engineering, 34(2):287–300.
      Poshyvanyk, D. and Marcus, A. (2006). The conceptual coupling metrics for object-oriented systems. In Proceedings of 22nd IEEE International Conference on Software Maintenance, pages 469–478. IEEE CS Press.
      Takang, A., Grubb, P., and Macredie, R. (1996). The effects of comments and identifier names on program comprehensibility: an experiential study. Journal of Programming Languages, 4(3):143–167.
      Zimmermann, T., Premraj, R., and Zeller, A. (2007). Predicting defects for Eclipse. In Proceedings of the Third International Workshop on Predictor Models in Software Engineering.
