The influence of identifiers on code quality

The slides of our paper (http://oro.open.ac.uk/19224/) at the European Conf. on Software Maintenance and Reengineering, Madrid, 15-18 March 2010.


  1. Exploring the Influence of Identifier Names on Code Quality: an empirical study
     Simon Butler, Michel Wermelinger, Yijun Yu and Helen Sharp
     Centre for Research in Computing, The Open University, UK
     CSMR, Madrid, 18 March 2010
     Simon Butler et al. (Open Univ., UK) The Influence of Identifiers on Code Quality CSMR’10 1 / 13
  2. Introduction
     Identifier names
        primary source of concepts in source code
        crucial to program comprehension and readability
        reflect cognitive processes
     A wider influence?
        connection between readability and defects (Buse & Weimer)
     Research Question
        ‘What is the influence of identifier name quality on source code quality?’
  3. Evaluating Identifier Name Quality
     Relf’s Identifier Naming Style Guidelines
        21 guidelines for Ada & Java
        evaluated empirically
        focus on typography of names
        simple approach to use of natural language
     Applying the Guidelines
        adapted 9 guidelines as naming flaw indicators
           length: too few/many words/characters
           typographical conventions: capitalization, type encoding
           natural language: English and extended dictionaries
  4. Evaluating Code Quality
     Static analysis
        FindBugs
           Java specific static analysis tool
           Identifies a range of priority 1 and 2 bug patterns
           Google: most identified issues required correction
     Metrics
        Readability
           human-trained layout metric (Buse & Weimer)
        Cyclomatic Complexity
           to measure branching complexity
        Maintainability Index
           based on LOC, cyclomatic complexity, Halstead volume (Welker et al.)
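
The slide names the three inputs to the Maintainability Index but not the formula. The following is a minimal sketch, assuming the commonly cited three-metric form MI = 171 - 5.2 ln(V) - 0.23 G - 16.2 ln(LOC), where V is Halstead volume, G is cyclomatic complexity and LOC is lines of code. The function name and the example values are hypothetical, and the slides do not say whether the study used exactly this variant.

```python
import math


def maintainability_index(halstead_volume, cyclomatic_complexity, lines_of_code):
    """Three-metric Maintainability Index (assumed variant, see lead-in).

    Higher values indicate code that is easier to maintain.
    """
    if halstead_volume <= 0 or lines_of_code <= 0:
        raise ValueError("Halstead volume and LOC must be positive")
    return (
        171.0
        - 5.2 * math.log(halstead_volume)      # natural log of Halstead volume
        - 0.23 * cyclomatic_complexity         # linear penalty for branching
        - 16.2 * math.log(lines_of_code)       # natural log of size
    )


# Hypothetical method: volume 1000, cyclomatic complexity 10, 100 lines.
example_mi = maintainability_index(1000, 10, 100)
```

With these made-up inputs the index comes out a little above 58, so such a method would be a candidate for a "less maintainable" label under any threshold in the mid-60s.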
  5. Methodology
     Data Collection
        8 mature FLOSS Java projects from different domains
        each with 1-12 thousand methods
        computed metrics and extracted names from source code
        ran FindBugs on corresponding bytecode
  6. Methodology
     Naming Quality
        Names split into hard words on typographical boundaries
           NullPointerException is split into {Null, Pointer, Exception}
           MOUSE_EVENT_MASK is split into {MOUSE, EVENT, MASK}
        Extended dictionaries created with unrecognised hard words
           built dictionaries for words used in 3, 5 or 10 unique identifiers
        Identifier names analysed for compliance with each guideline
     Code Quality
        binary classification of methods into
           with/without FindBugs priority 1 (or 2) warnings
           readability below/above 0.5
           cyclomatic complexity below/above 6 (or 10)
           maintainability index below/above 65
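
The splitting and dictionary-building steps on this slide can be sketched as follows. This is an illustration under stated assumptions, not the authors' tooling: the regular expression, both function names, and the reading of "used in 3, 5 or 10 unique identifiers" as "used in at least that many" are all assumptions, and the sample identifiers and base dictionary are invented.

```python
import re
from collections import defaultdict

# Hard words: an acronym run (capitals not followed by a lowercase letter),
# a capitalised or lowercase word, or a run of digits. Underscores and other
# separators are skipped because no alternative matches them.
_HARD_WORD = re.compile(r"[A-Z]+(?![a-z])|[A-Z]?[a-z]+|[0-9]+")


def split_hard_words(name):
    """Split an identifier on typographical boundaries (case changes, '_')."""
    return _HARD_WORD.findall(name)


def build_extended_dictionary(identifiers, base_dictionary, min_identifiers):
    """Collect unrecognised hard words seen in >= min_identifiers unique names."""
    seen_in = defaultdict(set)
    for name in set(identifiers):
        for word in split_hard_words(name):
            lowered = word.lower()
            if lowered not in base_dictionary:
                seen_in[lowered].add(name)
    return {w for w, names in seen_in.items() if len(names) >= min_identifiers}


print(split_hard_words("NullPointerException"))  # ['Null', 'Pointer', 'Exception']
print(split_hard_words("MOUSE_EVENT_MASK"))      # ['MOUSE', 'EVENT', 'MASK']

# Invented sample: "cfg" occurs in three unique identifiers, "tmp" in one.
sample = ["cfgLoader", "cfgPath", "loadCfg", "tmpPath"]
print(build_extended_dictionary(sample, {"loader", "path", "load"}, 3))  # {'cfg'}
```

Raising the threshold from 3 to 5 or 10 shrinks the extended dictionary to abbreviations that are more widely shared across the code base.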
  7. Statistical Analysis
     Null hypothesis: independent distributions
        χ² test applied to assess independence of identifier flaws and:
           FindBugs warnings
           less readable methods
           less maintainable methods
           less complex methods
        null hypothesis was rejected if p < 5%
     Guidelines as classifiers?
        Applied diagnostic test evaluation used in medicine
        Compared each guideline vs reference classifiers
     JFreeChart

                                FindBugs Priority Two Warnings
        Non-Dictionary Words    methods with    methods without
        methods with                     103               2925
        methods without                   37               5165

        sensitivity = 103 ÷ (103 + 37) = 0.74
        specificity = 5165 ÷ (2925 + 5165) = 0.64
        AUC = 0.69
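
The JFreeChart figures on this slide can be reproduced with a short sketch. Two assumptions are made here that the slide does not confirm: AUC is taken as the mean of sensitivity and specificity (the area under a single-point ROC curve, which matches the reported 0.69), and the χ² statistic is the plain Pearson form for a 2x2 table without continuity correction. Function and parameter names are hypothetical.

```python
import math


def diagnostic_measures(tp, fp, fn, tn):
    """Sensitivity, specificity and single-point AUC for a 2x2 table."""
    sensitivity = tp / (tp + fn)   # flagged methods among those with warnings
    specificity = tn / (fp + tn)   # unflagged methods among those without
    auc = (sensitivity + specificity) / 2  # assumed AUC definition
    return sensitivity, specificity, auc


def chi_square_2x2(a, b, c, d):
    """Pearson chi-square test of independence for the table [[a, b], [c, d]].

    Returns (statistic, p_value); with one degree of freedom the p-value
    equals erfc(sqrt(statistic / 2)).
    """
    n = a + b + c + d
    statistic = n * (a * d - b * c) ** 2 / ((a + b) * (c + d) * (a + c) * (b + d))
    p_value = math.erfc(math.sqrt(statistic / 2))
    return statistic, p_value


# Counts from the slide: rows = Non-Dictionary Words flaw present / absent,
# columns = methods with / without FindBugs priority 2 warnings.
sens, spec, auc = diagnostic_measures(tp=103, fp=2925, fn=37, tn=5165)
print(round(sens, 2), round(spec, 2), round(auc, 2))  # 0.74 0.64 0.69

statistic, p_value = chi_square_2x2(103, 2925, 37, 5165)
print(p_value < 0.05)  # True: independence is rejected at the 5% level
```

For this table the statistic is far above 3.84, the 5% critical value for one degree of freedom, so the flaw and the warnings are not independently distributed in this project.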
  8. Non-Dictionary Words Flaw
  9. Identifier flaws and FindBugs priority 2 warnings
     Projects: JasperReports, JFreeChart, Hibernate, Freemind, Tomcat, Cactus, jEdit, Ant
        Capitalisation Anomaly: .62 .62 – – .57
        Excessive Words: .55 .55 .58 –
        External Underscores: * * * *
        Long Identifier: .59 .57 –
        Naming Convention Anomaly:
        Number of Words: .56 .59 – .55 .55
        Numeric Identifier: * * * *
        Short Identifier Name: .56 .58 .62 – .56 .57
        Type Encoding: * * *
        Non-Dictionary Words: .60 .64 .62 – .63 .69 .59
        Extended 3: .64 .66 .59 .63 .59
        Extended 5: .64 .65 .64 – .63 .72 .59
        Extended 10: .63 .64 .64 – .61 .72 .61
        Less-readable: .67 .67 .67 – .66 .68
     Legend: p < 0.001, p < 0.05, p >= 0.05, * No flaw
  10. Identifier flaws and Cyclomatic Complexity >= 10
     Projects: JasperReports, JFreeChart, Hibernate, Freemind, Tomcat, Cactus, jEdit, Ant
        Capitalisation Anomaly: .67 .72 .63 .64 .66 .61 .73 .75
        Excessive Words: .55 .55 .58 .65 .58 .60
        External Underscores: * * * *
        Long Identifier: .56 .57 .68 .66 .58 .57
        Naming Convention Anomaly: .55
        Number of Words: .55 .61 .57 .60 .64 .58 .59
        Numeric Identifier: * * * *
        Short Identifier Name: .63 .65 .57 .62 .62 .55 .60 .62
        Type Encoding: * * *
        Non-Dictionary Words: .67 .70 .67 .74 .70 .64 .78 .76
        Extended 3: .69 .70 .61 .73 .68 .64 .75 .75
        Extended 5: .70 .69 .65 .75 .73 .66 .82 .76
        Extended 10: .70 .70 .66 .76 .74 .66 .81 .77
     Legend: p < 0.001, p < 0.05, p >= 0.05, * No flaw
  11. Identifier flaws and Less-Readable methods
     Projects: JasperReports, JFreeChart, Hibernate, Freemind, Tomcat, Cactus, jEdit, Ant
        Capitalisation Anomaly: .62 .55 .61 .60 .62 .62 .63 .66
        Excessive Words: .59 .58 .61 .57
        External Underscores: * * * *
        Long Identifier: .56 .58 .60 .58 .56 .56
        Naming Convention Anomaly:
        Number of Words: .56 .60 .55
        Numeric Identifier: * * * *
        Short Identifier Name: .57
        Type Encoding: * * *
        Non-Dictionary Words: .65 .56 .61 .66 .65 .65 .62 .68
        Extended 3: .62 .56 .58 .62 .60 .65
        Extended 5: .64 .57 .60 .63 .63 .66
        Extended 10: .65 .56 .58 .63 .65 .63 .68
     Legend: p < 0.001, p < 0.05, p >= 0.05, * No flaw
  12. Identifier flaws and Less-Maintainable methods
     Projects: JasperReports, JFreeChart, Hibernate, Freemind, Tomcat, Cactus, jEdit, Ant
        Capitalisation Anomaly: .78 .78 .76 .67 .67 .64 .81 .77
        Excessive Words: .59 .58 .67 .68 .62 .57 .63 .55
        External Underscores: * * * .57 *
        Long Identifier: .57 .68 .67 .73 .71 .57 .61 .58
        Naming Convention Anomaly: .55 .57 .56 .55
        Number of Words: .57 .61 .62 .62 .65 .56 .59 .60
        Numeric Identifier: * * * *
        Short Identifier Name: .59 .65 .62 .65 .66 .56 .61 .63
        Type Encoding: * * *
        Non-Dictionary Words: .76 .77 .79 .82 .72 .72 .80 .78
        Extended 3: .81 .76 .69 .83 .72 .71 .84 .80
        Extended 5: .82 .76 .75 .85 .78 .74 .85 .80
        Extended 10: .80 .77 .77 .85 .80 .74 .84 .80
     Legend: p < 0.001, p < 0.05, p >= 0.05, * No flaw
  13. Conclusions
     We found:
        Poor quality identifier names are associated with:
           more complex
           less readable
           less maintainable
           potentially more buggy code
        Natural language content of identifier names is a classifier for source code quality
        Identifier name length is a classifier for complexity and maintainability
        Opposite associations only in commercialised projects
           suggesting differences between open source and commercial code
