Understanding Digits in Identifier Names: An Exploratory Study

Presented at: The 1st International Workshop on Natural Language-based Software Engineering (NLBSE '22)

Date of Conference: May 2022
Conference Location: Virtual

The preprint is available at: https://www.peruma.me/publication/2022-nlbse-digits/2022-nlbse-digits.pdf

A video of the presentation is available at: https://youtu.be/ERD6GTFzOxY


  1. The 1st International Workshop on Natural Language-based Software Engineering (NLBSE '22). Understanding Digits in Identifier Names: An Exploratory Study. Anthony Peruma and Christian D. Newman, Source Code Analysis and Natural Language Lab
  2. SUMMARY: We explore the presence and purpose of digits in identifier names through an empirical study of 800 open-source Java systems.
  3. BACKGROUND: Identifier names help developers understand the purpose of the identifier. Names must be unambiguous and intent-revealing in communicating the purpose and behavior of the code. Developers can craft names using a variety of terms, making name consistency challenging. Prior studies focused on the words that make up identifiers (such as abbreviations, acronyms, and naming styles), not on digits.
  4. OUR GOAL: Understand the part played by digits in identifier names by examining the structure of names containing digits and the semantics expressed by the digits.
  5. IMPACT: Findings from our study facilitate research and development of tools to aid in name recommendation and appraisal.
  6. RESEARCH QUESTIONS
      RQ 1: How do identifier renaming operations in the source code impact the existence of digits in an identifier's name?
        • Volume and characteristics
        • Digit preservation
      RQ 2: How do developers utilize digits in an identifier's name to convey meaning?
        • Qualitative examination of names
        • Taxonomy for the presence of digits
  7. OUR CONTRIBUTIONS: Taxonomy for the Presence of Digits; Patterns & Trends of Digits in Names; Discussion of Research Challenges
  8. STUDY DESIGN: Dataset of rename refactorings from 800 Java projects (28,079 rename operations) → Extract Identifiers With Digits → Identifier Name Splitting → Rename Exclusion → Quantitative Analysis and Qualitative Analysis; 15,424 rename operations
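The "Extract Identifiers With Digits" and "Identifier Name Splitting" steps could look roughly like the sketch below. This is a minimal illustration only: the slides do not name the splitter used in the study, and the helper names `has_digit` and `split_identifier` as well as the regular expression are assumptions made here.

```python
import re
from typing import List

# Assumed tokenization rule (not the study's tool): split on camelCase
# boundaries, keep runs of capitals together, and treat each run of digits
# as its own token. Underscores and other separators are dropped.
_TOKEN = re.compile(r"[A-Z]+(?![a-z])|[A-Z]?[a-z]+|[0-9]+")


def has_digit(name: str) -> bool:
    """Return True if the identifier name contains at least one digit."""
    return any(ch.isdigit() for ch in name)


def split_identifier(name: str) -> List[str]:
    """Split an identifier name into word and digit tokens."""
    return _TOKEN.findall(name)


# Identifier names taken from the examples shown in the slides.
names = ["shade2Figure", "auditLog3", "V1DozerTransformModel", "logLevel"]
with_digits = [n for n in names if has_digit(n)]
tokens = {n: split_identifier(n) for n in with_digits}
```

A regex keeps the sketch dependency-free; a dedicated identifier splitter would handle abbreviations and mixed-case edge cases better.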
  9. EXPERIMENT RESULTS
  10. RQ 1: The treatment of names with digits over time
      Approach:
      ● Automated examination of the presence or absence of digits in an identifier's name before and after a rename operation
      Findings:
      ● Digits are frequently preserved when renamed (e.g., node2 → node3)
        ○ 43.56% of instances preserve digits
        ○ 33.29% of instances remove digits
        ○ 23.15% of instances add digits
      ● Digit preservation:
        ○ Most names contain only a single digit in both the old and new name (79.93% of instances)
        ○ The old and new name contain an equal number of digits (91.35% of instances)
        ○ The position of the digit is mostly preserved; the 2nd position in the name accounts for 28.73% (e.g., shade2 → shade2Figure)
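The preserve/remove/add breakdown for RQ 1 can be sketched as a comparison of the old and new name. The function names and the rule that "preserve" means digits appear both before and after the rename are this sketch's reading of the slide (based on the node2 → node3 example), not a description of the authors' tooling.

```python
import re
from typing import List

# Same assumed tokenization as a simple camelCase/digit splitter.
_TOKEN = re.compile(r"[A-Z]+(?![a-z])|[A-Z]?[a-z]+|[0-9]+")


def classify_rename(old: str, new: str) -> str:
    """Label a rename by what happens to digits: preserve, remove, add, or none."""
    old_has = any(ch.isdigit() for ch in old)
    new_has = any(ch.isdigit() for ch in new)
    if old_has and new_has:
        return "preserve"
    if old_has:
        return "remove"
    if new_has:
        return "add"
    return "none"


def digit_positions(name: str) -> List[int]:
    """Return the 1-based token positions that consist of digits."""
    tokens = _TOKEN.findall(name)
    return [i for i, tok in enumerate(tokens, start=1) if tok.isdigit()]


# node2 -> node3 and shade2 -> shade2Figure are the slide's own examples.
label = classify_rename("node2", "node3")
position_kept = digit_positions("shade2") == digit_positions("shade2Figure")
```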
  11. RQ 2: The meaning conveyed by digits in a name
      Approach:
      ● Manual examination of 375 rename instances by the authors (stratified, statistically significant sample)
        ○ Includes reviewing the surrounding code
        ○ Snowballing to locate examples of additional instances in the original dataset
      Findings:
      ● Taxonomy of 6 categories showing how digits convey meaning in a name:
        ○ Auto-Generated
        ○ Distinguisher
        ○ Synonym
        ○ Version Number
        ○ Specification
        ○ Domain/Technology
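The slides do not say how the sample size of 375 was derived. One common approach is Cochran's formula with a finite-population correction. The sketch below assumes a 95% confidence level (z = 1.96), a 5% margin of error, and p = 0.5; none of these parameters are stated in the slides. Under those assumptions, a population of 15,424 rename operations (the figure shown on the study design slide) gives 375. The stratification step is not modeled here.

```python
import math


def sample_size(population: int, z: float = 1.96,
                margin: float = 0.05, p: float = 0.5) -> int:
    """Cochran's sample size with finite-population correction.

    The default z, margin, and p are assumptions of this sketch,
    not values reported in the slides.
    """
    n0 = (z ** 2) * p * (1 - p) / (margin ** 2)   # infinite-population size
    n = n0 / (1 + (n0 - 1) / population)          # finite-population correction
    return math.ceil(n)


n = sample_size(15424)
```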
  12. RQ 2: The meaning conveyed by digits in a name
      Auto-Generated
        • Created by a code generation tool or IDE; not easily comprehensible
        • Numbers may have a meaning based on the generation technique
        • E.g., LA18_6
      Distinguisher
        • Usually the last token in the name
        • At least two identifiers have a lexically identical name
        • Avoids name collision at compilation
        • E.g., auditLog3
      Synonym
        • At least one digit is used in place of a word
        • The numbers 2 and 4 are very common examples
        • E.g., convert2RList
      Domain/Technology
        • The digit is part of the name of a domain term or technology
        • Digits themselves have no individual meaning
        • E.g., slf4jLogLevel
      Specification
        • Represents a specification
        • Acts as a way to uniquely identify concepts, behaviors, or characteristics
        • E.g., arialRegular9Dark
      Version Number
        • Digit used to signify a version number
        • Indicates significant capabilities and limitations of the identifier
        • E.g., V1DozerTransformModel
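Some of the category descriptions above are concrete enough to turn into rough name-only heuristics. The sketch below is illustrative and is not the authors' method: the study relied on manual review of the surrounding code. The `TECH_TERMS` catalog, the function name `suggest_category`, and the rules themselves are assumptions, and the output should be read as a candidate label only.

```python
import re
from typing import Optional

_TOKEN = re.compile(r"[A-Z]+(?![a-z])|[A-Z]?[a-z]+|[0-9]+")

# Hypothetical, hand-made catalog of technology terms that contain digits.
TECH_TERMS = {"slf4j", "log4j", "utf8", "md5"}


def suggest_category(name: str) -> Optional[str]:
    """Suggest a candidate taxonomy category from the name alone."""
    lowered = name.lower()
    if any(term in lowered for term in TECH_TERMS):
        return "Domain/Technology"
    tokens = _TOKEN.findall(name)
    # Version Number candidate: a "v" token directly followed by digits.
    for first, second in zip(tokens, tokens[1:]):
        if first.lower() == "v" and second.isdigit():
            return "Version Number"
    # Synonym candidate: 2 ("to") or 4 ("for") sitting between two words.
    for i in range(1, len(tokens) - 1):
        if (tokens[i] in {"2", "4"}
                and tokens[i - 1].isalpha() and tokens[i + 1].isalpha()):
            return "Synonym"
    # Distinguisher candidate: digits form the last token of the name.
    if tokens and tokens[-1].isdigit():
        return "Distinguisher"
    return None


# Identifier names taken from the slide's examples.
suggestions = {n: suggest_category(n) for n in
               ["slf4jLogLevel", "V1DozerTransformModel",
                "convert2RList", "auditLog3", "arialRegular9Dark", "LA18_6"]}
```

Name-only rules misfire easily. Here the auto-generated name LA18_6 is labeled a Distinguisher candidate and arialRegular9Dark gets no label, which is in line with the deck's own caution that heuristics help only partly.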
  13. DISCUSSION & CONCLUSION
  14. KEY CHALLENGES
      ● Auto-generated code can skew findings
        ○ You will most likely need to run multiple iterations of your data collection/extraction process to isolate auto-generated identifiers
      ● The volume of auto-generated names can hinder data sampling activities, as they may make up the majority of identifiers in the code
        ○ This can vary depending on the type of auto-generated code the project utilizes
      ● Automatically detecting auto-generated identifiers is not straightforward
        ○ Heuristics can help, but only partly, and they depend on human judgment
      ● Numbers can have different interpretations; with limited research in this specific area, we don't know if numbers hinder or help comprehension
  15. KEY TAKEAWAYS
      01 Digits Are Preserved Post-Rename
      02 The Digits Found in Identifier Names Are Meaningful
      ● Improve identifier name appraisals and recommendations when developers perform rename operations
      ● Utilize static analysis to determine if the digit is related to the code
      ● Build a catalog of technologies, standards, or domain terms in the project
  16. IDENTIFIER NAMING STRUCTURE CATALOGUE: A resource about what is scientifically known about naming identifiers. Topics: Part-of-Speech Tagset; Linguistic Terminology; Linguistic Antipatterns; Common Naming Structures; Naming Styles. Available at: https://www.scanl.org
  17. THANK YOU! For more of what we do, visit: https://www.scanl.org/
