Mining Java Class Naming Conventions

           Simon Butler, Michel Wermelinger, Yijun Yu & Helen Sharp

                                      Centre for Research in Computing
                                            The Open University


                                          27 September 2011




           m.a.wermelinger@open.ac.uk


Butler et al. (The Open University)     Mining Java Class Naming Conventions   27 September 2011   1/7
Class Identifier Names

         Despite the importance of class identifier names, knowledge of
         their structure is limited
         The adjective* noun+ approximation (zero or more adjectives
         followed by one or more nouns) has been found to be useful,
         but not universal
         What other part-of-speech patterns are commonly used?
         How are component words repeated? How often?
         Are there project-specific naming conventions?

         [Figure: example class hierarchy containing AbstractCollection,
         Set, AbstractSet, EnumSet, HashSet and TreeSet]


Distribution of Java Classes in Inheritance Categories



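         [Figure: proportion of inheritance categories per project
         (y-axis, 0.0 to 0.7) for the categories E0I0, E0I1, E0In, E1I0,
         E1I1 and E1In (x-axis)]
         E0: no declared superclass; E1: extends a superclass
         I0, I1, In: implements zero, one or more than one interface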
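The six categories can be computed mechanically from a class declaration. The Java sketch below is a hypothetical helper, not the tool used in the study: it assumes E1 means the class declares a superclass other than java.lang.Object and In means more than one directly implemented interface, and it reads both facts from compiled classes via reflection (`Class.getSuperclass` and `Class.getInterfaces`) rather than from source code.

```java
/**
 * Hypothetical helper (an assumption, not the study's tool) that assigns a
 * Java class to one of the six inheritance categories E0I0 ... E1In.
 * Ex: whether the class declares a superclass other than java.lang.Object.
 * Iy: how many interfaces the class itself declares (0, 1 or n > 1).
 */
final class InheritanceCategory {

    private InheritanceCategory() {}

    /** Builds the category label from the two facts about a class. */
    static String of(boolean extendsClass, int interfaceCount) {
        if (interfaceCount < 0) {
            throw new IllegalArgumentException("interfaceCount must be >= 0");
        }
        String e = extendsClass ? "E1" : "E0";
        String i = interfaceCount == 0 ? "I0" : interfaceCount == 1 ? "I1" : "In";
        return e + i;
    }

    /** Derives the category from a loaded class via reflection. */
    static String of(Class<?> type) {
        Class<?> superclass = type.getSuperclass();
        // Every class implicitly extends Object, so only an explicit superclass counts.
        boolean extendsClass = superclass != null && superclass != Object.class;
        // getInterfaces() returns only the interfaces this class declares directly.
        return of(extendsClass, type.getInterfaces().length);
    }

    public static void main(String[] args) {
        Class<?>[] examples = {
            Object.class, java.util.AbstractCollection.class,
            java.util.AbstractSet.class, java.util.HashSet.class
        };
        for (Class<?> c : examples) {
            System.out.println(c.getSimpleName() + " -> " + of(c));
        }
    }
}
```

Working from bytecode keeps the sketch short; a source-level analysis would be needed for classes that cannot be loaded.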



Part-of-Speech Patterns
                Relative frequency of most common PoS patterns

             Category   noun+   adjective+ noun+   noun+ adjective+ noun+   verb+ noun+
             E0I0       0.85    0.08               0.01                     0.01
             E0I1       0.73    0.15               0.02                     0.02
             E0In       0.75    0.15               0.03                     0.01
             E1I0       0.68    0.12               0.04                     0.03
             E1I1       0.70    0.15               0.04                     0.02
             E1In       0.75    0.14               0.04                     0.02

       Four basic patterns account for 90% of class identifier names
       85% of E0I0 class identifier names consist only of nouns
       The adjective* noun+ approximation covers 85% of class
       identifier names
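The patterns in the table are sequences of part-of-speech tags over the component words of a name. A minimal Java sketch, assuming the tags are already known (they are supplied by hand here as hypothetical one-letter codes; a real pipeline would need a PoS tagger): it splits an identifier at case boundaries and reports which of the four common patterns the tag sequence matches.

```java
import java.util.ArrayList;
import java.util.List;

/**
 * Hypothetical sketch: camel-case splitting plus matching of a hand-supplied
 * tag sequence (N noun, A adjective, V verb, D determiner) against the four
 * common PoS patterns from the table.
 */
final class PosPatterns {

    private PosPatterns() {}

    // Pattern name paired with an equivalent regular expression over tag codes.
    private static final String[][] PATTERNS = {
        {"noun+", "N+"},
        {"adjective+ noun+", "A+N+"},
        {"noun+ adjective+ noun+", "N+A+N+"},
        {"verb+ noun+", "V+N+"},
    };

    /** Splits an identifier such as "HTMLParserFactory" into its component words. */
    static List<String> split(String identifier) {
        List<String> words = new ArrayList<>();
        // Break before an upper-case letter that follows a lower-case letter or digit,
        // and inside an acronym run before its last capital when a word follows.
        for (String w : identifier.split("(?<=[a-z0-9])(?=[A-Z])|(?<=[A-Z])(?=[A-Z][a-z])")) {
            if (!w.isEmpty()) {
                words.add(w);
            }
        }
        return words;
    }

    /** Returns the first common pattern the whole tag sequence matches, or "other". */
    static String classify(String tagCodes) {
        for (String[] p : PATTERNS) {
            if (tagCodes.matches(p[1])) {
                return p[0];
            }
        }
        return "other";
    }

    public static void main(String[] args) {
        System.out.println(split("AbstractSet") + " " + classify("AN"));
        System.out.println(split("SelectAllAction") + " " + classify("VDN"));
    }
}
```

For example, `AbstractSet` splits into `Abstract` and `Set`; tagged `AN`, it falls under adjective+ noun+. The camel-case split is a simplification and will not handle every identifier style.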
Component Word Inheritance
              Relative frequency distribution of name inheritance

                          Superclass name     Interface name
             Category     All     Fragment    All     Fragment    Both
             E0I1         -       -           0.39    0.37        -
             E0In         -       -           0.38    0.40        -
             E1I0         0.23    0.58        -       -           -
             E1I1         0.14    0.53        0.24    0.21        0.27
             E1In         0.11    0.50        0.15    0.25        0.18


       Superclass name fragments are the most commonly repeated form of
       name inheritance
       Most common patterns:
               E0I1 & E0In: noun+ [interface name], noun+ [interface fragment]
               E1I0: noun+ [superclass fragment], noun+ [superclass name]
               E1I1 & E1In: noun+ [superclass fragment],
               [interface name] [superclass fragment], noun+ [superclass name]
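One possible reading of the All and Fragment columns, sketched in Java as an assumption rather than the study's exact definition: a class name inherits All of its parent's name when it contains the parent's component words as a contiguous run, and a Fragment when it shares at least one component word with it. The class and method names are hypothetical.

```java
import java.util.ArrayList;
import java.util.Collections;
import java.util.List;
import java.util.Locale;

/**
 * Hypothetical sketch of name inheritance between a class and its parent
 * (superclass or interface): ALL, FRAGMENT or NONE.
 */
final class NameInheritance {

    enum Kind { ALL, FRAGMENT, NONE }

    private NameInheritance() {}

    /** Splits a camel-case identifier into lower-case component words. */
    static List<String> words(String identifier) {
        List<String> result = new ArrayList<>();
        for (String w : identifier.split("(?<=[a-z0-9])(?=[A-Z])|(?<=[A-Z])(?=[A-Z][a-z])")) {
            if (!w.isEmpty()) {
                result.add(w.toLowerCase(Locale.ROOT));
            }
        }
        return result;
    }

    /** Classifies how much of the parent's name is repeated in the class name. */
    static Kind classify(String className, String parentName) {
        List<String> child = words(className);
        List<String> parent = words(parentName);
        if (parent.isEmpty()) {
            return Kind.NONE;
        }
        // The whole parent name appears as a contiguous run of component words.
        if (Collections.indexOfSubList(child, parent) >= 0) {
            return Kind.ALL;
        }
        // At least one component word is shared.
        if (!Collections.disjoint(child, parent)) {
            return Kind.FRAGMENT;
        }
        return Kind.NONE;
    }

    public static void main(String[] args) {
        System.out.println(classify("LinkedHashSet", "HashSet"));
        System.out.println(classify("TreeSet", "AbstractSet"));
        System.out.println(classify("Vector", "AbstractList"));
    }
}
```

Comparing word lists rather than raw substrings avoids false matches such as `ArrayListener` appearing to contain `ArrayList`.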
Case Study: FreeMind

       652 class identifier names
       53 (8%) with uncommon PoS patterns
       Each class inspected with questions:
          1. Is the class identifier name a clear description of the class?
          2. Can the class identifier name be refactored to a more common PoS
             pattern?
          3. Can the class be refactored into classes that could be more
             conventionally named?
       We found:
               Class identifier names describing GUI actions initiated by the user, e.g.
               SelectAllAction (verb determiner noun)
               Class identifier names that conform to local naming conventions
               7 class identifier names were candidates for renaming
               1 class was a candidate for refactoring into more conventionally named classes
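The screening step of the case study (selecting names with uncommon PoS patterns before manual inspection) can be sketched as follows. This is a hypothetical Java helper, assuming each name already has hand-supplied tag codes (N noun, A adjective, V verb, D determiner); it flags names that match none of the four common patterns and computes the flagged share, e.g. 53 of 652 names is about 8%.

```java
import java.util.ArrayList;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

/**
 * Hypothetical sketch: flag class names whose tag sequence matches none of
 * the four common PoS patterns, for later manual inspection.
 */
final class UncommonNameFilter {

    // noun+ | adjective+ noun+ | noun+ adjective+ noun+ | verb+ noun+ over tag codes.
    private static final String COMMON = "N+|A+N+|N+A+N+|V+N+";

    private UncommonNameFilter() {}

    /** Returns the names whose whole tag sequence matches no common pattern. */
    static List<String> flag(Map<String, String> tagsByName) {
        List<String> flagged = new ArrayList<>();
        for (Map.Entry<String, String> e : tagsByName.entrySet()) {
            if (!e.getValue().matches(COMMON)) {
                flagged.add(e.getKey());
            }
        }
        return flagged;
    }

    /** Fraction of flagged names; 0 when there are no names at all. */
    static double share(int flagged, int total) {
        return total == 0 ? 0.0 : (double) flagged / total;
    }

    public static void main(String[] args) {
        Map<String, String> tags = new LinkedHashMap<>();
        tags.put("SelectAllAction", "VDN");
        tags.put("AbstractSet", "AN");
        tags.put("HashSet", "NN");
        System.out.println(flag(tags));
        // The case-study figures: 53 of 652 names, roughly 8%.
        System.out.printf("%.1f%%%n", 100 * share(53, 652));
    }
}
```

Flagging only narrows the set; as in the case study, each flagged name still needs a human judgement on whether it is unclear, follows a local convention, or signals a class worth refactoring.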



Conclusions


       Contributions
               Identification of common PoS structures found in practice
               Identification of common patterns of component word repetition
               Unconventional class names:
                       may conform to local naming conventions
                       may be candidates for refactoring
                       may indicate smells

       Practical Applications
               Recovery of class naming conventions
               Identification of unconventionally named classes
               Class identifier name recommendation systems



