SlideShare a Scribd company logo
1 of 76
Download to read offline
Aspect Mining
  an emerging research domain




                     Kim Mens
               http://www.info.ucl.ac.be/~km
Special thanks to Andy Kellens, Paolo Tonella & Kris Gybels
                              1
Overview
‣ What is aspect mining and why do we need it ?
‣ Different kinds of mining approaches :
    1. Early aspects
    2. Advanced Browsers
    3. Source-code mining
‣ Overview of automated approaches
‣ Comparison of automated code mining techniques
‣ Conclusion: limitations of aspect mining
                            2
Overview
‣ What is aspect mining and why do we need it ?
‣ Different kinds of mining approaches :
    1. Early aspects
    2. Advanced Browsers
    3. Source-code mining
‣ Overview of automated approaches
‣ Comparison of automated code mining techniques
‣ Conclusion: limitations of aspect mining
                            2
Aspects in existing
         systems?

‣ Aspects offer
    - Better modularisation and “separation of concerns”
    - Improved “ilities”
‣ Also useful for already existing systems
    - But how to migrate a legacy system to AOSD?


                            3
Migrating a System




        4
Migrating a System




Aspect Mining


                4
Migrating a System




 Aspect Mining

Aspect Refactoring
                     4
Why do we need aspect
       mining?
‣ Legacy systems
   - large, complex systems
   - not always clearly documented
   - need to find the crosscutting concerns (what?)
   - need to find the extent of the crosscutting
     concerns (where?)
   - program understanding/documentation
                          5
Why do we need aspect
       mining?
‣ Legacy systems
   - large, complex systems
   - not always clearly documented
   - need to find the crosscutting concerns (what?)
   - need to find the extent of the crosscutting
     concerns (where?)
   - program understanding/documentation
                          5
Overview
‣ What is aspect mining and why do we need it ?
‣ Different kinds of mining approaches :
    1. Early aspects
    2. Advanced Browsers
    3. Source-code mining
‣ Overview of automated approaches
‣ Comparison of automated code mining techniques
‣ Conclusion: limitations of aspect mining
                            6
Different Kinds of
     Mining Approaches

1.“Early aspect” discovery techniques
2.Advanced special-purpose browsers
3.(Semi-)automated code-level mining techniques




                           7
Early Aspects

‣ Try to find the aspects in artifacts of the early phases
  of the software life-cycle :
    - requirements
    - design
‣ Less useful when trying to identify aspects in legacy
  code

                            8
Different Kinds of
     Mining Approaches

1.“Early aspect” discovery techniques
2.Advanced special-purpose browsers
3.(Semi-)automated code-level mining techniques




                           9
Special-Purpose
        Code Browsers
‣ Idea: help a developer in exploring a concern
‣ By browsing / navigating the source code
    - User provides interesting “seed” in the code
    - Tool suggests interesting related points to consider
‣ Iteratively, build up a model of the crosscutting concerns
‣ Examples:
    - FEAT, Aspect Browser, (Extended) Aspect Mining
      Tool, Prism, JQuery, Soul
                            10
Special-Purpose
        Code Browsers
‣ Idea: help a developer in exploring a concern
‣ By browsing / navigating the source code
    - User provides interesting “seed” in the code
    - Tool suggests interesting related points to consider
‣ Iteratively, build up a model of the crosscutting concerns
‣ Examples:
    - FEAT, Aspect Browser, (Extended) Aspect Mining
      Tool, Prism, JQuery, Soul
                            10
FEAT
‣ Create a “Concern Graph”, a representation of the
  concern mapping back to the source code.
‣ Consists out of a set of program entities
‣ Start out with a number of program entities (e.g. “all
  callers of the method print()”)
‣ Iteratively explore the relations and refine the
  concern graph
    - fan-in (callers, ...)
    - fan-out (callees, ...)
                               11
FEAT




 12
Special-Purpose
        Code Browsers

‣ Disadvantages :
   - Require quite some manual effort (does it scale?)
        ➡ Try to automate this process
   - Requires some preliminary knowledge about
     system (need for seeds)


                          13
Different Kinds of
     Mining Approaches

1.“Early aspect” discovery techniques
2.Advanced special-purpose browsers
3.(Semi-)automated code-level mining techniques




                          14
Automated Approaches

‣ Idea: Semi-automatically find possible aspect
  candidates in source code of a system
‣ No magic:
    - Manual filtering of the results
    - False positives/negatives
    - May require preliminary knowledge
‣ Possibility to use results as seed for browsers
                             15
Automated Approaches
‣ Hidden assumption of all techniques :
   - look for symptoms of scattering and tangling
   - either static (code) or dynamic (run-time)
‣ Discover these symptoms using a variety of data
  analysis techniques
   - inspired by data mining, code analysis, ...
   - but specifically (re)designed for aspect mining

                           16
Overview
‣ What is aspect mining and why do we need it ?
‣ Different kinds of mining approaches :
    1. Early aspects
    2. Advanced Browsers
    3. Source-code mining
‣ Overview of automated approaches
‣ Comparison of automated code mining techniques
‣ Conclusion: limitations of aspect mining
                            17
Some Automated
              Approaches
‣ Pattern matching
  •    Analysing recurring patterns of execution traces
  •    Clone detection (token/PDG/AST-based)
‣ Formal concept analysis
  ‣    Execution traces / identifiers
‣ Natural language processing on code
‣ AOP idioms
  ‣    Fan-in analysis / Detecting unique methods
‣ Cluster analysis
  ‣    method names / method invocations

‣ ... and many more ...
                               18
Some Automated
              Approaches
‣ Pattern matching
  •    Analysing recurring patterns of execution traces
  •    Clone detection (token/PDG/AST-based)
‣ Formal concept analysis
  ‣    Execution traces / identifiers
‣ Natural language processing on code
‣ AOP idioms
  ‣    Fan-in analysis / Detecting unique methods
‣ Cluster analysis
  ‣    method names / method invocations

‣ ... and many more ...
                               18
Recurring Patterns
              [Breu&Krinke]

‣ Assumption: if certain method calls always happen
  before/after the same call, they might be part of an
  aspect
‣ “Reverse engineering” of manual implementation of
  before/after advice
‣ Technique: find pairs of methods which happen right
  before/after each other and check wether this pattern
  recurs in the code


                           19
Breu and Krinke propose an aspect mining technique n
                            (Dynamic Aspect Mining Tool) [13], which analyses pr
                            flecting the run-time behaviour of a system in search of
                   Recurring Patterns
                            tion patterns. To do so, they introduce the notion of ex
                     [Breu&Krinke]
                            between method invocations. Consider the following exa
                            trace, where the capitals represent method names:
                            B() {
                                     C() {
                                             G() {}
                                             H() {}
                                         }
                                }
                            A() {}
                           Breu and Krinke distinguish between four different exe
    Inherently dynamic (analysis of execution is called but: A), outside-after
                           outside-before (e.g., B trace), before
‣
        static : use control-flowlast call in C).(e.g., Gthese execution in C) and in
                           after B), inside-first
                                  graphs to calculateis the first call relations, th
                                                         calling relations
      -                    is the                 Using
      - hybrid : augmentrithm discovers aspect candidates based on recurring pa
                            dynamic information with static type
        information to improve results an execution relation occurs more than
                           invocations. If
                           uniformly (for instance, every invocation of method B i
    CCC needs to be rigorously implemented (calls always in same an aspe
‣                          invocation of method A), it is considered to be
    order)                 ensure that the aspect candidates are sufficiently cross-
                                       20
                           an extra requirement that the recurring relations shou
Some Automated
              Approaches
‣ Pattern matching
  •    Analysing recurring patterns of execution traces
  •    Clone detection (token/PDG/AST-based)
‣ Formal concept analysis
  ‣    Execution traces / identifiers
‣ Natural language processing on code
‣ AOP idioms
  ‣    Fan-in analysis / Detecting unique methods
‣ Cluster analysis
  ‣    method names / method invocations

‣ ... and many more ...
                               21
Some Automated
              Approaches
‣ Pattern matching
  •    Analysing recurring patterns of execution traces
  •    Clone detection (token/PDG/AST-based)
‣ Formal concept analysis
  ‣    Execution traces / identifiers
‣ Natural language processing on code
‣ AOP idioms
  ‣    Fan-in analysis / Detecting unique methods
‣ Cluster analysis
  ‣    method names / method invocations

‣ ... and many more ...
                               21
Clone Detection
           [Bruntink, Shepherd]

‣ Assumption: crosscutting concerns result in code
  duplication
‣ Certain CCCs are implemented using a recurring
  pattern, idiom
‣ Technique: find classes of related code
   - exact same
   - same structure; different literals

                            22
only the lines marked with ‘C’ are included in the clones.
   a cer-      CCFinder allows clones to start and end with little regard to
                             Clone Detection
 e clone       syntactic units. In contrast, Bauhaus’ ccdiml does not allow
  ce pre-
                       [Bruntink, Shepherd]
               this, due to its AST-based clone detection algorithm.
   differ-
 n order          M   C if (r != OK)
 , preci-         M   C{
                  M   C   ERXA_LOG(r, 0, (quot;PLXAmem_malloc failure.quot;));
re these          M   C
                  M   C   ERXA_LOG(VSXA_MEMORY_ERR, r,
                  M   C          (quot;%s: failed to allocated %d bytes.quot;,
etectors          M              func_name, toread));
overage           M
                  M         r = VSXA_MEMORY_ERR;
fraction          M     }
the first
                Figure 3. CCFinder clone covering memory error
thm de-
             source code analysis
       ‣        handling.
 in Fig-
 Finder,
        ‣ applicable if the CCCs are implemented as a pattern
                  Furthermore this clone class does not cover memory er-
 by the        ror handling code exclusively. In Figure 2(d), note that the
ollows:        precision obtained for the 23 clone class is roughly 82%.
                                          first
               Through inspection of the code we found that some of the
Some Automated
              Approaches
‣ Pattern matching
  •    Analysing recurring patterns of execution traces
  •    Clone detection (token/PDG/AST-based)
‣ Formal concept analysis
  ‣    Execution traces / identifiers
‣ Natural language processing on code
‣ AOP idioms
  ‣    Fan-in analysis / Detecting unique methods
‣ Cluster analysis
  ‣    method names / method invocations

‣ ... and many more ...
                               24
Some Automated
              Approaches
‣ Pattern matching
  •    Analysing recurring patterns of execution traces
  •    Clone detection (token/PDG/AST-based)
‣ Formal concept analysis
  ‣    Execution traces / identifiers
‣ Natural language processing on code
‣ AOP idioms
  ‣    Fan-in analysis / Detecting unique methods
‣ Cluster analysis
  ‣    method names / method invocations

‣ ... and many more ...
                               24
Formal Concept Analysis
•       Starts from
    •    a set of elements
    •    a set of properties of those elements

•       Determines concepts
    •    Maximal groups of elements and properties
    •    Group:
         •   Every element of the concept has those properties
         •   Every property of the concept holds for those elements
    •    Maximal
         •   No other element (outside the concept) has those same properties
         •   No other property (outside the concept) is shared by all elements
                                          25
Formal Concept Analysis

            object-                          static   dynamic
                       functional    logic
            oriented                         typing    typing

  C++          X           -           -       X         -

  Java         X           -           -       X         -


Smalltalk      X           -           -       -        X


Scheme         -           X           -       -        X

 Prolog        -           -          X        -        X




                                26
Formal Concept Analysis

            object-                          static   dynamic
                       functional    logic
            oriented                         typing    typing

  C++          X           -           -       X         -

  Java         X           -           -       X         -


Smalltalk      X           -           -       -        X


Scheme         -           X           -       -        X

 Prolog        -           -          X        -        X




                                26
Formal Concept Analysis

            object-                          static   dynamic
                       functional    logic
            oriented                         typing    typing

  C++          X           -           -       X         -

  Java         X           -           -       X         -


Smalltalk      X           -           -       -        X


Scheme         -           X           -       -        X

 Prolog        -           -          X        -        X




                                26
Formal Concept Analysis

            object-                          static   dynamic
                       functional    logic
            oriented                         typing    typing

  C++          X           -           -       X         -

  Java         X           -           -       X         -


Smalltalk      X           -           -       -        X


Scheme         -           X           -       -        X

 Prolog        -           -          X        -        X




                                26
Formal Concept Analysis




           27
Identifier Analysis (Mens &
               Tourwé)


‣ Assumption: methods belonging to the same concern
  are implemented using similar names
‣ Eg. methods implementing Undo concern will have
  “undo” in their signature
‣ Technique: Use Formal Concept Analysis with the
  methods as objects and the different substrings as
  properties


                           28
Identifier Analysis
                                figu    drawi   reque   remo   upda   chan   eve
                                                                                 …
                                 re      ng       st     ve     te     ge     nt
        drawingRequestUpdate
                                 -          X    X       -      X      -     -   …
       (DrawingChangeEvent e)
         figureRequestRemove
                                 X          -    X       X      -      -     -   …
        (FigureChangeEvent e)
         figureRequestUpdate
                                 X          -    X       -      X      -     -   …
        (FigureChangeEvent e)
         figureRequestRemove
                                 X          -    X       X      -      -     -   …
        (FigureChangeEvent e)
         figureRequestUpdate
                                 X          -    X       -      X      -     -   …
        (FigureChangeEvent e)

                 …              …           …    X       …      …      …     …   …



‣ Lots of concepts: filter irrelevant ones
‣ Group “similar” concepts
                                       29
Execution traces (ceccato &
               tonella)
‣ Assumption: use case scenario’s are a good indicator
  for crosscutting concerns
‣ Technique: Analyse execution trace using FCA with
  use cases as objects; methods called from within use
  case as attributes
‣ Concept specific to one use case is possible aspect if:
    - Methods belong to more than one class
      (scattering)
    - Methods of same class occur in multiple use cases
      (tangling)
                            30
Some Automated
              Approaches
‣ Pattern matching
  •    Analysing recurring patterns of execution traces
  •    Clone detection (token/PDG/AST-based)
‣ Formal concept analysis
  ‣    Execution traces / identifiers
‣ Natural language processing on code
‣ AOP idioms
  ‣    Fan-in analysis / Detecting unique methods
‣ Cluster analysis
  ‣    method names / method invocations

‣ ... and many more ...
                               31
Fan-in Analysis (Marin)


‣ Assumption: The fan-in metric is a good indicator of
  scattering
‣ “Methods which get called often from different
  contexts are possible crosscutting concerns”
‣ Technique: calculate the fan-in of each method, filter
  auxiliary methods/accessors and sort the resulting
  methods


                           32
Unique Methods (Gybels & kellens)



‣ Assumption: Some CCCs are implemented by calling
  a single entity from all over the code
‣ E.g. Logging
‣ Technique: devise a metric (unique method) which
  finds such crosscutting concerns



                          33
Unique Methods (Gybels & kellens)
                                   Logging
                                   log()       ClassB




         Unique Method
 A method without a return value
  which implements a message       ClassA    ClassC
implemented by no other method.

   Q: Why no return type?
   A: Use of value in base
        computation




                                   34
Unique Methods (Gybels & kellens)

            Class                          Selector              Calls
           Parcel                       #markAsDirty              23
     ParagraphEditor                     #resetTypeIn             19
         UIPainter            #broadCastPendingSelectionChange    18
ControllerCodeRegenerator                  #pushPC                15
   AbstractChangeList                  #updateSelection:          15
       PundleModel                     #updateAfterDo:            10
    ???????????????????????        #beGarbageCollectable          ??



                                  35
Some Automated
              Approaches
‣ Pattern matching
  •    Analysing recurring patterns of execution traces
  •    Clone detection (token/PDG/AST-based)
‣ Formal concept analysis
  ‣    Execution traces / identifiers
‣ Natural language processing on code
‣ AOP idioms
  ‣    Fan-in analysis / Detecting unique methods
‣ Cluster analysis
  ‣    method names / method invocations

‣ ... and many more ...
                               36
Some Automated
              Approaches
‣ Pattern matching
  •    Analysing recurring patterns of execution traces
  •    Clone detection (token/PDG/AST-based)
‣ Formal concept analysis
  ‣    Execution traces / identifiers
‣ Natural language processing on code
‣ AOP idioms
  ‣    Fan-in analysis / Detecting unique methods
‣ Cluster analysis
  ‣    method names / method invocations

‣ ... and many more ...
                               36
Cluster Analysis


‣ Input:
   - a set of objects
   - a distance measure between those objects
‣ Output:
   - groups of objects which are close (according to
     the distance function)


                          37
Aspect Mining Using Cluster
       Analysis (He, Shepherd)

‣ Distance function between two methods
‣ Call Clustering:
   - distance in function of times they occur together
     in same method (cfr. recurring execution traces)
‣ Method Clustering:
   - distance in function of commonalities in name of
     methods (cfr. identifier analysis)


                          38
Some Automated
              Approaches
‣ Pattern matching
  •    Analysing recurring patterns of execution traces
  •    Clone detection (token/PDG/AST-based)
‣ Formal concept analysis
  ‣    Execution traces / identifiers
‣ Natural language processing on code
‣ AOP idioms
  ‣    Fan-in analysis / Detecting unique methods
‣ Cluster analysis
  ‣    method names / method invocations

‣ ... and many more ...
                               39
Some Automated
              Approaches
‣ Pattern matching
  •    Analysing recurring patterns of execution traces
  •    Clone detection (token/PDG/AST-based)
‣ Formal concept analysis
  ‣    Execution traces / identifiers
‣ Natural language processing on code
‣ AOP idioms
  ‣    Fan-in analysis / Detecting unique methods
‣ Cluster analysis
  ‣    method names / method invocations

‣ ... and many more ...
                               39
Overview
‣ What is aspect mining and why do we need it ?
‣ Different kinds of mining approaches :
    1. Early aspects
    2. Advanced Browsers
    3. Source-code mining
‣ Overview of automated approaches
‣ Comparison of automated code mining techniques
‣ Conclusion: limitations of aspect mining
                            40
For each of the studied techniques, Table 2 shows the kind of data (static
or dynamic) analysed by that technique, as well as the kind of analysis
                    Comparison
performed (token-based or structural/behavioural).


                       Kind of input data         Kind of analysis
                       static  dynamic    token-based structural/behavioural
  Execution patterns     X        X            -                 X
   Dynamic analysis      -        X            -                 X
   Identifier analysis    X        -            X                 -
    Language clues       X        -            X                 -
    Unique methods       X        -            -                 X
  Method clustering      X        -            X                 -
     Call clustering     X        -            -                 X
    Fan-in analysis      X        -            -                 X
Clone detection (PDG) X           -            -                 X
Clone detection (token) X         -            X                 -
Clone detection (AST)    X        -            -                 X
  Table 2. Kind of input data and kind of analysis of each technique

            Kind of input data and kind of analysis
Most techniques work on statically available data. ‘Dynamic analysis’
reasons about execution traces and thus requires executability of the
                                 41
code under analysis. Only ‘Execution patterns’ works with both kinds
For each of the studied techniques, Table 2 shows the kind of data (static
or dynamic) analysed by that technique, as well as the kind of analysis
                    Comparison
performed (token-based or structural/behavioural).


                       Kind of input data         Kind of analysis
                       static  dynamic    token-based structural/behavioural
  Execution patterns     X        X            -                 X
   Dynamic analysis      -        X            -                 X
   Identifier analysis    X        -            X                 -

                              few ic
    Language clues       X        -            X                 -
    Unique methods       X        -            -                 X
                                nam
  Method clustering      X        -            X                 -
                             dy
     Call clustering     X        -            -                 X
    Fan-in analysis      X        -            -                 X
Clone detection (PDG) X           -            -                 X
Clone detection (token) X         -            X                 -
Clone detection (AST)    X        -            -                 X
  Table 2. Kind of input data and kind of analysis of each technique

            Kind of input data and kind of analysis
Most techniques work on statically available data. ‘Dynamic analysis’
reasons about execution traces and thus requires executability of the
                                 41
code under analysis. Only ‘Execution patterns’ works with both kinds
For each of the studied techniques, Table 2 shows the kind of data (static
or dynamic) analysed by that technique, as well as the kind of analysis
                    Comparison
performed (token-based or structural/behavioural).


                       Kind of input data         Kind of analysis
                       static  dynamic    token-based structural/behavioural
                                                                   X ok
                                                                   o
                                                                s lX
  Execution patterns     X        X            -
                                                              e
                                                            qu -
   Dynamic analysis      -        X            -
                                                          ni ns -
   Identifier analysis    X        -            X
                                                     ech oke ore
                              few ic                tt
    Language clues       X        -            X
                                               - me      t at m info
    Unique methods       X        -                                X
                                              SX nly a ok av.
                                nam
                                               o
  Method clustering      X        -                                -
                             dy                -o
                                                      e lo ./behX
     Call clustering     X        -
                                                  om uct
    Fan-in analysis      X        -            -                   X
                                               -S     tr
Clone detection (PDG) X           -                                X
                                                    s
Clone detection (token) X         -            X                   -
Clone detection (AST)    X        -            -                   X
  Table 2. Kind of input data and kind of analysis of each technique

            Kind of input data and kind of analysis
Most techniques work on statically available data. ‘Dynamic analysis’
reasons about execution traces and thus requires executability of the
                                 41
code under analysis. Only ‘Execution patterns’ works with both kinds
Comparison
                             Granularity        Symptoms
                        method code fragment scattering tangling
   Execution patterns     X           -          X         -
   Dynamic analysis       X           -          X         X
   Identifier analysis     X           -          X         -
    Language clues        X           -          X         -
    Unique methods        X           -          X         -
   Method clustering      X           -          X         -
     Call clustering      X           -          X         -
    Fan-in analysis       X           -          X         -
    Clone detection        -         X           X         -
ble 3. Granularity of and symptoms looked for by each techn
 Granularity of and symptoms looked for by each technique
                                 42
e 3 summarises the finest level of granularity (methods or code
Comparison
                             Granularity        Symptoms
                        method code fragment scattering tangling
   Execution patterns     X             -        X         -
   Dynamic analysis       X             -        X         X
   Identifier analysis     X             -        X         -
                                       wl
                                    elo - ve
    Language clues        X                      X         -
                                   b l-e
                          X few d-
    Unique methods                               X         -
                                  tho -
   Method clustering      X                      X         -
                                e
                          Xm
     Call clustering                    -        X         -
    Fan-in analysis       X             -        X         -
    Clone detection        -           X         X         -
ble 3. Granularity of and symptoms looked for by each techn
 Granularity of and symptoms looked for by each technique
                                 42
e 3 summarises the finest level of granularity (methods or code
Comparison
                             Granularity        Symptoms
                        method code fragment scattering tangling
   Execution patterns     X            -         X         -
   Dynamic analysis       X            -         X         X
   Identifier analysis     X            -         X         -
                                      wl
                                   elo - ve           ew ng
    Language clues        X                      X         -
                                  b l-e          X f li -
                          X few d-
    Unique methods
                                                 X tang -
                                 tho -
   Method clustering      X
                              me
     Call clustering      X            -         X         -
    Fan-in analysis       X            -         X         -
    Clone detection        -          X          X         -
ble 3. Granularity of and symptoms looked for by each techn
 Granularity of and symptoms looked for by each technique
                                 42
e 3 summarises the finest level of granularity (methods or code
both scattering and tangling into account, by requiring that the methods
   which occur in a single use-case scenario are implemented in multiple
   classes (scattering), but also that these methods occur in multiple use-
                      Comparison
   cases and thus are tangled with other concerns of the system.


        Technique             Largest case           Size case         Empirically
                                                                        validated
     Execution patterns        Graffiti         3100 methods/82KLOC            -
      Dynamic analysis        JHotDraw        2800 methods/18KLOC            -
     Identifier analysis       JHotDraw        2800 methods/18KLOC            -
       Language clues          PetStore               10KLOC                 -
      Unique methods        Smalltalk image 3400 classes/66000 methods       -
     Method clustering        JHotDraw        2800 methods/18KLOC            -
       Call clustering     Banking example          12 methods               -
       Fan-in analysis        JHotDraw        2800 methods/18KLOC            -
                            TomCat 5.5 API           172KLOC                 -
   Clone detection (PDG)       TomCat                 38KLOC                 -
Clone detection (AST/token) ASML C-Code               20KLOC                X
         Table 4. An assessment of the validation of each technique

     An assessment of the validation of each technique
   To provide more insights into the 43
                                      validation of the techniques, Table 4
   mentions the largest case on which each technique has been validated,
both scattering and tangling into account, by requiring that the methods
   which occur in a single use-case scenario are implemented in multiple
   classes (scattering), but also that these methods occur in multiple use-
                      Comparison
   cases and thus are tangled with other concerns of the system.


        Technique             Largest case           Size case         Empirically
                                                                        validated
     Execution patterns        Graffiti         3100 methods/82KLOC            -
      Dynamic analysis        JHotDraw        2800 methods/18KLOC            -
     Identifier analysis       JHotDraw        2800 methods/18KLOC            -
                                  on ?
       Language clues          PetStore               10KLOC                 -
                               mm ark
      Unique methods        Smalltalk image 3400 classes/66000 methods       -
                            co hm
     Method clustering        JHotDraw        2800 methods/18KLOC            -

                              enc
       Call clustering     Banking example          12 methods               -
                            b
       Fan-in analysis        JHotDraw        2800 methods/18KLOC            -
                            TomCat 5.5 API           172KLOC                 -
   Clone detection (PDG)       TomCat                 38KLOC                 -
Clone detection (AST/token) ASML C-Code               20KLOC                X
         Table 4. An assessment of the validation of each technique

     An assessment of the validation of each technique
   To provide more insights into the 43
                                      validation of the techniques, Table 4
   mentions the largest case on which each technique has been validated,
both scattering and tangling into account, by requiring that the methods
   which occur in a single use-case scenario are implemented in multiple
   classes (scattering), but also that these methods occur in multiple use-
                      Comparison
   cases and thus are tangled with other concerns of the system.


        Technique             Largest case           Size case         Empirically
                                                                        validated
     Execution patterns        Graffiti         3100 methods/82KLOC            -
      Dynamic analysis        JHotDraw        2800 methods/18KLOC            -
     Identifier analysis       JHotDraw        2800 methods/18KLOC            -
                                                       toy es
                                  on ?
       Language clues          PetStore               10KLOC                 -
                               mm ark              no pl
      Unique methods        Smalltalk image 3400 classes/66000 methods       -
                            co hm                     am
     Method clustering        JHotDraw        2800 methods/18KLOC            -
                                                   ex
                              enc
       Call clustering     Banking example          12 methods               -
                            b
       Fan-in analysis        JHotDraw        2800 methods/18KLOC            -
                            TomCat 5.5 API           172KLOC                 -
   Clone detection (PDG)       TomCat                 38KLOC                 -
Clone detection (AST/token) ASML C-Code               20KLOC                X
         Table 4. An assessment of the validation of each technique

     An assessment of the validation of each technique
   To provide more insights into the 43
                                      validation of the techniques, Table 4
   mentions the largest case on which each technique has been validated,
both scattering and tangling into account, by requiring that the methods
   which occur in a single use-case scenario are implemented in multiple
   classes (scattering), but also that these methods occur in multiple use-
                      Comparison
   cases and thus are tangled with other concerns of the system.


        Technique             Largest case           Size case         Empirically
                                                                        validated
     Execution patterns        Graffiti         3100 methods/82KLOC            -
      Dynamic analysis        JHotDraw        2800 methods/18KLOC            -

                                                                          ttle cal
     Identifier analysis       JHotDraw        2800 methods/18KLOC            -
                                                       toy es
                                  on ?                                  li i
       Language clues          PetStore               10KLOC                 -
                               mm ark              no pl                   pir tion
      Unique methods        Smalltalk image 3400 classes/66000 methods       -
                            co hm                                      em ida
                                                      am
     Method clustering        JHotDraw        2800 methods/18KLOC            -
                                                   ex
                              enc                                        val
       Call clustering     Banking example          12 methods               -
                            b
       Fan-in analysis        JHotDraw        2800 methods/18KLOC            -
                            TomCat 5.5 API           172KLOC                 -
   Clone detection (PDG)       TomCat                 38KLOC                 -
Clone detection (AST/token) ASML C-Code               20KLOC                X
         Table 4. An assessment of the validation of each technique

     An assessment of the validation of each technique
   To provide more insights into the 43
                                      validation of the techniques, Table 4
   mentions the largest case on which each technique has been validated,
that technique makes about how the crosscutting concerns are imple-
     mented. Table 5 summarises these assumptions in terms of preconditions
     that a system has to satisfy in order to find suitable aspect candidates
                    Comparison (4)
     with a given technique.


      Technique      Preconditions on crosscutting concerns in the analysed program
  Execution patterns Order of calls in context of crosscutting concern is always the same.
  Dynamic analysis At least one use case exists that exposes the crosscutting concern
                     and another one that does not.
  Identifier analysis Names of methods implementing the concern are alike.
   Language clues Context of concern contains keywords which are synonyms for the
                     crosscutting concern.
   Unique methods Concern is implemented by exactly one method.
  Method clustering Names of methods implementing the concern are alike.
    Call clustering  Concern is implemented by calls to same methods from different
                     modules.
   Fan-in analysis Concern is implemented in separate method which is called a high
                     number of times, or many methods implementing the concern call
                     the same method.
   Clone detection Concern is implemented by reusing a certain code fragment.
able 5. What conditions does the implementation of the concerns have to satisfy
  What conditions does the implementation of the concerns
der for a technique to find viable aspect candidates?
    have to satisfy in order to find viable aspect candidates?
                                       44
     ‘Identifier Analysis’, ‘Method Clustering’, ‘Language Clues’ and ‘Token-
that technique makes about how the crosscutting concerns are imple-
     mented. Table 5 summarises these assumptions in terms of preconditions
     that a system has to satisfy in order to find suitable aspect candidates
                    Comparison (4)
     with a given technique.


      Technique      Preconditions on crosscutting concerns in the analysed program
  Execution patterns Order of calls in context of crosscutting concern is always the same.
  Dynamic analysis At least one use case exists that exposes the crosscutting concern

                                                 ake ns
                     and another one that does not.

                                              s m ptio e
  Identifier analysis Names of methods implementing the concern are alike.

                                           ue m
   Language clues Context of concern contains keywords which are synonyms for the
                                                      od
                                        iq su
                     crosscutting concern.
                                    chn t as rce c
   Unique methods Concern is implemented by exactly one method.
                                 Te en sou
  Method clustering Names of methods implementing the concern are alike.

                                   ffer t the
    Call clustering  Concern is implemented by calls to same methods from different
                                 di u
                     modules.
                                    bo
   Fan-in analysis Concern is implemented in separate method which is called a high
                                  a
                     number of times, or many methods implementing the concern call
                     the same method.
   Clone detection Concern is implemented by reusing a certain code fragment.
able 5. What conditions does the implementation of the concerns have to satisfy
  What conditions does the implementation of the concerns
der for a technique to find viable aspect candidates?
    have to satisfy in order to find viable aspect candidates?
                                       44
     ‘Identifier Analysis’, ‘Method Clustering’, ‘Language Clues’ and ‘Token-
orously made use of naming conventions when implementing the cross-
   cutting concerns. ‘Execution Patterns’ and ‘Call Clustering’ assume that
   methods which often get called together from within different contexts
                         Comparison
   are candidate aspects. The fan-in technique assumes that crosscutting
   concerns are implemented by methods which are called many times (large
   footprint), or by methods calling such methods.


    Technique        User Involvement
Execution patterns   Inspection of the resulting “recurring patterns”.
Dynamic analysis     Selection of use cases and manual interpretation of results.
Identifier analysis   Browsing of mined aspects using IDE integration.
 Language clues      Manual interpretation of resulting lexical chains.
 Unique methods      Inspection of the unique methods; eased by sorting on importance.
Method clustering    Browsing of mined aspects using IDE integration.
  Call clustering    Manual inspection of resulting clusters.
 Fan-in analysis     Selection of candidates from list of methods, sorted on highest fan-in.
 Clone detection     Browsing and manual interpretation of the discovered clones.
Table 6. Which kind of user involvement do the different techniques require?


   Table 6 summarises the kind of involvement that is required from the
                 Which kind of user involvement
   user. None dothe existing techniques works fully automatic. All tech-
               of the different techniques require?
   niques require that their users browse through the resulting aspect can-
                                      45
   didates in order to find suitable aspects. Some require that the users
orously made use of naming conventions when implementing the cross-
   cutting concerns. ‘Execution Patterns’ and ‘Call Clustering’ assume that
   methods which often get called together from within different contexts
                         Comparison
   are candidate aspects. The fan-in technique assumes that crosscutting
   concerns are implemented by methods which are called many times (large
   footprint), or by methods calling such methods.


    Technique        User Involvement

                                                      till
Execution patterns   Inspection of the resulting “recurring patterns”.
                                                   ts
Dynamic analysis     Selection of use cases and manual interpretation of results.
                                          for
                                      l ef red
Identifier analysis   Browsing of mined aspects using IDE integration.

                                    ua ui
 Language clues      Manual interpretation of resulting lexical chains.

                                  an eq
 Unique methods      Inspection of the unique methods; eased by sorting on importance.
                                  Mr
Method clustering    Browsing of mined aspects using IDE integration.
  Call clustering    Manual inspection of resulting clusters.
 Fan-in analysis     Selection of candidates from list of methods, sorted on highest fan-in.
 Clone detection     Browsing and manual interpretation of the discovered clones.
Table 6. Which kind of user involvement do the different techniques require?


   Table 6 summarises the kind of involvement that is required from the
                 Which kind of user involvement
   user. None dothe existing techniques works fully automatic. All tech-
               of the different techniques require?
   niques require that their users browse through the resulting aspect can-
                                      45
   didates in order to find suitable aspects. Some require that the users
Taxonomy




                                    method
                                 fragments
                                                    AST-based
                                                  Clone Detection
                                                                                            Clone
                                                    PDG-based
                                                                                           Detection
                                                  Clone Detection

                                                   Token-based
                                                  Clone Detection


Structural /   Token-                                Method
Behavioral     Based                                Clustering

                                                     Identifier
                                                     Analysis
                                                    Language                               Clustering




                                                                  method-level
                                                      Clues

                                                     Unique
                                                     Methods

                                                                                           Concept
                                                      Fan-in
                                                                                           Analysis
                                                     Analysis

                                                       Call
                                                    Clustering
                                                    Execution
                                                     Patterns
                        Static
                                                     Dynamic
                                                     Analysis
                                                                                 Dynamic
                                             46
Overview
‣ What is aspect mining and why do we need it ?
‣ Different kinds of mining approaches :
    1. Early aspects
    2. Advanced Browsers
    3. Source-code mining
‣ Overview of automated approaches
‣ Comparison of automated code mining techniques
‣ Conclusion: limitations of aspect mining
                            47
Summary (1)

‣ Aspect Mining
   - Identification of crosscutting concerns in legacy
     systems
   - Aid developers when migrating an existing
     application to aspects
   - Browsing approaches and automated techniques


                          48
Summary(2)
‣ Different techniques are complementary
   - Different input (dynamic vs. static); granularity
   - Rely on different assumptions/symptoms
   - Make use of different analysis techniques
     • cluster analysis, FCA, heuristics, ...
‣ When mining a system, apply (a combination of)
  different techniques
                            49
Possible research
        directions
‣ Quantitative comparison of techniques:
   - Which aspects can be detected by which
     technique?
   - Common benchmark/analysis framework
‣ Cross-fertilization with other research domains
   - slicing, metrics, ...


                             50
Limitations

‣ There is no “silver bullet”
    - Still a lot of manual intervention required
    - False positives / false negatives still remain
    - Are we really mining for aspects?
      • or is it “joinpoint mining”?


                             51
Limitations
‣ Joinpoint mining?
    - how to obtain the pointcut definition?
    - how obtain the advice?
‣ Scalability
    - user involvement
‣ Validation
‣ Granularity
                          52
More reading...


‣ A. KELLENS, K. MENS & P. TONELLA. A Survey of Automated Code-
  Level Aspect Mining Techniques. Submitted to Transactions on Aspect-
  Oriented Software Development, Special Issue on Software Evolution. 2006.

‣ M. CECCATO, M. MARIN, K. MENS, L. MOONEN, P. TONELLA & T.
  TOURWE. Applying and Combining Three Different Aspect Mining
  Techniques. Software Quality Journal, 14(3) : 209–231. Springer, September
  2006.




                                     53
More reading...


‣ A. KELLENS, K. MENS & P. TONELLA. A Survey of Automated Code-
  Level Aspect Mining Techniques. Submitted to Transactions on Aspect-
  Oriented Software Development, Special Issue on Software Evolution. 2006.

‣ M. CECCATO, M. MARIN, K. MENS, L. MOONEN, P. TONELLA & T.
  TOURWE. Applying and Combining Three Different Aspect Mining
  Techniques. Software Quality Journal, 14(3) : 209–231. Springer, September
  2006.




                                     53

More Related Content

Similar to A Survey Of Aspect Mining Approaches

The Use of Static Code Analysis When Teaching or Developing Open-Source Software
The Use of Static Code Analysis When Teaching or Developing Open-Source SoftwareThe Use of Static Code Analysis When Teaching or Developing Open-Source Software
The Use of Static Code Analysis When Teaching or Developing Open-Source SoftwareAndrey Karpov
 
The Future of Automated Malware Generation
The Future of Automated Malware GenerationThe Future of Automated Malware Generation
The Future of Automated Malware GenerationStephan Chenette
 
GPCE16: Automatic Non-functional Testing of Code Generators Families
GPCE16: Automatic Non-functional Testing of Code Generators FamiliesGPCE16: Automatic Non-functional Testing of Code Generators Families
GPCE16: Automatic Non-functional Testing of Code Generators FamiliesMohamed BOUSSAA
 
Advanced Hyperparameter Optimization for Deep Learning with MLflow
Advanced Hyperparameter Optimization for Deep Learning with MLflowAdvanced Hyperparameter Optimization for Deep Learning with MLflow
Advanced Hyperparameter Optimization for Deep Learning with MLflowDatabricks
 
100 bugs in Open Source C/C++ projects
100 bugs in Open Source C/C++ projects 100 bugs in Open Source C/C++ projects
100 bugs in Open Source C/C++ projects Andrey Karpov
 
Pragmatic model checking: from theory to implementations
Pragmatic model checking: from theory to implementationsPragmatic model checking: from theory to implementations
Pragmatic model checking: from theory to implementationsUniversität Rostock
 
Code Analysis-run time error prediction
Code Analysis-run time error predictionCode Analysis-run time error prediction
Code Analysis-run time error predictionNIKHIL NAWATHE
 
Workshop NGS data analysis - 1
Workshop NGS data analysis - 1Workshop NGS data analysis - 1
Workshop NGS data analysis - 1Maté Ongenaert
 
Introducing Parameter Sensitivity to Dynamic Code-Clone Analysis Methods
Introducing Parameter Sensitivity to Dynamic Code-Clone Analysis MethodsIntroducing Parameter Sensitivity to Dynamic Code-Clone Analysis Methods
Introducing Parameter Sensitivity to Dynamic Code-Clone Analysis MethodsKamiya Toshihiro
 
The operation principles of PVS-Studio static code analyzer
The operation principles of PVS-Studio static code analyzerThe operation principles of PVS-Studio static code analyzer
The operation principles of PVS-Studio static code analyzerAndrey Karpov
 
Unifi
Unifi Unifi
Unifi hangal
 
Improving DroidBox
Improving DroidBoxImproving DroidBox
Improving DroidBoxKelwin Yang
 
Skiron - Experiments in CPU Design in D
Skiron - Experiments in CPU Design in DSkiron - Experiments in CPU Design in D
Skiron - Experiments in CPU Design in DMithun Hunsur
 
Code quality par Simone Civetta
Code quality par Simone CivettaCode quality par Simone Civetta
Code quality par Simone CivettaCocoaHeads France
 
Combining Phase Identification and Statistic Modeling for Automated Parallel ...
Combining Phase Identification and Statistic Modeling for Automated Parallel ...Combining Phase Identification and Statistic Modeling for Automated Parallel ...
Combining Phase Identification and Statistic Modeling for Automated Parallel ...Mingliang Liu
 
C++ CoreHard Autumn 2018. Debug C++ Without Running - Anastasia Kazakova
C++ CoreHard Autumn 2018. Debug C++ Without Running - Anastasia KazakovaC++ CoreHard Autumn 2018. Debug C++ Without Running - Anastasia Kazakova
C++ CoreHard Autumn 2018. Debug C++ Without Running - Anastasia Kazakovacorehard_by
 
B-Sides Seattle 2012 Offensive Defense
B-Sides Seattle 2012 Offensive DefenseB-Sides Seattle 2012 Offensive Defense
B-Sides Seattle 2012 Offensive DefenseStephan Chenette
 
The Use of Development History in Software Refactoring Using a Multi-Objectiv...
The Use of Development History in Software Refactoring Using a Multi-Objectiv...The Use of Development History in Software Refactoring Using a Multi-Objectiv...
The Use of Development History in Software Refactoring Using a Multi-Objectiv...Ali Ouni
 
Application of the Actor Model to Large Scale NDE Data Analysis
Application of the Actor Model to Large Scale NDE Data AnalysisApplication of the Actor Model to Large Scale NDE Data Analysis
Application of the Actor Model to Large Scale NDE Data AnalysisChrisCoughlin9
 
Sparksummit2016 share
Sparksummit2016 shareSparksummit2016 share
Sparksummit2016 sharePing Yan
 

Similar to A Survey Of Aspect Mining Approaches (20)

The Use of Static Code Analysis When Teaching or Developing Open-Source Software
The Use of Static Code Analysis When Teaching or Developing Open-Source SoftwareThe Use of Static Code Analysis When Teaching or Developing Open-Source Software
The Use of Static Code Analysis When Teaching or Developing Open-Source Software
 
The Future of Automated Malware Generation
The Future of Automated Malware GenerationThe Future of Automated Malware Generation
The Future of Automated Malware Generation
 
GPCE16: Automatic Non-functional Testing of Code Generators Families
GPCE16: Automatic Non-functional Testing of Code Generators FamiliesGPCE16: Automatic Non-functional Testing of Code Generators Families
GPCE16: Automatic Non-functional Testing of Code Generators Families
 
Advanced Hyperparameter Optimization for Deep Learning with MLflow
Advanced Hyperparameter Optimization for Deep Learning with MLflowAdvanced Hyperparameter Optimization for Deep Learning with MLflow
Advanced Hyperparameter Optimization for Deep Learning with MLflow
 
100 bugs in Open Source C/C++ projects
100 bugs in Open Source C/C++ projects 100 bugs in Open Source C/C++ projects
100 bugs in Open Source C/C++ projects
 
Pragmatic model checking: from theory to implementations
Pragmatic model checking: from theory to implementationsPragmatic model checking: from theory to implementations
Pragmatic model checking: from theory to implementations
 
Code Analysis-run time error prediction
Code Analysis-run time error predictionCode Analysis-run time error prediction
Code Analysis-run time error prediction
 
Workshop NGS data analysis - 1
Workshop NGS data analysis - 1Workshop NGS data analysis - 1
Workshop NGS data analysis - 1
 
Introducing Parameter Sensitivity to Dynamic Code-Clone Analysis Methods
Introducing Parameter Sensitivity to Dynamic Code-Clone Analysis MethodsIntroducing Parameter Sensitivity to Dynamic Code-Clone Analysis Methods
Introducing Parameter Sensitivity to Dynamic Code-Clone Analysis Methods
 
The operation principles of PVS-Studio static code analyzer
The operation principles of PVS-Studio static code analyzerThe operation principles of PVS-Studio static code analyzer
The operation principles of PVS-Studio static code analyzer
 
Unifi
Unifi Unifi
Unifi
 
Improving DroidBox
Improving DroidBoxImproving DroidBox
Improving DroidBox
 
Skiron - Experiments in CPU Design in D
Skiron - Experiments in CPU Design in DSkiron - Experiments in CPU Design in D
Skiron - Experiments in CPU Design in D
 
Code quality par Simone Civetta
Code quality par Simone CivettaCode quality par Simone Civetta
Code quality par Simone Civetta
 
Combining Phase Identification and Statistic Modeling for Automated Parallel ...
Combining Phase Identification and Statistic Modeling for Automated Parallel ...Combining Phase Identification and Statistic Modeling for Automated Parallel ...
Combining Phase Identification and Statistic Modeling for Automated Parallel ...
 
C++ CoreHard Autumn 2018. Debug C++ Without Running - Anastasia Kazakova
C++ CoreHard Autumn 2018. Debug C++ Without Running - Anastasia KazakovaC++ CoreHard Autumn 2018. Debug C++ Without Running - Anastasia Kazakova
C++ CoreHard Autumn 2018. Debug C++ Without Running - Anastasia Kazakova
 
B-Sides Seattle 2012 Offensive Defense
B-Sides Seattle 2012 Offensive DefenseB-Sides Seattle 2012 Offensive Defense
B-Sides Seattle 2012 Offensive Defense
 
The Use of Development History in Software Refactoring Using a Multi-Objectiv...
The Use of Development History in Software Refactoring Using a Multi-Objectiv...The Use of Development History in Software Refactoring Using a Multi-Objectiv...
The Use of Development History in Software Refactoring Using a Multi-Objectiv...
 
Application of the Actor Model to Large Scale NDE Data Analysis
Application of the Actor Model to Large Scale NDE Data AnalysisApplication of the Actor Model to Large Scale NDE Data Analysis
Application of the Actor Model to Large Scale NDE Data Analysis
 
Sparksummit2016 share
Sparksummit2016 shareSparksummit2016 share
Sparksummit2016 share
 

More from kim.mens

Software Maintenance and Evolution
Software Maintenance and EvolutionSoftware Maintenance and Evolution
Software Maintenance and Evolutionkim.mens
 
Context-Oriented Programming
Context-Oriented ProgrammingContext-Oriented Programming
Context-Oriented Programmingkim.mens
 
Software Reuse and Object-Oriented Programming
Software Reuse and Object-Oriented ProgrammingSoftware Reuse and Object-Oriented Programming
Software Reuse and Object-Oriented Programmingkim.mens
 
Bad Code Smells
Bad Code SmellsBad Code Smells
Bad Code Smellskim.mens
 
Object-Oriented Design Heuristics
Object-Oriented Design HeuristicsObject-Oriented Design Heuristics
Object-Oriented Design Heuristicskim.mens
 
Software Patterns
Software PatternsSoftware Patterns
Software Patternskim.mens
 
Code Refactoring
Code RefactoringCode Refactoring
Code Refactoringkim.mens
 
Domain Modelling
Domain ModellingDomain Modelling
Domain Modellingkim.mens
 
Object-Oriented Application Frameworks
Object-Oriented Application FrameworksObject-Oriented Application Frameworks
Object-Oriented Application Frameworkskim.mens
 
Towards a Context-Oriented Software Implementation Framework
Towards a Context-Oriented Software Implementation FrameworkTowards a Context-Oriented Software Implementation Framework
Towards a Context-Oriented Software Implementation Frameworkkim.mens
 
Towards a Taxonomy of Context-Aware Software Variabilty Approaches
Towards a Taxonomy of Context-Aware Software Variabilty ApproachesTowards a Taxonomy of Context-Aware Software Variabilty Approaches
Towards a Taxonomy of Context-Aware Software Variabilty Approacheskim.mens
 
Breaking the Walls: A Unified Vision on Context-Oriented Software Engineering
Breaking the Walls: A Unified Vision on Context-Oriented Software EngineeringBreaking the Walls: A Unified Vision on Context-Oriented Software Engineering
Breaking the Walls: A Unified Vision on Context-Oriented Software Engineeringkim.mens
 
Context-oriented programming
Context-oriented programmingContext-oriented programming
Context-oriented programmingkim.mens
 
Basics of reflection
Basics of reflectionBasics of reflection
Basics of reflectionkim.mens
 
Advanced Reflection in Java
Advanced Reflection in JavaAdvanced Reflection in Java
Advanced Reflection in Javakim.mens
 
Basics of reflection in java
Basics of reflection in javaBasics of reflection in java
Basics of reflection in javakim.mens
 
Reflection in Ruby
Reflection in RubyReflection in Ruby
Reflection in Rubykim.mens
 
Introduction to Ruby
Introduction to RubyIntroduction to Ruby
Introduction to Rubykim.mens
 
Introduction to Smalltalk
Introduction to SmalltalkIntroduction to Smalltalk
Introduction to Smalltalkkim.mens
 
A gentle introduction to reflection
A gentle introduction to reflectionA gentle introduction to reflection
A gentle introduction to reflectionkim.mens
 

More from kim.mens (20)

Software Maintenance and Evolution
Software Maintenance and EvolutionSoftware Maintenance and Evolution
Software Maintenance and Evolution
 
Context-Oriented Programming
Context-Oriented ProgrammingContext-Oriented Programming
Context-Oriented Programming
 
Software Reuse and Object-Oriented Programming
Software Reuse and Object-Oriented ProgrammingSoftware Reuse and Object-Oriented Programming
Software Reuse and Object-Oriented Programming
 
Bad Code Smells
Bad Code SmellsBad Code Smells
Bad Code Smells
 
Object-Oriented Design Heuristics
Object-Oriented Design HeuristicsObject-Oriented Design Heuristics
Object-Oriented Design Heuristics
 
Software Patterns
Software PatternsSoftware Patterns
Software Patterns
 
Code Refactoring
Code RefactoringCode Refactoring
Code Refactoring
 
Domain Modelling
Domain ModellingDomain Modelling
Domain Modelling
 
Object-Oriented Application Frameworks
Object-Oriented Application FrameworksObject-Oriented Application Frameworks
Object-Oriented Application Frameworks
 
Towards a Context-Oriented Software Implementation Framework
Towards a Context-Oriented Software Implementation FrameworkTowards a Context-Oriented Software Implementation Framework
Towards a Context-Oriented Software Implementation Framework
 
Towards a Taxonomy of Context-Aware Software Variabilty Approaches
Towards a Taxonomy of Context-Aware Software Variabilty ApproachesTowards a Taxonomy of Context-Aware Software Variabilty Approaches
Towards a Taxonomy of Context-Aware Software Variabilty Approaches
 
Breaking the Walls: A Unified Vision on Context-Oriented Software Engineering
Breaking the Walls: A Unified Vision on Context-Oriented Software EngineeringBreaking the Walls: A Unified Vision on Context-Oriented Software Engineering
Breaking the Walls: A Unified Vision on Context-Oriented Software Engineering
 
Context-oriented programming
Context-oriented programmingContext-oriented programming
Context-oriented programming
 
Basics of reflection
Basics of reflectionBasics of reflection
Basics of reflection
 
Advanced Reflection in Java
Advanced Reflection in JavaAdvanced Reflection in Java
Advanced Reflection in Java
 
Basics of reflection in java
Basics of reflection in javaBasics of reflection in java
Basics of reflection in java
 
Reflection in Ruby
Reflection in RubyReflection in Ruby
Reflection in Ruby
 
Introduction to Ruby
Introduction to RubyIntroduction to Ruby
Introduction to Ruby
 
Introduction to Smalltalk
Introduction to SmalltalkIntroduction to Smalltalk
Introduction to Smalltalk
 
A gentle introduction to reflection
A gentle introduction to reflectionA gentle introduction to reflection
A gentle introduction to reflection
 

Recently uploaded

NAVSEA PEO USC - Unmanned & Small Combatants 26Oct23.pdf
NAVSEA PEO USC - Unmanned & Small Combatants 26Oct23.pdfNAVSEA PEO USC - Unmanned & Small Combatants 26Oct23.pdf
NAVSEA PEO USC - Unmanned & Small Combatants 26Oct23.pdfWadeK3
 
Orientation, design and principles of polyhouse
Orientation, design and principles of polyhouseOrientation, design and principles of polyhouse
Orientation, design and principles of polyhousejana861314
 
Call Girls in Munirka Delhi 💯Call Us 🔝8264348440🔝
Call Girls in Munirka Delhi 💯Call Us 🔝8264348440🔝Call Girls in Munirka Delhi 💯Call Us 🔝8264348440🔝
Call Girls in Munirka Delhi 💯Call Us 🔝8264348440🔝soniya singh
 
Recombinant DNA technology (Immunological screening)
Recombinant DNA technology (Immunological screening)Recombinant DNA technology (Immunological screening)
Recombinant DNA technology (Immunological screening)PraveenaKalaiselvan1
 
Lucknow 💋 Russian Call Girls Lucknow Finest Escorts Service 8923113531 Availa...
Lucknow 💋 Russian Call Girls Lucknow Finest Escorts Service 8923113531 Availa...Lucknow 💋 Russian Call Girls Lucknow Finest Escorts Service 8923113531 Availa...
Lucknow 💋 Russian Call Girls Lucknow Finest Escorts Service 8923113531 Availa...anilsa9823
 
Behavioral Disorder: Schizophrenia & it's Case Study.pdf
Behavioral Disorder: Schizophrenia & it's Case Study.pdfBehavioral Disorder: Schizophrenia & it's Case Study.pdf
Behavioral Disorder: Schizophrenia & it's Case Study.pdfSELF-EXPLANATORY
 
Natural Polymer Based Nanomaterials
Natural Polymer Based NanomaterialsNatural Polymer Based Nanomaterials
Natural Polymer Based NanomaterialsAArockiyaNisha
 
SOLUBLE PATTERN RECOGNITION RECEPTORS.pptx
SOLUBLE PATTERN RECOGNITION RECEPTORS.pptxSOLUBLE PATTERN RECOGNITION RECEPTORS.pptx
SOLUBLE PATTERN RECOGNITION RECEPTORS.pptxkessiyaTpeter
 
GFP in rDNA Technology (Biotechnology).pptx
GFP in rDNA Technology (Biotechnology).pptxGFP in rDNA Technology (Biotechnology).pptx
GFP in rDNA Technology (Biotechnology).pptxAleenaTreesaSaji
 
Scheme-of-Work-Science-Stage-4 cambridge science.docx
Scheme-of-Work-Science-Stage-4 cambridge science.docxScheme-of-Work-Science-Stage-4 cambridge science.docx
Scheme-of-Work-Science-Stage-4 cambridge science.docxyaramohamed343013
 
Artificial Intelligence In Microbiology by Dr. Prince C P
Artificial Intelligence In Microbiology by Dr. Prince C PArtificial Intelligence In Microbiology by Dr. Prince C P
Artificial Intelligence In Microbiology by Dr. Prince C PPRINCE C P
 
G9 Science Q4- Week 1-2 Projectile Motion.ppt
G9 Science Q4- Week 1-2 Projectile Motion.pptG9 Science Q4- Week 1-2 Projectile Motion.ppt
G9 Science Q4- Week 1-2 Projectile Motion.pptMAESTRELLAMesa2
 
Traditional Agroforestry System in India- Shifting Cultivation, Taungya, Home...
Traditional Agroforestry System in India- Shifting Cultivation, Taungya, Home...Traditional Agroforestry System in India- Shifting Cultivation, Taungya, Home...
Traditional Agroforestry System in India- Shifting Cultivation, Taungya, Home...jana861314
 
Stunning ➥8448380779▻ Call Girls In Panchshil Enclave Delhi NCR
Stunning ➥8448380779▻ Call Girls In Panchshil Enclave Delhi NCRStunning ➥8448380779▻ Call Girls In Panchshil Enclave Delhi NCR
Stunning ➥8448380779▻ Call Girls In Panchshil Enclave Delhi NCRDelhi Call girls
 
Luciferase in rDNA technology (biotechnology).pptx
Luciferase in rDNA technology (biotechnology).pptxLuciferase in rDNA technology (biotechnology).pptx
Luciferase in rDNA technology (biotechnology).pptxAleenaTreesaSaji
 
Grafana in space: Monitoring Japan's SLIM moon lander in real time
Grafana in space: Monitoring Japan's SLIM moon lander  in real timeGrafana in space: Monitoring Japan's SLIM moon lander  in real time
Grafana in space: Monitoring Japan's SLIM moon lander in real timeSatoshi NAKAHIRA
 
zoogeography of pakistan.pptx fauna of Pakistan
zoogeography of pakistan.pptx fauna of Pakistanzoogeography of pakistan.pptx fauna of Pakistan
zoogeography of pakistan.pptx fauna of Pakistanzohaibmir069
 
STERILITY TESTING OF PHARMACEUTICALS ppt by DR.C.P.PRINCE
STERILITY TESTING OF PHARMACEUTICALS ppt by DR.C.P.PRINCESTERILITY TESTING OF PHARMACEUTICALS ppt by DR.C.P.PRINCE
STERILITY TESTING OF PHARMACEUTICALS ppt by DR.C.P.PRINCEPRINCE C P
 

Recently uploaded (20)

NAVSEA PEO USC - Unmanned & Small Combatants 26Oct23.pdf
NAVSEA PEO USC - Unmanned & Small Combatants 26Oct23.pdfNAVSEA PEO USC - Unmanned & Small Combatants 26Oct23.pdf
NAVSEA PEO USC - Unmanned & Small Combatants 26Oct23.pdf
 
Orientation, design and principles of polyhouse
Orientation, design and principles of polyhouseOrientation, design and principles of polyhouse
Orientation, design and principles of polyhouse
 
9953056974 Young Call Girls In Mahavir enclave Indian Quality Escort service
9953056974 Young Call Girls In Mahavir enclave Indian Quality Escort service9953056974 Young Call Girls In Mahavir enclave Indian Quality Escort service
9953056974 Young Call Girls In Mahavir enclave Indian Quality Escort service
 
Call Girls in Munirka Delhi 💯Call Us 🔝8264348440🔝
Call Girls in Munirka Delhi 💯Call Us 🔝8264348440🔝Call Girls in Munirka Delhi 💯Call Us 🔝8264348440🔝
Call Girls in Munirka Delhi 💯Call Us 🔝8264348440🔝
 
Recombinant DNA technology (Immunological screening)
Recombinant DNA technology (Immunological screening)Recombinant DNA technology (Immunological screening)
Recombinant DNA technology (Immunological screening)
 
Lucknow 💋 Russian Call Girls Lucknow Finest Escorts Service 8923113531 Availa...
Lucknow 💋 Russian Call Girls Lucknow Finest Escorts Service 8923113531 Availa...Lucknow 💋 Russian Call Girls Lucknow Finest Escorts Service 8923113531 Availa...
Lucknow 💋 Russian Call Girls Lucknow Finest Escorts Service 8923113531 Availa...
 
Behavioral Disorder: Schizophrenia & it's Case Study.pdf
Behavioral Disorder: Schizophrenia & it's Case Study.pdfBehavioral Disorder: Schizophrenia & it's Case Study.pdf
Behavioral Disorder: Schizophrenia & it's Case Study.pdf
 
Engler and Prantl system of classification in plant taxonomy
Engler and Prantl system of classification in plant taxonomyEngler and Prantl system of classification in plant taxonomy
Engler and Prantl system of classification in plant taxonomy
 
Natural Polymer Based Nanomaterials
Natural Polymer Based NanomaterialsNatural Polymer Based Nanomaterials
Natural Polymer Based Nanomaterials
 
SOLUBLE PATTERN RECOGNITION RECEPTORS.pptx
SOLUBLE PATTERN RECOGNITION RECEPTORS.pptxSOLUBLE PATTERN RECOGNITION RECEPTORS.pptx
SOLUBLE PATTERN RECOGNITION RECEPTORS.pptx
 
GFP in rDNA Technology (Biotechnology).pptx
GFP in rDNA Technology (Biotechnology).pptxGFP in rDNA Technology (Biotechnology).pptx
GFP in rDNA Technology (Biotechnology).pptx
 
Scheme-of-Work-Science-Stage-4 cambridge science.docx
Scheme-of-Work-Science-Stage-4 cambridge science.docxScheme-of-Work-Science-Stage-4 cambridge science.docx
Scheme-of-Work-Science-Stage-4 cambridge science.docx
 
Artificial Intelligence In Microbiology by Dr. Prince C P
Artificial Intelligence In Microbiology by Dr. Prince C PArtificial Intelligence In Microbiology by Dr. Prince C P
Artificial Intelligence In Microbiology by Dr. Prince C P
 
G9 Science Q4- Week 1-2 Projectile Motion.ppt
G9 Science Q4- Week 1-2 Projectile Motion.pptG9 Science Q4- Week 1-2 Projectile Motion.ppt
G9 Science Q4- Week 1-2 Projectile Motion.ppt
 
Traditional Agroforestry System in India- Shifting Cultivation, Taungya, Home...
Traditional Agroforestry System in India- Shifting Cultivation, Taungya, Home...Traditional Agroforestry System in India- Shifting Cultivation, Taungya, Home...
Traditional Agroforestry System in India- Shifting Cultivation, Taungya, Home...
 
Stunning ➥8448380779▻ Call Girls In Panchshil Enclave Delhi NCR
Stunning ➥8448380779▻ Call Girls In Panchshil Enclave Delhi NCRStunning ➥8448380779▻ Call Girls In Panchshil Enclave Delhi NCR
Stunning ➥8448380779▻ Call Girls In Panchshil Enclave Delhi NCR
 
Luciferase in rDNA technology (biotechnology).pptx
Luciferase in rDNA technology (biotechnology).pptxLuciferase in rDNA technology (biotechnology).pptx
Luciferase in rDNA technology (biotechnology).pptx
 
Grafana in space: Monitoring Japan's SLIM moon lander in real time
Grafana in space: Monitoring Japan's SLIM moon lander  in real timeGrafana in space: Monitoring Japan's SLIM moon lander  in real time
Grafana in space: Monitoring Japan's SLIM moon lander in real time
 
zoogeography of pakistan.pptx fauna of Pakistan
zoogeography of pakistan.pptx fauna of Pakistanzoogeography of pakistan.pptx fauna of Pakistan
zoogeography of pakistan.pptx fauna of Pakistan
 
STERILITY TESTING OF PHARMACEUTICALS ppt by DR.C.P.PRINCE
STERILITY TESTING OF PHARMACEUTICALS ppt by DR.C.P.PRINCESTERILITY TESTING OF PHARMACEUTICALS ppt by DR.C.P.PRINCE
STERILITY TESTING OF PHARMACEUTICALS ppt by DR.C.P.PRINCE
 

A Survey Of Aspect Mining Approaches

  • 1. Aspect Mining an emerging research domain Kim Mens http://www.info.ucl.ac.be/~km Special thanks to Andy Kellens, Paolo Tonella & Kris Gybels 1
  • 2. Overview ‣ What is aspect mining and why do we need it ? ‣ Different kinds of mining approaches : 1. Early aspects 2. Advanced Browsers 3. Source-code mining ‣ Overview of automated approaches ‣ Comparison of automated code mining techniques ‣ Conclusion: limitations of aspect mining 2
  • 3. Overview ‣ What is aspect mining and why do we need it ? ‣ Different kinds of mining approaches : 1. Early aspects 2. Advanced Browsers 3. Source-code mining ‣ Overview of automated approaches ‣ Comparison of automated code mining techniques ‣ Conclusion: limitations of aspect mining 2
  • 4. Aspects in existing systems? ‣ Aspects offer - Better modularisation and “separation of concerns” - Improved “ilities” ‣ Also useful for already existing systems - But how to migrate a legacy system to AOSD? 3
  • 7. Migrating a System Aspect Mining Aspect Refactoring 4
  • 8. Why do we need aspect mining? ‣ Legacy systems - large, complex systems - not always clearly documented - need to find the crosscutting concerns (what?) - need to find the extent of the crosscutting concerns (where?) - program understanding/documentation 5
  • 9. Why do we need aspect mining? ‣ Legacy systems - large, complex systems - not always clearly documented - need to find the crosscutting concerns (what?) - need to find the extent of the crosscutting concerns (where?) - program understanding/documentation 5
  • 10. Overview ‣ What is aspect mining and why do we need it ? ‣ Different kinds of mining approaches : 1. Early aspects 2. Advanced Browsers 3. Source-code mining ‣ Overview of automated approaches ‣ Comparison of automated code mining techniques ‣ Conclusion: limitations of aspect mining 6
  • 11. Different Kinds of Mining Approaches 1.“Early aspect” discovery techniques 2.Advanced special-purpose browsers 3.(Semi-)automated code-level mining techniques 7
  • 12. Early Aspects ‣ Try to find the aspects in artifacts of the early phases of the software life-cycle : - requirements - design ‣ Less useful when trying to identify aspects in legacy code 8
  • 13. Different Kinds of Mining Approaches 1.“Early aspect” discovery techniques 2.Advanced special-purpose browsers 3.(Semi-)automated code-level mining techniques 9
  • 14. Special-Purpose Code Browsers ‣ Idea: help a developer in exploring a concern ‣ By browsing / navigating the source code - User provides interesting “seed” in the code - Tool suggests interesting related points to consider ‣ Iteratively, build up a model of the crosscutting concerns ‣ Examples: - FEAT, Aspect Browser, (Extended) Aspect Mining Tool, Prism, JQuery, Soul 10
  • 15. Special-Purpose Code Browsers ‣ Idea: help a developer in exploring a concern ‣ By browsing / navigating the source code - User provides interesting “seed” in the code - Tool suggests interesting related points to consider ‣ Iteratively, build up a model of the crosscutting concerns ‣ Examples: - FEAT, Aspect Browser, (Extended) Aspect Mining Tool, Prism, JQuery, Soul 10
  • 16. FEAT ‣ Create a “Concern Graph”, a representation of the concern mapping back to the source code. ‣ Consists out of a set of program entities ‣ Start out with a number of program entities (e.g. “all callers of the method print()”) ‣ Iteratively explore the relations and refine the concern graph - fan-in (callers, ...) - fan-out (callees, ...) 11
  • 18. Special-Purpose Code Browsers ‣ Disadvantages : - Require quite some manual effort (does it scale?) ➡ Try to automate this process - Requires some preliminary knowledge about system (need for seeds) 13
  • 19. Different Kinds of Mining Approaches 1.“Early aspect” discovery techniques 2.Advanced special-purpose browsers 3.(Semi-)automated code-level mining techniques 14
  • 20. Automated Approaches ‣ Idea: Semi-automatically find possible aspect candidates in source code of a system ‣ No magic: - Manual filtering of the results - False positives/negatives - May require preliminary knowledge ‣ Possibility to use results as seed for browsers 15
  • 21. Automated Approaches ‣ Hidden assumption of all techniques : - look for symptoms of scattering and tangling - either static (code) or dynamic (run-time) ‣ Discover these symptoms using a variety of data analysis techniques - inspired by data mining, code analysis, ... - but specifically (re)designed for aspect mining 16
  • 22. Overview ‣ What is aspect mining and why do we need it ? ‣ Different kinds of mining approaches : 1. Early aspects 2. Advanced Browsers 3. Source-code mining ‣ Overview of automated approaches ‣ Comparison of automated code mining techniques ‣ Conclusion: limitations of aspect mining 17
  • 23. Some Automated Approaches ‣ Pattern matching • Analysing recurring patterns of execution traces • Clone detection (token/PDG/AST-based) ‣ Formal concept analysis ‣ Execution traces / identifiers ‣ Natural language processing on code ‣ AOP idioms ‣ Fan-in analysis / Detecting unique methods ‣ Cluster analysis ‣ method names / method invocations ‣ ... and many more ... 18
  • 24. Some Automated Approaches ‣ Pattern matching • Analysing recurring patterns of execution traces • Clone detection (token/PDG/AST-based) ‣ Formal concept analysis ‣ Execution traces / identifiers ‣ Natural language processing on code ‣ AOP idioms ‣ Fan-in analysis / Detecting unique methods ‣ Cluster analysis ‣ method names / method invocations ‣ ... and many more ... 18
  • 25. Recurring Patterns [Breu&Krinke] ‣ Assumption: if certain method calls always happen before/after the same call, they might be part of an aspect ‣ “Reverse engineering” of manual implementation of before/after advice ‣ Technique: find pairs of methods which happen right before/after each other and check wether this pattern recurs in the code 19
  • 26. Breu and Krinke propose an aspect mining technique n (Dynamic Aspect Mining Tool) [13], which analyses pr flecting the run-time behaviour of a system in search of Recurring Patterns tion patterns. To do so, they introduce the notion of ex [Breu&Krinke] between method invocations. Consider the following exa trace, where the capitals represent method names: B() { C() { G() {} H() {} } } A() {} Breu and Krinke distinguish between four different exe Inherently dynamic (analysis of execution is called but: A), outside-after outside-before (e.g., B trace), before ‣ static : use control-flowlast call in C).(e.g., Gthese execution in C) and in after B), inside-first graphs to calculateis the first call relations, th calling relations - is the Using - hybrid : augmentrithm discovers aspect candidates based on recurring pa dynamic information with static type information to improve results an execution relation occurs more than invocations. If uniformly (for instance, every invocation of method B i CCC needs to be rigorously implemented (calls always in same an aspe ‣ invocation of method A), it is considered to be order) ensure that the aspect candidates are sufficiently cross- 20 an extra requirement that the recurring relations shou
  • 27. Some Automated Approaches ‣ Pattern matching • Analysing recurring patterns of execution traces • Clone detection (token/PDG/AST-based) ‣ Formal concept analysis ‣ Execution traces / identifiers ‣ Natural language processing on code ‣ AOP idioms ‣ Fan-in analysis / Detecting unique methods ‣ Cluster analysis ‣ method names / method invocations ‣ ... and many more ... 21
  • 28. Some Automated Approaches ‣ Pattern matching • Analysing recurring patterns of execution traces • Clone detection (token/PDG/AST-based) ‣ Formal concept analysis ‣ Execution traces / identifiers ‣ Natural language processing on code ‣ AOP idioms ‣ Fan-in analysis / Detecting unique methods ‣ Cluster analysis ‣ method names / method invocations ‣ ... and many more ... 21
  • 29. Clone Detection [Bruntink, Shepherd] ‣ Assumption: crosscutting concerns result in code duplication ‣ Certain CCCs are implemented using a recurring pattern, idiom ‣ Technique: find classes of related code - exact same - same structure; different literals 22
  • 30. only the lines marked with ‘C’ are included in the clones. a cer- CCFinder allows clones to start and end with little regard to Clone Detection e clone syntactic units. In contrast, Bauhaus’ ccdiml does not allow ce pre- [Bruntink, Shepherd] this, due to its AST-based clone detection algorithm. differ- n order M C if (r != OK) , preci- M C{ M C ERXA_LOG(r, 0, (quot;PLXAmem_malloc failure.quot;)); re these M C M C ERXA_LOG(VSXA_MEMORY_ERR, r, M C (quot;%s: failed to allocated %d bytes.quot;, etectors M func_name, toread)); overage M M r = VSXA_MEMORY_ERR; fraction M } the first Figure 3. CCFinder clone covering memory error thm de- source code analysis ‣ handling. in Fig- Finder, ‣ applicable if the CCCs are implemented as a pattern Furthermore this clone class does not cover memory er- by the ror handling code exclusively. In Figure 2(d), note that the ollows: precision obtained for the 23 clone class is roughly 82%. first Through inspection of the code we found that some of the
  • 31. Some Automated Approaches ‣ Pattern matching • Analysing recurring patterns of execution traces • Clone detection (token/PDG/AST-based) ‣ Formal concept analysis ‣ Execution traces / identifiers ‣ Natural language processing on code ‣ AOP idioms ‣ Fan-in analysis / Detecting unique methods ‣ Cluster analysis ‣ method names / method invocations ‣ ... and many more ... 24
  • 32. Some Automated Approaches ‣ Pattern matching • Analysing recurring patterns of execution traces • Clone detection (token/PDG/AST-based) ‣ Formal concept analysis ‣ Execution traces / identifiers ‣ Natural language processing on code ‣ AOP idioms ‣ Fan-in analysis / Detecting unique methods ‣ Cluster analysis ‣ method names / method invocations ‣ ... and many more ... 24
  • 33. Formal Concept Analysis • Starts from • a set of elements • a set of properties of those elements • Determines concepts • Maximal groups of elements and properties • Group: • Every element of the concept has those properties • Every property of the concept holds for those elements • Maximal • No other element (outside the concept) has those same properties • No other property (outside the concept) is shared by all elements 25
  • 34. Formal Concept Analysis object- static dynamic functional logic oriented typing typing C++ X - - X - Java X - - X - Smalltalk X - - - X Scheme - X - - X Prolog - - X - X 26
  • 35. Formal Concept Analysis object- static dynamic functional logic oriented typing typing C++ X - - X - Java X - - X - Smalltalk X - - - X Scheme - X - - X Prolog - - X - X 26
  • 36. Formal Concept Analysis object- static dynamic functional logic oriented typing typing C++ X - - X - Java X - - X - Smalltalk X - - - X Scheme - X - - X Prolog - - X - X 26
  • 37. Formal Concept Analysis object- static dynamic functional logic oriented typing typing C++ X - - X - Java X - - X - Smalltalk X - - - X Scheme - X - - X Prolog - - X - X 26
  • 39. Identifier Analysis (Mens & Tourwé) ‣ Assumption: methods belonging to the same concern are implemented using similar names ‣ Eg. methods implementing Undo concern will have “undo” in their signature ‣ Technique: Use Formal Concept Analysis with the methods as objects and the different substrings as properties 28
  • 40. Identifier Analysis figu drawi reque remo upda chan eve … re ng st ve te ge nt drawingRequestUpdate - X X - X - - … (DrawingChangeEvent e) figureRequestRemove X - X X - - - … (FigureChangeEvent e) figureRequestUpdate X - X - X - - … (FigureChangeEvent e) figureRequestRemove X - X X - - - … (FigureChangeEvent e) figureRequestUpdate X - X - X - - … (FigureChangeEvent e) … … … X … … … … … ‣ Lots of concepts: filter irrelevant ones ‣ Group “similar” concepts 29
  • 41. Execution traces (ceccato & tonella) ‣ Assumption: use case scenario’s are a good indicator for crosscutting concerns ‣ Technique: Analyse execution trace using FCA with use cases as objects; methods called from within use case as attributes ‣ Concept specific to one use case is possible aspect if: - Methods belong to more than one class (scattering) - Methods of same class occur in multiple use cases (tangling) 30
  • 42. Some Automated Approaches ‣ Pattern matching • Analysing recurring patterns of execution traces • Clone detection (token/PDG/AST-based) ‣ Formal concept analysis ‣ Execution traces / identifiers ‣ Natural language processing on code ‣ AOP idioms ‣ Fan-in analysis / Detecting unique methods ‣ Cluster analysis ‣ method names / method invocations ‣ ... and many more ... 31
  • 43. Fan-in Analysis (Marin) ‣ Assumption: The fan-in metric is a good indicator of scattering ‣ “Methods which get called often from different contexts are possible crosscutting concerns” ‣ Technique: calculate the fan-in of each method, filter auxiliary methods/accessors and sort the resulting methods 32
  • 44. Unique Methods (Gybels & kellens) ‣ Assumption: Some CCCs are implemented by calling a single entity from all over the code ‣ E.g. Logging ‣ Technique: devise a metric (unique method) which finds such crosscutting concerns 33
  • 45. Unique Methods (Gybels & kellens) Logging log() ClassB Unique Method A method without a return value which implements a message ClassA ClassC implemented by no other method. Q: Why no return type? A: Use of value in base computation 34
  • 46. Unique Methods (Gybels & kellens) Class Selector Calls Parcel #markAsDirty 23 ParagraphEditor #resetTypeIn 19 UIPainter #broadCastPendingSelectionChange 18 ControllerCodeRegenerator #pushPC 15 AbstractChangeList #updateSelection: 15 PundleModel #updateAfterDo: 10 ??????????????????????? #beGarbageCollectable ?? 35
  • 47. Some Automated Approaches ‣ Pattern matching • Analysing recurring patterns of execution traces • Clone detection (token/PDG/AST-based) ‣ Formal concept analysis ‣ Execution traces / identifiers ‣ Natural language processing on code ‣ AOP idioms ‣ Fan-in analysis / Detecting unique methods ‣ Cluster analysis ‣ method names / method invocations ‣ ... and many more ... 36
  • 48. Some Automated Approaches ‣ Pattern matching • Analysing recurring patterns of execution traces • Clone detection (token/PDG/AST-based) ‣ Formal concept analysis ‣ Execution traces / identifiers ‣ Natural language processing on code ‣ AOP idioms ‣ Fan-in analysis / Detecting unique methods ‣ Cluster analysis ‣ method names / method invocations ‣ ... and many more ... 36
  • 49. Cluster Analysis ‣ Input: - a set of objects - a distance measure between those objects ‣ Output: - groups of objects which are close (according to the distance function) 37
  • 50. Aspect Mining Using Cluster Analysis (He, Shepherd) ‣ Distance function between two methods ‣ Call Clustering: - distance in function of times they occur together in same method (cfr. recurring execution traces) ‣ Method Clustering: - distance in function of commonalities in name of methods (cfr. identifier analysis) 38
  • 51. Some Automated Approaches ‣ Pattern matching • Analysing recurring patterns of execution traces • Clone detection (token/PDG/AST-based) ‣ Formal concept analysis ‣ Execution traces / identifiers ‣ Natural language processing on code ‣ AOP idioms ‣ Fan-in analysis / Detecting unique methods ‣ Cluster analysis ‣ method names / method invocations ‣ ... and many more ... 39
  • 52. Some Automated Approaches ‣ Pattern matching • Analysing recurring patterns of execution traces • Clone detection (token/PDG/AST-based) ‣ Formal concept analysis ‣ Execution traces / identifiers ‣ Natural language processing on code ‣ AOP idioms ‣ Fan-in analysis / Detecting unique methods ‣ Cluster analysis ‣ method names / method invocations ‣ ... and many more ... 39
  • 53. Overview ‣ What is aspect mining and why do we need it ? ‣ Different kinds of mining approaches : 1. Early aspects 2. Advanced Browsers 3. Source-code mining ‣ Overview of automated approaches ‣ Comparison of automated code mining techniques ‣ Conclusion: limitations of aspect mining 40
  • 54. For each of the studied techniques, Table 2 shows the kind of data (static or dynamic) analysed by that technique, as well as the kind of analysis Comparison performed (token-based or structural/behavioural). Kind of input data Kind of analysis static dynamic token-based structural/behavioural Execution patterns X X - X Dynamic analysis - X - X Identifier analysis X - X - Language clues X - X - Unique methods X - - X Method clustering X - X - Call clustering X - - X Fan-in analysis X - - X Clone detection (PDG) X - - X Clone detection (token) X - X - Clone detection (AST) X - - X Table 2. Kind of input data and kind of analysis of each technique Kind of input data and kind of analysis Most techniques work on statically available data. ‘Dynamic analysis’ reasons about execution traces and thus requires executability of the 41 code under analysis. Only ‘Execution patterns’ works with both kinds
  • 55. For each of the studied techniques, Table 2 shows the kind of data (static or dynamic) analysed by that technique, as well as the kind of analysis Comparison performed (token-based or structural/behavioural). Kind of input data Kind of analysis static dynamic token-based structural/behavioural Execution patterns X X - X Dynamic analysis - X - X Identifier analysis X - X - few ic Language clues X - X - Unique methods X - - X nam Method clustering X - X - dy Call clustering X - - X Fan-in analysis X - - X Clone detection (PDG) X - - X Clone detection (token) X - X - Clone detection (AST) X - - X Table 2. Kind of input data and kind of analysis of each technique Kind of input data and kind of analysis Most techniques work on statically available data. ‘Dynamic analysis’ reasons about execution traces and thus requires executability of the 41 code under analysis. Only ‘Execution patterns’ works with both kinds
  • 56. For each of the studied techniques, Table 2 shows the kind of data (static or dynamic) analysed by that technique, as well as the kind of analysis Comparison performed (token-based or structural/behavioural). Kind of input data Kind of analysis static dynamic token-based structural/behavioural X ok o s lX Execution patterns X X - e qu - Dynamic analysis - X - ni ns - Identifier analysis X - X ech oke ore few ic tt Language clues X - X - me t at m info Unique methods X - X SX nly a ok av. nam o Method clustering X - - dy -o e lo ./behX Call clustering X - om uct Fan-in analysis X - - X -S tr Clone detection (PDG) X - X s Clone detection (token) X - X - Clone detection (AST) X - - X Table 2. Kind of input data and kind of analysis of each technique Kind of input data and kind of analysis Most techniques work on statically available data. ‘Dynamic analysis’ reasons about execution traces and thus requires executability of the 41 code under analysis. Only ‘Execution patterns’ works with both kinds
  • 57. Comparison Granularity Symptoms method code fragment scattering tangling Execution patterns X - X - Dynamic analysis X - X X Identifier analysis X - X - Language clues X - X - Unique methods X - X - Method clustering X - X - Call clustering X - X - Fan-in analysis X - X - Clone detection - X X - ble 3. Granularity of and symptoms looked for by each techn Granularity of and symptoms looked for by each technique 42 e 3 summarises the finest level of granularity (methods or code
  • 58. Comparison Granularity Symptoms method code fragment scattering tangling Execution patterns X - X - Dynamic analysis X - X X Identifier analysis X - X - wl elo - ve Language clues X X - b l-e X few d- Unique methods X - tho - Method clustering X X - e Xm Call clustering - X - Fan-in analysis X - X - Clone detection - X X - ble 3. Granularity of and symptoms looked for by each techn Granularity of and symptoms looked for by each technique 42 e 3 summarises the finest level of granularity (methods or code
  • 59. Comparison Granularity Symptoms method code fragment scattering tangling Execution patterns X - X - Dynamic analysis X - X X Identifier analysis X - X - wl elo - ve ew ng Language clues X X - b l-e X f li - X few d- Unique methods X tang - tho - Method clustering X me Call clustering X - X - Fan-in analysis X - X - Clone detection - X X - ble 3. Granularity of and symptoms looked for by each techn Granularity of and symptoms looked for by each technique 42 e 3 summarises the finest level of granularity (methods or code
  • 60. both scattering and tangling into account, by requiring that the methods which occur in a single use-case scenario are implemented in multiple classes (scattering), but also that these methods occur in multiple use- Comparison cases and thus are tangled with other concerns of the system. Technique Largest case Size case Empirically validated Execution patterns Graffiti 3100 methods/82KLOC - Dynamic analysis JHotDraw 2800 methods/18KLOC - Identifier analysis JHotDraw 2800 methods/18KLOC - Language clues PetStore 10KLOC - Unique methods Smalltalk image 3400 classes/66000 methods - Method clustering JHotDraw 2800 methods/18KLOC - Call clustering Banking example 12 methods - Fan-in analysis JHotDraw 2800 methods/18KLOC - TomCat 5.5 API 172KLOC - Clone detection (PDG) TomCat 38KLOC - Clone detection (AST/token) ASML C-Code 20KLOC X Table 4. An assessment of the validation of each technique An assessment of the validation of each technique To provide more insights into the 43 validation of the techniques, Table 4 mentions the largest case on which each technique has been validated,
  • 61. both scattering and tangling into account, by requiring that the methods which occur in a single use-case scenario are implemented in multiple classes (scattering), but also that these methods occur in multiple use- Comparison cases and thus are tangled with other concerns of the system. Technique Largest case Size case Empirically validated Execution patterns Graffiti 3100 methods/82KLOC - Dynamic analysis JHotDraw 2800 methods/18KLOC - Identifier analysis JHotDraw 2800 methods/18KLOC - on ? Language clues PetStore 10KLOC - mm ark Unique methods Smalltalk image 3400 classes/66000 methods - co hm Method clustering JHotDraw 2800 methods/18KLOC - enc Call clustering Banking example 12 methods - b Fan-in analysis JHotDraw 2800 methods/18KLOC - TomCat 5.5 API 172KLOC - Clone detection (PDG) TomCat 38KLOC - Clone detection (AST/token) ASML C-Code 20KLOC X Table 4. An assessment of the validation of each technique An assessment of the validation of each technique To provide more insights into the 43 validation of the techniques, Table 4 mentions the largest case on which each technique has been validated,
  • 62. both scattering and tangling into account, by requiring that the methods which occur in a single use-case scenario are implemented in multiple classes (scattering), but also that these methods occur in multiple use- Comparison cases and thus are tangled with other concerns of the system. Technique Largest case Size case Empirically validated Execution patterns Graffiti 3100 methods/82KLOC - Dynamic analysis JHotDraw 2800 methods/18KLOC - Identifier analysis JHotDraw 2800 methods/18KLOC - toy es on ? Language clues PetStore 10KLOC - mm ark no pl Unique methods Smalltalk image 3400 classes/66000 methods - co hm am Method clustering JHotDraw 2800 methods/18KLOC - ex enc Call clustering Banking example 12 methods - b Fan-in analysis JHotDraw 2800 methods/18KLOC - TomCat 5.5 API 172KLOC - Clone detection (PDG) TomCat 38KLOC - Clone detection (AST/token) ASML C-Code 20KLOC X Table 4. An assessment of the validation of each technique An assessment of the validation of each technique To provide more insights into the 43 validation of the techniques, Table 4 mentions the largest case on which each technique has been validated,
  • 63. both scattering and tangling into account, by requiring that the methods which occur in a single use-case scenario are implemented in multiple classes (scattering), but also that these methods occur in multiple use- Comparison cases and thus are tangled with other concerns of the system. Technique Largest case Size case Empirically validated Execution patterns Graffiti 3100 methods/82KLOC - Dynamic analysis JHotDraw 2800 methods/18KLOC - ttle cal Identifier analysis JHotDraw 2800 methods/18KLOC - toy es on ? li i Language clues PetStore 10KLOC - mm ark no pl pir tion Unique methods Smalltalk image 3400 classes/66000 methods - co hm em ida am Method clustering JHotDraw 2800 methods/18KLOC - ex enc val Call clustering Banking example 12 methods - b Fan-in analysis JHotDraw 2800 methods/18KLOC - TomCat 5.5 API 172KLOC - Clone detection (PDG) TomCat 38KLOC - Clone detection (AST/token) ASML C-Code 20KLOC X Table 4. An assessment of the validation of each technique An assessment of the validation of each technique To provide more insights into the 43 validation of the techniques, Table 4 mentions the largest case on which each technique has been validated,
  • 64. that technique makes about how the crosscutting concerns are imple- mented. Table 5 summarises these assumptions in terms of preconditions that a system has to satisfy in order to find suitable aspect candidates Comparison (4) with a given technique. Technique Preconditions on crosscutting concerns in the analysed program Execution patterns Order of calls in context of crosscutting concern is always the same. Dynamic analysis At least one use case exists that exposes the crosscutting concern and another one that does not. Identifier analysis Names of methods implementing the concern are alike. Language clues Context of concern contains keywords which are synonyms for the crosscutting concern. Unique methods Concern is implemented by exactly one method. Method clustering Names of methods implementing the concern are alike. Call clustering Concern is implemented by calls to same methods from different modules. Fan-in analysis Concern is implemented in separate method which is called a high number of times, or many methods implementing the concern call the same method. Clone detection Concern is implemented by reusing a certain code fragment. able 5. What conditions does the implementation of the concerns have to satisfy What conditions does the implementation of the concerns der for a technique to find viable aspect candidates? have to satisfy in order to find viable aspect candidates? 44 ‘Identifier Analysis’, ‘Method Clustering’, ‘Language Clues’ and ‘Token-
  • 65. that technique makes about how the crosscutting concerns are imple- mented. Table 5 summarises these assumptions in terms of preconditions that a system has to satisfy in order to find suitable aspect candidates Comparison (4) with a given technique. Technique Preconditions on crosscutting concerns in the analysed program Execution patterns Order of calls in context of crosscutting concern is always the same. Dynamic analysis At least one use case exists that exposes the crosscutting concern ake ns and another one that does not. s m ptio e Identifier analysis Names of methods implementing the concern are alike. ue m Language clues Context of concern contains keywords which are synonyms for the od iq su crosscutting concern. chn t as rce c Unique methods Concern is implemented by exactly one method. Te en sou Method clustering Names of methods implementing the concern are alike. ffer t the Call clustering Concern is implemented by calls to same methods from different di u modules. bo Fan-in analysis Concern is implemented in separate method which is called a high a number of times, or many methods implementing the concern call the same method. Clone detection Concern is implemented by reusing a certain code fragment. able 5. What conditions does the implementation of the concerns have to satisfy What conditions does the implementation of the concerns der for a technique to find viable aspect candidates? have to satisfy in order to find viable aspect candidates? 44 ‘Identifier Analysis’, ‘Method Clustering’, ‘Language Clues’ and ‘Token-
  • 66. orously made use of naming conventions when implementing the cross- cutting concerns. ‘Execution Patterns’ and ‘Call Clustering’ assume that methods which often get called together from within different contexts Comparison are candidate aspects. The fan-in technique assumes that crosscutting concerns are implemented by methods which are called many times (large footprint), or by methods calling such methods. Technique User Involvement Execution patterns Inspection of the resulting “recurring patterns”. Dynamic analysis Selection of use cases and manual interpretation of results. Identifier analysis Browsing of mined aspects using IDE integration. Language clues Manual interpretation of resulting lexical chains. Unique methods Inspection of the unique methods; eased by sorting on importance. Method clustering Browsing of mined aspects using IDE integration. Call clustering Manual inspection of resulting clusters. Fan-in analysis Selection of candidates from list of methods, sorted on highest fan-in. Clone detection Browsing and manual interpretation of the discovered clones. Table 6. Which kind of user involvement do the different techniques require? Table 6 summarises the kind of involvement that is required from the Which kind of user involvement user. None dothe existing techniques works fully automatic. All tech- of the different techniques require? niques require that their users browse through the resulting aspect can- 45 didates in order to find suitable aspects. Some require that the users
  • 67. orously made use of naming conventions when implementing the cross- cutting concerns. ‘Execution Patterns’ and ‘Call Clustering’ assume that methods which often get called together from within different contexts Comparison are candidate aspects. The fan-in technique assumes that crosscutting concerns are implemented by methods which are called many times (large footprint), or by methods calling such methods. Technique User Involvement till Execution patterns Inspection of the resulting “recurring patterns”. ts Dynamic analysis Selection of use cases and manual interpretation of results. for l ef red Identifier analysis Browsing of mined aspects using IDE integration. ua ui Language clues Manual interpretation of resulting lexical chains. an eq Unique methods Inspection of the unique methods; eased by sorting on importance. Mr Method clustering Browsing of mined aspects using IDE integration. Call clustering Manual inspection of resulting clusters. Fan-in analysis Selection of candidates from list of methods, sorted on highest fan-in. Clone detection Browsing and manual interpretation of the discovered clones. Table 6. Which kind of user involvement do the different techniques require? Table 6 summarises the kind of involvement that is required from the Which kind of user involvement user. None dothe existing techniques works fully automatic. All tech- of the different techniques require? niques require that their users browse through the resulting aspect can- 45 didates in order to find suitable aspects. Some require that the users
  • 68. Taxonomy method fragments AST-based Clone Detection Clone PDG-based Detection Clone Detection Token-based Clone Detection Structural / Token- Method Behavioral Based Clustering Identifier Analysis Language Clustering method-level Clues Unique Methods Concept Fan-in Analysis Analysis Call Clustering Execution Patterns Static Dynamic Analysis Dynamic 46
  • 69. Overview ‣ What is aspect mining and why do we need it ? ‣ Different kinds of mining approaches : 1. Early aspects 2. Advanced Browsers 3. Source-code mining ‣ Overview of automated approaches ‣ Comparison of automated code mining techniques ‣ Conclusion: limitations of aspect mining 47
  • 70. Summary (1) ‣ Aspect Mining - Identification of crosscutting concerns in legacy systems - Aid developers when migrating an existing application to aspects - Browsing approaches and automated techniques 48
  • 71. Summary(2) ‣ Different techniques are complementary - Different input (dynamic vs. static); granularity - Rely on different assumptions/symptoms - Make use of different analysis techniques • cluster analysis, FCA, heuristics, ... ‣ When mining a system, apply (a combination of) different techniques 49
  • 72. Possible research directions ‣ Quantitative comparison of techniques: - Which aspects can be detected by which technique? - Common benchmark/analysis framework ‣ Cross-fertilization with other research domains - slicing, metrics, ... 50
  • 73. Limitations ‣ There is no “silver bullet” - Still a lot of manual intervention required - False positives / false negatives still remain - Are we really mining for aspects? • or is it “joinpoint mining”? 51
  • 74. Limitations ‣ Joinpoint mining? - how to obtain the pointcut definition? - how obtain the advice? ‣ Scalability - user involvement ‣ Validation ‣ Granularity 52
  • 75. More reading... ‣ A. KELLENS, K. MENS & P. TONELLA. A Survey of Automated Code- Level Aspect Mining Techniques. Submitted to Transactions on Aspect- Oriented Software Development, Special Issue on Software Evolution. 2006. ‣ M. CECCATO, M. MARIN, K. MENS, L. MOONEN, P. TONELLA & T. TOURWE. Applying and Combining Three Different Aspect Mining Techniques. Software Quality Journal, 14(3) : 209–231. Springer, September 2006. 53
  • 76. More reading... ‣ A. KELLENS, K. MENS & P. TONELLA. A Survey of Automated Code- Level Aspect Mining Techniques. Submitted to Transactions on Aspect- Oriented Software Development, Special Issue on Software Evolution. 2006. ‣ M. CECCATO, M. MARIN, K. MENS, L. MOONEN, P. TONELLA & T. TOURWE. Applying and Combining Three Different Aspect Mining Techniques. Software Quality Journal, 14(3) : 209–231. Springer, September 2006. 53