A Survey Of Aspect Mining Approaches

Aspect Mining
an emerging research domain

Kim Mens
http://www.info.ucl.ac.be/~km
Special thanks to Andy Kellens, Paolo Tonella & Kris Gybels
1

Overview
‣ What is aspect mining and why do we need it ?
‣ Different kinds of mining approaches :
1. Early aspects
2. Advanced Browsers
3. Source-code mining
‣ Overview of automated approaches
‣ Comparison of automated code mining techniques
‣ Conclusion: limitations of aspect mining
2

Aspects in existing
systems?

‣ Aspects offer
- Better modularisation and “separation of concerns”
- Improved “ilities”
‣ Also useful for already existing systems
- But how to migrate a legacy system to AOSD?

3

Migrating a System

4

Migrating a System

Aspect Mining

4

Migrating a System

Aspect Mining

Aspect Refactoring
4

Why do we need aspect
mining?
‣ Legacy systems
- large, complex systems
- not always clearly documented
- need to ﬁnd the crosscutting concerns (what?)
- need to ﬁnd the extent of the crosscutting
concerns (where?)
- program understanding/documentation
5

Overview
1. Early aspects
6

Different Kinds of
Mining Approaches

1.“Early aspect” discovery techniques
2.Advanced special-purpose browsers
3.(Semi-)automated code-level mining techniques

7

Early Aspects

‣ Try to ﬁnd the aspects in artifacts of the early phases
of the software life-cycle :
- requirements
- design
‣ Less useful when trying to identify aspects in legacy
code

8

Different Kinds of
Mining Approaches


9

Special-Purpose
Code Browsers
‣ Idea: help a developer in exploring a concern
‣ By browsing / navigating the source code
- User provides interesting “seed” in the code
- Tool suggests interesting related points to consider
‣ Iteratively, build up a model of the crosscutting concerns
‣ Examples:
- FEAT, Aspect Browser, (Extended) Aspect Mining
Tool, Prism, JQuery, Soul
10

FEAT
‣ Create a “Concern Graph”, a representation of the
concern mapping back to the source code.
‣ Consists out of a set of program entities
‣ Start out with a number of program entities (e.g. “all
callers of the method print()”)
‣ Iteratively explore the relations and reﬁne the
concern graph
- fan-in (callers, ...)
- fan-out (callees, ...)
11

Special-Purpose
Code Browsers

‣ Disadvantages :
- Require quite some manual effort (does it scale?)
➡ Try to automate this process
- Requires some preliminary knowledge about
system (need for seeds)

13

Different Kinds of
Mining Approaches


14

Automated Approaches

‣ Idea: Semi-automatically ﬁnd possible aspect
candidates in source code of a system
‣ No magic:
- Manual ﬁltering of the results
- False positives/negatives
- May require preliminary knowledge
‣ Possibility to use results as seed for browsers
15

Automated Approaches
‣ Hidden assumption of all techniques :
- look for symptoms of scattering and tangling
- either static (code) or dynamic (run-time)
‣ Discover these symptoms using a variety of data
analysis techniques
- inspired by data mining, code analysis, ...
- but speciﬁcally (re)designed for aspect mining

16

Overview
1. Early aspects
17

Some Automated
Approaches
‣ Pattern matching
• Analysing recurring patterns of execution traces
• Clone detection (token/PDG/AST-based)
‣ Formal concept analysis
‣ Execution traces / identiﬁers
‣ Natural language processing on code
‣ AOP idioms
‣ Fan-in analysis / Detecting unique methods
‣ Cluster analysis
‣ method names / method invocations

‣ ... and many more ...
18

Recurring Patterns
[Breu&Krinke]

‣ Assumption: if certain method calls always happen
before/after the same call, they might be part of an
aspect
‣ “Reverse engineering” of manual implementation of
before/after advice
‣ Technique: ﬁnd pairs of methods which happen right
before/after each other and check wether this pattern
recurs in the code

19

Breu and Krinke propose an aspect mining technique n
(Dynamic Aspect Mining Tool) [13], which analyses pr
flecting the run-time behaviour of a system in search of
Recurring Patterns
tion patterns. To do so, they introduce the notion of ex
[Breu&Krinke]
between method invocations. Consider the following exa
trace, where the capitals represent method names:
B() {
C() {
G() {}
H() {}
}
}
A() {}
Breu and Krinke distinguish between four different exe
Inherently dynamic (analysis of execution is called but: A), outside-after
outside-before (e.g., B trace), before
‣
static : use control-flowlast call in C).(e.g., Gthese execution in C) and in
after B), inside-first
graphs to calculateis the first call relations, th
calling relations
- is the Using
- hybrid : augmentrithm discovers aspect candidates based on recurring pa
dynamic information with static type
information to improve results an execution relation occurs more than
invocations. If
uniformly (for instance, every invocation of method B i
CCC needs to be rigorously implemented (calls always in same an aspe
‣ invocation of method A), it is considered to be
order) ensure that the aspect candidates are sufficiently cross-
20
an extra requirement that the recurring relations shou

Some Automated
Approaches
‣ AOP idioms

21

Clone Detection
[Bruntink, Shepherd]

‣ Assumption: crosscutting concerns result in code
duplication
‣ Certain CCCs are implemented using a recurring
pattern, idiom
‣ Technique: ﬁnd classes of related code
- exact same
- same structure; different literals

22

only the lines marked with ‘C’ are included in the clones.
a cer- CCFinder allows clones to start and end with little regard to
Clone Detection
e clone syntactic units. In contrast, Bauhaus’ ccdiml does not allow
ce pre-
[Bruntink, Shepherd]
this, due to its AST-based clone detection algorithm.
differ-
n order M C if (r != OK)
, preci- M C{
M C ERXA_LOG(r, 0, (quot;PLXAmem_malloc failure.quot;));
re these M C
M C ERXA_LOG(VSXA_MEMORY_ERR, r,
M C (quot;%s: failed to allocated %d bytes.quot;,
etectors M func_name, toread));
overage M
M r = VSXA_MEMORY_ERR;
fraction M }
the ﬁrst
Figure 3. CCFinder clone covering memory error
thm de-
source code analysis
‣ handling.
in Fig-
Finder,
‣ applicable if the CCCs are implemented as a pattern
Furthermore this clone class does not cover memory er-
by the ror handling code exclusively. In Figure 2(d), note that the
ollows: precision obtained for the 23 clone class is roughly 82%.
ﬁrst
Through inspection of the code we found that some of the

Some Automated
Approaches
‣ AOP idioms

24

Formal Concept Analysis
• Starts from
• a set of elements
• a set of properties of those elements

• Determines concepts
• Maximal groups of elements and properties
• Group:
• Every element of the concept has those properties
• Every property of the concept holds for those elements
• Maximal
• No other element (outside the concept) has those same properties
• No other property (outside the concept) is shared by all elements
25


object- static dynamic
functional logic
oriented typing typing

C++ X - - X -

Java X - - X -

Smalltalk X - - - X

Scheme - X - - X

Prolog - - X - X

26


27

Identiﬁer Analysis (Mens &
Tourwé)

‣ Assumption: methods belonging to the same concern
are implemented using similar names
‣ Eg. methods implementing Undo concern will have
“undo” in their signature
‣ Technique: Use Formal Concept Analysis with the
methods as objects and the different substrings as
properties

28

Identiﬁer Analysis
figu drawi reque remo upda chan eve
…
re ng st ve te ge nt
drawingRequestUpdate
- X X - X - - …
(DrawingChangeEvent e)
figureRequestRemove
X - X X - - - …
(FigureChangeEvent e)
figureRequestUpdate
X - X - X - - …
figureRequestRemove
X - X X - - - …
figureRequestUpdate
X - X - X - - …

… … … X … … … … …

‣ Lots of concepts: ﬁlter irrelevant ones
‣ Group “similar” concepts
29

Execution traces (ceccato &
tonella)
‣ Assumption: use case scenario’s are a good indicator
for crosscutting concerns
‣ Technique: Analyse execution trace using FCA with
use cases as objects; methods called from within use
case as attributes
‣ Concept speciﬁc to one use case is possible aspect if:
- Methods belong to more than one class
(scattering)
- Methods of same class occur in multiple use cases
(tangling)
30

Some Automated
Approaches
‣ AOP idioms

31

Fan-in Analysis (Marin)

‣ Assumption: The fan-in metric is a good indicator of
scattering
‣ “Methods which get called often from different
contexts are possible crosscutting concerns”
‣ Technique: calculate the fan-in of each method, ﬁlter
auxiliary methods/accessors and sort the resulting
methods

32

Unique Methods (Gybels & kellens)

‣ Assumption: Some CCCs are implemented by calling
a single entity from all over the code
‣ E.g. Logging
‣ Technique: devise a metric (unique method) which
ﬁnds such crosscutting concerns

33

Logging
log() ClassB

Unique Method
A method without a return value
which implements a message ClassA ClassC
implemented by no other method.

Q: Why no return type?
A: Use of value in base
computation

34


Class Selector Calls
Parcel #markAsDirty 23
ParagraphEditor #resetTypeIn 19
UIPainter #broadCastPendingSelectionChange 18
ControllerCodeRegenerator #pushPC 15
AbstractChangeList #updateSelection: 15
PundleModel #updateAfterDo: 10
??????????????????????? #beGarbageCollectable ??

35

Some Automated
Approaches
‣ AOP idioms

36

Cluster Analysis

‣ Input:
- a set of objects
- a distance measure between those objects
‣ Output:
- groups of objects which are close (according to
the distance function)

37

Aspect Mining Using Cluster
Analysis (He, Shepherd)

‣ Distance function between two methods
‣ Call Clustering:
- distance in function of times they occur together
in same method (cfr. recurring execution traces)
‣ Method Clustering:
- distance in function of commonalities in name of
methods (cfr. identiﬁer analysis)

38

Some Automated
Approaches
‣ AOP idioms

39

Overview
1. Early aspects
40

For each of the studied techniques, Table 2 shows the kind of data (static
or dynamic) analysed by that technique, as well as the kind of analysis
Comparison
performed (token-based or structural/behavioural).

Kind of input data Kind of analysis
static dynamic token-based structural/behavioural
Execution patterns X X - X
Dynamic analysis - X - X
Identiﬁer analysis X - X -
Language clues X - X -
Unique methods X - - X
Method clustering X - X -
Call clustering X - - X
Fan-in analysis X - - X
Clone detection (PDG) X - - X
Clone detection (token) X - X -
Clone detection (AST) X - - X
Table 2. Kind of input data and kind of analysis of each technique

Kind of input data and kind of analysis
Most techniques work on statically available data. ‘Dynamic analysis’
reasons about execution traces and thus requires executability of the
41
code under analysis. Only ‘Execution patterns’ works with both kinds

Comparison

Execution patterns X X - X
Dynamic analysis - X - X

few ic
Unique methods X - - X
nam
dy
Call clustering X - - X
Clone detection (PDG) X - - X

41

Comparison

X ok
o
s lX
Execution patterns X X -
e
qu -
Dynamic analysis - X -
ni ns -
Identiﬁer analysis X - X
ech oke ore
few ic tt
Language clues X - X
- me t at m info
Unique methods X - X
SX nly a ok av.
nam
o
Method clustering X - -
dy -o
e lo ./behX
Call clustering X -
om uct
-S tr
Clone detection (PDG) X - X
s

41

Comparison
Granularity Symptoms
method code fragment scattering tangling
Execution patterns X - X -
Dynamic analysis X - X X
Unique methods X - X -
Call clustering X - X -
Fan-in analysis X - X -
Clone detection - X X -
ble 3. Granularity of and symptoms looked for by each techn
Granularity of and symptoms looked for by each technique
42
e 3 summarises the ﬁnest level of granularity (methods or code

Comparison
wl
elo - ve
Language clues X X -
b l-e
X few d-
Unique methods X -
tho -
Method clustering X X -
e
Xm
Call clustering - X -
42

Comparison
wl
elo - ve ew ng
Language clues X X -
b l-e X f li -
X few d-
Unique methods
X tang -
tho -
Method clustering X
me
Call clustering X - X -
42

both scattering and tangling into account, by requiring that the methods
which occur in a single use-case scenario are implemented in multiple
classes (scattering), but also that these methods occur in multiple use-
Comparison
cases and thus are tangled with other concerns of the system.

Technique Largest case Size case Empirically
validated
Execution patterns Graﬃti 3100 methods/82KLOC -
Dynamic analysis JHotDraw 2800 methods/18KLOC -
Identiﬁer analysis JHotDraw 2800 methods/18KLOC -
Language clues PetStore 10KLOC -
Unique methods Smalltalk image 3400 classes/66000 methods -
Method clustering JHotDraw 2800 methods/18KLOC -
Call clustering Banking example 12 methods -
Fan-in analysis JHotDraw 2800 methods/18KLOC -
TomCat 5.5 API 172KLOC -
Clone detection (PDG) TomCat 38KLOC -
Clone detection (AST/token) ASML C-Code 20KLOC X
Table 4. An assessment of the validation of each technique

An assessment of the validation of each technique
To provide more insights into the 43
validation of the techniques, Table 4
mentions the largest case on which each technique has been validated,

Comparison

validated
on ?
mm ark
co hm

enc
b


Comparison

validated
toy es
on ?
mm ark no pl
co hm am
ex
enc
b


Comparison

validated

ttle cal
toy es
on ? li i
mm ark no pl pir tion
co hm em ida
am
ex
enc val
b


that technique makes about how the crosscutting concerns are imple-
mented. Table 5 summarises these assumptions in terms of preconditions
that a system has to satisfy in order to find suitable aspect candidates
Comparison (4)
with a given technique.

Technique Preconditions on crosscutting concerns in the analysed program
Execution patterns Order of calls in context of crosscutting concern is always the same.
Dynamic analysis At least one use case exists that exposes the crosscutting concern
and another one that does not.
Identifier analysis Names of methods implementing the concern are alike.
Language clues Context of concern contains keywords which are synonyms for the
crosscutting concern.
Unique methods Concern is implemented by exactly one method.
Method clustering Names of methods implementing the concern are alike.
Call clustering Concern is implemented by calls to same methods from different
modules.
Fan-in analysis Concern is implemented in separate method which is called a high
number of times, or many methods implementing the concern call
the same method.
Clone detection Concern is implemented by reusing a certain code fragment.
able 5. What conditions does the implementation of the concerns have to satisfy
What conditions does the implementation of the concerns
der for a technique to find viable aspect candidates?
have to satisfy in order to find viable aspect candidates?
44
‘Identifier Analysis’, ‘Method Clustering’, ‘Language Clues’ and ‘Token-

that technique makes about how the crosscutting concerns are imple-
mented. Table 5 summarises these assumptions in terms of preconditions
that a system has to satisfy in order to find suitable aspect candidates
Comparison (4)
with a given technique.

Technique Preconditions on crosscutting concerns in the analysed program
Execution patterns Order of calls in context of crosscutting concern is always the same.
Dynamic analysis At least one use case exists that exposes the crosscutting concern

ake ns
and another one that does not.

s m ptio e
Identifier analysis Names of methods implementing the concern are alike.

ue m
Language clues Context of concern contains keywords which are synonyms for the
od
iq su
crosscutting concern.
chn t as rce c
Unique methods Concern is implemented by exactly one method.
Te en sou
Method clustering Names of methods implementing the concern are alike.

ffer t the
Call clustering Concern is implemented by calls to same methods from different
di u
modules.
bo
Fan-in analysis Concern is implemented in separate method which is called a high
a
number of times, or many methods implementing the concern call
the same method.
Clone detection Concern is implemented by reusing a certain code fragment.
able 5. What conditions does the implementation of the concerns have to satisfy
What conditions does the implementation of the concerns
der for a technique to find viable aspect candidates?
have to satisfy in order to find viable aspect candidates?
44
‘Identifier Analysis’, ‘Method Clustering’, ‘Language Clues’ and ‘Token-

orously made use of naming conventions when implementing the cross-
cutting concerns. ‘Execution Patterns’ and ‘Call Clustering’ assume that
methods which often get called together from within different contexts
Comparison
are candidate aspects. The fan-in technique assumes that crosscutting
concerns are implemented by methods which are called many times (large
footprint), or by methods calling such methods.

Technique User Involvement
Execution patterns Inspection of the resulting “recurring patterns”.
Dynamic analysis Selection of use cases and manual interpretation of results.
Identifier analysis Browsing of mined aspects using IDE integration.
Language clues Manual interpretation of resulting lexical chains.
Unique methods Inspection of the unique methods; eased by sorting on importance.
Method clustering Browsing of mined aspects using IDE integration.
Call clustering Manual inspection of resulting clusters.
Fan-in analysis Selection of candidates from list of methods, sorted on highest fan-in.
Clone detection Browsing and manual interpretation of the discovered clones.
Table 6. Which kind of user involvement do the different techniques require?

Table 6 summarises the kind of involvement that is required from the
Which kind of user involvement
user. None dothe existing techniques works fully automatic. All tech-
of the different techniques require?
niques require that their users browse through the resulting aspect can-
45
didates in order to find suitable aspects. Some require that the users

orously made use of naming conventions when implementing the cross-
cutting concerns. ‘Execution Patterns’ and ‘Call Clustering’ assume that
methods which often get called together from within different contexts
Comparison
are candidate aspects. The fan-in technique assumes that crosscutting
concerns are implemented by methods which are called many times (large
footprint), or by methods calling such methods.

Technique User Involvement

till
Execution patterns Inspection of the resulting “recurring patterns”.
ts
Dynamic analysis Selection of use cases and manual interpretation of results.
for
l ef red
Identifier analysis Browsing of mined aspects using IDE integration.

ua ui
Language clues Manual interpretation of resulting lexical chains.

an eq
Unique methods Inspection of the unique methods; eased by sorting on importance.
Mr
Method clustering Browsing of mined aspects using IDE integration.
Call clustering Manual inspection of resulting clusters.
Fan-in analysis Selection of candidates from list of methods, sorted on highest fan-in.
Clone detection Browsing and manual interpretation of the discovered clones.
Table 6. Which kind of user involvement do the different techniques require?

Table 6 summarises the kind of involvement that is required from the
Which kind of user involvement
user. None dothe existing techniques works fully automatic. All tech-
of the different techniques require?
niques require that their users browse through the resulting aspect can-
45
didates in order to find suitable aspects. Some require that the users

Taxonomy

method
fragments
AST-based
Clone Detection
Clone
PDG-based
Detection
Clone Detection

Token-based
Clone Detection

Structural / Token- Method
Behavioral Based Clustering

Identifier
Analysis
Language Clustering

method-level
Clues

Unique
Methods

Concept
Fan-in
Analysis
Analysis

Call
Clustering
Execution
Patterns
Static
Dynamic
Analysis
Dynamic
46

Overview
1. Early aspects
47

Summary (1)

‣ Aspect Mining
- Identiﬁcation of crosscutting concerns in legacy
systems
- Aid developers when migrating an existing
application to aspects
- Browsing approaches and automated techniques

48

Summary(2)
‣ Different techniques are complementary
- Different input (dynamic vs. static); granularity
- Rely on different assumptions/symptoms
- Make use of different analysis techniques
• cluster analysis, FCA, heuristics, ...
‣ When mining a system, apply (a combination of)
different techniques
49

Possible research
directions
‣ Quantitative comparison of techniques:
- Which aspects can be detected by which
technique?
- Common benchmark/analysis framework
‣ Cross-fertilization with other research domains
- slicing, metrics, ...

50

Limitations

‣ There is no “silver bullet”
- Still a lot of manual intervention required
- False positives / false negatives still remain
- Are we really mining for aspects?
• or is it “joinpoint mining”?

51

Limitations
‣ Joinpoint mining?
- how to obtain the pointcut deﬁnition?
- how obtain the advice?
‣ Scalability
- user involvement
‣ Validation
‣ Granularity
52

More reading...

‣ A. KELLENS, K. MENS & P. TONELLA. A Survey of Automated Code-
Level Aspect Mining Techniques. Submitted to Transactions on Aspect-
Oriented Software Development, Special Issue on Software Evolution. 2006.

‣ M. CECCATO, M. MARIN, K. MENS, L. MOONEN, P. TONELLA & T.
TOURWE. Applying and Combining Three Different Aspect Mining
Techniques. Software Quality Journal, 14(3) : 209–231. Springer, September
2006.

53

A Survey Of Aspect Mining Approaches

Recommended

Recommended

More Related Content

Similar to A Survey Of Aspect Mining Approaches

Similar to A Survey Of Aspect Mining Approaches (20)

More from kim.mens

More from kim.mens (20)

Recently uploaded

Recently uploaded (20)

A Survey Of Aspect Mining Approaches