Documenting the Mined Feature Implementations from the Object-oriented Source Code of a Collection of Software Product Variants
1. Documenting the feature implementations
mined from the OO source code
of a collection of software variants
Rafat AL-MSIE’DEEN, Abdelhak-Djamel SERIAI, Marianne HUCHARD,
Christelle URTADO and Sylvain VAUTTIER
LIRMM / CNRS and Montpellier 2 University - Montpellier - France
LGI2P / Ecole des Mines d’Alès - Nîmes - France
2. Christelle URTADO - LGI2P / ECOLE DES MINES D’ALES
Outline
Context of software product line (reverse) engineering
The big picture of our proposal for software product line reverse
engineering
Overview of the feature documentation process
Used techniques (in a nutshell)
Step by step feature documentation process on an example
Conclusion and perspectives
3. Christelle URTADO - LGI2P / ECOLE DES MINES D’ALES
Undisciplined development of software product variants
Software product variants are often developed in an undisciplined
manner.
Ad hoc reuse: copying and modifying existing code (clone-and-own).
The result is a set of software products that:
Implicitly share some (but not all) code,
Are hard to understand, maintain, evolve.
Developing new variants still requires effort.
4. Christelle URTADO - LGI2P / ECOLE DES MINES D’ALES
Benefits of software product line engineering
Making shared (common) features and specific (variable) ones explicit is a
plus.
There is an abstract model of the developed products
increases understandability
Products are easier to maintain and evolve
single point of maintenance
Future products are easier to develop
disciplined reuse
Software product line engineering
Domain engineering
A repository for reusable software feature implementations
A model of valid software products
– FODA feature model
Application engineering
A software application development process based on feature selection
5. Christelle URTADO - LGI2P / ECOLE DES MINES D’ALES
Software product line engineering
[Figure: the two levels of software product line engineering.
Domain engineering: a feature implementation repository and the FODA feature
model of "My product line" (features: Circle, Square, The triangles, Right
triangle, Equilateral triangle, The rectangles, Rectangle, Parallelogram;
relation types: mandatory, optional, alternative (xor), or).
Application engineering: the software product variants’ code (applications).]
6. Christelle URTADO - LGI2P / ECOLE DES MINES D’ALES
Software product line (reverse) engineering
[Figure: the same two-level view (domain engineering with the feature
implementation repository and the FODA feature model of "My product line";
application engineering with the software product variants’ code), annotated
with two arrows labelled "New product derivation" and "Software product line
reverse engineering".]
FODA: Feature-oriented domain analysis
7. Christelle URTADO - LGI2P / ECOLE DES MINES D’ALES
Software product line reverse engineering process
Automatically builds a « canonical » feature model from:
the object-oriented source code of software variants,
the software variant use-cases, if available.
Assumes:
software code is object-oriented,
software code statically reflects the features it implements
(no pre-compilation, macros, parameterizable code, etc.),
software code follows naming best practices (names of source
code elements are meaningful),
a given feature is always implemented with identical code,
feature implementations are disjoint,
features are functional.
8. Christelle URTADO - LGI2P / ECOLE DES MINES D’ALES
Software product line reverse engineering process
[Figure: the three-step process.
STEP 1, feature mining: from the software product variants’ code to the common
features’ implementations and the variable features’ implementations.
STEP 2, feature documentation: from the mined feature implementations, plus the
use cases and their descriptions, to named and documented common and variable
features.
STEP 3, feature model building: yields a « canonical » FODA feature model with
constraints (XOR, OR) for "My product line".
Example features: Circle, Square, Rectangle, Equilateral triangle,
Parallelogram, Right triangle.]
9. Christelle URTADO - LGI2P / ECOLE DES MINES D’ALES
Feature documentation
Feature documentation:
automatically assigning a meaningful name to previously mined feature
implementations.
It can be based either on:
source code analysis (using the most frequent words extracted from the
identifiers of the feature’s OO source code elements),
use-case names, when available (assuming that each functional feature
corresponds to a use-case).
Automatically assigning a use-case name to a feature implementation
amounts to automatically identifying the use-case that corresponds to a
feature implementation.
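The source-code-based option above can be sketched as follows. This is only a minimal illustration, assuming identifiers follow camelCase or snake_case conventions; the function names `split_identifier` and `propose_name`, the `top_n` parameter, and the sample identifiers are hypothetical and not taken from the authors' tool.

```python
import re
from collections import Counter

def split_identifier(identifier):
    """Split a camelCase or snake_case identifier into lowercase words (assumed conventions)."""
    spaced = re.sub(r"([a-z0-9])([A-Z])", r"\1 \2", identifier).replace("_", " ")
    return [word.lower() for word in spaced.split()]

def propose_name(identifiers, top_n=2):
    """Propose a feature name made of the top_n most frequent words in the identifiers."""
    counts = Counter(
        word for identifier in identifiers for word in split_identifier(identifier)
    )
    return " ".join(word for word, _ in counts.most_common(top_n))

# Hypothetical identifiers belonging to one mined feature implementation.
identifiers = ["viewMap", "zoomMap", "MapView", "mapScale"]
proposed = propose_name(identifiers)
print(proposed)  # map view
```

A name built this way depends entirely on the naming quality of the code, which is why the process assumes that names of source code elements are relevant.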
10. Christelle URTADO - LGI2P / ECOLE DES MINES D’ALES
Feature documentation process
A three-step process:
1) eliminate irrelevant use-cases from the search (to reduce the search space
for the next step) by making small groups of use-cases and their corresponding
features: hybrid blocks,
2) within each hybrid block, compute the textual similarity between the
features’ code and the use-cases,
3) use these similarities to build the input of a clustering technique that
associates a use-case (and therefore its name) with each feature implementation.
Using three techniques:
Relational Concept Analysis (RCA) for step 1,
Latent Semantic Indexing (LSI) for step 2,
Formal Concept Analysis (FCA) for step 3.
11. Christelle URTADO - LGI2P / ECOLE DES MINES D’ALES
FCA and RCA in a nutshell
Formal Concept Analysis (FCA) and concept lattices
[Barbut & Monjardet 1970; Ganter & Wille 1999]
Extracts abstractions from a set of objects described by attributes.
A concept is a maximal set of objects (extent) that share a maximal set of
attributes (intent).
Binary context:
      a1   a2   a3   a4   a5
o1    x
o2         x    x
o3    x         x    x
o4    x         x         x
Concept lattice (concepts as (extent, intent) pairs, from top to bottom):
({o1,o2,o3,o4},{})
({o1,o3,o4},{a1})  ({o2,o3,o4},{a3})
({o3,o4},{a1,a3})  ({o2},{a2,a3})
({o3},{a1,a3,a4})  ({o4},{a1,a3,a5})
({},{a1,a2,a3,a4,a5})
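The concepts of this small binary context can be recomputed with a brute-force sketch. It enumerates every subset of objects, which is only practical for toy contexts like this one; it is not the algorithm used by the authors, and the function names are illustrative.

```python
from itertools import combinations

# Binary context from the slide: each object mapped to the attributes it has.
CONTEXT = {
    "o1": {"a1"},
    "o2": {"a2", "a3"},
    "o3": {"a1", "a3", "a4"},
    "o4": {"a1", "a3", "a5"},
}
ATTRIBUTES = {"a1", "a2", "a3", "a4", "a5"}

def common_attributes(objects):
    """Intent of a set of objects: the attributes they all share."""
    shared = set(ATTRIBUTES)
    for obj in objects:
        shared &= CONTEXT[obj]
    return shared

def objects_having(attributes):
    """Extent of a set of attributes: the objects that have all of them."""
    return {obj for obj, attrs in CONTEXT.items() if attributes <= attrs}

def formal_concepts():
    """Return every (extent, intent) pair closed in both directions."""
    concepts = set()
    names = sorted(CONTEXT)
    for size in range(len(names) + 1):
        for subset in combinations(names, size):
            intent = common_attributes(subset)
            extent = objects_having(intent)
            concepts.add((frozenset(extent), frozenset(intent)))
    return concepts

concepts = formal_concepts()
for extent, intent in sorted(concepts, key=lambda c: -len(c[0])):
    print(sorted(extent), sorted(intent))
```

Each pair is built by closing a set of objects: take the attributes they share, then take all objects that have those attributes. Duplicates collapse in the set, leaving the eight concepts of the lattice.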
12. Christelle URTADO - LGI2P / ECOLE DES MINES D’ALES
FCA and RCA in a nutshell
Technique to simplify the reading
Represent attributes (resp. objects) only where they first appear from the top
down (resp. from the bottom up): simplified intent (resp. simplified extent).
Technique to tame complexity
Consider a sub-order obtained by removing the concepts that have both an empty
simplified intent and an empty simplified extent: the lattice becomes an
AOC-poset.
Relational Concept Analysis
An iterative version of FCA in which objects are described by attributes and
relations.
Generates a relational lattice family
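The simplified intent and extent, and the resulting AOC-poset, can be illustrated on the four-object binary context of the previous slide. This is a minimal sketch under the usual definitions: an object belongs to the simplified extent of the concept whose intent is exactly the object's own attribute set, and an attribute belongs to the simplified intent of the concept whose extent is exactly the set of objects having it. Function names are illustrative.

```python
from itertools import combinations

# Binary context of the FCA slide: each object mapped to its attributes.
CONTEXT = {
    "o1": {"a1"},
    "o2": {"a2", "a3"},
    "o3": {"a1", "a3", "a4"},
    "o4": {"a1", "a3", "a5"},
}
ATTRIBUTES = set().union(*CONTEXT.values())

def intent_of(objects):
    shared = set(ATTRIBUTES)
    for obj in objects:
        shared &= CONTEXT[obj]
    return shared

def extent_of(attributes):
    return {obj for obj, attrs in CONTEXT.items() if attributes <= attrs}

def formal_concepts():
    found = set()
    for size in range(len(CONTEXT) + 1):
        for subset in combinations(sorted(CONTEXT), size):
            intent = intent_of(subset)
            found.add((frozenset(extent_of(intent)), frozenset(intent)))
    return found

def simplified(concept):
    """Return (simplified extent, simplified intent) of a concept."""
    extent, intent = concept
    introduced_objects = {o for o in extent if CONTEXT[o] == intent}
    introduced_attributes = {a for a in intent if extent_of({a}) == extent}
    return introduced_objects, introduced_attributes

def aoc_poset():
    """Keep only the concepts that introduce at least one object or attribute."""
    return [c for c in formal_concepts() if any(simplified(c))]

poset = aoc_poset()
for concept in poset:
    objs, attrs = simplified(concept)
    print(sorted(objs), sorted(attrs))
```

On this context the top concept, the bottom concept, and ({o3,o4},{a1,a3}) introduce nothing, so the AOC-poset keeps five of the eight concepts.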
13. Christelle URTADO - LGI2P / ECOLE DES MINES D’ALES
LSI in a nutshell
LSI is an information retrieval technique that computes lexical similarity
between documents.
It is based on the occurrence of terms in documents.
The number of retained topics (k) is a parameter of this method.
The Term Frequency - Inverse Document Frequency (TF-IDF) weighting
scheme is applied.
Cosine similarity is computed.
Before their analysis, texts are pre-processed:
Stop words, articles, punctuation marks, numbers are filtered out.
All text is lower cased.
Text is stemmed (using WordNet API).
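The pre-processing, TF-IDF weighting, and cosine similarity listed above can be sketched in plain Python. This is a simplification, not full LSI: the rank-k reduction of the term-document matrix and the stemming step are both left out. The stop-word list, the camelCase splitting of identifiers, the smoothed IDF formula, and the sample texts are assumptions made for the illustration.

```python
import math
import re
from collections import Counter

# Hypothetical, deliberately tiny stop-word list.
STOP_WORDS = {"the", "a", "an", "of", "to", "and", "on", "in", "is"}

def preprocess(text):
    """Split camelCase, drop digits and punctuation, lowercase, filter stop words."""
    text = re.sub(r"([a-z])([A-Z])", r"\1 \2", text)
    words = re.findall(r"[A-Za-z]+", text)
    return [w.lower() for w in words if w.lower() not in STOP_WORDS]

def tfidf_vectors(token_lists):
    """One sparse TF-IDF vector (dict) per token list, using a smoothed IDF."""
    n = len(token_lists)
    df = Counter(term for tokens in token_lists for term in set(tokens))
    vectors = []
    for tokens in token_lists:
        tf = Counter(tokens)
        vectors.append(
            {t: tf[t] * (math.log((1 + n) / (1 + df[t])) + 1) for t in tf}
        )
    return vectors

def cosine(u, v):
    dot = sum(weight * v[term] for term, weight in u.items() if term in v)
    norm = math.sqrt(sum(w * w for w in u.values())) * math.sqrt(
        sum(w * w for w in v.values())
    )
    return dot / norm if norm else 0.0

def similarities(query, documents):
    """Cosine similarity between one query (use-case) and each document (feature)."""
    vectors = tfidf_vectors([preprocess(d) for d in documents] + [preprocess(query)])
    return [cosine(vectors[-1], doc_vector) for doc_vector in vectors[:-1]]

# Hypothetical feature implementations (documents) and use-case text (query).
documents = [
    "public void viewMap() { showMap(); zoomMap(); }",
    "public void playAudio() { startAudioGuide(); }",
]
sims = similarities("View Map: the tourist views a map of the city", documents)
print(sims)
```

The query shares the terms "view" and "map" with the first document only, so the first similarity is positive and the second is zero.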
14. Christelle URTADO - LGI2P / ECOLE DES MINES D’ALES
Mobile tourist guide (MTG) software variants
[Figure: the use-case diagrams of the 2nd and 4th MTG software variants.]
[Table: presence of the previously mined feature implementations in each MTG
software variant (part of the relational context family).]
15. Christelle URTADO - LGI2P / ECOLE DES MINES D’ALES
Relational Context Family (RCF)
[Table: use-case presence in each MTG software variant (part of the relational
context family).]
[Table: use-case and feature implementation co-occurrence (part of the
relational context family).]
16. Christelle URTADO - LGI2P / ECOLE DES MINES D’ALES
Concept Lattice Family (CLF)
17. Christelle URTADO - LGI2P / ECOLE DES MINES D’ALES
Generating the hybrid blocks from the CLF
18. Christelle URTADO - LGI2P / ECOLE DES MINES D’ALES
Constructing a raw corpus from each hybrid block
Hybrid Block_1
Documents: feature implementations
    public void viewMap() {
        int a = 0;
        while (a > 5) {
            if (a != 20) {
            } else {
                a = 30;
            }
        }
    }
Queries: use-cases and their descriptions
    View Map
19. Christelle URTADO - LGI2P / ECOLE DES MINES D’ALES
Measuring hybrid block similarity based on LSI
20. Christelle URTADO - LGI2P / ECOLE DES MINES D’ALES
Measuring hybrid block similarity based on LSI
21. Christelle URTADO - LGI2P / ECOLE DES MINES D’ALES
Clustering use-cases described by features
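One way to picture this step is to turn the similarity values of a hybrid block into a binary context (use-cases described by features) and read the groups off that context. The sketch below is only an approximation of the idea: it stops at the binary context and a direct lookup rather than building the full FCA lattice. The similarity values, the feature identifiers, and the 0.7 threshold are hypothetical and do not come from the slides.

```python
# Hypothetical cosine similarities inside one hybrid block:
# use-case name -> {feature implementation id -> similarity}.
SIMILARITY = {
    "View Map": {"feature_1": 0.91, "feature_2": 0.12},
    "Play Audio": {"feature_1": 0.05, "feature_2": 0.84},
}

def binary_context(similarity, threshold):
    """Keep a (use-case, feature) pair only when its similarity reaches the threshold."""
    return {
        use_case: {f for f, score in row.items() if score >= threshold}
        for use_case, row in similarity.items()
    }

def use_cases_per_feature(context):
    """Invert the context: for each feature, the use-cases it is similar to."""
    names = {}
    for use_case, features in context.items():
        for feature in sorted(features):
            names.setdefault(feature, []).append(use_case)
    return names

context = binary_context(SIMILARITY, threshold=0.7)  # threshold is an assumption
names = use_cases_per_feature(context)
print(names)
```

When a feature ends up related to exactly one use-case, that use-case's name can serve as the feature's name; features related to several or no use-cases would need the finer grouping that the concept lattice provides.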
22. Christelle URTADO - LGI2P / ECOLE DES MINES D’ALES
Conclusion
[Figure: the three-step software product line reverse engineering process:
STEP 1 feature mining, STEP 2 feature documentation (using the use cases and
their descriptions), STEP 3 feature model building, yielding a « canonical »
FODA feature model with constraints.]
23. Christelle URTADO - LGI2P / ECOLE DES MINES D’ALES
Conclusion
[Figure: the three-step software product line reverse engineering process:
STEP 1 feature mining, STEP 2 feature documentation (using the use cases and
their descriptions), STEP 3 feature model building, yielding a « canonical »
FODA feature model with constraints.]
The feature documentation step uses:
1) RCA to reduce the search space,
2) LSI to compute the similarity between use cases and feature implementations,
3) FCA to build the feature to use-case correspondence (clustering).
24. Christelle URTADO - LGI2P / ECOLE DES MINES D’ALES
Conclusion
Related contributions
Feature mining (step 1), presented at last year’s SEKE
Tool implementation
Experiments on 3 real use cases
Feature model building (and constraint identification)
Perspectives
Extend the scope
Consider feature evolution
Add dynamic code analysis
Improve the techniques
Search based algorithms as an alternative clustering technique
Automatically identify junctions between feature implementations
Build a « more » hierarchical feature model
Experiment at a wider scale.
25. Documenting the feature implementations
mined from the OO source code
of a collection of software variants
Rafat AL-MSIE’DEEN, Abdelhak-Djamel SERIAI, Marianne HUCHARD,
Christelle URTADO and Sylvain VAUTTIER
Contact info:
Christelle.Urtado@mines-ales.fr
http://www.lgi2p.mines-ales.fr/~urtado