Model-Driven Development (MDD) is a software development methodology that focuses on the creation and maintenance of domain models as the primary form of expression in the development cycle. One of the fundamental characteristics of this approach is the reuse of software artifacts through their model representation. However, software reuse is impaired because current systems lack an efficient way to search model repositories: many existing solutions do not address the relationships between model artifacts. These relationships are important for satisfying the user's information need in a model-driven development scenario.
This thesis defines a model-driven methodology for creating model search engines. Unlike many related works, the methodology is metamodel-independent and exploits the metamodel of the searched project models to obtain more precise results. A prototype has been implemented to support the methodology.
We address two case studies that deal with indexing and retrieving models from two collections of UML and WebML projects, respectively. Each case study involves several experiments adopting different indexing strategies. Finally, after manually building the ground truth for each repository, we evaluated the results using established Information Retrieval measures such as DCG, MRR, MAP, Precision, and Recall.
Model driven retrieval of model repositories
1. Politecnico di Milano
POLO TERRITORIALE DI COMO
Master of Science in Computer Engineering
Model-Driven Retrieval
of Model Repositories
Master graduation thesis by:
Stefano Celentano, ID: 755287
Lorenzo Furrer, ID: 750213
Supervisor: Prof. Marco Brambilla
Assistant Supervisor: Prof. Alessandro Bozzon
2. Model-Driven Retrieval of Model Repositories 2
Introduction
• Software model retrieval is essential for the paradigm of Model-Driven Development (MDD)
• Current systems lack efficient and standardized methodologies
• The metamodel is not taken into account
• Our contributions:
• A methodology for model-driven retrieval of model repositories that takes the metamodels into account
• The development of a prototype for this methodology
• Two case studies
• Evaluation of different test configurations
3. Model-Driven Retrieval of Model Repositories 3
Outline
Introduction & Methodology
• Model retrieval approaches
• MDD and Metamodeling
• Our Approach:
• Abstract Solution
• Design Dimensions
• Indexing Strategies
Prototype & Case studies
• Prototype Architecture
• Case Studies
• UML Case Study
• WebML Case Study
• Tests and evaluation
• Future works
4. Model-Driven Retrieval of Model Repositories 4
Model Retrieval Approaches
• Text-based
• Model representation: unstructured document (bag of words) (e.g., Vector Space Model, Tf-idf)
• Query type: keyword-based
• Matching algorithm: standard IR similarity measures (e.g., cosine similarity)
• Content-based
• Model representation: model structure is taken into account (e.g., graph-based)
• Query type: search by example
• Matching algorithm: ad-hoc algorithms (depends on the model representation)
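As an illustration of the text-based approach, the following is a minimal sketch of bag-of-words retrieval with tf-idf weights and cosine similarity. It is not the prototype's implementation: the function names (`tokenize`, `tfidf_vectors`, `cosine`, `search`), the whitespace tokenizer, and the smoothed idf formula are assumptions chosen for brevity.

```python
import math
from collections import Counter


def tokenize(text):
    # Assumption: lowercase whitespace tokenization; a real analyzer does more.
    return text.lower().split()


def tfidf_vectors(docs):
    """Return one sparse tf-idf vector (term -> weight) per document, plus the idf table."""
    tokenized = [tokenize(d) for d in docs]
    n = len(tokenized)
    df = Counter(term for toks in tokenized for term in set(toks))
    # Assumed idf variant: log(N / df) + 1, so terms present everywhere keep a nonzero weight.
    idf = {t: math.log(n / df[t]) + 1.0 for t in df}
    vectors = [{t: tf * idf[t] for t, tf in Counter(toks).items()} for toks in tokenized]
    return vectors, idf


def cosine(u, v):
    """Cosine similarity between two sparse vectors."""
    dot = sum(w * v.get(t, 0.0) for t, w in u.items())
    nu = math.sqrt(sum(w * w for w in u.values()))
    nv = math.sqrt(sum(w * w for w in v.values()))
    return dot / (nu * nv) if nu and nv else 0.0


def search(query, docs):
    """Rank documents for a keyword query; returns (score, doc_index) pairs, best first."""
    vectors, idf = tfidf_vectors(docs)
    q = {t: tf * idf.get(t, 0.0) for t, tf in Counter(tokenize(query)).items()}
    return sorted(((cosine(q, v), i) for i, v in enumerate(vectors)), reverse=True)
```

Each model is flattened into a single string of terms before indexing, which is why this representation cannot capture relationships between model elements.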
5. Model-Driven Retrieval of Model Repositories 5
Model-Driven Development and Metamodeling
• A fundamental concept: Metamodel
• MOF (Meta-Object Facility)
[Figure: metamodeling hierarchy with the levels Meta-metamodel, Metamodel, Model, Instance]
7. Model-Driven Retrieval of Model Repositories 7
Our Approach (2/3): Design Dimensions
• Segmentation Granularity
• Whole project
• Subproject
• Project concept
• Index structure
• Flat
• Weighted
• Multi-field
• Hybrid (e.g., multi-field index
containing weighted terms)
• Query type
• Keyword-based search
• Search by example
• Result presentation
• Snippet visualization
• Faceted search
8. Model-Driven Retrieval of Model Repositories 8
Our Approach (3/3): Indexing Strategies
Experiment      Segmentation granularity   Index         Index term weights
Experiment A    Whole project              Flat          No
Experiment B    Metamodel concept          Multi-field   No
Experiment C    Metamodel concept          Multi-field   Assigned according to the metamodel concept
Experiment D*   Metamodel concept          Multi-field   Assigned according to the metamodel concept
* The indexing phase includes a graph-based algorithm that enriches the document representation of a model element with information extracted from its neighboring elements.
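The graph-based enrichment used in Experiment D could be sketched as follows. This is only an illustration under stated assumptions: the function name `enrich`, the breadth-first traversal, the multiplicative `decay` per hop, and the rule of keeping the highest weight for a repeated term are all hypothetical and not taken from the thesis.

```python
from collections import deque


def enrich(terms, neighbors, max_hops=1, decay=0.9):
    """Import terms from neighboring model elements into each element's document.

    terms:     element name -> {term: weight} (the element's own index terms)
    neighbors: element name -> list of adjacent element names
    Imported weights are multiplied by decay ** hops (assumed penalty scheme).
    """
    enriched = {}
    for start in terms:
        doc = dict(terms[start])          # start from the element's own terms
        seen = {start}
        frontier = deque([(start, 0)])
        while frontier:
            node, hops = frontier.popleft()
            if hops == max_hops:
                continue                  # do not expand beyond the hop limit
            for nxt in neighbors.get(node, []):
                if nxt in seen:
                    continue
                seen.add(nxt)
                factor = decay ** (hops + 1)
                for term, weight in terms.get(nxt, {}).items():
                    # Keep the strongest weight if a term arrives more than once.
                    doc[term] = max(doc.get(term, 0.0), weight * factor)
                frontier.append((nxt, hops + 1))
        enriched[start] = doc
    return enriched
```

With `max_hops=1`, each element receives only the terms of its direct neighbors, at a reduced weight relative to its own terms.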
9. Model-Driven Retrieval of Model Repositories 9
Prototype Architecture
• Based on SMILA: an extensible framework for building search solutions to access unstructured information.
• Uses Apache Solr: a scalable search platform featuring full-text search.
[Figure: architecture diagram with the components Configurator, Data Source, Crawler, Router, Queue, Listener, BPEL pipeline, BPEL Processor, Analyzers, Index (Apache Solr)]
11. Model-Driven Retrieval of Model Repositories 11
Case Studies
• UML Class Diagram
• 84 meta-models from AtlanMod
• Small size
• General purpose
• WebML
• 12 real-life industrial projects
• Large size
• Large quantity of concepts
• Domain specific
12. Model-Driven Retrieval of Model Repositories 12
UML Case Study: Experiment A
• Granularity: Project
• Index: Flat
Content Field:
location commentsBefore
commentsAfter entries
predicates name type
allFields fields predicate
name expression field
value LocatedElement
Query Entry Field
Predicate Expression
13. Model-Driven Retrieval of Model Repositories 13
UML Case Study: Experiment B
• Granularity: Class
• Index: Multi-Field
ProjectName Field:
BQL
ClassName Field:
Entry
AttributeNames Field:
name type allFields fields
predicate
14. Model-Driven Retrieval of Model Repositories 14
UML Case Study: Experiment C
• Granularity: Class
• Index: Multi-Field, Weighted
ProjectName Field:
BQL|1.0
ClassName Field:
Entry|1.7
AttributeNames Field:
name|1.0 type|1.0
allFields|1.0 fields|1.5
predicate|1.6
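To make the `term|weight` notation concrete, here is a small sketch that parses such fields and sums the weights of the terms matching a keyword query. The scoring rule is a deliberately simplified stand-in and not the ranking function of the prototype or of Apache Solr; the names `parse_weighted` and `score` are hypothetical.

```python
def parse_weighted(field_text, default=1.0):
    """Parse 'name|1.0 fields|1.5' into [(term, weight), ...]; unweighted terms get `default`."""
    pairs = []
    for token in field_text.split():
        term, sep, weight = token.partition("|")
        pairs.append((term.lower(), float(weight) if sep else default))
    return pairs


def score(query, document):
    """Assumed toy score: sum of the weights of all field terms that match a query term."""
    wanted = {t.lower() for t in query.split()}
    total = 0.0
    for field_text in document.values():
        for term, weight in parse_weighted(field_text):
            if term in wanted:
                total += weight
    return total


# The class "Entry" of project "BQL", as indexed in Experiment C.
entry = {
    "ProjectName": "BQL|1.0",
    "ClassName": "Entry|1.7",
    "AttributeNames": "name|1.0 type|1.0 allFields|1.0 fields|1.5 predicate|1.6",
}
```

Under this toy rule, a query such as `entry predicate` matches the class name (weight 1.7) and one attribute (weight 1.6), so a match on the class name contributes more than a match on an ordinary attribute.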
15. Model-Driven Retrieval of Model Repositories 15
UML Case Study: Experiment D
• Granularity: Class
• Index: Multi-Field, Weighted
ProjectName Field:
BQL|1.0
ClassName Field:
Entry|1.7
AttributeNames Field:
name|0.75 location|0.9
commentsBefore|0.9
commentsAfter|0.9 name|1.0
type|1.0 allFields|1.0
predicate|1.6 fields|1.3
Predicate|0.765 Query|0.816
Field|0.85 LocatedElement|0.9
#HOP = 1
16. Model-Driven Retrieval of Model Repositories 16
WebML Case Study: Experiment B
• Granularity: Area
• Index: Multi-Field
AreaName Field:
Book requests
Content Field:
Book requests Create
book ConnectUserToBook
New book request New
book User Book request
list
17. Model-Driven Retrieval of Model Repositories 17
WebML Case Study: Experiment C
• Granularity: Area
• Index: Multi-Field, Weighted
AreaName Field:
Book|1.2 requests|1.2
Content Field:
Create|1.0 book|1.0
ConnectUserToBook|1.0
New|1.1 book|1.1
request|1.1 New|1.0
Book|1.0 User|1.0
Book|1.1 request|1.1
list|1.1
18. Model-Driven Retrieval of Model Repositories 18
Tests and Evaluation: Meta-queries
Meta-query   Type of searched document   Information need
1            Project                     All projects related to one specific topic
2            Project                     All projects related to one general topic
3            Pattern                     Search for a pattern, using as query string the terms belonging to different classes connected by some relation
4            Class                       Search for a class, using as query string all (or some) of the terms belonging to a class
5            Class                       Search for a class, using as query string some of the terms belonging to a class and some terms related to the project
19. Model-Driven Retrieval of Model Repositories 19
UML Experiment A (Project Granularity, Flat Index)
• DCG and iDCG are very
close in the first 3 positions.
• ALWAYS able to retrieve
the most relevant document
in the first position.
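For reference, DCG and iDCG can be computed as in the sketch below. It assumes the common formulation in which the relevance at rank i is divided by log2(i + 1); the thesis may use a different variant. It also assumes that the ideal ranking is obtained by sorting the same list of graded relevance judgments.

```python
import math


def dcg(relevances, k=None):
    """Discounted Cumulative Gain of a ranked list of graded relevance values."""
    rels = relevances[:k] if k is not None else relevances
    return sum(rel / math.log2(rank + 1) for rank, rel in enumerate(rels, start=1))


def ideal_dcg(relevances, k=None):
    """iDCG: the DCG of the same judgments sorted from most to least relevant."""
    return dcg(sorted(relevances, reverse=True), k)


def ndcg(relevances, k=None):
    """Normalized DCG in [0, 1]; 1.0 means the ranking matches the ideal order."""
    best = ideal_dcg(relevances, k)
    return dcg(relevances, k) / best if best > 0 else 0.0
```

A DCG curve that stays close to the iDCG curve in the first positions means the most relevant documents are ranked near the top.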
20. Model-Driven Retrieval of Model Repositories 20
Other UML Experiments
• Weighted experiment is always
better than the non-weighted one.
• Both Experiments B and C are
close to the ideal curve in the first
positions.
• Experiment D targets a different user need than the one captured by the ground truth used in these tests.
21. Model-Driven Retrieval of Model Repositories 21
WebML Experiments
• Experiments B and C perform identically up to the third position.
• After that, the weighted experiment always performs slightly better than the non-weighted one.
22. Model-Driven Retrieval of Model Repositories 22
Conclusions
• The system has been tested with both a general-purpose and a domain-specific modeling language.
• Good performance in the first rank positions.
• The weighted case always performs better than or equal to the others, albeit slightly.
• The prototype has shown good results in retrieving documents that are relevant in terms of conceptual and terminological similarity.
• Structural similarity is difficult to capture in a text-based search.
Future Directions
• Integrating a content-based solution
• Metamodel integration
• Testing more configurations
• Weight training
Editor's Notes
The tests for the UML case study involve different types of keyword-based query. Each type, called a “meta-query” in the following, has different characteristics in terms of the document that is searched by the query (e.g., project, class) and in terms of the information need that is expressed through the query (e.g., the user may want to search for a specific project or for all the projects related to a topic). We first outlined a set of five meta-queries, then we chose two of them. For each of these, we built a set of ten instances that we used to test the UML case. The tests for the WebML case involve a set of ten queries.
An attempt to take the structure of the model into account in a text-based setting. It does not improve retrieval in terms of relevance, but it is useful in “exploratory” cases. Besides retrieving the classes relevant to a query, Experiment D also retrieves their neighboring classes, which are not necessarily relevant to that query. These neighboring classes appear among the results because they have imported terms that are part of the query string. Since their “content” field is larger due to the imported terms, those neighboring classes are penalized by the FieldNorm and, at the same time, the truly relevant classes are ranked higher, so the results are better. To conclude, the FieldNorm helps when it penalizes classes that are retrieved only because they neighbor relevant classes, but it gives misleading results when it penalizes relevant classes because of the larger size of their “content” field after the import algorithm.
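The effect of the FieldNorm on enlarged fields can be illustrated with a short sketch. It assumes the length normalization of Lucene's classic TF-IDF similarity, 1 / sqrt(number of terms in the field), and ignores index-time boosts and the lossy encoding of the stored norm; the function names and the term counts in the usage note are hypothetical.

```python
import math


def length_norm(num_terms):
    """Assumed classic length normalization: longer fields get a smaller norm."""
    return 1.0 / math.sqrt(num_terms) if num_terms > 0 else 0.0


def norm_ratio_after_import(own_terms, imported_terms):
    """Factor by which a field's norm shrinks once imported terms are added to it."""
    before = length_norm(own_terms)
    after = length_norm(own_terms + imported_terms)
    return after / before if before > 0 else 0.0
```

For example, a “content” field that grows from 5 own terms to 20 terms after the import sees its norm halved (`norm_ratio_after_import(5, 15)` is 0.5), so every match in that field counts half as much, whether the class is truly relevant or only a neighbor.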