CAiSE 2014 An adapter-based approach for M2T transformations
1. An Adapter-Based Approach to Co-evolve Generated SQL in M2T Transformations
Jokin García¹, Oscar Díaz¹ and Jordi Cabot²
¹Onekin, University of the Basque Country, Spain
²AtlanMod, Ecole des Mines, Nantes (France)
Thessaloniki, 19th of June, 2014
CAiSE
3. Problem statement: context
Software components on top of platforms
Dependencies
Platform evolution is a common situation.
(Figure: an Application built on top of a DB and an API)
13. Process outline
(Figure: the Old and New MediaWiki schemas are injected as Old/New schema models; comparing them yields a Difference model which, together with the Domain model, drives the generation of code for the MediaWiki DB. "Don't worry: all in one click.")
15. Process: Schema Modification Operators (SMO)

| SMO           | % of usage | Change type | Adaptation |
|---------------|------------|-------------|------------|
| Create table  | 8.9        | NBC         | New comment in the transformation on the existence of this table in the new version |
| Drop table    | 3.3        | BRC         | Delete the statement associated to the table |
| Rename table  | 1.1        | BRC         | Update name |
| Copy table    | 2.2        | NBC         | (None) |
| Add column    | 38.7       | NBC/BRC     | For insert statements: if the attribute is Not Null, add the new column in the statement with a default value (from the DB if available, or according to the type otherwise) |
| Drop column   | 26.4       | BRC         | Delete the column and the value in the statement |
| Rename column | 16.0       | BRC         | Update name |
| Copy column   | 0.4        | BRC         | Like the add column case |
| Move column   | 1.5        | BRC         | Like the drop column + add column cases |
16. Process: Adaptation
Platform-specific, schema-independent
- Replace all “println” instructions with “printSQL”
- Import the “printSQL” library
- ZQL extension
For each printSQL invocation:
- Iterate over the changes reported in the Difference model
- Check if any of the changes impacts the current statement
- Retrieve the information needed to adapt the statement and add it to a list of parameters: the statement, affected table, column, …
- Call the function that adapts the statement and print the new statement
18. Roles
Producer
- Implement the injector for the target platform
- Implement the adapter as a library for the transformation
Consumer
- Import the adapter library in the transformation
- Execute the batch
19. Evaluation
Manual Cost = D + P * #Impacts
D: Detection time
P: Propagation time
Assisted Cost = C + V * #Impacts
C: Configuration time
V: Verification time
21. Dump changes from code to transformation
Assist manual propagation
Record generation with the change to be done and where (line and column in the transformation)
M2T transformation →(HOT)→ M2T transformation'
print(“select * from …”) becomes printSQL(“select * from …”, line, column)
RECORD:
# Added columns cl_type, cl_sortkey_prefix and cl_collation
# transformation line: 12, column: 11
INSERT INTO categorylinks (cl_from, cl_to, cl_sortkey, cl_timestamp, cl_type, cl_sortkey_prefix, cl_collation) VALUES (@pageId, 'Softwareproject', 'House_Testing', DATE_FORMAT(CURRENT_TIMESTAMP(), '%Y%m%d%k%i%s'), 'page', '', '0');
22. Conclusions
Mechanism to adapt code generated by M2T transformations to platform evolution
Applied in a specific case study
Premises: platform instability and transformation coupling
23. Issues and future work
Generalization: other platforms
Methodology for adapter development
25. Process: Adaptation
1. Iterate over the changes reported in the Difference model
2. Check that the deleted column's table corresponds with the table name of the statement
3. Add the statement, the table name and the removed column to a list of parameters
4. Output an SQL statement without the removed column, using a function with the list of parameters that modifies the expression
Good afternoon. It’s an honor for me to present my work entitled “An adapter based…”
The structure I will use in the presentation is the following:
First, I am going to focus on the problem statement, putting the problem into context.
Secondly, I am going to motivate the problem with a specific scenario that will be used throughout the presentation.
Then, I am going to explain what solution I am proposing to the problem.
Later, I am going to show the results of the evaluation I carried out.
And finally, I am going to end with the conclusions and future work
context: Broadly speaking, software components, often, do not work in isolation, but are built on top of platforms that provide some functionality. While this offers numerous advantages, it also creates dependencies.
The evolution of these platforms is a common situation.
One paradigmatic platform is a database. In the database world, the evolution of schemas has always been a concern. This is the platform that has been used in this work
Another example would be APIs, whose evolution can leave client applications outdated.
Two characteristics of platforms that make them problematic are:
On the one hand, the perpetual beta phenomenon. This means that developers have to work with software components that are in a beta version as if they were production-ready. This increases the frequency of releases, and therefore the number of required co-evolution actions.
- On the other hand, the platform is often an external dependency, i.e., it belongs to a different organization. Changes in these components are usually out of the control of the rest of partners, and might be accompanied by poor documentation, lost communication with the partner responsible for the change, and so on. This rules out the possibility of tracking platform upgrades to be later replicated.
M2T transformations are composed of static and dynamic parts. They interleave target platform code, instructions from the transformation language (conditional and iteration instructions) and references to the input model.
Platform-specific code (information of the tables, columns, ... ) is embedded in the transformation
Therefore, there is domain variability but not platform variability.
In a database scenario, transformations do not specify but construct SQL scripts. The SQL script is dynamically generated once references to the input model are resolved. In the transformation, there are references to a model whose metamodel is unknown a priori and will be resolved at runtime.
problem: Forward Engineering advocates for code to be generated dynamically through M2T transformations that target a specific platform. In this setting, platform evolution can leave the transformation, and hence the generated code, outdated.
Where is platform-dependent information? In the transformation. MDA guide: 10 years later
http://modeling-languages.com/anybody-using-both-mda-platform-independent-and-platform-specific-models/
Solution: To make transformations more resilient to changes, it is proposed to add an adaptability mechanism for the most vulnerable parts of the transformation, those which are dependent on the platform.
The solution proposed for this problem is to use the well-known adapter pattern from object-oriented design with M2T transformations.
We actually suffered this problem in a project. At CAiSE 2012, a tool called WikiWhirl was presented.
WikiWhirl abstracts wiki structure in terms of mindmaps, where refactoring operations of the wiki (WikiWhirl expressions) are expressed as reshapes of mindmaps. Since wikis end up being supported as DBs, WikiWhirl expressions are transformed into SQL scripts.
WikiWhirl is a Domain-Specific Language (DSL) built on top of MediaWiki. WikiWhirl is interpreted, i.e. a WikiWhirl model (an expression described along the WikiWhirl syntax) delivers an SQL script that is enacted. The matter is that this SQL script goes along the MediaWiki DB schema. If this schema changes, the script might break apart. Since MediaWiki is part of the Wikimedia Foundation, we do not have control upon what and when MediaWiki releases are delivered. And release frequency can be large, which introduces a heavy maintenance burden upon WikiWhirl.
In other words, that mindmap (upper part of the figure) will be manipulated by the user in order to refactor the wiki; finally, those changes will be propagated to the actual wiki, at the bottom of the figure. This begs the question of how to make WikiWhirl resilient to MediaWiki upgrades.
This database scenario is used as a paradigmatic example of platform evolution. A well-known database has been used: the MediaWiki database, used by Wikipedia.
MediaWiki is a wiki engine, currently used by almost 40,000 wikis. In a 4½ year period, the MediaWiki DB had 171 schema upgrades. This gives us an idea of the importance and frequency of changes.
This slide shows a snippet of the transformation from the mindmap to the wiki. These statements are built upon the DB schema of MediaWiki, and in so doing, create an external dependency of WikiWhirl w.r.t. MediaWiki. We can see the table and column names in the prints
To tackle the mentioned problem, data manipulation requests (i.e. insert, delete, update, select) are re-directed to the adapter during the transformation. The adapter outputs the code according to the latest schema release.
In this way, the main source of instability (i.e. schema upgrades) is isolated in the adapter. We do not need to change the transformation.
The process overview would be the following.
First, DB schemas (i.e. New schema, Old schema) are injected as Ecore models with the Schemol tool (step 1); next, the schema difference is computed (i.e. Difference model) with EMFCompare (step 2); finally, this schema difference feeds the adapter used by the transformation (i.e. the MOFScript program).
As everything is implemented in Java and Ant, the whole process can be executed with a batch file.
Inject and compare example: As we can see, “trackbacks” and “math” tables have been removed. “user_options” column has been removed from “user” table and three new columns have been added to “categorylinks” table.
As we can see, both schema versions are transformed into models with Schemol, and after comparing with EMFCompare, a difference model is retrieved.
The Difference model is described as a set of DB operators. Curino et al. proved that a set of (eleven) Schema Modification Operators (SMO) can completely describe a complex schema evolution scenario (in fact, they did the experiment for Wikipedia). The table indicates the frequency of these changes for the MediaWiki case. The most frequent changes (e.g. 'create table', 'add column', 'drop column' or 'rename column') can be identified from schema differences. Complex changes (e.g. 'distribute table' or 'merge table') are a sequence of simple changes. Fortunately, as we can see in the “change type” column, most of the changes are NBC or BRC, which means that human intervention is not required for their adaptation.
For each SMO, there is an adaptation action that restores the consistency. For instance, if a column is removed, that column will be removed as well from the statements.
NBC: Non-Breaking Changes. Changes that do not affect the code or transformation
BRC: Breaking Resolvable Changes: Changes that can be automatically propagated
BUC: Breaking Unresolvable Changes: changes that require human intervention to propagate the changes.
In [Model transformation co-evolution: a semi-automatic approach] we propose some rules (implemented in an M2M transformation) to relate simple changes to build complex ones. For instance, there is a 'move column' case if the same column is deleted from one table and added to another. Unfortunately, the 'distribute table' and 'merge table' cases cannot be automatically detected and therefore are not included in the table. These kinds of changes tend to be scarce. For MediaWiki, 'distribute table' never occurred, while 'merge table' accounts for 1.5% of the total changes.
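The 'move column' rule described above can be sketched in plain Java. This is only an illustration of the idea (the actual rules are implemented as an M2M transformation); the `Change` record and the method name are hypothetical:

```java
import java.util.ArrayList;
import java.util.List;

// Illustrative rule: a "move column" is reported when the same column
// is dropped from one table and added to a different table within the
// same release's set of simple changes.
public class ComplexChangeDetector {

    record Change(String kind, String table, String column) {}

    static List<String> detectMoves(List<Change> simpleChanges) {
        List<String> moves = new ArrayList<>();
        for (Change drop : simpleChanges) {
            if (!drop.kind().equals("drop column")) continue;
            for (Change add : simpleChanges) {
                if (add.kind().equals("add column")
                        && add.column().equals(drop.column())
                        && !add.table().equals(drop.table())) {
                    moves.add("move column " + drop.column()
                            + ": " + drop.table() + " -> " + add.table());
                }
            }
        }
        return moves;
    }
}
```

With this pairing in place, the paired drop/add changes can be reported to the user as one complex change instead of two unrelated simple ones.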
The adapter is platform-specific, and will only adapt SQL code, but it is schema-agnostic (it does not matter what type of DB schema has to be managed).
The approach mainly consists of replacing the “print” statements with invocations to the adapter (e.g. the printSQL function). On each invocation, the adapter checks whether the <SQL statement> acts upon a table that is subject to change. If so, the adapter returns a piece of SQL code compliant with the new DB schema.
The adaptations are implemented in a library that has to be imported in the M2T transformation. This library contains functions that adapt the statements to the last version of the platform, leaving the transformation untouched. The adapter is implicitly called by the transformation at runtime, adapting the statements to the changes.
Implementation-wise, the adapter has two inputs: the Difference model and the model for the new schema (to obtain the full description of new attributes, if applicable). The ZQL open-source SQL parser is used to parse SQL statements into Java structures. This parser is extended with adaptation functions that modify the statements (e.g. removeColumn). The snippet provides a glimpse of the adapter for the “remove column” case. The structure is similar for the other adaptations too. It starts by iterating over the changes reported in the Difference model (line 5). Next, it checks (line 6) that the deleted column's table corresponds with the table name of the statement (retrieved in lines 3-4). Then, the statement, the table name and the removed column are all added to a list of parameters (lines 7-10). Finally, the adapter outputs an SQL statement without the removed column, using a function with the list of parameters that modifies the expression (lines 12-13).
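The ZQL-based implementation is not reproduced here, but the essence of the “remove column” adaptation can be sketched with plain string handling. The class and method names are hypothetical, and only simple single-call INSERT statements without nested parentheses in the values are handled:

```java
import java.util.ArrayList;
import java.util.List;

// Illustrative sketch of the "drop column" adaptation. The real
// adapter parses statements with ZQL; here a plain
// INSERT INTO t (c1, c2) VALUES (v1, v2) string is handled instead.
public class DropColumnAdapter {

    public static String removeColumn(String insert, String table, String column) {
        // Only adapt statements that target the changed table
        if (!insert.toLowerCase().contains("insert into " + table.toLowerCase())) {
            return insert;
        }
        int colsStart = insert.indexOf('(');
        int colsEnd = insert.indexOf(')');
        int valsStart = insert.indexOf('(', colsEnd);
        int valsEnd = insert.indexOf(')', valsStart);

        List<String> cols = split(insert.substring(colsStart + 1, colsEnd));
        List<String> vals = split(insert.substring(valsStart + 1, valsEnd));

        int idx = cols.indexOf(column);
        if (idx < 0) return insert; // column not present: nothing to do
        cols.remove(idx);
        vals.remove(idx); // drop the value at the same position

        return insert.substring(0, colsStart + 1) + String.join(", ", cols)
             + insert.substring(colsEnd, valsStart + 1) + String.join(", ", vals)
             + insert.substring(valsEnd);
    }

    private static List<String> split(String csv) {
        List<String> parts = new ArrayList<>();
        for (String p : csv.split(",")) parts.add(p.trim());
        return parts;
    }
}
```

A proper SQL parser such as ZQL avoids the obvious pitfalls of this string-based version (nested parentheses, commas inside literals, multiple statements per line).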
Going back to our scenario, this would be an example of the output:
1. the introduction of three new attributes in the “categorylinks” table, namely, cl_type, cl_sortkey_prefix and cl_collation. Accordingly, the adapter modifies SQL insert/update statements where new columns which are 'Not Null' are initialized with their default values;
2. the deletion of tables “math” and “trackback”. This causes the affected printSQL statements to be left as a comment;
3. the deletion of column “user_options” in the “user” table. Consequently, the affected printSQL statements output the SQL without the affected column. In addition, a comment is introduced to note this fact (lines 8-13 below).
Note that there are two roles. On the one hand, the adapter producers are those who implement the adapter for a specific platform. They have to implement both the injector and the adapter.
On the other hand, the consumers are transformation developers who simply use the adapter.
An evaluation has been carried out that compares the performance of the approach with manual adaptation. The experiment was conducted by 8 PhD students. This evaluation is done from the point of view of the consumer.
There were two groups: one of them had to do the adaptation manually and the other using the adapter:
- Manual: participants had to check the MediaWiki website, navigate through the hyperlinks, and collect those changes that might impact the code. The experiment yielded an average of 38' for D_Mediawiki. I found this was the most cumbersome task, but it will depend on the scenario. Next, the designer peers at the code, updates it, and checks the generated code. On average, this accounts for 4' for a single update (i.e. P).
-Assisted: Participants conducted two tasks: (1) configuration of the batch that launches the assisted adaptation, and (2), verification of the generated SQL script.
(Some developers check what has been generated by the transformation and others do not.)
To compute the profitability of the approach for another platform, it is suggested to apply specific constant values (D, P, V) to cost equations.
D: the time estimated for detecting whether the new MediaWiki release impacts the transformation,
P: the time needed to Propagate a single change to the MOFScript code, and
#Impacts: the number of instructions in the transformation Impacted by the upgrade.
D very much depends on the documentation available.
C: the time needed to Configure the batch;
V: the time needed to Verify that a single automatically adapted instruction is correct and to alter it, if applicable.
The cost reduction rests on the existence of an infrastructure, namely, the adapter and the batch. The adapter is domain-agnostic, and hence can be reused in other domains. On these grounds, I do not consider the adapter as part of the development effort. As I said, the evaluation is from the point of view of the consumer. However, there is a cost of familiarizing with the tool, which includes the configuration of the batch (e.g. DB settings, file paths and the like), and above all, the learning time. We estimated this accounts for 120' (reflected as the upfront investment for the assisted approach in the figure).
On these grounds, the breakeven is reached after the third release.
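The cost model can be made concrete with a small calculation. D and P take the values measured in the experiment (38' and 4'), and the 120' upfront investment comes from the estimate above; the values of C, V and the per-release impact count are illustrative assumptions chosen only to show the mechanics, not the measured ones:

```java
// Sketch of the cost model from the evaluation slide:
//   Manual Cost   = D + P * #Impacts   (per release)
//   Assisted Cost = C + V * #Impacts   (per release, plus a one-off upfront cost)
public class CostModel {

    static final double D = 38;        // detection time per release (measured, min)
    static final double P = 4;         // propagation time per impact (measured, min)
    static final double C = 10;        // batch configuration time (assumed, min)
    static final double V = 2;         // verification time per impact (assumed, min)
    static final double UPFRONT = 120; // one-off learning/setup cost (min)

    static double manualCost(int impacts)   { return D + P * impacts; }
    static double assistedCost(int impacts) { return C + V * impacts; }

    // First release at which the cumulative assisted cost (including the
    // upfront investment) drops below the cumulative manual cost.
    static int breakevenRelease(int impactsPerRelease) {
        double manual = 0, assisted = UPFRONT;
        for (int release = 1; ; release++) {
            manual += manualCost(impactsPerRelease);
            assisted += assistedCost(impactsPerRelease);
            if (assisted < manual) return release;
        }
    }
}
```

With these assumed values and 10 impacts per release, the assisted approach overtakes the manual one at the third release, matching the breakeven reported above.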
After some evolution iterations, the developer may decide that she wants to transfer the changes done by the adapter to the transformation itself.
How can we dump the adaptations done in the code into the transformation?
What I propose is a semi-automatic solution: to do an impact analysis of the changes in the platform so the developer can adapt the transformation.
In the last step of the process (3), apart from adapting the generated code, a record of the changes done is created, in case the developer wants to update the transformation itself; it can serve as an aid. This record contains the platform change, the transformation position affected by it (line and column) and the new statement.
In order to do this impact analysis, it is first necessary to add the line and column of each print in the transformation as parameters. This is done automatically using a Higher-Order Transformation.
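Textually, the effect of that Higher-Order Transformation can be sketched as follows. The real HOT operates on the MOFScript model, not on raw text; this string-based version, with a hypothetical class name, is only illustrative and handles one simple print call per line:

```java
// Sketch of what the HOT does: each print("...") call is rewritten to
// printSQL("...", line, column) so the adapter can later report where
// in the transformation an adapted statement originates.
public class PrintRewriter {

    public static String rewrite(String src) {
        String[] lines = src.split("\n", -1);
        StringBuilder out = new StringBuilder();
        for (int i = 0; i < lines.length; i++) {
            String line = lines[i];
            int col = line.indexOf("print(\"");
            // Only the simple single-call-per-line case is handled here.
            if (col >= 0 && line.endsWith(")")) {
                String arg = line.substring(col + "print(".length(),
                                            line.length() - 1);
                // 1-based line and column, as positions in the transformation
                line = line.substring(0, col) + "printSQL(" + arg
                     + ", " + (i + 1) + ", " + (col + 1) + ")";
            }
            out.append(line);
            if (i < lines.length - 1) out.append('\n');
        }
        return out.toString();
    }
}
```

The injected (line, column) pair is exactly what later shows up in the RECORD entries ("transformation line: 12, column: 11"), letting the developer jump to the affected spot in the transformation.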
[Upload code and screencast of this]
- A preventive approach is advocated where the transformation is engineered for schema instability. In this sense, a mechanism has been presented to adapt generated code in a Forward Engineering scenario to platform evolution.
- It has been applied in a specific case study (MediaWiki and WikiWhirl)
- The suitability of the approach boils down to two main factors: the DB schema instability and the transformation coupling
- The main issue for me is that an adaptability technique has been proposed and tried on only one platform. The question that arises is whether it could be used with other platforms, for instance API evolution, XML configuration files and so on. I am more of a synthetic thinker than an analytic one, so I start from an example and then try to abstract.
- Related to the previous issue: for the role of the producer, a generic methodology is needed that defines the steps to develop an adapter for any domain.
This evaluation has limitations: the number of participants is smaller than recommended, and there are few DB iterations. Regarding the participant number (8), I could not find more people with the required knowledge.
Thank you for listening. Now it's question time; I give the floor to you.
More information on this traceability model and its visualization will be given at ICMT next month.