This PhD thesis proposes a method for involving end-users in domain-specific language (DSL) development. The method combines agile and model-driven development approaches. It includes stages for analysis, design, and validation. In the analysis stage, end-users provide requirements through user stories, usage scenarios, and a domain model. The design stage specifies syntax and semantics based on these requirements. Validation tests the DSL with end-users. The goal is to guide DSL development throughout the lifecycle while gathering domain experts' knowledge and feedback.
Involving End-Users in DSL Development for Genetic Analysis
1. Centro de Investigación ProS
An Agile Model-Driven Method for Involving
End-Users in DSL Development
MªJosé Villanueva del Pozo
PhD Thesis, 8th of January of 2016
Advisors:
Dr. Óscar Pastor López
Dr. Francisco Valverde Giromé
2. Index
1. PhD Motivation
2. PhD Goals
3. State of the Art
4. Method
5. Validation
6. Demonstration
7. Conclusions and Future Work
5. Motivation
“Software languages that target small domains and whose
language constructs are formed by domain concepts”
Villanueva, 2016
DSLs are a solution for improving understanding in
software development
“Small languages that offer expressive power focused on a
particular problem domain”
Van Deursen, 2000
8. Motivation
Industrial motivation from IMEGEN [1]
“We have challenges analysing genetic data”
“We need to use state-of-the-art analytic tools”
“We need a tool that is highly customizable to each diagnosis”
A single tool is an unsustainable solution. They need an
infrastructure to continuously evolve their genetic analysis
A DSL for specifying genetic analysis
We are experts in neither genetics nor bioinformatics
[1] Instituto de Medicina Genómica. www.imegen.es
9. Motivation
Academic motivation
“We want to develop a DSL for supporting genetic analysis”
“We don’t have enough knowledge about genetics”
“The collaboration of geneticists is essential”
“Geneticists don’t have enough development knowledge”
We need to involve geneticists in the DSL development process
Follow a DSL development method to involve end-users
Current approaches do not take end-users into account
11. PhD Goals
Propose a DSL development approach to involve end-users
1. To support complex application domains
2. A DSL for supporting genetic analysis
12. PhD Goals: Research Questions
RQ1. Is it essential to involve end-users in the development of
a DSL for a complex application domain?
→ Analyse a complex application domain and illustrate the need
to involve end-users in DSL development
RQ2. What approaches are available to involve end-users
in DSL development?
→ Analyse state-of-the-art DSL development approaches that
involve end-users
13. PhD Goals: Research Questions
RQ3. How can we provide a methodological approach to
involve end-users in DSL development?
→ Propose a new method to involve end-users in DSL
development
RQ4. How can we validate that the proposed solution is
suitable for involving end-users in DSL development?
→ Validate the proposed method and apply it with geneticists
15. State of the art
1. Foundations of DSL development: Methodologies, guidelines, and
best practices
• Van Deursen et al. (2000): Terminology
• Spinellis (2001): Design patterns for DSL development
• Mernik et al. (2005): Stages for DSL development
• Voelter et al. (2008): Conceptual foundations, design and
implementation of DSLs
• Strembeck et al. (2009): Systematic approach for guiding DSL developers
16. State of the art
2. DSL development approaches that take end-users into
account:
• Take into account end-user preferences during
development
• Apply an agile process to gather early feedback from end-users
• Involve end-users in development activities
17. State of the art
1. Perez et al. (2011): Best practices from EUD
2. Nishino (2011): Cognitive dimensions and feature heuristics
3. Barisik et al. (2012): Goal-question-metric approach
4. Wuest et al. (2013): Sketching environment
5. Cho et al. (2012): Sketches, shape selection, and questions
6. Kuhrman et al. (2013): A DSL that uses sketches and views
7. Sanchez-Cuadrado et al. (2012): Sketches
8. Canovas et al. (2013): A collaborative infrastructure
18. State of the art
Analysis criteria:
Methodological support: All stages of DSL development
lifecycle are addressed
End-user involvement: Whether end-users are involved in
DSL development tasks and whether best practices from EUD
domain are applied
19. State of the art
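Columns 1-8 are the approaches numbered 1-8 on slide 17. EU Inv: End-user involvement.
Stage | Activity | Criteria | 1 | 2 | 3 | 4 | 5 | 6 | 7 | 8
Analysis | Domain Analysis | Support | S | x | x | S | S | S | S | x
Analysis | Domain Analysis | EU Inv | x | x | x | S | S | S | S | x
Analysis | Domain Model Specification | Support | PS | x | x | x | x | x | x | x
Analysis | Domain Model Specification | EU Inv | x | x | x | x | x | x | x | x
Design | Abstract Syntax Specification | Support | S | x | x | S | S | S | S | S
Design | Abstract Syntax Specification | EU Inv | x | x | x | S | S | x | x | S
Design | Concrete Syntax Specification | Support | S | x | x | S | S | S | x | S
Design | Concrete Syntax Specification | EU Inv | x | x | x | S | S | S | x | S
Design | Semantic Restrictions Specification | Support | x | x | x | PS | S | S | x | x
Design | Semantic Restrictions Specification | EU Inv | x | x | x | x | S | x | x | x
Design | Behavioral Semantics Specification | Support | S | x | x | x | x | x | x | x
Design | Behavioral Semantics Specification | EU Inv | x | x | x | x | x | x | x | x
Testing | DSL infrastructure testing | Support | x | PS | PS | x | x | x | x | x
Testing | DSL infrastructure testing | EU Inv | x | x | PS | x | x | x | x | x
Maintenance | New requirements addition | Support | x | x | x | x | x | x | x | x
Maintenance | New requirements addition | EU Inv | x | x | x | x | x | x | x | x
S: Supported; PS: Partially Supported; x: Not supported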
20. State of the Art
Need for a proposal to fulfil the following requirements:
• Requirement 1: Guidance throughout the complete DSL
development life-cycle.
• Requirement 2: Feasibility of the DSL development time.
• Requirement 3: Gathering domain experts’ knowledge in
all the stages in which they can collaborate.
22. Method: Foundations
An agile model-driven method for involving end-users
[Figure: DSL development stages, as proposed in Mernik et al. (2005)]
23. Method: Foundations
Combination of MDD and Agile practices:
• Brings efficiency to the process
• Provides an interface for end-users to give feedback about certain
DSL artefacts
• Propagates end-users’ feedback across the different DSL
artefacts
24. Method: Foundations
Combination of MDD and Agile practices
MDD practices: conceptual modelling, model transformations
Agile practices: iterative development, user stories, TDD, scenarios, product backlog, architectural envisioning, acceptance tests
26. Method: Illustrative Example
Diagnose Diabetes Mellitus Type 2 (Analysis 1)
Read Variations genotypes from VCF file Patient1.vcf
Annotate Variations with gene, transcripts, polyphen
Filter Variations by genes {ABCC8, CAPN10, KCNJ11, … ,
GPD2, MNTR1B}
Filter Variations by predicted effect polyphen damaging
Report Variations with gene, predicted_effect
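The illustrative analysis above is a sequence of instructions, each starting with an action word. As a minimal sketch of how such a text could be split into instructions (this is not the thesis implementation; the names `Instruction`, `ACTIONS`, and `parse_analysis` are hypothetical, and the gene filter line is left out because its gene list is elided on the slide):

```python
from dataclasses import dataclass
from typing import List


@dataclass
class Instruction:
    action: str     # "Read", "Annotate", "Filter", or "Report"
    arguments: str  # remainder of the line, kept as raw text in this sketch


# Assumption: every instruction has the shape "<Action> Variations <arguments>".
ACTIONS = ("Read", "Annotate", "Filter", "Report")


def parse_analysis(text: str) -> List[Instruction]:
    """Split an analysis description into one Instruction per non-empty line."""
    instructions = []
    for line in text.strip().splitlines():
        line = line.strip()
        if not line:
            continue
        action, _, rest = line.partition(" Variations")
        if action not in ACTIONS:
            raise ValueError(f"Unknown instruction: {line!r}")
        instructions.append(Instruction(action, rest.strip()))
    return instructions


EXAMPLE = """
Read Variations genotypes from VCF file Patient1.vcf
Annotate Variations with gene, transcripts, polyphen
Filter Variations by predicted effect polyphen damaging
Report Variations with gene, predicted_effect
"""

program = parse_analysis(EXAMPLE)
for instruction in program:
    print(instruction.action, "->", instruction.arguments)
```

A real DSL infrastructure would typically be generated from a grammar or metamodel; the line-based split here only illustrates the instruction structure.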
27. Method: The Analysis Stage
Understand the domain and make domain knowledge explicit
Steps: 2.1 Iteration Planning, 2.2 DSL Requirements Specification, 2.3 Domain Modelling
[Figure: process diagram showing the stages Decision, Analysis, Design, Implementation, and Testing]
28. Method: The Analysis Stage
2.1 Iteration Planning
Product Backlog: Classification of Requirements
Previous Iterations (Done)
• Annotate Variations with Gene
• Filter Variations by Gene
• Report Variations’ Properties
• Report Variations’ Gene
Current Iteration (To do)
• Read Genotypes of several samples from a VCF File
• Annotate Variations with Transcripts Names
• Annotate Variations with POLYPHEN predicted effect
• Filter Variations by POLYPHEN predicted effect
• Report Variations’ POLYPHEN predicted effect
29. Method: The Analysis Stage
2.2 DSL Requirements Specification
User Story: Filter Variations by Polyphen predicted effect
Description: “As a geneticist, I want to filter the sample’s variations by the predicted effect by POLYPHEN (probably_damaging, possibly_damaging, benign), so that I can see only the variations that pass the filter”
Role: Geneticist
Mandatory: No
Action: Filter sample’s variations by a set of POLYPHEN predicted effects (benign, possibly_damaging, probably_damaging)
Goal: Seeing only the variations that pass the filter
Acceptance Test: AT1
Description: As a geneticist, given the variations chr2:g.136438366A>G {}, chr11:g.111959693G>T {probably damaging}, chr17:g.41245471C>T {benign}, when I filter the variations by the POLYPHEN predicted effect probably damaging, I will see the variation chr11:g.111959693G>T
Role: Geneticist
Input: chr2:g.136438366A>G {}; chr11:g.111959693G>T {probably damaging}; chr17:g.41245471C>T {benign}
Action: Filter by POLYPHEN damaging
Response: chr11:g.111959693G>T {probably damaging}
Mechanism M1: End-User requirement templates
User Story: Need of end-users
Acceptance Test: Real example of this need
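Because an acceptance test gives concrete input and an expected response, it can be turned into an executable check. The following is a minimal sketch of AT1, assuming that "damaging" in the Action field covers both `possibly_damaging` and `probably_damaging`; the function `filter_by_predicted_effect` and the tuple representation of variations are hypothetical, not part of the thesis.

```python
def filter_by_predicted_effect(variations, accepted_effects):
    """Keep the variations whose POLYPHEN predicted effect is accepted.

    Each variation is a (name, effect) pair; effect is None when the
    variation has no POLYPHEN prediction.
    """
    return [(name, effect) for name, effect in variations
            if effect in accepted_effects]


# Input of acceptance test AT1.
VARIATIONS = [
    ("chr2:g.136438366A>G", None),
    ("chr11:g.111959693G>T", "probably_damaging"),
    ("chr17:g.41245471C>T", "benign"),
]

# Assumption: "damaging" means possibly or probably damaging.
DAMAGING = {"possibly_damaging", "probably_damaging"}

result = filter_by_predicted_effect(VARIATIONS, DAMAGING)

# Expected response of AT1.
assert result == [("chr11:g.111959693G>T", "probably_damaging")]
```

Writing the test before the filter exists is in line with the TDD practice the method adopts.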
30. Method: The Analysis Stage
2.2 DSL Requirements Specification
Usage Scenario: Diabetes Mellitus Type 2 (Analysis 1)
Description: In order to research the diabetes mellitus type 2 disease:
• I want to read the genotypes of several samples from a VCF file.
• I want to annotate the variations with their genes, with all the names of the transcripts that they hit, and the score and predicted effect of POLYPHEN.
• I want to filter the variations by the diabetes genes “ABCC8, CAPN10, KCNJ11, GCGR, SLC2A2, HNF4A, INS, INSR, PPARG, TCFl2, ADIPOQ, AKT2, PAX4, MAPK81p1, GPD2, MNTR1B”, and by “possibly damaging” or “probably damaging” variations according to POLYPHEN.
• I want to create a report with the variations’ main properties, their genes, their transcript names, and their POLYPHEN predictions.
Dependency: DP1
Description: When I filter variations by POLYPHEN predicted effects, if variations have not been annotated with POLYPHEN predicted effect, I will see the error “Variations must be annotated with POLYPHEN predicted effect before filtering”
Precondition: Annotate variations with POLYPHEN predicted effect
Action: Filter variations by a set of POLYPHEN predicted effects
Error Message: “Variations must be annotated with POLYPHEN predicted effect before filtering”
Mechanism M1: End-User requirement templates
Usage Scenario: Real example of several user stories
Dependency: Relationships between user stories
31. Method: The Analysis Stage
2.2 DSL Requirements Specification
Mechanism M1: Language requirement templates
End-user requirement
User Story: Filter Variations by Polyphen predicted effect
Description: “As a geneticist, I want to filter the sample’s variations by the predicted effect by POLYPHEN (probably_damaging, possibly_damaging, benign), so that I can see only the variations that pass the filter”
Role: Geneticist
Mandatory: No
Action: Filter sample’s variations by a set of POLYPHEN predicted effects (benign, possibly_damaging, probably_damaging)
Goal: Seeing only the variations that pass the filter
Language requirement
User Story: Filter by Polyphen predicted effect
Description: As a DSL user, I want to order a filter by a list of POLYPHEN predicted effects, so that variations can be filtered by these predicted effects
Role: DSL user
Mandatory: No
Action: Write Filter and a list of POLYPHEN predicted effects
Goal: Variations can be filtered by these predicted effects
Other templates: Acceptance Test, Dependency, Usage Scenarios
32. Method: The Analysis Stage
2.3 Domain Modelling
Artefacts: Feature Model, Concepts Model, Vocabulary
User Story
As a DSL user, I want to order a filter by a list of POLYPHEN predicted effects, so that variations can be filtered by these predicted effects
Feature Model (from the ACTION of the user story)
[Figure: feature model with the features Genetic Analysis, Variation Analysis, Annotate (Predicted effect), and Filter (Gene, Predicted effect)]
33. Method: The Analysis Stage
2.3 Domain Modelling
Concepts Model (from the ACTION AND GOAL of the user story)
[Figure: concepts model with the concepts Variation and Predicted Effect (attributes: Algorithm name, Effect)]
34. Method: The Analysis Stage
2.3 Domain Modelling
Concepts Model (relationships, from the ACTION of the user story)
[Figure: relationship between Variation and Predicted Effect (attributes: Algorithm name, Effect), with multiplicity *]
35. Method: The Analysis Stage
2.3 Domain Modelling
Vocabulary
Variation: Each of the nucleotides in which the sample differs from a reference sequence
Predicted Effect: Result of the execution of a prediction algorithm that assesses the effect of the variation in an individual
36. Method: The Design Stage
Artefacts that specify syntax and semantics
Steps: 3.1 Syntax Preferences, 3.2 Abstract and Concrete Syntax, 3.3 Semantic Restrictions, 3.3 Behavioral Semantics
37. Method: The Design Stage
3.1 Syntax Preferences
Decision to be made: Internal vs External
Criteria:
1. Existing Language?
2. Programming Libraries
3. Syntax Freedom
4. Learn New Language
38. Method: The Design Stage
3.2 Abstract and Concrete Syntax
[Figure: the Feature Model (Filter with the disjoint features Gene and Predicted effect), the Concepts Model (Variation; Predicted Effect with Algorithm name and Effect), and their relationships are combined into the Abstract Syntax Metamodel (Filter, Gene, PredictedEffectF, and Predicted Effect with AlgorithmName and Effect)]
39. Method: The Design Stage
3.2 Abstract and Concrete Syntax
Mechanism M2: Syntax Questionnaire
Usage Scenario:
As a DSL user, I want to annotate the variations with their POLYPHEN predicted effect and I want to filter the variations by the POLYPHEN predicted effect damaging
Syntax 1:
Annotate Variations with POLYPHEN
Filter Variations by predicted effect POLYPHEN damaging
Syntax 2:
GeneticAnalysis.Annotation(POLYPHEN)
GeneticAnalysis.Filter(POLYPHEN, effect, damaging)
…
Syntax n:
<Annotate><POLYPHEN/></Annotate>
<Filter>
<POLYPHEN>
<effect>damaging</effect>
</POLYPHEN>
</Filter>
Result: Favorite Syntax
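The syntax questionnaire shows the same usage scenario in several candidate notations. One way to produce such candidates is to keep a single abstract representation and render it in each notation. The sketch below assumes this design; the classes `Annotate` and `Filter`, the function `render`, and the style names are hypothetical and are not taken from the thesis.

```python
from dataclasses import dataclass


@dataclass
class Annotate:
    algorithm: str


@dataclass
class Filter:
    algorithm: str
    effect: str


def render(statement, style: str) -> str:
    """Render one abstract statement in a candidate concrete syntax."""
    if isinstance(statement, Annotate):
        candidates = {
            "natural": f"Annotate Variations with {statement.algorithm}",
            "api": f"GeneticAnalysis.Annotation({statement.algorithm})",
            "xml": f"<Annotate><{statement.algorithm}/></Annotate>",
        }
    elif isinstance(statement, Filter):
        candidates = {
            "natural": (f"Filter Variations by predicted effect "
                        f"{statement.algorithm} {statement.effect}"),
            "api": (f"GeneticAnalysis.Filter({statement.algorithm}, "
                    f"effect, {statement.effect})"),
            "xml": (f"<Filter><{statement.algorithm}>"
                    f"<effect>{statement.effect}</effect>"
                    f"</{statement.algorithm}></Filter>"),
        }
    else:
        raise TypeError(f"Unsupported statement: {statement!r}")
    return candidates[style]


scenario = [Annotate("POLYPHEN"), Filter("POLYPHEN", "damaging")]
for style in ("natural", "api", "xml"):
    print(f"[{style}]")
    for statement in scenario:
        print(render(statement, style))
```

Keeping the abstract statements separate from their rendering means the end-users' preferred notation can change without touching the abstract syntax.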
41. Method: The Design Stage
3.3 Semantic Restrictions
Dependency:
When I write filter and a list of POLYPHEN predicted effects, if annotate with POLYPHEN predicted effect has not been written, I will see the error “Variations must be annotated with POLYPHEN predicted effect before filtering”
[Figure: feature model in which, under Variation Analysis, the feature Filter / Predicted Effect has a dependency on the feature Annotate / Predicted Effect]
Integrity Constraint
when PredictedEffectF
if PredictedEffectA exists
then “ok”
else
“Variations must be annotated with POLYPHEN predicted effect before filtering”
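The integrity constraint of this slide can be read as a static check over the instructions of a DSL program. The sketch below is one possible reading, not the thesis implementation: it assumes a program is a list of (action, feature) pairs and that the annotation must appear before the filter, as the error message suggests. The function name and the pair representation are hypothetical.

```python
from typing import List, Tuple

ERROR_MESSAGE = ("Variations must be annotated with POLYPHEN predicted "
                 "effect before filtering")


def check_predicted_effect_dependency(
        program: List[Tuple[str, str]]) -> List[Tuple[int, str]]:
    """Return (position, message) for each filter that violates the constraint.

    Positions are 1-based. A filter by predicted effect is valid only if an
    annotation with predicted effect appears earlier in the program.
    """
    annotated = False
    errors = []
    for position, (action, feature) in enumerate(program, start=1):
        if action == "Annotate" and feature == "PredictedEffect":
            annotated = True
        elif (action == "Filter" and feature == "PredictedEffect"
              and not annotated):
            errors.append((position, ERROR_MESSAGE))
    return errors


valid = [("Annotate", "PredictedEffect"), ("Filter", "PredictedEffect")]
invalid = [("Filter", "PredictedEffect"), ("Annotate", "PredictedEffect")]

print(check_predicted_effect_dependency(valid))
print(check_predicted_effect_dependency(invalid))
```

In a model-driven setting such a rule would more likely be expressed as a constraint on the metamodel; the loop here only makes the rule concrete.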
42. Method: The Design Stage
3.3 Behavioral Semantics
Mechanism M3: Semantic Templates
User Story:
As a DSL user, I want to order a filter by a list of POLYPHEN predicted effects, so that variations can be filtered by these predicted effects
Semantic template
User Story: Filter Variations by predicted effect POLYPHEN
Service Identifier: Ensembl Filter VEP
Source description: Galaxy
Inputs | Description | Type | Constant | Value
Input | File that gathers the variations | DataFile (VCF) | False | -
FilterCriteria | Evaluation expression that indicates the polyphen criteria to filter | String | False | Examples: “Polyphen is benign”, “Polyphen is possibly_damaging”
Outputs | Description | Type | Visibility
annotated_vcf | File that gathers the annotated variations | DataFile (VCF) | True
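A filled-in semantic template is structured data, so it could be stored in a form that later steps can process. The sketch below records the template above with Python dataclasses; the class names, field names, and the `unfilled_fields` helper are hypothetical, and the Constant, Value, and Visibility columns are omitted to keep the example short.

```python
from dataclasses import dataclass, field
from typing import List


@dataclass
class Parameter:
    name: str
    description: str
    type: str


@dataclass
class SemanticTemplate:
    user_story: str
    service_identifier: str
    source: str
    inputs: List[Parameter] = field(default_factory=list)
    outputs: List[Parameter] = field(default_factory=list)

    def unfilled_fields(self) -> List[str]:
        """Names of the top-level fields that were left empty."""
        names = ("user_story", "service_identifier", "source")
        return [name for name in names if not getattr(self, name).strip()]


template = SemanticTemplate(
    user_story="Filter Variations by predicted effect POLYPHEN",
    service_identifier="Ensembl Filter VEP",
    source="Galaxy",
    inputs=[
        Parameter("Input", "File that gathers the variations",
                  "DataFile (VCF)"),
        Parameter("FilterCriteria",
                  "Evaluation expression that indicates the polyphen "
                  "criteria to filter", "String"),
    ],
    outputs=[
        Parameter("annotated_vcf",
                  "File that gathers the annotated variations",
                  "DataFile (VCF)"),
    ],
)

print(template.unfilled_fields())
```

A check such as `unfilled_fields` could help spot incomplete templates before the developers start implementing the corresponding step.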
43. Method: The Implementation Stage
DSL infrastructure for using the DSL
Steps: 4.1 Test Specification, 4.2 DSL Infrastructure Implementation
Model-driven development, MDD (design models) & Test-driven development, TDD (tests)
45. Method: The Testing Stage
End-users test the current DSL release and provide feedback about it
Steps: 5.1 Demonstration, 5.2 DSL Infrastructure Testing
46. Method: The Testing Stage
5.1 Demonstration
Mechanism M4: Demonstration
Usage Scenario
As a DSL user, I want to annotate the variations with their POLYPHEN predicted effect and I want to filter the variations by the POLYPHEN predicted effect damaging
1. Demonstration of one usage scenario
2. Description of editor help and shortcuts
3. Explanation of error messages
4. Generation of the artefacts
50. Method: The Testing Stage
5.2 DSL Infrastructure Testing
Mechanism M5: Testing Questionnaire
Questions for testing Requirements
Coverage:
• Did you find any erroneous step/instruction?
• Did you find in the language any step that contains some erroneous aspect?
• Did you miss any essential step/instruction?
Questions for testing Syntax
Expressivity:
• Would you add, change, remove or reorder any word of the language?
• Is the language easy to understand?
• Is the language intuitive to use?
Coverage:
• Did you find a combination of words that was incorrect but could be written with the DSL?
Questions for testing Semantic restrictions
Expressivity:
• Did you find any error message that you did not understand?
Coverage:
• Did you find a combination of constructs that was incorrect but could be written with the DSL?
• Did you find any step that was dependent on another one but could be written without satisfying that dependency?
Questions for testing Behavioral Semantics
Completeness:
• Do you know any new software that is better suited to implement a step/instruction?
• Did you find any error after executing the generated artefact?
52. Validation
Researching experts’ opinion (Wieringa, 2012)
Goal: Validate whether the proposed mechanisms M1-M5 are
suitable to involve end-users in DSL development
53. Solution Validation
Experiment Methodology: State-of-the-art guidelines
• Experimentation in Software Engineering (Wohlin et al.): For
planning, scoping, executing and analysing data.
• The Method Evaluation Model (Moody et al.): For metrics and
measurement instruments.
Participants: 3 geneticists from INCLIVA [2]
Experiment design: One factor-one treatment
• Factor: Approach to involve end-users in DSL development
• Treatment: The set of mechanisms proposed (M1-M5)
• All subjects applied the same treatment
[2] Instituto de Investigación Sanitaria INCLIVA. www.incliva.es
54. Validation
Research questions of the experiment:
RQ1. Are end-users satisfied with the feedback provided through the involving mechanisms?
• Response variable: End-users’ Satisfaction
• Metric: PEOU and PU
• Measurement procedure: Satisfaction Questionnaire
RQ2. Are developers satisfied with the feedback gathered through the involving mechanisms to build the DSL?
• Response variable: Developers’ Satisfaction
• Metric: Comprehension questions, degree of agreement, and undetected errors
• Measurement procedure: Observation, recording, and analysis of subjects’ feedback and anecdotes
RQ3. How long does the application of the mechanisms for involving end-users take?
• Response variable: Time
• Metric: Minutes
• Measurement procedure: Measurement of time spent
55. Validation
Experiment Procedure
[Figure: experiment procedure across Meeting 1 and Meeting 2: describe domain, select experimental objects, present the experiment, gather details about experimental objects, run experiment (apply mechanism Mi)]
Experimental Objects
EO1. Read VCF file.
EO2. Annotate variations with effect prediction.
EO3. Filter variations by effect prediction.
EO4. Annotate variations with sample frequency.
56. Validation: Results
RQ1: End-users’ perspective
High satisfaction with the mechanisms: High (positive) values of PEOU (perceived ease of use) and PU (perceived usefulness)
RQ2: Developers’ perspective
High satisfaction with the mechanisms: Low values of comprehension questions, moderate values of degree of agreement, and low values of undetected errors
Limitations per mechanism:
M1: Small errors are easy to miss; close relationships with other requirements are easy to miss
M2: Questions about abstract syntax are not friendly for providing feedback; the set of questions is not enough to reach an agreement
M3: Unclear field in the semantic template
M4: Feedback should be encouraged during the demonstration, not at the end
M5: Some ambiguous questions
57. Validation: Results
RQ3: Time spent
• Mechanisms M1, M3 and M5 are the hardest for end-users.
• The participation of end-users in one iteration that addresses four
DSL requirements takes approximately 2h30m.
58. Validation: Conclusions
Assess the satisfaction of end-users with the
mechanisms M1-M5 proposed in the method.
Found limitations that allow us to propose
improvements in the method.
Results are not statistically significant and cannot be
generalized.
Valuable opinions that originate from experts of the
industrial environment.
62. Conclusions: PhD Contributions
1. Analysis of the problem of involving end-users in DSL
development.
2. State-of-the-art regarding DSL development approaches
that involve end-users.
3. An innovative method to involve end-users in DSL
development.
4. A DSL for supporting genetic analysis.
5. Validation of the proposed method together with
geneticists.
63. Conclusions: Research Publications
Type Forum Ranking C1 C2 C3 C4 C5
Regular paper RCIS (2010) Core B - - - -
Short paper CAiSE Forum (2011) - - - - -
Book chapter CAiSE Forum Selected
Papers Springer - - - -
Short paper Bioinformatics (2011) - - - - -
Doctoral consortium RCIS (2012) Core B - -
Regular paper ENASE (2013) Core B - -
Regular paper ISD (2013) Core A - -
Oral communication CONBIOPREVAL - - - - -
Workshop paper COBI (2015) - - - -
Journal Article
(submitted)
Journal of Software and
Systems (2016) JCR - -
64. Conclusions: R&D Collaborations
• Analysis of the problem in a real environment
• Design of the mechanisms of the method
• Feedback about initial versions of the method
• Validation of the method through an empirical experiment
• Application of the method to develop a DSL for genetic
analysis.
Collaboration partner: GEM Biosoft (www.gembiosoft.es)
65. Conclusions: Lessons Learned
Collaboration with geneticists: Developing a DSL for a
complex application domain requires the participation of
domain experts.
Combining MDD and agile principles makes it possible to get
feedback from end-users and to propagate this feedback to the
whole development lifecycle.
The conducted empirical experiment and the application of the
method to develop the DSL for genetic analysis demonstrate
that the proposed method can be applied in practice.
66. Future Work
Method improvements:
• Supporting the gathering of non-functional requirements
• Supporting internal and graphical DSLs
• Exploring semantic specification alternatives
• Providing tool support for the method
• Detailing the Deployment and Maintenance stages
Validation of the current version of the DSL with geneticists.
Good morning. Thank you for the introduction, and thank you to the members of the tribunal for being here today for the presentation of my PhD thesis: An Agile Model-Driven Method for Involving End-Users in DSL Development.
This thesis has been supervised by Dr. Óscar Pastor López and Dr. Francisco Valverde Giromé.
This is the index I am going to follow for this presentation.
First, I want to give you a sense of what this PhD is about; then, in the second point, I will describe the details of the problem and the PhD goals.
After describing the problem, I will provide an overview of the state of the art to find out whether the problem is already solved.
Since the problem is not solved (I am giving you a heads-up now), I will explain the solution that we have proposed in this PhD. Then, I will explain how we have validated the solution and how we have applied it in practice.
Finally, I will state the conclusions and the future work.
First, I want to explain what this PhD is about. This PhD is about software languages and how they have traditionally helped us to communicate with computers.
Initially, communicating with computers meant writing programs in binary code. However, the difficulty of this task has led us to constantly seek better abstractions that allow us to describe more complex programs easily: first using assembly language, and then using general-purpose languages such as Python.
But the interest in higher abstractions is not unique to software developers. Similar procedures are applied in other domains such as genetics. In this domain, geneticists try to understand the behavior of DNA first by encoding it with letters that represent nucleotides and amino acids, and then by means of pathways that represent the interactions among their chemical bases.
For software developers, general-purpose languages were not enough. In the search for even higher abstractions to improve understanding in software development, the software engineering community proposed domain-specific languages (DSLs), which are, according to Van Deursen, “small languages that offer expressive power focused on a particular problem domain”.
Examples of DSLs for software development are:
SQL, a textual DSL for describing operations over databases, and the UML Activity Diagram, a visual DSL for describing the requirements of a system by means of activities to accomplish.
Traditionally, developers have been developing DSLs that facilitate their own tasks while developing software. This kind of DSL targets technical domains such as database management or web application development.
However, with time, DSLs have gained interest not only among software developers but also among experts in application domains such as seismology, genetics, or aviation. This is where development becomes a challenge.
For technical domains, the domain experts and the developers are the same people, or at least, acquiring the domain knowledge is not that difficult for a software developer.
However, for application domains, the domain experts are not the developers, and there is a huge knowledge gap between them.
In this PhD, we aim to reduce this knowledge gap between experts and developers while developing DSLs for complex application domains.
The motivation of this thesis originated in industry, when we started collaborating with IMEGEN, a Valencian SME whose expertise is the analysis of genetic and genomic diagnoses.
In summary, what they told us is “We have problems analysing genetic data”, “We need to use state-of-the-art analytic tools”, and “We need a tool that is highly customizable to our needs”.
As a result of this collaboration, our diagnosis is that a new genetic tool is an unsustainable solution because the domain is in constant evolution. They needed an infrastructure for customizing their genetic analyses.
Therefore, a possible solution worth exploring is the creation of a DSL for supporting genetic analysis. But how can we develop this DSL if we are not experts in genetic analysis?
This problem became an academic problem, because we wanted to develop a DSL for supporting genetic analysis, but we do not have knowledge of genetics. To fill this gap, we need the collaboration of geneticists to develop this DSL, but they do not have development knowledge.
Our diagnosis is that we need guidance to involve geneticists in the development of this DSL.
As a solution, we could follow a DSL development approach to involve end-users in DSL development.
The problem is that traditional DSL development approaches do not take end-users into account.
Now that we know what this PhD is about, we provide further details about the problem it addresses.
As a consequence, the goal of this PhD is to propose a DSL development approach to involve end-users.
An approach that allows us to involve geneticists in the development of this DSL, but also an approach that can be used by future developers to involve end-users of any other complex application domain.
Now that we know the problem and the goals of the PhD, we look into the state of the art to see whether this problem is already solved.
While reviewing the state of the art of DSL development, we found methodologies, guidelines, and best practices. However, few consider end-users during the development process.
From them, we selected the ones that take end-users into account and grouped them into three categories. First, the ones that take the preferences of end-users into account during DSL development, although they do not involve them in the process. Examples of this are the work of Perez et al. and the work of Nishino. Second, the ones that apply an agile process to get early feedback from end-users. Examples of this kind are Sadilek and Barisik et al.
Third, the ones that involve end-users in the DSL development. Examples of this kind are Wuest et al., Cho et al., Sanchez-Cuadrado et al., and Canovas et al.
Only take end-users into account: Two of the approaches focus on taking end-users into account but do not involve them in the process (Perez et al. and Nishino).
Most supported stages: The rest of the approaches involve end-users in some activities of the analysis and design stages.
Unsupported activities: No approach involves end-users in behavioural semantics specification or in the maintenance stage.
Partially supported activities: Only one approach involves end-users in the testing stage, and only to ask about usability (Barisik et al.).
Completeness: No approach can be applied in practice from the Decision stage to the Maintenance stage.
Therefore, next we describe our proposed solution.
Our solution to fulfill the three aforementioned requirements is an agile model-driven method to involve end-users in DSL development.
The approach to building this method consists of adopting the stages and patterns for DSL development from Mernik et al., so that we could build the stages and steps of the method.
We followed the guidelines of Strembeck and Voelter et al. (focusing on MDD practices) to propose the steps and artefacts of the method.
And we observed the agile practices of agile methods such as XP, Scrum and Agile Modeling, in order to propose a set of mechanisms for involving end-users.
Following the agile practices “iteration planning” and “incremental design”, we organized the development process as an iterative cycle made of the stages Analysis, Design, Implementation and Testing.
Decision is left outside the cycle because the decision to develop a DSL is only addressed once. Deployment and Maintenance are also outside the cycle because they are only addressed when there is a stable version of the DSL that is worth trying by the end-users.
This could be a possible structure to specify this trip: a set of language constructs that the travel agent can specify so that the system books the hotel, the restaurant, and the tickets for the museum and the disco.
We are going to use this example to explain the stages, steps, artefacts, and end-user involvement mechanisms of the proposed method.
Once we have decided to develop this DSL, in the analysis stage, we must understand the domain and make end-users' knowledge explicit.
First, in the iteration planning, we must plan which requirements to address in the current iteration.
In order to keep track of the requirements during development we use a product backlog, which organizes the requirements so that end-users can check the current state of the DSL at any time.
For the example, in the current iteration we place the booking of a hotel by location, a restaurant by name, museum tickets and disco tickets.
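As a rough illustration of this step, the backlog and the iteration planning could be sketched as follows. This is only a minimal sketch: the class names, the `in_current_iteration` flag and the extra requirement "Book flight" are hypothetical and are not part of the method.

```python
from dataclasses import dataclass, field


@dataclass
class Requirement:
    title: str
    in_current_iteration: bool = False  # hypothetical flag, not from the method


@dataclass
class ProductBacklog:
    requirements: list = field(default_factory=list)

    def add(self, title):
        self.requirements.append(Requirement(title))

    def plan_iteration(self, titles):
        """Select which requirements are addressed in the current iteration."""
        wanted = set(titles)
        for req in self.requirements:
            req.in_current_iteration = req.title in wanted

    def current_iteration(self):
        """What end-users would see when they check the current state."""
        return [r.title for r in self.requirements if r.in_current_iteration]


backlog = ProductBacklog()
for title in ["Book hotel by location", "Book restaurant by name",
              "Book museum tickets", "Book disco tickets",
              "Book flight"]:  # "Book flight" is an invented extra requirement
    backlog.add(title)

backlog.plan_iteration(["Book hotel by location", "Book restaurant by name",
                        "Book museum tickets", "Book disco tickets"])
print(backlog.current_iteration())
```

The requirement left out of the iteration simply stays in the backlog for a later cycle.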
Taking into account the product backlog of the current iteration, in the step “DSL Requirements Specification” the developers and the end-users collaborate to describe:
User Stories, which describe how an end-user with a role needs to perform an action to achieve a goal. For the user story “Book hotel by location”, the role is the travel agent, the action is to book a hotel close to a certain location for a range of days, and the goal is that clients have accommodation close to their preferred location during their holidays.
Acceptance Tests, which describe real examples of this need: how an end-user with a role, given a specific input context, executes an action and obtains a response.
Usage Scenarios, which describe an example of a set of these needs together.
These three elements are agile practices for describing requirements in a way that is easy and close to end-users: natural language and a predefined structure. These templates represent the first mechanism (M1) for involving end-users.
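To make the predefined structure more concrete, the user story and the acceptance test described above could be sketched as two small records. The sentence patterns ("As a ... I want to ... so that ..." and "Given ... when ... then ...") are the usual agile conventions and are assumed here, since the exact wording of the templates is not shown; the acceptance test values are invented for illustration.

```python
from dataclasses import dataclass


@dataclass
class UserStory:
    name: str
    role: str
    action: str
    goal: str

    def render(self):
        # Assumed sentence pattern; the actual template may differ.
        return f"As a {self.role}, I want to {self.action} so that {self.goal}."


@dataclass
class AcceptanceTest:
    story: UserStory
    context: str   # the specific input context
    action: str    # what the end-user executes
    response: str  # what the end-user obtains

    def render(self):
        return (f"Given {self.context}, when the {self.story.role} "
                f"{self.action}, then {self.response}.")


story = UserStory(
    name="Book hotel by location",
    role="travel agent",
    action="book a hotel close to a certain location for a range of days",
    goal="clients have accommodation close to their preferred location "
         "during their holidays",
)

# Hypothetical example values, invented for this sketch.
test = AcceptanceTest(
    story=story,
    context="a client who wants to stay near the beach for one week",
    action="books a hotel near the beach for that week",
    response="a booking confirmation is shown",
)

print(story.render())
print(test.render())
```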
These templates describe the requirements of the end-users, not language requirements. Since domain experts are not language developers, we cannot ask them to describe language requirements.
Instead, the developers must obtain the language requirements that derive from the previously described end-user requirements. In order to describe them we use the same template but fill in the information related to language concerns.
For the example, we can see that the user story is…. In the previous slide, we described the need to book a hotel, but now we are describing the need to specify how to book a hotel.
The same applies to the acceptance tests and the usage scenarios.
Once we have obtained the language requirements, in the domain modeling step we make this knowledge explicit by means of a domain model made of a feature model, a concepts model, and a vocabulary.
In order to obtain these models, we use the user stories.
For the feature model, we use the action of the user story to create a feature in the feature model. This feature is a summary of this action.
For the concepts model, we create a concept for each domain concept that is both in the action and the goal of the user story.
Next, we create the relationships between the feature model and the concepts model by observing those concepts that were obtained from the action of the user story.
For the vocabulary, end-users collaborate to define each of the concepts obtained.
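The derivation of the feature model, the concepts model and their relationships can be sketched as below. This is a sketch under assumptions: we assume the developer has already annotated each user story with the domain concepts found in its action and in its goal (the concept names shown are invented), and we read the guideline as creating a concept for every concept found in either part.

```python
from dataclasses import dataclass


@dataclass
class AnnotatedStory:
    feature: str            # summary of the action of the user story
    action_concepts: list   # domain concepts found in the action
    goal_concepts: list     # domain concepts found in the goal


def build_domain_model(stories):
    features = []   # feature model (flat list in this sketch)
    concepts = []   # concepts model
    links = {}      # feature -> concepts obtained from the action
    for story in stories:
        features.append(story.feature)
        for concept in story.action_concepts + story.goal_concepts:
            if concept not in concepts:
                concepts.append(concept)
        links[story.feature] = list(story.action_concepts)
    return features, concepts, links


# Hypothetical annotation of the running example.
stories = [
    AnnotatedStory(
        feature="Book hotel by location",
        action_concepts=["Hotel", "Location", "Date range"],
        goal_concepts=["Client", "Accommodation"],
    )
]

features, concepts, links = build_domain_model(stories)
# The vocabulary starts empty: end-users provide each definition.
vocabulary = {concept: "" for concept in concepts}
print(features, concepts, links)
```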
After the analysis stage, in the Design stage we must create the artefacts that specify the syntax and semantics of the DSL.
According to the patterns of Mernik et al., we must decide between using an internal or external approach for the syntax design. Internal means to use an existing language to build the new DSL while external means to create a new language from scratch.
In order to make this decision, we propose a set of questions to ask end-users about their preferences (Fowler). These questions ask end-users whether they know an existing language, whether they need to use existing programming libraries, whether they need any kind of freedom for the new syntax, and whether they mind learning a new language. We also ask them which of these aspects are more important to them.
With their responses, the developers decide between an internal or an external solution to design the DSL.
From this point, the method only supports external DSLs. As we already explained in the document, this decision was driven by the context of the PhD with regard to the DSL for genetic analysis.
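One possible way to summarize the answers to the preference questions above is a simple weighted tally. In the method the developers make the decision from the responses; the scoring rule below, the direction assigned to each question and the importance weights are all assumptions made only for this sketch.

```python
# Assumed direction of each question: a "yes" favours this approach,
# a "no" favours the other one. This mapping is not part of the method.
YES_FAVOURS = {
    "knows_existing_language": "internal",
    "needs_existing_libraries": "internal",
    "needs_syntax_freedom": "external",
    "minds_learning_new_language": "internal",
}


def suggest_approach(answers, importance):
    """Return the approach with the highest weighted score and the scores."""
    score = {"internal": 0, "external": 0}
    for question, favoured in YES_FAVOURS.items():
        other = "external" if favoured == "internal" else "internal"
        chosen = favoured if answers[question] else other
        score[chosen] += importance[question]
    # On a tie, max() keeps the first key, i.e. "internal".
    return max(score, key=score.get), score


# Hypothetical responses of an end-user.
answers = {
    "knows_existing_language": False,
    "needs_existing_libraries": False,
    "needs_syntax_freedom": True,
    "minds_learning_new_language": False,
}
importance = {
    "knows_existing_language": 1,
    "needs_existing_libraries": 1,
    "needs_syntax_freedom": 3,
    "minds_learning_new_language": 2,
}

approach, score = suggest_approach(answers, importance)
print(approach, score)
```

Such a tally would only be an input for the developers, who still make the final choice.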
Disjoint specialization: when the child features are single choice.
Overlapping specialization: when the child features are multiple choice.
Once we have decided the approach to design the DSL, we must specify the abstract and concrete syntax.
In order to design the syntax we first build the abstract syntax metamodel. To build this model we have proposed a set of guidelines that project the information from the analysis models into entities and relationships of the abstract syntax metamodel.
After designing the abstract syntax metamodel, we must design the concrete syntax.
In order to design this syntax we use one usage scenario from the analysis to propose several syntaxes.
Once we have designed these syntaxes, we hand out a questionnaire to end-users to ask which one they prefer, whether they would rather propose a new one, and to gather feedback about the specific structure.
From their responses, the developers select the favorite syntax.
This questionnaire is the mechanism M2 proposed to involve end-users.
Once we know the end-user preferences, we build the concrete syntax grammar that describes the syntax chosen by end-users.
In order to build this grammar, we propose a set of guidelines that project the information of the abstract syntax metamodel into production rules of the grammar.
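The idea of projecting the metamodel into production rules can be illustrated with a toy generator. The guidelines themselves are not reproduced here, so the projection below (one rule per entity, a keyword followed by its attributes and contained entities, in an Xtext-like notation) is purely an assumption, and the metamodel dictionary is an invented fragment of the trip example.

```python
# Hypothetical fragment of an abstract syntax metamodel for the trip example.
metamodel = {
    "Trip": {"keyword": "trip", "attributes": {},
             "contains": ["Hotel", "Restaurant"]},
    "Hotel": {"keyword": "hotel", "attributes": {"location": "STRING"},
              "contains": []},
    "Restaurant": {"keyword": "restaurant", "attributes": {"name": "STRING"},
                   "contains": []},
}


def to_production_rules(metamodel):
    """Project each entity into one production rule (assumed mapping)."""
    rules = []
    for entity, spec in metamodel.items():
        parts = [f"'{spec['keyword']}'"]
        # Each attribute becomes an assignment to a terminal type.
        parts += [f"{attr}={typ}" for attr, typ in spec["attributes"].items()]
        # Each containment becomes a repeated reference to another rule.
        parts += [f"{child.lower()}s+={child}*" for child in spec["contains"]]
        rules.append(f"{entity}: {' '.join(parts)};")
    return rules


rules = to_production_rules(metamodel)
for rule in rules:
    print(rule)
```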
The next step is to design the semantic restrictions.
We propose to describe them as restrictions in the abstract syntax metamodel and in the concrete syntax grammar.
In order to create them, we have proposed a set of guidelines to project the information of the acceptance tests and the feature model into an integrity constraint of the abstract syntax metamodel. We describe them using a when-if-else structure.
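A when-if-else restriction could be sketched as follows, reading "when" as the situation in which the restriction applies, "if" as the condition that must hold, and "else" as the error raised otherwise. That reading, the `Constraint` class and the date rule are assumptions for illustration; the real constraints are derived from the acceptance tests and the feature model.

```python
from dataclasses import dataclass
from datetime import date
from typing import Callable, Optional


@dataclass
class Constraint:
    when: Callable[[dict], bool]       # does the restriction apply?
    condition: Callable[[dict], bool]  # "if": what must hold
    error: str                         # "else": message when it does not

    def check(self, element) -> Optional[str]:
        if not self.when(element):
            return None
        return None if self.condition(element) else self.error


# Hypothetical restriction for the hotel booking feature.
dates_in_order = Constraint(
    when=lambda e: e.get("feature") == "Book hotel by location",
    condition=lambda e: e["check_in"] < e["check_out"],
    error="check-out must be after check-in",
)

ok = {"feature": "Book hotel by location",
      "check_in": date(2016, 8, 1), "check_out": date(2016, 8, 7)}
bad = {"feature": "Book hotel by location",
       "check_in": date(2016, 8, 7), "check_out": date(2016, 8, 1)}
other = {"feature": "Book restaurant by name"}

for element in (ok, bad, other):
    print(dates_in_order.check(element))
```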
Finally, we propose to describe behavioral semantics by means of services.
First, following the architectural envisioning agile practice, a technological strategy is defined together with end-users.
Then, we propose that end-users and developers collaborate to specify these services in the implementation target chosen.
To do so, for each user story of the analysis we propose to fill in the following template.
For the example, imagine that we have chosen scripts in Unix as the technological implementation strategy.
For the user story “book hotel by location”, we specify that the service identifier is BookAccommodation.pl, which is a desktop program written in Perl provided by Booking.com.
Then, together with the travel agents, we describe the inputs and outputs. Inputs are defined with the fields name, description, type, constant, and value. For example, this service has an input named type, which is the type of accommodation; its type is an enumeration and, since we want to book a hotel, it will always be constant and its value will be “Hotel”.
As output, this service shows a message with the confirmation of the booking; we indicate that it must be shown by setting its visibility to yes.
This template is the mechanism M3 for involving end-users.
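The template for the example could be captured as data along these lines. The field names follow the description above; the second input (`location`), the command-line layout produced by `build_command` and the sample value "Valencia" are assumptions for illustration, since the real interface of BookAccommodation.pl is not described.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class ServiceInput:
    name: str
    description: str
    type: str
    constant: bool
    value: Optional[str] = None  # only set when the input is constant


@dataclass
class ServiceOutput:
    description: str
    visible: bool


@dataclass
class Service:
    user_story: str
    identifier: str
    inputs: list
    outputs: list


service = Service(
    user_story="Book hotel by location",
    identifier="BookAccommodation.pl",
    inputs=[
        ServiceInput("type", "type of accommodation", "enumeration",
                     constant=True, value="Hotel"),
        # Hypothetical non-constant input, filled from the DSL specification.
        ServiceInput("location", "preferred location", "string",
                     constant=False),
    ],
    outputs=[ServiceOutput("booking confirmation message", visible=True)],
)


def build_command(service, values):
    """Assemble a command line; the --name value layout is assumed."""
    args = ["perl", service.identifier]
    for inp in service.inputs:
        value = inp.value if inp.constant else values[inp.name]
        args += [f"--{inp.name}", str(value)]
    return args


print(build_command(service, {"location": "Valencia"}))
```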
Once the design artefacts have been created, in the implementation stage, the developers create the complete infrastructure that supports the usage of the DSL, that is, understands specifications written using the DSL and obtains a set of generated artefacts that implement the behavior corresponding to that specification.
The DSL infrastructure is made up of a Parser, a Validator and a Generator.
In order to implement these elements, we apply both MDD using the design models and the agile practice TDD. Hence, the first step of this stage is to specify the set of tests that will be used for the TDD approach.
For the TDD we create three types of tests: syntax tests that check the parser, semantic tests that check the validator and the generator, and target platform tests that check the correctness of the generated artefacts.
First, we create the syntax tests, which are responsible for checking whether the parser is able to understand specifications written using the syntax.
Initially, these tests were supposed to be used to build the parser applying TDD; however, we found a technological approach that already implements the parser applying MDD from the abstract syntax metamodel and the concrete syntax grammar. In this approach, both models are implemented and a generation engine runs automatically to create the source code of the parser.
Once this infrastructure is implemented, the end-users test the current DSL release and provide feedback about it.
In order to fulfill this main goal, we must accomplish four objectives.
Then, we create the Semantic Tests, which test that the semantic restrictions are checked and the corresponding errors arise when these restrictions are violated.
In order to implement the validator, we apply TDD with these tests. We run all the tests and, if some tests do not succeed, we pick them and program the corresponding validation rules that make them succeed. When all the tests succeed, the validator source code is complete.
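The TDD loop for the validator can be sketched with plain test functions. The validation rule, the dictionary representation of a specification and the `run_tests` helper are assumptions made for this sketch; in practice a test framework would play the role of `run_tests`.

```python
def validate(spec):
    """Validator under construction: returns the list of semantic errors."""
    errors = []
    # Hypothetical validation rule, written to make the first test succeed.
    if spec.get("kind") == "hotel" and not spec.get("location"):
        errors.append("a hotel booking requires a location")
    return errors


# Semantic tests: each one checks that a restriction is enforced.
def test_missing_location_is_reported():
    assert validate({"kind": "hotel"}) == [
        "a hotel booking requires a location"]


def test_valid_hotel_booking_passes():
    assert validate({"kind": "hotel", "location": "city centre"}) == []


def run_tests(tests):
    """Run every test and return the names of the ones that fail."""
    failing = []
    for test in tests:
        try:
            test()
        except AssertionError:
            failing.append(test.__name__)
    return failing


tests = [test_missing_location_is_reported, test_valid_hotel_booking_passes]
failing = run_tests(tests)
# An empty list means the validator covers all the specified restrictions;
# otherwise the failing names tell us which rules still have to be written.
print(failing)
```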
And finally, the target platform tests check the behavior of the generated artefacts according to the needs expressed by the end-users.
In order to implement these fragments, we apply TDD with these tests. We run all the tests and, if some tests do not succeed, we pick them and program the corresponding fragments that make them succeed. In order to program this source code we use the semantic templates. When all the tests succeed, the set of fragments is complete.