Involving End-Users in DSL Development for Genetic Analysis

Centro de Investigación ProS
An Agile Model-Driven Method for Involving
End-Users in DSL Development
MªJosé Villanueva del Pozo
PhD Thesis, 8 January 2016
Advisors:
Dr. Óscar Pastor López
Dr. Francisco Valverde Giromé

Index
1. PhD Motivation
2. PhD Goals
3. State of the Art
4. Method
5. Validation
6. Demonstration
7. Conclusions and Future Work

Motivation
Figure: binary code and assembly placed along an axis of increasing abstraction level.
Software languages help us to communicate with computers.

Motivation
Figure: DNA described at increasing abstraction levels, from nucleotides and amino acids up to pathways.
A similar abstraction procedure is applied in other domains.

Motivation
Domain-specific languages (DSLs):
- “Software languages that target small domains and whose language constructs are formed by domain concepts” (Villanueva, 2016)
- “Small languages that offer expressive power focused on a particular problem domain” (Van Deursen, 2000)
DSLs are a solution for improving understanding in software development.

Motivation
- Examples of DSLs for software development exist in visual notations (e.g. activity diagrams) and in textual notations.

Motivation
- Technical domains (e.g. Big Data management, Web development): domain experts ARE developers.
- Application domains (e.g. Genetics, Seismology): domain experts ARE NOT developers.

Motivation
- Industrial motivation from the Instituto de Medicina Genómica¹:
  • “We have challenges in analysing genetic data”
  • “We need to use state-of-the-art analytic tools”
  • “We need a tool that is highly customizable for each diagnosis”
- A single tool is an unsustainable solution. They need an infrastructure to continuously evolve their genetic analyses.
- Proposal: a DSL for specifying genetic analyses.
- However, we are experts in neither genetics nor bioinformatics.
¹ Instituto de Medicina Genómica. www.imegen.es

Motivation
- Academic motivation:
  • “We want to develop a DSL for supporting genetic analysis”
  • “We don’t have enough knowledge about genetics”
  • “The collaboration of geneticists is essential”
  • “Geneticists don’t have enough development knowledge”
- We need to involve geneticists in the DSL development process by following a DSL development method that involves end-users.
- However, current approaches do not take end-users into account.

PhD Goals
- Propose a DSL development approach to involve end-users:
  1. To support complex application domains.
  2. To build a DSL for supporting genetic analysis.

PhD Goals: Research Questions
RQ1. Is it essential to involve end-users in the development of a DSL for a complex application domain?
→ Analyse a complex application domain and illustrate the need to involve end-users in DSL development.
RQ2. What approaches are available to involve end-users in DSL development?
→ Analyse state-of-the-art DSL development approaches that involve end-users.

PhD Goals: Research Questions
RQ3. How can we provide a methodological approach to involve end-users in DSL development?
→ Propose a new method to involve end-users in DSL development.
RQ4. How can we validate that the proposed solution is suitable for involving end-users in DSL development?
→ Validate the proposed method and apply it with geneticists.

State of the art
1. Foundations of DSL development: methodologies, guidelines, and best practices
  • Van Deursen et al. (2000): Terminology
  • Spinellis (2001): Design patterns for DSL development
  • Mernik et al. (2005): Stages of DSL development
  • Voelter et al. (2008): Conceptual foundations, design and implementation of DSLs
  • Strembeck et al. (2009): A systematic approach for guiding DSL developers

State of the art
2. DSL development approaches that take end-users into account:
  • Take end-user preferences into account during development
  • Apply an agile process to gather early feedback from end-users
  • Involve end-users in development activities

State of the art
1. Perez et al. (2011): Best practices from End-User Development (EUD)
2. Nishino (2011): Cognitive dimensions and feature heuristics
3. Barisik et al. (2012): Goal-question-metric approach
4. Wuest et al. (2013): Sketching environment
5. Cho et al. (2012): Sketches, shape selection, and questions
6. Kuhrman et al. (2013): A DSL that uses sketches and views
7. Sanchez-Cuadrado et al. (2012): Sketches
8. Canovas et al. (2013): A collaborative infrastructure

State of the art
Analysis criteria:
- Methodological support: whether all stages of the DSL development life cycle are addressed.
- End-user involvement: whether end-users are involved in DSL development tasks and whether best practices from the EUD domain are applied.

State of the art
Columns 1-8 refer to the eight approaches listed above. Support = methodological support; EU Inv = end-user involvement. S: supported; PS: partially supported; x: not supported.

Stage        Activity                             Criteria  1   2   3   4   5   6   7   8
Analysis     Domain analysis                      Support   S   x   x   S   S   S   S   x
                                                  EU Inv    x   x   x   S   S   S   S   x
             Domain model specification           Support   PS  x   x   x   x   x   x   x
                                                  EU Inv    x   x   x   x   x   x   x   x
Design       Abstract syntax specification        Support   S   x   x   S   S   S   S   S
                                                  EU Inv    x   x   x   S   S   x   x   S
             Concrete syntax specification        Support   S   x   x   S   S   S   x   S
                                                  EU Inv    x   x   x   S   S   S   x   S
             Semantic restrictions specification  Support   x   x   x   PS  S   S   x   x
                                                  EU Inv    x   x   x   x   S   x   x   x
             Behavioral semantics specification   Support   S   x   x   x   x   x   x   x
Testing      DSL infrastructure testing           Support   x   PS  PS  x   x   x   x   x
                                                  EU Inv    x   x   PS  x   x   x   x   x
Maintenance  New requirements addition            Support   x   x   x   x   x   x   x   x

State of the Art
- A proposal is needed that fulfils the following requirements:
  • Requirement 1: Guidance throughout the complete DSL development life cycle.
  • Requirement 2: A feasible DSL development time.
  • Requirement 3: Gathering domain experts’ knowledge in all the stages in which they can collaborate.

Method: Foundations
- An agile model-driven method for involving end-users, structured around the DSL development stages proposed by Mernik et al. (2005).

Method: Foundations
- Combination of model-driven development (MDD) and Agile practices, which provides:
  • Efficiency in the process
  • An interface for end-users to provide feedback about certain DSL artefacts
  • Propagation of end-users’ feedback along the different DSL artefacts

Method: Foundations
- Combination of MDD and Agile practices:
  • MDD: conceptual modelling, model transformations
  • Agile: iterative development, user stories, test-driven development (TDD), scenarios, product backlog, architectural envisioning, acceptance tests

Method: Overview
Iteration planning & Incremental design

Method: Illustrative Example
Diagnose Diabetes Mellitus Type 2 (Analysis 1)
Read Variations genotypes from VCF file Patient1.vcf
Annotate Variations with gene, transcripts, polyphen
Filter Variations by genes {ABCC8, CAPN10, KCNJ11, …, GPD2, MTNR1B}
Filter Variations by predicted effect polyphen damaging
Report Variations with gene, predicted_effect

Method: The Analysis Stage
Goal: understand the domain and make domain knowledge explicit.
Activities: 2.1 Iteration Planning, 2.2 DSL Requirements Specification, 2.3 Domain Modelling.

2.1 Iteration Planning: Product Backlog
Previous iterations (done):
  • Annotate Variations with Gene
  • Filter Variations by Gene
  • Report Variations’ Properties
  • Report Variations’ Gene
Current iteration (to do):
  • Read Genotypes of several samples from a VCF File
  • Annotate Variations with Transcript Names
  • Annotate Variations with POLYPHEN predicted effect
  • Filter Variations by POLYPHEN predicted effect
  • Report Variations’ POLYPHEN predicted effect

2.2 DSL Requirements Specification
User Story: Filter Variations by Polyphen predicted effect
  Description: As a geneticist, I want to filter the sample’s variations by the effect predicted by POLYPHEN (probably_damaging, possibly_damaging, benign), so that I can see only the variations that pass the filter.
  Role: Geneticist
  Mandatory: No
  Action: Filter sample’s variations by a set of POLYPHEN predicted effects (benign, possibly_damaging, probably_damaging)
  Goal: Seeing only the variations that pass the filter
Acceptance Test AT1
  Description: As a geneticist, given the variations chr2:g.136438366A>G {}, chr11:g.111959693G>T {probably damaging}, and chr17:g.41245471C>T {benign}, when I filter the variations by the POLYPHEN predicted effect damaging, I will see the variation chr11:g.111959693G>T.
  Role: Geneticist
  Input: chr2:g.136438366A>G {}; chr11:g.111959693G>T {probably damaging}; chr17:g.41245471C>T {benign}
  Action: Filter by POLYPHEN damaging
  Response: chr11:g.111959693G>T {probably damaging}
Mechanism M1: End-user requirement templates
  • User story: a need of end-users
  • Acceptance test: a real example of this need
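Acceptance test AT1 can be read as an executable check that is written before the filter itself exists. The Python sketch below is only an illustration and rests on several assumptions. Variations are held in memory as a mapping from identifier to PolyPhen effect, with `None` meaning "not predicted". Effects use the underscore spelling from the user story. The umbrella term "damaging" in the Action column is taken to cover both `probably_damaging` and `possibly_damaging`, as in the usage scenario. The names `filter_by_polyphen`, `expand_effects` and `EFFECT_ALIASES` are hypothetical.

```python
from typing import Optional

# Hypothetical in-memory representation: variation identifier -> PolyPhen effect (None if absent).
Variations = dict[str, Optional[str]]

# Assumption: the umbrella term "damaging" stands for both damaging PolyPhen classes.
EFFECT_ALIASES = {"damaging": {"probably_damaging", "possibly_damaging"}}


def expand_effects(effects: set[str]) -> set[str]:
    """Replace umbrella terms such as 'damaging' by the PolyPhen classes they denote."""
    expanded: set[str] = set()
    for effect in effects:
        expanded |= EFFECT_ALIASES.get(effect, {effect})
    return expanded


def filter_by_polyphen(variations: Variations, effects: set[str]) -> list[str]:
    """Keep only the variations whose PolyPhen effect is one of the requested effects."""
    wanted = expand_effects(effects)
    return [vid for vid, effect in variations.items() if effect in wanted]


# Input of acceptance test AT1 (the first variation has no PolyPhen prediction).
at1_input: Variations = {
    "chr2:g.136438366A>G": None,
    "chr11:g.111959693G>T": "probably_damaging",
    "chr17:g.41245471C>T": "benign",
}

# Expected response of AT1: only the probably damaging variation passes the filter.
assert filter_by_polyphen(at1_input, {"damaging"}) == ["chr11:g.111959693G>T"]
```

Because the assertion comes straight from the Input and Response columns, it can be kept as a regression test when later iterations add more effects or algorithms.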

2.2 DSL Requirements Specification
Usage Scenario: Diabetes Mellitus Type 2 (Analysis 1)
  Description: In order to research the diabetes mellitus type 2 disease:
  I want to read the genotypes of several samples from a VCF file.
  I want to annotate the variations with their genes, with all the names of the transcripts that they hit, and with the score and predicted effect of POLYPHEN.
  I want to filter the variations by the diabetes genes “ABCC8, CAPN10, KCNJ11, GCGR, SLC2A2, HNF4A, INS, INSR, PPARG, TCF7L2, ADIPOQ, AKT2, PAX4, MAPK8IP1, GPD2, MTNR1B”, and by “possibly damaging” or “probably damaging” variations according to POLYPHEN.
  I want to create a report with the variations’ main properties, their genes, their transcript names, and their POLYPHEN predictions.
Dependency DP1
  Description: When I filter variations by POLYPHEN predicted effects, if the variations have not been annotated with the POLYPHEN predicted effect, I will see the error “Variations must be annotated with POLYPHEN predicted effect before filtering”.
  Precondition: Annotate variations with POLYPHEN predicted effect
  Action: Filter variations by a set of POLYPHEN predicted effects
  Error message: “Variations must be annotated with POLYPHEN predicted effect before filtering”
Mechanism M1: End-user requirement templates
  • Dependency: relationships between user stories
  • Usage scenario: a real example combining several user stories

2.2 DSL Requirements Specification
Mechanism M1: Language requirement templates
The end-user user story “Filter Variations by Polyphen predicted effect” (role: Geneticist) is rewritten as a language requirement:
User Story: Filter by Polyphen predicted effect
  Description: As a DSL user, I want to write a filter with a list of POLYPHEN predicted effects, so that variations can be filtered by these predicted effects.
  Role: DSL user
  Mandatory: No
  Action: Write Filter and a list of POLYPHEN predicted effects
  Goal: Variations can be filtered by these predicted effects
  Further templates: Acceptance Test, Dependency, Usage Scenarios

2.3 Domain Modelling
Artefacts: feature model, concepts model, and vocabulary.
Action of the user story “As a DSL user, I want to write a filter with a list of POLYPHEN predicted effects, so that variations can be filtered by these predicted effects” → feature model:
Genetic Analysis
  Variation Analysis
    Annotate
      Predicted effect
    Filter
      Gene
      Predicted effect

2.3 Domain Modelling
Action and goal of the user story → concepts model: Variation, and Predicted Effect with the attributes Algorithm name and Effect.

2.3 Domain Modelling
Action of the user story → relationship in the concepts model: a Variation has many (*) Predicted Effects (Algorithm name, Effect).

2.3 Domain Modelling
Vocabulary for the concepts Variation and Predicted Effect (Algorithm name, Effect):
- Variation: each nucleotide in which the sample differs from a reference sequence.
- Predicted Effect: the result of executing a prediction algorithm that assesses the effect of the variation on an individual.
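The concepts model can be mirrored directly in code. The sketch below is minimal and assumes Python dataclasses as the representation. The classes follow the concepts Variation and Predicted Effect and their attributes Algorithm name and Effect. `Algorithm`, `identifier` and `effect_by` are hypothetical names added for illustration. The `*` multiplicity becomes a list.

```python
from dataclasses import dataclass, field
from enum import Enum
from typing import Optional


class Algorithm(Enum):
    """Values of 'Algorithm name'; only POLYPHEN appears in this iteration."""
    POLYPHEN = "POLYPHEN"


@dataclass(frozen=True)
class PredictedEffect:
    """Concept 'Predicted Effect': result of running a prediction algorithm on a variation."""
    algorithm_name: Algorithm
    effect: str  # e.g. "benign", "possibly_damaging", "probably_damaging"


@dataclass
class Variation:
    """Concept 'Variation': a position where the sample differs from the reference sequence."""
    identifier: str  # e.g. "chr17:g.41245471C>T", as written in acceptance test AT1
    # Multiplicity '*': a variation may carry any number of predicted effects.
    predicted_effects: list[PredictedEffect] = field(default_factory=list)

    def effect_by(self, algorithm: Algorithm) -> Optional[str]:
        """Return the effect predicted by `algorithm`, or None if not annotated with it."""
        for prediction in self.predicted_effects:
            if prediction.algorithm_name is algorithm:
                return prediction.effect
        return None


variation = Variation("chr17:g.41245471C>T", [PredictedEffect(Algorithm.POLYPHEN, "benign")])
print(variation.effect_by(Algorithm.POLYPHEN))
```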

Method: The Design Stage
Goal: specify the artefacts that define the syntax and semantics of the DSL.
Activities: 3.1 Syntax Preferences, 3.2 Abstract and Concrete Syntax, 3.3 Semantic Restrictions, 3.4 Behavioral Semantics.

3.1 Syntax Preferences
Decision to be made: internal vs. external DSL, considering:
1. Existing language?
2. Programming libraries
3. Syntax freedom
4. Learning a new language

3.2 Abstract and Concrete Syntax
The feature model and the relationships of the concepts model are used to derive the abstract syntax metamodel:
- Filter is specialized into Gene and PredictedEffectF (disjoint).
- PredictedEffectF contains any number (*) of PredictedEffect elements, each with the attributes AlgorithmName and Effect.

3.2 Abstract and Concrete Syntax
Mechanism M2: Syntax questionnaire
Usage scenario: As a DSL user, I want to annotate the variations with their POLYPHEN predicted effect, and I want to filter the variations by the POLYPHEN predicted effect damaging.
End-users compare candidate syntaxes (Syntax 1, Syntax 2, …, Syntax n) for this scenario and choose their favorite syntax, for example:

Annotate Variations with POLYPHEN
Filter Variations by predicted effect POLYPHEN damaging

GeneticAnalysis.Annotation(POLYPHEN)
GeneticAnalysis.Filter(POLYPHEN, effect, damaging)

<Annotate><POLYPHEN/></Annotate>
<Filter>
  <POLYPHEN>
    <effect>damaging</effect>
  </POLYPHEN>
</Filter>

3.2 Abstract and Concrete Syntax
The abstract syntax model is mapped to a concrete syntax grammar:
Filter := "Filter variations by" (Gene | PredictedEffectF)
Gene := "Gene"
PredictedEffectF := "Predicted effect" PredictedEffect*
PredictedEffect := ALGORITHM EFFECT
enum ALGORITHM = POLYPHEN
enum EFFECT = Damaging
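The grammar is small enough for a hand-written recursive-descent parser. The sketch below is an illustration only, since the method itself obtains the parser through MDD. It assumes whitespace-separated tokens and case-insensitive keywords. Under those assumptions, the statement from the illustrative example, "Filter Variations by predicted effect polyphen damaging", is accepted. `parse_filter`, `ParseError`, `GeneFilter` and `PredictedEffectFilter` are hypothetical names.

```python
from dataclasses import dataclass, field
from typing import Union

ALGORITHMS = {"POLYPHEN"}  # enum ALGORITHM
EFFECTS = {"DAMAGING"}     # enum EFFECT, compared case-insensitively


@dataclass
class GeneFilter:
    """Filter := "Filter variations by" Gene (the grammar lists no gene arguments yet)."""


@dataclass
class PredictedEffectFilter:
    """Filter := "Filter variations by" PredictedEffectF, holding (ALGORITHM, EFFECT) pairs."""
    effects: list[tuple[str, str]] = field(default_factory=list)


class ParseError(ValueError):
    """Raised when the input does not match the grammar."""


def _expect(tokens: list[str], pos: int, phrase: str) -> int:
    """Consume the keyword phrase (case-insensitive) starting at `pos`; return the new position."""
    for word in phrase.split():
        if pos >= len(tokens) or tokens[pos].lower() != word.lower():
            raise ParseError(f"expected {phrase!r} at token {pos}")
        pos += 1
    return pos


def parse_filter(text: str) -> Union[GeneFilter, PredictedEffectFilter]:
    tokens = text.split()
    pos = _expect(tokens, 0, "Filter variations by")
    node: Union[GeneFilter, PredictedEffectFilter]
    if pos < len(tokens) and tokens[pos].lower() == "gene":  # alternative: Gene
        node, pos = GeneFilter(), pos + 1
    else:  # alternative: PredictedEffectF
        pos = _expect(tokens, pos, "Predicted effect")
        node = PredictedEffectFilter()
        # PredictedEffect*: zero or more ALGORITHM EFFECT pairs
        while pos + 1 < len(tokens) and tokens[pos].upper() in ALGORITHMS:
            algorithm, effect = tokens[pos].upper(), tokens[pos + 1].upper()
            if effect not in EFFECTS:
                raise ParseError(f"unknown effect {tokens[pos + 1]!r}")
            node.effects.append((algorithm, effect))
            pos += 2
    if pos != len(tokens):
        raise ParseError(f"unexpected input: {' '.join(tokens[pos:])!r}")
    return node


print(parse_filter("Filter Variations by predicted effect polyphen damaging"))
```

Grouping `(Gene | PredictedEffectF)` matters here. Without the parentheses, the usual precedence would let a bare `PredictedEffectF` form a Filter without the leading keywords.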

3.3 Semantic Restrictions
Dependency (language requirement): When I write a filter with a list of POLYPHEN predicted effects, if the annotation with the POLYPHEN predicted effect has not been written, I will see the error “Variations must be annotated with POLYPHEN predicted effect before filtering”.
Feature model: within Variation Analysis, a dependency links the Predicted Effect of Filter to the Predicted Effect of Annotate.
Integrity constraint:
when PredictedEffectF
  if PredictedEffectA exists
  then “ok”
  else “Variations must be annotated with POLYPHEN predicted effect before filtering”
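The integrity constraint can be checked by walking the statements of a DSL program. The sketch below is minimal and assumes that a program is a list of hypothetical `(verb, algorithm)` pairs. It is slightly stricter than the constraint as written. It requires the annotation to appear before the filter, as the error message ("before filtering") suggests, rather than anywhere in the program. The function name `check_annotate_before_filter` is hypothetical.

```python
# Hypothetical statement model: (verb, algorithm), e.g. ("annotate", "POLYPHEN").
Statement = tuple[str, str]


def check_annotate_before_filter(program: list[Statement]) -> list[str]:
    """Return one error per filter statement that is not preceded by a matching annotation."""
    errors: list[str] = []
    annotated: set[str] = set()  # algorithms whose annotation statement has already been seen
    for verb, algorithm in program:
        if verb == "annotate":
            annotated.add(algorithm)
        elif verb == "filter" and algorithm not in annotated:
            errors.append(
                f"Variations must be annotated with {algorithm} predicted effect before filtering"
            )
    return errors


print(check_annotate_before_filter([("annotate", "POLYPHEN"), ("filter", "POLYPHEN")]))
print(check_annotate_before_filter([("filter", "POLYPHEN")]))
```

In the TDD spirit of the method, the two calls above double as the first test cases for the validator. The first is a valid program and the second violates the dependency.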

3.4 Behavioral Semantics
Mechanism M3: Semantic templates
User story: As a DSL user, I want to write a filter with a list of POLYPHEN predicted effects, so that variations can be filtered by these predicted effects.
Semantic template:
  User story: Filter Variations by predicted effect POLYPHEN
  Service identifier: Ensembl Filter VEP
  Source description: Galaxy
  Inputs (name | description | type | constant | value):
    Input | File that gathers the variations | DataFile (VCF) | False | -
    FilterCriteria | Evaluation expression that indicates the PolyPhen criteria to filter by | String | False | examples: “Polyphen is benign”, “Polyphen is possibly_damaging”
  Outputs (name | description | type | visibility):
    annotated_vcf | File that gathers the annotated variations | DataFile (VCF) | True
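A semantic template links a DSL statement to a service on the target platform. The sketch below encodes the filter template as data and binds its inputs to concrete values. The dataclass layout and the `bind` method are assumptions made for illustration. Nothing here calls Galaxy or the Ensembl VEP filter tool. The field values are copied from the template.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass(frozen=True)
class TemplateInput:
    name: str
    description: str
    data_type: str
    constant: bool
    value: Optional[str] = None  # only meaningful when constant is True


@dataclass(frozen=True)
class SemanticTemplate:
    user_story: str
    service_identifier: str
    source: str
    inputs: tuple[TemplateInput, ...]
    outputs: tuple[str, ...]  # names of the visible outputs

    def bind(self, **values: str) -> dict[str, Optional[str]]:
        """Map every input to a value: constants keep their fixed value,
        non-constant inputs must be supplied by the DSL statement being translated."""
        bound: dict[str, Optional[str]] = {}
        for inp in self.inputs:
            if inp.constant:
                bound[inp.name] = inp.value
            elif inp.name in values:
                bound[inp.name] = values[inp.name]
            else:
                raise KeyError(f"missing value for input {inp.name!r}")
        return bound


FILTER_BY_POLYPHEN = SemanticTemplate(
    user_story="Filter Variations by predicted effect POLYPHEN",
    service_identifier="Ensembl Filter VEP",
    source="Galaxy",
    inputs=(
        TemplateInput("Input", "File that gathers the variations", "DataFile (VCF)", False),
        TemplateInput("FilterCriteria", "Evaluation expression with the PolyPhen criteria",
                      "String", False),
    ),
    outputs=("annotated_vcf",),
)

print(FILTER_BY_POLYPHEN.bind(Input="Patient1.vcf", FilterCriteria="Polyphen is possibly_damaging"))
```

Keeping the template as plain data lets end-users review the same fields they saw in the template during the design stage.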

Method: The Implementation Stage
Goal: build the DSL infrastructure for using the DSL. It combines model-driven development (MDD), which works from the design models, with test-driven development (TDD), which works from the tests.
Activities: 4.1 Test Specification, 4.2 DSL Infrastructure Implementation.

4.1 Test Specification
Tests are specified from: acceptance tests (language requirements), semantic restrictions, acceptance tests (end-user requirements), semantic templates, and target platform code fragments.

Method: The Testing Stage
Goal: end-users test the current DSL release and provide feedback about it.
Activities: 5.1 Demonstration, 5.2 DSL Infrastructure Testing.

5.1 Demonstration
Mechanism M4: Demonstration, guided by a usage scenario:
1. Demonstration of one usage scenario.
2. Description of the editor help and shortcuts.
3. Explanation of the error messages.
4. Generation of the artefacts.

5.2 DSL Infrastructure Testing
Mechanism M5: Testing questionnaire
Questions for testing requirements
  Coverage:
  • Did you find any erroneous step/instruction?
  • Did you find any step in the language that contains some erroneous aspect?
  • Did you miss any essential step/instruction?
Questions for testing syntax
  Expressivity:
  • Would you add, change, remove or reorder any word of the language?
  • Is the language easy to understand?
  • Is the language intuitive to use?
  Coverage:
  • Did you find a combination of words that was incorrect but could be written with the DSL?
Questions for testing semantic restrictions
  Expressivity:
  • Did you find any error message that you did not understand?
  Coverage:
  • Did you find a combination of constructs that was incorrect but could be written with the DSL?
  • Did you find any step that depended on another one but could be written without satisfying that dependency?
Questions for testing behavioral semantics
  Completeness:
  • Do you know any new software that is better suited to implement a step/instruction?
  • Did you find any error after executing the generated artefact?

Validation
- Research method: expert opinion (Wieringa, 2012)
- Goal: validate whether the proposed mechanisms M1-M5 are suitable for involving end-users in DSL development

Solution Validation
- Experiment methodology, following state-of-the-art guidelines:
  • Experimentation in Software Engineering (Wohlin et al.): for planning, scoping, executing, and analysing data.
  • The Method Evaluation Model (Moody et al.): for metrics and measurement instruments.
- Participants: 3 geneticists from the Instituto de Investigación Sanitaria INCLIVA²
- Experiment design: one factor, one treatment
  • Factor: approach to involve end-users in DSL development
  • Treatment: the set of proposed mechanisms (M1-M5)
  • All subjects applied the same treatment
² Instituto de Investigación Sanitaria INCLIVA. www.incliva.es

Validation
Research questions of the experiment:
RQ1. Are end-users satisfied with the feedback provided through the involving mechanisms?
  Response variable: end-users’ satisfaction
  Metric: perceived ease of use (PEOU) and perceived usefulness (PU)
  Measurement procedure: satisfaction questionnaire
RQ2. Are developers satisfied with the feedback gathered through the involving mechanisms to build the DSL?
  Response variable: developers’ satisfaction
  Metric: comprehension questions, degree of agreement, and undetected errors
  Measurement procedure: observation, recording, and analysis of subjects’ feedback and anecdotes
RQ3. How long does the application of the mechanisms for involving end-users take?
  Response variable: time
  Metric: minutes
  Measurement procedure: measurement of the time spent

Validation
- Experiment procedure, in two meetings (Meeting 1 and Meeting 2): describe the domain, select the experimental objects, present the experiment, gather details about the experimental objects, and run the experiment by applying each mechanism Mi.
- Experimental objects:
  EO1. Read VCF file.
  EO2. Annotate variations with effect prediction.
  EO3. Filter variations by effect prediction.
  EO4. Annotate variations with sample frequency.

Validation: Results
RQ1: End-users’ perspective
- High satisfaction with the mechanisms: high (positive) values of PEOU and PU.
RQ2: Developers’ perspective
- High satisfaction with the mechanisms: low values for comprehension questions, moderate values for degree of agreement, and low values for undetected errors.
Limitations found per mechanism:
- M1: Small errors are easy to miss; close relationships with other requirements are easy to miss.
- M2: Questions about the abstract syntax are not friendly for providing feedback; the set of questions is not enough to reach an agreement.
- M3: A field in the semantic template is unclear.
- M4: Feedback should be encouraged during the demonstration, not at the end.
- M5: Some questions are ambiguous.

Validation: Results
RQ3: Time spent
• Mechanisms M1, M3, and M5 are the hardest for end-users.
• End-user participation in one iteration that addresses four DSL requirements takes approximately 2 h 30 min.

Validation: Conclusions
- We assessed the satisfaction of end-users with the mechanisms M1-M5 proposed in the method.
- We found limitations that allow us to propose improvements to the method.
- The results are not statistically significant and cannot be generalized.
- The opinions gathered are valuable because they come from experts in an industrial environment.

Conclusions: PhD Contributions
1. Analysis of the problem of involving end-users in DSL development.
2. A state-of-the-art review of DSL development approaches that involve end-users.
3. A new method to involve end-users in DSL development.
4. A DSL for supporting genetic analysis.
5. Validation of the proposed method together with geneticists.

Conclusions: Research Publications
Each entry lists the type, the forum (year), the ranking where given, and the contributions addressed. C1-C5 are contributions 1-5 listed above.
- Regular paper, RCIS (2010), Core B: C1
- Short paper, CAiSE Forum (2011): C1
- Book chapter, CAiSE Forum Selected Papers, Springer: C1
- Short paper, Bioinformatics (2011): C1
- Doctoral consortium, RCIS (2012), Core B: C1, C2, C3
- Regular paper, ENASE (2013), Core B: C2, C3, C4
- Regular paper, ISD (2013), Core A: C2, C3, C4
- Oral communication, CONBIOPREVAL: C4
- Workshop paper, COBI (2015): C1, C2
- Journal article (submitted), Journal of Software and Systems (2016), JCR: C2, C4, C5

Conclusions: R&D Collaborations with GEM Biosoft³
• Analysis of the problem in a real environment
• Design of the mechanisms of the method
• Feedback about initial versions of the method
• Validation of the method through an empirical experiment
• Application of the method to develop a DSL for genetic analysis
³ GEM Biosoft. www.gembiosoft.es

Conclusions: Lessons Learned
- Collaboration with geneticists: developing a DSL for a complex application domain requires the participation of domain experts.
- Combining MDD and agile principles makes it possible to gather feedback from end-users and propagate it to the whole development life cycle.
- The empirical experiment and the application of the method to develop the DSL for genetic analysis demonstrate that the proposed method can be applied in practice.

Future Work
- Method improvements:
  • Supporting the gathering of non-functional requirements.
  • Supporting internal and graphical DSLs.
  • Exploring semantic specification alternatives.
  • Providing tool support for the method.
  • Detailing the Deployment and Maintenance stages.
- Validation of the current version of the DSL with geneticists.

4.2 DSL Infrastructure Implementation
- Parser (MDD): built from the concrete syntax grammar and the abstract syntax metamodel.
- Validator (TDD).
- Target platform fragments (TDD): built from the semantic templates.
- Code generator (MDD and TDD): built from the abstract syntax metamodel.
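The code generator can be pictured as a model-to-text transformation. It emits one target-platform fragment per statement and chains each step's output into the next step's input. The sketch below is minimal and uses Python's `string.Template`. The fragment texts, the `ANNOTATION_SERVICE` placeholder, the `generate` function and the `stepN.output` naming are all hypothetical. Only the service name "Ensembl Filter VEP" and the `FilterCriteria` input come from the semantic template.

```python
from string import Template

# Hypothetical target-platform fragments, one per statement kind. Only the filter
# service name comes from the semantic template; ANNOTATION_SERVICE is a placeholder.
FRAGMENTS = {
    "annotate": Template("step $n: ANNOTATION_SERVICE algorithm=$arg input=$source"),
    "filter": Template('step $n: Ensembl Filter VEP FilterCriteria="$arg" input=$source'),
}


def generate(statements: list[tuple[str, str]], first_input: str) -> str:
    """Model-to-text: emit one fragment per statement, wiring each step to the previous output."""
    lines = []
    source = first_input
    for n, (kind, arg) in enumerate(statements, start=1):
        lines.append(FRAGMENTS[kind].substitute(n=n, arg=arg, source=source))
        source = f"step{n}.output"  # hypothetical name of the step's output dataset
    return "\n".join(lines)


print(generate([("annotate", "POLYPHEN"),
                ("filter", "Polyphen is possibly_damaging")], "Patient1.vcf"))
```

Because each fragment is a plain template, it can be tested in isolation first (the TDD part). The traversal over the model stays generic (the MDD part).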
