Creating abstractions from scientific workflows: PhD symposium 2015
1. Date: 04/05/2015
Creation of abstractions
in scientific workflows
Daniel Garijo Verdejo,
Oscar Corcho,
Yolanda Gil
Ontology Engineering Group. Laboratorio de Inteligencia Artificial
Departamento de Inteligencia Artificial
Facultad de Informática
Universidad Politécnica de Madrid
2. Overview: In Silico Scientific workflows
Benefits:
•Sharing and reusing previous work
•Time savings: re-execution of old experiments with different parameters.
•Teaching: new students can learn existing methods in the lab
•Design for modularity, so others can reuse
•Design for standardization, reduction of heterogeneity
•Debugging of executions
•Paper writing, linking execution pipelines to publications.
•Reproducibility.
•Etc.
[Figure labels: Lab book, Digital log, Laboratory protocol (recipe), Workflow, Experiment]
3. Hypotheses
Scientific workflow repositories can be mined automatically
to extract reusable patterns and abstractions that are
useful for workflow developers aiming to reuse existing
workflows.
•H1: It is possible to define common domain independent
patterns based on the functionality of workflow steps.
•H2: It is possible to detect common reusable patterns
automatically.
•H3: Common reusable patterns are potentially useful for users
4. Challenges
•Workflow representation
•Heterogeneous representations.
•Lack of a standard
•Lack of methodologies for publishing workflows.
•Workflow abstraction
•There are no catalogs of the typical abstractions that can be found in
scientific workflows based on their basic step functionality.
•Difficulty in relating workflows.
•Workflow reuse
•Difficult to determine which parts of a workflow could be reused in
another workflow
•Workflow annotation and documentation
•Manual process
6. Vocabularies and methodologies for representing and publishing workflows
[Figure: workflow publishing pipeline. Stages: Publication, Share, Reuse. Components: WINGS on local laptop, WINGS on shared host, and WINGS on web server (each with Core, Portal, Workflow Template, Workflow Instance, PROV export); Wings workflow generation; OPM/PROV conversion; Linked Data publication; RDF triple store (Workflow Provenance, Workflow Plan); interactive browsing (Pubby frontend); programmatic access (external apps); Users; Other workflow environments.]
Methodology for workflow publishing
Repository of linked workflows:
http://www.opmw.org/sparql
http://purl.org/net/p-plan
http://www.opmw.org/ontology/
Daniel Garijo and Yolanda Gil. A new approach for publishing workflows: abstractions, standards, and linked data. In WORKS '11, ACM, New York, NY, USA, 47-56, 2011.
Daniel Garijo and Yolanda Gil. Augmenting PROV with Plans in P-PLAN: Scientific Processes as Linked Data. In Proceedings of the 2nd International Workshop on Linked Science 2012, Boston, 2012.
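As a rough illustration of programmatic access to the repository of linked workflows, the sketch below builds a GET request URL for the SPARQL endpoint listed on this slide. It only constructs the URL and does not contact the server. The class name `opmw:WorkflowTemplate` and the helper `build_query_url` are assumptions made for illustration; the ontology at http://www.opmw.org/ontology/ is the place to check the exact terms.

```python
from urllib.parse import urlencode

ENDPOINT = "http://www.opmw.org/sparql"  # endpoint listed on the slide

# Assumption: opmw:WorkflowTemplate is used as an illustrative class name;
# verify the exact term against the published ontology before relying on it.
QUERY = """
PREFIX opmw: <http://www.opmw.org/ontology/>
PREFIX rdfs: <http://www.w3.org/2000/01/rdf-schema#>
SELECT ?template ?label WHERE {
  ?template a opmw:WorkflowTemplate .
  OPTIONAL { ?template rdfs:label ?label }
}
LIMIT 10
""".strip()


def build_query_url(endpoint: str, query: str) -> str:
    """Hypothetical helper: encode a SPARQL query as the 'query' GET parameter."""
    return endpoint + "?" + urlencode({"query": query})


url = build_query_url(ENDPOINT, QUERY)
print(url)
```

The resulting URL could be passed to any HTTP client; the response format depends on how the endpoint is configured.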
7. Definition of workflow abstractions
Catalog of common domain-independent
workflow abstractions (motifs)
Data-oriented motifs: What kinds of
data manipulation does the workflow
perform?
Workflow-oriented motifs: How does
the workflow perform its operations?
Based on an analysis of 260 workflows
from 10 domains, built with
5 different workflow systems
http://purl.org/net/wf-motifs#
Daniel Garijo, Pinar Alper, Khalid Belhajjame, Oscar Corcho, Yolanda Gil, Carole Goble, Common motifs in scientific
workflows: An empirical analysis, Future Generation Computer Systems, Volume 36, July 2014, Pages 338-351
8. Finding and evaluating common abstractions
https://github.com/dgarijo/FragFlow
http://purl.org/net/wf-fd
[Figure labels: Graph mining techniques; Workflow fragment representation and linkage; Workflow fragment; Filtering techniques]
Daniel Garijo, Oscar Corcho, Yolanda Gil, Boris A. Gutman, Ivo D. Dinov, Paul Thompson, and Arthur W. Toga. FragFlow: Automated Fragment Detection in Scientific Workflows. In The 10th IEEE International Conference on e-Science, Guaruja, 2014.
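To make the idea of mining common fragments concrete, here is a deliberately simplified sketch. It is not FragFlow's algorithm: instead of general graph mining it only counts two-step fragments (single dataflow edges between step types) and keeps those that occur in at least `min_support` workflows. The corpus, the step names, and the function name are all hypothetical.

```python
from collections import Counter
from typing import Dict, List, Set, Tuple

Edge = Tuple[str, str]   # (producer step type, consumer step type)
Workflow = List[Edge]    # a workflow reduced to its dataflow edges


def frequent_two_step_fragments(
    corpus: Dict[str, Workflow], min_support: int
) -> Dict[Edge, Set[str]]:
    """Map each frequent edge to the set of workflows that contain it."""
    support: Counter = Counter()
    occurrences: Dict[Edge, Set[str]] = {}
    for name, edges in corpus.items():
        for edge in set(edges):  # count each fragment once per workflow
            support[edge] += 1
            occurrences.setdefault(edge, set()).add(name)
    return {e: occurrences[e] for e, n in support.items() if n >= min_support}


# Toy corpus (illustrative only, not taken from the evaluated repositories).
corpus = {
    "wf1": [("Tokenize", "Stem"), ("Stem", "Classify")],
    "wf2": [("Tokenize", "Stem"), ("Stem", "Cluster")],
    "wf3": [("Download", "Tokenize"), ("Tokenize", "Stem"), ("Stem", "Classify")],
}
fragments = frequent_two_step_fragments(corpus, min_support=2)
for edge, workflows in sorted(fragments.items()):
    print(edge, sorted(workflows))
```

Keeping the set of workflows per fragment is one simple way to record linkage: it shows which workflows in the corpus share a given fragment.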
9. Evaluation and results
Scientific workflow repositories can be mined automatically to extract reusable patterns
and abstractions that are useful for workflow developers aiming to reuse existing
workflows.
•Evaluation 1: Comparison against what users defined in the corpus
•Are the detected patterns similar to the ones users identified as useful?
•Depending on the pattern frequency chosen, up to 75% of the detected
patterns are the same as the ones defined by users.
•Evaluation 2: User survey
•Are the detected patterns that do not match any user-defined pattern
useful?
•66%-100% of the proposed patterns were considered useful
•Survey on three corpora.
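The comparison in Evaluation 1 can be read as the fraction of detected patterns that also appear among the user-defined ones. The sketch below shows that calculation under two assumptions that are not stated on the slide: a pattern is modeled as a set of step names, and two patterns match only when those sets are equal. The data is a toy example, not the study's corpus.

```python
from typing import FrozenSet, Set

Pattern = FrozenSet[str]  # assumption: a pattern is identified by its step names


def fraction_matching(detected: Set[Pattern], user_defined: Set[Pattern]) -> float:
    """Share of detected patterns that exactly match a user-defined pattern."""
    if not detected:
        return 0.0
    return len(detected & user_defined) / len(detected)


# Toy data (illustrative only).
detected = {
    frozenset({"Tokenize", "Stem"}),
    frozenset({"Stem", "Classify"}),
    frozenset({"Align", "Register"}),
    frozenset({"Download", "Unzip"}),
}
user_defined = {
    frozenset({"Tokenize", "Stem"}),
    frozenset({"Stem", "Classify"}),
    frozenset({"Align", "Register"}),
}
ratio = fraction_matching(detected, user_defined)
print(f"{ratio:.0%} of detected patterns match user-defined ones")
```

The detected patterns left over after this comparison are the ones a user survey such as Evaluation 2 would ask about.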
10. Summary
•Workflow representation
•Models based on standards for representing workflow provenance and
workflow templates
•Adapted a commonly used methodology for publishing workflows as web
objects.
•Workflow abstraction
•Defined a catalog of common domain independent abstractions, based on
their functionality.
•Provided an ontology for semi-automatic annotation.
•Workflow reuse
•Automatic detection and annotation of common useful patterns in a
workflow corpus.
•Models describing how patterns link and relate different workflows in a
workflow corpus.
11. Collaborators and co-authors
•Daniel Garijo, Oscar Corcho
Ontology Engineering Group, UPM
•Yolanda Gil
Information Sciences Institute, USC
•Boris A. Gutman, Ivo D. Dinov, Paul Thompson, Arthur W. Toga,
Meredith N. Braskie, Derrek Hibar, Xue Hua, Neda Jahanshad.
USC Laboratory of Neuro Imaging
IEEE eScience 2014. Guarujá, Brasil
•Pinar Alper, Khalid Belhajjame, Carole Goble
Editor's Notes
Explain the context: what are scientific workflows and their benefits