Date: 04/05/2015
Creation of abstractions in scientific workflows
Daniel Garijo Verdejo, Oscar Corcho, Yolanda Gil
Ontology Engineering Group. Laboratorio de Inteligencia Artificial
Departamento de Inteligencia Artificial
Facultad de Informática
Universidad Politécnica de Madrid
2
Overview: in silico scientific workflows
Benefits:
•Sharing and reusing previous work
•Time savings: re-execution of previous experiments with different parameters.
•Teaching: new students can learn existing methods in the lab.
•Design for modularity, so that others can reuse workflow components.
•Design for standardization, reducing heterogeneity.
•Debugging of executions.
•Paper writing: linking execution pipelines to publications.
•Reproducibility.
•Etc.
[Figure: analogy between laboratory and in silico science: lab book / digital log; laboratory protocol (recipe) / workflow; experiment.]
Hypotheses
Scientific workflow repositories can be mined automatically
to extract reusable patterns and abstractions that are
useful for workflow developers aiming to reuse existing
workflows.
•H1: It is possible to define common domain-independent patterns based on the functionality of workflow steps.
•H2: It is possible to detect common reusable patterns automatically.
•H3: Common reusable patterns are potentially useful for users.
3
Challenges
•Workflow representation
•Heterogeneous representations.
•Lack of a standard.
•Lack of methodologies for publishing workflows.
•Workflow abstraction
•There are no catalogs of the typical abstractions found in scientific workflows based on the functionality of their steps.
•Workflows are difficult to relate to each other.
•Workflow reuse
•It is difficult to determine which parts of a workflow could be reused in another workflow.
•Workflow annotation and documentation
•Annotation and documentation are manual processes.
4
Approach
5
Vocabularies and methodologies for representing and publishing workflows
6
[Figure: Methodology for workflow publishing. Components shown: WINGS (Core, Portal) on a local laptop, a shared host, or a web server, each with a Workflow Template, a Workflow Instance, and PROV export; Wings workflow generation; OPM/PROV conversion; Linked Data publication into an RDF triple store; access for users and other workflow environments through interactive browsing (Pubby frontend) and programmatic access (external apps). Stages: Publication, Share, Reuse. Models: Workflow Provenance, Workflow Plan.]
Repository of linked workflows (SPARQL endpoint): http://www.opmw.org/sparql
Vocabularies: P-Plan (http://purl.org/net/p-plan) and OPMW (http://www.opmw.org/ontology/)
Daniel Garijo and Yolanda Gil. A new approach for publishing workflows: abstractions, standards, and linked data. In Proceedings of the Workshop on Workflows in Support of Large-Scale Science (WORKS '11), ACM, New York, NY, USA, 2011, pp. 47-56.
Daniel Garijo and Yolanda Gil. Augmenting PROV with Plans in P-PLAN: Scientific Processes as Linked Data. In Proceedings of the 2nd International Workshop on Linked Science 2012, Boston, 2012.
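To make publishing a workflow template as Linked Data more concrete, the following Python sketch serializes a toy two-step template as N-Triples. It uses P-Plan terms: Plan, Step, Variable, isStepOfPlan, hasInputVar, isOutputVarOf, isPrecededBy and isVariableOfPlan. This is an illustration under stated assumptions, not the WINGS or OPMW exporter. The http://example.org/workflow/ namespace, the function name template_triples and the step and variable names are hypothetical. A real export would also use OPMW and PROV terms.

```python
# Sketch: emit N-Triples for a toy workflow template using P-Plan terms.
# The example.org namespace and all step/variable names are hypothetical.

PPLAN = "http://purl.org/net/p-plan#"
RDF_TYPE = "http://www.w3.org/1999/02/22-rdf-syntax-ns#type"
EX = "http://example.org/workflow/"  # hypothetical namespace


def template_triples(plan: str, steps: dict[str, tuple[list[str], list[str]]]) -> list[str]:
    """Return N-Triples lines for a plan whose steps map to (inputs, outputs)."""
    triples = [(EX + plan, RDF_TYPE, PPLAN + "Plan")]
    for step, (inputs, outputs) in steps.items():
        triples.append((EX + step, RDF_TYPE, PPLAN + "Step"))
        triples.append((EX + step, PPLAN + "isStepOfPlan", EX + plan))
        for var in inputs:
            triples.append((EX + step, PPLAN + "hasInputVar", EX + var))
        for var in outputs:
            triples.append((EX + var, PPLAN + "isOutputVarOf", EX + step))

    # Derive ordering from dataflow: a step is preceded by the producer of each of its inputs.
    producers = {var: step for step, (_, outs) in steps.items() for var in outs}
    for step, (inputs, _) in steps.items():
        for var in inputs:
            if var in producers:
                triples.append((EX + step, PPLAN + "isPrecededBy", EX + producers[var]))

    # Every variable mentioned by a step is declared as a variable of the plan.
    variables = sorted({v for ins, outs in steps.values() for v in ins + outs})
    for var in variables:
        triples.append((EX + var, RDF_TYPE, PPLAN + "Variable"))
        triples.append((EX + var, PPLAN + "isVariableOfPlan", EX + plan))

    return [f"<{s}> <{p}> <{o}> ." for s, p, o in triples]


lines = template_triples("ToyPlan", {
    "Normalize": (["RawData"], ["CleanData"]),
    "Cluster": (["CleanData"], ["Clusters"]),
})
print("\n".join(lines))
```

The sketch derives isPrecededBy from shared variables as a simplification. A real exporter might state step ordering explicitly. The resulting lines are plain N-Triples, so any RDF triple store could load them.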
Definition of workflow abstractions
7
Catalog of common domain-independent workflow abstractions (motifs)
Data-oriented motifs: what kinds of data manipulation does the workflow perform?
Workflow-oriented motifs: how does the workflow perform its operations?
Analysis of 260 workflows from 10 domains, belonging to 5 different workflow systems
http://purl.org/net/wf-motifs#
Daniel Garijo, Pinar Alper, Khalid Belhajjame, Oscar Corcho, Yolanda Gil, and Carole Goble. Common motifs in scientific workflows: An empirical analysis. Future Generation Computer Systems, Volume 36, July 2014, pp. 338-351.
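A catalog of motifs lets you annotate workflow steps and measure how often each motif occurs across a corpus. The sketch below shows one plausible way to compute such prevalence figures in Python. The corpus, the workflow and step identifiers, and the CamelCase motif labels are hypothetical placeholders loosely modeled on data-oriented motifs. See http://purl.org/net/wf-motifs# for the actual terms.

```python
from collections import Counter

# Hypothetical corpus: workflow id -> {step name -> motif labels for that step}.
# Labels are illustrative placeholders; see http://purl.org/net/wf-motifs# for real terms.
corpus = {
    "wf1": {"fetch": ["DataRetrieval"], "convert": ["DataPreparation"], "plot": ["DataVisualization"]},
    "wf2": {"download": ["DataRetrieval"], "reformat": ["DataPreparation"], "align": ["DataAnalysis"]},
    "wf3": {"merge": ["DataPreparation"], "align": ["DataAnalysis"]},
}


def motif_prevalence(corpus: dict[str, dict[str, list[str]]]) -> dict[str, float]:
    """Return, for each motif, the fraction of workflows containing it at least once."""
    counts = Counter()
    for steps in corpus.values():
        # Collect motifs into a set so a motif counts once per workflow.
        counts.update({motif for labels in steps.values() for motif in labels})
    return {motif: n / len(corpus) for motif, n in counts.items()}


prevalence = motif_prevalence(corpus)
# Sort by descending share, then by name, for a deterministic listing.
for motif, share in sorted(prevalence.items(), key=lambda item: (-item[1], item[0])):
    print(f"{motif}: {share:.0%}")
# DataPreparation: 100%
# DataAnalysis: 67%
# DataRetrieval: 67%
# DataVisualization: 33%
```

The code counts a motif once per workflow, even when several steps carry it. The result therefore reads as the share of workflows that exhibit each motif. The data structure also allows a single step to carry more than one motif label.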
Finding and evaluating common abstractions
8
https://github.com/dgarijo/FragFlow
http://purl.org/net/wf-fd
[Figure: FragFlow approach: graph mining techniques; workflow fragment representation and linkage; workflow fragment filtering techniques.]
Daniel Garijo, Oscar Corcho, Yolanda Gil, Boris A. Gutman, Ivo D. Dinov, Paul Thompson, and Arthur W. Toga. FragFlow: Automated Fragment Detection in Scientific Workflows. In Proceedings of the 10th IEEE International Conference on e-Science, Guarujá, 2014.
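FragFlow applies graph mining to workflow graphs and then filters the resulting fragments. The Python sketch below is a deliberately simplified stand-in, not FragFlow's implementation. Instead of general subgraph mining, it only enumerates labeled paths of one or two edges and counts support as the number of workflows that contain each path. It then applies a closed-pattern filter, which drops a fragment when a larger frequent fragment containing it has the same support. The corpus, the step labels and the choice of filter are assumptions made for illustration.

```python
from collections import defaultdict

# Hypothetical corpus: each workflow is a list of dataflow edges between step types.
workflows = {
    "wf1": [("Fetch", "Normalize"), ("Normalize", "Align"), ("Align", "Plot")],
    "wf2": [("Fetch", "Normalize"), ("Normalize", "Align")],
    "wf3": [("Fetch", "Normalize"), ("Normalize", "Cluster")],
}


def path_fragments(edges):
    """Enumerate labeled paths with one or two edges (a tiny stand-in for subgraph mining)."""
    succ = defaultdict(list)
    for a, b in edges:
        succ[a].append(b)
    frags = set()
    for a, b in edges:
        frags.add((a, b))
        for c in succ[b]:
            frags.add((a, b, c))
    return frags


def frequent_fragments(workflows, min_support):
    """Map each fragment to the workflows containing it, keeping those with enough support."""
    occurrences = defaultdict(set)
    for wf_id, edges in workflows.items():
        for frag in path_fragments(edges):
            occurrences[frag].add(wf_id)
    return {f: ids for f, ids in occurrences.items() if len(ids) >= min_support}


def is_subpath(small, big):
    """True if tuple `small` appears as a contiguous run inside tuple `big`."""
    n = len(small)
    return any(big[i:i + n] == small for i in range(len(big) - n + 1))


def filter_closed(frequent):
    """Drop fragments contained in a larger frequent fragment with the same support."""
    return {
        f: ids for f, ids in frequent.items()
        if not any(len(g) > len(f) and is_subpath(f, g) and len(gids) == len(ids)
                   for g, gids in frequent.items())
    }


candidates = filter_closed(frequent_fragments(workflows, min_support=2))
for frag, ids in sorted(candidates.items()):
    print(" -> ".join(frag), sorted(ids))
# Fetch -> Normalize ['wf1', 'wf2', 'wf3']
# Fetch -> Normalize -> Align ['wf1', 'wf2']
```

The fragment Normalize -> Align is frequent, but the filter drops it. It always occurs inside Fetch -> Normalize -> Align, so on its own it adds no information. A full implementation would mine arbitrary subgraphs rather than short paths.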
Evaluation and results
9
Scientific workflow repositories can be mined automatically to extract reusable patterns
and abstractions that are useful for workflow developers aiming to reuse existing
workflows.
•Evaluation 1: Comparison against the patterns users defined in the corpus
•Are our patterns similar to the ones users identified as useful?
•Depending on the pattern frequency threshold, up to 75% of the detected patterns are the same as the ones defined by users.
•Evaluation 2: User survey
•Are the detected patterns that are disjoint from the user-defined ones useful?
•66%-100% of the proposed patterns were considered useful.
•The survey was conducted on three corpora.
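Both evaluations can be phrased as simple set operations over the detected fragments. The sketch below computes the share of detected fragments that users also defined, at a given support threshold. It also lists the remaining candidates that a usefulness survey would cover. It assumes that a detected fragment "matches" a user-defined one only when the two are identical. The fragment supports and the user-defined set are hypothetical, not the corpora used in the evaluation.

```python
# Hypothetical data: fragment -> support (number of workflows it occurs in).
detected = {
    ("Fetch", "Normalize"): 9,
    ("Normalize", "Align"): 7,
    ("Align", "Plot"): 4,
    ("Sort", "Merge"): 2,
    ("Merge", "Plot"): 2,
}
# Hypothetical fragments that users had defined themselves in the corpus.
user_defined = {("Fetch", "Normalize"), ("Normalize", "Align"), ("Align", "Plot")}


def kept_fragments(detected, min_support):
    """Fragments whose support reaches the threshold."""
    return {frag for frag, support in detected.items() if support >= min_support}


def overlap_ratio(detected, user_defined, min_support):
    """Share of kept fragments that exactly match a user-defined fragment."""
    kept = kept_fragments(detected, min_support)
    return len(kept & user_defined) / len(kept) if kept else 0.0


def novel_candidates(detected, user_defined, min_support):
    """Kept fragments disjoint from the user-defined ones, i.e. survey candidates."""
    return sorted(kept_fragments(detected, min_support) - user_defined)


for threshold in (2, 4):
    print(f"min_support={threshold}: overlap={overlap_ratio(detected, user_defined, threshold):.0%}")
print(novel_candidates(detected, user_defined, 2))
# min_support=2: overlap=60%
# min_support=4: overlap=100%
# [('Merge', 'Plot'), ('Sort', 'Merge')]
```

In this toy data, raising the threshold removes the rare fragments and increases the overlap. Measuring overlap while varying the threshold is what yields an "up to" figure.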
Summary
10
•Workflow representation
•Models based on standards for representing workflow provenance and workflow templates.
•Adapted a commonly used methodology for publishing workflows as web objects.
•Workflow abstraction
•Defined a catalog of common domain-independent abstractions based on their functionality.
•Provided an ontology for semi-automatic annotation.
•Workflow reuse
•Automatic detection and annotation of common useful patterns in a workflow corpus.
•Models describing how patterns link different workflows in a workflow corpus.
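One simple way to relate workflows through patterns is to link any two workflows that contain the same fragment. The Python sketch below builds those links from a hypothetical index that maps fragments to the workflows containing them, the kind of output a fragment mining step produces. It does not reproduce the actual terms of the wf-fd vocabulary.

```python
from collections import defaultdict
from itertools import combinations

# Hypothetical mining output: fragment -> ids of workflows containing it.
fragment_index = {
    ("Fetch", "Normalize"): {"wf1", "wf2", "wf3"},
    ("Fetch", "Normalize", "Align"): {"wf1", "wf2"},
    ("Sort", "Merge"): {"wf4"},
}


def related_workflows(index):
    """Map each pair of workflows sharing a fragment to the set of shared fragments."""
    links = defaultdict(set)
    for fragment, wf_ids in index.items():
        # sorted() gives each unordered pair one canonical (a, b) key.
        for a, b in combinations(sorted(wf_ids), 2):
            links[(a, b)].add(fragment)
    return dict(links)


links = related_workflows(fragment_index)
for (a, b), shared in sorted(links.items()):
    print(a, "<->", b, sorted(shared))
```

Here wf4 gets no links because its only fragment occurs in no other workflow. A natural refinement would weight each link by the number or size of the shared fragments.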
11
Collaborators and co-authors
•Daniel Garijo, Oscar Corcho
Ontology Engineering Group, UPM
•Yolanda Gil
Information Sciences Institute, USC
•Boris A. Gutman, Ivo D. Dinov, Paul Thompson, Arthur W. Toga,
Meredith N. Braskie, Derrek Hibar, Xue Hua, Neda Jahanshad
USC Laboratory of Neuro Imaging
IEEE eScience 2014, Guarujá, Brazil
•Pinar Alper, Khalid Belhajjame, Carole Goble

Creating abstractions from scientific workflows: PhD symposium 2015

Editor's Notes

  1. Explain the context: what scientific workflows are and what their benefits are.