SlideShare a Scribd company logo
1 of 25
Date: 10/11/2012
Common Motifs in Scientific
Workflows: An Empirical
Analysis
Daniel Garijo *, Pinar Alper ⱡ, Khalid Belhajjame ⱡ, Oscar Corcho
*, Yolanda Gil Ŧ, Carole Goble ⱡ
* Universidad Politécnica de Madrid,
ⱡUniversity of Manchester,
Ŧ USC Information Sciences Institute
IEEE eScience 2012. Chicago, USA
2
Overview
• Empirical analysis on 177 workflow templates from Taverna and
Wings
• Catalog of recurring patterns: scientific
workflow motifs.
• Data Oriented Motifs
• Workflow Oriented Motifs
•Understandability and reuse
IEEE eScience 2012. Chicago, USA
http://sensefinancial.com/wp-content/uploads/2012/02/contribution.jpg
3
Background
• Workflows as software artifacts that capture the scientific method
• Addition to paper publication
• Reuse
• Existing repositories of workflows (myExperiment)
• Sharing workflows
• Exploring existing workflows.
• PROBLEMS to address:
• Sometimes workflows are difficult to understand
• Workflow descriptions depend on tools/files
• Decay of workflows
• Identify good practices for workflow design
IEEE eScience 2012. Chicago, USA
http://www.myexperiment.org
4
Approach
•Reverse-engineer the set of current practices in workflow
development through an analysis of empirical evidence
•Identify workflow abstractions that would facilitate
understandability and therefore effective re-use
IEEE eScience 2012. Chicago, USA
5
Taverna and Wings
IEEE eScience 2012. Chicago, USA
http://www.taverna.org.uk/
http://www.wings-workflows.org/
6
Workflow Motifs
•Workflow motif: Domain independent conceptual abstraction on the workflow
steps.
1. Data-oriented motifs: What kind of manipulations does the workflow have?
•E.g.:
•Data retrieval
•Data preparation
• etc.
2. Workflow-oriented motifs: How does the workflow perform its operations?
•E.g.:
•Stateful steps
•Stateless steps
•Human interactions
•etc.
IEEE eScience 2012. Chicago, USA
WHAT?
HOW?
7
Data Oriented Motifs
Data-Oriented Motifs
Data Retrieval
Data Preparation
Format Transformation
Input Augmentation
and Output Splitting
Data Organisation
Data Analysis
Data Curation/Cleaning
Data Moving
Data Visualisation
IEEE eScience 2012. Chicago, USA
8
Data Oriented Motifs
Data-Oriented Motifs
Data Retrieval
Data Preparation
Format Transformation
Input Augmentation
and Output Splitting
Data Organisation
Data Analysis
Data Curation/Cleaning
Data Moving
Data Visualisation
IEEE eScience 2012. Chicago, USA
9
Data Oriented Motifs
Data-Oriented Motifs
Data Retrieval
Data Preparation
Format Transformation
Input Augmentation
and Output Splitting
Data Organisation
Data Analysis
Data Curation/Cleaning
Data Moving
Data Visualisation
IEEE eScience 2012. Chicago, USA
10
Data Oriented Motifs
Data-Oriented Motifs
Data Retrieval
Data Preparation
Format Transformation
Input Augmentation
and Output Splitting
Data Organisation
Data Analysis
Data Curation/Cleaning
Data Moving
Data Visualisation
IEEE eScience 2012. Chicago, USA
11
Data Oriented Motifs
Data-Oriented Motifs
Data Retrieval
Data Preparation
Format Transformation
Input Augmentation
and Output Splitting
Data Organisation
Data Analysis
Data Curation/Cleaning
Data Moving
Data Visualisation
IEEE eScience 2012. Chicago, USA
12
Data Oriented Motifs
Data-Oriented Motifs
Data Retrieval
Data Preparation
Format Transformation
Input Augmentation
and Output Splitting
Data Organisation
Data Analysis
Data Curation/Cleaning
Data Moving
Data Visualisation
IEEE eScience 2012. Chicago, USA
13
Workflow Oriented Motifs
Workflow-Oriented Motifs
Intra-Workflow Motifs
Stateful (Asynchronous) Invocations
Stateless (Synchronous) Invocations
Internal Macros
Human Interactions
Inter-Workflow Motifs
Atomic Workflows
Composite Workflows
Workflow Overloading
IEEE eScience 2012. Chicago, USA
14
Workflow Oriented Motifs
Workflow-Oriented Motifs
Intra-Workflow Motifs
Stateful (Asynchronous) Invocations
Stateless (Synchronous) Invocations
Internal Macros
Human Interactions
Inter-Workflow Motifs
Atomic Workflows
Composite Workflows
Workflow Overloading
IEEE eScience 2012. Chicago, USA
15
Workflow Oriented Motifs
Workflow-Oriented Motifs
Intra-Workflow Motifs
Stateful (Asynchronous) Invocations
Stateless (Synchronous) Invocations
Internal Macros
Human Interactions
Inter-Workflow Motifs
Atomic Workflows
Composite Workflows
Workflow Overloading
IEEE eScience 2012. Chicago, USA
16
Workflow Oriented Motifs
Workflow-Oriented Motifs
Intra-Workflow Motifs
Stateful (Asynchronous) Invocations
Stateless (Synchronous) Invocations
Internal Macros
Human Interactions
Inter-Workflow Motifs
Atomic Workflows
Composite Workflows
Workflow Overloading
IEEE eScience 2012. Chicago, USA
17
Workflow Oriented Motifs
Workflow-Oriented Motifs
Intra-Workflow Motifs
Stateful (Asynchronous) Invocations
Stateless (Synchronous) Invocations
Internal Macros
Human Interactions
Inter-Workflow Motifs
Atomic Workflows
Composite Workflows
Workflow Overloading
IEEE eScience 2012. Chicago, USA
18
Experiment setup
IEEE eScience 2012. Chicago, USA
•177 Workflow templates
• 111 from Taverna, sample from myExperiment
• 66 from Wings, available in public server (now as Linked Data)
• Diverse domains
0
5
10
15
20
25
30
35
40
Taverna
Wings
19
Result Summary: Data Oriented Motifs
IEEE eScience 2012. Chicago, USA
•Over 60% of the motifs are data preparation motifs
•Of the 4 subcategories, the most common across domains are output
splitting, input augmentation, and reformatting steps.
•Data retrieval common in domains where curated databases exist
•Data analysis is often the main functionality of the workflow
Data organisation
20
Result Summary: Workflow Oriented Motifs
IEEE eScience 2012. Chicago, USA
• Around 40% composite workflows and internal macros
•Workflow reuse is present even in some atomic workflows
•Human interactions steps increasingly used in some domains
21
Differences and commonalities of the workflow systems
IEEE eScience 2012. Chicago, USA
•Data moving/retrieval, stateful interactions and human interaction steps are
not present in Wings
•Web services (Taverna) versus software components (Wings)
•Wings has layered execution through Pegasus
•Data preparation steps are common in both systems
•Use of sub workflows is high
22
Discussion
IEEE eScience 2012. Chicago, USA
http://www.sandensconsulting.com/images/DataObfuscation.jpg
Our observations:
• Obfuscation of scientific workflows
•The abundance of data preparation
steps make the functionality of the
workflow unclear.
• Decay of scientific workflows
• Create an abstract description.
• Good practices for workflow design
• Sub-workflows
• Workflow overloading
Method in paper
Workflow
•Empirical analysis of scientific workflows
177 workflows
• 2 different systems
• A variety of heterogeneous domains
•Workflow motif catalog
• Data oriented motifs
• Workflow oriented motifs
•Future work: automatic abstractions on workflows
Template analysis
 Trace analysis (provenance)
 Include other workflow systems
23
Conclusions and future work
IEEE eScience 2012. Chicago, USA
24
Who are we?
•Pinar Alper
School of Computer Science, University of Manchester
•Khalid Belhajjame
School of Computer Science, University of Manchester
•Oscar Corcho
Ontology Engineering Group, UPM
•Yolanda Gil
Information Sciences Institute, USC
•Carole Goble
School of Computer Science, University of Manchester
EU Wf4Ever project (270129)
funded under EU FP7 (ICT- 2009.4.1).
(http://www.wf4ever-project.org)
IEEE eScience 2012. Chicago, USA
Date: 10/11/2012
Common Motifs in Scientific
Workflows: An Empirical
Analysis
Daniel Garijo *, Pinar Alper ⱡ, Khalid Belhajjame ⱡ, Oscar Corcho
*, Yolanda Gil Ŧ, Carole Goble ⱡ
* Universidad Politécnica de Madrid,
ⱡUniversity of Manchester,
Ŧ USC Information Sciences Institute
IEEE eScience 2012. Chicago, USA

More Related Content

Similar to Common Motifs in Scientific Workflows: An Empirical Analysis

Creating abstractions from scientific workflows: PhD symposium 2015
Creating abstractions from scientific workflows: PhD symposium 2015Creating abstractions from scientific workflows: PhD symposium 2015
Creating abstractions from scientific workflows: PhD symposium 2015
dgarijo
 
An Overview of VIEW
An Overview of VIEWAn Overview of VIEW
An Overview of VIEW
Shiyong Lu
 

Similar to Common Motifs in Scientific Workflows: An Empirical Analysis (20)

From Scientific Workflows to Research Objects: Publication and Abstraction of...
From Scientific Workflows to Research Objects: Publication and Abstraction of...From Scientific Workflows to Research Objects: Publication and Abstraction of...
From Scientific Workflows to Research Objects: Publication and Abstraction of...
 
Creating abstractions from scientific workflows: PhD symposium 2015
Creating abstractions from scientific workflows: PhD symposium 2015Creating abstractions from scientific workflows: PhD symposium 2015
Creating abstractions from scientific workflows: PhD symposium 2015
 
Data legend dh_benelux_2017.key
Data legend dh_benelux_2017.keyData legend dh_benelux_2017.key
Data legend dh_benelux_2017.key
 
Credible workshop
Credible workshopCredible workshop
Credible workshop
 
Research Objects: more than the sum of the parts
Research Objects: more than the sum of the partsResearch Objects: more than the sum of the parts
Research Objects: more than the sum of the parts
 
From Scientific Workflows to Research Objects: Publication and Abstraction of...
From Scientific Workflows to Research Objects: Publication and Abstraction of...From Scientific Workflows to Research Objects: Publication and Abstraction of...
From Scientific Workflows to Research Objects: Publication and Abstraction of...
 
A Clean Slate?
A Clean Slate?A Clean Slate?
A Clean Slate?
 
Royal society of chemistry activities to develop a data repository for chemis...
Royal society of chemistry activities to develop a data repository for chemis...Royal society of chemistry activities to develop a data repository for chemis...
Royal society of chemistry activities to develop a data repository for chemis...
 
Royal society of chemistry activities to develop a data repository for chemis...
Royal society of chemistry activities to develop a data repository for chemis...Royal society of chemistry activities to develop a data repository for chemis...
Royal society of chemistry activities to develop a data repository for chemis...
 
The Genopolis Microarray database
The Genopolis Microarray databaseThe Genopolis Microarray database
The Genopolis Microarray database
 
ISO 15926 Reference Data Engineering Methodology
ISO 15926 Reference Data Engineering MethodologyISO 15926 Reference Data Engineering Methodology
ISO 15926 Reference Data Engineering Methodology
 
FAIR Software (and Data) Citation: Europe, Research Object Systems, Networks ...
FAIR Software (and Data) Citation: Europe, Research Object Systems, Networks ...FAIR Software (and Data) Citation: Europe, Research Object Systems, Networks ...
FAIR Software (and Data) Citation: Europe, Research Object Systems, Networks ...
 
Linked Open Data: Combining Data for the Social Sciences and Humanities (and ...
Linked Open Data: Combining Data for the Social Sciences and Humanities (and ...Linked Open Data: Combining Data for the Social Sciences and Humanities (and ...
Linked Open Data: Combining Data for the Social Sciences and Humanities (and ...
 
An Overview of VIEW
An Overview of VIEWAn Overview of VIEW
An Overview of VIEW
 
empirical-SLR.pptx
empirical-SLR.pptxempirical-SLR.pptx
empirical-SLR.pptx
 
2013 06-24 Wf4Ever: Annotating research objects (PDF)
2013 06-24 Wf4Ever: Annotating research objects (PDF)2013 06-24 Wf4Ever: Annotating research objects (PDF)
2013 06-24 Wf4Ever: Annotating research objects (PDF)
 
2013 06-24 Wf4Ever: Annotating research objects (PPTX)
2013 06-24 Wf4Ever: Annotating research objects (PPTX)2013 06-24 Wf4Ever: Annotating research objects (PPTX)
2013 06-24 Wf4Ever: Annotating research objects (PPTX)
 
Sharing Science Data: Semantically Reimagining the IUPAC Solubility Series Data
Sharing Science Data: Semantically Reimagining the IUPAC Solubility Series DataSharing Science Data: Semantically Reimagining the IUPAC Solubility Series Data
Sharing Science Data: Semantically Reimagining the IUPAC Solubility Series Data
 
Collaborative Data Analysis with Taverna Workflows
Collaborative Data Analysis with Taverna WorkflowsCollaborative Data Analysis with Taverna Workflows
Collaborative Data Analysis with Taverna Workflows
 
The Research Object Initiative: Frameworks and Use Cases
The Research Object Initiative:Frameworks and Use CasesThe Research Object Initiative:Frameworks and Use Cases
The Research Object Initiative: Frameworks and Use Cases
 

More from dgarijo

Towards Knowledge Graphs of Reusable Research Software Metadata
Towards Knowledge Graphs of Reusable Research Software MetadataTowards Knowledge Graphs of Reusable Research Software Metadata
Towards Knowledge Graphs of Reusable Research Software Metadata
dgarijo
 
Capturing Context in Scientific Experiments: Towards Computer-Driven Science
Capturing Context in Scientific Experiments: Towards Computer-Driven ScienceCapturing Context in Scientific Experiments: Towards Computer-Driven Science
Capturing Context in Scientific Experiments: Towards Computer-Driven Science
dgarijo
 
Automated Hypothesis Testing with Large Scale Scientific Workflows
Automated Hypothesis Testing with Large Scale Scientific WorkflowsAutomated Hypothesis Testing with Large Scale Scientific Workflows
Automated Hypothesis Testing with Large Scale Scientific Workflows
dgarijo
 
PhD Thesis: Mining abstractions in scientific workflows
PhD Thesis: Mining abstractions in scientific workflowsPhD Thesis: Mining abstractions in scientific workflows
PhD Thesis: Mining abstractions in scientific workflows
dgarijo
 

More from dgarijo (20)

FOOPS!: An Ontology Pitfall Scanner for the FAIR principles
FOOPS!: An Ontology Pitfall Scanner for the FAIR principlesFOOPS!: An Ontology Pitfall Scanner for the FAIR principles
FOOPS!: An Ontology Pitfall Scanner for the FAIR principles
 
FAIR Workflows: A step closer to the Scientific Paper of the Future
FAIR Workflows: A step closer to the Scientific Paper of the FutureFAIR Workflows: A step closer to the Scientific Paper of the Future
FAIR Workflows: A step closer to the Scientific Paper of the Future
 
Towards Reusable Research Software
Towards Reusable Research SoftwareTowards Reusable Research Software
Towards Reusable Research Software
 
SOMEF: a metadata extraction framework from software documentation
SOMEF: a metadata extraction framework from software documentationSOMEF: a metadata extraction framework from software documentation
SOMEF: a metadata extraction framework from software documentation
 
A Template-Based Approach for Annotating Long-Tailed Datasets
A Template-Based Approach for Annotating Long-Tailed DatasetsA Template-Based Approach for Annotating Long-Tailed Datasets
A Template-Based Approach for Annotating Long-Tailed Datasets
 
OBA: An Ontology-Based Framework for Creating REST APIs for Knowledge Graphs
OBA: An Ontology-Based Framework for Creating REST APIs for Knowledge GraphsOBA: An Ontology-Based Framework for Creating REST APIs for Knowledge Graphs
OBA: An Ontology-Based Framework for Creating REST APIs for Knowledge Graphs
 
Towards Knowledge Graphs of Reusable Research Software Metadata
Towards Knowledge Graphs of Reusable Research Software MetadataTowards Knowledge Graphs of Reusable Research Software Metadata
Towards Knowledge Graphs of Reusable Research Software Metadata
 
Scientific Software Registry Collaboration Workshop: From Software Metadata r...
Scientific Software Registry Collaboration Workshop: From Software Metadata r...Scientific Software Registry Collaboration Workshop: From Software Metadata r...
Scientific Software Registry Collaboration Workshop: From Software Metadata r...
 
WDPlus: Leveraging Wikidata to Link and Extend Tabular Data
WDPlus: Leveraging Wikidata to Link and Extend Tabular DataWDPlus: Leveraging Wikidata to Link and Extend Tabular Data
WDPlus: Leveraging Wikidata to Link and Extend Tabular Data
 
OKG-Soft: An Open Knowledge Graph With Mathine Readable Scientific Software M...
OKG-Soft: An Open Knowledge Graph With Mathine Readable Scientific Software M...OKG-Soft: An Open Knowledge Graph With Mathine Readable Scientific Software M...
OKG-Soft: An Open Knowledge Graph With Mathine Readable Scientific Software M...
 
Towards Human-Guided Machine Learning - IUI 2019
Towards Human-Guided Machine Learning - IUI 2019Towards Human-Guided Machine Learning - IUI 2019
Towards Human-Guided Machine Learning - IUI 2019
 
Capturing Context in Scientific Experiments: Towards Computer-Driven Science
Capturing Context in Scientific Experiments: Towards Computer-Driven ScienceCapturing Context in Scientific Experiments: Towards Computer-Driven Science
Capturing Context in Scientific Experiments: Towards Computer-Driven Science
 
WIDOCO: A Wizard for Documenting Ontologies
WIDOCO: A Wizard for Documenting OntologiesWIDOCO: A Wizard for Documenting Ontologies
WIDOCO: A Wizard for Documenting Ontologies
 
Automated Hypothesis Testing with Large Scale Scientific Workflows
Automated Hypothesis Testing with Large Scale Scientific WorkflowsAutomated Hypothesis Testing with Large Scale Scientific Workflows
Automated Hypothesis Testing with Large Scale Scientific Workflows
 
OEG tools for supporting Ontology Engineering
OEG tools for supporting Ontology EngineeringOEG tools for supporting Ontology Engineering
OEG tools for supporting Ontology Engineering
 
Reproducibility Using Semantics: An Overview
Reproducibility Using Semantics: An OverviewReproducibility Using Semantics: An Overview
Reproducibility Using Semantics: An Overview
 
PhD Thesis: Mining abstractions in scientific workflows
PhD Thesis: Mining abstractions in scientific workflowsPhD Thesis: Mining abstractions in scientific workflows
PhD Thesis: Mining abstractions in scientific workflows
 
Publicación de datos y métodos científicos en investigación
Publicación de datos y métodos científicos en investigaciónPublicación de datos y métodos científicos en investigación
Publicación de datos y métodos científicos en investigación
 
EDBT 2015: Summer School Overview
EDBT 2015: Summer School OverviewEDBT 2015: Summer School Overview
EDBT 2015: Summer School Overview
 
Similarity in Wikipedia Articles (EDBT Summer School)
Similarity in Wikipedia Articles (EDBT Summer School)Similarity in Wikipedia Articles (EDBT Summer School)
Similarity in Wikipedia Articles (EDBT Summer School)
 

Recently uploaded

Architecting Cloud Native Applications
Architecting Cloud Native ApplicationsArchitecting Cloud Native Applications
Architecting Cloud Native Applications
WSO2
 
Modular Monolith - a Practical Alternative to Microservices @ Devoxx UK 2024
Modular Monolith - a Practical Alternative to Microservices @ Devoxx UK 2024Modular Monolith - a Practical Alternative to Microservices @ Devoxx UK 2024
Modular Monolith - a Practical Alternative to Microservices @ Devoxx UK 2024
Victor Rentea
 

Recently uploaded (20)

How to Troubleshoot Apps for the Modern Connected Worker
How to Troubleshoot Apps for the Modern Connected WorkerHow to Troubleshoot Apps for the Modern Connected Worker
How to Troubleshoot Apps for the Modern Connected Worker
 
[BuildWithAI] Introduction to Gemini.pdf
[BuildWithAI] Introduction to Gemini.pdf[BuildWithAI] Introduction to Gemini.pdf
[BuildWithAI] Introduction to Gemini.pdf
 
ICT role in 21st century education and its challenges
ICT role in 21st century education and its challengesICT role in 21st century education and its challenges
ICT role in 21st century education and its challenges
 
Apidays New York 2024 - Accelerating FinTech Innovation by Vasa Krishnan, Fin...
Apidays New York 2024 - Accelerating FinTech Innovation by Vasa Krishnan, Fin...Apidays New York 2024 - Accelerating FinTech Innovation by Vasa Krishnan, Fin...
Apidays New York 2024 - Accelerating FinTech Innovation by Vasa Krishnan, Fin...
 
FWD Group - Insurer Innovation Award 2024
FWD Group - Insurer Innovation Award 2024FWD Group - Insurer Innovation Award 2024
FWD Group - Insurer Innovation Award 2024
 
Six Myths about Ontologies: The Basics of Formal Ontology
Six Myths about Ontologies: The Basics of Formal OntologySix Myths about Ontologies: The Basics of Formal Ontology
Six Myths about Ontologies: The Basics of Formal Ontology
 
MS Copilot expands with MS Graph connectors
MS Copilot expands with MS Graph connectorsMS Copilot expands with MS Graph connectors
MS Copilot expands with MS Graph connectors
 
Exploring Multimodal Embeddings with Milvus
Exploring Multimodal Embeddings with MilvusExploring Multimodal Embeddings with Milvus
Exploring Multimodal Embeddings with Milvus
 
presentation ICT roal in 21st century education
presentation ICT roal in 21st century educationpresentation ICT roal in 21st century education
presentation ICT roal in 21st century education
 
WSO2's API Vision: Unifying Control, Empowering Developers
WSO2's API Vision: Unifying Control, Empowering DevelopersWSO2's API Vision: Unifying Control, Empowering Developers
WSO2's API Vision: Unifying Control, Empowering Developers
 
Navigating the Deluge_ Dubai Floods and the Resilience of Dubai International...
Navigating the Deluge_ Dubai Floods and the Resilience of Dubai International...Navigating the Deluge_ Dubai Floods and the Resilience of Dubai International...
Navigating the Deluge_ Dubai Floods and the Resilience of Dubai International...
 
Polkadot JAM Slides - Token2049 - By Dr. Gavin Wood
Polkadot JAM Slides - Token2049 - By Dr. Gavin WoodPolkadot JAM Slides - Token2049 - By Dr. Gavin Wood
Polkadot JAM Slides - Token2049 - By Dr. Gavin Wood
 
Architecting Cloud Native Applications
Architecting Cloud Native ApplicationsArchitecting Cloud Native Applications
Architecting Cloud Native Applications
 
Biography Of Angeliki Cooney | Senior Vice President Life Sciences | Albany, ...
Biography Of Angeliki Cooney | Senior Vice President Life Sciences | Albany, ...Biography Of Angeliki Cooney | Senior Vice President Life Sciences | Albany, ...
Biography Of Angeliki Cooney | Senior Vice President Life Sciences | Albany, ...
 
EMPOWERMENT TECHNOLOGY GRADE 11 QUARTER 2 REVIEWER
EMPOWERMENT TECHNOLOGY GRADE 11 QUARTER 2 REVIEWEREMPOWERMENT TECHNOLOGY GRADE 11 QUARTER 2 REVIEWER
EMPOWERMENT TECHNOLOGY GRADE 11 QUARTER 2 REVIEWER
 
Modular Monolith - a Practical Alternative to Microservices @ Devoxx UK 2024
Modular Monolith - a Practical Alternative to Microservices @ Devoxx UK 2024Modular Monolith - a Practical Alternative to Microservices @ Devoxx UK 2024
Modular Monolith - a Practical Alternative to Microservices @ Devoxx UK 2024
 
Introduction to Multilingual Retrieval Augmented Generation (RAG)
Introduction to Multilingual Retrieval Augmented Generation (RAG)Introduction to Multilingual Retrieval Augmented Generation (RAG)
Introduction to Multilingual Retrieval Augmented Generation (RAG)
 
Apidays New York 2024 - The value of a flexible API Management solution for O...
Apidays New York 2024 - The value of a flexible API Management solution for O...Apidays New York 2024 - The value of a flexible API Management solution for O...
Apidays New York 2024 - The value of a flexible API Management solution for O...
 
Boost Fertility New Invention Ups Success Rates.pdf
Boost Fertility New Invention Ups Success Rates.pdfBoost Fertility New Invention Ups Success Rates.pdf
Boost Fertility New Invention Ups Success Rates.pdf
 
Apidays New York 2024 - Scaling API-first by Ian Reasor and Radu Cotescu, Adobe
Apidays New York 2024 - Scaling API-first by Ian Reasor and Radu Cotescu, AdobeApidays New York 2024 - Scaling API-first by Ian Reasor and Radu Cotescu, Adobe
Apidays New York 2024 - Scaling API-first by Ian Reasor and Radu Cotescu, Adobe
 

Common Motifs in Scientific Workflows: An Empirical Analysis

  • 1. Date: 10/11/2012 Common Motifs in Scientific Workflows: An Empirical Analysis Daniel Garijo *, Pinar Alper ⱡ, Khalid Belhajjame ⱡ, Oscar Corcho *, Yolanda Gil Ŧ, Carole Goble ⱡ * Universidad Politécnica de Madrid, ⱡUniversity of Manchester, Ŧ USC Information Sciences Institute IEEE eScience 2012. Chicago, USA
  • 2. 2 Overview • Empirical analysis on 177 workflow templates from Taverna and Wings • Catalog of recurring patterns: scientific workflow motifs. • Data Oriented Motifs • Workflow Oriented Motifs •Understandability and reuse IEEE eScience 2012. Chicago, USA http://sensefinancial.com/wp-content/uploads/2012/02/contribution.jpg
  • 3. 3 Background • Workflows as software artifacts that capture the scientific method • Addition to paper publication • Reuse • Existing repositories of workflows (myExperiment) • Sharing workflows • Exploring existing workflows. • PROBLEMS to address: • Sometimes workflows are difficult to understand • Workflow descriptions depend on tools/files • Decay of workflows • Identify good practices for workflow design IEEE eScience 2012. Chicago, USA http://www.myexperiment.org
  • 4. 4 Approach •Reverse-engineer the set of current practices in workflow development through an analysis of empirical evidence •Identify workflow abstractions that would facilitate understandability and therefore effective re-use IEEE eScience 2012. Chicago, USA
  • 5. 5 Taverna and Wings IEEE eScience 2012. Chicago, USA http://www.taverna.org.uk/ http://www.wings-workflows.org/
  • 6. 6 Workflow Motifs •Workflow motif: Domain independent conceptual abstraction on the workflow steps. 1. Data-oriented motifs: What kind of manipulations does the workflow have? •E.g.: •Data retrieval •Data preparation • etc. 2. Workflow-oriented motifs: How does the workflow perform its operations? •E.g.: •Stateful steps •Stateless steps •Human interactions •etc. IEEE eScience 2012. Chicago, USA WHAT? HOW?
  • 7. 7 Data Oriented Motifs Data-Oriented Motifs Data Retrieval Data Preparation Format Transformation Input Augmentation and Output Splitting Data Organisation Data Analysis Data Curation/Cleaning Data Moving Data Visualisation IEEE eScience 2012. Chicago, USA
  • 8. 8 Data Oriented Motifs Data-Oriented Motifs Data Retrieval Data Preparation Format Transformation Input Augmentation and Output Splitting Data Organisation Data Analysis Data Curation/Cleaning Data Moving Data Visualisation IEEE eScience 2012. Chicago, USA
  • 9. 9 Data Oriented Motifs Data-Oriented Motifs Data Retrieval Data Preparation Format Transformation Input Augmentation and Output Splitting Data Organisation Data Analysis Data Curation/Cleaning Data Moving Data Visualisation IEEE eScience 2012. Chicago, USA
  • 10. 10 Data Oriented Motifs Data-Oriented Motifs Data Retrieval Data Preparation Format Transformation Input Augmentation and Output Splitting Data Organisation Data Analysis Data Curation/Cleaning Data Moving Data Visualisation IEEE eScience 2012. Chicago, USA
  • 11. 11 Data Oriented Motifs Data-Oriented Motifs Data Retrieval Data Preparation Format Transformation Input Augmentation and Output Splitting Data Organisation Data Analysis Data Curation/Cleaning Data Moving Data Visualisation IEEE eScience 2012. Chicago, USA
  • 12. 12 Data Oriented Motifs Data-Oriented Motifs Data Retrieval Data Preparation Format Transformation Input Augmentation and Output Splitting Data Organisation Data Analysis Data Curation/Cleaning Data Moving Data Visualisation IEEE eScience 2012. Chicago, USA
  • 13. 13 Workflow Oriented Motifs Workflow-Oriented Motifs Intra-Workflow Motifs Stateful (Asynchronous) Invocations Stateless (Synchronous) Invocations Internal Macros Human Interactions Inter-Workflow Motifs Atomic Workflows Composite Workflows Workflow Overloading IEEE eScience 2012. Chicago, USA
  • 14. 14 Workflow Oriented Motifs Workflow-Oriented Motifs Intra-Workflow Motifs Stateful (Asynchronous) Invocations Stateless (Synchronous) Invocations Internal Macros Human Interactions Inter-Workflow Motifs Atomic Workflows Composite Workflows Workflow Overloading IEEE eScience 2012. Chicago, USA
  • 15. 15 Workflow Oriented Motifs Workflow-Oriented Motifs Intra-Workflow Motifs Stateful (Asynchronous) Invocations Stateless (Synchronous) Invocations Internal Macros Human Interactions Inter-Workflow Motifs Atomic Workflows Composite Workflows Workflow Overloading IEEE eScience 2012. Chicago, USA
  • 16. 16 Workflow Oriented Motifs Workflow-Oriented Motifs Intra-Workflow Motifs Stateful (Asynchronous) Invocations Stateless (Synchronous) Invocations Internal Macros Human Interactions Inter-Workflow Motifs Atomic Workflows Composite Workflows Workflow Overloading IEEE eScience 2012. Chicago, USA
  • 17. 17 Workflow Oriented Motifs Workflow-Oriented Motifs Intra-Workflow Motifs Stateful (Asynchronous) Invocations Stateless (Synchronous) Invocations Internal Macros Human Interactions Inter-Workflow Motifs Atomic Workflows Composite Workflows Workflow Overloading IEEE eScience 2012. Chicago, USA
  • 18. 18 Experiment setup IEEE eScience 2012. Chicago, USA •177 Workflow templates • 111 from Taverna, sample from myExperiment • 66 from Wings, available in public server (now as Linked Data) • Diverse domains 0 5 10 15 20 25 30 35 40 Taverna Wings
  • 19. 19 Result Summary: Data Oriented Motifs IEEE eScience 2012. Chicago, USA •Over 60% of the motifs are data preparation motifs •Of the 4 subcategories, the most common across domains are output splitting, input augmentation, and reformatting steps. •Data retrieval common in domains where curated databases exist •Data analysis is often the main functionality of the workflow Data organisation
  • 20. 20 Result Summary: Workflow Oriented Motifs IEEE eScience 2012. Chicago, USA • Around 40% composite workflows and internal macros •Workflow reuse is present even in some atomic workflows •Human interactions steps increasingly used in some domains
  • 21. 21 Differences and commonalities of the workflow systems IEEE eScience 2012. Chicago, USA •Data moving/retrieval, stateful interactions and human interaction steps are not present in Wings •Web services (Taverna) versus software components (Wings) •Wings has layered execution through Pegasus •Data preparation steps are common in both systems •Use of sub workflows is high
  • 22. 22 Discussion IEEE eScience 2012. Chicago, USA http://www.sandensconsulting.com/images/DataObfuscation.jpg Our observations: • Obfuscation of scientific workflows •The abundance of data preparation steps make the functionality of the workflow unclear. • Decay of scientific workflows • Create an abstract description. • Good practices for workflow design • Sub-workflows • Workflow overloading Method in paper Workflow
  • 23. •Empirical analysis of scientific workflows 177 workflows • 2 different systems • A variety of heterogeneous domains •Workflow motif catalog • Data oriented motifs • Workflow oriented motifs •Future work: automatic abstractions on workflows Template analysis  Trace analysis (provenance)  Include other workflow systems 23 Conclusions and future work IEEE eScience 2012. Chicago, USA
  • 24. 24 Who are we? •Pinar Alper School of Computer Science, University of Manchester •Khalid Belhajjame School of Computer Science, University of Manchester •Oscar Corcho Ontology Engineering Group, UPM •Yolanda Gil Information Sciences Institute, USC •Carole Goble School of Computer Science, University of Manchester EU Wf4Ever project (270129) funded under EU FP7 (ICT- 2009.4.1). (http://www.wf4ever-project.org) IEEE eScience 2012. Chicago, USA
  • 25. Date: 10/11/2012 Common Motifs in Scientific Workflows: An Empirical Analysis Daniel Garijo *, Pinar Alper ⱡ, Khalid Belhajjame ⱡ, Oscar Corcho *, Yolanda Gil Ŧ, Carole Goble ⱡ * Universidad Politécnica de Madrid, ⱡUniversity of Manchester, Ŧ USC Information Sciences Institute IEEE eScience 2012. Chicago, USA