Common Motifs in Scientific Workflows: An Empirical Analysis

Slides for the e-Science 2012 presentation of the paper "Common Motifs in Scientific Workflows: An Empirical Analysis". The paper analyses 177 workflows from the Taverna and Wings workflow systems, across diverse domains. The analysis highlights the common motifs, or patterns, found in the workflow templates, based on the functionality of each workflow step.

  1. Common Motifs in Scientific Workflows: An Empirical Analysis
     Date: 10/11/2012
     Daniel Garijo *, Pinar Alper ⱡ, Khalid Belhajjame ⱡ, Oscar Corcho *, Yolanda Gil Ŧ, Carole Goble ⱡ
     * Universidad Politécnica de Madrid, ⱡ University of Manchester, Ŧ USC Information Sciences Institute
     IEEE eScience 2012. Chicago, USA
  2. Overview
     • Empirical analysis on 177 workflow templates from Taverna and Wings
     • Catalog of recurring patterns: scientific workflow motifs
       • Data Oriented Motifs
       • Workflow Oriented Motifs
     • Understandability and reuse
     Image: http://sensefinancial.com/wp-content/uploads/2012/02/contribution.jpg
  3. Background
     • Workflows as software artifacts that capture the scientific method
       • Addition to paper publication
       • Reuse
     • Existing repositories of workflows (myExperiment, http://www.myexperiment.org)
       • Sharing workflows
       • Exploring existing workflows
     • Problems to address:
       • Sometimes workflows are difficult to understand
       • Workflow descriptions depend on tools/files
       • Decay of workflows
       • Identify good practices for workflow design
  4. Approach
     • Reverse-engineer the set of current practices in workflow development through an analysis of empirical evidence
     • Identify workflow abstractions that would facilitate understandability and therefore effective re-use
  5. Taverna and Wings
     • Taverna: http://www.taverna.org.uk/
     • Wings: http://www.wings-workflows.org/
  6. Workflow Motifs
     • Workflow motif: domain-independent conceptual abstraction on the workflow steps
     1. Data-oriented motifs (WHAT?): What kind of manipulations does the workflow have?
        • E.g.: data retrieval, data preparation, etc.
     2. Workflow-oriented motifs (HOW?): How does the workflow perform its operations?
        • E.g.: stateful steps, stateless steps, human interactions, etc.
  7. Data Oriented Motifs
     • Data Retrieval
     • Data Preparation
       • Format Transformation
       • Input Augmentation and Output Splitting
       • Data Organisation
     • Data Analysis
     • Data Curation/Cleaning
     • Data Moving
     • Data Visualisation
  13. Workflow Oriented Motifs
     • Intra-Workflow Motifs
       • Stateful (Asynchronous) Invocations
       • Stateless (Synchronous) Invocations
       • Internal Macros
       • Human Interactions
     • Inter-Workflow Motifs
       • Atomic Workflows
       • Composite Workflows
       • Workflow Overloading
  18. Experiment setup
     • 177 workflow templates
       • 111 from Taverna, sample from myExperiment
       • 66 from Wings, available in public server (now as Linked Data)
     • Diverse domains
     (Chart; series: Taverna, Wings)
  19. Result Summary: Data Oriented Motifs
     • Over 60% of the motifs are data preparation motifs
       • Of the 4 subcategories, the most common across domains are output splitting, input augmentation, and reformatting steps
     • Data retrieval is common in domains where curated databases exist
     • Data analysis is often the main functionality of the workflow
     (Figure label: Data organisation)
  20. Result Summary: Workflow Oriented Motifs
     • Around 40% composite workflows and internal macros
     • Workflow reuse is present even in some atomic workflows
     • Human interaction steps are increasingly used in some domains
  21. Differences and commonalities of the workflow systems
     • Data moving/retrieval, stateful interactions and human interaction steps are not present in Wings
     • Web services (Taverna) versus software components (Wings)
     • Wings has layered execution through Pegasus
     • Data preparation steps are common in both systems
     • Use of sub-workflows is high
  22. Discussion
     Our observations:
     • Obfuscation of scientific workflows
       • The abundance of data preparation steps makes the functionality of the workflow unclear
     • Decay of scientific workflows
       • Create an abstract description
     • Good practices for workflow design
       • Sub-workflows
       • Workflow overloading
     (Figure labels: Method in paper, Workflow)
     Image: http://www.sandensconsulting.com/images/DataObfuscation.jpg
  23. Conclusions and future work
     • Empirical analysis of scientific workflows
       • 177 workflows
       • 2 different systems
       • A variety of heterogeneous domains
     • Workflow motif catalog
       • Data oriented motifs
       • Workflow oriented motifs
     • Future work: automatic abstractions on workflows
       • Template analysis
       • Trace analysis (provenance)
       • Include other workflow systems
  24. Who are we?
     • Pinar Alper, School of Computer Science, University of Manchester
     • Khalid Belhajjame, School of Computer Science, University of Manchester
     • Oscar Corcho, Ontology Engineering Group, UPM
     • Yolanda Gil, Information Sciences Institute, USC
     • Carole Goble, School of Computer Science, University of Manchester
     EU Wf4Ever project (270129) funded under EU FP7 (ICT-2009.4.1). (http://www.wf4ever-project.org)
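
The data-oriented motif catalog and the kind of frequency summary reported in the results slides can be sketched in a few lines of Python. This is only an illustrative sketch, not the authors' tooling: the motif names come from the slides, but the function names (`top_level`, `motif_shares`), the annotation format, and the two example workflows are assumptions invented here, and the numbers they produce are not results from the study.

```python
from collections import Counter

# Data-oriented motif catalog as listed on the slides.
# Each top-level motif maps to its sub-motifs (empty when none are listed).
DATA_ORIENTED = {
    "Data Retrieval": [],
    "Data Preparation": [
        "Format Transformation",
        "Input Augmentation and Output Splitting",
        "Data Organisation",
    ],
    "Data Analysis": [],
    "Data Curation/Cleaning": [],
    "Data Moving": [],
    "Data Visualisation": [],
}


def top_level(motif):
    """Map a motif or sub-motif name to its top-level category (hypothetical helper)."""
    for parent, children in DATA_ORIENTED.items():
        if motif == parent or motif in children:
            return parent
    raise KeyError(f"unknown motif: {motif}")


def motif_shares(annotations):
    """Return each top-level motif's share of all annotated steps.

    `annotations` maps a workflow name to the list of motifs assigned to
    its steps. This format is an assumption made for the sketch.
    """
    counts = Counter(
        top_level(motif)
        for steps in annotations.values()
        for motif in steps
    )
    total = sum(counts.values())
    if total == 0:
        return {}
    return {motif: n / total for motif, n in counts.items()}


# Hypothetical annotations, invented for illustration; not data from the study.
EXAMPLE = {
    "wf_a": ["Data Retrieval", "Format Transformation", "Data Analysis"],
    "wf_b": ["Data Organisation", "Data Analysis", "Data Visualisation"],
}

shares = motif_shares(EXAMPLE)
for motif, share in sorted(shares.items()):
    print(f"{motif}: {share:.0%}")
```

Rolling sub-motifs up to their parent is a design choice of the sketch: it makes a statement such as "data preparation motifs account for a given share of all motifs" a single dictionary lookup. The workflow-oriented motifs could be added as a second catalog of the same shape.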
