WORKS 11 Presentation

  1. A New Approach for Publishing Workflows: Abstractions, Standards and Linked Data
     Daniel Garijo, Ontology Engineering Group, Departamento de Inteligencia Artificial, Universidad Politécnica de Madrid
     Yolanda Gil, Information Sciences Institute, University of Southern California, Marina del Rey
     Date: 14/11/2011
  2. Index of contents
     1. Background
     2. Limitations of existing approaches to workflow publication
     3. Features of our approach
        • Publishing abstract workflows and specific workflows
        • OPMW Ontology
        • Linked Data Publication
     4. Workflow querying and Linked Data consumption
     5. Conclusions
  3. Background
     Typical Published Article:
        • Text: Narrative of method, software packages used
        • Data: Key datasets and figures/plots
        • Workflow: NOT published, loosely recorded
        • Software: scripted codes + manual steps + notes/emails
     Reproducible Article (Weaver, GenePattern GRRD, etc.):
        • Text: Narrative of method, software packages used
        • Data: Key datasets and figures/plots
        • Workflow: Workflow/scripts describing dataflow, codes, and parameters
  4. Current issues with existing publication approaches
     Reproducible Article (Weaver, GenePattern GRRD, etc.):
        • Text: Narrative of method, software packages used
        • Data: Key datasets and figures/plots
        • Workflow: Workflow/scripts describing dataflow, codes, and parameters
     Only the executable workflow is published:
     1. Must have the same codes to re-execute the workflow, but:
        – Codes become unavailable
          • E.g.: eHits was proprietary and was replaced by AutoDock Vina
        – Different labs prefer different codes
          • E.g.: R vs Matlab
          • E.g.: visualization in Cytoscape vs yEd
     2. Must have the same workflow framework to re-execute the workflow
        – Must have R for Weaver
     3. Must import files to the local file system and workflow framework
        – Must import a bundle of workflow/data/code files to reproduce
  5. Key Features of our approach
     • Publish an abstract workflow in addition to the executable workflow
        – Description of the workflow that is independent of the codes executed
        – Maps to the codes executed (the “executable workflow”)
     • Publish both abstract and executable workflow using the OPM standard
        – OPM (Open Provenance Model) is independent of workflow framework and is widely implemented
        – Other groups can import to their own workflow framework
     • Publish data and workflows as Linked Data on the Web
        – All workflows and related files are web-accessible
        – Simple mechanism to share across local file systems
  6. What is Linked Data?
     1. Use URIs as names for things.
     2. Use HTTP URIs so that people can look up those names.
     3. When someone looks up a URI, provide useful information.
     4. Include links to other URIs.
     [Figure: “Linking Open Data cloud diagram, by Richard Cyganiak and Anja Jentzsch. http://lod-cloud.net/”]
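Principles 2 and 3 above (look up an HTTP URI, get useful information back) can be sketched in Python as follows. This is a minimal illustration, not part of the presented system: the resource URI is hypothetical, and requesting Turtle through the `Accept` header is an assumption about how a Linked Data server might be asked for RDF. The request is only prepared, not sent.

```python
from urllib.request import Request

# Hypothetical resource URI, used only for illustration (not taken from the slides).
EXAMPLE_URI = "http://example.org/resource/WorkflowTemplate/template1"


def build_rdf_request(uri: str, media_type: str = "text/turtle") -> Request:
    """Prepare (but do not send) an HTTP GET that asks for an RDF serialization."""
    # Principle 2: names should be HTTP URIs so that they can be looked up.
    if not uri.startswith(("http://", "https://")):
        raise ValueError("Linked Data names should be HTTP(S) URIs")
    # Principle 3: the Accept header tells the server which description we want.
    return Request(uri, headers={"Accept": media_type})


request = build_rdf_request(EXAMPLE_URI)
```

Passing `request` to `urllib.request.urlopen` would perform the actual lookup against a live server.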
  7. High level architecture
     [Figure: three WINGS installations (on a local laptop, on a shared host, and on a web server), each with a Workflow Template, a Workflow Instance, a Core Portal and an OPM export, feed the Linked Data Publication. The published data supports programmatic access (external apps), interactive browsing by users (Pubby frontend), and other workflow environments. Stages shown: Wings workflow generation, OPM conversion, Publication, Share, Reuse.]
  8. Publishing the abstract workflow
     [Figure: the “Comparison of Dissimilar protein structures” workflow]
  9. OPMW Ontology
     [Figure: the ontology shown as two linked graphs.
     Abstract Workflow: opmw:WorkflowTemplate (an opmo:OPMGraph; example template1), opmw:ProcessTemplate (an opmv:Process; example templateNode1), opmw:ArtifactTemplate (an opmv:Artifact; examples artifact1, outputArtifact1), ac:AbstractComponent (example absComp1).
     Executable Workflow: opmw:ExecutionAccount (an opmo:Account; example account1), opmw:ProcessInstance (an opmv:Process; example executionNode1), opmw:ArtifactInstance (an opmv:Artifact; examples execInput1, executionOutput1), ac:SpecificComponent (example specComp1), opmv:Agent (example user1).
     Properties: opmv:used, opmv:wasGeneratedBy, opmv:wasControlledBy, opmo:account, opmo:hasArtifact, opmo:hasProcess, opmw:hasArtifactTemplate, opmw:hasProcessTemplate, opmw:hasWorkflowTemplate, opmw:hasTemplateComponent, opmw:hasSpecificComponent.]
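The mapping between the abstract and the executable workflow in the OPMW figure can be sketched as a small set of triples. This is a minimal illustration in plain Python, with no RDF library. The resource and property names come from the slide, but the direction of each property (which node is subject and which is object) is an assumed reading of the figure, and the helper functions are hypothetical.

```python
# Triples are (subject, predicate, object), written as prefixed names in plain strings.
# ASSUMPTION: arrow directions are one reading of the slide's figure.
TRIPLES = {
    # Abstract workflow (template) side
    ("templateNode1", "rdf:type", "opmw:ProcessTemplate"),
    ("templateNode1", "opmv:used", "artifact1"),
    ("outputArtifact1", "opmv:wasGeneratedBy", "templateNode1"),
    ("templateNode1", "opmw:hasTemplateComponent", "absComp1"),
    # Executable workflow side
    ("executionNode1", "rdf:type", "opmw:ProcessInstance"),
    ("executionNode1", "opmv:used", "execInput1"),
    ("executionOutput1", "opmv:wasGeneratedBy", "executionNode1"),
    ("executionNode1", "opmv:wasControlledBy", "user1"),
    ("executionNode1", "opmw:hasSpecificComponent", "specComp1"),
    # Links from the execution back to the template
    ("executionNode1", "opmw:hasProcessTemplate", "templateNode1"),
    ("execInput1", "opmw:hasArtifactTemplate", "artifact1"),
    ("executionOutput1", "opmw:hasArtifactTemplate", "outputArtifact1"),
    ("account1", "opmw:hasWorkflowTemplate", "template1"),
}


def objects(subject: str, predicate: str) -> list[str]:
    """Return all objects of (subject, predicate, ?o), sorted for stable output."""
    return sorted(o for s, p, o in TRIPLES if s == subject and p == predicate)


def abstract_step_for(execution_node: str) -> list[str]:
    """Return the template step(s) that an executed step instantiates."""
    return objects(execution_node, "opmw:hasProcessTemplate")
```

Under these assumptions, `abstract_step_for("executionNode1")` follows the link from an executed step to its template step, which is how a reader of the published data could move from the codes that ran to the code-independent method description.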
  10. Publication of Workflows as Linked Data
     [Figure: Wings workflows go through OPM conversion to produce the Abstract Workflow RDF (OPM) and the Executable Workflow RDF (OPM); other workflow frameworks can contribute through their own OPM conversion. An upload interface imports the OPM RDF into a triple store, exposed through a SPARQL endpoint. Workflow data, components, etc. are kept in a permanent web-accessible file store. Both are web-accessible from a web browser.]
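A client of the SPARQL endpoint mentioned on this slide could look roughly like the sketch below. It only builds a query and the corresponding GET URL (a `query` parameter, as in the standard SPARQL protocol) and does not contact any server. The endpoint URL and template URI are hypothetical, the `opmw:` namespace URI is an assumption, and using `opmw:hasWorkflowTemplate` to link an execution account to its template is an assumed reading of the OPMW figure.

```python
from urllib.parse import urlencode

# ASSUMPTIONS: hypothetical endpoint and an assumed namespace URI for the opmw: prefix.
ENDPOINT = "http://example.org/sparql"
OPMW_NS = "http://www.opmw.org/ontology/"


def executions_of_template_query(template_uri: str) -> str:
    """Build a SPARQL query listing execution accounts linked to a workflow template."""
    return (
        f"PREFIX opmw: <{OPMW_NS}>\n"
        "SELECT ?account WHERE {\n"
        f"  ?account opmw:hasWorkflowTemplate <{template_uri}> .\n"
        "}"
    )


def sparql_get_url(endpoint: str, query: str) -> str:
    """Encode the query as the 'query' parameter of an HTTP GET request URL."""
    return endpoint + "?" + urlencode({"query": query})


query = executions_of_template_query("http://example.org/resource/WorkflowTemplate/template1")
url = sparql_get_url(ENDPOINT, query)
```

Fetching `url` with an `Accept` header such as `application/sparql-results+json` would return the matching accounts from an endpoint that follows the standard protocol.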
  11. Searching/Browsing Workflows as Linked Data
     [Screenshot callouts: types of search; autocomplete search bar; resource URI (process instance); specific component for this process instance; properties]
  12. Searching/Browsing Workflows as Linked Data
     [Screenshot callouts: component name; component inputs; component outputs; code implementations; template additional metadata; record of the different executions of this workflow]
  13. Conclusions
     1. Publication of an abstract workflow that represents the computational method in an execution-independent manner.
     2. Publication of the abstract workflow and the executed workflow using the OPM standard, which is independent of the execution environment used.
     3. Publication of the workflows, components, codes and datasets as Linked Data on the web.
  14. Future work
     • Extensions to abstract workflow publication
        – Be able to provide abstractions on several steps.
        – Incomplete provenance.
     • Create an OPMV/W3C PROV-O profile for common workflow representation.
        – Increase interoperability with other workflow representation systems.
     • Workflow reuse in different workflow systems.
        – Import and execute workflows in other workflow frameworks.
  15. References
     • WINGS workflow system: http://seagull.isi.edu/marbles/
     • The Open Provenance Model Specification: http://openprovenance.org/
     • OPMO: http://openprovenance.org/model/opmo
     • OPMV: http://open-biomed.sourceforge.net/opmv/ns.html
     • TB Drugome Wiki (Evolution of this work): http://seagull.isi.edu/wings-drugome/index.php/Main_Page
     • W3C PROV-O current ontology (draft): http://www.w3.org/2011/prov/wiki/PIL_OWL_Ontology
     • Principles of Linked Data: http://www.w3.org/DesignIssues/LinkedData.html
  16. Acknowledgements
     • UCSD people: Li Xie, Lei Xie, Sarah Kinnings, Phil Bourne
     • ISI people: Varun Ratnakar
     • OEG people: Oscar Corcho