2012 02-10 myExperiment 2.0 – Preserving digital Research Objects using the Wf4Ever architecture

Presented at the EGI/SHIWA Workshops on e-Science Workflows
Budapest, 2012-02-10
https://www.egi.eu/indico/contributionDisplay.py?contribId=6&confId=656


Increasingly, published research is based on digital artefacts such as scientific workflows, web services and public data sets. myExperiment is a social website for sharing workflows and data, used by scientists of different domains such as bioinformatics, chemistry and astrophysics.

myExperiment lets scientists build a structured aggregation, or pack, of the artefacts supporting a scientific work, which we call a Research Object. Packs may include a workflow, input data, reference data and the workflow results, thereby giving other researchers the ability to independently reproduce the virtual experiment and verify the results, or to reuse the work to analyse new data.

Research Objects can be collectively annotated, shared, published, modified, combined and derived, but are also subject to external changes, evolution and decay. For instance, rerunning a Research Object becomes difficult or impossible if its workflow depends on tools and services which are no longer executable. Similarly, researchers analysing the work might struggle to reuse a Research Object if it is not clear how its artefacts can be combined.

To address these preservation concerns we are developing “myExperiment 2.0” as both a software architecture and a reference implementation, called Wf4Ever. The Wf4Ever Architecture uses a Research Object Model to specify aggregations of digital artefacts with rich annotations, recording their provenance and evolution, and allowing sharing and collaboration through a set of decoupled RESTful Linked Data services.
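To make the idea of an annotated aggregation more concrete, here is a minimal Python sketch. It is only an illustration under stated assumptions: the class names `ResearchObject` and `Annotation`, their fields, and the `example.org` URIs are hypothetical and are not the Wf4Ever Research Object Model vocabulary or any Wf4Ever API.

```python
from dataclasses import dataclass, field
from typing import List


@dataclass
class Annotation:
    """A statement about one aggregated resource (hypothetical shape)."""
    target: str  # URI of the resource being described
    body: str    # the annotation content
    author: str  # who made the annotation, kept as simple provenance


@dataclass
class ResearchObject:
    """Illustrative aggregation of artefacts; not the Wf4Ever vocabulary."""
    uri: str
    aggregates: List[str] = field(default_factory=list)
    annotations: List[Annotation] = field(default_factory=list)

    def aggregate(self, resource_uri: str) -> None:
        # Each artefact is aggregated at most once.
        if resource_uri not in self.aggregates:
            self.aggregates.append(resource_uri)

    def annotate(self, target: str, body: str, author: str) -> None:
        # Only resources that are part of the aggregation can be annotated.
        if target not in self.aggregates:
            raise ValueError(f"{target} is not aggregated by {self.uri}")
        self.annotations.append(Annotation(target, body, author))

    def annotations_for(self, target: str) -> List[Annotation]:
        return [a for a in self.annotations if a.target == target]


# Hypothetical example data.
ro = ResearchObject("http://example.org/ro/1/")
ro.aggregate("http://example.org/ro/1/workflow.t2flow")
ro.aggregate("http://example.org/ro/1/input.csv")
ro.annotate("http://example.org/ro/1/workflow.t2flow",
            "Identifies candidate pathways", "alice")
```

In a Linked Data setting each of these URIs would be dereferenceable and the annotations would be expressed as RDF; the sketch keeps them as plain strings to stay self-contained.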

The Wf4Ever Toolkit combines techniques such as analysing and comparing workflow structures, examining provenance traces from automated workflow reruns, and checking workflow integrity and authenticity, in order to detect Research Object decay, replay past workflow executions, attempt automated repair and recommend replacement workflow fragments.
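As a rough illustration of decay detection, the following Python sketch combines two simple checks: whether the services a workflow depends on are still available, and whether the outputs of a rerun still match checksums recorded from an earlier run. All function names, the report format and the `example.org` service URIs are assumptions made for this example; it does not reproduce the Wf4Ever Toolkit. A differing output may also stem from a legitimately non-deterministic workflow, so a real checker would need more context.

```python
import hashlib
from typing import Callable, Dict, Iterable, List


def checksum(data: bytes) -> str:
    """SHA-256 hex digest used as a simple integrity fingerprint."""
    return hashlib.sha256(data).hexdigest()


def unavailable_services(services: Iterable[str],
                         is_available: Callable[[str], bool]) -> List[str]:
    """Services the workflow depends on that the probe reports as gone."""
    return [s for s in services if not is_available(s)]


def changed_outputs(recorded: Dict[str, str],
                    rerun: Dict[str, bytes]) -> List[str]:
    """Outputs that are missing from the rerun or differ from the record."""
    changed = []
    for name, expected in recorded.items():
        data = rerun.get(name)
        if data is None or checksum(data) != expected:
            changed.append(name)
    return sorted(changed)


def decay_report(services: Iterable[str],
                 is_available: Callable[[str], bool],
                 recorded: Dict[str, str],
                 rerun: Dict[str, bytes]) -> dict:
    """Combine both checks into one report (hypothetical format)."""
    missing = unavailable_services(services, is_available)
    changed = changed_outputs(recorded, rerun)
    return {
        "missing_services": missing,
        "changed_outputs": changed,
        "decayed": bool(missing or changed),
    }


# Hypothetical example: one service has disappeared, outputs still match.
live = {
    "http://example.org/service/a": True,
    "http://example.org/service/b": False,
}
recorded = {"pathways.txt": checksum(b"KEGG:04010\n")}
rerun = {"pathways.txt": b"KEGG:04010\n"}
report = decay_report(live.keys(), lambda s: live[s], recorded, rerun)
```

The availability probe is passed in as a function so that the same logic can be exercised with a stub, as here, or with a real network check.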

http://www.myexperiment.org/
http://www.wf4ever-project.org/


  1. myExperiment 2.0 – Preserving digital Research Objects using the Wf4Ever architecture. Stian Soiland-Reyes, myGrid, University of Manchester. EGI/SHIWA Workshops on e-Science Workflows, Budapest, 2012-02-10
  2. Workflow-based Science: Background. Taverna - Scientific Workflow Management System, http://www.taverna.org.uk/ • ~85000 downloads • EU projects: SCAPE, BioVeL, HELIO, e-Lico, VPH-SHARE, EGI-INSPiRE… myExperiment - Web 3.0 virtual environment, library and social network for workflows, http://www.myexperiment.org/ • ~5000 registered users • ~2200 workflows • ~21 different systems
  3. Workflow-based Science: What is a Scientific Workflow? » Workflows coordinate the execution of services and link together resources. » Data-driven rather than process-driven: «Send output from A to B and C» » Semi-automated computational execution in scientific problem-solving: repeatable, reproducible, reusable » The implementation of a scientific method
  4. Trident, Triana, Kepler, BPEL, Meandre, Taverna, Galaxy
  5. Reuse, Recycling, Repurposing • Paul writes workflows for identifying biological pathways implicated in resistance to Trypanosomiasis in cattle • Paul meets Jo. Jo is investigating whipworm in mouse. • Jo reuses one of Paul’s workflows without change. • Jo identifies the biological pathways involved in sex dependence in the mouse model, believed to be involved in the ability of mice to expel the parasite. • Previously a manual two year study by Jo had failed to do this.
  6. “A biologist would rather share their toothbrush than their gene name” Mike Ashburner and others, Professor in Dept of Genetics, University of Cambridge, UK
  7. http://www.myexperiment.org/ • “Facebook for Scientists” ...but different to Facebook! • A repository of research methods • A social network of people and things • A Social Virtual Research Environment • A probe into researcher behaviour • Open source (BSD) Ruby on Rails app • REST and SPARQL, Linked Data • Influenced BioCatalogue, MethodBox and SysMO-SEEK. myExperiment currently has 5378 members, 292 groups, 2273 workflows, 534 files and 217 packs
  8. http://www.myexperiment.org/workflows/140
  9. myExperiment API: APIs for developers. [Architecture diagram with labels: facebook, iGoogle, android; HTML, XML; API config; Managed REST API; SPARQL endpoint; Search Engine; tags, ratings, reviews, profiles, workflows, credits, groups, packs, friendships, files; RDF Store; mySQL; Enactor]
  10. myExperiment API: Taverna integration » myExperiment plugin for Taverna › Browse myExperiment workflows • My workflows • Tags • Search › Open workflow • + Embed in existing workflow › Upload workflows • Provide metadata
  12. Paul’s Pack / Paul’s Research Object. [Diagram with nodes: Workflow 16, QTL, Results, Logs, Metadata, Slides, Paper, Common pathways, Workflow 13, Results; linked by the relations: produces, Included in, Published in, Feeds into]
  13. The R.* dimensions. Reusable. The key tenet of Research Objects is to support the sharing and reuse of data, methods and processes. Repurposeable. Reuse may also involve the reuse of constituent parts of the Research Object. Repeatable. There should be sufficient information in a Research Object to be able to repeat the study, perhaps years later. Reproducible. A third party can start with the same inputs and methods and see if a prior result can be confirmed. Replayable. Studies might involve single investigations that happen in milliseconds or protracted processes that take years. Referenceable. If research objects are to augment or replace traditional publication methods, then they must be referenceable or citeable. Revealable. Third parties must be able to audit the steps performed in the research in order to be convinced of the validity of results. Respectful. Explicit representations of the provenance, lineage and flow of intellectual property. “Replacing the Paper: The Twelve Rs of the e-Research Record” on http://blogs.nature.com/eresearch/
  14. Pack analysis • Workflow - pack contains a number of workflows • Presentation - encapsulation of a single presentation • Collection - a number of things: workflows, presentations, papers • Heterogeneous - where the workflows do not appear to have a clear common purpose • Homogeneous - workflows appear to be designed to work together • Paper - source for a paper • Tutorial - tutorial material • Data - collection of data files • Derived data - results of workflow • Benchmark - benchmarking data • Supplementary - stuff associated with a paper • Noise - tests, tryouts, rubbish • Oddity - none of the above. Analysis by Sean Bechhofer
  15. • Workflow Preservation • Research Objects • Provenance • Recommendation. Astronomy and Genomics. http://www.wf4ever-project.org/
  16. Wf4Ever Challenges: Preservation of scientific workflows in data-intensive science » Scientific workflows aim at the heart of experimental science › Enable automation of scientific methods › Encourage best practices » Need to be preserved › Reuse is fundamental for incremental scientific development › Method reproducibility is key for credit and publication » …but workflow preservation is complex › Heterogeneous types of information need to be aggregated, including workflows and related resources into research objects › Research objects need to be trusted and understandable n years from now › Social aspects need to be addressed in order to support reuse in scientific communities
  17. Wf4Ever: Stability, Completeness, Integrity, Authenticity, Quality. Workflow Decay • Component level: flux/decay/unavailability • Data level: formats/ids/standards • Infrastructure level: platform/resources. Experiment Decay • Methodological changes • New technologies • New resources/components • New data
  18. Research Objects as Social Objects
  19. Research objects
  20. Research Object model: Research Object Core (ro)
  21. Research Object model: Workflow Description (wfdesc)
  22. Research Object model: Workflow Provenance (wfprov)
  23. The Wf4Ever Proposal: Technical Infrastructure • Models: Research Object; Annotation; Provenance; Evolution and Versioning; Semantic Web Encoding • Services: Foundational, Extension, User; APIs, Architecture; Web protocols/services • Principles: Map into standards; Adopt standards; Lightweight components • Ecosystem: Command line; Third party systems
  24. The Wf4Ever Proposal: Services. [Layer diagram: User Clients, Extension Services, Foundation Services]
  25. Next steps » Analyse decay within all myExperiment workflows › Estimate: Roughly 50% don’t run correctly › But why? Services gone? Did they ever work? › How has the community evolved those workflows? » Service and workflow substitutions; recommendations » Provenance analysis and use › e.g. verifiability, replayability » SHIWA approach – can handle “What if the workflow system stops working”
  26. Thank you! Any Questions? http://www.myexperiment.org/ http://www.wf4ever-project.org/ http://www.mygrid.org.uk/ http://www.taverna.org.uk/ This work is licensed under the Creative Commons Attribution 3.0 Unported License. To view a copy of this license, visit http://creativecommons.org/licenses/by/3.0/ or send a letter to Creative Commons, 444 Castro Street, Suite 900, Mountain View, California, 94041, USA.
