SlideShare a Scribd company logo
1 of 27
Download to read offline
myExperiment 2.0 –
Preserving digital Research Objects
  using the Wf4Ever architecture
             Stian Soiland-Reyes
        myGrid, University of Manchester
                          EGI/SHIWA Workshops on e-Science Workflows
                                               Budapest, 2012-02-10
Workflow-based Science
                                                            Background

                               Taverna - Scientific Workflow Management
                                  System
                               ~85000 downloads
                               ~EU projects: SCAPE, BioVeL, HELIO,
http://www.taverna.org.uk/
                               e-Lico, VPH-SHARE, EGI-INSPiRE….

                               myExperiment - Web 3.0 virtual
                                 environment, library and social
                                 network for workflows
http://www.myexperiment.org/
                               ~5000 registered users
                               ~2200 workflows
                               ~21 different systems

                                                                          2
Workflow-based Science
What is a Scientific Workflow?


» Workflows coordinate the
  execution of services and
  link together resources.
» Data-driven rather than
  process-driven: «Send
  output from A to B and C»
» Semi-automated
  computational execution in
  scientific problem-solving
   repeatable, reproducable,
  reusable
» The implementation of a
  scientific method
                                3
Trident                     Triana
                   Kepler

                                     BPEL
Meandre                      Taverna


          Galaxy
Reuse, Recycling, Repurposing

• Paul writes workflows for identifying biological
  pathways implicated in resistance to Trypanosomiasis
  in cattle
• Paul meets Jo. Jo is investigating whipworm in
  mouse.
• Jo reuses one of Paul’s workflow without change.
• Jo identifies the biological pathways involved in sex
  dependence in the mouse model, believed to be
  involved in the ability of mice to expel the parasite.
• Previously a manual two year study by Jo had failed
  to do this.

                                                                 5
“A biologist would rather share their
 toothbrush than their gene name”




                                  Mike Ashburner and others
                                Professor in Dept of Genetics,
                                 University of Cambridge, UK
http://www.myexperiment.org/

       “Facebook for Scientists”           A probe into researcher behaviour
       ...but different to Facebook!

   A repository of research methods       Open source (BSD) Ruby on Rails app

 A social network of people and things       REST and SPARQL, Linked Data

 A Social Virtual Research Environment    Influenced BioCatalogue, MethodBox
                                                      and SysMO-SEEK

     myExperiment currently has 5378 members, 292 groups, 2273
                workflows, 534 files and 217 packs
http://www.myexperiment.org/workflows/140
myExperiment API
                                                                APIs for developers
                        XML




                  facebook
                             iGoogle
                                       android
    HTML                                               API
                                                      config




                                                                                             SPARQL endpoint
                                Managed REST API

           tags              ratings              reviews        profiles
                                                                                   Search
                  workflows                       credits      groups              Engine
                                       packs                   friendships
             files

`                                                                                            RDF
                                                                                            Store

                                                 mySQL
                                                                             Enactor
                                                                                                               10
myExperiment API
                                      Taverna integration


» myExperiment plugin for Taverna
   › Browse myExperiment workflows
     • My workflows
     • Tags
     • Search
   › Open workflow
     • + Embed in existing workflow
   › Upload workflows
     • Provide metadata

                                                          11
12
Paul’s
Paul’s Pack                                         Workflow 16                                      QTL

 Research                       Results
  Object                                                        produces


                  Included in


                                                                                              Published in
                                                             Included in
                                   Feeds into


Logs   produces                            Included in                     Included in



Metadata                                                 Slides                               Paper
                                                  produces                     Published in



                    Common pathways

                    Workflow 13                                      Results
The R.* dimensions


Reusable. The key tenet of Research                 Replayable. Studies might involve
Objects is to support the sharing and               single investigations that happen in
reuse of data, methods and processes.               milliseconds or protracted processes
Repurposeable. Reuse may also                       that take years.
involve the reuse of constituent parts of Referenceable. If research objects are
the Research Object.                      to augment or replace traditional
Repeatable. There should be sufficient publication methods, then they must be
                                          referenceable or citeable.
information in a Research Object to be
able to repeat the study, perhaps years Revealable. Third parties must be able
later.                                    to audit the steps performed in the
Reproducible. A third party can start research in order to be convinced of the
                                          validity of results.
with the same inputs and methods and
see if a prior result can be confirmed.   Respectful. Explicit representations of
                                          the provenance, lineage and flow of
                                          intellectual property.
   Replacing the Paper: The Twelve Rs of the e-Research Record” on http://blogs.nature.com/eresearch/
Pack analysis



 Workflow – pack contains a         •   Paper - source for a paper
  number of workflows                •   Tutorial - tutorial material
 Presentation - encapsulation of    •   Data - collection of data files
  a single presentation              •   Derived data - results of
 Collection - a number of things:       workflow
  workflows, presentations,          •   Benchmark - benchmarking data
  papers
                                     •   Supplementary - stuff associated
 Heterogeneous - where the              with a paper
  workflows do not appear to
  have a clear common purpose        •   Noise - tests, tryouts, rubbish
 Homogeneous - workflows            •   Oddity - none of the above
  appear to be designed to work
  together                                  Analysis by Sean Bechhofer
 Workflow Preservation
    Research Objects
       Provenance
    Recommendation
 Astronomy and Genomics
                           http://www.wf4ever-project.org/
Wf4Ever
                                                                     Challenges
Preservation of scientific workflows   » Scientific workflows aim at the heart of
     in data-intensive science           experimental science
                                            › Enable automation of scientific
                                              methods
                                            › Encourage best practices
                                       » Need to be preserved
                                            › Reuse is fundamental for incremental
                                              scientific development
                                            › Method reproducibility is key for credit
                                              and publication
                                       » …but workflow preservation is complex
                                            › Heterogeneous types of information
                                              need to be aggregated, including
                                              workflows and related resources into
                                              research objects
                                            › Research objects need to be trusted and
                                              understandable n years from now
                                            › Social aspects need to be addressed in
                                              order to support reuse in scientific
                                              communities
                                                                                   17
Wf4Ever
                           Stability, Completeness, Integrity, Authenticity, Quality




Workflow Decay
•   Component level
•   flux/decay/unavailability
•   Data level
     •   formats/ids/standards
•   Infrastructure level
     •   platform/resources


Experiment Decay
•   Methodological changes
•   New technologies
•   New resources/components
•   New data



                                                                                  18
Research Objects as Social Objects




20          20
                                     20
Research objects




              21
Research Object model
Research Object Core (ro)




                           23
Research Object model
Workflow Description (wfdesc)




                                24
Research Object model
Workflow Provenance (wfprov)




                                25
The Wf4Ever Proposal
                                         Technical Infrastructure

•   Models
     • Research Object
     • Annotation
     • Provenance
     • Evolution and Versioning
     • Semantic Web Encoding
•   Services
     • Foundational, Extension, User
     • APIs, Architecture
     • Web protocols/services
•   Principles
     • Map into standards
     • Adopt standards
     • Lightweight components
•   Ecosystem
     • Command line
     • Third party systems

                                                              26
The Wf4Ever Proposal
                      Services


User
Clients



Extension
Services




Foundation
Services



                               27
Next steps

» Analyse decay within all myExperiment workflows
   › Estimate: Roughly 50% don’t run correctly
   › But why? Services gone? Did they ever work?
   › How have the community evolved those workflows?
» Service and wf substitutions; recommendations
» Provenance analysis and use
   › e.g. verifiability, replayability
» SHIWA approach – can handle “What if the
  workflow system stops working”

                                                       28
Thank you!




                                      Any Questions?
                                                                   http://www.myexperiment.org/

                                                           http://www.wf4ever-project.org/

                                                                           http://www.mygrid.org.uk/

This work is licensed under the Creative Commons Attribution 3.0           http://www.taverna.org.uk/
Unported License. To view a copy of this license, visit
http://creativecommons.org/licenses/by/3.0/ or send a letter to Creative
Commons, 444 Castro Street, Suite 900, Mountain View, California,
94041, USA.                                                                                             29

More Related Content

More from Stian Soiland-Reyes

2015-07-11 Apache Taverna - BOSC 2015
2015-07-11 Apache Taverna - BOSC 20152015-07-11 Apache Taverna - BOSC 2015
2015-07-11 Apache Taverna - BOSC 2015Stian Soiland-Reyes
 
2014-10-31 Taverna 3 architecture
2014-10-31 Taverna 3 architecture2014-10-31 Taverna 3 architecture
2014-10-31 Taverna 3 architectureStian Soiland-Reyes
 
2014-10-30 Taverna as an Apache Incubator project
2014-10-30 Taverna as an Apache Incubator project2014-10-30 Taverna as an Apache Incubator project
2014-10-30 Taverna as an Apache Incubator projectStian Soiland-Reyes
 
2014-06-13 Research objects in the wild
2014-06-13 Research objects in the wild2014-06-13 Research objects in the wild
2014-06-13 Research objects in the wildStian Soiland-Reyes
 
2013-07-19 myExperiment research objects, beyond workflows and packs (PPTX)
2013-07-19 myExperiment research objects, beyond workflows and packs (PPTX)2013-07-19 myExperiment research objects, beyond workflows and packs (PPTX)
2013-07-19 myExperiment research objects, beyond workflows and packs (PPTX)Stian Soiland-Reyes
 
2013 06-24 Wf4Ever: Annotating research objects (PDF)
2013 06-24 Wf4Ever: Annotating research objects (PDF)2013 06-24 Wf4Ever: Annotating research objects (PDF)
2013 06-24 Wf4Ever: Annotating research objects (PDF)Stian Soiland-Reyes
 
2013 06-24 Wf4Ever: Annotating research objects (PPTX)
2013 06-24 Wf4Ever: Annotating research objects (PPTX)2013 06-24 Wf4Ever: Annotating research objects (PPTX)
2013 06-24 Wf4Ever: Annotating research objects (PPTX)Stian Soiland-Reyes
 
2013-03-21 What can provenance do for me?
2013-03-21 What can provenance do for me?2013-03-21 What can provenance do for me?
2013-03-21 What can provenance do for me?Stian Soiland-Reyes
 
2012 03-28 Wf4ever, preserving workflows as digital research objects
2012 03-28 Wf4ever, preserving workflows as digital research objects2012 03-28 Wf4ever, preserving workflows as digital research objects
2012 03-28 Wf4ever, preserving workflows as digital research objectsStian Soiland-Reyes
 
2011 07-06 SCUFL2 Poster - because a workflow is more than its definition (BO...
2011 07-06 SCUFL2 Poster - because a workflow is more than its definition (BO...2011 07-06 SCUFL2 Poster - because a workflow is more than its definition (BO...
2011 07-06 SCUFL2 Poster - because a workflow is more than its definition (BO...Stian Soiland-Reyes
 
2011-06-08 Taverna workflow system
2011-06-08 Taverna workflow system2011-06-08 Taverna workflow system
2011-06-08 Taverna workflow systemStian Soiland-Reyes
 
Taverna workflow management system (2010 11-30 Bath Workflow Tools) PPTX
Taverna workflow management system (2010 11-30 Bath Workflow Tools) PPTXTaverna workflow management system (2010 11-30 Bath Workflow Tools) PPTX
Taverna workflow management system (2010 11-30 Bath Workflow Tools) PPTXStian Soiland-Reyes
 
Taverna workflow management system (2010 11-30 Bath Workflow Tools)
Taverna workflow management system (2010 11-30 Bath Workflow Tools)Taverna workflow management system (2010 11-30 Bath Workflow Tools)
Taverna workflow management system (2010 11-30 Bath Workflow Tools)Stian Soiland-Reyes
 
Bringing caBIG services together using Taverna
Bringing caBIG services together using TavernaBringing caBIG services together using Taverna
Bringing caBIG services together using TavernaStian Soiland-Reyes
 

More from Stian Soiland-Reyes (17)

2015-07-11 Apache Taverna - BOSC 2015
2015-07-11 Apache Taverna - BOSC 20152015-07-11 Apache Taverna - BOSC 2015
2015-07-11 Apache Taverna - BOSC 2015
 
2014-10-31 Taverna 3 architecture
2014-10-31 Taverna 3 architecture2014-10-31 Taverna 3 architecture
2014-10-31 Taverna 3 architecture
 
2014-10-30 Taverna 3 status
2014-10-30 Taverna 3 status2014-10-30 Taverna 3 status
2014-10-30 Taverna 3 status
 
2014-10-30 Taverna as an Apache Incubator project
2014-10-30 Taverna as an Apache Incubator project2014-10-30 Taverna as an Apache Incubator project
2014-10-30 Taverna as an Apache Incubator project
 
2014-06-13 Research objects in the wild
2014-06-13 Research objects in the wild2014-06-13 Research objects in the wild
2014-06-13 Research objects in the wild
 
2013-07-19 myExperiment research objects, beyond workflows and packs (PPTX)
2013-07-19 myExperiment research objects, beyond workflows and packs (PPTX)2013-07-19 myExperiment research objects, beyond workflows and packs (PPTX)
2013-07-19 myExperiment research objects, beyond workflows and packs (PPTX)
 
2013 06-24 Wf4Ever: Annotating research objects (PDF)
2013 06-24 Wf4Ever: Annotating research objects (PDF)2013 06-24 Wf4Ever: Annotating research objects (PDF)
2013 06-24 Wf4Ever: Annotating research objects (PDF)
 
2013 06-24 Wf4Ever: Annotating research objects (PPTX)
2013 06-24 Wf4Ever: Annotating research objects (PPTX)2013 06-24 Wf4Ever: Annotating research objects (PPTX)
2013 06-24 Wf4Ever: Annotating research objects (PPTX)
 
2013-05-29 Taverna Provenance
2013-05-29 Taverna Provenance2013-05-29 Taverna Provenance
2013-05-29 Taverna Provenance
 
2013-03-21 What can provenance do for me?
2013-03-21 What can provenance do for me?2013-03-21 What can provenance do for me?
2013-03-21 What can provenance do for me?
 
2013-01-17 Research Object
2013-01-17 Research Object2013-01-17 Research Object
2013-01-17 Research Object
 
2012 03-28 Wf4ever, preserving workflows as digital research objects
2012 03-28 Wf4ever, preserving workflows as digital research objects2012 03-28 Wf4ever, preserving workflows as digital research objects
2012 03-28 Wf4ever, preserving workflows as digital research objects
 
2011 07-06 SCUFL2 Poster - because a workflow is more than its definition (BO...
2011 07-06 SCUFL2 Poster - because a workflow is more than its definition (BO...2011 07-06 SCUFL2 Poster - because a workflow is more than its definition (BO...
2011 07-06 SCUFL2 Poster - because a workflow is more than its definition (BO...
 
2011-06-08 Taverna workflow system
2011-06-08 Taverna workflow system2011-06-08 Taverna workflow system
2011-06-08 Taverna workflow system
 
Taverna workflow management system (2010 11-30 Bath Workflow Tools) PPTX
Taverna workflow management system (2010 11-30 Bath Workflow Tools) PPTXTaverna workflow management system (2010 11-30 Bath Workflow Tools) PPTX
Taverna workflow management system (2010 11-30 Bath Workflow Tools) PPTX
 
Taverna workflow management system (2010 11-30 Bath Workflow Tools)
Taverna workflow management system (2010 11-30 Bath Workflow Tools)Taverna workflow management system (2010 11-30 Bath Workflow Tools)
Taverna workflow management system (2010 11-30 Bath Workflow Tools)
 
Bringing caBIG services together using Taverna
Bringing caBIG services together using TavernaBringing caBIG services together using Taverna
Bringing caBIG services together using Taverna
 

Recently uploaded

Strategize a Smooth Tenant-to-tenant Migration and Copilot Takeoff
Strategize a Smooth Tenant-to-tenant Migration and Copilot TakeoffStrategize a Smooth Tenant-to-tenant Migration and Copilot Takeoff
Strategize a Smooth Tenant-to-tenant Migration and Copilot Takeoffsammart93
 
From Event to Action: Accelerate Your Decision Making with Real-Time Automation
From Event to Action: Accelerate Your Decision Making with Real-Time AutomationFrom Event to Action: Accelerate Your Decision Making with Real-Time Automation
From Event to Action: Accelerate Your Decision Making with Real-Time AutomationSafe Software
 
Strategies for Landing an Oracle DBA Job as a Fresher
Strategies for Landing an Oracle DBA Job as a FresherStrategies for Landing an Oracle DBA Job as a Fresher
Strategies for Landing an Oracle DBA Job as a FresherRemote DBA Services
 
A Year of the Servo Reboot: Where Are We Now?
A Year of the Servo Reboot: Where Are We Now?A Year of the Servo Reboot: Where Are We Now?
A Year of the Servo Reboot: Where Are We Now?Igalia
 
Apidays New York 2024 - Scaling API-first by Ian Reasor and Radu Cotescu, Adobe
Apidays New York 2024 - Scaling API-first by Ian Reasor and Radu Cotescu, AdobeApidays New York 2024 - Scaling API-first by Ian Reasor and Radu Cotescu, Adobe
Apidays New York 2024 - Scaling API-first by Ian Reasor and Radu Cotescu, Adobeapidays
 
Powerful Google developer tools for immediate impact! (2023-24 C)
Powerful Google developer tools for immediate impact! (2023-24 C)Powerful Google developer tools for immediate impact! (2023-24 C)
Powerful Google developer tools for immediate impact! (2023-24 C)wesley chun
 
Boost PC performance: How more available memory can improve productivity
Boost PC performance: How more available memory can improve productivityBoost PC performance: How more available memory can improve productivity
Boost PC performance: How more available memory can improve productivityPrincipled Technologies
 
Top 5 Benefits OF Using Muvi Live Paywall For Live Streams
Top 5 Benefits OF Using Muvi Live Paywall For Live StreamsTop 5 Benefits OF Using Muvi Live Paywall For Live Streams
Top 5 Benefits OF Using Muvi Live Paywall For Live StreamsRoshan Dwivedi
 
Workshop - Best of Both Worlds_ Combine KG and Vector search for enhanced R...
Workshop - Best of Both Worlds_ Combine  KG and Vector search for  enhanced R...Workshop - Best of Both Worlds_ Combine  KG and Vector search for  enhanced R...
Workshop - Best of Both Worlds_ Combine KG and Vector search for enhanced R...Neo4j
 
Tata AIG General Insurance Company - Insurer Innovation Award 2024
Tata AIG General Insurance Company - Insurer Innovation Award 2024Tata AIG General Insurance Company - Insurer Innovation Award 2024
Tata AIG General Insurance Company - Insurer Innovation Award 2024The Digital Insurer
 
presentation ICT roal in 21st century education
presentation ICT roal in 21st century educationpresentation ICT roal in 21st century education
presentation ICT roal in 21st century educationjfdjdjcjdnsjd
 
Connector Corner: Accelerate revenue generation using UiPath API-centric busi...
Connector Corner: Accelerate revenue generation using UiPath API-centric busi...Connector Corner: Accelerate revenue generation using UiPath API-centric busi...
Connector Corner: Accelerate revenue generation using UiPath API-centric busi...DianaGray10
 
Apidays New York 2024 - The value of a flexible API Management solution for O...
Apidays New York 2024 - The value of a flexible API Management solution for O...Apidays New York 2024 - The value of a flexible API Management solution for O...
Apidays New York 2024 - The value of a flexible API Management solution for O...apidays
 
Artificial Intelligence Chap.5 : Uncertainty
Artificial Intelligence Chap.5 : UncertaintyArtificial Intelligence Chap.5 : Uncertainty
Artificial Intelligence Chap.5 : UncertaintyKhushali Kathiriya
 
Top 10 Most Downloaded Games on Play Store in 2024
Top 10 Most Downloaded Games on Play Store in 2024Top 10 Most Downloaded Games on Play Store in 2024
Top 10 Most Downloaded Games on Play Store in 2024SynarionITSolutions
 
Bajaj Allianz Life Insurance Company - Insurer Innovation Award 2024
Bajaj Allianz Life Insurance Company - Insurer Innovation Award 2024Bajaj Allianz Life Insurance Company - Insurer Innovation Award 2024
Bajaj Allianz Life Insurance Company - Insurer Innovation Award 2024The Digital Insurer
 
Understanding Discord NSFW Servers A Guide for Responsible Users.pdf
Understanding Discord NSFW Servers A Guide for Responsible Users.pdfUnderstanding Discord NSFW Servers A Guide for Responsible Users.pdf
Understanding Discord NSFW Servers A Guide for Responsible Users.pdfUK Journal
 
Repurposing LNG terminals for Hydrogen Ammonia: Feasibility and Cost Saving
Repurposing LNG terminals for Hydrogen Ammonia: Feasibility and Cost SavingRepurposing LNG terminals for Hydrogen Ammonia: Feasibility and Cost Saving
Repurposing LNG terminals for Hydrogen Ammonia: Feasibility and Cost SavingEdi Saputra
 
Scaling API-first – The story of a global engineering organization
Scaling API-first – The story of a global engineering organizationScaling API-first – The story of a global engineering organization
Scaling API-first – The story of a global engineering organizationRadu Cotescu
 
2024: Domino Containers - The Next Step. News from the Domino Container commu...
2024: Domino Containers - The Next Step. News from the Domino Container commu...2024: Domino Containers - The Next Step. News from the Domino Container commu...
2024: Domino Containers - The Next Step. News from the Domino Container commu...Martijn de Jong
 

Recently uploaded (20)

Strategize a Smooth Tenant-to-tenant Migration and Copilot Takeoff
Strategize a Smooth Tenant-to-tenant Migration and Copilot TakeoffStrategize a Smooth Tenant-to-tenant Migration and Copilot Takeoff
Strategize a Smooth Tenant-to-tenant Migration and Copilot Takeoff
 
From Event to Action: Accelerate Your Decision Making with Real-Time Automation
From Event to Action: Accelerate Your Decision Making with Real-Time AutomationFrom Event to Action: Accelerate Your Decision Making with Real-Time Automation
From Event to Action: Accelerate Your Decision Making with Real-Time Automation
 
Strategies for Landing an Oracle DBA Job as a Fresher
Strategies for Landing an Oracle DBA Job as a FresherStrategies for Landing an Oracle DBA Job as a Fresher
Strategies for Landing an Oracle DBA Job as a Fresher
 
A Year of the Servo Reboot: Where Are We Now?
A Year of the Servo Reboot: Where Are We Now?A Year of the Servo Reboot: Where Are We Now?
A Year of the Servo Reboot: Where Are We Now?
 
Apidays New York 2024 - Scaling API-first by Ian Reasor and Radu Cotescu, Adobe
Apidays New York 2024 - Scaling API-first by Ian Reasor and Radu Cotescu, AdobeApidays New York 2024 - Scaling API-first by Ian Reasor and Radu Cotescu, Adobe
Apidays New York 2024 - Scaling API-first by Ian Reasor and Radu Cotescu, Adobe
 
Powerful Google developer tools for immediate impact! (2023-24 C)
Powerful Google developer tools for immediate impact! (2023-24 C)Powerful Google developer tools for immediate impact! (2023-24 C)
Powerful Google developer tools for immediate impact! (2023-24 C)
 
Boost PC performance: How more available memory can improve productivity
Boost PC performance: How more available memory can improve productivityBoost PC performance: How more available memory can improve productivity
Boost PC performance: How more available memory can improve productivity
 
Top 5 Benefits OF Using Muvi Live Paywall For Live Streams
Top 5 Benefits OF Using Muvi Live Paywall For Live StreamsTop 5 Benefits OF Using Muvi Live Paywall For Live Streams
Top 5 Benefits OF Using Muvi Live Paywall For Live Streams
 
Workshop - Best of Both Worlds_ Combine KG and Vector search for enhanced R...
Workshop - Best of Both Worlds_ Combine  KG and Vector search for  enhanced R...Workshop - Best of Both Worlds_ Combine  KG and Vector search for  enhanced R...
Workshop - Best of Both Worlds_ Combine KG and Vector search for enhanced R...
 
Tata AIG General Insurance Company - Insurer Innovation Award 2024
Tata AIG General Insurance Company - Insurer Innovation Award 2024Tata AIG General Insurance Company - Insurer Innovation Award 2024
Tata AIG General Insurance Company - Insurer Innovation Award 2024
 
presentation ICT roal in 21st century education
presentation ICT roal in 21st century educationpresentation ICT roal in 21st century education
presentation ICT roal in 21st century education
 
Connector Corner: Accelerate revenue generation using UiPath API-centric busi...
Connector Corner: Accelerate revenue generation using UiPath API-centric busi...Connector Corner: Accelerate revenue generation using UiPath API-centric busi...
Connector Corner: Accelerate revenue generation using UiPath API-centric busi...
 
Apidays New York 2024 - The value of a flexible API Management solution for O...
Apidays New York 2024 - The value of a flexible API Management solution for O...Apidays New York 2024 - The value of a flexible API Management solution for O...
Apidays New York 2024 - The value of a flexible API Management solution for O...
 
Artificial Intelligence Chap.5 : Uncertainty
Artificial Intelligence Chap.5 : UncertaintyArtificial Intelligence Chap.5 : Uncertainty
Artificial Intelligence Chap.5 : Uncertainty
 
Top 10 Most Downloaded Games on Play Store in 2024
Top 10 Most Downloaded Games on Play Store in 2024Top 10 Most Downloaded Games on Play Store in 2024
Top 10 Most Downloaded Games on Play Store in 2024
 
Bajaj Allianz Life Insurance Company - Insurer Innovation Award 2024
Bajaj Allianz Life Insurance Company - Insurer Innovation Award 2024Bajaj Allianz Life Insurance Company - Insurer Innovation Award 2024
Bajaj Allianz Life Insurance Company - Insurer Innovation Award 2024
 
Understanding Discord NSFW Servers A Guide for Responsible Users.pdf
Understanding Discord NSFW Servers A Guide for Responsible Users.pdfUnderstanding Discord NSFW Servers A Guide for Responsible Users.pdf
Understanding Discord NSFW Servers A Guide for Responsible Users.pdf
 
Repurposing LNG terminals for Hydrogen Ammonia: Feasibility and Cost Saving
Repurposing LNG terminals for Hydrogen Ammonia: Feasibility and Cost SavingRepurposing LNG terminals for Hydrogen Ammonia: Feasibility and Cost Saving
Repurposing LNG terminals for Hydrogen Ammonia: Feasibility and Cost Saving
 
Scaling API-first – The story of a global engineering organization
Scaling API-first – The story of a global engineering organizationScaling API-first – The story of a global engineering organization
Scaling API-first – The story of a global engineering organization
 
2024: Domino Containers - The Next Step. News from the Domino Container commu...
2024: Domino Containers - The Next Step. News from the Domino Container commu...2024: Domino Containers - The Next Step. News from the Domino Container commu...
2024: Domino Containers - The Next Step. News from the Domino Container commu...
 

2012 02-10 myExperiment 2.0 – Preserving digital Research Objects using the Wf4Ever architecture

  • 1. myExperiment 2.0 – Preserving digital Research Objects using the Wf4Ever architecture Stian Soiland-Reyes myGrid, University of Manchester EGI/SHIWA Workshops on e-Science Workflows Budapest, 2012-02-10
  • 2. Workflow-based Science Background Taverna - Scientific Workflow Management System ~85000 downloads ~EU projects: SCAPE, BioVeL, HELIO, http://www.taverna.org.uk/ e-Lico, VPH-SHARE, EGI-INSPiRE…. myExperiment - Web 3.0 virtual environment, library and social network for workflows http://www.myexperiment.org/ ~5000 registered users ~2200 workflows ~21 different systems 2
  • 3. Workflow-based Science What is a Scientific Workflow? » Workflows coordinate the execution of services and link together resources. » Data-driven rather than process-driven: «Send output from A to B and C» » Semi-automated computational execution in scientific problem-solving  repeatable, reproducable, reusable » The implementation of a scientific method 3
  • 4. Trident Triana Kepler BPEL Meandre Taverna Galaxy
  • 5. Reuse, Recycling, Repurposing • Paul writes workflows for identifying biological pathways implicated in resistance to Trypanosomiasis in cattle • Paul meets Jo. Jo is investigating whipworm in mouse. • Jo reuses one of Paul’s workflow without change. • Jo identifies the biological pathways involved in sex dependence in the mouse model, believed to be involved in the ability of mice to expel the parasite. • Previously a manual two year study by Jo had failed to do this. 5
  • 6. “A biologist would rather share their toothbrush than their gene name” Mike Ashburner and others Professor in Dept of Genetics, University of Cambridge, UK
  • 7. http://www.myexperiment.org/  “Facebook for Scientists”  A probe into researcher behaviour ...but different to Facebook!  A repository of research methods  Open source (BSD) Ruby on Rails app  A social network of people and things  REST and SPARQL, Linked Data  A Social Virtual Research Environment  Influenced BioCatalogue, MethodBox and SysMO-SEEK myExperiment currently has 5378 members, 292 groups, 2273 workflows, 534 files and 217 packs
  • 8.
  • 10. myExperiment API APIs for developers XML facebook iGoogle android HTML API config SPARQL endpoint Managed REST API tags ratings reviews profiles Search workflows credits groups Engine packs friendships files ` RDF Store mySQL Enactor 10
  • 11. myExperiment API Taverna integration » myExperiment plugin for Taverna › Browse myExperiment workflows • My workflows • Tags • Search › Open workflow • + Embed in existing workflow › Upload workflows • Provide metadata 11
  • 12. 12
  • 13. Paul’s Paul’s Pack Workflow 16 QTL Research Results Object produces Included in Published in Included in Feeds into Logs produces Included in Included in Metadata Slides Paper produces Published in Common pathways Workflow 13 Results
  • 14. The R.* dimensions Reusable. The key tenet of Research Replayable. Studies might involve Objects is to support the sharing and single investigations that happen in reuse of data, methods and processes. milliseconds or protracted processes Repurposeable. Reuse may also that take years. involve the reuse of constituent parts of Referenceable. If research objects are the Research Object. to augment or replace traditional Repeatable. There should be sufficient publication methods, then they must be referenceable or citeable. information in a Research Object to be able to repeat the study, perhaps years Revealable. Third parties must be able later. to audit the steps performed in the Reproducible. A third party can start research in order to be convinced of the validity of results. with the same inputs and methods and see if a prior result can be confirmed. Respectful. Explicit representations of the provenance, lineage and flow of intellectual property. Replacing the Paper: The Twelve Rs of the e-Research Record” on http://blogs.nature.com/eresearch/
  • 15. Pack analysis  Workflow – pack contains a • Paper - source for a paper number of workflows • Tutorial - tutorial material  Presentation - encapsulation of • Data - collection of data files a single presentation • Derived data - results of  Collection - a number of things: workflow workflows, presentations, • Benchmark - benchmarking data papers • Supplementary - stuff associated  Heterogeneous - where the with a paper workflows do not appear to have a clear common purpose • Noise - tests, tryouts, rubbish  Homogeneous - workflows • Oddity - none of the above appear to be designed to work together Analysis by Sean Bechhofer
  • 16.  Workflow Preservation  Research Objects  Provenance  Recommendation  Astronomy and Genomics http://www.wf4ever-project.org/
  • 17. Wf4Ever Challenges Preservation of scientific workflows » Scientific workflows aim at the heart of in data-intensive science experimental science › Enable automation of scientific methods › Encourage best practices » Need to be preserved › Reuse is fundamental for incremental scientific development › Method reproducibility is key for credit and publication » …but workflow preservation is complex › Heterogeneous types of information need to be aggregated, including workflows and related resources into research objects › Research objects need to be trusted and understandable n years from now › Social aspects need to be addressed in order to support reuse in scientific communities 17
  • 18. Wf4Ever Stability, Completeness, Integrity, Authenticity, Quality Workflow Decay • Component level • flux/decay/unavailability • Data level • formats/ids/standards • Infrastructure level • platform/resources Experiment Decay • Methodological changes • New technologies • New resources/components • New data 18
  • 19. Research Objects as Social Objects 20 20 20
  • 21. Research Object model Research Object Core (ro) 23
  • 22. Research Object model Workflow Description (wfdesc) 24
  • 23. Research Object model Workflow Provenance (wfprov) 25
  • 24. The Wf4Ever Proposal Technical Infrastructure • Models • Research Object • Annotation • Provenance • Evolution and Versioning • Semantic Web Encoding • Services • Foundational, Extension, User • APIs, Architecture • Web protocols/services • Principles • Map into standards • Adopt standards • Lightweight components • Ecosystem • Command line • Third party systems 26
  • 25. The Wf4Ever Proposal Services User Clients Extension Services Foundation Services 27
  • 26. Next steps » Analyse decay within all myExperiment workflows › Estimate: Roughly 50% don’t run correctly › But why? Services gone? Did they ever work? › How have the community evolved those workflows? » Service and wf substitutions; recommendations » Provenance analysis and use › e.g. verifiability, replayability » SHIWA approach – can handle “What if the workflow system stops working” 28
  • 27. Thank you! Any Questions? http://www.myexperiment.org/ http://www.wf4ever-project.org/ http://www.mygrid.org.uk/ This work is licensed under the Creative Commons Attribution 3.0 http://www.taverna.org.uk/ Unported License. To view a copy of this license, visit http://creativecommons.org/licenses/by/3.0/ or send a letter to Creative Commons, 444 Castro Street, Suite 900, Mountain View, California, 94041, USA. 29