  1. Research Objects: Preserving scientific data and methods » Stian Soiland-Reyes, Khalid Belhajjame, School of Computer Science, University of Manchester » myGrid NIHBI meet-up, Manchester, 2013-01-17
  2. Agenda » Preserving digital science » The Research Object » Anatomy » Lifecycle » Wf4Ever Tools » Future developments
  3. Computational Processes in Today’s Research » Research is being conducted in an increasingly digital and online environment » This has led to the emergence of new digital artifacts » In some respects, these objects can be regarded as data » However, some objects include the description of the research method, captured as a computational process » Such processes encapsulate the knowledge related to the generation, (re)use and general transformation of data in experimental sciences » Figure labels: Raw data, Computational process, Results
  4. Scientific Workflow » In this work, we focus on a particular kind of computational process called a scientific workflow » A scientific workflow is a precise, executable description of a scientific procedure: a series of analysis operations connected using data links » Each operation represents the execution of a computational process » Operations can be supplied by independently developed web services » Operations can also use existing data sources that are accessible on the Web
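The idea of "analysis operations connected using data links" can be sketched in a few lines of Python. This is a minimal illustration, not how any particular workflow system is implemented: the three operations, the `WORKFLOW` table and `run_workflow` are hypothetical names, and the operations return canned data where a real workflow would call web services.

```python
from graphlib import TopologicalSorter

# Hypothetical operations; in a real workflow these could wrap web-service calls.
def fetch_sequences(_inputs):
    return ["ACGT", "GGCA", "ACGT"]

def deduplicate(inputs):
    return sorted(set(inputs["fetch"]))

def count(inputs):
    return len(inputs["dedupe"])

# Each step names its operation and the upstream steps feeding it (the data links).
WORKFLOW = {
    "fetch": (fetch_sequences, []),
    "dedupe": (deduplicate, ["fetch"]),
    "count": (count, ["dedupe"]),
}

def run_workflow(workflow):
    """Run every operation once the outputs of its upstream steps are available."""
    graph = {name: set(deps) for name, (_operation, deps) in workflow.items()}
    outputs = {}
    for name in TopologicalSorter(graph).static_order():
        operation, deps = workflow[name]
        outputs[name] = operation({dep: outputs[dep] for dep in deps})
    return outputs

results = run_workflow(WORKFLOW)
print(results["count"])
```

Keeping the data links in a table separate from the operations is what makes the description both precise and executable: the same table can be inspected, shared, or re-run.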
  5. Preservation Challenges » The challenges concern the executable aspects of workflows and their vulnerability to the volatility of the resources required for their execution » Changes by 3rd parties » Workflow may produce different lists at different times » Workflow may become inoperable » Workflow decay: the execution of the workflow may fail or yield different results, due to dependencies on resources and services subject to independent changes, e.g., those hosted by EMBL-EBI. Even workflows that depend only on local resources are vulnerable.
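One simple way to notice workflow decay is to record a digest of the results at preservation time and compare it against later re-runs. The sketch below assumes results are JSON-serialisable; `digest`, `check_decay` and the three stand-in run functions are hypothetical names for illustration and do not describe the Wf4Ever tooling.

```python
import hashlib
import json

def digest(value):
    """Stable SHA-256 digest of a JSON-serialisable result."""
    payload = json.dumps(value, sort_keys=True).encode("utf-8")
    return hashlib.sha256(payload).hexdigest()

def check_decay(run, recorded_digest):
    """Classify a re-run as 'ok', 'changed' or 'failed'."""
    try:
        result = run()
    except Exception:
        return "failed"  # the workflow has become inoperable
    return "ok" if digest(result) == recorded_digest else "changed"

# Hypothetical stand-ins for a workflow whose third-party service changes over time.
def original_run():
    return ["geneA", "geneB"]

def run_after_service_update():
    return ["geneA", "geneB", "geneC"]

def run_after_service_shutdown():
    raise ConnectionError("service no longer available")

recorded = digest(original_run())
statuses = [
    check_decay(run, recorded)
    for run in (original_run, run_after_service_update, run_after_service_shutdown)
]
print(statuses)
```

A "changed" status is not necessarily an error, since an updated data source may legitimately yield a different list, but it signals that the preserved results no longer match what the workflow produces today.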
  6. Repeat vs. Reproduce » Replicate / Repeat (within lab): exactly replicate the original experiment and experimental conditions. Eliminate change. Observe. » Reproduce (between labs): run experiment with differences in experimental conditions. Compare to test for same result. Observe. » Figure labels: Publication, Materials, Methods, Data, Instruments, Laboratory; Models, Techniques, Algorithms; Provenance, Attribution, Credit, Context; Investigation, Study, Experiment » Lifecycle: Capture, Curate, Discover, Use, Reuse, Preserve
  7. RO Architecture is Hourglass » Astronomy, Biology, services/protocols » Viewing, collaboration services/protocols » Provenance, Versioning, Mim services » ROs: structured packages » Exchange services (media specific) » Storage services (media specific)
  8. From Electronic Papers to Research Objects » Figure labels: Scientists; Electronic paper; Research Object; Hypothesis; Experiments; Results; Annotations; Provenance; Datasets
  10. Research Object: A user scenario
  11. Why research objects? » A research object aggregates all elements deemed necessary to understand research investigations » Promote reuse, sharing » Enable the verification of reproducibility of the results » Trackable, versionable, referenceable
  12. Anatomy of a research object » Figure labels, classes: ro:ResearchObject, ro:Manifest, ro:Resource, ro:Folder, ro:FolderEntry, ro:SemanticAnnotation » Figure labels, properties: ore:aggregates, ore:describes, ore:proxyFor, ore:proxyIn, ro:annotatesAggregatedResource, ao:body (pointing to an RDF file) » Legend: Subclass of
  13. Grounding Workflow-centric Research Objects Using Semantic Technologies » Workflow-centric research objects are encoded using RDF, according to a set of publicly available ontologies » Research objects extend the Object Reuse and Exchange (ORE) model to represent aggregation
  14. Grounding Workflow-centric Research Objects Using Semantic Technologies » We use the Annotation Ontology (AO) to annotate research object resources and their relationships.
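The ORE aggregation and AO annotation terms named on the anatomy slide can be put together in a small manifest. The sketch below emits N-Triples using only the Python standard library. The namespace URIs are given as I understand them and should be checked against the published ontologies; the `example.org` base URI, the `.ro/` paths, the file names and the function names are hypothetical.

```python
# Namespace URIs as assumed here; verify against the published ontologies.
ORE = "http://www.openarchives.org/ore/terms/"
RO = "http://purl.org/wf4ever/ro#"
AO = "http://purl.org/ao/"
RDF_TYPE = "http://www.w3.org/1999/02/22-rdf-syntax-ns#type"

BASE = "http://example.org/ro/1/"  # hypothetical research object URI

def build_manifest(base, resources, annotations):
    """Return (subject, predicate, object) triples describing a research object."""
    manifest = base + ".ro/manifest.rdf"  # hypothetical manifest location
    triples = [
        (base, RDF_TYPE, RO + "ResearchObject"),
        (manifest, RDF_TYPE, RO + "Manifest"),
        (manifest, ORE + "describes", base),
    ]
    for path in resources:
        triples.append((base, ORE + "aggregates", base + path))
        triples.append((base + path, RDF_TYPE, RO + "Resource"))
    for number, (target, body) in enumerate(annotations, start=1):
        annotation = f"{base}.ro/annotations/{number}"
        triples.append((annotation, RDF_TYPE, RO + "SemanticAnnotation"))
        triples.append((annotation, RO + "annotatesAggregatedResource", base + target))
        triples.append((annotation, AO + "body", base + body))  # body is an RDF file
    return triples

def to_ntriples(triples):
    """Serialise IRI-only triples as N-Triples, one statement per line."""
    return "\n".join(f"<{s}> <{p}> <{o}> ." for s, p, o in triples)

triples = build_manifest(
    BASE,
    resources=["workflow.t2flow", "results.csv"],
    annotations=[("workflow.t2flow", ".ro/annotations/workflow-description.rdf")],
)
print(to_ntriples(triples))
```

Because the output is plain RDF, any RDF toolkit can load it and query which resources the research object aggregates and which annotations describe them.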
  15. Relating resources in a research object » Figure labels, resources: Workflow_13, Workflow_16, QTL, Results, Common pathways, Logs, Metadata, Paper, Slides » Figure labels, relations: produces, Feeds into, Included in, Published in » The provenance of the RO elements is key to understanding, comparing and debugging scientific workflows, and to verifying the validity of a claim made within the context of an RO
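Once such relations are recorded, tracing where a result came from becomes a graph traversal. The sketch below is illustrative only: the resource and relation names come from the slide, but which resource is linked to which is an assumption, since the figure layout is not recoverable from this transcript. `EDGES` and `upstream` are hypothetical names.

```python
from collections import defaultdict, deque

# (source, relation, destination). The endpoints are assumed for illustration;
# they are not read off the original figure.
EDGES = [
    ("QTL", "feeds into", "Workflow_16"),
    ("Workflow_16", "produces", "Results"),
    ("Workflow_16", "produces", "Logs"),
    ("Results", "published in", "Paper"),
    ("Results", "included in", "Slides"),
]

def upstream(target, edges):
    """Return every resource that the target depends on, directly or indirectly."""
    incoming = defaultdict(list)
    for source, _relation, destination in edges:
        incoming[destination].append(source)
    seen, queue = set(), deque([target])
    while queue:
        node = queue.popleft()
        for source in incoming[node]:
            if source not in seen:
                seen.add(source)
                queue.append(source)
    return sorted(seen)

print(upstream("Paper", EDGES))
```

Verifying a claim made in the paper then amounts to inspecting each resource the traversal returns, here the results, the workflow that produced them and the data fed into it.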
  16. Evolution of a research object » Events: "My supervisor calls me to report my work"; "A new PhD student continues my work"; "My supervisor calls me again and we decide to publish our RO+paper"; "Reviews received and final version published" » Transitions: <<copy>>, <<copy, filter and curate>>, <<versionOf>> » Live RO (Scientist) » RO snapshot (Scientist): identified by a URI; some metadata; some curation; mostly private (for my group) » RO snapshot: identified by a URI; some metadata; some curation; mostly private (for my group and for paper reviewers) » Archived RO (Librarian/Curator): identified by a URI; good metadata and curation; mostly public
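The copy and versionOf transitions can be sketched as two small functions over a research object record. This is a minimal illustration under assumptions: the `ResearchObject` class, its fields, the direction of the `version_of` link and the example URIs and file names are all hypothetical, not the Wf4Ever evolution model itself.

```python
from dataclasses import dataclass, field
from typing import Optional

@dataclass
class ResearchObject:
    uri: str
    kind: str  # "live", "snapshot" or "archived"
    resources: list = field(default_factory=list)
    version_of: Optional["ResearchObject"] = None

def snapshot(live, uri):
    """<<copy>>: freeze the current content of a live RO under its own URI."""
    return ResearchObject(uri, "snapshot", list(live.resources), version_of=live)

def archive(live, uri, keep):
    """<<copy, filter and curate>>: keep only the resources the curator selects."""
    curated = [resource for resource in live.resources if keep(resource)]
    return ResearchObject(uri, "archived", curated, version_of=live)

live = ResearchObject(
    "http://example.org/ro/live/", "live",
    ["workflow.t2flow", "results.csv", "scratch.txt"],
)
first = snapshot(live, "http://example.org/ro/v1/")

# Work continues on the live RO after the snapshot was taken.
live.resources.append("paper.pdf")
final = archive(live, "http://example.org/ro/archived/", keep=lambda r: r != "scratch.txt")

print(first.resources)
print(final.resources)
```

Copying the resource list at snapshot time is the important design choice: later edits to the live RO must not alter what an earlier snapshot's URI refers to.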
  17. PROV standard - Basis for evolution model » Status: Candidate Recommendation
  18. Wf4Ever Tools » Customizable preservability checklists
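As a rough idea of what a customizable checklist involves, the sketch below evaluates a list of requirements against the resources in a research object. It does not reflect the actual Wf4Ever checklist tool or its model: the MUST/SHOULD levels, the rule format, the file names and the `evaluate` function are assumptions made for illustration.

```python
# Each rule: (level, description, test over the list of resource paths).
# Levels and rules are hypothetical examples of what a checklist might require.
CHECKLIST = [
    ("MUST", "contains a workflow definition",
     lambda resources: any(path.endswith(".t2flow") for path in resources)),
    ("MUST", "contains input data",
     lambda resources: any(path.startswith("inputs/") for path in resources)),
    ("SHOULD", "contains a README",
     lambda resources: "README.txt" in resources),
]

def evaluate(resources, checklist):
    """Return (satisfied, report); only failed MUST rules make the RO unsatisfied."""
    report = [(level, label, bool(test(resources))) for level, label, test in checklist]
    satisfied = all(passed for level, _label, passed in report if level == "MUST")
    return satisfied, report

satisfied, report = evaluate(["workflow.t2flow", "inputs/genes.txt"], CHECKLIST)
print(satisfied)
for level, label, passed in report:
    print(level, label, "pass" if passed else "fail")
```

Keeping the rules in a data structure rather than in code paths is what would let different communities swap in their own requirements.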
  19. Wf4Ever Tools » Portal: Browsing and annotating
  20. Wf4Ever Tools » Command line tools, Client libraries
  21. Wf4Ever Tools » Specifications and APIs
  22. Current Status and Ongoing Work » Models/spec v0.1 public: - Upcoming revision v0.2 (Q1 2013): • Minor additions to workflow model terms • “RO Terms”: upper, user-level view of RO (hypothesis, results); many are “shortcuts” for the structured model - TODO: Update annotation model to Open Annotation Data Model (OAC) - TODO: PAV for detailed authorship provenance » Showing, managing and sharing of Research Objects through the myExperiment web site [3]
  23. Open Annotation Data Model » Community Draft » “Almost final” spec: 2013-01-28 » Roll out meeting in Manchester: March 2013
  24. myExperiment RO support
