Data models for preserving and publishing digital research material beyond the PDF
Slides for the Technology Track of ISMB/ECCB 2013 in Berlin on digital publishing, highlighting the Research Object model, Nanopublications, and ISA as means to capture methods and results when research is carried out digitally. This work was supported by the EU Workflow Forever (Wf4Ever) project (http://wf4ever-project.org).

  • Attribution is part of the RO model and myExperiment, but we are also developing something specifically to address this aspect of digital preservation and publishing… Nanopublications
  • http://www.wf4ever-project.org/wiki/display/docs/RO+checklist+evaluation+API
  • expected (based on previous knowledge) or serendipitous result/finding?
  • So let’s have a look at what a Research Object looks like. The core is the concept of the Research Object itself, which you may also know as an ORE aggregation. This is described by the manifest, which is simply an RDF file. The RO aggregates a series of resources; in Linked Data these could be anywhere in the world. Additionally it aggregates a set of annotations, each of which is the link between a target resource (here aggregated in the RO) and a body resource. In Wf4Ever we typically provide the body as a separate RDF graph, so that we can use existing vocabularies to describe and relate the resources.
  • new schema, landmark screenshot, workflow, hypothesis sketch
  • So we have recently formed a W3C Community Group for Research Objects, which has gathered significant interest: 75 participants. As you see, I am one of the chairs, and so is Rob, whom you already know from the OA group. We are just starting up, and our focus in the RO community group is on how to use Research Objects in practice rather than on specifying a new model. We’ll refer to existing models where appropriate, but also explore other models which could be described as research objects.

Data models for preserving and publishing digital research material beyond the PDF Presentation Transcript

  • 1. Data models for digital preservation and publishing beyond the PDF. Jun Zhao, Mark Thompson, Kristina Hettne, Stian Soiland, Susana Garcia, Marco Roos. Acknowledging Harish Dharuri, Susanna Sansone, Philipe Rocca-Sera, Alejandra Gonzales-Beltran, Albert Mons, Arie Baak, Erik Schultes, Carole Goble, Barend Mons. The Workflow Forever project (EU FP7 nr. 270192), Digital Libraries and Digital Preservation (ICT-2009.4.1).
  • 2. Recording your computational steps… Bioinformaticians have no lab books! And no training in digital notekeeping. http://graemefielder.wordpress.com/2010/09/17/lab-books-evolution-required/
  • 3. State of the art study capture?
  • 4. How then? Workflows encapsulate in silico analysis http://ap27-cgla.blogspot.nl/ http://openi.nlm.nih.gov/detailedresult.php?img=2743669_1471-2105-10-252-2&req=4
  • 5. Components to understand an experiment. Is a workflow enough? Research question: Genome Wide Association Studies (GWAS). In 1000+ people: which gene mutations are associated with metabolic syndrome, and why? Hypothesis: genes involved in inflammation pathways are involved in the onset of metabolic syndrome. Download data: external DB, existing knowledge. Workflow: which biological pathways explain the associations? Interpret results (interaction pathways in the cell).
  • 6. Components to understand an experiment. Is a workflow enough? Research question: Genome Wide Association Studies (GWAS). In 1000+ people: which gene mutations are associated with metabolic syndrome, and why? Hypothesis: genes involved in inflammation pathways are involved in the onset of metabolic syndrome. Download data: external DB, existing knowledge. Workflow: which biological pathways explain the associations? Interpret results (interaction pathways in the cell).
  • 7. Research Object. Types of resources: Data, Method/Experimental protocol, Findings. Data models: ISA-TAB/ISA2OWL, Nanopublication, ISA-TAB/ISA2OWL, Wfdesc. Capture more than workflows.
  • 8. Research Object Model Preservation for understanding Preserve at least the: – Hypothesis – A workflow-like sketch – One or more workflows – Input data – Workflow runs – Results – Conclusion My Research Book
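
The minimum contents listed on slide 8 lend themselves to a simple completeness check. The following is a minimal sketch in Python, not the Wf4Ever checklist service or its MINIM model: the kind labels, the function name `missing_kinds` and the example file names are all assumptions made for illustration.

```python
# Minimum contents of a Research Object, following the list on slide 8.
# The label strings themselves are an assumption made for this sketch.
REQUIRED_KINDS = [
    "hypothesis",
    "workflow sketch",
    "workflow",
    "input data",
    "workflow run",
    "results",
    "conclusion",
]


def missing_kinds(resources):
    """Return the required kinds that no resource in the object provides.

    `resources` is a list of (file name, kind) pairs.
    """
    present = {kind for _, kind in resources}
    return [kind for kind in REQUIRED_KINDS if kind not in present]


# Hypothetical research object for the GWAS example; file names are invented.
example_ro = [
    ("hypothesis.txt", "hypothesis"),
    ("sketch.png", "workflow sketch"),
    ("gwas_pathways.t2flow", "workflow"),
    ("snps.csv", "input data"),
    ("pathways.csv", "results"),
]

print(missing_kinds(example_ro))  # ['workflow run', 'conclusion']
```

An empty result would mean the object satisfies this (deliberately simplistic) reading of "preservation for understanding".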
  • 9. Fame and Glory. It was me, me, me! What I found. How I found it. HDAC1 interacts with Parvb. Discovered by: me. Nanopublication: Assertion, Provenance of Assertion, Metadata of nanopublication.
  • 10. Prototyping the models • Create: myExperiment • Better: Checklist service • Evolution: Digital Library software • Curation: Quality Monitoring Service • Credit original assertions: LandMark Tool • Applications by private partners
  • 11. myExperiment - create Research Objects Prototyping the Research Object Data Model in
  • 12. Checklist service - make better Research Objects Prototyping the Research Object Data Model in
  • 13. http://www.wf4ever-project.org/wiki/display/docs/RO+checklist+evaluation+API
  • 14. http://www.wf4ever-project.org/wiki/display/docs/RO+checklist+evaluation+API
  • 15. RELEASE!  http://www.wf4ever-project.org/wiki/display/docs/RO+checklist+evaluation+API
  • 16. Digital Library software - evolution of a Research Object Prototyping the Research Object Data Model in
  • 17. Research Object ‘under construction’
  • 18. Snapshots to record intermediate states
  • 19. Full copy ‘Ready for Release’
  • 20. Quality Monitoring Service - Long term curation Prototyping the Research Object Data Model in
  • 21. Landmark Claim Tool - mark and credit the first discovery Prototyping the Nanopublication Model
  • 22. Landmark Claim Tool Core data Attribution Qualification
  • 23. Applications from private partners - Robust tools for business stakeholders Prototyping the Nanopublication Model
  • 24. Nanopublication applications. Euretos Company. Copyright Euretos b.v. 2013. Releases planned for 2014.
  • 25. Some gory detail Data models ‘under the hood’
  • 26. Research Object Model at a glance. [Diagram: a Manifest describes the Research Object; the Research Object ore:aggregates Resources and Annotations; each Annotation points to a Resource with oa:hasTarget and to an Annotation graph with oa:hasBody.] For more information and extensions (Evolution model, MINIM) see http://wf4ever-project.org/
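
The structure on slide 26 can be written down as a handful of RDF triples. The sketch below uses plain Python tuples and prints N-Triples; it is an illustration under stated assumptions, not the Wf4Ever manifest format. The ORE and Open Annotation namespace URIs are the published ones, while the base URI, the file names and the `.ro/annotations/` layout are hypothetical.

```python
ORE = "http://www.openarchives.org/ore/terms/"
OA = "http://www.w3.org/ns/oa#"
BASE = "http://example.org/ro/gwas-study/"  # hypothetical RO location


def build_manifest(resources, annotations):
    """Build manifest triples for a research object.

    `resources` is a list of relative resource names; `annotations` maps an
    annotation id to a (target name, body name) pair.
    """
    ro = BASE
    triples = []
    for name in resources:
        # The research object aggregates each resource.
        triples.append((ro, ORE + "aggregates", BASE + name))
    for ann_id, (target, body) in annotations.items():
        ann = BASE + ".ro/annotations/" + ann_id
        # Annotations are aggregated too, and link a target to a body graph.
        triples.append((ro, ORE + "aggregates", ann))
        triples.append((ann, OA + "hasTarget", BASE + target))
        triples.append((ann, OA + "hasBody", BASE + body))
    return triples


def to_ntriples(triples):
    """Serialise URI-only triples as N-Triples lines."""
    return "\n".join("<%s> <%s> <%s> ." % t for t in triples)


manifest = build_manifest(
    resources=["workflow.t2flow", "input.csv"],
    annotations={"a1": ("workflow.t2flow", ".ro/annotations/a1-body.rdf")},
)
print(to_ntriples(manifest))
```

Keeping the annotation body as a separate resource mirrors the speaker note: the body can be its own RDF graph using whatever existing vocabulary fits the resource being described.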
  • 27. Extensions
  • 28. Wf4Ever architecture: Semantic REST API; RDF triple store (RO structure, Annotations); RO index; Uploaded files; Portal; Checklist service; Command line; Workflow runner; ...
  • 29. Nanopublication Data Model. A nanopublication has a URL (a unique, persistent and resolvable identifier), an Integrity Key (to guarantee immutability after publication), and three parts: Assertion, Provenance and PublicationInfo. Assertion: an individual association between concepts (a statement or declaration, a measurement, or a hypothetical inference; quantitative or qualitative). Example: association a sio:statisticalAssociation; sio:has-measurementValue Association_1_p_value. Association_1_p_value a sio:probability-value; sio:has-value 6.56e-5^^xsd:float. Also shown: sio:refers-to, dcterms:DOI … Provenance: how this assertion came to be (methods, evidence, context, etc.), stated about the assertion with opm:wasDerivedFrom and opm:wasGeneratedBy. PublicationInfo: detailed attribution for authors, institutions, lab technicians and curators; license info; publication date; stated about this nanopub with dcterms:created and pav:authoredBy.
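
The three-part layout on slide 29 can be sketched as three named graphs of triples. The Python below is a rough illustration, assuming prefixed names kept as plain strings: every `ex:` identifier, the creation date and the author are placeholders, and the SHA-256 digest only illustrates the idea of an integrity key; it is not the mechanism used by any nanopublication implementation.

```python
import hashlib

# Three named graphs keyed by graph name. Prefixed names are plain strings;
# all "ex:" identifiers, the date and the author are hypothetical.
NANOPUB = {
    "ex:assertion": [
        ("ex:association", "rdf:type", "sio:statisticalAssociation"),
        ("ex:association", "sio:has-measurementValue", "ex:Association_1_p_value"),
        ("ex:Association_1_p_value", "rdf:type", "sio:probability-value"),
        ("ex:Association_1_p_value", "sio:has-value", '"6.56e-5"^^xsd:float'),
    ],
    "ex:provenance": [
        # How the assertion came to be; the workflow run is a placeholder.
        ("ex:assertion", "opm:wasGeneratedBy", "ex:gwas_workflow_run"),
    ],
    "ex:pubinfo": [
        ("ex:nanopub", "dcterms:created", '"2013-07-21"^^xsd:date'),
        ("ex:nanopub", "pav:authoredBy", "ex:some_author"),
    ],
}


def to_quads(nanopub):
    """Flatten the named graphs into sorted (s, p, o, graph) quads."""
    return sorted(
        (s, p, o, graph)
        for graph, triples in nanopub.items()
        for s, p, o in triples
    )


def integrity_key(nanopub):
    """Digest over the sorted quads; any later change alters the key."""
    canonical = "\n".join(" ".join(quad) for quad in to_quads(nanopub))
    return hashlib.sha256(canonical.encode("utf-8")).hexdigest()


print(len(to_quads(NANOPUB)), "quads, key", integrity_key(NANOPUB)[:12])
```

Sorting the quads before hashing makes the key independent of the order in which statements were added, which is the property an immutability check needs.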
  • 30. Research object can link to a nanopub as an experimental result. [Diagram: nanopublication at http://www.store.net/mynanopub.rdf with Assertion ("SoapDenovo 2 increases correct assembly length by 3-80 times over SoapDenovo 1"), Provenance, and PublicationInfo (pav:authoredBy, dc:rights, dc:created); a Research Object that ro:aggregates a Galaxy workflow, results, slides and a hypothesis.]
  • 31. Nanopublication gains detailed workflow provenance by linking to RO. [Diagram as on slide 30, with links ro:aggregates and rdf:describedBy.]
  • 32. Extend your provenance! E.g. link the claim to the original data elements from which it was derived. [Diagram as on slide 30, with links ro:aggregates and rdf:describedBy.]
  • 33. ? [Diagram as on slide 30, with links ro:aggregates and rdf:describedBy.]
  • 34. Community effort • Research Objects http://researchobject.org/ http://wf4ever-project.org/ • Nanopublication http://nanopub.org/ • ISA-tools http://www.isa-tools.org/ • Research Objects Community Group at W3C http://w3.org/community/rosc
  • 35. W3C community group for RO http://www.w3.org/community/rosc/
  • 36. Conclusions (1/2) • Applications of RO and Nanopublication data models to capture the bioinformatics research process ‘beyond the PDF’ • Data models: ISA, Research Objects, Nanopublications
  • 37. Conclusions (2/2) • Reference implementations / first to adopt: myExperiment, DLibra, Checklist service, Curation/monitoring, Landmark tool • Private partners developing stable nanopublication applications • Prevent perfectionism of the developers: get involved now!
  • 38. THANK YOU FOR YOUR ATTENTION http://researchobject.org/ http://nanopub.org/ http://isa-tools.org/ Research Object Community group at W3C: http://w3.org/community/rosc