Data models for preserving and publishing digital research material beyond the PDF

Slides for the Technology Track of ISMB/ECCB 2013 in Berlin on digital publishing, highlighting the Research Object model, Nanopublications, and ISA as means to capture methods and results when research is carried out digitally. This work was supported by the EU Workflow Forever project (Wf4Ever, http://wf4ever-project.org).

Published in: Lifestyle, Education, Technology

  1. Data models for digital preservation and publishing beyond the PDF. Jun Zhao, Mark Thompson, Kristina Hettne, Stian Soiland, Susana Garcia, Marco Roos. Acknowledging Harish Dharuri, Susanna Sansone, Philipe Rocca-Sera, Alejandra Gonzales-Beltran, Albert Mons, Arie Baak, Erik Schultes, Carole Goble, Barend Mons. The Workflow Forever project (EU FP7 nr. 270192), Digital Libraries and Digital Preservation (ICT-2009.4.1).
  2. Recording your computational steps… Bioinformaticians have no lab books, and no training in digital note-keeping! http://graemefielder.wordpress.com/2010/09/17/lab-books-evolution-required/
  3. State of the art study capture?
  4. How then? Workflows encapsulate in silico analysis. http://ap27-cgla.blogspot.nl/ http://openi.nlm.nih.gov/detailedresult.php?img=2743669_1471-2105-10-252-2&req=4
  5. Components to understand an experiment. Is a workflow enough? Workflow: Which biological pathways explain the associations? Interpret results (interaction pathways in the cell). Research question (Genome Wide Association Studies, GWAS): In 1000+ people, which gene mutations are associated with metabolic syndrome, and why? Download data - External DB - Existing Knowledge. Hypothesis: Genes involved in inflammation pathways are involved in the onset of metabolic syndrome.
  6. Components to understand an experiment. Is a workflow enough? Workflow: Which biological pathways explain the associations? Interpret results (interaction pathways in the cell). Research question (Genome Wide Association Studies, GWAS): In 1000+ people, which gene mutations are associated with metabolic syndrome, and why? Download data - External DB - Existing Knowledge. Hypothesis: Genes involved in inflammation pathways are involved in the onset of metabolic syndrome.
  7. Data models capture more than workflows. Research Object, types of resources: Data, Method/Experimental protocol, Findings. Data models: ISA-TAB/ISA2OWL, Nanopublication, Wfdesc.
  8. Research Object Model: preservation for understanding. Preserve at least the: – Hypothesis – A workflow-like sketch – One or more workflows – Input data – Workflow runs – Results – Conclusion. My Research Book.
  9. Fame and Glory. It was me, me, me! What I found. How I found it. HDAC1 interacts with Parvb. Discovered by: me. Nanopublication: Assertion, Provenance of Assertion, Metadata of nanopublication.
  10. Prototyping the models • Create: myExperiment • Better: Checklist service • Evolution: Digital Library software • Curation: Quality Monitoring Service • Credit original assertions: LandMark Tool • Applications by private partners
  11. Prototyping the Research Object Data Model in myExperiment: create Research Objects
  12. Prototyping the Research Object Data Model in the Checklist service: make better Research Objects
  13. http://www.wf4ever-project.org/wiki/display/docs/RO+checklist+evaluation+API
  14. http://www.wf4ever-project.org/wiki/display/docs/RO+checklist+evaluation+API
  15. RELEASE! http://www.wf4ever-project.org/wiki/display/docs/RO+checklist+evaluation+API
  16. Prototyping the Research Object Data Model in Digital Library software: evolution of a Research Object
  17. Research Object ‘under construction’
  18. Snapshots to record intermediate states
  19. Full copy ‘Ready for Release’
  20. Prototyping the Research Object Data Model in the Quality Monitoring Service: long-term curation
  21. Prototyping the Nanopublication Model in the Landmark Claim Tool: mark and credit the first discovery
  22. Landmark Claim Tool: Core data, Attribution, Qualification
  23. Prototyping the Nanopublication Model in applications from private partners: robust tools for business stakeholders
  24. Nanopublication applications. Euretos Company. Copyright Euretos b.v. 2013. Releases planned for 2014.
  25. Some gory detail: data models ‘under the hood’
  26. Research Object Model at a glance. Diagram elements: Research Object, Manifest, Resources, Annotations and Annotation graph, linked by ore:aggregates, oa:hasTarget and oa:hasBody. For more information and extensions (Evolution model, MINIM) see http://wf4ever-project.org/
  27. Extensions
  28. Wf4Ever architecture: Semantic REST API; RDF triple store (RO structure, annotations); RO index; uploaded files; Portal; Checklist service; Command line; Workflow runner; ...
  29. Nanopublication Data Model. Nanopublication URL: unique, persistent and resolvable identifier. Assertion: an individual association between concepts • statement or declaration • measurement • hypothetical inference • quantitative or qualitative. Example assertion: association a sio:statisticalAssociation; sio:has-measurementValue Association_1_p_value. Association_1_p_value a sio:probability-value; sio:has-value 6.56e-5^^xsd:float; sio:refers-to; dcterms:DOI … Provenance (assertion opm:wasDerivedFrom, opm:wasGeneratedBy): how this assertion came to be, methods, evidence, context, etc. PublicationInfo (this nanopub dcterms:created, pav:authoredBy) • Detailed attribution for authors, institutions, lab technicians, curators • License info • Publication date. Integrity Key: guarantee immutability after publication.
  30. Nanopublication http://www.store.net/mynanopub.rdf. Assertion: SoapDenovo 2 increases correct assembly length by 3-80 times over Soapdenovo 1. Provenance. PublicationInfo: pav:authoredBy, dc:rights, dc:created. Research Object: a Galaxy workflow, results, slides, hypothesis. Research object can link to a nanopub as an experimental result (ro:aggregates).
  31. Nanopublication http://www.store.net/mynanopub.rdf. Assertion: SoapDenovo 2 increases correct assembly length by 3-80 times over Soapdenovo 1. Provenance. PublicationInfo: pav:authoredBy, dc:rights, dc:created. Research Object: a Galaxy workflow, results, slides, hypothesis. Nanopublication gains detailed workflow provenance by linking to RO (ro:aggregates, rdf:describedBy).
  32. Nanopublication http://www.store.net/mynanopub.rdf. Assertion: SoapDenovo 2 increases correct assembly length by 3-80 times over Soapdenovo 1. Provenance. PublicationInfo: pav:authoredBy, dc:rights, dc:created. Research Object: a Galaxy workflow, results, slides, hypothesis. Extend your provenance! E.g. link the claim to the original data elements from which it was derived (ro:aggregates, rdf:describedBy).
  33. Nanopublication http://www.store.net/mynanopub.rdf. Assertion: SoapDenovo 2 increases correct assembly length by 3-80 times over Soapdenovo 1. Provenance. PublicationInfo: pav:authoredBy, dc:rights, dc:created. Research Object: a Galaxy workflow, results, slides, hypothesis (ro:aggregates, rdf:describedBy). ?
  34. Community effort • Research Objects: http://researchobjects.org/ http://wf4ever-project.org/ • Nanopublication: http://Nanopub.org/ • ISA-tools: http://www.isa-tools.org/ • Research Objects Community Group at W3C: http://w3.org/community/rosc
  35. W3C community group for RO: http://www.w3.org/community/rosc/
  36. Conclusions (1/2) • Applications of RO and Nanopublication data models to capture the bioinformatics research process ‘beyond the PDF’ • Data models: ISA, Research Objects, Nanopublications
  37. Conclusions (2/2) • Reference implementations / first to adopt: myExperiment, DLibra, Checklist service, Curation/monitoring, Landmark tool • Private partners developing stable nanopublication applications • Prevent perfectionism of the developers: get involved now!
  38. THANK YOU FOR YOUR ATTENTION. http://researchobject.org/ http://nanopub.org/ http://isa-tools.org/ Research Object Community group at W3C: http://w3.org/community/rosc
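
Slide 26 names the relations that hold a Research Object together: ore:aggregates, oa:hasTarget and oa:hasBody. Slide 30 adds that a Research Object can aggregate a nanopublication as an experimental result. The following is a minimal sketch of that structure in plain Python, without an RDF library. The class, its method names and every example.org URI are hypothetical and only serve the illustration; the nanopublication URL is the one shown on slide 30. It is not the Wf4Ever implementation.

```python
from dataclasses import dataclass, field

# Property names as they appear on slide 26.
ORE_AGGREGATES = "ore:aggregates"
OA_HAS_TARGET = "oa:hasTarget"
OA_HAS_BODY = "oa:hasBody"


@dataclass
class ResearchObject:
    """Hypothetical in-memory stand-in for a Research Object manifest."""
    uri: str
    resources: list = field(default_factory=list)
    annotations: list = field(default_factory=list)  # (annotation, target, body)

    def aggregate(self, resource):
        """Add a resource to the aggregation, once."""
        if resource not in self.resources:
            self.resources.append(resource)

    def annotate(self, annotation, target, body):
        """Attach an annotation graph (body) to an aggregated resource (target)."""
        if target not in self.resources:
            raise ValueError(f"{target} is not aggregated by {self.uri}")
        self.annotations.append((annotation, target, body))

    def triples(self):
        """Return the structure as (subject, predicate, object) tuples."""
        out = [(self.uri, ORE_AGGREGATES, r) for r in self.resources]
        for annotation, target, body in self.annotations:
            out.append((annotation, OA_HAS_TARGET, target))
            out.append((annotation, OA_HAS_BODY, body))
        return out


# Hypothetical Research Object for the assembly example of slide 30.
ro = ResearchObject("http://example.org/ro/assembly-study/")
ro.aggregate("http://example.org/ro/assembly-study/workflow.ga")
ro.aggregate("http://www.store.net/mynanopub.rdf")
ro.annotate(
    "http://example.org/ro/assembly-study/.ro/ann1",
    target="http://www.store.net/mynanopub.rdf",
    body="http://example.org/ro/assembly-study/.ro/ann1.ttl",
)
for s, p, o in ro.triples():
    print(f"<{s}> {p} <{o}> .")
```

Keeping the annotation body as a separate graph, rather than as properties on the resource itself, is what lets several parties describe the same aggregated file independently.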

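Slide 8 lists what a Research Object should preserve at a minimum, and slide 12 presents a checklist service for making Research Objects better. The idea of checking a Research Object against such a list can be sketched as below. This is an illustration only, not the RO checklist evaluation API linked on slides 13 to 15: the function name, the dictionary layout and the file names are assumptions made here.

```python
# Minimum components to preserve, taken from slide 8.
REQUIRED = (
    "hypothesis",
    "workflow sketch",
    "workflow",
    "input data",
    "workflow run",
    "results",
    "conclusion",
)


def evaluate(ro_contents):
    """Report which required components have no resource filed under them.

    ro_contents maps a component name to a list of resources (hypothetical layout).
    """
    missing = [name for name in REQUIRED if not ro_contents.get(name)]
    return {"satisfied": not missing, "missing": missing}


# A hypothetical Research Object that is still under construction.
draft = {
    "hypothesis": ["hypothesis.txt"],
    "workflow": ["gwas_pathways.t2flow"],
    "input data": ["gwas_snps.csv"],
    "results": [],
}
report = evaluate(draft)
print(report["satisfied"], report["missing"])
```

An empty list counts as missing, so a placeholder entry without any resource does not satisfy the checklist.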
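Slide 29 splits a nanopublication into an Assertion, its Provenance and PublicationInfo, and mentions an Integrity Key that guarantees immutability after publication. A minimal sketch of those three parts follows. The slide does not say how the key is computed, so the SHA-256 digest over sorted triples used here is an assumption, as are the class name, the example.org URI and the object values of the provenance and publication triples. The property names are read from the garbled slide text as well as it allows.

```python
import hashlib
from dataclasses import dataclass, replace


@dataclass(frozen=True)
class Nanopublication:
    """Hypothetical container for the three graphs named on slide 29."""
    uri: str
    assertion: tuple
    provenance: tuple
    publication_info: tuple

    def integrity_key(self):
        """Digest of all three graphs; SHA-256 is an assumption, not from the slides."""
        digest = hashlib.sha256()
        parts = (
            ("assertion", self.assertion),
            ("provenance", self.provenance),
            ("publicationInfo", self.publication_info),
        )
        for name, graph in parts:
            digest.update(name.encode("utf-8"))
            # Sorting makes the key independent of the order of the triples.
            for triple in sorted(graph):
                digest.update("\t".join(triple).encode("utf-8") + b"\n")
        return digest.hexdigest()


nanopub = Nanopublication(
    uri="http://example.org/nanopub/1",
    assertion=(
        (":association_1", "rdf:type", "sio:statisticalAssociation"),
        (":association_1", "sio:has-measurementValue", ":association_1_p_value"),
        (":association_1_p_value", "rdf:type", "sio:probability-value"),
        (":association_1_p_value", "sio:has-value", '"6.56e-5"^^xsd:float'),
    ),
    provenance=((":assertion", "opm:wasGeneratedBy", ":hypothetical_workflow_run"),),
    publication_info=((":this_nanopub", "pav:authoredBy", ":hypothetical_author"),),
)

# Same triples in another order: the key stays the same.
reordered = replace(nanopub, assertion=tuple(reversed(nanopub.assertion)))
# A changed p-value: the key changes, so the edit is detectable.
tampered = replace(
    nanopub,
    assertion=nanopub.assertion[:-1]
    + ((":association_1_p_value", "sio:has-value", '"0.05"^^xsd:float'),),
)
print(nanopub.integrity_key() == reordered.integrity_key())
print(nanopub.integrity_key() == tampered.integrity_key())
```

Freezing the dataclass mirrors the immutability requirement: a published nanopublication is never edited in place, and any change yields a new object with a different key.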