Reproducibility Using Semantics: An Overview

Overview of the different approaches for addressing reproducibility (using semantics) in laboratory protocols, workflow description and publication, and workflow infrastructure. Research Objects are also introduced as a means to capture the context and annotations of scientific experiments, together with the privacy and IPR concerns that may arise. This presentation was given at Dagstuhl Seminar 16041: http://www.dagstuhl.de/16041

Published in: Education
  1. Reproducibility Using Semantics: An Overview. Dagstuhl Seminar, Jan 2016. Daniel Garijo, Olga Giraldo, Idafen Santana-Pérez, Victor Rodriguez Doncel, Oscar Corcho. Ontology Engineering Group, Universidad Politécnica de Madrid, Madrid, Spain.
  2. The Research Method in different disciplines. Input data, laboratory protocol, equipment. In vivo/in vitro, in silico. Dataset, scientific workflow, infrastructure.
  3. Some problems in lab protocols: some of them present insufficient granularity, and the instructions can be imprecise or ambiguous due to the use of natural language. Examples:
     • Incubate the centrifuge tubes in a water bath.
     • Incubate the samples for 5 min with gentle shaking.
     • Rinse DNA briefly in 1-2 ml of wash.
     • Incubate at -20 °C overnight.
  4. Currently… Semi-structured information. Unstructured information. How to formalize the information from laboratory protocols as a knowledge base? NLP tools + Ontologies.
  5. Semantic annotation (ontology diagram; terms listed in the order they appear on the slide).
     The Protocol as a document:
     • sp:introduction section: sp:application of the protocol, sp:advantage of the protocol, sp:limitation of the protocol, sp:provenance of the protocol, sp:purpose of the protocol
     • sp:materials section: sp:buffer list, sp:equipment and supplies list, sp:kit list, sp:primer list, sp:reagent list, sp:software list, sp:solution list
     • sp:methods section: exact:caution, sp:critical step, sp:hint, sp:pause point, sp:storage condition, sp:timing, sp:troubleshooting
     • Other classes: sp:experimental protocol, iao:document, iao:document part, iao:textual entity, iao:data set, exact:alert message
     • Relations: owl:subClassOf, ro:hasPart, ro:partOf
     Basic Steps of DNA Extraction:
     • Classes: sp:basic step of DNA extraction, p-plan:Step, p-plan:Variable
     • Steps: sp:cell disruption, sp:digestion reaction, sp:DNA purification
     • Variables: sp:plant tissue, sp:powdered tissue, sp:digested contaminant, obi:DNA extract
     • Relations: p-plan:hasInputVariable, p-plan:hasOutputVariable, bfo:isPrecededBy, owl:subClassOf
     SMART Protocols ontology is available here: http://vocab.linkeddata.es/SMARTProtocols/
     GATE; Smart Protocols.
  6. The Research Method in different disciplines. Input data, laboratory protocol, equipment. In vivo/in vitro, in silico. Dataset, scientific workflow, infrastructure.
  7. Vocabularies and methodologies for representing and publishing workflows. Methodology for workflow publishing:
     • Wings workflow generation: Core Portal with WINGS on a local laptop, on a shared host, or on a web server; each with Workflow Template, Workflow Instance, and PROV export
     • OPM/PROV conversion
     • Publication: Linked Data publication into an RDF triple store (workflow provenance, workflow plan)
     • Share and reuse: interactive browsing (Pubby frontend) and programmatic access (external apps), for users and other workflow environments
     • Repository of linked workflows: http://www.opmw.org/sparql
     • Vocabularies: http://purl.org/net/p-plan and http://www.opmw.org/ontology/
     • Daniel Garijo and Yolanda Gil. 2011. A new approach for publishing workflows: abstractions, standards, and linked data. (WORKS '11). ACM, New York, NY, USA, 47-56.
     • Daniel Garijo and Yolanda Gil. Augmenting PROV with Plans in P-PLAN: Scientific Processes as Linked Data. In Proceedings of the 2nd International Workshop on Linked Science 2012, Boston, 2012.
  8. The Research Method in different disciplines. Input data, laboratory protocol, equipment. In vivo/in vitro, in silico. Dataset, scientific workflow, infrastructure.
  9. Reproducibility of Computational Scientific Experiments. Former equipment: annotate (semantic annotations), then reproduce (equivalent execution environment) in the cloud. Workflow systems and workflows: Pegasus (Montage, SoyKB, Epigenomics), Dispel4Py (Internal Extinction, Seismic Cross Correlation), Makeflow (Blast).
  10. Some results:
      • Pegasus Montage Workflow: astronomy workflow; constructs large image mosaics of the sky
      • Montage Software distribution: 59 binaries
      • Target IaaS Cloud Providers: Amazon EC2 & FutureGrid
      • Vagrant
      RO available at http://pegasus.isi.edu/publications/reppar
  11. The Research Method in different disciplines. Input data, laboratory protocol, equipment. In vivo/in vitro, in silico. Dataset, scientific workflow, infrastructure. + CONTEXT!
  12. Research Objects. ROs as web pages: http://rohub.linkeddata.es/ ROs as part of a Linked Data Platform (alpha): http://purl.org/net/ldp4ro
  13. How to preserve Workflows/Research Objects? Three main ways/levels:
      • Descriptive reproducibility: documentation
      • Workflow execution reproducibility: can we run the workflow?
      • Workflow results reproducibility: can we get the same results?
      Checklists!
      • Corcho et al: Checklist for workflow conservation (http://dx.doi.org/10.6084/m9.figshare.1285011). 40 different aspects: documentation, goals, results, metadata
      • Corcho et al: Checklist for a workflow conservation plan (http://dx.doi.org/10.6084/m9.figshare.1285012). Based on the DCC’s data management plan
  14. Some examples: levels of reproducibility; workflow conservation plan.
  15. Intellectual property rights. Visit http://licensius.com!
  16. Acknowledgements
      • The Semantic e-Science team at UPM: Carlos Badenes, Daniel Garijo, Olga Giraldo, Rafael González-Cabero, Idafen Santana, Victor Rodriguez Doncel
      • The Wf4Ever team: Carole Goble, José Manuel Gómez Pérez, Raúl Palma, Jun Zhao, Stian Soiland-Reyes, Khalid Belhajjame, José Enrique Ruíz, Marco Roos, Lourdes Verdes-Montenegro, Norman Morrison, Sean Bechhofer, Graham Klyne, Matt Gamble, and many others
      • The Research Object community group: http://www.researchobject.org/
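
Slide 5 names three DNA-extraction steps and four variables, linked by p-plan:hasInputVariable, p-plan:hasOutputVariable and bfo:isPrecededBy. The sketch below shows how such a description could be stored as plain triples and checked for a consistent step order. The term names are taken from the slide, but the wiring between steps and variables is an assumption (the diagram is flattened in this transcript), and the helpers `ordered_steps` and `dataflow_gaps` are hypothetical names, not part of P-Plan or SMART Protocols.

```python
# Triples as (subject, predicate, object). Term names come from slide 5;
# which variable attaches to which step is ASSUMED for this sketch.
TRIPLES = [
    ("sp:cell disruption", "p-plan:hasInputVariable", "sp:plant tissue"),
    ("sp:cell disruption", "p-plan:hasOutputVariable", "sp:powdered tissue"),
    ("sp:digestion reaction", "p-plan:hasInputVariable", "sp:powdered tissue"),
    ("sp:digestion reaction", "p-plan:hasOutputVariable", "sp:digested contaminant"),
    ("sp:DNA purification", "p-plan:hasInputVariable", "sp:digested contaminant"),
    ("sp:DNA purification", "p-plan:hasOutputVariable", "obi:DNA extract"),
    ("sp:digestion reaction", "bfo:isPrecededBy", "sp:cell disruption"),
    ("sp:DNA purification", "bfo:isPrecededBy", "sp:digestion reaction"),
]


def ordered_steps(triples):
    """Return the steps in execution order by following bfo:isPrecededBy."""
    preceded_by = {s: o for s, p, o in triples if p == "bfo:isPrecededBy"}
    steps = {s for s, p, _ in triples if p == "p-plan:hasInputVariable"}
    # The first step is the one with no predecessor (a linear chain is assumed).
    first = next(s for s in steps if s not in preceded_by)
    followed_by = {prev: step for step, prev in preceded_by.items()}
    order = [first]
    while order[-1] in followed_by:
        order.append(followed_by[order[-1]])
    return order


def dataflow_gaps(triples):
    """List consecutive step pairs whose output does not feed the next input."""
    inputs = {s: o for s, p, o in triples if p == "p-plan:hasInputVariable"}
    outputs = {s: o for s, p, o in triples if p == "p-plan:hasOutputVariable"}
    order = ordered_steps(triples)
    return [(prev, step) for prev, step in zip(order, order[1:])
            if outputs[prev] != inputs[step]]


if __name__ == "__main__":
    print(" -> ".join(ordered_steps(TRIPLES)))
    print("gaps:", dataflow_gaps(TRIPLES))
```

Plain tuples keep the sketch dependency-free; a real knowledge base would load the published ontology into an RDF store and query it instead.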

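
The three levels on slide 13 (descriptive, workflow execution, workflow results) can be read as a ladder. The sketch below is a minimal illustration under one assumption that the slide does not state: each level presupposes the one before it. The names `Level` and `reproducibility_level` are hypothetical and do not come from the checklists cited on the slide.

```python
from enum import IntEnum


class Level(IntEnum):
    """Reproducibility levels from slide 13, ordered lowest to highest."""
    NONE = 0
    DESCRIPTIVE = 1   # documentation exists
    EXECUTION = 2     # "Can we run the workflow?"
    RESULTS = 3       # "Can we get the same results?"


def reproducibility_level(documented, runs, same_results):
    """Return the highest level reached, ASSUMING the levels are cumulative."""
    if not documented:
        return Level.NONE
    if not runs:
        return Level.DESCRIPTIVE
    if not same_results:
        return Level.EXECUTION
    return Level.RESULTS


if __name__ == "__main__":
    # A documented workflow that runs but yields different results.
    print(reproducibility_level(True, True, False).name)
```

Using an ordered enum makes it easy to compare workflows or to require a minimum level, for example `reproducibility_level(...) >= Level.EXECUTION`.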