
Towards Reproducible Science: a few building blocks from my personal experience

Invited keynote given at the Second International Workshop on Semantics for BioDiversity (http://fusion.cs.uni-jena.de/s4biodiv2017/), held in conjunction with ISWC2017 (https://iswc2017.semanticweb.org/)

Published in: Science

  1. Towards Reproducible Science: a few building blocks from my personal experience. Oscar Corcho (with contributions from Olga Giraldo, Alexander García, and Idafen Santana), Ontology Engineering Group, Universidad Politécnica de Madrid, Spain. http://www.oeg-upm.net/index.php/en/researchareas/3-semanticscience/index.html ocorcho@fi.upm.es @ocorcho 22/10/2017 S4BioDiv2017, Vienna
  2. Introduction. HYPOTHESIS, CONVINCE, AUDIENCE, REPEATABLE, SCIENTIFIC EXPERIMENTS
  3. Introduction. SCIENTIFIC EXPERIMENTS: IN VIVO/VITRO, IN SILICO. Alison’s biodiversity scientists
  4. Introduction. SCIENTIFIC EXPERIMENTS: IN VIVO/VITRO, IN SILICO. REPEATABILITY. Alison’s biodiversity scientists
  5. Before continuing… What does reproducibility mean for you? And for your colleagues? And for the colleagues from other disciplines?
  6. The R* brouhaha. Source: The R* brouhaha. Goble C. RDA-Europe’s workshop on RepScience 2016.
  7. My own take on terminology. PRESERVATION, CONSERVATION
  8. My own take on terminology. PRESERVATION, CONSERVATION, REPLICABILITY, REPRODUCIBILITY
  9. Experiment components. DATA, SCIENTIFIC PROCEDURE, EQUIPMENT (IN VIVO/VITRO, IN SILICO)
  10. Experiment components. DATA, SCIENTIFIC PROCEDURE, EQUIPMENT (IN VIVO/VITRO, IN SILICO). This has attracted most of the attention so far
  11. Block 1. Experimental Protocols (Olga Giraldo, Alexander Garcia). Explore alternative ways for documenting and retrieving information from experimental protocols. References: Using Semantics and NLP in the SMART Protocols Repository. Giraldo O, García-Castro A, Corcho O. ICBO, 2015. Using Semantics and Natural Language Processing in Experimental Protocols. Giraldo O, García-Castro A, Figueredo J, Corcho O. J Biomedical Semantics, to appear. SMART protocols: semantic representation for experimental protocols. Giraldo O, García-Castro A, Corcho O. Linked Science 2014.
  12. What is an experimental protocol. Experimental protocols are like cooking recipes. They have ingredients: reagents and sample. They have appliances: equipment. They have a list of instructions. They have a total time. They have critical steps… A protocol should contain complete information that allows anybody to recreate an experiment.
  13. Some of the issues we aim at addressing. Example instructions: "Incubate the centrifuge tubes in a water bath." "Incubate the samples for 5 min with gentle shaking." "Rinse DNA briefly in 1-2 ml of wash." "Incubate at -20C overnight." Issues: some protocols present insufficient granularity; the instructions can be imprecise or ambiguous due to the use of natural language; the protocols lack structure.
  14. Ingredients for Improving Reproducibility. Bio-ontologies: OBI, EXPO, EXACT, BAO, IAO, ERO… Data repository for making data available. Resources for reporting guidelines or Minimum Information standards. Few efforts focus on representing and standardizing experimental protocols. For reproducibility purposes, if the data must be available, so must the experimental protocol detailing the methodology followed to derive the data.
  15. Main research question. How to formalize the information from laboratory protocols as a knowledge base?
  16. Our approach. Ontology model representing lab protocols. Gazetteer-based method: use existing lists of named entities (lists of proper nouns, which refer to real-life entities). Rule-based approaches: write manual extraction rules. Development of a Gold Standard of protocols annotated manually.
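The gazetteer-based method named on slide 16 can be illustrated with a minimal sketch. This is not the SMART Protocols implementation (the talk mentions gazetteers and JAPE grammar rules, which are not reproduced here); the `GAZETTEERS` dictionary and the `annotate` function are hypothetical names, and the term lists are a handful of words borrowed from the example instructions and result tables in this talk.

```python
import re
from typing import Dict, List, Tuple

# Hypothetical mini-gazetteers: a few terms taken from the slides, for illustration only.
GAZETTEERS: Dict[str, List[str]] = {
    "action": ["incubate", "rinse"],
    "instrument": ["centrifuge", "water bath", "microscope"],
    "reagent": ["glucose", "DMEM/F12"],
    "sample": ["DNA", "neural stem cells"],
}


def annotate(text: str) -> List[Tuple[int, int, str, str]]:
    """Return (start, end, category, matched_text) for every gazetteer hit."""
    hits = []
    for category, terms in GAZETTEERS.items():
        for term in terms:
            # Match the term only when it is not embedded in a longer word.
            pattern = re.compile(
                r"(?<!\w)" + re.escape(term) + r"(?!\w)", re.IGNORECASE
            )
            for match in pattern.finditer(text):
                hits.append((match.start(), match.end(), category, match.group(0)))
    return sorted(hits)


step = "Incubate the centrifuge tubes in a water bath."
for start, end, category, surface in annotate(step):
    print(f"{start:>2}-{end:<2} {category:<10} {surface}")
```

Note that plain list lookup tags "centrifuge" as an instrument even though the instruction refers to centrifuge tubes; this is the kind of ambiguity that motivates combining gazetteers with hand-written rules and a manually annotated gold standard.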
  17. SMART Protocols ontology. http://vocab.linkeddata.es/SMARTProtocols/ https://smartprotocols.github.io/
  18. The SIRO model. Sample/Specimen (whole organism, anatomical part, bodily fluids, etc.); Instruments (equipment, devices, consumables, software); Reagents (chemical compounds, mixtures); Objective (purpose). The SIRO model supports search, retrieval and classification of experimental protocols.
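As a rough illustration of how the four SIRO facets could support search and retrieval, the sketch below stores them in a plain record and filters on them. The `SiroRecord` class, the `find_protocols` function, and the two example records are hypothetical; the entity names come from the result tables in this talk, but their grouping into records is made up for the example and does not reflect the SMART Protocols ontology.

```python
from dataclasses import dataclass, field
from typing import Iterable, List, Optional


@dataclass
class SiroRecord:
    """Minimal Sample-Instrument-Reagent-Objective description of a protocol."""
    title: str
    samples: List[str] = field(default_factory=list)
    instruments: List[str] = field(default_factory=list)
    reagents: List[str] = field(default_factory=list)
    objective: str = ""


def find_protocols(
    records: Iterable[SiroRecord],
    *,
    sample: Optional[str] = None,
    instrument: Optional[str] = None,
    reagent: Optional[str] = None,
) -> List[SiroRecord]:
    """Keep the records that mention every requested facet value (case-insensitive)."""
    def has(values: List[str], wanted: Optional[str]) -> bool:
        return wanted is None or wanted.lower() in (v.lower() for v in values)

    return [
        r for r in records
        if has(r.samples, sample)
        and has(r.instruments, instrument)
        and has(r.reagents, reagent)
    ]


# Illustrative records only; titles and groupings are assumptions.
records = [
    SiroRecord(
        title="Neural migration assays",
        samples=["neural stem cells (NSCs)"],
        instruments=["Microscope", "cell culture incubator"],
        reagents=["B27 supplement", "glucose"],
        objective="In vitro assessment of neural migration",
    ),
    SiroRecord(
        title="Insect cell protocol",
        samples=["Insect cells"],
        instruments=["Centrifuge", "Flask"],
        reagents=["Phenol:chloroform"],
    ),
]

print([r.title for r in find_protocols(records, instrument="centrifuge")])
```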
  19. Design of semantic Gazetteer and JAPE rules. Design of semantic Gazetteers: facilitate the annotation of instances related to experimental actions, instruments, samples/organisms, and reagents. Design of grammar rules: facilitate the annotation of instructions.
  20. Development of a Gold Standard. 100 protocols published in several repositories. Annotators: experts in life sciences. The SMART Protocols Annotation Tool: http://smart-protocols.labs.linkingdata.io/dist/dev/#/login Guidelines about what and how to annotate. Materials: BioTechniques, CSH-Protocols, Current protocols, Genet and Mol. Res, Journal of Biolog. Methods, Jove, MethodsX, Nature protocols exchange, Nature protocols. • Curso BIOS 2016, Colombia • Universidad del Valle, Colombia • Japan (Database Center for Life Science (DBCLS), Robotic Biology Institute (RBI), Spiber, Yachie-Lab, University of Tokyo) • Universidad Santiago de Cali, Colombia
  21. Preliminary results.
       First table (columns: sample, instrument, reagent, objective).
       Sample: Neural cell 3 0 0 0; neural stem cells (NSCs) 3 0 0 0.
       Instrument: Cell culture centrifuge 0 3 0 0; cell culture incubator 0 3 0 0; Microscope 0 3 0 0; Millicell culture plate inserts 8-µm pore size 0 3 0 0.
       Reagent: B27 supplement 0 0 3 0; DMEM/F12 0 0 3 0; FGF2 neutralizing antibody 0 0 3 0; glucose 0 0 3 0.
       Objective: "Here we describe two migration assays, a matrigel migration assay and a Boyden chamber migration assay, which allow the in vitro assessment of neural migration under defined conditions (Ladewig, Koch and Brüstle, 2014)." 0 0 0 3.
       Fleiss Kappa for 3 raters = 1.0
       Second table (columns: sample, instrument, reagent).
       Reagent - Sample/Organism: Ac-omega viral DNA 1 2; baculoviral 1 2; DNA insert 2 1; I-Sce I meganuclease 1 2.
       Sample/Organism: Insect cells 3.
       Instrument: spinner 3; Centrifuge 3; Flask 3.
       Reagent: IPL-41 powdered 3; Liposome formulation 3; Phenol:chloroform 3.
       Fleiss Kappa for 3 raters = 0.755
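For readers unfamiliar with the agreement measure reported on slide 21, the following sketch computes Fleiss' kappa from a table of per-item category counts using the standard textbook formula. The function name `fleiss_kappa` is an assumption, and the input is the first results table as transcribed here (11 items, 3 raters, 4 categories). The second table is not recomputed because its column assignment cannot be recovered reliably from the flattened slide text.

```python
from typing import List, Sequence


def fleiss_kappa(counts: Sequence[Sequence[int]]) -> float:
    """Fleiss' kappa for a table where counts[i][j] is the number of raters
    who assigned item i to category j. Every item needs the same rater count."""
    n_items = len(counts)
    n_raters = sum(counts[0])
    if any(sum(row) != n_raters for row in counts):
        raise ValueError("every item must be rated by the same number of raters")
    n_categories = len(counts[0])
    total = n_items * n_raters

    # Proportion of all assignments that fall in each category.
    p_j = [sum(row[j] for row in counts) / total for j in range(n_categories)]
    # Observed agreement on each item.
    p_i = [
        (sum(c * c for c in row) - n_raters) / (n_raters * (n_raters - 1))
        for row in counts
    ]
    p_bar = sum(p_i) / n_items          # mean observed agreement
    p_e = sum(p * p for p in p_j)       # agreement expected by chance
    return (p_bar - p_e) / (1 - p_e)


# First results table: columns are sample, instrument, reagent, objective.
first_table: List[List[int]] = (
    [[3, 0, 0, 0]] * 2      # Neural cell, neural stem cells (NSCs)
    + [[0, 3, 0, 0]] * 4    # four instruments
    + [[0, 0, 3, 0]] * 4    # four reagents
    + [[0, 0, 0, 3]]        # the objective sentence
)

print(fleiss_kappa(first_table))  # 1.0, matching the value reported on the slide
```

With unanimous raters every per-item agreement equals 1, so kappa is 1.0 regardless of how the items are distributed across categories.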
  22. Our ongoing work. So far, this is fine for handling protocols that have already been reported in papers. Can we actually change the way in which these protocols are produced?
  23. Platform for publishing semantic protocols. Features: open semantic publishing platform (the protocols are born semantic); self-describing documents (meaningful entities, machine-processable workflows); documents will reference existing URIs (samples/organisms, reagents/chemical compounds, instruments). SMART Protocols Ontology / Gazetteers / Grammar rules; UniProt; NCBI taxonomy; PubChem; Vendors.
  24. The platform. Platform available at: http://smartprotocols.labs.linkingdata.io/app/protocols
  25. Capturing relevant elements in the document
  26. Organisms come from the UniProt Taxon API. After selecting an organism, the corresponding ID is automatically recorded.
  27. Reagents come from the PubChem API
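The slides do not say which PubChem endpoint the platform calls. As one plausible sketch, PubChem's public PUG REST service can resolve a compound name to compound identifiers (CIDs), which can then be recorded as a URI in the protocol. The helper names below are hypothetical, the network request itself is left out so the example runs offline, and the sample payload uses a placeholder CID rather than a real lookup result.

```python
from typing import Dict, List
from urllib.parse import quote

PUG_REST = "https://pubchem.ncbi.nlm.nih.gov/rest/pug"


def pubchem_cid_lookup_url(reagent_name: str) -> str:
    """Build the PUG REST URL that resolves a compound name to CIDs (JSON)."""
    return f"{PUG_REST}/compound/name/{quote(reagent_name, safe='')}/cids/JSON"


def cids_from_response(payload: Dict) -> List[int]:
    """Extract CIDs from a decoded response shaped like
    {"IdentifierList": {"CID": [...]}}; return [] if the shape differs."""
    return list(payload.get("IdentifierList", {}).get("CID", []))


def pubchem_compound_uri(cid: int) -> str:
    """URI of the compound page, suitable for recording next to a reagent."""
    return f"https://pubchem.ncbi.nlm.nih.gov/compound/{cid}"


print(pubchem_cid_lookup_url("glucose"))

# Placeholder payload (the CID below is made up, not a lookup result).
example_payload = {"IdentifierList": {"CID": [12345]}}
for cid in cids_from_response(example_payload):
    print(pubchem_compound_uri(cid))
```

Recording the resolved URI rather than the free-text reagent name is what lets a document "reference existing URIs", as the platform features on slide 23 describe.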
  28. Machine processable workflows. Step, Step, Step, Step, Step
  29. Final edited protocol, also available as bioschemas
  30. Block 2. Computational Environments (Idafen Santana). Is it possible to describe the main properties of the Execution Environment of a Computational Scientific Experiment and, based on this description, derive a reproduction process for generating an equivalent environment using virtualization techniques? Reference: Conservation of Computational Scientific Execution Environments for Workflow-based Experiments Using Ontologies. Santana-Pérez I. PhD thesis, 2016. http://oa.upm.es/39520/
  31. Experiment components. DATA, SCIENTIFIC PROCEDURE, EQUIPMENT (IN VIVO/VITRO, IN SILICO)
  32. Experiment components. DATA, SCIENTIFIC PROCEDURE, EQUIPMENT (IN SILICO)
  33. Experiment components. DATA, SCIENTIFIC PROCEDURE, EQUIPMENT (IN SILICO)
  34. Experiment components. DATA, SCIENTIFIC PROCEDURE, EQUIPMENT (IN SILICO)
  35. Experiment components. DATA, SCIENTIFIC PROCEDURE, EQUIPMENT (IN SILICO)
  36. Bundles and relates digital resources of a scientific experiment or investigation using standard mechanisms, “tool middleware”. http://www.w3.org/community/rosc/ http://www.researchobject.org/
  37. Experiment components. DATA, SCIENTIFIC PROCEDURE, EQUIPMENT (IN VIVO/VITRO, IN SILICO)
  38. Open Research Problems
  39. Open Research Problems. Computational Infrastructures are usually a predefined element of a Computational Scientific Workflow.
  40. Open Research Problems. Computational Infrastructures are usually a predefined element of a Computational Scientific Workflow. Execution Environments are poorly described.
  41. Open Research Problems. Computational Infrastructures are usually a predefined element of a Computational Scientific Workflow. Execution Environments are poorly described. Current reproducibility approaches for computational experiments consider mostly data and procedure.
  42. Representation. Describing execution environments. CLOUD, FORMER EQUIPMENT, ANNOTATE, REPRODUCE, SEMANTIC ANNOTATIONS, EQUIVALENT EXECUTION ENVIRONMENT
  43. Representation. WICUS ontology network: Workflow Infrastructure Conservation Using Semantics; http://purl.org/net/wicus; 5 ontologies: WICUS Workflow Execution Requirements ontology, WICUS Software Stack ontology, WICUS Hardware Specs ontology, WICUS Scientific Virtual Appliance ontology, and WICUS Ontology, which links the previous ontologies.
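To make the idea of "describe the environment, then derive a reproduction process" more concrete, here is a minimal sketch that uses plain Python data instead of the WICUS ontologies. The `environment` dictionary, its component names, and both functions are hypothetical; the actual WICUS vocabulary and system are not reproduced here. The sketch assumes that a software stack can be described as components with dependencies and that hardware requirements are simple minimum values.

```python
from graphlib import TopologicalSorter  # Python 3.9+
from typing import Dict, List

# Hypothetical description of an execution environment:
# each software component maps to the components it depends on.
environment = {
    "hardware": {"cpu_cores": 2, "memory_gb": 4, "disk_gb": 20},
    "software": {
        "operating-system": [],
        "java-runtime": ["operating-system"],
        "python": ["operating-system"],
        "workflow-engine": ["java-runtime", "python"],
        "workflow-binaries": ["workflow-engine"],
    },
}


def deployment_order(software: Dict[str, List[str]]) -> List[str]:
    """Order components so that each one is installed after its dependencies."""
    return list(TopologicalSorter(software).static_order())


def satisfies(offer: Dict[str, int], required: Dict[str, int]) -> bool:
    """True if a candidate machine meets every minimum hardware requirement."""
    return all(offer.get(key, 0) >= value for key, value in required.items())


print(deployment_order(environment["software"]))
print(satisfies({"cpu_cores": 4, "memory_gb": 8, "disk_gb": 40}, environment["hardware"]))
```

Keeping the description separate from any particular machine is what allows an equivalent environment to be rebuilt later on a different infrastructure, for example a cloud virtual machine.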
  44. WICUS ontology network. WICUS Workflow Execution Requirements ontology: http://purl.org/net/wicus-reqs
  45. WICUS ontology network. WICUS Software Stack ontology: http://purl.org/net/wicus-stack
  46. WICUS ontology network. WICUS Scientific Virtual Appliance ontology: http://purl.org/net/wicus-sva
  47. WICUS ontology network. WICUS Hardware Specs ontology: http://purl.org/net/wicus-hwspecs
  48. WICUS ontology network. WICUS ontology network: http://purl.org/net/wicus
  49. WICUS ontology network. WICUS ontology network: http://purl.org/net/wicus
  50. WICUS system. Overview, inputs and outputs
  51. Evaluation. Workflows reproduced: 3 scientific domains, 3 workflow management systems, 6 different workflows. Domain: Seismic, Astronomy, Bio. WMS: dispel4py, Pegasus, Makeflow. Name: xcorr, Internal Extinction, Montage, Epigenomics, SoyKB, BLAST. (2003) (2014) (2014) (2015) (2011) (2011)
  52. Evaluation. Domain: Seismic, Astronomy, Bio. WMS: dispel4py, Pegasus, Makeflow. Name: xcorr, Internal Extinction, Montage, Epigenomics, SoyKB, BLAST. Results: FORMER EQUIPMENT, ANNOTATE, REPRODUCE, CLOUD, EQUIVALENT EXECUTION ENVIRONMENT, SEMANTIC ANNOTATIONS, COMPARE
  53. Evaluation. Domain: Seismic, Astronomy, Bio. WMS: dispel4py, Pegasus, Makeflow. Name: xcorr, Internal Extinction, Montage, Epigenomics, SoyKB, BLAST. Results: CLOUD, FORMER EQUIPMENT, ANNOTATE, REPRODUCE, SEMANTIC ANNOTATIONS, EQUIVALENT EXECUTION ENVIRONMENT, COMPARE
  54. Evaluation. Domain: Seismic, Astronomy, Bio. WMS: dispel4py, Pegasus, Makeflow. Name: xcorr, Internal Extinction, Montage, Epigenomics, SoyKB, BLAST. Results: CLOUD, FORMER EQUIPMENT, ANNOTATE, REPRODUCE, SEMANTIC ANNOTATIONS, EQUIVALENT EXECUTION ENVIRONMENT, COMPARE. Non-deterministic; standard and error output; generated files equivalent.
  55. Evaluation. Domain: Seismic, Astronomy, Bio. WMS: dispel4py, Pegasus, Makeflow. Name: xcorr, Internal Extinction, Montage, Epigenomics, SoyKB, BLAST. Results: CLOUD, FORMER EQUIPMENT, ANNOTATE, REPRODUCE, SEMANTIC ANNOTATIONS, EQUIVALENT EXECUTION ENVIRONMENT, COMPARE. Same results; results from Int. Extinction may vary.
  56. Evaluation. Domain: Seismic, Astronomy, Bio. WMS: dispel4py, Pegasus, Makeflow. Name: xcorr, Internal Extinction, Montage, Epigenomics, SoyKB, BLAST. Results: CLOUD, FORMER EQUIPMENT, ANNOTATE, REPRODUCE, SEMANTIC ANNOTATIONS, EQUIVALENT EXECUTION ENVIRONMENT, COMPARE. Genomic data; exact match.
  57. Evaluation. Domain: Seismic, Astronomy, Bio. WMS: dispel4py, Pegasus, Makeflow. Name: xcorr, Internal Extinction, Montage, Epigenomics, SoyKB, BLAST. Results: CLOUD, FORMER EQUIPMENT, ANNOTATE, REPRODUCE, SEMANTIC ANNOTATIONS, EQUIVALENT EXECUTION ENVIRONMENT, COMPARE
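The COMPARE step in the evaluation slides checks whether a reproduced run yields equivalent outputs. The slides do not describe how the comparison was implemented; the sketch below shows one simple option, assuming that outputs are files in two directories and that byte-for-byte equality (via SHA-256 digests) is the criterion. The function names are hypothetical. For non-deterministic workflows, such as those noted on slide 54, exact equality is too strict and a domain-specific notion of equivalence would be needed instead.

```python
import hashlib
import tempfile
from pathlib import Path
from typing import Dict, List


def digest_tree(root: Path) -> Dict[str, str]:
    """Map each file's path (relative to root) to the SHA-256 of its content."""
    return {
        path.relative_to(root).as_posix(): hashlib.sha256(path.read_bytes()).hexdigest()
        for path in sorted(root.rglob("*"))
        if path.is_file()
    }


def compare_outputs(original: Path, reproduced: Path) -> Dict[str, List[str]]:
    """Report files that are missing, unexpected, or different in the reproduced run."""
    a, b = digest_tree(original), digest_tree(reproduced)
    return {
        "missing": sorted(set(a) - set(b)),
        "unexpected": sorted(set(b) - set(a)),
        "different": sorted(name for name in set(a) & set(b) if a[name] != b[name]),
    }


if __name__ == "__main__":
    # Tiny self-contained demonstration with throwaway directories.
    with tempfile.TemporaryDirectory() as tmp:
        original, reproduced = Path(tmp, "original"), Path(tmp, "reproduced")
        original.mkdir()
        reproduced.mkdir()
        (original / "result.txt").write_text("42\n")
        (reproduced / "result.txt").write_text("42\n")
        (original / "log.txt").write_text("run on former equipment\n")
        (reproduced / "log.txt").write_text("run on cloud VM\n")
        print(compare_outputs(original, reproduced))
```

In the demonstration only the log file is reported as different, which mirrors the situation where results match but standard and error output vary between environments.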
  58. Summarizing. Two building blocks towards reproducibility of scientific experiments. In vivo/vitro: focus on providing structured descriptions of methods (laboratory protocols); our tools: ontologies, gazetteers, NLP tools, and automatic and manual annotation tools; challenge: make protocols more structured (and semantic) from the beginning. In silico: focus on the equipment (computational infrastructure) for workflow-based experiments; ontologies, automatic and manual annotation tools, and an execution environment; challenge: keep track of all types of appliances, and make scientists work on providing annotations. Is this enough?
  59. Summarizing. Is this enough? Clearly not, but a step forward towards ensuring reproducibility (with a focus on methods).
  60. Oscar Corcho (with contributions from Olga Giraldo, Alexander García, and Idafen Santana), Ontology Engineering Group, Universidad Politécnica de Madrid, Spain. Towards Reproducible Science: a few building blocks from my personal experience. ocorcho@fi.upm.es @ocorcho 22/10/2017 S4BioDiv2017, Vienna
  61. Light pollution (www.stars4all.eu)
