From peer-reviewed to peer-reproduced:
a role for research objects in scholarly
publishing in the life
sciences
Alejandra ...
"AGBell Notebook" by Alexander Graham Bell. (d. 1922) -
page 40-41 of Alexander Graham Bell Family Papers in the Library o...
Can data models and computational workflows help in
capturing the experimental processes and reproduce findings?
How?
experi...
Can data models and computational workflows help in
capturing the experimental processes and reproduce findings?
How?
Can data models and computational workflows help in
capturing the experimental processes and reproduce findings?
How?
Can data models and computational workflows help in
capturing the experimental processes and reproduce findings?
How?
• open peer-review
• availability of
• data
• analysis scripts
• documentation
Evaluation of SOAPdenovo2 tool for the de n...
Experimental Description
Experimental Description
EXCELERATE interoperability component
http://www.ncbi.nlm.nih.gov/books/NBK279831/
http://elixir-...
genome
assembly
algorithm
genome
size
Predictor Variables
(Factor Name, Factor Type)
The experimental plan - computational...
genome
assembly
algorithm
genome
size
SOAPdenovo2
SOAPdenovo1
ALL-PATHS-LG
bacterial genome
insect genome
human genome
Pre...
genome
assembly
algorithm
genome
size
SOAPdenovo2
SOAPdenovo1
ALL-PATHS-LG
bacterial genome
insect genome
human genome
bac...
genome
assembly
algorithm
genome
size
SOAPdenovo2
SOAPdenovo1
ALL-PATHS-LG
bacterial genome
insect genome
human genome
bac...
genome
assembly
algorithm
genome
size
SOAPdenovo2
SOAPdenovo1
ALL-PATHS-LG
bacterial genome
insect genome
human genome
bac...
The experimental steps
Unambiguous identification of resources (e.g. record from public repositories); persistent identifier...
The experimental steps
Unambiguous identification of resources (e.g. record from public repositories); persistent identifier...
Reproducing SOAPdenovo2 results
with Galaxy workflows
S. aureus pipeline
Reproducing SOAPdenovo2 results
with Galaxy workflows
S. aureus pipeline
2241 400
30
119.0 11 106 24 68
0
Reproducing SOAPdenovo2 results
with Galaxy workflows
Publishing findings as nanopublications
assertion
provenance
publication info
nanopublication A NP represents structured da...
Publishing findings as nanopublications
assertion
provenance
publication info
nanopublication A NP represents structured da...
“genome coverage increased
over the human data when
comparing SOAPdenovo2
against SOAPdenovo1”
Link conclusions
to
experim...
http://www.researchobject.org/
Aggregation and workflow preservation as
ResearchObject: enables the aggregation of the digi...
http://isa-tools.github.io/soapdenovo2
Aggregation and workflow preservation as
http://www.researchobject.org/
From narrative to self-described structured data
Model & workflow assisted experimental description and review process
Dept...
Ruibang Luo, University of Hong Kong
Tin-Lap Lee, Chinese University of Hong Kong
Tak-wah Lam, University of Hong Kong
SOA...
Questions?
You can email us...
isatools@googlegroups.com
View our blog
http://isatools.wordpress.com
Follow us onTwitter
@...
From peer-reviewed to peer-reproduced: a role for research objects in scholarly publishing in the life sciences
Upcoming SlideShare
Loading in …5
×

From peer-reviewed to peer-reproduced: a role for research objects in scholarly publishing in the life sciences

1,315 views

Published on

The reproducibility of science in the digital age is attracting a lot of attention and concerns from the scientific community, where studies have shown the inability to reproduce results due to a variety of reasons, ranging from unavailability of the data to lack of proper descriptions of the experimental steps.
Multiple research object models have been proposed to describe different aspects of the research process. Investigation/Study/Assay (ISA) is a widely used general-purpose metadata tracking framework with an associated suite of open-source software, which offers a rich description of the experiment’s hypotheses and design, investigators involved, experimental factors, protocols applied. The information is organised in a three-level hierarchy where ’Investigation’ provides the project context for a ’Study’ (a research question), which itself contains one or more ’Assays’ (taking analytical measurements and key data processing and analysis steps). Nanopublication (NP) is a research object model which enables specific scientific assertions, such as the conclusions of an experiment, to be annotated with supporting evidence, published and cited. Lastly, the Research Object (RO) is a model that enables the aggregation of the digital resources contributing to findings of computational research, including results, data and software, as citable compound digital objects.
For computational reproducibility, platforms such as Taverna and Galaxy are popular and efficient ways to represent the data analysis steps in the form of reusable workflows, where the data transformations can be specified and executed in an automatic way.
In this presentation, we will address the question of whether such research object models and workflow representation frameworks can be used to assist in the peer review process, by facilitating evaluation of the accuracy of the information provided by scientific articles with respect to their repeatability.
Our case study is based on an article on a genome assembler algorithm published in GigaScience, but due to the proven use of the respective research object models in their respective communities, we argue that the combination of models and workflow system will improve the scholarly publishing process, making science peer-reproduced.

Published in: Science
1 Comment
4 Likes
Statistics
Notes
No Downloads
Views
Total views
1,315
On SlideShare
0
From Embeds
0
Number of Embeds
246
Actions
Shares
0
Downloads
13
Comments
1
Likes
4
Embeds 0
No embeds

No notes for slide

From peer-reviewed to peer-reproduced: a role for research objects in scholarly publishing in the life sciences

  1. 1. From peer-reviewed to peer-reproduced: a role for research objects in scholarly publishing in the life sciences Alejandra González-Beltrán Oxford e-Research Centre, University of Oxford -ontology.org Bioinformatics Open Source Conference (BOSC), Dublin, Ireland July 10-11 2015
  2. 2. "AGBell Notebook" by Alexander Graham Bell. (d. 1922) - page 40-41 of Alexander Graham Bell Family Papers in the Library of Congress' Manuscript Division. Licensed under Public Domain via Wikimedia Commons - http://commons.wikimedia.org/wiki/File:AGBell_Notebook.jpg#/media/File:AGBell_Notebook.jpg http://petcaretips.net/bonding-rabbit-to-pets.html Many things have been said about the challenges of science reproducibility and how it can go wrong… Difficulties when the description of the experimental steps is only available in lab notebooks and scientific articles; lack of data, lack of software tools required for analysis
  3. 3. Can data models and computational workflows help in capturing the experimental processes and reproduce findings? How? experimental description (design & steps) conclusions computational workflows aggregation & workflow preservation
  4. 4. Can data models and computational workflows help in capturing the experimental processes and reproduce findings? How?
  5. 5. Can data models and computational workflows help in capturing the experimental processes and reproduce findings? How?
  6. 6. Can data models and computational workflows help in capturing the experimental processes and reproduce findings? How?
  7. 7. • open peer-review • availability of • data • analysis scripts • documentation Evaluation of SOAPdenovo2 tool for the de novo assembly of genomes from small DNA segments reads by next generation sequencing, implementing improvements over SOAPdenovo1 assembler. pre-publication history https://github.com/aquaskyline/SOAPdenovo2 http://sourceforge.net/projects/soapdenovo2/
  8. 8. Experimental Description
  9. 9. Experimental Description EXCELERATE interoperability component http://www.ncbi.nlm.nih.gov/books/NBK279831/ http://elixir-uk.org/interoperability-infrastructure
  10. 10. genome assembly algorithm genome size Predictor Variables (Factor Name, Factor Type) The experimental plan - computational case
  11. 11. genome assembly algorithm genome size SOAPdenovo2 SOAPdenovo1 ALL-PATHS-LG bacterial genome insect genome human genome Predictor Variables (Factor Name, Factor Type) The experimental plan - computational case
  12. 12. genome assembly algorithm genome size SOAPdenovo2 SOAPdenovo1 ALL-PATHS-LG bacterial genome insect genome human genome bacterial genome insect genome human genome bacterial genome insect genome human genome Predictor Variables (Factor Name, Factor Type) 3x3 factorial design 9 study groups The experimental plan - computational case
  13. 13. genome assembly algorithm genome size SOAPdenovo2 SOAPdenovo1 ALL-PATHS-LG bacterial genome insect genome human genome bacterial genome insect genome human genome bacterial genome insect genome human genome Predictor Variables (Factor Name, Factor Type) The experimental plan - computational case S. aureus R. sphaeroides B. impatiens Chinese Han genome (orYH genome)
  14. 14. genome assembly algorithm genome size SOAPdenovo2 SOAPdenovo1 ALL-PATHS-LG bacterial genome insect genome human genome bacterial genome insect genome human genome bacterial genome insect genome human genome Predictor Variables (Factor Name, Factor Type) The experimental plan - computational case Response Variables (with units) genome coverage (%) computation run time (h) peak memory consumption (Gb) contig N50 (kb or bp) scaffold N50 (kb or bp) number of errors
  15. 15. The experimental steps Unambiguous identification of resources (e.g. record from public repositories); persistent identifiers if available (ORCIDs, DOIs); we suggest a dedicated article section Experimental workflows - identification of processes, their inputs and outputs Experimental design: identify experimental goal, independent and response variables
  16. 16. The experimental steps Unambiguous identification of resources (e.g. record from public repositories); persistent identifiers if available (ORCIDs, DOIs); dedicated article section Experimental workflows - identification of processes, their inputs and outputs Experimental design: identify experimental goal, independent and response variables
  17. 17. Reproducing SOAPdenovo2 results with Galaxy workflows S. aureus pipeline
  18. 18. Reproducing SOAPdenovo2 results with Galaxy workflows S. aureus pipeline
  19. 19. 2241 400 30 119.0 11 106 24 68 0 Reproducing SOAPdenovo2 results with Galaxy workflows
  20. 20. Publishing findings as nanopublications assertion provenance publication info nanopublication A NP represents structured data along with its provenance in a single publishable and citable entity
  21. 21. Publishing findings as nanopublications assertion provenance publication info nanopublication A NP represents structured data along with its provenance in a single publishable and citable entity Abstract & Conclusions assertion provenance Generation of nanopublications for all the results of the response variables NanoMaton templates for nanopublications Prevent priming; report all findings corresponding to the identified response variables Remain neutral and report all findings of similar importance with the same weight
  22. 22. “genome coverage increased over the human data when comparing SOAPdenovo2 against SOAPdenovo1” Link conclusions to experimental description
  23. 23. http://www.researchobject.org/ Aggregation and workflow preservation as ResearchObject: enables the aggregation of the digital resources contributing to findings of computational research, including results, data and software, as citable compound digital objects
  24. 24. http://isa-tools.github.io/soapdenovo2 Aggregation and workflow preservation as http://www.researchobject.org/
  25. 25. From narrative to self-described structured data Model & workflow assisted experimental description and review process Depth and breadth of semantic resources, clear meaning of experimental elements
  26. 26. Ruibang Luo, University of Hong Kong Tin-Lap Lee, Chinese University of Hong Kong Tak-wah Lam, University of Hong Kong SOAPdenovo2 Scott Edmunds, GigaScience Peter Li, GigaScience Marco Roos, Leiden University Mark Thompson, Leiden University Rajaram Kaliyaperumal, Leiden University Eelke van der Horst, Leiden University Jun Zhao, Lancaster University María Susana Avila García, Oxford University Philippe Rocca-Serra, Oxford University Susanna-Assunta Sansone, Oxford University Alejandra Gonzalez-Beltran, Oxford University Team
  27. 27. Questions? You can email us... isatools@googlegroups.com View our blog http://isatools.wordpress.com Follow us onTwitter @isatools View our websites View our Git repo & contribute http://github.com/ISA-tools Thanks for your attention!

×