From peer-reviewed to peer-reproduced: a role for research objects in scholarly publishing in the life sciences

Alejandra Gonzalez-Beltran, Data Management Team Lead, Software Engineering Group
Alejandra González-Beltrán
Oxford e-Research Centre, University of Oxford
-ontology.org
Bioinformatics Open Source Conference (BOSC), Dublin, Ireland
July 10-11 2015
"AGBell Notebook" by Alexander Graham Bell. (d. 1922) -
page 40-41 of Alexander Graham Bell Family Papers in the Library of Congress' Manuscript Division.
Licensed under Public Domain via Wikimedia Commons
- http://commons.wikimedia.org/wiki/File:AGBell_Notebook.jpg#/media/File:AGBell_Notebook.jpg
http://petcaretips.net/bonding-rabbit-to-pets.html
Many things have been said about the challenges of science reproducibility and how it can go wrong…
Difficulties arise when the description of the experimental steps is only available in lab notebooks and scientific articles, and when the data or the software tools required for analysis are lacking.
Can data models and computational workflows help in capturing the experimental processes and reproducing findings? How?
• experimental description (design & steps)
• conclusions
• computational workflows
• aggregation & workflow preservation
• open peer-review
• availability of
• data
• analysis scripts
• documentation
Evaluation of the SOAPdenovo2 tool for the de novo assembly of genomes from small DNA segments read by next-generation sequencing, implementing improvements over the SOAPdenovo1 assembler.
pre-publication history
https://github.com/aquaskyline/SOAPdenovo2
http://sourceforge.net/projects/soapdenovo2/
Experimental Description
EXCELERATE interoperability component
http://www.ncbi.nlm.nih.gov/books/NBK279831/
http://elixir-uk.org/interoperability-infrastructure
The experimental plan - computational case
Predictor Variables (Factor Name, Factor Type)
• genome assembly algorithm: SOAPdenovo2, SOAPdenovo1, ALL-PATHS-LG
• genome size: bacterial genome (S. aureus, R. sphaeroides), insect genome (B. impatiens), human genome (Chinese Han genome, or YH genome)
3x3 factorial design: 9 study groups
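The 3x3 factorial design in this plan can be enumerated mechanically from the two predictor variables. The following is a minimal sketch, assuming the factor levels listed on the slides; the function name `study_groups` and the dictionary layout are illustrative choices, not part of the ISA tooling.

```python
from itertools import product

# Factor levels as listed in the experimental plan.
FACTORS = {
    "genome assembly algorithm": ["SOAPdenovo2", "SOAPdenovo1", "ALL-PATHS-LG"],
    "genome size": ["bacterial genome", "insect genome", "human genome"],
}


def study_groups(factors):
    """Return every combination of factor levels (a full factorial design)."""
    names = list(factors)
    return [dict(zip(names, levels)) for levels in product(*factors.values())]


groups = study_groups(FACTORS)
for number, group in enumerate(groups, start=1):
    print(number, group)
```

Each printed row is one study group, so three algorithms crossed with three genome sizes give the nine groups named in the design.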
Response Variables (with units)
• genome coverage (%)
• computation run time (h)
• peak memory consumption (GB)
• contig N50 (kb or bp)
• scaffold N50 (kb or bp)
• number of errors
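Contig N50 and scaffold N50, two of the response variables, follow the usual definition: the length L such that sequences of length L or longer account for at least half of the total assembled length. Here is a minimal sketch of that definition; the function name `n50` and the example lengths are illustrative, and the assemblers or evaluation scripts used in the study may apply their own conventions (for example, minimum-length filters).

```python
def n50(lengths):
    """Return the N50 of a collection of contig or scaffold lengths."""
    if not lengths:
        raise ValueError("N50 is undefined for an empty assembly")
    total = sum(lengths)
    running = 0
    # Walk from the longest sequence down until half the total is covered.
    for length in sorted(lengths, reverse=True):
        running += length
        if 2 * running >= total:
            return length


# Illustrative lengths only (not data from the study).
example_lengths = [80, 70, 50, 40, 30, 20]
print(n50(example_lengths))  # prints 70
```

In the example the total length is 290, and the two longest sequences (80 + 70 = 150) already cover half of it, so the N50 is 70.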
The experimental steps
• Experimental design: identify experimental goal, independent and response variables
• Experimental workflows: identification of processes, their inputs and outputs
• Unambiguous identification of resources (e.g. records from public repositories); persistent identifiers if available (ORCIDs, DOIs); we suggest a dedicated article section
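Identifying processes together with their inputs and outputs makes it possible to check a workflow description for gaps. The sketch below assumes a simple linear list of processes; the `Process` class, the function `undeclared_inputs`, and the step and resource names are hypothetical placeholders, not the ISA model or the actual SOAPdenovo2 pipeline.

```python
from dataclasses import dataclass


@dataclass
class Process:
    name: str
    inputs: list
    outputs: list


def undeclared_inputs(sources, processes):
    """Return (process, input) pairs whose input is neither a declared
    source resource nor the output of an earlier process."""
    available = set(sources)
    missing = []
    for process in processes:
        for item in process.inputs:
            if item not in available:
                missing.append((process.name, item))
        available.update(process.outputs)
    return missing


# Hypothetical workflow description with one undocumented resource.
sources = {"raw reads"}
workflow = [
    Process("error correction", ["raw reads"], ["corrected reads"]),
    Process("assembly", ["corrected reads"], ["contigs"]),
    Process("evaluation", ["contigs", "reference genome"], ["assembly metrics"]),
]
print(undeclared_inputs(sources, workflow))
# prints [('evaluation', 'reference genome')]
```

The reported pair is exactly the kind of resource that would need an unambiguous identifier before the experiment could be reproduced.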
Reproducing SOAPdenovo2 results with Galaxy workflows
S. aureus pipeline
[Results table; labels not recoverable. Values: 2241, 400, 30, 119.0, 11, 106, 24, 68, 0]
Publishing findings as nanopublications
A nanopublication (NP) represents structured data along with its provenance in a single publishable and citable entity. It comprises:
• assertion
• provenance
• publication info
Abstract & Conclusions: assertion, provenance
• Generation of nanopublications for all the results of the response variables
• NanoMaton: templates for nanopublications
• Prevent priming; report all findings corresponding to the identified response variables
• Remain neutral and report all findings of similar importance with the same weight
“genome coverage increased over the human data when comparing SOAPdenovo2 against SOAPdenovo1”
Link conclusions to experimental description
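A finding such as the genome coverage statement can be laid out in the three-part nanopublication structure (assertion, provenance, publication info). The sketch below serialises that structure as TriG text using only the standard library. The nanopublication schema terms (`Nanopublication`, `hasAssertion`, `hasProvenance`, `hasPublicationInfo`) and the PROV properties are real, but every `example.org` URI, including the predicate `hasHigherGenomeCoverageThan`, is a hypothetical placeholder, and this is not the NanoMaton template format.

```python
from dataclasses import dataclass

NP = "http://www.nanopub.org/nschema#"
PROV = "http://www.w3.org/ns/prov#"


@dataclass
class Nanopublication:
    uri: str          # base URI of the nanopublication (hypothetical)
    assertion: list   # (subject, predicate, object) terms in Turtle syntax
    provenance: list
    pubinfo: list

    def to_trig(self):
        """Serialise the head, assertion, provenance and pubinfo graphs."""
        def graph(name, triples):
            body = "\n".join(f"    {s} {p} {o} ." for s, p, o in triples)
            return f"<{self.uri}#{name}> {{\n{body}\n}}"

        me = f"<{self.uri}>"
        head = [
            (me, "a", f"<{NP}Nanopublication>"),
            (me, f"<{NP}hasAssertion>", f"<{self.uri}#assertion>"),
            (me, f"<{NP}hasProvenance>", f"<{self.uri}#provenance>"),
            (me, f"<{NP}hasPublicationInfo>", f"<{self.uri}#pubinfo>"),
        ]
        parts = [
            graph("head", head),
            graph("assertion", self.assertion),
            graph("provenance", self.provenance),
            graph("pubinfo", self.pubinfo),
        ]
        return "\n\n".join(parts)


# All example.org URIs below are placeholders for illustration.
finding = Nanopublication(
    uri="https://example.org/np/1",
    assertion=[(
        "<https://example.org/assembly/SOAPdenovo2-human>",
        "<https://example.org/vocab/hasHigherGenomeCoverageThan>",
        "<https://example.org/assembly/SOAPdenovo1-human>",
    )],
    provenance=[(
        "<https://example.org/np/1#assertion>",
        f"<{PROV}wasDerivedFrom>",
        "<https://example.org/workflow-run/1>",
    )],
    pubinfo=[(
        "<https://example.org/np/1>",
        f"<{PROV}wasAttributedTo>",
        "<https://example.org/people/author>",
    )],
)
print(finding.to_trig())
```

Keeping the provenance graph separate from the assertion is what lets a conclusion point back to the experimental description that produced it.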
Aggregation and workflow preservation as a Research Object
http://www.researchobject.org/
ResearchObject: enables the aggregation of the digital resources contributing to findings of computational research, including results, data and software, as citable compound digital objects
http://isa-tools.github.io/soapdenovo2
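The idea of aggregating results, data and software into one citable compound object can be illustrated with a small manifest. This is a sketch only: the `aggregates` and `uri` keys echo the style of Research Object manifests but are not validated against any specification, the `role` key and the function `build_manifest` are hypothetical, and every `example.org` URI is a placeholder. Only the SOAPdenovo2 repository URL comes from the slides.

```python
import json


def build_manifest(ro_id, resources):
    """Aggregate (uri, role) pairs under one identifier, rejecting duplicates."""
    seen = set()
    aggregates = []
    for uri, role in resources:
        if uri in seen:
            raise ValueError(f"resource aggregated twice: {uri}")
        seen.add(uri)
        aggregates.append({"uri": uri, "role": role})
    return {"id": ro_id, "aggregates": aggregates}


manifest = build_manifest(
    "https://example.org/ro/soapdenovo2-case-study",  # placeholder identifier
    [
        ("https://github.com/aquaskyline/SOAPdenovo2", "software"),
        ("https://example.org/soapdenovo2/galaxy-workflow", "workflow"),
        ("https://example.org/soapdenovo2/nanopublications", "results"),
    ],
)
print(json.dumps(manifest, indent=2))
```

Because the whole aggregation sits under a single identifier, a citation can point to the complete set of resources rather than to each one separately.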
• From narrative to self-described structured data
• Model & workflow assisted experimental description and review process
• Depth and breadth of semantic resources, clear meaning of experimental elements
Team
Ruibang Luo, University of Hong Kong
Tin-Lap Lee, Chinese University of Hong Kong
Tak-wah Lam, University of Hong Kong
SOAPdenovo2
Scott Edmunds, GigaScience
Peter Li, GigaScience
Marco Roos, Leiden University
Mark Thompson, Leiden University
Rajaram Kaliyaperumal, Leiden University
Eelke van der Horst, Leiden University
Jun Zhao, Lancaster University
María Susana Avila García, Oxford University
Philippe Rocca-Serra, Oxford University
Susanna-Assunta Sansone, Oxford University
Alejandra Gonzalez-Beltran, Oxford University
Questions?
You can email us...
isatools@googlegroups.com
View our blog
http://isatools.wordpress.com
Follow us on Twitter
@isatools
View our websites
View our Git repo & contribute
http://github.com/ISA-tools
Thanks for your attention!