SlideShare a Scribd company logo
1 of 39
Download to read offline
Tools for: 
Open-Source 
Open-Data 
Rob L Davidson about.me/rob.davidson
The problem 
reproducibility.cs.arizona.edu 
• 515 papers (429 conf, 86 journal) 
• <30% reproducible
The problem 
reproducibility.cs.arizona.edu
The Cause 
• Stodden 2010 
– 638 registrant at NIPS 
• 30% share code 
• 20% share data 
http://web.stanford.edu/~vcs/papers/SMPRCS2010.pdf
Publishers must provide! 
Hosting 
Curating 
Citations for everything: 
data, tools + workflows
Tools for Reproducibility 
• Data: GigaDB 
• Images: OMERO 
• Workflows 
– Galaxy 
– Executable Docs 
– VMs
GigaDB 
github.com/gigascience/gigadb-cogini
Hosting all data
Hosting all research objects
Impact for research objects 
• Host 
• Curate 
• Share 
• Cite - DOI
Even more accessible, transparent data? 
Hosting image data with OMERO
Hosting Images 
• Image LIMS 
• Web embedding 
– View online, no 
need for software 
• Full res 
• Link all images to 
publication 
– No cherry picking 
http://www.openmicroscopy.org/site/products/omero
Cyber-Centipedes! Phenotyping 
NO
Accessible Cyber-Centipede images 
OMERO: providing 
access to imaging data 
View, filter, measure raw 
images with direct links 
from journal article. 
See all image data, not 
just cherry picked 
examples. 
Download and reprocess.
OMERO: Adding value
The alternative... 
...look but don't touch
Workflows 
1. Galaxy 
galaxyproject.org
galaxy.cbiit.cuhk.edu.hk
Implement workflows in a community-accepted 
format 
http://galaxyproject.org 
Open source 
Over 45,000 main 
Galaxy server users 
Over 1,000 papers 
citing Galaxy use 
Over 55 Galaxy 
servers deployed
Implement workflows in an intuitive format 
Tool list ToCoolp yprigahrt aNBmAFe-Bte 2r0i1s3ation Results panel
Visualising Workflows
Birmingham Metabo-Galaxy Workflow
Birmingham Metabo-Galaxy 
Tools wrapped in Python and XML 
User sees web form (easy!) 
Data stored centrally (secure!) 
Work done centrally (easy update)
Hosting Workflows
Hosting Workflows 
1) Test data 
2) Software files 
3) Instructions 
+ Galaxy implementation
Can we reproduce results? SOAPdenovo2 S. aureus pipeline
Workflows 
2. Executable Docs
Open lab books, dynamic documents 
• Facilitate reuse and sharing with tools like: Knitr, Sweave, 
iPython Notebook 
Sweave 
• Working towards executable papers…
E.g.
E.g.
Some testimonials for Knitr 
Authors (Wolfgang Huber) 
“I do all my projects in Knitr. Having the textual 
explanation, the associated code and the results all in one 
place really increases productivity, and helps explaining 
my analyses to colleagues, or even just to my future self.” 
Reviewers (Christophe Pouzat) 
“It took me a couple of hours to get the data, the few 
custom developed routines, the “vignette” and to 
REPRODUCE EXACTLY the analysis presented in the 
manuscript. With few more hours, I was able to modify the 
authors’ code to change their Fig. 4. In addition to making 
the presented research trustworthy, the reproducible 
research paradigm definitely makes the reviewer’s job 
much more fun!
Workflow accessibility: 
VMs
Why VMs? 
• OS settings 
• Dependencies 
– Versions 
– e.g. python! 
• Data + Code linked 
• Download or run in 
cloud
VMs in GigaDB
Summary
Share data in GigaDB 
Share all images in GigaDB 
-View images via OMERO 
Share code in GigaDB! 
Share pipeline using: 
Executable docs! 
Galaxy! 
VMs!
Improve 
reproducibility! 
Give us data, papers 
& pipelines* 
Contact us: 
scott@gigasciencejournal.com 
editorial@gigasciencejournal.com 
database@gigasciencejournal.com 
* APC’s currently generously covered 
by BGI until 2015 
www.gigasciencejournal.com
Thanks to: 
team: Our collaborators: Case study: 
Ruibang Luo (BGI/HKU) 
Shaoguang Liang (BGI-SZ) 
Tin-Lap Lee (CUHK) 
Qiong Luo (HKUST) 
Senghong Wang (HKUST) 
Yan Zhou (HKUST) 
Funding from: CBIIT 
@gigascience 
facebook.com/GigaScience 
blogs.biomedcentral.com/gigablog/ 
Peter Li 
Huayan Gao 
Chris Hunter 
Jesse Si Zhe 
Nicole Nogoy 
Laurie Goodman 
Amye Kenall 
(BMC) 
Marco Roos (LUMC) 
Mark Thompson (LUMC) 
Jun Zhao (Lancaster) 
Susanna Sansone (Oxford) 
Philippe Rocca-Serra (Oxford) 
Alejandra Gonzalez-Beltran 
(Oxford) 
www.gigadb.org 
galaxy.cbiit.cuhk.edu.hk 
www.gigasciencejournal.com

More Related Content

Similar to G3 talk rld_2

EOSC-Life Workflow Collaboratory
EOSC-Life Workflow CollaboratoryEOSC-Life Workflow Collaboratory
EOSC-Life Workflow CollaboratoryCarole Goble
 
2018 ABRF Tools for improving rigor and reproducibility in bioinformatics
2018 ABRF Tools for improving rigor and reproducibility in bioinformatics2018 ABRF Tools for improving rigor and reproducibility in bioinformatics
2018 ABRF Tools for improving rigor and reproducibility in bioinformaticsStephen Turner
 
Untangling - fall2017 - week 9
Untangling - fall2017 - week 9Untangling - fall2017 - week 9
Untangling - fall2017 - week 9Derek Jacoby
 
The Power of Azure DevOps
The Power of Azure DevOpsThe Power of Azure DevOps
The Power of Azure DevOpsJeff Bramwell
 
Jesse Xiao at CODATA2017: Updates to the GigaDB open access data publishing p...
Jesse Xiao at CODATA2017: Updates to the GigaDB open access data publishing p...Jesse Xiao at CODATA2017: Updates to the GigaDB open access data publishing p...
Jesse Xiao at CODATA2017: Updates to the GigaDB open access data publishing p...GigaScience, BGI Hong Kong
 
Easygenomics ISCB Cloud section 2012
Easygenomics ISCB Cloud section 2012Easygenomics ISCB Cloud section 2012
Easygenomics ISCB Cloud section 2012Xing Xu
 
2019-04-17 Bio-IT World G Suite-Jira Cloud Sample Tracking
2019-04-17 Bio-IT World G Suite-Jira Cloud Sample Tracking2019-04-17 Bio-IT World G Suite-Jira Cloud Sample Tracking
2019-04-17 Bio-IT World G Suite-Jira Cloud Sample TrackingBruce Kozuma
 
The Power of Azure DevOps
The Power of Azure DevOpsThe Power of Azure DevOps
The Power of Azure DevOpsJeff Bramwell
 
Code the docs-yu liu
Code the docs-yu liuCode the docs-yu liu
Code the docs-yu liuStreamNative
 
2016 05 sanger
2016 05 sanger2016 05 sanger
2016 05 sangerChris Dwan
 
ODSC East 2017 - Reproducible Research at Scale with Apache Zeppelin and Spark
ODSC East 2017 - Reproducible Research at Scale with Apache Zeppelin and SparkODSC East 2017 - Reproducible Research at Scale with Apache Zeppelin and Spark
ODSC East 2017 - Reproducible Research at Scale with Apache Zeppelin and SparkCarolyn Duby
 
Novo Nordisk's journey in developing an open-source application on Neo4j
Novo Nordisk's journey in developing an open-source application on Neo4jNovo Nordisk's journey in developing an open-source application on Neo4j
Novo Nordisk's journey in developing an open-source application on Neo4jNeo4j
 
Research Object Composer: A Tool for Publishing Complex Data Objects in the C...
Research Object Composer: A Tool for Publishing Complex Data Objects in the C...Research Object Composer: A Tool for Publishing Complex Data Objects in the C...
Research Object Composer: A Tool for Publishing Complex Data Objects in the C...Anita de Waard
 
Data visualisation in python tool - a brief
Data visualisation in python tool - a briefData visualisation in python tool - a brief
Data visualisation in python tool - a briefameermalik11
 
Elsevier‘s RDM Program: Habits of Effective Data and the Bourne Ulitmatum
Elsevier‘s RDM Program: Habits of Effective Data and the Bourne UlitmatumElsevier‘s RDM Program: Habits of Effective Data and the Bourne Ulitmatum
Elsevier‘s RDM Program: Habits of Effective Data and the Bourne UlitmatumAnita de Waard
 
The Power of Azure DevOps
The Power of Azure DevOpsThe Power of Azure DevOps
The Power of Azure DevOpsJeff Bramwell
 
"A Toolkit for Digital Research" - CNI 2013
"A Toolkit for Digital Research" - CNI 2013"A Toolkit for Digital Research" - CNI 2013
"A Toolkit for Digital Research" - CNI 2013Kaitlin Thaney
 
Advances in Scientific Workflow Environments
Advances in Scientific Workflow EnvironmentsAdvances in Scientific Workflow Environments
Advances in Scientific Workflow EnvironmentsCarole Goble
 

Similar to G3 talk rld_2 (20)

EOSC-Life Workflow Collaboratory
EOSC-Life Workflow CollaboratoryEOSC-Life Workflow Collaboratory
EOSC-Life Workflow Collaboratory
 
2018 ABRF Tools for improving rigor and reproducibility in bioinformatics
2018 ABRF Tools for improving rigor and reproducibility in bioinformatics2018 ABRF Tools for improving rigor and reproducibility in bioinformatics
2018 ABRF Tools for improving rigor and reproducibility in bioinformatics
 
Untangling - fall2017 - week 9
Untangling - fall2017 - week 9Untangling - fall2017 - week 9
Untangling - fall2017 - week 9
 
COPO - Collaborative Open Plant Omics, by Rob Davey
COPO - Collaborative Open Plant Omics, by Rob DaveyCOPO - Collaborative Open Plant Omics, by Rob Davey
COPO - Collaborative Open Plant Omics, by Rob Davey
 
The Power of Azure DevOps
The Power of Azure DevOpsThe Power of Azure DevOps
The Power of Azure DevOps
 
Jesse Xiao at CODATA2017: Updates to the GigaDB open access data publishing p...
Jesse Xiao at CODATA2017: Updates to the GigaDB open access data publishing p...Jesse Xiao at CODATA2017: Updates to the GigaDB open access data publishing p...
Jesse Xiao at CODATA2017: Updates to the GigaDB open access data publishing p...
 
Easygenomics ISCB Cloud section 2012
Easygenomics ISCB Cloud section 2012Easygenomics ISCB Cloud section 2012
Easygenomics ISCB Cloud section 2012
 
OGCE SC10
OGCE SC10OGCE SC10
OGCE SC10
 
2019-04-17 Bio-IT World G Suite-Jira Cloud Sample Tracking
2019-04-17 Bio-IT World G Suite-Jira Cloud Sample Tracking2019-04-17 Bio-IT World G Suite-Jira Cloud Sample Tracking
2019-04-17 Bio-IT World G Suite-Jira Cloud Sample Tracking
 
The Power of Azure DevOps
The Power of Azure DevOpsThe Power of Azure DevOps
The Power of Azure DevOps
 
Code the docs-yu liu
Code the docs-yu liuCode the docs-yu liu
Code the docs-yu liu
 
2016 05 sanger
2016 05 sanger2016 05 sanger
2016 05 sanger
 
ODSC East 2017 - Reproducible Research at Scale with Apache Zeppelin and Spark
ODSC East 2017 - Reproducible Research at Scale with Apache Zeppelin and SparkODSC East 2017 - Reproducible Research at Scale with Apache Zeppelin and Spark
ODSC East 2017 - Reproducible Research at Scale with Apache Zeppelin and Spark
 
Novo Nordisk's journey in developing an open-source application on Neo4j
Novo Nordisk's journey in developing an open-source application on Neo4jNovo Nordisk's journey in developing an open-source application on Neo4j
Novo Nordisk's journey in developing an open-source application on Neo4j
 
Research Object Composer: A Tool for Publishing Complex Data Objects in the C...
Research Object Composer: A Tool for Publishing Complex Data Objects in the C...Research Object Composer: A Tool for Publishing Complex Data Objects in the C...
Research Object Composer: A Tool for Publishing Complex Data Objects in the C...
 
Data visualisation in python tool - a brief
Data visualisation in python tool - a briefData visualisation in python tool - a brief
Data visualisation in python tool - a brief
 
Elsevier‘s RDM Program: Habits of Effective Data and the Bourne Ulitmatum
Elsevier‘s RDM Program: Habits of Effective Data and the Bourne UlitmatumElsevier‘s RDM Program: Habits of Effective Data and the Bourne Ulitmatum
Elsevier‘s RDM Program: Habits of Effective Data and the Bourne Ulitmatum
 
The Power of Azure DevOps
The Power of Azure DevOpsThe Power of Azure DevOps
The Power of Azure DevOps
 
"A Toolkit for Digital Research" - CNI 2013
"A Toolkit for Digital Research" - CNI 2013"A Toolkit for Digital Research" - CNI 2013
"A Toolkit for Digital Research" - CNI 2013
 
Advances in Scientific Workflow Environments
Advances in Scientific Workflow EnvironmentsAdvances in Scientific Workflow Environments
Advances in Scientific Workflow Environments
 

Recently uploaded

User Guide: Pulsar™ Weather Station (Columbia Weather Systems)
User Guide: Pulsar™ Weather Station (Columbia Weather Systems)User Guide: Pulsar™ Weather Station (Columbia Weather Systems)
User Guide: Pulsar™ Weather Station (Columbia Weather Systems)Columbia Weather Systems
 
Speech, hearing, noise, intelligibility.pptx
Speech, hearing, noise, intelligibility.pptxSpeech, hearing, noise, intelligibility.pptx
Speech, hearing, noise, intelligibility.pptxpriyankatabhane
 
Is RISC-V ready for HPC workload? Maybe?
Is RISC-V ready for HPC workload? Maybe?Is RISC-V ready for HPC workload? Maybe?
Is RISC-V ready for HPC workload? Maybe?Patrick Diehl
 
preservation, maintanence and improvement of industrial organism.pptx
preservation, maintanence and improvement of industrial organism.pptxpreservation, maintanence and improvement of industrial organism.pptx
preservation, maintanence and improvement of industrial organism.pptxnoordubaliya2003
 
BIOETHICS IN RECOMBINANT DNA TECHNOLOGY.
BIOETHICS IN RECOMBINANT DNA TECHNOLOGY.BIOETHICS IN RECOMBINANT DNA TECHNOLOGY.
BIOETHICS IN RECOMBINANT DNA TECHNOLOGY.PraveenaKalaiselvan1
 
Grafana in space: Monitoring Japan's SLIM moon lander in real time
Grafana in space: Monitoring Japan's SLIM moon lander  in real timeGrafana in space: Monitoring Japan's SLIM moon lander  in real time
Grafana in space: Monitoring Japan's SLIM moon lander in real timeSatoshi NAKAHIRA
 
Solution chemistry, Moral and Normal solutions
Solution chemistry, Moral and Normal solutionsSolution chemistry, Moral and Normal solutions
Solution chemistry, Moral and Normal solutionsHajira Mahmood
 
Analytical Profile of Coleus Forskohlii | Forskolin .pptx
Analytical Profile of Coleus Forskohlii | Forskolin .pptxAnalytical Profile of Coleus Forskohlii | Forskolin .pptx
Analytical Profile of Coleus Forskohlii | Forskolin .pptxSwapnil Therkar
 
Microphone- characteristics,carbon microphone, dynamic microphone.pptx
Microphone- characteristics,carbon microphone, dynamic microphone.pptxMicrophone- characteristics,carbon microphone, dynamic microphone.pptx
Microphone- characteristics,carbon microphone, dynamic microphone.pptxpriyankatabhane
 
Topic 9- General Principles of International Law.pptx
Topic 9- General Principles of International Law.pptxTopic 9- General Principles of International Law.pptx
Topic 9- General Principles of International Law.pptxJorenAcuavera1
 
Neurodevelopmental disorders according to the dsm 5 tr
Neurodevelopmental disorders according to the dsm 5 trNeurodevelopmental disorders according to the dsm 5 tr
Neurodevelopmental disorders according to the dsm 5 trssuser06f238
 
Transposable elements in prokaryotes.ppt
Transposable elements in prokaryotes.pptTransposable elements in prokaryotes.ppt
Transposable elements in prokaryotes.pptArshadWarsi13
 
RESPIRATORY ADAPTATIONS TO HYPOXIA IN HUMNAS.pptx
RESPIRATORY ADAPTATIONS TO HYPOXIA IN HUMNAS.pptxRESPIRATORY ADAPTATIONS TO HYPOXIA IN HUMNAS.pptx
RESPIRATORY ADAPTATIONS TO HYPOXIA IN HUMNAS.pptxFarihaAbdulRasheed
 
Best Call Girls In Sector 29 Gurgaon❤️8860477959 EscorTs Service In 24/7 Delh...
Best Call Girls In Sector 29 Gurgaon❤️8860477959 EscorTs Service In 24/7 Delh...Best Call Girls In Sector 29 Gurgaon❤️8860477959 EscorTs Service In 24/7 Delh...
Best Call Girls In Sector 29 Gurgaon❤️8860477959 EscorTs Service In 24/7 Delh...lizamodels9
 
Bentham & Hooker's Classification. along with the merits and demerits of the ...
Bentham & Hooker's Classification. along with the merits and demerits of the ...Bentham & Hooker's Classification. along with the merits and demerits of the ...
Bentham & Hooker's Classification. along with the merits and demerits of the ...Nistarini College, Purulia (W.B) India
 
Vision and reflection on Mining Software Repositories research in 2024
Vision and reflection on Mining Software Repositories research in 2024Vision and reflection on Mining Software Repositories research in 2024
Vision and reflection on Mining Software Repositories research in 2024AyushiRastogi48
 
(9818099198) Call Girls In Noida Sector 14 (NOIDA ESCORTS)
(9818099198) Call Girls In Noida Sector 14 (NOIDA ESCORTS)(9818099198) Call Girls In Noida Sector 14 (NOIDA ESCORTS)
(9818099198) Call Girls In Noida Sector 14 (NOIDA ESCORTS)riyaescorts54
 
User Guide: Magellan MX™ Weather Station
User Guide: Magellan MX™ Weather StationUser Guide: Magellan MX™ Weather Station
User Guide: Magellan MX™ Weather StationColumbia Weather Systems
 
FREE NURSING BUNDLE FOR NURSES.PDF by na
FREE NURSING BUNDLE FOR NURSES.PDF by naFREE NURSING BUNDLE FOR NURSES.PDF by na
FREE NURSING BUNDLE FOR NURSES.PDF by naJASISJULIANOELYNV
 

Recently uploaded (20)

User Guide: Pulsar™ Weather Station (Columbia Weather Systems)
User Guide: Pulsar™ Weather Station (Columbia Weather Systems)User Guide: Pulsar™ Weather Station (Columbia Weather Systems)
User Guide: Pulsar™ Weather Station (Columbia Weather Systems)
 
Speech, hearing, noise, intelligibility.pptx
Speech, hearing, noise, intelligibility.pptxSpeech, hearing, noise, intelligibility.pptx
Speech, hearing, noise, intelligibility.pptx
 
Is RISC-V ready for HPC workload? Maybe?
Is RISC-V ready for HPC workload? Maybe?Is RISC-V ready for HPC workload? Maybe?
Is RISC-V ready for HPC workload? Maybe?
 
preservation, maintanence and improvement of industrial organism.pptx
preservation, maintanence and improvement of industrial organism.pptxpreservation, maintanence and improvement of industrial organism.pptx
preservation, maintanence and improvement of industrial organism.pptx
 
Hot Sexy call girls in Moti Nagar,🔝 9953056974 🔝 escort Service
Hot Sexy call girls in  Moti Nagar,🔝 9953056974 🔝 escort ServiceHot Sexy call girls in  Moti Nagar,🔝 9953056974 🔝 escort Service
Hot Sexy call girls in Moti Nagar,🔝 9953056974 🔝 escort Service
 
BIOETHICS IN RECOMBINANT DNA TECHNOLOGY.
BIOETHICS IN RECOMBINANT DNA TECHNOLOGY.BIOETHICS IN RECOMBINANT DNA TECHNOLOGY.
BIOETHICS IN RECOMBINANT DNA TECHNOLOGY.
 
Grafana in space: Monitoring Japan's SLIM moon lander in real time
Grafana in space: Monitoring Japan's SLIM moon lander  in real timeGrafana in space: Monitoring Japan's SLIM moon lander  in real time
Grafana in space: Monitoring Japan's SLIM moon lander in real time
 
Solution chemistry, Moral and Normal solutions
Solution chemistry, Moral and Normal solutionsSolution chemistry, Moral and Normal solutions
Solution chemistry, Moral and Normal solutions
 
Analytical Profile of Coleus Forskohlii | Forskolin .pptx
Analytical Profile of Coleus Forskohlii | Forskolin .pptxAnalytical Profile of Coleus Forskohlii | Forskolin .pptx
Analytical Profile of Coleus Forskohlii | Forskolin .pptx
 
Microphone- characteristics,carbon microphone, dynamic microphone.pptx
Microphone- characteristics,carbon microphone, dynamic microphone.pptxMicrophone- characteristics,carbon microphone, dynamic microphone.pptx
Microphone- characteristics,carbon microphone, dynamic microphone.pptx
 
Topic 9- General Principles of International Law.pptx
Topic 9- General Principles of International Law.pptxTopic 9- General Principles of International Law.pptx
Topic 9- General Principles of International Law.pptx
 
Neurodevelopmental disorders according to the dsm 5 tr
Neurodevelopmental disorders according to the dsm 5 trNeurodevelopmental disorders according to the dsm 5 tr
Neurodevelopmental disorders according to the dsm 5 tr
 
Transposable elements in prokaryotes.ppt
Transposable elements in prokaryotes.pptTransposable elements in prokaryotes.ppt
Transposable elements in prokaryotes.ppt
 
RESPIRATORY ADAPTATIONS TO HYPOXIA IN HUMNAS.pptx
RESPIRATORY ADAPTATIONS TO HYPOXIA IN HUMNAS.pptxRESPIRATORY ADAPTATIONS TO HYPOXIA IN HUMNAS.pptx
RESPIRATORY ADAPTATIONS TO HYPOXIA IN HUMNAS.pptx
 
Best Call Girls In Sector 29 Gurgaon❤️8860477959 EscorTs Service In 24/7 Delh...
Best Call Girls In Sector 29 Gurgaon❤️8860477959 EscorTs Service In 24/7 Delh...Best Call Girls In Sector 29 Gurgaon❤️8860477959 EscorTs Service In 24/7 Delh...
Best Call Girls In Sector 29 Gurgaon❤️8860477959 EscorTs Service In 24/7 Delh...
 
Bentham & Hooker's Classification. along with the merits and demerits of the ...
Bentham & Hooker's Classification. along with the merits and demerits of the ...Bentham & Hooker's Classification. along with the merits and demerits of the ...
Bentham & Hooker's Classification. along with the merits and demerits of the ...
 
Vision and reflection on Mining Software Repositories research in 2024
Vision and reflection on Mining Software Repositories research in 2024Vision and reflection on Mining Software Repositories research in 2024
Vision and reflection on Mining Software Repositories research in 2024
 
(9818099198) Call Girls In Noida Sector 14 (NOIDA ESCORTS)
(9818099198) Call Girls In Noida Sector 14 (NOIDA ESCORTS)(9818099198) Call Girls In Noida Sector 14 (NOIDA ESCORTS)
(9818099198) Call Girls In Noida Sector 14 (NOIDA ESCORTS)
 
User Guide: Magellan MX™ Weather Station
User Guide: Magellan MX™ Weather StationUser Guide: Magellan MX™ Weather Station
User Guide: Magellan MX™ Weather Station
 
FREE NURSING BUNDLE FOR NURSES.PDF by na
FREE NURSING BUNDLE FOR NURSES.PDF by naFREE NURSING BUNDLE FOR NURSES.PDF by na
FREE NURSING BUNDLE FOR NURSES.PDF by na
 

G3 talk rld_2