Scott Edmunds & Rob Davidson's talk at the Metabolomics Society 2014 Meeting on Beyond Dead Trees: data & workflow publishing with GigaScience, Tsuruoka 23rd June 2014

  • Over 20,000 users on the main server
    Over 500 papers citing the use of Galaxy
    Over 55 servers deployed on the Web
  • That just leaves me to thank the GigaScience team (Laurie, Scott, Alexandra, Peter and Jesse); BGI for their support, specifically Shaoguang for IT and bioinformatics support; our collaborators on the database, website and tools (Tin-Lap, Qiong, Senhong and Yan); the Cogini web design team; DataCite for providing the DOI service; and the ISA Commons team for their support and their advocacy of best practice in metadata reporting and sharing.
    Thank you for listening.

    1. Beyond Dead Trees: data & workflow publishing with GigaScience. Scott Edmunds, Rob Davidson
    2. The problems with publishing: • “Scholarly articles are merely advertisement of scholarship. The actual scholarly artefacts, i.e. the data and computational methods, which support the scholarship, remain largely inaccessible.” (Jon B. Buckheit and David L. Donoho, WaveLab and Reproducible Research, 1995) • Lack of transparency, and lack of credit for anything other than “regular” dead-tree publication. • Traditional publishing models, policies and practices are holding things back.
    3. Why is this important? We need: • to publish protocols BEFORE analysis • better access to supporting data/code • more transparent & accountable review • to publish replication studies
    4. Consequences: an increasing number of retractions, with a >15× increase in the last decade. At the current rate of increase, by 2045 as many papers would be retracted as published. [1] Ioannidis et al., 2009. Repeatability of published microarray gene expression analyses. Nature Genetics 41: 14 [2] Science publishing: The trouble with retractions. http://www.nature.com/news/2011/111005/full/478026a.html [3] Björn Brembs: Open Access and the looming crisis in science. https://theconversation.com/open-access-and-the-looming-crisis-in-science-14950
    5. New incentives/credit: • Data • Software • Review • Re-use… = Credit. Credit where credit is overdue: “One option would be to provide researchers who release data to public repositories with a means of accreditation.” “An ability to search the literature for all online papers that used a particular data set would enable appropriate attribution for those who share.” (Nature Biotechnology 27, 579 (2009))
    6. GigaSolution: deconstructing the paper. Combines and integrates: an open-access journal (www.gigasciencejournal.com), a data publishing platform (www.gigadb.org) and a data analysis platform. Utilizes big-data infrastructure and expertise from:
    7. On top of regular papers…
    8. Rewarding open data: http://gigadb.org/
    9. • Multi-omics focus (not just genomics) • 10-100× faster downloads than FTP • Provide (ISA) curation & integration with other DBs (e.g. MetaboLights, SRA) For more, see: http://database.oxfordjournals.org/content/2014/bau018.abstract
    10. IRRI GALAXY: Democratization through data publishing
    11. IRRI GALAXY: Democratization through data publishing. Rice 3K project: 3,000 rice genomes, 13.4 TB of public data
    12. Two tools for reproducible research. Rob Davidson. RO:and
    13. GigaSolution: deconstructing the paper. Combines and integrates: an open-access journal (www.gigasciencejournal.com), a data publishing platform (www.gigadb.org) and a data analysis platform. Utilizes big-data infrastructure and expertise from:
    14. Visualizations & DOIs for workflows: galaxy.cbiit.cuhk.edu.hk
    15. Rewarding and aiding reproducibility: implement workflows in a community-accepted format (Galaxy, http://galaxyproject.org). • Over 36,000 main Galaxy server users • Over 1,000 papers citing Galaxy use • Over 55 Galaxy servers deployed • Open source
    16. Rewarding and aiding reproducibility: implement workflows in a community-accepted format. Screenshot panels: Tool, Tool parameterisation, Results panel (Copyright NBAF-B 2013)
    17. Birmingham Metabo-Galaxy Workflow
    18. Birmingham Metabo-Galaxy: • Tools wrapped in Python and XML • User sees web form (easy!) • Data stored centrally (secure!) • Work done centrally (easy update)
    19. First RAW → stats Galaxy pipeline
    20. SOAPdenovo2 S. aureus pipeline
    21. NO handling of imaging (phenotype) data. Cyber-centipedes & virtual worms
    22. Aiding reproducibility. OMERO: providing access to imaging data. View, filter and measure raw images with direct links from the journal article. See all image data, not just cherry-picked examples. Download and reprocess.
    23. JCB: Aiding reproducibility, adding value
    24. The alternative: look but don't touch
    25. In Summary • Reproducibility is important!! (Currently not very common!) • Many tools are appearing for data publishing and sharing (images, tools, workflows). • Data publishing → more publications, more citations, more impact! • Are you convinced? • What are the barriers? Code standards? Data standards? Too much work?
    26. Give us data, papers & pipelines!* Help us make it happen! Contact us: scott@gigasciencejournal.com, rob@gigasciencejournal.com, editorial@gigasciencejournal.com, database@gigasciencejournal.com, www.gigasciencejournal.com (* APCs currently generously covered by BGI until 2015)
    27. Thanks to: Ruibang Luo (BGI/HKU), Shaoguang Liang (BGI-SZ), Tin-Lap Lee (CUHK), Qiong Luo (HKUST), Senghong Wang (HKUST), Yan Zhou (HKUST), Peter Li, Huayan Gao, Chris Hunter, Jesse Si Zhe, Nicole Nogoy, Laurie Goodman, Amye Kenall (BMC), Marco Roos (LUMC), Mark Thompson (LUMC), Jun Zhao (Lancaster), Susanna Sansone (Oxford), Philippe Rocca-Serra (Oxford), Alejandra Gonzalez-Beltran (Oxford), the CBIIT team, our collaborators and our funders. Follow us: @gigascience, facebook.com/GigaScience, blogs.biomedcentral.com/gigablog/. Links: www.gigadb.org, galaxy.cbiit.cuhk.edu.hk, www.gigasciencejournal.com
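Slide 18 says the Birmingham Metabo-Galaxy tools are wrapped in Python and XML, so the user only ever sees a web form. As a minimal sketch of the Python half of such a wrapper, assuming a simple tab-separated peak table (a feature-ID column plus one intensity column per sample), a tool script might look like the one below. The tool's XML file (not shown) would declare the input and output datasets; Galaxy builds the web form from it and passes the dataset paths to the script's `--input`/`--output` options. The normalisation step (dividing each sample's intensities by that sample's total signal) and every name in the script are hypothetical illustrations, not the Birmingham pipeline's actual code.

```python
"""Hypothetical Galaxy-style tool script: total-signal normalisation of a peak table.

Assumed input (illustrative only): a tab-separated file with a header row,
a feature-ID first column and one intensity column per sample.
"""
import argparse
import csv
from typing import List, Optional, Tuple

Table = List[List[float]]


def normalise_columns(rows: Table) -> Table:
    """Divide each intensity by its sample (column) total so every column sums to 1.

    Samples whose total signal is zero are written out as zeros.
    """
    if not rows:
        return []
    totals = [sum(column) for column in zip(*rows)]
    return [
        [value / total if total else 0.0 for value, total in zip(row, totals)]
        for row in rows
    ]


def read_table(path: str) -> Tuple[List[str], List[str], Table]:
    """Read the header, feature IDs and numeric intensities from a TSV file."""
    with open(path, newline="") as handle:
        reader = csv.reader(handle, delimiter="\t")
        header = next(reader)
        ids: List[str] = []
        rows: Table = []
        for record in reader:
            ids.append(record[0])
            rows.append([float(x) for x in record[1:]])
    return header, ids, rows


def write_table(path: str, header: List[str], ids: List[str], rows: Table) -> None:
    """Write the table back out as TSV, keeping the original header and IDs."""
    with open(path, "w", newline="") as handle:
        writer = csv.writer(handle, delimiter="\t")
        writer.writerow(header)
        for feature_id, row in zip(ids, rows):
            writer.writerow([feature_id] + [f"{value:.6g}" for value in row])


def main(argv: Optional[List[str]] = None) -> int:
    """Entry point: in Galaxy, the XML wrapper would fill in the dataset paths."""
    parser = argparse.ArgumentParser(description="Total-signal normalisation (illustrative).")
    parser.add_argument("--input", required=True, help="tab-separated peak table")
    parser.add_argument("--output", required=True, help="path for the normalised table")
    args = parser.parse_args(argv)
    header, ids, rows = read_table(args.input)
    write_table(args.output, header, ids, normalise_columns(rows))
    return 0
```

Calling `main(["--input", "peaks.tsv", "--output", "normalised.tsv"])` (hypothetical file names) mirrors the command line Galaxy would run; a standalone tool script would also end with the usual `if __name__ == "__main__": raise SystemExit(main())` guard. Keeping the arithmetic in a plain function (`normalise_columns`), separate from file handling, lets the step be tested and reused outside Galaxy.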
