Peter Li at GCC2014: A journal’s experiences of reproducing published data analyses

Peter Li at the 2014 Galaxy Community Conference: A journal’s experiences of reproducing published data analyses, 1st July 2014

Published in: Technology, Health & Medicine

  • Publication = Selective reporting?
    For research involving computation
  • DOIs
    Provide example of a GigaScience paper
    Mention DOI for the paper itself
    Highlight data set generated and its DOI
  • Transcript

    • 1. A journal’s experiences of reproducing published data analyses. Peter Li, peter@gigasciencejournal.com
    • 2. Journal and database for large-scale data studies. Editor-in-Chief: Laurie Goodman; Executive Editor: Scott Edmunds; Commissioning Editor: Nicole Nogoy; GigaDB: Chris Hunter, Jesse Xiao; GigaGalaxy: Peter Li; in conjunction with
    • 3. www.gigasciencejournal.com
    • 4. reproducibility trust understanding
    • 5. Reproducibility spectrum: Publication only (not reproducible) → Publication + data → Publication + code and data → Publication + linked and executable code and data → Full replication (gold standard). Adapted from Roger Peng (2011) Reproducible research in computational science. Science 334: 1226-1227.
    • 6. gigadb.org
    • 7. Linking of papers and data by citation of DOIs: paper DOI, data set DOI
    • 8. Reproducibility spectrum: Publication only (not reproducible) → Publication + data → Publication + code and data → Publication + linked and executable code and data → Full replication (gold standard). Adapted from Roger Peng (2011) Reproducible research in computational science. Science 334: 1226-1227.
    • 9. Can the results in a GigaScience paper be replicated using Galaxy?
    • 10. Pilot project
    • 11. Replicate
    • 12. Tools http://gigadb.org/dataset/100044
    • 13. Tools and data http://gage.cbcb.umd.edu/data/index.html
    • 14. Data in GigaGalaxy
    • 15. Integration of SOAPdenovo2 into GigaGalaxy
    • 16. Downloaded pipeline: Short reads → KmerFreq_AR → Corrector_AR → SOAPdenovo2 → GapCloser → Scaffold seqs. Required pipeline: Short reads → KmerFreq_AR → Corrector_AR → SOAPdenovo2 → GapCloser → ExtractACGT → GAGE eval → Table 2 N50 & corrected N50 scores. Downloaded pipeline is missing two tools for reproducibility.
    • 17. Required pipeline: Short reads → KmerFreq_AR → Corrector_AR → SOAPdenovo2 → GapCloser → ExtractACGT → GAGE eval → Table 2 N50 & corrected N50 scores. Need to add two extra tools into GigaGalaxy.
    • 18. SOAPdenovo2 S. aureus pipeline
    • 19. Published and Galaxy-reproduced statistics of genome assemblies of S. aureus and R. sphaeroides. The slide shows two tables, labelled in order: Published, Reproduced.

      First table:
      Species         Tool          Contigs                                       Scaffolds
                                    Number  N50 (kb)  Errors  N50 corrected (kb)  Number  N50 (kb)  Errors  N50 corrected (kb)
      S. aureus       SOAPdenovo1   79      148.6     156     23                  49      342       0       342
      S. aureus       SOAPdenovo2   80      98.6      25      71.5                38      1086      2       1078
      S. aureus       ALL-PATHS-LG  37      149.7     13      119.0               11      1477      1       1093
      R. sphaeroides  SOAPdenovo1   2241    3.5       400     2.8                 956     106       24      68
      R. sphaeroides  SOAPdenovo2   721     18        106     14.1                333     2549      4       2540
      R. sphaeroides  ALL-PATHS-LG  190     41.9      30      36.7                32      3191      0       0

      Second table:
      Species         Tool          Contigs                                       Scaffolds
                                    Number  N50 (kb)  Errors  N50 corrected (kb)  Number  N50 (kb)  Errors  N50 corrected (kb)
      S. aureus       SOAPdenovo1   79      148.6     156     23                  49      342       0       342
      S. aureus       SOAPdenovo2   80      98.6      25      71.5                38      1086      2       1078
      S. aureus       ALL-PATHS-LG  37      149.7     13      117.6               10      1477      1       1093
      R. sphaeroides  SOAPdenovo1   2242    3.5       392     2.8                 956     105       18      70
      R. sphaeroides  SOAPdenovo2   721     18        106     14.1                333     2549      4       2540
      R. sphaeroides  ALL-PATHS-LG  190     41.9      31      36.7                32      3191      0       3310
    • 20. http://galaxy.cbiit.cuhk.edu.hk/u/gigascience/p/soapdenovo2-s-aureus
    • 21. Observations
          • Complete scientific reproduction is difficult
              – Time and effort required
          • Requires help from authors
          • Do we need education and training in scientific reproducibility?
    • 22. http://www.cf.ac.uk/socsi/contactsandpeople/harrycollins/image-36548-web.gif
    • 23. Thanks to: Ruibang Luo (BGI/HKU), Shaoguang Liang (BGI-SZ), Tin-Lap Lee (CUHK), Qiong Luo (HKUST), Senghong Wang (HKUST), Yan Zhou (HKUST); Peter Li, Huayan Gao, Chris Hunter, Jesse Si Zhe, Nicole Nogoy, Laurie Goodman, Amye Kenall (BMC); Marco Roos (LUMC), Mark Thompson (LUMC), Jun Zhao (Lancaster), Susanna Sansone (Oxford), Philippe Rocca-Serra (Oxford), Alejandra Gonzalez-Beltran (Oxford). @gigascience, facebook.com/GigaScience, blogs.biomedcentral.com/gigablog/, www.gigadb.org, galaxy.cbiit.cuhk.edu.hk, www.gigasciencejournal.com. Slide section labels: Funding from: Our collaborators:team: Case study:
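
The two tables on slide 19 are easier to compare cell by cell than by eye. The following is a minimal Python sketch, not part of the talk, that lists every cell where the two tables disagree. The values are transcribed from the slide in slide order; the column names and the `differing_cells` helper are hypothetical names chosen for illustration.

```python
# Column names are illustrative labels for the eight numeric columns on slide 19.
COLUMNS = (
    "contigs_number", "contigs_n50_kb", "contigs_errors", "contigs_n50_corrected_kb",
    "scaffolds_number", "scaffolds_n50_kb", "scaffolds_errors", "scaffolds_n50_corrected_kb",
)

# Values transcribed from the first table on the slide.
first_table = {
    ("S. aureus", "SOAPdenovo1"): (79, 148.6, 156, 23, 49, 342, 0, 342),
    ("S. aureus", "SOAPdenovo2"): (80, 98.6, 25, 71.5, 38, 1086, 2, 1078),
    ("S. aureus", "ALL-PATHS-LG"): (37, 149.7, 13, 119.0, 11, 1477, 1, 1093),
    ("R. sphaeroides", "SOAPdenovo1"): (2241, 3.5, 400, 2.8, 956, 106, 24, 68),
    ("R. sphaeroides", "SOAPdenovo2"): (721, 18, 106, 14.1, 333, 2549, 4, 2540),
    ("R. sphaeroides", "ALL-PATHS-LG"): (190, 41.9, 30, 36.7, 32, 3191, 0, 0),
}

# Values transcribed from the second table on the slide.
second_table = {
    ("S. aureus", "SOAPdenovo1"): (79, 148.6, 156, 23, 49, 342, 0, 342),
    ("S. aureus", "SOAPdenovo2"): (80, 98.6, 25, 71.5, 38, 1086, 2, 1078),
    ("S. aureus", "ALL-PATHS-LG"): (37, 149.7, 13, 117.6, 10, 1477, 1, 1093),
    ("R. sphaeroides", "SOAPdenovo1"): (2242, 3.5, 392, 2.8, 956, 105, 18, 70),
    ("R. sphaeroides", "SOAPdenovo2"): (721, 18, 106, 14.1, 333, 2549, 4, 2540),
    ("R. sphaeroides", "ALL-PATHS-LG"): (190, 41.9, 31, 36.7, 32, 3191, 0, 3310),
}


def differing_cells(a, b):
    """Return (species, tool, column, value_in_a, value_in_b) for each mismatch."""
    diffs = []
    for (species, tool), row_a in a.items():
        row_b = b[(species, tool)]
        for column, x, y in zip(COLUMNS, row_a, row_b):
            if x != y:
                diffs.append((species, tool, column, x, y))
    return diffs


for diff in differing_cells(first_table, second_table):
    print(diff)
```

On the transcribed values, the SOAPdenovo2 rows agree exactly for both species, while the ALL-PATHS-LG rows and the R. sphaeroides SOAPdenovo1 row show differences.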

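
The N50 statistic reported on slides 16 to 19 can be illustrated with a short Python sketch. It uses the common definition (the length L such that sequences of length L or longer contain at least half of the total assembled length), which is an assumption here: the GAGE evaluation used in the talk may use a different reference length, and this sketch does not implement the corrected N50. The function name `n50` and the example lengths are hypothetical.

```python
def n50(lengths):
    """Return the N50 of a collection of contig or scaffold lengths.

    Assumes the common definition based on total assembled length.
    """
    if not lengths:
        raise ValueError("at least one sequence length is required")
    total = sum(lengths)
    running = 0
    # Walk from the longest sequence down until half the total is covered.
    for length in sorted(lengths, reverse=True):
        running += length
        if 2 * running >= total:
            return length


# Hypothetical contig lengths, not taken from the talk.
example_lengths = [2, 3, 4, 5, 6, 7, 8, 9, 10]
print(n50(example_lengths))
```

For the example lengths, the total is 54, and the three longest sequences (10, 9, 8) together reach 27, so the function returns 8.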