GigaScience: a new resource for the big-data community.

Scott Edmunds' talk at BGI's Bio-IT APAC meeting in Shenzhen, introducing BGI/BMC's new big-data journal, GigaScience. 7th July 2011

Transcript

  • 1. Scott Edmunds
    GigaScience: a new resource for the big-data community.
    www.gigasciencejournal.com
  • 2. Data Tsunami?
    Flickr cc: opensourceway
  • 3. Sequencing cost ($ per Mbp)
    Moore’s Law
    ~100,000X
    Sequencing
    Source: E Lander/Broad
  • 4. Sequencing Output
    Data
    Storage
    Moore’s/Kryder’s Law
  • 5. Sequencing Output
    Data
    Publication
    Dissemination?
  • 6. Potential sequencing capacity
    1 Illumina HiSeq 2000 (+TruSeq upgrade)
    = 600 Gb/run (12 days)
    × 128 HiSeq = 6 Tb/day = >2 Pb/year
    = ~2000 human genomes/day
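
The figures on this slide can be checked with a little arithmetic. The sketch below is only a back-of-envelope check, not part of the talk: it assumes that "Gb" means gigabases, that a human genome is about 3 Gb, and that the "~2000 genomes/day" figure counts raw output at roughly 1× coverage. The function and variable names are illustrative.

```python
# Back-of-envelope check of the slide's throughput figures.
GB_PER_RUN = 600        # gigabases per HiSeq 2000 run (from the slide)
DAYS_PER_RUN = 12       # run length in days (from the slide)
N_MACHINES = 128        # number of instruments (from the slide)
HUMAN_GENOME_GB = 3.0   # ASSUMPTION: ~3 Gb per human genome, 1x coverage


def daily_output_gb(gb_per_run=GB_PER_RUN,
                    days_per_run=DAYS_PER_RUN,
                    n_machines=N_MACHINES):
    """Combined output of all instruments, in gigabases per day."""
    return gb_per_run / days_per_run * n_machines


per_day_gb = daily_output_gb()                   # gigabases per day
per_day_tb = per_day_gb / 1000                   # terabases per day
per_year_pb = per_day_tb * 365 / 1000            # petabases per year
genomes_per_day = per_day_gb / HUMAN_GENOME_GB   # 1x genome equivalents

print(f"{per_day_tb:.1f} Tb/day, {per_year_pb:.2f} Pb/year, "
      f"~{genomes_per_day:.0f} genome equivalents/day")
```

Under these assumptions the result is a little over 6 Tb/day and a little over 2 Pb/year, which matches the rounded numbers on the slide.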
  • 7. Can we keep up?
    Flickr cc: opensourceway
  • 8. Now taking submissions…
    Large-Scale Data
    Journal/Database
    In conjunction with:
    Editor-in-Chief: Laurie Goodman, PhD
    Editor: Scott Edmunds, PhD
    Assistant Editor: Alexandra Basford, PhD
    www.gigasciencejournal.com
  • 9. Editorial Board: International
    Stephen O'Brien, USA
    Hanchuan Peng, USA
    Ming Qi, China/USA
    Susanna-Assunta Sansone, UK
    Michael Schatz, USA
    David Schwartz, USA
    Sumio Sugano, Japan
    Thomas Wachtler, Germany
    Jun Wang, China
    Marie Zins, France
    Stephan Beck, UK
    Ann-Shyn Chiang, Taiwan
    Richard Durbin, UK
    Paul Flicek, UK
    Robert Hanner, Canada
    Yoshihide Hayashizaki, Japan
    Henning Hermjakob, UK
    Gary King, USA
    Donald Moerman, Canada
    Karen Nelson, USA
  • 10. Editorial Board: Broad Span (all “big-data”)
    Stephen O'Brien, Genomics
    Hanchuan Peng, Imaging/Neuro
    Ming Qi, Genetics/Variome
    Susanna-Assunta Sansone, Standards
    Michael Schatz, Genomics/Cloud
    David Schwartz, Optical Mapping
    Sumio Sugano, Genomics
    Thomas Wachtler, Neuroscience
    Jun Wang, Genomics
    Marie Zins, Medicine
    Stephan Beck, Epigenomics
    Ann-Shyn Chiang, Neuroscience
    Richard Durbin, Genetics/Genomics
    Paul Flicek, Genomics/Databases
    Robert Hanner, Barcoding/Ecology
    Yoshihide Hayashizaki, Genomics
    Henning Hermjakob, Proteomics
    Gary King, Medicine/methods
    Donald Moerman, Functional Genomics
    Karen Nelson, Metagenomics
  • 11. Criteria and Focus of Journal/Database
    • Reproducibility/Reuse
    • Utility/Usability
    • Standards/Searchability/Scale/Sharing
    • Data publishing/DOI
  • 15. Use of Data = Importance + Usability
    easier to assess
    subjective?
  • 16. Data publishing/DOI
    • Data hosting will follow standard funding agency and community guidelines.
    • DOI assignment available for submitted data to allow ease of finding and citing datasets, as well as for citation tracking.
    • Datasets tracked by WOS/ISI, allowing additional metrics/credit for use.
  • 19. Reproducibility/Reuse
    • BGI Cloud Computing resources for handling and analyzing large-scale data.
    • Integrated tools to promote more widespread access, viewing, and analysis of data.
    • Encourage and aid use of workflow systems for methods (e.g. submission of Galaxy XML files).
  • 22. Special Series/Hub for cloud-based tools
    • Technical notes: test tools in the BGI-Cloud.
    • Tools + Test Data (BGI or user) in one place.
    • Aids reproducibility.
    • Aids reviewers (free).
    • Aids authors: visibility (PubMed, etc.), hosting (included/free offers).
    – contact us: editorial@gigasciencejournal.com
    Flickr cc: Oledoe
  • 27. Standards/Searchability/Sharing
    • ISA-Tab compatibility to aid and promote best practice in metadata reporting.
    • All supporting data must be publicly available.
    • Ask for MIBBI compliance and use of reporting checklists.
    • Part of the BioSharing network.
  • 31. Benefits of Data-sharing
    Sharing Detailed Research Data Is Associated with Increased Citation Rate.
    Piwowar HA, Day RS, Fridsma DB (2007) PLoS ONE 2(3): e308. doi:10.1371/journal.pone.0000308
    Every 10 datasets collected contributes to at least 4 papers in the following 3 years.
    Piwowar HA, Vision TJ, Whitlock MC (2011) Data archiving is a good investment. Nature 473(7347): 285. doi:10.1038/473285a
  • 32. Our first DOI:
    To maximize its utility to the research community and aid those fighting the current epidemic, genomic data is released here into the public domain under a CC0 license. Until the publication of research papers on the assembly and whole-genome analysis of this isolate we would ask you to cite this dataset as:
    Li, D; Xi, F; Zhao, M; Liang, Y; Chen, W; Cao, S; Xu, R; Wang, G; Wang, J; Zhang, Z; Li, Y; Cui, Y; Chang, C; Cui, C; Luo, Y; Qin, J; Li, S; Li, J; Peng, Y; Pu, F; Sun, Y; Chen, Y; Zong, Y; Ma, X; Yang, X; Cen, Z; Zhao, X; Chen, F; Yin, X; Song, Y; Rohde, H; Li, Y; Wang, J; Wang, J and the Escherichia coli O104:H4 TY-2482 isolate genome sequencing consortium (2011) Genomic data from Escherichia coli O104:H4 isolate TY-2482. BGI Shenzhen. doi:10.5524/100001 http://dx.doi.org/10.5524/100001
    To the extent possible under law, BGI Shenzhen has waived all copyright and related or neighboring rights to Genomic Data from the 2011 E. coli outbreak. This work is published from: China.
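
The citation above gives the dataset both as a `doi:` string and as a resolver URL. As a minimal sketch of how one maps to the other, the hypothetical helpers below (`parse_doi` and `resolver_url` are illustrative names, not part of any GigaScience or DOI tooling) split a DOI into prefix and suffix and prepend the `http://dx.doi.org/` base used on the slide. The regular expression is a simplifying assumption and does not cover every valid DOI.

```python
import re

# ASSUMPTION: a simplified DOI shape, "10.<registrant digits>/<suffix>",
# optionally preceded by "doi:". Real DOIs can be more varied.
DOI_PATTERN = re.compile(r"^(?:doi:)?\s*(10\.\d{4,9})/(\S+)$", re.IGNORECASE)


def parse_doi(text):
    """Split a DOI string into (prefix, suffix); raise ValueError if malformed."""
    match = DOI_PATTERN.match(text.strip())
    if match is None:
        raise ValueError(f"not a recognizable DOI: {text!r}")
    return match.group(1), match.group(2)


def resolver_url(text, base="http://dx.doi.org/"):
    """Build a resolver URL for a DOI, using the base shown in the citation."""
    prefix, suffix = parse_doi(text)
    return f"{base}{prefix}/{suffix}"


print(resolver_url("doi:10.5524/100001"))
```

The same helper accepts a bare DOI such as `10.5524/100004`, the other dataset identifier mentioned in this talk.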
  • 35. “The way that the genetic data of the 2011 E. coli strain were disseminated globally suggests a more effective approach for tackling public health problems. Both groups put their sequencing data on the Internet, so scientists the world over could immediately begin their own analysis of the bug's makeup. BGI scientists also are using Twitter to communicate their latest findings.”
    “German scientists and their colleagues at the Beijing Genomics Institute in China have been working on uncovering secrets of the outbreak. BGI scientists revised their draft genetic sequence of the E. coli strain and have been sharing their data with dozens of scientists around the world as a way to "crowdsource" this data. By publishing their data publicly and freely, these other scientists can have a look at the genetic structure, and try to sort it out for themselves.”
  • 36. G10K Genomes Get DOI®s
    doi:10.5524/100004
  • 37. We want your data!
    scott.edmunds@genomics.org.cn
    editorial@gigasciencejournal.com
    @gigascience
    www.gigasciencejournal.com