Peter Li: GigaDB and Galaxy - revolutionizing data dissemination, organization and analysis
Peter Li's talk on GigaDB and Galaxy at BGI's 3rd Bioinformatics Software and Data Release Conference at #ICG7 in Hong Kong, 28th November 2012

Published in: Technology
Notes
  • Mini-pig genome published this month
  • DOIs: Provide example of a GigaScience paper. Mention DOI for the paper itself. Highlight data set generated and its DOI.
  • And now that you all want to submit to GigaDB, how do you do that and how will people search and find your data and, other than citing your DOI, what will they be able to do with the data? We have redesigned the underlying Giga database and we’re working on the front end, which we hope to be public early next month, so the following slides are a mix of screenshots from the development site overlaid with tweaks made in PowerPoint to illustrate features you can hope to see when we go live. These include: a home page image slider for browsing datasets; a text box search which I will demonstrate shortly.
  • ***NEEDS REWORKING!!!!*** This is an example landing page for DOI 10.5524/100015 for the YH genome dataset. These pages are still in development but you can see the date released, title and abstract and how the dataset should be cited. Additional information includes links to manuscripts and data accessions at EBI, NCBI or DDBJ. There is then information on the samples and files.
  • A GigaDB dataset citation is also included in the YH Transcriptome paper published in Nature Biotechnology in February this year. As you can see, the dataset was published in 2011 but this did not prevent subsequent publication of the analysis paper.
  • Over 20,000 users on the main server. Over 500 papers citing the use of Galaxy. Over 55 servers deployed on the Web.
  • Allows scientists who may not have programming skills to compose data analysis pipelines.
  • DOIs can now be tracked in the new Thomson Reuters Data Citation Index, which gives a form of credit and makes the data more discoverable (Scott)
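
The notes above describe linking a paper and its data set by citing both DOIs. A minimal sketch of that idea follows, assuming the public https://doi.org/ resolver is used to turn each DOI into a link. The function names, the pattern used to sanity-check a DOI, and the paper DOI in the usage note are illustrative assumptions; only the data set DOI 10.5524/100015 comes from the notes.

```python
import re
from urllib.parse import quote

# Rough shape of a DOI: "10.<registrant>/<suffix>".
# This pattern is a common heuristic (an assumption), not an official rule.
DOI_PATTERN = re.compile(r"^10\.\d{4,9}/\S+$")


def doi_to_url(doi: str) -> str:
    """Return a resolver URL for a DOI, or raise ValueError if it looks malformed."""
    doi = doi.strip()
    if not DOI_PATTERN.match(doi):
        raise ValueError(f"not a DOI: {doi!r}")
    # Percent-encode unusual characters but keep the prefix/suffix slash.
    return "https://doi.org/" + quote(doi, safe="/")


def link_paper_and_data(paper_doi: str, dataset_doi: str) -> dict:
    """Pair a paper DOI with the DOI of the data set it cites (hypothetical helper)."""
    return {"paper": doi_to_url(paper_doi), "dataset": doi_to_url(dataset_doi)}
```

For example, `doi_to_url("10.5524/100015")` builds the link for the YH genome data set mentioned above, and `link_paper_and_data("10.1234/example-paper", "10.5524/100015")` pairs it with a placeholder paper DOI that is made up for illustration.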
  • Transcript of "Peter Li: GigaDB and Galaxy - revolutionizing data dissemination, organization and analysis"

    1. GigaDB and Galaxy: revolutionizing data dissemination, organization and analysis. Peter Li, GigaScience, peter@gigasciencejournal.com
    2. Journal and database for large-scale data in conjunction with Editor-in-Chief: Laurie Goodman; Editor: Scott Edmunds; Commissioning Editor: Nicole Nogoy; Lead Curator: Tam Sneddon; Data Platform: Peter Li. www.gigasciencejournal.com
    3. Why another *omics journal? Already many journals publishing research involving large data sets. Results reproducibility.
    4. Unrepeatability of scientific results. Out of 18 microarray papers, results from 10 could not be reproduced. Ioannidis et al., 2009. Repeatability of published microarray gene expression analyses. Nature Genetics 41: 149-155.
    5. How are we supporting data reproducibility? Data sets; GigaScience paper; Analyses. Community tools for data reproduction and reuse.
    6. Linking of papers and data by citation of DOIs. Data set DOI. Paper DOI.
    7. http://gigadb.org
    8. GigaDB is a new database integrated with the GigaScience journal to meet the needs of a new generation of biological and biomedical research as it enters the era of “big-data”… (see more)
    9. Faster download speeds: Aspera data transfer
    10. BGI Datasets Get DOI®s (39 data sets). Legend: Released pre-publication; Paper published in GigaScience.
        - Invertebrate: Ant (Florida carpenter ant, Jerdon’s jumping ant, Leaf-cutter ant), Roundworm, Schistosoma, Silkworm, Parasitic nematode, Pacific oyster
        - Human: Asian individual (YH) (DNA Methylome, Genome Assembly, Transcriptome), Cancer (14TB), Single cell bladder cancer, HBV infected exomes, Ancient DNA (Saqqaq Eskimo, Aboriginal Australian)
        - Vertebrates: Darwin’s Finch, Giant panda, Macaque (Chinese rhesus, Crab-eating), Mini-Pig, Naked mole rat, Parrot (Puerto Rican), Penguin (Emperor penguin, Adelie penguin), Pigeon (domestic), Polar bear, Sheep, Tibetan antelope
        - Microbe: E. coli O104:H4 TY-2482, T2D gut metagenome
        - Cell-Lines: Chinese Hamster Ovary, Mouse methylomes
        - Plants: Chinese cabbage, Cucumber, Foxtail millet, Pigeonpea, Potato, Sorghum
    11. Currently: 39 public datasets. *10 citations in references*. Humans: Ancient DNA (Aboriginal Australian, Saqqaq Eskimo), Asian individual (YH)
    12. What about the analyses? Data sets; GigaScience paper; Analyses. How will we make analyses available for downloading and execution?
    13. Bioinformatics data analyses as workflows. Example workflow: Investigate the evolutionary relationships between proteins. Query sequence; Protein sequences; Multiple alignment.
    14. Implement GigaScience workflows in a community-accepted format. Open source. Over 20,000 main Galaxy server users. Over 500 papers citing Galaxy use. Over 55 Galaxy servers deployed. http://galaxyproject.org
    15. Tool list. Tool parameterisation. Results panel.
    16. Pilot project - Integrate BGI SOAP package into Galaxy. Enable SOAP tools to be used from within Galaxy workflows.
    17. Integrate BGI SOAP package into Galaxy. Data analysis pipelines. A Python wrapper for each tool: SOAP1, SOAP2, SOAPdenovo1, SOAPdenovo2, SOAPsnp, SOAPsplice.
    18. GitHub open code repository: https://github.com/gigascience
    19. Tool list. Tool parameterisation. Results panel.
    20. SOAPdenovo2 Galaxy workflow
    21. http://www.myexperiment.org
    22. Why publish in GigaScience? Benefit: Added value
        - Data hosted in GigaDB: No need to use own servers
        - Allocation of DOIs to data: Citable data
        - Metadata in isa-tab format: Aids reuse of data
        - Galaxy tool integration: Supports reuse of tools
        - Use of tools in Galaxy workflows: Improves documentation; shows how tool can be used with other bioinf. software
    23. Thanks to: Tin-Lap Lee and Huayan Gao - CUHK; Tam, Jesse, Scott, Nicole & Laurie - GigaScience. peter@gigasciencejournal.com
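
Slides 16 and 17 describe giving each SOAP tool a Python wrapper so that it can be called from Galaxy workflows. The sketch below shows only the general shape such a wrapper might take: run a command-line program, fail loudly if it exits with an error, and write its output to the file the workflow expects. The function name `run_tool` is hypothetical, and because the slides do not give any SOAP command-line options, the usage note uses the Python interpreter as a stand-in command. The actual wrappers in the GigaScience GitHub repository may be organised quite differently.

```python
import subprocess
import sys


def run_tool(command, output_path):
    """Run a command-line tool and save its standard output to output_path.

    `command` is a list such as ["tool", "arg1"]; nothing here is specific
    to any SOAP program (hypothetical, generic wrapper).
    """
    result = subprocess.run(command, capture_output=True, text=True)
    if result.returncode != 0:
        # Pass the tool's own error message on so the caller can report it.
        raise RuntimeError(
            f"tool failed ({result.returncode}): {result.stderr.strip()}"
        )
    with open(output_path, "w") as handle:
        handle.write(result.stdout)
    return output_path
```

A call such as `run_tool([sys.executable, "-c", "print('ok')"], "out.txt")` exercises the wrapper with a stand-in command; a real wrapper would replace that list with the tool's documented command line.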