Tin-Lap Lee: Next-Gen Sequencing Analysis by GigaGalaxy


Tin-Lap Lee's presentation at Bio-IT World Asia on "Next-Gen Sequencing Analysis by GigaGalaxy", 30th May 2013

Published in: Technology, Education

  • Galaxy is a web-based data analysis platform developed by PSU. It is accessible, reproducible, and transparent. It is easy to use: there is no command line, and the learning curve for biologists is much shorter.
  • The first section of this talk covers the implementation of a public instance using the Galaxy Tool Shed. We are currently adding the first public SOAP instance to the platform.
  • The SOAP package provides a set of tools for processing NGS data. There are different versions of SOAP for mapping short reads to reference sequences. There are also tools such as SOAPdenovo, which constructs a new genome sequence, and SOAPsnp, which assembles a consensus sequence and identifies the SNPs present on it relative to a reference. Documentation in the BGI SOAP package is limited in scope, which makes the tools difficult to use. We will be working with the BGI developers to provide test data and Galaxy pipelines that demonstrate the use of SOAP.
  • Tin-Lap Lee: Next-Gen Sequencing Analysis by GigaGalaxy

    1. Next-Gen Sequencing Analysis by GigaGalaxy. Tin-Lap LEE, School of Biomedical Sciences, CUHK-BGI Innovation Institute of Trans-omics, The Chinese University of Hong Kong
    2. CUHK-BGI Innovation Institute of Trans-Omics (CBIIT)
       • Jointly established between The Chinese University of Hong Kong (CUHK) and BGI in July 2011.
       • “We aim to provide a platform conducive to training of multi-disciplinary talents conversant with the knowledge and application of genomics, proteomics, genetics, computational biology and bioinformatics, by capitalizing on both institutions’ expertise and strengths in genomic science.”
    3. Galaxy: http://galaxyproject.org/
    4. www.gigasciencejournal.com: Journal, data-platform and database for large-scale data. Editor-in-Chief: Laurie Goodman; Executive Editor: Scott Edmunds; Commissioning Editor: Nicole Nogoy; Lead Curator: Chris Hunter; Data Platform: Peter Li; in conjunction with
    5. GigaDB
    6. Giga-Galaxy
       • Collaboration between GigaScience and CBIIT
       • A publicly accessible Galaxy server
       • Shares some of the workload of the main Galaxy server
       • Hosts data and workflows published in GigaScience, particularly those involving NGS data analysis
       • SOAP package: advantages from GigaGalaxy
       • Application instance: SOAPdenovo2 tool
    7. Galaxy/CUHK-BGI: http://www.cuhk.edu.hk/cbiit/galaxy.html
    8. Import data from GigaDB to GigaGalaxy
    9. GigaSolution: deconstructing the paper
       • Sites: www.gigadb.org, www.gigasciencejournal.com, galaxy.cbiit.cuhk.edu.hk
       • Combines and integrates: Open-access journal, Data Publishing Platform, Data Analysis Platform
    10. Example: Data (doi:10.5524/100038) + Methods (doi:10.5524/100044) = Analysis (doi:10.1186/2047-217X-1-18)
       • Data: Wang J et al. (2012): Updated genome assembly of YH: the first diploid genome sequence of a Han Chinese individual (version 2, 07/2012). GigaScience Database. http://dx.doi.org/10.5524/100038
       • Methods: Luo R et al. (2012): Software and supporting material for “SOAPdenovo2: An empirically improved memory-efficient short read de novo assembly”. GigaScience Database. http://dx.doi.org/10.5524/100044
       • Analysis: Luo R et al. (2012): SOAPdenovo2: an empirically improved memory-efficient short-read de novo assembler. GigaScience, 1:18 (28th December 2012). http://dx.doi.org/10.1186/2047-217X-1-18
    11. CBIIT GigaGalaxy Structure: Tool Development; Publishing; Biomedical and bioinformatics research
    12. What is SOAP?
       • SOAP: a tool package from BGI that provides a full solution for NGS data analysis. http://soap.genomics.org.cn/
    13. SOAPdenovo2 tools
       • An assembly tool for short reads generated by NGS technology
       • Four modules:
         Pregraph: construct the de Bruijn graph
         Contig: identify contigs from overlapping sequence reads
         Map: map reads onto contigs
         Scaff: generate final assembly results
       • Generates 1. Contig and 2. Scaffold files
    14. SOAPdenovo2 in GigaGalaxy
    15. Integrate BGI SOAP tools into Giga-Galaxy
    16. Assembly Supporting Tools
       • SOAPfilter: removes reads with artifacts
       • KmerFreq HA: a k-mer frequency counter
       • Corrector HA: corrects sequencing errors in short reads
       • GapCloser: closes gaps in scaffolds
    17. Put them together: Sequencing Data → SOAPfilter → KmerFreq HA → Corrector HA → SOAPdenovo2 → GAGE evaluation
    18. SOAPdenovo2 Workflow
    19. S. aureus Dataset
    20. GAGE
    21. Visualization Tool: CONTIGuator2
    22. CONTIGuator2 output
    23. Visualization: NC_010079.pdf, gi_161510924_ref_NC_010063.1_.pdf
    24. Help Center: Shared Data
       • Several datasets are available from the shared data menu for test-running the tools.
       • Data Libraries
       • Published Workflows
       • Published Pages
    25. What is in the shared data menu?
    26. SOAPdenovo2 tutorial
    27. How is GigaScience supporting data reproducibility?
       • Data sets; Analyses
       • Open-Paper; Open-Review
       • DOI:10.1186/2047-217X-1-18 (~10000 accesses)
       • Open-Code
       • 8 reviewers tested data on the FTP server; named reports published
       • DOI:10.5524/100044
       • Open-Pipelines; Open-Workflows
       • DOI:10.5524/100038
       • Open-Data: 78GB CC0 data
       • Code on SourceForge under GPLv3: http://soapdenovo2.sourceforge.net/ (~5000 downloads)
       • Enabled the code to be picked apart by bloggers in a wiki: http://homolog.us/wiki/index.php?title=SOAPdenovo2
    28. SOAPdenovo2 workflows implemented in galaxy.cbiit.cuhk.edu.hk
       • Implemented the entire workflow on the GigaGalaxy server, including:
         3 pre-processing steps
         4 SOAPdenovo modules
         1 post-processing step
         Evaluation and visualization tools
       • Will be available to >25K Galaxy users in the Galaxy Tool Shed
    29. Acknowledgements
       • CUHK
       • Huayuan Gao
       • BGI-HK and GigaScience
       • Peter Li
       • Scott Edmunds
       • Galaxy team members
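
Slides 13, 16 and 20 name three ideas that a few lines of code can make concrete: counting k-mer frequencies, building a de Bruijn graph from reads, and summarising assembly contiguity. The Python below is a minimal sketch of those ideas only. It is not how KmerFreq HA, SOAPdenovo2 or GAGE are implemented, the function names `kmer_counts`, `de_bruijn_graph` and `n50` are hypothetical, and the toy reads are invented for illustration. N50 is used here as one common contiguity statistic for assemblies.

```python
from collections import Counter, defaultdict


def kmer_counts(reads, k):
    """Count how often each k-mer (substring of length k) occurs in the reads."""
    counts = Counter()
    for read in reads:
        for i in range(len(read) - k + 1):
            counts[read[i:i + k]] += 1
    return counts


def de_bruijn_graph(reads, k):
    """Build a de Bruijn graph: nodes are (k-1)-mers, and each distinct
    k-mer contributes one edge from its prefix to its suffix."""
    graph = defaultdict(set)
    for kmer in kmer_counts(reads, k):
        graph[kmer[:-1]].add(kmer[1:])
    return graph


def n50(lengths):
    """Return the largest length L such that sequences of length >= L
    together cover at least half of the total assembled length."""
    total = sum(lengths)
    running = 0
    for length in sorted(lengths, reverse=True):
        running += length
        if running * 2 >= total:
            return length
    return 0  # no sequences given


# Toy overlapping reads, invented for illustration (not from the S. aureus data).
reads = ["ACGTAC", "CGTACG", "GTACGT"]
counts = kmer_counts(reads, 3)
graph = de_bruijn_graph(reads, 3)
contig_n50 = n50([80, 70, 50, 40, 30, 20])
```

In a real assembler, low-frequency k-mers are typically treated as likely sequencing errors before the graph is built, which is why a counting and correction step sits ahead of the assembly step in the pipeline on slide 17.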