Tin-Lap Lee: Next-Gen Sequencing Analysis by GigaGalaxy

Tin-Lap Lee's presentation at Bio-IT World Asia on "Next-Gen Sequencing Analysis by GigaGalaxy", 30th May 2013


Published in Technology, Education
  • Galaxy is a web-based data analysis platform developed by PSU. It is accessible, reproducible, and transparent. It is easy to use: there is no command line, and the learning curve for biologists is much shorter.
  • The first section of this talk is about implementing a public instance using the Galaxy Tool Shed. We are currently implementing the first public SOAP instance on the platform.
  • The SOAP package provides a set of tools for processing NGS data. There are different versions of SOAP for mapping short reads to reference sequences. There are also tools such as SOAPdenovo, for constructing a new genome sequence, and SOAPsnp, which can assemble a consensus sequence and identify the SNPs present in it relative to a reference. Documentation in the BGI SOAP package is limited in scope, which makes the tools difficult to use. We will be working with the BGI developers to provide test data and Galaxy pipelines demonstrating the use of SOAP.


  • 1. Next-Gen Sequencing Analysis by GigaGalaxy • Tin-Lap Lee • School of Biomedical Sciences • CUHK-BGI Innovation Institute of Trans-omics, The Chinese University of Hong Kong
  • 2. CUHK-BGI Innovation Institute of Trans-Omics (CBIIT) • Jointly established between The Chinese University of Hong Kong (CUHK) and BGI in July 2011. • “We aim to provide a platform conducive to training of multi-disciplinary talents conversant with the knowledge and application of genomics, proteomics, genetics, computational biology and bioinformatics, by capitalizing on both institutions’ expertise and strengths in genomic science.”
  • 3. Galaxy
  • 4. www.gigasciencejournal.com • Journal, data-platform and database for large-scale data • Editor-in-Chief: Laurie Goodman • Executive Editor: Scott Edmunds • Commissioning Editor: Nicole Nogoy • Lead Curator: Chris Hunter • Data Platform: Peter Li • in conjunction with
  • 5. GigaDB
  • 6. Giga-Galaxy • Collaboration between GigaScience and CBIIT • A publicly accessible Galaxy server • Shares some of the workload of the main Galaxy server • Hosts data and workflows published in GigaScience, particularly those involving NGS data analysis • SOAP package: advantages from GigaGalaxy • Application instance: SOAPdenovo2 tool
  • 7.
  • 8. Import data from GigaDB to GigaGalaxy
  • 9. GigaSolution: deconstructing the and integrates: • Open-access journal • Data Publishing Platform • Data Analysis Platform
  • 10. doi:10.1186/2047-217X-1-18 • doi:10.5524/100038 • Analysis • Data • Methods • doi:10.5524/100044 • + = • Wang J et al. (2012): Updated genome assembly of YH: the first diploid genome sequence of a Han Chinese individual (version 2, 07/2012). GigaScience Database. • R et al. (2012): Software and supporting material for “SOAPdenovo2: An empirically improved memory-efficient short read de novo assembly”. GigaScience Database. • R et al. (2012): SOAPdenovo2: an empirically improved memory-efficient short-read de novo assembler. GigaScience, 1:18 (28th December 2012)
  • 11. CBIIT GigaGalaxy Structure • Tool Development • Publishing • Biomedical and bioinformatics research
  • 12. What is SOAP? • SOAP: a tool package from BGI that provides a full solution for NGS data analysis.
  • 13. SOAPdenovo2 tools • An assembly tool for short reads generated by NGS technology • Four modules • Pregraph: constructs the de Bruijn graph • Contig: identifies contigs from overlapping sequence reads • Map: maps reads onto contigs • Scaff: generates the final assembly results • Generates 1. Contig and 2. Scaffold files
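To make the Pregraph step on slide 13 more concrete, here is a minimal sketch of de Bruijn graph construction from short reads. It is a toy illustration only, not SOAPdenovo2's implementation: the function name `build_de_bruijn`, the reads, and the choice of `k=3` are assumptions made for this example.

```python
from collections import defaultdict


def build_de_bruijn(reads, k):
    """Return a dict mapping each (k-1)-mer to the (k-1)-mers that follow it.

    Every k-mer found in a read contributes one edge from its prefix
    (first k-1 bases) to its suffix (last k-1 bases). Repeated edges are
    kept, so the list length reflects how often an edge was observed.
    """
    graph = defaultdict(list)
    for read in reads:
        for i in range(len(read) - k + 1):
            kmer = read[i:i + k]
            graph[kmer[:-1]].append(kmer[1:])
    return dict(graph)


# Hypothetical toy reads that overlap by four bases.
reads = ["ACGTC", "CGTCA"]
graph = build_de_bruijn(reads, k=3)

for node, successors in sorted(graph.items()):
    print(node, "->", successors)
```

In this toy graph the only path runs AC, CG, GT, TC, CA, which spells ACGTCA, the sequence the two reads jointly cover. A real assembler additionally handles reverse complements, sequencing errors, and repeats, none of which is modeled here.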
  • 14. SOAPdenovo2 in GigaGalaxy
  • 15. Integrate BGI SOAP tools into Giga-Galaxy
  • 16. Assembly Supporting Tools • SOAPfilter: removes reads with artifacts • Kmerfreq HA: a k-mer frequency counter • Corrector HA: corrects sequencing errors in short reads • Gapcloser: closes gaps in scaffolds
  • 17. Put them together: Sequencing Data → SOAPfilter → KmerFreq HA → Corrector HA → SOAPdenovo2 → GAGE evaluation
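As an illustration of what a k-mer frequency counter such as the one on slide 16 computes, the sketch below counts k-mers across a handful of reads and lists the rare ones. The slides do not describe how Kmerfreq HA or Corrector HA work internally, so treating low-count k-mers as likely sequencing errors is a general assumption here, and the function names, reads, and `min_count` threshold are hypothetical.

```python
from collections import Counter


def count_kmers(reads, k):
    """Count every k-mer (substring of length k) across all reads."""
    counts = Counter()
    for read in reads:
        for i in range(len(read) - k + 1):
            counts[read[i:i + k]] += 1
    return counts


def low_frequency_kmers(counts, min_count):
    """Return, in sorted order, the k-mers seen fewer than min_count times."""
    return sorted(kmer for kmer, n in counts.items() if n < min_count)


# Hypothetical reads: the third differs from the others by one base.
reads = ["ACGTAC", "ACGTAC", "ACGTTC"]
counts = count_kmers(reads, k=4)
suspect = low_frequency_kmers(counts, min_count=2)

print(counts.most_common())
print("rare k-mers:", suspect)
```

Only the k-mers that span the differing base in the third read fall below the threshold, which is the kind of signal an error-correction step could act on.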
  • 18. Soapdenovo2 Workflow
  • 19. S. aureus Dataset
  • 20. GAGE
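Assembly evaluations such as GAGE commonly report contiguity statistics, N50 among them. The slides do not say which metrics were shown, so the sketch below is only an assumed example of one such statistic; the function name `n50` and the contig lengths are made up for illustration.

```python
def n50(lengths):
    """Return the N50 of a collection of contig or scaffold lengths.

    N50 is the length L such that sequences of length >= L together
    cover at least half of the total assembled bases.
    """
    ordered = sorted(lengths, reverse=True)
    half = sum(ordered) / 2
    running = 0
    for length in ordered:
        running += length
        if running >= half:
            return length
    raise ValueError("n50() requires at least one length")


# Hypothetical contig lengths in base pairs (total 300).
contig_lengths = [100, 80, 50, 30, 20, 10, 10]
print(n50(contig_lengths))
```

For these lengths the two longest contigs already cover 180 of 300 bases, so the N50 is the length of the second one.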
  • 21. Visualization Tool: CONTIGuator2
  • 22. CONTIGuator2 output
  • 23. Visualization • NC_010079.pdf • gi_161510924_ref_NC_010063.1_.pdf
  • 24. Help Center: Shared Data • Several datasets are available from the shared data menu for test-running the tools. • Data Libraries • Published Workflows • Published Pages
  • 25. What is in the shared data menu?
  • 26. SOAPdenovo2 tutorial
  • 27. How is GigaScience supporting data reproducibility? • Data sets • Analyses • Open-Paper • Open-Review • DOI:10.1186/2047-217X-1-18 • ~10000 accesses • Open-Code • 8 reviewers tested data in ftp server & named reports published • DOI:10.5524/100044 • Open-Pipelines • Open-Workflows • DOI:10.5524/100038 • Open-Data • 78GB CC0 data • Code in sourceforge under GPLv3: downloads • Enabled code to be picked apart by bloggers in wiki
  • 28. SOAPdenovo2 workflows: implemented the entire workflow in the GigaGalaxy server, including: • 3 pre-processing steps • 4 SOAPdenovo modules • 1 post-processing step • Evaluation and visualization tools • Will be available to >25K Galaxy users in the Galaxy Tool Shed
  • 29. Acknowledgements • CUHK • Huayuan Gao • BGI-HK and GigaScience • Peter Li • Scott Edmunds • Galaxy team members