Next-Gen Sequencing Analysis
by GigaGalaxy
Tin-Lap, LEE
School of Biomedical Sciences
CUHK-BGI Innovation Institute of Trans-omics,
The Chinese University of Hong Kong
CUHK-BGI Innovation Institute of Trans-Omics (CBIIT)
• Jointly established between The Chinese
University of Hong Kong (CUHK) and BGI
in July 2011.
• “We aim to provide a platform conducive
to training of multi-disciplinary talents
conversant with the knowledge and
application of genomics, proteomics,
genetics, computational biology and
bioinformatics, by capitalizing on both
institutions’ expertise and strengths in
genomic science.”
Galaxy
http://galaxyproject.org/
www.gigasciencejournal.com
Journal, data-platform and
database for large-scale data
Editor-in-Chief: Laurie Goodman
Executive Editor: Scott Edmunds
Commissioning Editor: Nicole Nogoy
Lead Curator: Chris Hunter
Data Platform: Peter Li
in conjunction with
GigaDB
Giga-Galaxy
• Collaboration between GigaScience and CBIIT
• A publicly accessible Galaxy server
• Shares some of the workload of the main Galaxy server
• Hosts data and workflows published in GigaScience, particularly those involving
NGS data analysis
• SOAP package: advantages from GigaGalaxy
• Application instance: SOAPdenovo2 tool
http://www.cuhk.edu.hk/cbiit/galaxy.html
Galaxy/CUHK-BGI
Import data from GigaDB to GigaGalaxy
GigaSolution: deconstructing the paper
www.gigadb.org
www.gigasciencejournal.com
galaxy.cbiit.cuhk.edu.hk
Combines and integrates:
Open-access journal
Data Publishing Platform
Data Analysis Platform
Data (doi:10.5524/100038) + Methods (doi:10.5524/100044) = Analysis (doi:10.1186/2047-217X-1-18)
Wang J et al., (2012): Updated genome assembly of YH: the first diploid genome sequence of a
Han Chinese individual (version 2, 07/2012). GigaScience Database.
http://dx.doi.org/10.5524/100038
Luo R et al., (2012): Software and supporting material for “SOAPdenovo2: An empirically improved
memory-efficient short read de novo assembly”. GigaScience Database.
http://dx.doi.org/10.5524/100044
Data
Methods
Luo R et al., (2012): SOAPdenovo2: an empirically improved memory-efficient short-read de novo
assembler GigaScience, 1:18 (28th December 2012) http://dx.doi.org/10.1186/2047-217X-1-18
Analysis
Example
CBIIT GigaGalaxy Structure
Tool Development
Publishing
Biomedical and bioinformatics research
What is SOAP?
• SOAP: a tool package from BGI that provides a full solution for NGS data analysis.
http://soap.genomics.org.cn/
SOAPdenovo2 tools
• An assembly tool for short reads generated by NGS
technology
• Four modules
  • Pregraph: constructs the de Bruijn graph
  • Contig: identifies contigs from overlapping sequence reads
  • Map: maps reads onto contigs
  • Scaff: generates the final assembly results
• Generates 1. contig and 2. scaffold files
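
To make the Pregraph step more concrete, here is a minimal Python sketch of building a de Bruijn graph from short reads. It is not SOAPdenovo2's implementation; the function name `build_de_bruijn`, the toy reads, and the value of `k` are assumptions chosen only for illustration. Each k-mer in a read becomes an edge from its first (k-1) bases to its last (k-1) bases.

```python
from collections import defaultdict


def build_de_bruijn(reads, k):
    """Return a dict mapping each (k-1)-mer to the set of (k-1)-mers that follow it.

    Illustrative only: a real assembler also tracks edge multiplicity,
    reverse complements, and memory-efficient encodings.
    """
    graph = defaultdict(set)
    for read in reads:
        # Slide a window of length k across the read.
        for i in range(len(read) - k + 1):
            kmer = read[i:i + k]
            # The k-mer links its prefix node to its suffix node.
            graph[kmer[:-1]].add(kmer[1:])
    return dict(graph)


if __name__ == "__main__":
    toy_reads = ["ACGTC", "CGTCA"]  # hypothetical overlapping reads
    for node, successors in sorted(build_de_bruijn(toy_reads, k=3).items()):
        print(node, "->", sorted(successors))
```

Walking the resulting path AC, CG, GT, TC, CA spells out ACGTCA, which is how overlapping reads merge into a longer contig in this simplified picture.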
SOAPdenovo2 in GigaGalaxy
Integrate BGI SOAP tools into Giga-Galaxy
Assembly Supporting Tools
• SOAPfilter: removes reads with artifacts
• KmerFreq HA: a k-mer frequency counter
• Corrector HA: corrects sequencing errors in short reads
• GapCloser: closes gaps in scaffolds
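
The idea behind k-mer frequency counting can be sketched in a few lines of Python. This is not the KmerFreq HA or Corrector HA code, whose internals the slides do not describe; the helper names `count_kmers` and `rare_kmers`, the toy reads, and the threshold are assumptions. The sketch relies on the general observation that k-mers seen very few times across a deeply sequenced sample are often the product of sequencing errors.

```python
from collections import Counter


def count_kmers(reads, k):
    """Count every k-mer occurrence across all reads."""
    counts = Counter()
    for read in reads:
        for i in range(len(read) - k + 1):
            counts[read[i:i + k]] += 1
    return counts


def rare_kmers(counts, min_count):
    """Return k-mers seen fewer than min_count times, sorted alphabetically."""
    return sorted(kmer for kmer, n in counts.items() if n < min_count)


if __name__ == "__main__":
    # Hypothetical reads: the third differs from the others by one base.
    toy_reads = ["ACGTA", "ACGTA", "ACGAA"]
    counts = count_kmers(toy_reads, k=3)
    print(counts)
    print("Possible error k-mers:", rare_kmers(counts, min_count=2))
```

An error corrector can use such a frequency table to decide which bases in a read to change so that its low-frequency k-mers become high-frequency ones.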
Put them together
Sequencing Data → SOAPfilter → KmerFreq HA → Corrector HA → SOAPdenovo2 → GAGE evaluation
SOAPdenovo2 Workflow
S. aureus Dataset
GAGE
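
GAGE-style assembly evaluations report contiguity statistics such as N50. As a rough illustration of what that number means, the Python sketch below computes N50 from a list of contig lengths. The function name `n50` and the example lengths are assumptions, and this is not the GAGE evaluation script itself, which also checks the assembly against a reference.

```python
def n50(lengths):
    """Return the N50 of a collection of contig lengths.

    N50 is the length L such that contigs of length >= L together
    cover at least half of the total assembled bases.
    Returns 0 for an empty input.
    """
    ordered = sorted(lengths, reverse=True)
    half_total = sum(ordered) / 2
    running = 0
    for length in ordered:
        running += length
        if running >= half_total:
            return length
    return 0


if __name__ == "__main__":
    contig_lengths = [80, 70, 50, 40, 30, 20]  # hypothetical contig sizes
    print("N50:", n50(contig_lengths))
```

A higher N50 means the assembly is less fragmented, but it says nothing about correctness, which is why reference-based evaluation is run alongside it.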
Visualization Tool: CONTIGuator2
CONTIGuator2 output
Visualization
NC_010079.pdf
gi_161510924_ref_NC_010063.1_.pdf
Help Center: Shared Data
• Several datasets are available from the Shared Data menu
for test-running the tools.
• Data Libraries
• Published Workflows
• Published Pages
What is in the shared data menu?
SOAPdenovo2 tutorial
How is GigaScience supporting data
reproducibility?
Data sets
Analyses
Open-Paper
Open-Review
DOI:10.1186/2047-217X-1-18
~10000 accesses
Open-Code
8 reviewers tested the data on the FTP server; named reports published
DOI:10.5524/100044
Open-Pipelines
Open-Workflows
DOI:10.5524/100038
Open-Data
78GB CC0 data
Code on SourceForge under GPLv3: http://soapdenovo2.sourceforge.net/
~5000 downloads
Enabled the code to be picked apart by bloggers in a wiki
http://homolog.us/wiki/index.php?title=SOAPdenovo2
SOAPdenovo2 workflows implemented in
galaxy.cbiit.cuhk.edu.hk
Implemented entire workflow in GigaGalaxy server, inc.:
• 3 pre-processing steps
• 4 SOAPdenovo modules
• 1 post-processing step
• Evaluation and visualization tools
Will be available to >25K Galaxy users in the Galaxy Tool Shed
Acknowledgements
• CUHK
• Huayuan Gao
• BGI-HK and GigaScience
• Peter Li
• Scott Edmunds
• Galaxy team members

Editor's Notes

  1. Galaxy is a web-based data analysis platform developed by PSU. It is accessible, reproducible, and transparent. It is easy to use, requires no command line, and has a much shorter learning curve for biologists.
  2. The first section of this talk is about the implementation of a public instance using the Galaxy Tool Shed. We are currently implementing the first public SOAP instance on the platform.
  3. The SOAP package provides a set of tools for processing NGS data. There are different versions of SOAP for mapping short reads to reference sequences. There are also tools like SOAPdenovo, for constructing a new genome sequence, and SOAPsnp, which can assemble a consensus sequence and identify the SNPs present on it relative to a reference. Documentation in the BGI SOAP package is limited in scope, making the tools difficult to use. We will be working with the BGI developers to provide test data and Galaxy pipelines demonstrating the use of SOAP.