Next-Gen Sequencing Analysis
by GigaGalaxy
Tin-Lap, LEE
School of Biomedical Sciences
CUHK-BGI Innovation Institute of Trans-omics,
The Chinese University of Hong Kong
CUHK-BGI Innovation Institute of Trans-Omics (CBIIT)
• Jointly established between The Chinese
University of Hong Kong (CUHK) and BGI
in July 2011.
• “We aim to provide a platform conducive
to training of multi-disciplinary talents
conversant with the knowledge and
application of genomics, proteomics,
genetics, computational biology and
bioinformatics, by capitalizing on both
institutions’ expertise and strengths in
genomic science.”
Galaxy
http://galaxyproject.org/
www.gigasciencejournal.com
Journal, data-platform and
database for large-scale data
Editor-in-Chief: Laurie Goodman
Executive Editor: Scott Edmunds
Commissioning Editor: Nicole Nogoy
Lead Curator: Chris Hunter
Data Platform: Peter Li
in conjunction with
GigaDB
Giga-Galaxy
• Collaboration between GigaScience and CBIIT
• A publicly accessible Galaxy server
• Shares some of the workload of the main Galaxy server
• Hosts data and workflows published in GigaScience, particularly those involving NGS data analysis
• SOAP package: advantages from GigaGalaxy
• Application instance: SOAPdenovo2 tool
http://www.cuhk.edu.hk/cbiit/galaxy.html
Galaxy/CUHK-BGI
Import data from GigaDB to GigaGalaxy
GigaSolution: deconstructing the paper
www.gigadb.org
www.gigasciencejournal.com
galaxy.cbiit.cuhk.edu.hk
Combines and integrates:
Open-access journal
Data Publishing Platform
Data Analysis Platform
Data (doi:10.5524/100038) + Methods (doi:10.5524/100044) = Analysis (doi:10.1186/2047-217X-1-18)
Data
Wang J et al. (2012): Updated genome assembly of YH: the first diploid genome sequence of a Han Chinese individual (version 2, 07/2012). GigaScience Database. http://dx.doi.org/10.5524/100038
Methods
Luo R et al. (2012): Software and supporting material for “SOAPdenovo2: An empirically improved memory-efficient short read de novo assembly”. GigaScience Database. http://dx.doi.org/10.5524/100044
Analysis
Luo R et al. (2012): SOAPdenovo2: an empirically improved memory-efficient short-read de novo assembler. GigaScience, 1:18 (28th December 2012). http://dx.doi.org/10.1186/2047-217X-1-18
Example
CBIIT GigaGalaxy Structure
Tool development
Publishing
Biomedical and bioinformatics research
What is SOAP?
• SOAP: a tool package from BGI that provides a full solution for NGS data analysis.
http://soap.genomics.org.cn/
SOAPdenovo2 tools
• An assembly tool for short reads generated by NGS technology
• Four modules
  • Pregraph: constructs the de Bruijn graph
  • Contig: identifies contigs from overlapping sequence reads
  • Map: maps reads onto contigs
  • Scaff: generates the final assembly results
• Generates two outputs: (1) contig files and (2) scaffold files
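The Pregraph step can be illustrated with a toy de Bruijn graph builder. This is only a minimal sketch of the general technique, not SOAPdenovo2's implementation: the function name `build_de_bruijn`, the example reads, and the choice of k are all hypothetical, and real assemblers also handle reverse complements, sequencing errors, and memory-efficient graph storage.

```python
from collections import defaultdict


def build_de_bruijn(reads, k):
    """Build a toy de Bruijn graph from reads.

    Nodes are (k-1)-mers; each distinct k-mer contributes one edge
    from its prefix (k-1)-mer to its suffix (k-1)-mer.
    """
    graph = defaultdict(set)
    for read in reads:
        # Slide a window of length k along the read.
        for i in range(len(read) - k + 1):
            kmer = read[i:i + k]
            graph[kmer[:-1]].add(kmer[1:])
    return dict(graph)


# Hypothetical overlapping reads, for illustration only.
reads = ["ACGTC", "CGTCA"]
graph = build_de_bruijn(reads, k=3)
for node in sorted(graph):
    print(node, "->", sorted(graph[node]))
```

Walking unambiguous paths through such a graph is, in outline, how overlapping reads are merged into contigs.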
SOAPdenovo2 in GigaGalaxy
Integrate BGI SOAP tools into Giga-Galaxy
Assembly Supporting Tools
• SOAPfilter: removes reads with artifacts
• KmerFreq HA: a k-mer frequency counter
• Corrector HA: corrects sequencing errors in short reads
• GapCloser: closes gaps in scaffolds
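To make the idea of a k-mer frequency counter concrete, here is a minimal sketch. It is not the KmerFreq HA or Corrector HA code: the function names, the example reads, and the `min_count` threshold are assumptions for illustration. It relies on the general observation that k-mers seen very rarely across many reads are more likely to contain sequencing errors.

```python
from collections import Counter


def count_kmers(reads, k):
    """Count how often each k-mer occurs across all reads."""
    counts = Counter()
    for read in reads:
        for i in range(len(read) - k + 1):
            counts[read[i:i + k]] += 1
    return counts


def low_frequency_kmers(counts, min_count):
    """Return k-mers seen fewer than min_count times (hypothetical error candidates)."""
    return {kmer for kmer, n in counts.items() if n < min_count}


# Hypothetical reads: the third differs from the others at one base.
reads = ["ACGTAC", "ACGTAC", "ACGTTC"]
counts = count_kmers(reads, k=4)
suspect = low_frequency_kmers(counts, min_count=2)
print(sorted(suspect))
```

An error corrector can then use such a frequency table to decide which bases in a read to revise.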
Put them together
Sequencing data → SOAPfilter → KmerFreq HA → Corrector HA → SOAPdenovo2 → GAGE evaluation
SOAPdenovo2 Workflow
S. aureus Dataset
GAGE
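Assembly evaluations such as GAGE report contiguity statistics like N50 alongside correctness measures. The sketch below shows how N50 can be computed from a list of contig or scaffold lengths. It is an illustrative helper, not part of the GAGE scripts, and the function name `n50` and the example lengths are hypothetical.

```python
def n50(lengths):
    """Return the N50 of a collection of sequence lengths.

    N50 is the length L such that sequences of length >= L
    together cover at least half of the total assembly length.
    """
    total = sum(lengths)
    running = 0
    for length in sorted(lengths, reverse=True):
        running += length
        if running * 2 >= total:
            return length
    return 0  # empty input


# Hypothetical contig lengths, for illustration only.
contig_lengths = [80, 70, 50, 40, 30, 20]
print(n50(contig_lengths))
```

A higher N50 indicates a more contiguous assembly, but it says nothing about misassemblies, which is why evaluations pair it with error-aware metrics.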
Visualization Tool: CONTIGuator2
CONTIGuator2 output
Visualization
NC_010079.pdf
gi_161510924_ref_NC_010063.1_.pdf
Help Center: Shared Data
• Several datasets are available from the Shared Data menu for test-running the tools.
• Data Libraries
• Published Workflows
• Published Pages
What is in the shared data menu?
SOAPdenovo2 tutorial
How is GigaScience supporting data
reproducibility?
Data sets
Analyses
• Open-Paper: DOI:10.1186/2047-217X-1-18 (~10000 accesses)
• Open-Review: 8 reviewers tested data on the FTP server; named reports published
• Open-Code: code on SourceForge under GPLv3: http://soapdenovo2.sourceforge.net/ (~5000 downloads)
  Enabled the code to be picked apart by bloggers in a wiki: http://homolog.us/wiki/index.php?title=SOAPdenovo2
• Open-Pipelines, Open-Workflows: DOI:10.5524/100044
• Open-Data: DOI:10.5524/100038 (78GB CC0 data)
SOAPdenovo2 workflows implemented in
galaxy.cbiit.cuhk.edu.hk
Implemented the entire workflow on the GigaGalaxy server, including:
• 3 pre-processing steps
• 4 SOAPdenovo modules
• 1 post-processing step
• Evaluation and visualization tools
Will be available to >25K Galaxy users via the Galaxy Tool Shed
Acknowledgements
• CUHK
• Huayuan Gao
• BGI-HK and GigaScience
• Peter Li
• Scott Edmunds
• Galaxy team members

More Related Content

Viewers also liked

Scott Edmunds slides from #IDCC13 Data Science session
Scott Edmunds slides from #IDCC13 Data Science sessionScott Edmunds slides from #IDCC13 Data Science session
Scott Edmunds slides from #IDCC13 Data Science session
GigaScience, BGI Hong Kong
 
Puneet Laboratories Pvt. Ltd. Mumbai, Mumbai, Zinc Carnosine Capsules
 Puneet Laboratories Pvt. Ltd. Mumbai, Mumbai, Zinc Carnosine Capsules Puneet Laboratories Pvt. Ltd. Mumbai, Mumbai, Zinc Carnosine Capsules
Puneet Laboratories Pvt. Ltd. Mumbai, Mumbai, Zinc Carnosine Capsules
IndiaMART InterMESH Limited
 
Alldelite Heat Pumps Limited, Chennai, Heat Pumps
Alldelite Heat Pumps Limited, Chennai, Heat PumpsAlldelite Heat Pumps Limited, Chennai, Heat Pumps
Alldelite Heat Pumps Limited, Chennai, Heat Pumps
IndiaMART InterMESH Limited
 
Unique SPM Solutions & Engineering, Ghaziabad , Broaching Machines
Unique SPM Solutions & Engineering, Ghaziabad , Broaching Machines Unique SPM Solutions & Engineering, Ghaziabad , Broaching Machines
Unique SPM Solutions & Engineering, Ghaziabad , Broaching Machines
IndiaMART InterMESH Limited
 
Element14 India Private Limited, Bengaluru
Element14 India Private Limited, BengaluruElement14 India Private Limited, Bengaluru
Element14 India Private Limited, Bengaluru
IndiaMART InterMESH Limited
 
Techno Electronics System, Delhi, DC Motor & Transformer
Techno Electronics System, Delhi, DC Motor & TransformerTechno Electronics System, Delhi, DC Motor & Transformer
Techno Electronics System, Delhi, DC Motor & Transformer
IndiaMART InterMESH Limited
 
Wink Lifestyles Pvt. Ltd., Mumbai, Aviator Sunglasses
Wink Lifestyles Pvt. Ltd., Mumbai, Aviator SunglassesWink Lifestyles Pvt. Ltd., Mumbai, Aviator Sunglasses
Wink Lifestyles Pvt. Ltd., Mumbai, Aviator Sunglasses
IndiaMART InterMESH Limited
 
Scott Edmunds: Channeling the Deluge: Reproducibility & Data Dissemination in...
Scott Edmunds: Channeling the Deluge: Reproducibility & Data Dissemination in...Scott Edmunds: Channeling the Deluge: Reproducibility & Data Dissemination in...
Scott Edmunds: Channeling the Deluge: Reproducibility & Data Dissemination in...
GigaScience, BGI Hong Kong
 
GigaScience: data and beta-database launch. Announcing GigaDB
GigaScience: data and beta-database launch. Announcing GigaDBGigaScience: data and beta-database launch. Announcing GigaDB
GigaScience: data and beta-database launch. Announcing GigaDB
GigaScience, BGI Hong Kong
 
DNV Creations, New Delhi, Wood Packaging Solutions
DNV Creations, New Delhi, Wood Packaging SolutionsDNV Creations, New Delhi, Wood Packaging Solutions
DNV Creations, New Delhi, Wood Packaging Solutions
IndiaMART InterMESH Limited
 
Channel Co-operation - A Distant Dream?
Channel Co-operation - A Distant Dream?Channel Co-operation - A Distant Dream?
Channel Co-operation - A Distant Dream?
Richard Tubb
 

Viewers also liked (11)

Scott Edmunds slides from #IDCC13 Data Science session
Scott Edmunds slides from #IDCC13 Data Science sessionScott Edmunds slides from #IDCC13 Data Science session
Scott Edmunds slides from #IDCC13 Data Science session
 
Puneet Laboratories Pvt. Ltd. Mumbai, Mumbai, Zinc Carnosine Capsules
 Puneet Laboratories Pvt. Ltd. Mumbai, Mumbai, Zinc Carnosine Capsules Puneet Laboratories Pvt. Ltd. Mumbai, Mumbai, Zinc Carnosine Capsules
Puneet Laboratories Pvt. Ltd. Mumbai, Mumbai, Zinc Carnosine Capsules
 
Alldelite Heat Pumps Limited, Chennai, Heat Pumps
Alldelite Heat Pumps Limited, Chennai, Heat PumpsAlldelite Heat Pumps Limited, Chennai, Heat Pumps
Alldelite Heat Pumps Limited, Chennai, Heat Pumps
 
Unique SPM Solutions & Engineering, Ghaziabad , Broaching Machines
Unique SPM Solutions & Engineering, Ghaziabad , Broaching Machines Unique SPM Solutions & Engineering, Ghaziabad , Broaching Machines
Unique SPM Solutions & Engineering, Ghaziabad , Broaching Machines
 
Element14 India Private Limited, Bengaluru
Element14 India Private Limited, BengaluruElement14 India Private Limited, Bengaluru
Element14 India Private Limited, Bengaluru
 
Techno Electronics System, Delhi, DC Motor & Transformer
Techno Electronics System, Delhi, DC Motor & TransformerTechno Electronics System, Delhi, DC Motor & Transformer
Techno Electronics System, Delhi, DC Motor & Transformer
 
Wink Lifestyles Pvt. Ltd., Mumbai, Aviator Sunglasses
Wink Lifestyles Pvt. Ltd., Mumbai, Aviator SunglassesWink Lifestyles Pvt. Ltd., Mumbai, Aviator Sunglasses
Wink Lifestyles Pvt. Ltd., Mumbai, Aviator Sunglasses
 
Scott Edmunds: Channeling the Deluge: Reproducibility & Data Dissemination in...
Scott Edmunds: Channeling the Deluge: Reproducibility & Data Dissemination in...Scott Edmunds: Channeling the Deluge: Reproducibility & Data Dissemination in...
Scott Edmunds: Channeling the Deluge: Reproducibility & Data Dissemination in...
 
GigaScience: data and beta-database launch. Announcing GigaDB
GigaScience: data and beta-database launch. Announcing GigaDBGigaScience: data and beta-database launch. Announcing GigaDB
GigaScience: data and beta-database launch. Announcing GigaDB
 
DNV Creations, New Delhi, Wood Packaging Solutions
DNV Creations, New Delhi, Wood Packaging SolutionsDNV Creations, New Delhi, Wood Packaging Solutions
DNV Creations, New Delhi, Wood Packaging Solutions
 
Channel Co-operation - A Distant Dream?
Channel Co-operation - A Distant Dream?Channel Co-operation - A Distant Dream?
Channel Co-operation - A Distant Dream?
 

Similar to Tin-Lap Lee: Next-Gen Sequencing Analysis by GigaGalaxy

Global Network Advancement Group - Next Generation Network-Integrated Systems
Global Network Advancement Group - Next Generation Network-Integrated SystemsGlobal Network Advancement Group - Next Generation Network-Integrated Systems
Global Network Advancement Group - Next Generation Network-Integrated Systems
Larry Smarr
 
Global Network Advancement Group Next Generation Network-Integrated Sys...
      Global Network Advancement GroupNext Generation Network-Integrated Sys...      Global Network Advancement GroupNext Generation Network-Integrated Sys...
Global Network Advancement Group Next Generation Network-Integrated Sys...
Larry Smarr
 
Ogf27 Ligo
Ogf27 LigoOgf27 Ligo
Ogf27 Ligo
kentblackburn
 
Grid computing
Grid computingGrid computing
Grid computing
Ramraj Choudhary
 
C02-Visualization-Applying visual analytics
C02-Visualization-Applying visual analyticsC02-Visualization-Applying visual analytics
C02-Visualization-Applying visual analytics
Bioinformatics Open Source Conference
 
OpenACC and Hackathons Monthly Highlights: April 2023
OpenACC and Hackathons Monthly Highlights: April  2023OpenACC and Hackathons Monthly Highlights: April  2023
OpenACC and Hackathons Monthly Highlights: April 2023
OpenACC
 
2015 FOSS4G Track: Open Specifications for the Storage, Transport and Process...
2015 FOSS4G Track: Open Specifications for the Storage, Transport and Process...2015 FOSS4G Track: Open Specifications for the Storage, Transport and Process...
2015 FOSS4G Track: Open Specifications for the Storage, Transport and Process...
GIS in the Rockies
 
COBWEB technology platform and future development needs
COBWEB technology platform and future development needsCOBWEB technology platform and future development needs
COBWEB technology platform and future development needs
EDINA, University of Edinburgh
 
Indiana University's Advanced Science Gateway Support
Indiana University's Advanced Science Gateway SupportIndiana University's Advanced Science Gateway Support
Indiana University's Advanced Science Gateway Support
marpierc
 
EOSC-Life Workflow Collaboratory
EOSC-Life Workflow CollaboratoryEOSC-Life Workflow Collaboratory
EOSC-Life Workflow Collaboratory
Carole Goble
 
BioNLPSADI
BioNLPSADIBioNLPSADI
Security Challenges and the Pacific Research Platform
Security Challenges and the Pacific Research PlatformSecurity Challenges and the Pacific Research Platform
Security Challenges and the Pacific Research Platform
Larry Smarr
 
IDB-Cloud Providing Bioinformatics Services on Cloud
IDB-Cloud Providing Bioinformatics Services on CloudIDB-Cloud Providing Bioinformatics Services on Cloud
IDB-Cloud Providing Bioinformatics Services on Cloud
stratuslab
 
GlobusWorld 2020 Keynote
GlobusWorld 2020 KeynoteGlobusWorld 2020 Keynote
GlobusWorld 2020 Keynote
Globus
 
OGCE SC10
OGCE SC10OGCE SC10
OGCE SC10
marpierc
 
Validation of services, data and metadata
Validation of services, data and metadataValidation of services, data and metadata
Validation of services, data and metadata
Luis Bermudez
 
OpenACC and Open Hackathons Monthly Highlights June 2022.pdf
OpenACC and Open Hackathons Monthly Highlights June 2022.pdfOpenACC and Open Hackathons Monthly Highlights June 2022.pdf
OpenACC and Open Hackathons Monthly Highlights June 2022.pdf
OpenACC
 
Using e-infrastructures for biodiversity conservation - Gianpaolo Coro (CNR)
Using e-infrastructures for biodiversity conservation - Gianpaolo Coro (CNR)Using e-infrastructures for biodiversity conservation - Gianpaolo Coro (CNR)
Using e-infrastructures for biodiversity conservation - Gianpaolo Coro (CNR)
Blue BRIDGE
 
G3 talk rld_2
G3 talk rld_2G3 talk rld_2
G3 talk rld_2
Robert Davidson
 
Overview of Next Gen Sequencing Data Analysis
Overview of Next Gen Sequencing Data AnalysisOverview of Next Gen Sequencing Data Analysis
Overview of Next Gen Sequencing Data Analysis
Bioinformatics and Computational Biosciences Branch
 

Similar to Tin-Lap Lee: Next-Gen Sequencing Analysis by GigaGalaxy (20)

Global Network Advancement Group - Next Generation Network-Integrated Systems
Global Network Advancement Group - Next Generation Network-Integrated SystemsGlobal Network Advancement Group - Next Generation Network-Integrated Systems
Global Network Advancement Group - Next Generation Network-Integrated Systems
 
Global Network Advancement Group Next Generation Network-Integrated Sys...
      Global Network Advancement GroupNext Generation Network-Integrated Sys...      Global Network Advancement GroupNext Generation Network-Integrated Sys...
Global Network Advancement Group Next Generation Network-Integrated Sys...
 
Ogf27 Ligo
Ogf27 LigoOgf27 Ligo
Ogf27 Ligo
 
Grid computing
Grid computingGrid computing
Grid computing
 
C02-Visualization-Applying visual analytics
C02-Visualization-Applying visual analyticsC02-Visualization-Applying visual analytics
C02-Visualization-Applying visual analytics
 
OpenACC and Hackathons Monthly Highlights: April 2023
OpenACC and Hackathons Monthly Highlights: April  2023OpenACC and Hackathons Monthly Highlights: April  2023
OpenACC and Hackathons Monthly Highlights: April 2023
 
2015 FOSS4G Track: Open Specifications for the Storage, Transport and Process...
2015 FOSS4G Track: Open Specifications for the Storage, Transport and Process...2015 FOSS4G Track: Open Specifications for the Storage, Transport and Process...
2015 FOSS4G Track: Open Specifications for the Storage, Transport and Process...
 
COBWEB technology platform and future development needs
COBWEB technology platform and future development needsCOBWEB technology platform and future development needs
COBWEB technology platform and future development needs
 
Indiana University's Advanced Science Gateway Support
Indiana University's Advanced Science Gateway SupportIndiana University's Advanced Science Gateway Support
Indiana University's Advanced Science Gateway Support
 
EOSC-Life Workflow Collaboratory
EOSC-Life Workflow CollaboratoryEOSC-Life Workflow Collaboratory
EOSC-Life Workflow Collaboratory
 
BioNLPSADI
BioNLPSADIBioNLPSADI
BioNLPSADI
 
Security Challenges and the Pacific Research Platform
Security Challenges and the Pacific Research PlatformSecurity Challenges and the Pacific Research Platform
Security Challenges and the Pacific Research Platform
 
IDB-Cloud Providing Bioinformatics Services on Cloud
IDB-Cloud Providing Bioinformatics Services on CloudIDB-Cloud Providing Bioinformatics Services on Cloud
IDB-Cloud Providing Bioinformatics Services on Cloud
 
GlobusWorld 2020 Keynote
GlobusWorld 2020 KeynoteGlobusWorld 2020 Keynote
GlobusWorld 2020 Keynote
 
OGCE SC10
OGCE SC10OGCE SC10
OGCE SC10
 
Validation of services, data and metadata
Validation of services, data and metadataValidation of services, data and metadata
Validation of services, data and metadata
 
OpenACC and Open Hackathons Monthly Highlights June 2022.pdf
OpenACC and Open Hackathons Monthly Highlights June 2022.pdfOpenACC and Open Hackathons Monthly Highlights June 2022.pdf
OpenACC and Open Hackathons Monthly Highlights June 2022.pdf
 
Using e-infrastructures for biodiversity conservation - Gianpaolo Coro (CNR)
Using e-infrastructures for biodiversity conservation - Gianpaolo Coro (CNR)Using e-infrastructures for biodiversity conservation - Gianpaolo Coro (CNR)
Using e-infrastructures for biodiversity conservation - Gianpaolo Coro (CNR)
 
G3 talk rld_2
G3 talk rld_2G3 talk rld_2
G3 talk rld_2
 
Overview of Next Gen Sequencing Data Analysis
Overview of Next Gen Sequencing Data AnalysisOverview of Next Gen Sequencing Data Analysis
Overview of Next Gen Sequencing Data Analysis
 

More from GigaScience, BGI Hong Kong

IDW2022: A decades experiences in transparent and interactive publication of ...
IDW2022: A decades experiences in transparent and interactive publication of ...IDW2022: A decades experiences in transparent and interactive publication of ...
IDW2022: A decades experiences in transparent and interactive publication of ...
GigaScience, BGI Hong Kong
 
Scott Edmunds: Preparing a data paper for GigaByte
Scott Edmunds: Preparing a data paper for GigaByteScott Edmunds: Preparing a data paper for GigaByte
Scott Edmunds: Preparing a data paper for GigaByte
GigaScience, BGI Hong Kong
 
STM Week: Demonstrating bringing publications to life via an End-to-end XML p...
STM Week: Demonstrating bringing publications to life via an End-to-end XML p...STM Week: Demonstrating bringing publications to life via an End-to-end XML p...
STM Week: Demonstrating bringing publications to life via an End-to-end XML p...
GigaScience, BGI Hong Kong
 
Measuring richness. A RCT to quantify the benefits of metadata quality; Scott...
Measuring richness. A RCT to quantify the benefits of metadata quality; Scott...Measuring richness. A RCT to quantify the benefits of metadata quality; Scott...
Measuring richness. A RCT to quantify the benefits of metadata quality; Scott...
GigaScience, BGI Hong Kong
 
Scott Edmunds: A new publishing workflow for rapid dissemination of genomes u...
Scott Edmunds: A new publishing workflow for rapid dissemination of genomes u...Scott Edmunds: A new publishing workflow for rapid dissemination of genomes u...
Scott Edmunds: A new publishing workflow for rapid dissemination of genomes u...
GigaScience, BGI Hong Kong
 
Scott Edmunds: Quantifying how FAIR is Hong Kong: The Hong Kong Shareability ...
Scott Edmunds: Quantifying how FAIR is Hong Kong: The Hong Kong Shareability ...Scott Edmunds: Quantifying how FAIR is Hong Kong: The Hong Kong Shareability ...
Scott Edmunds: Quantifying how FAIR is Hong Kong: The Hong Kong Shareability ...
GigaScience, BGI Hong Kong
 
Scott Edmunds talk at IARC: How can we make science more trustworthy and FAIR...
Scott Edmunds talk at IARC: How can we make science more trustworthy and FAIR...Scott Edmunds talk at IARC: How can we make science more trustworthy and FAIR...
Scott Edmunds talk at IARC: How can we make science more trustworthy and FAIR...
GigaScience, BGI Hong Kong
 
PAGAsia19 - The Digitalization of Ruili Botanical Garden Project: Production...
PAGAsia19 - The Digitalization of Ruili Botanical Garden Project:  Production...PAGAsia19 - The Digitalization of Ruili Botanical Garden Project:  Production...
PAGAsia19 - The Digitalization of Ruili Botanical Garden Project: Production...
GigaScience, BGI Hong Kong
 
Democratising biodiversity and genomics research: open and citizen science to...
Democratising biodiversity and genomics research: open and citizen science to...Democratising biodiversity and genomics research: open and citizen science to...
Democratising biodiversity and genomics research: open and citizen science to...
GigaScience, BGI Hong Kong
 
Hong Kong Open Access & GigaScience: CCHK@10
Hong Kong Open Access & GigaScience: CCHK@10Hong Kong Open Access & GigaScience: CCHK@10
Hong Kong Open Access & GigaScience: CCHK@10
GigaScience, BGI Hong Kong
 
Ricardo Wurmus: Reproducible genomics analysis pipelines with GNU Guix
Ricardo Wurmus: Reproducible genomics analysis pipelines with GNU GuixRicardo Wurmus: Reproducible genomics analysis pipelines with GNU Guix
Ricardo Wurmus: Reproducible genomics analysis pipelines with GNU Guix
GigaScience, BGI Hong Kong
 
Anil Thanki at #ICG13: Aequatus: An open-source homology browser
Anil Thanki at #ICG13: Aequatus: An open-source homology browserAnil Thanki at #ICG13: Aequatus: An open-source homology browser
Anil Thanki at #ICG13: Aequatus: An open-source homology browser
GigaScience, BGI Hong Kong
 
Paul Pavlidis at #ICG13: Monitoring changes in the Gene Ontology and their im...
Paul Pavlidis at #ICG13: Monitoring changes in the Gene Ontology and their im...Paul Pavlidis at #ICG13: Monitoring changes in the Gene Ontology and their im...
Paul Pavlidis at #ICG13: Monitoring changes in the Gene Ontology and their im...
GigaScience, BGI Hong Kong
 
Venice Juanillas at #ICG13: Rice Galaxy: an open resource for plant science
Venice Juanillas at #ICG13: Rice Galaxy: an open resource for plant scienceVenice Juanillas at #ICG13: Rice Galaxy: an open resource for plant science
Venice Juanillas at #ICG13: Rice Galaxy: an open resource for plant science
GigaScience, BGI Hong Kong
 
Stefan Prost at #ICG13: Genome analyses show strong selection on coloration, ...
Stefan Prost at #ICG13: Genome analyses show strong selection on coloration, ...Stefan Prost at #ICG13: Genome analyses show strong selection on coloration, ...
Stefan Prost at #ICG13: Genome analyses show strong selection on coloration, ...
GigaScience, BGI Hong Kong
 
Lisa Johnson at #ICG13: Re-assembly, quality evaluation, and annotation of 67...
Lisa Johnson at #ICG13: Re-assembly, quality evaluation, and annotation of 67...Lisa Johnson at #ICG13: Re-assembly, quality evaluation, and annotation of 67...
Lisa Johnson at #ICG13: Re-assembly, quality evaluation, and annotation of 67...
GigaScience, BGI Hong Kong
 
Chris Armit at IDW2018: Democratising Data Publishing: A Global Perspective
Chris Armit at IDW2018: Democratising Data Publishing: A Global PerspectiveChris Armit at IDW2018: Democratising Data Publishing: A Global Perspective
Chris Armit at IDW2018: Democratising Data Publishing: A Global Perspective
GigaScience, BGI Hong Kong
 
EMBL OA Week: FAIR or unfair? Principled publishing for more Open & Democrati...
EMBL OA Week: FAIR or unfair? Principled publishing for more Open & Democrati...EMBL OA Week: FAIR or unfair? Principled publishing for more Open & Democrati...
EMBL OA Week: FAIR or unfair? Principled publishing for more Open & Democrati...
GigaScience, BGI Hong Kong
 
Reproducible method and benchmarking publishing for the data (and evidence) d...
Reproducible method and benchmarking publishing for the data (and evidence) d...Reproducible method and benchmarking publishing for the data (and evidence) d...
Reproducible method and benchmarking publishing for the data (and evidence) d...
GigaScience, BGI Hong Kong
 
Mary Ann Tuli: What MODs can learn from Journals – a GigaDB curator’s perspec...
Mary Ann Tuli: What MODs can learn from Journals – a GigaDB curator’s perspec...Mary Ann Tuli: What MODs can learn from Journals – a GigaDB curator’s perspec...
Mary Ann Tuli: What MODs can learn from Journals – a GigaDB curator’s perspec...
GigaScience, BGI Hong Kong
 

More from GigaScience, BGI Hong Kong (20)

IDW2022: A decades experiences in transparent and interactive publication of ...
IDW2022: A decades experiences in transparent and interactive publication of ...IDW2022: A decades experiences in transparent and interactive publication of ...
IDW2022: A decades experiences in transparent and interactive publication of ...
 
Scott Edmunds: Preparing a data paper for GigaByte
Scott Edmunds: Preparing a data paper for GigaByteScott Edmunds: Preparing a data paper for GigaByte
Scott Edmunds: Preparing a data paper for GigaByte
 
STM Week: Demonstrating bringing publications to life via an End-to-end XML p...
STM Week: Demonstrating bringing publications to life via an End-to-end XML p...STM Week: Demonstrating bringing publications to life via an End-to-end XML p...
STM Week: Demonstrating bringing publications to life via an End-to-end XML p...
 
Measuring richness. A RCT to quantify the benefits of metadata quality; Scott...
Measuring richness. A RCT to quantify the benefits of metadata quality; Scott...Measuring richness. A RCT to quantify the benefits of metadata quality; Scott...
Measuring richness. A RCT to quantify the benefits of metadata quality; Scott...
 
Scott Edmunds: A new publishing workflow for rapid dissemination of genomes u...
Scott Edmunds: A new publishing workflow for rapid dissemination of genomes u...Scott Edmunds: A new publishing workflow for rapid dissemination of genomes u...
Scott Edmunds: A new publishing workflow for rapid dissemination of genomes u...
 
Scott Edmunds: Quantifying how FAIR is Hong Kong: The Hong Kong Shareability ...
Scott Edmunds: Quantifying how FAIR is Hong Kong: The Hong Kong Shareability ...Scott Edmunds: Quantifying how FAIR is Hong Kong: The Hong Kong Shareability ...
Scott Edmunds: Quantifying how FAIR is Hong Kong: The Hong Kong Shareability ...
 
Scott Edmunds talk at IARC: How can we make science more trustworthy and FAIR...
Scott Edmunds talk at IARC: How can we make science more trustworthy and FAIR...Scott Edmunds talk at IARC: How can we make science more trustworthy and FAIR...
Scott Edmunds talk at IARC: How can we make science more trustworthy and FAIR...
 
PAGAsia19 - The Digitalization of Ruili Botanical Garden Project: Production...
PAGAsia19 - The Digitalization of Ruili Botanical Garden Project:  Production...PAGAsia19 - The Digitalization of Ruili Botanical Garden Project:  Production...
PAGAsia19 - The Digitalization of Ruili Botanical Garden Project: Production...
 
Democratising biodiversity and genomics research: open and citizen science to...
Democratising biodiversity and genomics research: open and citizen science to...Democratising biodiversity and genomics research: open and citizen science to...
Democratising biodiversity and genomics research: open and citizen science to...
 
Hong Kong Open Access & GigaScience: CCHK@10
Hong Kong Open Access & GigaScience: CCHK@10Hong Kong Open Access & GigaScience: CCHK@10
Hong Kong Open Access & GigaScience: CCHK@10
 
Ricardo Wurmus: Reproducible genomics analysis pipelines with GNU Guix
Ricardo Wurmus: Reproducible genomics analysis pipelines with GNU GuixRicardo Wurmus: Reproducible genomics analysis pipelines with GNU Guix
Ricardo Wurmus: Reproducible genomics analysis pipelines with GNU Guix
 
Anil Thanki at #ICG13: Aequatus: An open-source homology browser
Anil Thanki at #ICG13: Aequatus: An open-source homology browserAnil Thanki at #ICG13: Aequatus: An open-source homology browser
Anil Thanki at #ICG13: Aequatus: An open-source homology browser
 
Paul Pavlidis at #ICG13: Monitoring changes in the Gene Ontology and their im...
Paul Pavlidis at #ICG13: Monitoring changes in the Gene Ontology and their im...Paul Pavlidis at #ICG13: Monitoring changes in the Gene Ontology and their im...
Paul Pavlidis at #ICG13: Monitoring changes in the Gene Ontology and their im...
 
Venice Juanillas at #ICG13: Rice Galaxy: an open resource for plant science
Venice Juanillas at #ICG13: Rice Galaxy: an open resource for plant scienceVenice Juanillas at #ICG13: Rice Galaxy: an open resource for plant science
Venice Juanillas at #ICG13: Rice Galaxy: an open resource for plant science
 
Stefan Prost at #ICG13: Genome analyses show strong selection on coloration, ...
Stefan Prost at #ICG13: Genome analyses show strong selection on coloration, ...Stefan Prost at #ICG13: Genome analyses show strong selection on coloration, ...
Stefan Prost at #ICG13: Genome analyses show strong selection on coloration, ...
 
Lisa Johnson at #ICG13: Re-assembly, quality evaluation, and annotation of 67...
Lisa Johnson at #ICG13: Re-assembly, quality evaluation, and annotation of 67...Lisa Johnson at #ICG13: Re-assembly, quality evaluation, and annotation of 67...
Lisa Johnson at #ICG13: Re-assembly, quality evaluation, and annotation of 67...
 
Chris Armit at IDW2018: Democratising Data Publishing: A Global Perspective
Chris Armit at IDW2018: Democratising Data Publishing: A Global PerspectiveChris Armit at IDW2018: Democratising Data Publishing: A Global Perspective
Chris Armit at IDW2018: Democratising Data Publishing: A Global Perspective
 
EMBL OA Week: FAIR or unfair? Principled publishing for more Open & Democrati...
EMBL OA Week: FAIR or unfair? Principled publishing for more Open & Democrati...EMBL OA Week: FAIR or unfair? Principled publishing for more Open & Democrati...
EMBL OA Week: FAIR or unfair? Principled publishing for more Open & Democrati...
 
Reproducible method and benchmarking publishing for the data (and evidence) d...
Reproducible method and benchmarking publishing for the data (and evidence) d...Reproducible method and benchmarking publishing for the data (and evidence) d...
Reproducible method and benchmarking publishing for the data (and evidence) d...
 
Mary Ann Tuli: What MODs can learn from Journals – a GigaDB curator’s perspec...
Mary Ann Tuli: What MODs can learn from Journals – a GigaDB curator’s perspec...Mary Ann Tuli: What MODs can learn from Journals – a GigaDB curator’s perspec...
Mary Ann Tuli: What MODs can learn from Journals – a GigaDB curator’s perspec...
 


Tin-Lap Lee: Next-Gen Sequencing Analysis by GigaGalaxy

  • 1. Next-Gen Sequencing Analysis by GigaGalaxy. Tin-Lap LEE, School of Biomedical Sciences, CUHK-BGI Innovation Institute of Trans-omics, The Chinese University of Hong Kong
  • 2. CUHK-BGI Innovation Institute of Trans-Omics (CBIIT)
      • Jointly established between The Chinese University of Hong Kong (CUHK) and BGI in July 2011.
      • “We aim to provide a platform conducive to training of multi-disciplinary talents conversant with the knowledge and application of genomics, proteomics, genetics, computational biology and bioinformatics, by capitalizing on both institutions’ expertise and strengths in genomic science.”
  • 4. www.gigasciencejournal.com: Journal, data-platform and database for large-scale data. Editor-in-Chief: Laurie Goodman; Executive Editor: Scott Edmunds; Commissioning Editor: Nicole Nogoy; Lead Curator: Chris Hunter; Data Platform: Peter Li; in conjunction with
  • 6. Giga-Galaxy
      • Collaboration between GigaScience and CBIIT
      • A publicly accessible Galaxy server
      • Shares some of the workload of the main Galaxy server
      • Hosts data and workflows published in GigaScience, particularly involving NGS data analysis
      • SOAP package: advantages from GigaGalaxy
      • Application Instance: SOAPdenovo2 tool
  • 8. Import data from GigaDB to GigaGalaxy
  • 9. GigaSolution: deconstructing the paper. Sites: www.gigadb.org, www.gigasciencejournal.com, galaxy.cbiit.cuhk.edu.hk. Combines and integrates: Open-access journal; Data Publishing Platform; Data Analysis Platform
  • 10. Data + Methods = Analysis (example)
      • Wang J et al. (2012): Updated genome assembly of YH: the first diploid genome sequence of a Han Chinese individual (version 2, 07/2012). GigaScience Database. http://dx.doi.org/10.5524/100038
      • Luo R et al. (2012): Software and supporting material for “SOAPdenovo2: An empirically improved memory-efficient short read de novo assembly”. GigaScience Database. http://dx.doi.org/10.5524/100044
      • Luo R et al. (2012): SOAPdenovo2: an empirically improved memory-efficient short-read de novo assembler. GigaScience, 1:18 (28th December 2012). http://dx.doi.org/10.1186/2047-217X-1-18
  • 12. CBIIT GigaGalaxy Structure: Tool Development; Publishing; Biomedical and bioinformatics research
  • 13. What is SOAP? SOAP is a tool package by BGI that provides a full solution for NGS data analysis. http://soap.genomics.org.cn/
  • 14. SOAPdenovo2 tools
      • An assembly tool for short reads generated from NGS technology
      • Four modules
          • Pregraph: construct the de Bruijn graph
          • Contig: identify contigs from overlapping sequence reads
          • Map: map reads onto contigs
          • Scaff: generate final assembly results
      • Generates 1. Contig and 2. Scaffold files
  • 16. Integrate BGI SOAP tools into Giga-Galaxy
  • 17. Assembly Supporting Tools
      • SOAPfilter: removes reads with artifacts
      • Kmerfreq HA: a k-mer frequency counter
      • Corrector HA: corrects sequencing errors in short reads
      • Gapcloser: closes gaps in scaffolds
  • 18. Put them together: Sequencing Data → SOAPfilter → kmerFreq HA → Corrector HA → SOAPdenovo2 → GAGE evaluation
  • 21. GAGE
  • 25. Help Center: Shared Data
      • Several datasets are available from the shared data menu for test-running the tools.
      • Data Libraries
      • Published Workflows
      • Published Pages
  • 26. What is in the shared data menu?
  • 29. How is GigaScience supporting data reproducibility? (Data sets and analyses)
      • Open-Paper: DOI:10.1186/2047-217X-1-18, ~10000 accesses
      • Open-Review: 8 reviewers tested data in ftp server & named reports published
      • Open-Code: code in sourceforge under GPLv3: http://soapdenovo2.sourceforge.net/, ~5000 downloads. Enabled the code to be picked apart by bloggers in a wiki: http://homolog.us/wiki/index.php?title=SOAPdenovo2
      • Open-Pipelines, Open-Workflows: DOI:10.5524/100044
      • Open-Data: DOI:10.5524/100038, 78GB CC0 data
  • 30. SOAPdenovo2 workflows implemented in galaxy.cbiit.cuhk.edu.hk. Implemented the entire workflow in the GigaGalaxy server, including:
      • 3 pre-processing steps
      • 4 SOAPdenovo modules
      • 1 post-processing step
      • Evaluation and visualization tools
    Will be available to >25K Galaxy users in the Galaxy Toolshed
  • 31. Acknowledgements • CUHK • Huayuan Gao • BGI-HK and GigaScience • Peter Li • Scott Edmunds • Galaxy team members
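
As a rough illustration of two ideas named in slides 14 and 17 (k-mer frequency counting and de Bruijn graph construction), here is a minimal Python sketch. It is a toy under stated assumptions, not the SOAPdenovo2 or Kmerfreq implementation: the function names `count_kmers` and `build_de_bruijn`, the `min_count` threshold, and the example reads are all hypothetical, and the sketch ignores reverse complements, base qualities, and memory efficiency.

```python
from collections import Counter, defaultdict


def count_kmers(reads, k):
    """Count every length-k substring (k-mer) across all reads."""
    counts = Counter()
    for read in reads:
        for i in range(len(read) - k + 1):
            counts[read[i:i + k]] += 1
    return counts


def build_de_bruijn(kmer_counts, min_count=1):
    """Build a toy de Bruijn graph from k-mer counts.

    Nodes are (k-1)-mers; each k-mer seen at least `min_count` times
    adds one edge from its prefix to its suffix.
    """
    graph = defaultdict(set)
    for kmer, n in kmer_counts.items():
        if n >= min_count:
            graph[kmer[:-1]].add(kmer[1:])
    return dict(graph)


# Hypothetical example reads, chosen only for illustration.
reads = ["ACGTAC", "CGTACG"]
kmer_counts = count_kmers(reads, k=3)
graph = build_de_bruijn(kmer_counts)

for node in sorted(graph):
    print(node, "->", sorted(graph[node]))
```

Raising `min_count` drops rare k-mers before the graph is built. That loosely mirrors why a k-mer frequency step would precede error correction in the pipeline on slide 18, though the real tools make this decision in a more involved way.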

Editor's Notes

  1. Galaxy is a web-based data analysis platform developed by PSU. Accessible, reproducible, and transparent. Easy to use, no command line, much shorter learning curve for biologists.
  2. The first section of this talk is about the implementation of a public instance using the Galaxy Tool Shed. We are currently implementing the first public SOAP instance on the platform.
  3. The SOAP package provides a set of tools for processing NGS data. There are different versions of SOAP for mapping short reads to reference sequences. There are also tools like SOAPdenovo for constructing a new genome sequence, and SOAPsnp, which can assemble a consensus sequence and identify the SNPs present on it relative to a reference. Documentation in the BGI SOAP package is limited in scope, making the tools difficult to use. We will be working with the BGI developers to provide test data and Galaxy pipelines demonstrating the use of SOAP.