Illumina VariantStudio is a powerful annotation tool for analyzing and interpreting variants from NGS data. It allows users to import VCF or gVCF files, annotate variants using various databases, filter variants, classify variants, and generate customizable reports. VariantStudio streamlines the analysis workflow from raw data to meaningful biological insights.
2. 2
Course Objectives
By the end of this course, you will be able to:
Illumina Data Analysis Overview
The Workflow in MiSeq Reporter
Powerful Annotation Tool - VariantStudio
Illumina iCloud - BaseSpace
Slide generated from Henry Yen
5. 5
Primary and Secondary Analysis Overview
Analysis type: Sequencing (MCS/NCS/HCS), Primary Analysis (RTA), Secondary Analysis (MSR / BaseSpace)
Outputs: images/TIFF files; base calling and intensities; alignments and variant detection
6. 6
MiSeq Analysis Workflow
MiSeq: "I'm All-in-One Sequencer"
• Instrument Control Software (MCS): images and intensities
• RTA: base calls & quality scores
• MiSeq Reporter: alignment/FASTQ, variants, statistics
• Application-specific additional analysis
• Limited visualization via HTTP interface
MiSeq Reporter workflows: Resequencing, Enrichment, Amplicon, Small RNA, De novo Assembly, 16S Metagenomics
7. 7
Why We Use MiSeq Reporter
Automatic
– Starts automatically after sequencing
Simple
– Start-to-end workflow
Powerful
– Supports the different analyses required
Friendly
– Graphical user interface
9. 9
Workflows from MiSeq Reporter
Reference-based:
• Whole genome: Resequencing, Library QC
• Targeted-Seq, capture-based: Enrichment
• Targeted-Seq, PCR-based: Amplicon, Amplicon-DS, PCR-Amplicon, mtDNA
• RNA: Small RNA, Targeted-RNA
Non-reference:
• Assembly: De novo Assembly
• Taxonomy: Metagenomics
10. 10
Resequencing Workflows
Resequencing workflow:
Reads are aligned to the reference genome.
Variants are called.
Outputs FASTQ, .bam, .vcf, and .gVCF files.
Reports the on-target rate, coverage, and a variant summary.
Pipeline steps: Adapter Masking, Reads Demultiplexing, Alignment, Indel Realignment, Bin / Sort, Duplicate Flagging, Variant Calling, Report
Output files: FASTQ, BAM, VCF, PDF
11. 11
Library QC Workflows
Library QC workflow:
Data are analyzed with BWA.
Reads are aligned to the reference genome.
No variant calling.
Outputs FASTQ and .bam files.
Pipeline steps: Adapter Masking, Reads Demultiplexing, Alignment, Indel Realignment, Bin / Sort, Duplicate Flagging, Alignment Statistics
Output files: FASTQ, BAM
12. 12
Enrichment Workflows
Enrichment workflow:
Reads are aligned to the targeted region.
Analyzes data from probe-captured libraries.
Outputs FASTQ, .bam, .vcf, and .gVCF files.
Reports the aligned rate, on-target rate, coverage, and a variant summary.
Pipeline steps: Adapter Masking, Reads Demultiplexing, Alignment, Indel Realignment, Bin / Sort, Duplicate Flagging, Variant Calling, Targeted Statistics
Input: Targeted Region
Output files: FASTQ, BAM, VCF, CSV
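The on-target rate reported by the Enrichment workflow can be illustrated with a minimal sketch. The slides do not say how MiSeq Reporter computes it, so the definition below is an assumption: the fraction of aligned reads that overlap any targeted region. The function name and the `(chrom, start, end)` half-open interval layout are hypothetical.

```python
def on_target_rate(read_intervals, target_intervals):
    """Fraction of reads overlapping at least one target.

    Intervals are (chrom, start, end) tuples with half-open coordinates.
    This definition is an assumption, not MiSeq Reporter's documented one.
    """
    def overlaps(a, b):
        # Same chromosome and the two half-open ranges intersect.
        return a[0] == b[0] and a[1] < b[2] and b[1] < a[2]

    if not read_intervals:
        return 0.0
    hits = sum(
        any(overlaps(read, target) for target in target_intervals)
        for read in read_intervals
    )
    return hits / len(read_intervals)


reads = [
    ("chr1", 100, 250),   # overlaps the target
    ("chr1", 350, 500),   # overlaps the target
    ("chr1", 900, 1050),  # off target
    ("chr2", 10, 160),    # different chromosome
]
targets = [("chr1", 200, 400)]
rate = on_target_rate(reads, targets)
```

The linear scan over targets is fine for a handful of regions; a real panel with thousands of targets would call for an interval index.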
13. 13
Amplicon Workflows (TruSeq Amplicon)
Amplicon workflow:
Analyzes data from short-range PCR.
Reads are aligned to the targeted region.
Uses a custom targeted design from Illumina.
Outputs FASTQ, .bam, .vcf, and .gVCF files.
Pipeline steps: Adapter Masking, Reads Demultiplexing, Alignment, Indel Realignment, Bin / Sort, Variant Calling
Input: Targeted Region
Output files: FASTQ, BAM, VCF, Excel
Viewer: Amplicon Viewer
14. 14
Amplicon-DS Workflows
Amplicon-DS workflow:
Analyzes data from TruSight Tumor.
Variants are checked on both strands.
Filters false-positive variants from FFPE samples.
Outputs FASTQ, .bam, .vcf, and .gVCF files.
Pipeline steps: Adapter Masking, Reads Demultiplexing, Alignment, Indel Realignment, Bin / Sort, Variant Calling (Somatic), Variant Filtering
Input: Targeted Region
Output files: FASTQ, BAM, VCF
15. 15
DNA deamination bias correction
Two manifest files:
1. Downstream locus-specific oligos (DLSO)
2. Upstream locus-specific oligos (ULSO)
The Amplicon Double-Stranded workflow can remove the DNA deamination bias (C -> T) of FFPE samples.
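The double-strand check can be sketched as a comparison of two call sets. This is only an illustration under stated assumptions: the slides do not describe the actual Amplicon-DS filtering criteria, and the function name, the `(chrom, pos, ref, alt)` tuple layout, and the rule "keep a call only if both pools report it" are hypothetical.

```python
def split_calls(pool_a_calls, pool_b_calls):
    """Separate variants confirmed by both pools from single-pool calls.

    Each input is a set of (chrom, pos, ref, alt) tuples, one set per
    strand-specific pool. Hypothetical rule: a call is confirmed only
    when both pools report it.
    """
    confirmed = pool_a_calls & pool_b_calls
    single_pool = pool_a_calls ^ pool_b_calls
    # C>T (and the complementary G>A) calls seen in one pool only are
    # the pattern expected from FFPE deamination damage.
    deamination_like = {
        v for v in single_pool if (v[2], v[3]) in {("C", "T"), ("G", "A")}
    }
    return confirmed, deamination_like


pool_a = {("chr7", 55249071, "C", "T"), ("chr12", 25398284, "C", "A")}
pool_b = {("chr12", 25398284, "C", "A")}
confirmed, suspicious = split_calls(pool_a, pool_b)
```

Here the chr12 call survives because both pools report it, while the chr7 C>T call appears in one pool only and is set aside as a likely artifact.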
16. 16
PCR Amplicon Workflows
PCR Amplicon workflow:
Analyzes data from long-range PCR.
Reads are aligned to the targeted region.
Targeted design is supplied by the customer.
Outputs FASTQ, .bam, .vcf, and .gVCF files.
Pipeline steps: Adapter Masking, Reads Demultiplexing, Alignment, Indel Realignment, Bin / Sort, Duplicate Flagging, Variant Calling
Input: Targeted Region
Output files: FASTQ, BAM, VCF
17. 17
mtDNA Workflows
mtDNA workflow:
Analyzes data for forensic applications.
Reads are aligned to the rCRS (revised Cambridge Reference Sequence).
Outputs FASTQ, .bam, viewer, and Excel files.
Can be used to trace maternal lineage.
Pipeline steps: Adapter Masking, Reads Demultiplexing, Alignment with rCRS, Bin / Sort, Viewer file generated, Show by mtDNA viewer
Output files: FASTQ, BAM, Excel, viewer file
18. 18
Small RNA Workflows
Small RNA workflow:
Data are analyzed with Bowtie.
Reads are aligned to miRBase.
No variant calling.
Outputs FASTQ, .bam, a pie chart, and read counts for miRNA.
Pipeline steps: Adapter Masking, Reads Demultiplexing, Alignment, Bin / Sort, Read Counts
Output files: FASTQ, BAM, TXT
19. 19
Targeted RNA Workflows
Targeted RNA workflow:
Reads are aligned against a custom manifest file (banded Smith-Waterman).
Reports relative expression of genes and isoforms between several samples.
Outputs: FASTQ, BAM, HTML report
Pipeline steps: Adapter Masking, Reads Demultiplexing, Alignment, Bin / Sort, Differential Expression Analysis
Output files: FASTQ, BAM, HTML
20. 20
De novo Assembly Workflows
De novo Assembly workflow:
Data are assembled with Velvet.
Assembles a small (<20 Mb) genome from reads, without the use of a genomic reference.
Outputs FASTQ and .fasta files and a dot plot.
Pipeline steps: Adapter Masking, Reads Demultiplexing, Assembly, Indel Realignment, Dot Plot
Output files: FASTQ, FASTA
21. 21
Metagenomics Workflows
Metagenomics workflow:
Bacterial population analysis based on 16S rRNA amplicons.
Outputs FASTQ and .fasta files and a pie chart.
Pipeline steps: Adapter Masking, Reads Demultiplexing, Reads Classification, Current Taxonomy, Pie Chart
Output files: FASTQ, FASTA
22. 22
16S Metagenomics in MiSeq Reporter 2.4
Greengenes database 13.5 (May 2013) is used to perform taxonomic classification
– http://greengenes.lbl.gov/
– Illumina-curated version
– Entries with 16S length <1250 bp are filtered out
– Entries with incomplete annotation are filtered out
Bayesian classification method assigns taxonomies
– RDP Naïve Bayesian Classifier (http://dx.doi.org/10.1128%2FAEM.00062-07)
– Short sub-sequences are extracted from each read and compared to the database by the classifier
Uses full-length Illumina paired-end reads
Classification down to genus/species level
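The idea of extracting short sub-sequences from each read and scoring them against a reference database can be sketched with a toy naive Bayes classifier. This is not the RDP implementation or the MiSeq Reporter code: the published RDP classifier works on 8-base words, while the smoothing, function names, and tiny reference set below are simplifying assumptions for illustration.

```python
import math


def kmers(seq, k=8):
    """Set of distinct k-base words in a sequence."""
    return {seq[i:i + k] for i in range(len(seq) - k + 1)}


def train(references, k=8):
    """references: {taxon: [sequence, ...]} -> {taxon: (n_seqs, word_counts)}."""
    model = {}
    for taxon, seqs in references.items():
        counts = {}
        for seq in seqs:
            for word in kmers(seq, k):
                counts[word] = counts.get(word, 0) + 1
        model[taxon] = (len(seqs), counts)
    return model


def classify(read, model, k=8):
    """Return the taxon with the highest summed log-probability of the read's words."""
    best_taxon, best_score = None, -math.inf
    for taxon, (n_seqs, counts) in model.items():
        # Simplified smoothing (an assumption, not the exact RDP prior).
        score = sum(
            math.log((counts.get(word, 0) + 0.5) / (n_seqs + 1))
            for word in kmers(read, k)
        )
        if score > best_score:
            best_taxon, best_score = taxon, score
    return best_taxon


# Hypothetical toy references; k=4 keeps the example small.
refs = {
    "GenusA": ["ACGTACGTTGCAAGCT"],
    "GenusB": ["TTTTGGGGCCCCAAAATTTT"],
}
model = train(refs, k=4)
call = classify("ACGTTGCAAG", model, k=4)
```

Every 4-base word of the example read occurs in the GenusA reference and none in GenusB, so the read is assigned to GenusA.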
23. 23
New HTML Output in Metagenomics Workflow
Top 20 classification results
Ordered by taxonomic level
24. 24
Read Stitching in MiSeq Reporter
MiSeq Reporter has a paired-end (PE) read stitching function.
Read 1 and Read 2 must overlap by at least 10 bp.
The Bases Match Score must be ≥ 0.9.
Bases Match Score = 1 - [Base Mismatch Rate]
Overlapping PE reads can be stitched into one read.
PE reads that cannot be stitched are converted to two single reads in the FASTQ file.
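The stitching rule (overlap of at least 10 bp, Bases Match Score of at least 0.9) can be sketched as follows. This is a minimal illustration, not MiSeq Reporter's implementation. The assumptions are that Read 2 is reverse-complemented before comparison and that the longest acceptable overlap is tried first; the function names are hypothetical.

```python
COMPLEMENT = {"A": "T", "C": "G", "G": "C", "T": "A", "N": "N"}


def reverse_complement(seq):
    return "".join(COMPLEMENT[base] for base in reversed(seq))


def stitch(read1, read2, min_overlap=10, min_score=0.9):
    """Merge a read pair into one read, or return None if it cannot be stitched."""
    r2 = reverse_complement(read2)
    # Try the longest possible overlap first (an assumption of this sketch).
    for n in range(min(len(read1), len(r2)), min_overlap - 1, -1):
        tail, head = read1[-n:], r2[:n]
        mismatches = sum(a != b for a, b in zip(tail, head))
        score = 1 - mismatches / n  # Bases Match Score = 1 - mismatch rate
        if score >= min_score:
            return read1 + r2[n:]
    return None


# A 30-base fragment sequenced as two 20-base reads with a 10-base overlap.
fragment = "ACGTTGCAAGCTTAGGCTCCATGAATCGGA"
read1 = fragment[:20]
read2 = reverse_complement(fragment[10:])
merged = stitch(read1, read2)
```

A pair for which `stitch` returns `None` would be kept as two single reads, as the slide describes.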
26. 26
Illumina VariantStudio
Intuitive analysis and interpretation
Import Data, Annotate, Filter, Classify, Report, Insight
• Intuitive user interface
• Rich annotations
• Flexible and comprehensive set of filters
• Streamlined variant classification
• Easy and customizable report generation
27. 27
Illumina VariantStudio Workflow
Data in, biological knowledge out
Import VCF or gVCF files
Illumina VariantStudio Desktop Client
VariantStudio Annotation Database
Export report of interpreted variants
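To make the "import VCF" step concrete, here is a minimal sketch of reading the fixed columns of VCF data lines and keeping the calls that pass filtering. It is not VariantStudio's importer. The column order (CHROM, POS, ID, REF, ALT, QUAL, FILTER) comes from the VCF format itself; the function names and the quality threshold of 30 are hypothetical.

```python
def parse_vcf(lines):
    """Parse VCF data lines into dictionaries, skipping header and blank lines."""
    records = []
    for line in lines:
        if not line.strip() or line.startswith("#"):
            continue
        chrom, pos, vid, ref, alt, qual, flt = line.rstrip("\n").split("\t")[:7]
        records.append({
            "chrom": chrom,
            "pos": int(pos),
            "id": vid,
            "ref": ref,
            "alt": alt.split(","),  # several ALT alleles are comma-separated
            "qual": None if qual == "." else float(qual),
            "filter": flt,
        })
    return records


def passing(records, min_qual=30.0):
    """Keep records marked PASS with QUAL at or above a hypothetical threshold."""
    return [
        r for r in records
        if r["filter"] == "PASS" and r["qual"] is not None and r["qual"] >= min_qual
    ]


example = [
    "##fileformat=VCFv4.1",
    "#CHROM\tPOS\tID\tREF\tALT\tQUAL\tFILTER\tINFO",
    "chr1\t100\t.\tA\tG\t50\tPASS\t.",
    "chr1\t200\trs1\tC\tT,G\t10\tLowQual\t.",
]
records = parse_vcf(example)
kept = passing(records)
```

For real files a dedicated VCF library is the safer choice, since this sketch ignores the INFO and per-sample columns.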
28. 28
Annotation & Filtering
Leveraging a broad range of annotation sources to enrich data with biological context
Annotation source shown: NHLBI Exome Variant Server
Filtering funnel, from big data to an easy-to-validate set:
1,000,000s of detected variants
10,000s of coding variants
100s of deleterious variants
Few causal variants
29. 29
Clinical Panels and VariantStudio
Streamlined workflow from sample to report
Align + Call Variants, Annotate, Filter, Classify, Generate Report
Easy! Correct! Rapid!
32. 32
BaseSpace Creates a Sequencing Ecosystem
Accelerates Analysis and Sharing of Genomic Data
Electronic Medical Record: Medical History, Drugs & Immunization, Patient Schedule, Reference Content, Lab Data, Genomic Data, Diagnostic Images, Scanned Charts
App Space
Public Databases
33. 33
Runs and Projects
Run data is automatically sent to Projects in BaseSpace
Runs and Projects have separate permissions
Core labs will be able to transfer ownership of a project
34. 34
Enrichment Apps Now Released on BaseSpace
Push-Button, Step by Step App Analysis
BWA Enrichment
ILLUMINA, INC
The core algorithms in the BWA Enrichment workflow are the BWA Genome Alignment Software and the GATK Variant Caller.
Isaac Enrichment
ILLUMINA, INC
The core algorithms in the Isaac Enrichment workflow are the Isaac Genome Alignment Software and the Isaac Variant Caller.
Only for human hg19
Read length of at least 32 bp
Supports paired-end runs
Free
35. 35
Resequencing Analysis Apps on BaseSpace
Push-Button, Step by Step App Analysis
BWA Whole Genome Sequencing (Free)
ILLUMINA, INC.
BWA/GATK Whole Genome Sequencing processes whole-genome sequencing data using BWA for alignment and GATK for variant detection.
Isaac Whole Genome Sequencing v2 (Free)
ILLUMINA, INC.
The Isaac Whole Genome Sequencing workflow performs read mapping using Isaac Genome Alignment Software and Isaac Variant Detection (SNVs, small indels, copy number anomalies and structural variations).
HiSeq Isaac Human WGS Workflow (Free)
ILLUMINA INC.
Isaac Genome Alignment Software and Isaac Variant Caller for human samples.
36. 36
Whole Genome Analysis Apps on BaseSpace
Push-Button, Step by Step App Analysis
Isaac & BWA Whole Genome Sequencing
ILLUMINA, INC
Reference genomes for about 12 species are available for alignment
Read length 21 ~ 150 bp (Isaac: 35 ~ 150 bp)
Supports paired-end runs
Does not support mate-pair reads
Reports detected CNVs and structural variants [VCF file]
37. 37
Tumor/Normal Paired Analysis Apps on BaseSpace
Push-Button, Step by Step App Analysis
Tumor Normal
ILLUMINA, INC
The Tumor/Normal Sequencing App is designed to detect somatic variants from a tumor and matched normal sample pair
Only supports human hg19
Read length 50 ~ 150 bp
Supports paired-end runs
Recommended coverage: 40X for the normal sample and 80X for the tumor
Detects somatic mutations in the tumor
Free
38. 38
16S Metagenomics App Now Released on BaseSpace
Push-Button, Step by Step App Analysis
16S Metagenomics
ILLUMINA, INC.
The 16S Metagenomics app performs taxonomic classification of 16S rRNA targeted amplicon reads using an Illumina-curated version of the GreenGenes taxonomic database.
Free
39. 39
De novo Assembly Apps in BaseSpace
Push-Button, Step by Step App Analysis
Align, assemble & analyze reads
DNASTAR, INC.
DNASTAR software for comprehensive next-gen sequence assembly and analysis.
Assemble bacteria de novo - FREE
DNASTAR, INC.
DNASTAR SeqMan NGen allows you to perform de novo assembly of bacterial genome sequences.
40. 40
De novo Assembly Apps in BaseSpace
Push-Button, Step by Step App Analysis
SPAdes
ALGORITHMIC BIOLOGY LAB
SPAdes 3.0 - St. Petersburg Genome Assembler - is intended for both standard isolates and single-cell MDA bacterial assemblies.
BayesHammer + SPAdes
BayesHammer: read error correction tool, which works well on both single-cell and standard data sets.
SPAdes: iterative short-read genome assembly module; by default it iterates consecutively through a set of k-mer lengths chosen according to the read length.
Supports MDA (multiple displacement amplification) single-cell bacterial assemblies
Supports paired-end reads, mate-pairs and unpaired reads.
Free
41. 41
The Algorithm for de Bruijn Graphs
You should set the k-mer length for your assemblies
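Why the k-mer length matters is easier to see with a small sketch of de Bruijn graph construction. This is a textbook illustration, not the code of Velvet or SPAdes: each k-mer in a read becomes an edge from its (k-1)-base prefix to its (k-1)-base suffix. The function name is hypothetical.

```python
from collections import defaultdict


def de_bruijn(reads, k):
    """Build a de Bruijn graph: nodes are (k-1)-mers, edges are k-mers."""
    graph = defaultdict(list)
    for read in reads:
        for i in range(len(read) - k + 1):
            kmer = read[i:i + k]
            graph[kmer[:-1]].append(kmer[1:])  # prefix -> suffix
    return dict(graph)


graph_k3 = de_bruijn(["ACGT"], 3)
graph_k4 = de_bruijn(["ACGT"], 4)
```

A smaller k yields more, shorter edges and tolerates low coverage but collapses repeats; a larger k resolves repeats better but needs reads that overlap more, which is why assemblers expose k as a setting.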
42. 42
New RNA-seq End-to-End Analysis Apps in "BaseSpace"
Apps: TopHat Alignment, Cufflinks Assembly & DE (Free)
Software: TopHat2 v2.0.7
Aligner: Bowtie 0.12.9
Assembly & gene expression: Cufflinks 2.1.1
Variant caller: Isaac Variant Caller 2.0.5
Alignment statistics: Picard tools 1.72
What can the apps do?
A. Alignment to the hg19 human genome
B. FPKM values for genes or transcripts
C. Detection of splice junctions and gene fusions
D. cSNP discovery
E. Discovery of differentially expressed genes
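The FPKM value mentioned above (fragments per kilobase of transcript per million mapped fragments) follows a standard formula. A minimal sketch is given below; Cufflinks estimates FPKM with a statistical model, so this shows only the definition, and the function name is hypothetical.

```python
def fpkm(fragments, transcript_length_bp, total_mapped_fragments):
    """FPKM = fragments * 1e9 / (transcript length in bp * total mapped fragments)."""
    return fragments * 1e9 / (transcript_length_bp * total_mapped_fragments)


# 1,000 fragments on a 2,000 bp transcript in a library of 10 million mapped fragments.
value = fpkm(1000, 2000, 10_000_000)
```

The factor of 1e9 combines the per-kilobase (1e3) and per-million (1e6) scalings.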
43. 43
New RNA-seq End-to-End Analysis Apps in "BaseSpace"
Supports 3 species (Human, Mouse, Rat)
Can call gene fusions
Can only trim TruSeq adapters
44. 44
Biological Interpretation for RNA-seq Data in BaseSpace
iPathwayGuide (Supports Human datasets only)
ADVAITA BIO
Free
An extension of the Cufflinks Assembly & DE workflow, iPathwayGuide will perform the following analyses:
• DE Gene Analysis
• Gene Ontology Analysis for Biological Processes, Molecular Functions, and Cellular Components
• Pathway Analysis with Impact Analysis modeled on KEGG Pathways
• Coherent Cascade Analysis on Pathways
• Downstream Gene Perturbation Analysis
• Drug Interaction Analysis
• Disease Analysis based on enrichment
45. 45
Overview of the Core Apps for BaseSpace
BWA Enrichment
BWA Whole Genome Sequencing
Tumor Normal Paired
TopHat Alignment
Cufflinks Assembly & DE
46. 46
BaseSpace Onsite System
Easy to use, from sample to answer
Secure, safe, local environment
Push-button data processing
Two 6-core CPUs with 128 GB RAM
Currently, the LIMS works only with the NextSeq 500 (HiSeq and MiSeq systems to be supported in future)
Supported analyses: RNA-seq, Exome-seq, Whole genome analysis, Tumor & normal paired
47. 47
Summary
Workflow          MSR Local Version   BaseSpace Version
Amplicon-DS       2.4                 N/A
Assembly          2.4                 2.2
Enrichment        2.4                 2.2
Generate FASTQ    2.4                 2.2
Library QC        2.4                 2.2
Metagenomics      2.4                 2.2
PCR Amplicon      2.4                 2.2
Resequencing      2.4                 2.2
Small RNA         2.4                 2.2
Targeted RNA      2.4                 N/A
TruSeq Amplicon   2.4                 2.2
BaseSpace Dual Mode Replicates Analysis Locally on MiSeq
• Selectable option in MCS
• Allows customers to compare and evaluate MSR Local vs. BaseSpace
• Retains local copy of all files for customers reluctant to rely on 100% remote storage