2. 2
Course Objectives
By the end of this course, you will be able to:
Illumina Data Analysis Overview
The Workflow in MiSeq Reporter
Powerful Annotation Tool - VariantStudio
Illumina iCloud - BaseSpace
Slide generated from
Henry Yen
5. 5
Primary and Secondary Analysis Overview
Analysis Type: Outputs
Sequencing (MCS/NCS/HCS): Images/TIFF files
Primary Analysis (RTA): Intensities, Base Calling
Secondary Analysis (MSR / BaseSpace): Alignments and Variant Detection
6. 6
MiSeq Analysis Workflow
RTA
Resequencing Amplicon Small RNA
De novo
Assembly
16S
Metagenomics
Base calls &
Quality Scores
Instrument
Control
Software
(MCS)
Images and Intensities
Limited Visualization via HTTP interface
Application-specific additional analysis
Alignment/FASTQ, Variants, Statistics
Enrichment
MiSeq Reporter
I’m All-in-One
Sequencer
7. 7
Why We Use MiSeq Reporter
Automatic
– Starts automatically after sequencing
Simple
– Start-to-end workflow
Powerful
– Supports the different analyses required
Friendly
– Graphical user interface
9. 9
Workflows from MiSeq Reporter
Assembly
Capture-based
Taxonomy
Reference
Non-Reference
Whole genome
Targeted-Seq
PCR-based
Resequencing
Library QC
Enrichment
Amplicon
Amplicon-DS
PCR-Amplicon
mtDNA
RNA
Small RNA
Targeted-RNA
De novo
Assembly
Metagenomics
MiSeq Reporter
10. 10
Resequencing Workflows
Adapter Masking
Reads Demultiplexing
Resequencing workflow:
Reads are aligned to the reference genome.
Variants are called.
Outputs FASTQ, .bam, .vcf, and .gVCF files.
Reports the on-target rate, coverage, and a variant summary.
Alignment
Indel Realignment
Bin / Sort
Variant Calling
Report
Fastq file
BAM file
VCF file
PDF file
Duplicate Flag
Resequencing
11. 11
Library QC Workflows
Adapter Masking
Reads Demultiplexing
Library QC workflow:
Data are analyzed with BWA.
Reads are aligned to the reference genome.
No variant calling.
Outputs FASTQ and .bam files.
Alignment
Indel Realignment
Bin / Sort
Alignment Statistics
Fastq file
BAM file
Duplicate Flag
Library QC
12. 12
Enrichment Workflows
Adapter Masking
Reads Demultiplexing
Enrichment workflow:
Reads are aligned to the targeted regions.
Analyzes data from probe-captured libraries.
Outputs FASTQ, .bam, .vcf, and .gVCF files.
Reports the aligned rate, on-target rate, coverage, and a variant summary.
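The on-target rate and coverage figures in the report can be illustrated with a minimal Python sketch. The interval representation (half-open `(chrom, start, end)` tuples), the function names, and the assumption that target regions do not overlap each other are all hypothetical choices for illustration, not MiSeq Reporter's actual calculation.

```python
def overlap_length(read, target):
    """Number of bases shared by two half-open (chrom, start, end) intervals."""
    if read[0] != target[0]:
        return 0
    return max(0, min(read[2], target[2]) - max(read[1], target[1]))


def on_target_rate(reads, targets):
    """Fraction of aligned reads that overlap at least one target region."""
    if not reads:
        return 0.0
    on_target = sum(
        any(overlap_length(r, t) > 0 for t in targets) for r in reads
    )
    return on_target / len(reads)


def mean_target_coverage(reads, targets):
    """Aligned bases inside targets divided by total target length.

    Assumes the target regions do not overlap one another.
    """
    target_bases = sum(t[2] - t[1] for t in targets)
    if target_bases == 0:
        return 0.0
    covered = sum(overlap_length(r, t) for r in reads for t in targets)
    return covered / target_bases


targets = [("chr1", 100, 200)]
reads = [
    ("chr1", 90, 140),   # overlaps the target by 40 bases
    ("chr1", 150, 200),  # overlaps the target by 50 bases
    ("chr1", 300, 350),  # off target
    ("chr2", 100, 150),  # different chromosome, off target
]
rate = on_target_rate(reads, targets)
coverage = mean_target_coverage(reads, targets)
```

Here two of the four reads overlap the target, and 90 aligned bases fall inside a 100-base target.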
Alignment
Indel Realignment
Bin / Sort
Variant Calling
Targeted Statistics
Fastq file
BAM file
VCF file
CSV file
Duplicate Flag
Targeted Region
Enrichment
13. 13
Amplicon Workflows
Adapter Masking
Reads Demultiplexing
Amplicon workflow:
Analyzes data from short-range PCR.
Reads are aligned to the targeted regions.
Custom target design from Illumina.
Outputs FASTQ, .bam, .vcf, and .gVCF files.
Alignment
Indel Realignment
Bin / Sort
Variant Calling
Fastq file
BAM file
VCF file
Targeted Region
TruSeq Amplicon
Amplicon Viewer
Excel file
14. 14
Amplicon-DS Workflows
Adapter Masking
Reads Demultiplexing
Amplicon-DS workflow:
Analyzes data from TruSight Tumor.
Variants are checked on both strands.
Filters false-positive variants from FFPE samples.
Outputs FASTQ, .bam, .vcf, and .gVCF files.
Alignment
Indel Realignment
Bin / Sort
Variant Calling
(Somatic)
Fastq file
BAM file
VCF file
Targeted Region
Variant Filtering
Amplicon-DS
15. 15
Two manifest files:
1. downstream locus-specific oligos (DLSO)
2. upstream locus-specific oligos (ULSO)
DNA deamination bias is corrected
The Amplicon Double-Stranded workflow can remove the
DNA deamination bias (C -> T) of FFPE samples
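The double-strand check can be sketched in Python as a set intersection. This assumes that each oligo pool yields its own set of variant calls and that an artifact confined to one strand appears in only one pool. The tuple layout `(chrom, pos, ref, alt)`, the positions, and the function name are hypothetical, and the real workflow is more involved than this.

```python
def double_strand_filter(pool_a_calls, pool_b_calls):
    """Split variant calls into those seen in both pools and those seen in one.

    Each call is a (chrom, pos, ref, alt) tuple. Calls present in both pools
    are treated as confirmed; calls present in only one pool are flagged as
    possible artifacts.
    """
    confirmed = pool_a_calls & pool_b_calls
    single_pool = pool_a_calls ^ pool_b_calls
    return confirmed, single_pool


# Illustrative calls only; positions are made up for the example.
pool_a = {("chr7", 1000, "T", "G"), ("chr12", 2000, "C", "T")}
pool_b = {("chr7", 1000, "T", "G")}

confirmed, suspect = double_strand_filter(pool_a, pool_b)
```

In this toy input the C -> T call appears in one pool only, so it lands in `suspect` rather than `confirmed`.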
16. 16
PCR Amplicon Workflows
Adapter Masking
Reads Demultiplexing
PCR Amplicon workflow:
Analyzes data from long-range PCR.
Reads are aligned to the targeted regions.
Target design by the customer.
Outputs FASTQ, .bam, .vcf, and .gVCF files.
Alignment
Indel Realignment
Bin / Sort
Variant Calling
Fastq file
BAM file
VCF file
Targeted Region
Duplicate Flag
PCR Amplicon
17. 17
mtDNA Workflows
Adapter Masking
Reads Demultiplexing
mtDNA workflow:
Analyzes data for forensic applications.
Reads are aligned to the rCRS.
Outputs FASTQ, .bam, viewer, and Excel files.
It can be used to trace maternal lineage.
Alignment with rCRS
Bin / Sort
Display in mtDNA viewer
Fastq file
BAM file
Excel file
Viewer file generated
Viewer file
mtDNA
18. 18
Small RNA Workflows
Adapter Masking
Reads Demultiplexing
Small RNA workflow:
Data are analyzed with Bowtie.
Reads are aligned to miRBase.
No variant calling.
Outputs FASTQ, .bam, a pie chart, and miRNA read counts.
Alignment
Bin / Sort
Read Counts
Fastq file
BAM file
TXT file
Small RNA
19. 19
Targeted RNA Workflows
Adapter Masking
Reads Demultiplexing
Targeted RNA workflow:
Reads are aligned against custom manifest file (banded Smith-Waterman)
Reports relative expression of genes and isoforms between several samples
Outputs:
FASTQ, BAM, HTML report
Alignment
Bin / Sort
Differential Expression Analysis
Fastq file
BAM file
HTML file
Targeted
RNA
20. 20
De novo assembly Workflows
Adapter Masking
Reads Demultiplexing
De novo Assembly workflow:
Data are assembled with Velvet.
Assembles small (<20 Mb) genomes from reads, without the use of a
genomic reference.
Outputs FASTQ, .fasta, and a dot plot.
Assembly
Indel Realignment
Dot plot
Fastq file
Fasta file
De Novo
Assembly
21. 21
Metagenomics Workflows
Adapter Masking
Reads Demultiplexing
Metagenomics workflow:
Bacterial population analysis based on 16S rRNA amplicons.
Assembly of small (<20MB) genome from reads, without the use of a
genomic reference
Output the fastq, .fasta & dot plot
Reads Classification
Current Taxonomy
Pie chart
Fastq file
Fasta file
Metagenomics
22. 22
Uses Greengenes database 13.5 (May 2013) to perform taxonomic classification
– http://greengenes.lbl.gov/
– Illumina-curated version
– Entries with 16S length <1250 bp are filtered out
– Entries with incomplete annotation are filtered out
Bayesian classification method to assign taxonomies
RDP Naïve Bayesian Classifier (http://dx.doi.org/10.1128%2FAEM.00062-07)
Short sub-sequences are extracted from each read and compared to the
database by the classifier
Uses full length Illumina paired-end reads
Classification down to genus/species-level
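The idea of comparing short sub-sequences from each read against a reference database can be sketched with a toy naive Bayes classifier in Python. This is a simplified illustration, not the RDP Classifier or the MiSeq Reporter implementation: the word length, the constant 0.5 smoothing term, the toy reference sequences, and all function names are assumptions made for the example.

```python
import math
from collections import defaultdict


def words(seq, k):
    """Set of all overlapping sub-sequences (words) of length k in seq."""
    return {seq[i:i + k] for i in range(len(seq) - k + 1)}


def train(references, k):
    """Count, per taxon, how many reference sequences contain each word."""
    word_counts = defaultdict(lambda: defaultdict(int))
    seq_counts = defaultdict(int)
    for taxon, seq in references:
        seq_counts[taxon] += 1
        for w in words(seq, k):
            word_counts[taxon][w] += 1
    return word_counts, seq_counts


def classify(read, word_counts, seq_counts, k):
    """Return the taxon with the highest summed log word probability."""
    best_taxon, best_score = None, -math.inf
    for taxon, n_seqs in seq_counts.items():
        score = 0.0
        for w in words(read, k):
            # Smoothed probability that a sequence of this taxon contains w.
            p = (word_counts[taxon].get(w, 0) + 0.5) / (n_seqs + 1)
            score += math.log(p)
        if score > best_score:
            best_taxon, best_score = taxon, score
    return best_taxon


# Toy reference entries; real 16S sequences are far longer.
references = [
    ("Escherichia", "ACGTACGTACGGTTACGTAA"),
    ("Bacillus", "TTGGCCAATTGGCCAATTGC"),
]
word_counts, seq_counts = train(references, k=4)
assigned = classify("ACGTACGGTT", word_counts, seq_counts, k=4)
```

Every word in the example read occurs in the first reference and none occur in the second, so the read is assigned to the first taxon.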
16S metagenomics in MiSeq Reporter 2.4
23. 23
Top 20 classification results
Ordered by Taxonomic level
New HTML Output in Metagenomics Workflow
24. 24
Read Stitch in MiSeq Reporter
≥ 10 bps
Read 1
Read 2
Stitch Read
MiSeq Reporter has a paired-end (PE) read stitching function.
Read 1 and Read 2 must overlap by at least 10 bp.
The base match score must be ≥ 0.9.
Base match score = 1 - [base mismatch rate]
Overlapping PE reads can be stitched into one read.
PE reads that cannot be stitched are converted to two single reads in the FASTQ file.
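A minimal Python sketch of the stitching rule above. It assumes that Read 2 is reverse-complemented before comparison, that the overlap with the highest match score is chosen (the longest one on ties), and that Read 1 bases are kept within the overlap. These choices and the function names are illustrative assumptions, not MiSeq Reporter's actual implementation.

```python
COMPLEMENT = str.maketrans("ACGTN", "TGCAN")


def reverse_complement(seq):
    """Reverse complement of a DNA sequence."""
    return seq.translate(COMPLEMENT)[::-1]


def stitch(read1, read2, min_overlap=10, min_score=0.9):
    """Return the stitched read, or None if the pair cannot be stitched."""
    r2 = reverse_complement(read2)
    best = None  # (score, overlap length)
    for overlap in range(min(len(read1), len(r2)), min_overlap - 1, -1):
        tail, head = read1[-overlap:], r2[:overlap]
        mismatches = sum(a != b for a, b in zip(tail, head))
        score = 1 - mismatches / overlap  # base match score
        if score >= min_score and (best is None or score > best[0]):
            best = (score, overlap)
    if best is None:
        return None
    return read1 + r2[best[1]:]


fragment = "ACGTTGCAAGGCTTAACCGGATCGATTACAGT"
read1 = fragment[:22]                       # forward read
read2 = reverse_complement(fragment[10:])   # reverse read, 12-base overlap
stitched = stitch(read1, read2)
```

A pair that returns `None` here corresponds to the case where both reads are written to the FASTQ file as separate single reads.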
26. 26
Illumina VariantStudio
Intuitive analysis and interpretation
Import
Data
Annotate Filter Classify Report
• Intuitive user interface
• Rich annotations
• Flexible and comprehensive set of filters
• Streamlined variant classification
• Easy and customizable report generation
Insight
27. 27
Illumina VariantStudio Workflow
Data in, biological knowledge out
Import VCF or gVCF Files
Illumina VariantStudio Desktop Client
Export Report of interpreted variants
VariantStudio
Annotation Database
28. 28
Annotation & Filtering
Leveraging a broad range of annotation sources to enrich data with
biological context
NHLBI Exome Variant Server
1,000,000s Detected Variants
10,000s Coding Variants
100s Deleterious Variants
Few Causal Variants
Big Data
Easy to validate
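The narrowing from detected variants to a few candidates can be sketched as successive filters in Python. The field names (`consequence`, `predicted_effect`), the category values, and the function name are hypothetical and do not reflect VariantStudio's data model or filter set.

```python
CODING = {"missense", "nonsense", "frameshift", "synonymous"}


def filter_funnel(variants, steps):
    """Apply filters in order and record how many variants survive each one."""
    remaining = list(variants)
    counts = [("detected", len(remaining))]
    for name, keep in steps:
        remaining = [v for v in remaining if keep(v)]
        counts.append((name, len(remaining)))
    return remaining, counts


# Illustrative annotated variants only.
variants = [
    {"id": "v1", "consequence": "intronic", "predicted_effect": "benign"},
    {"id": "v2", "consequence": "synonymous", "predicted_effect": "benign"},
    {"id": "v3", "consequence": "missense", "predicted_effect": "deleterious"},
    {"id": "v4", "consequence": "nonsense", "predicted_effect": "deleterious"},
]
steps = [
    ("coding", lambda v: v["consequence"] in CODING),
    ("deleterious", lambda v: v["predicted_effect"] == "deleterious"),
]
candidates, counts = filter_funnel(variants, steps)
```

Each step shrinks the list, which is the same pattern as the funnel above at a much smaller scale.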
29. 29
Clinical Panels and VariantStudio
Streamlined workflow from sample to report
Align +
Call Variant Annotate Filter
Generate
Report
Classify
Easy!! Accurate!! Rapid!!
32. 32
BaseSpace Creates a Sequencing Ecosystem
Accelerates Analysis and Sharing of Genomic Data
Electronic Medical Record
Medical History
Drugs & Immunization
Patient Schedule
Reference Content
Lab Data
Genomic Data
Diagnostic Images
Scanned Charts
App Space
Public Databases
33. 33
Runs and Projects
Run data is automatically sent to Projects in BaseSpace.
Runs and Projects have separate permissions.
Core labs will be able to transfer ownership of a project.
34. 34
Enrichment Apps Now Released on BaseSpace
Push-Button, Step by Step App Analysis
BWA Enrichment
ILLUMINA, INC
The core algorithms in the BWA Enrichment
workflow are the BWA Genome Alignment Software
and the GATK Variant Caller.
Isaac Enrichment
ILLUMINA, INC
The core algorithms in the Isaac
Enrichment workflow are the Isaac
Genome Alignment Software and the
Isaac Variant Caller.
Human hg19 only
Read length of at least 32 bp
Supports paired-end runs
Free
35. 35
Resequencing Analysis Apps on BaseSpace
Push-Button, Step by Step App Analysis
BWA Whole Genome Sequencing
ILLUMINA, INC.
BWA/GATK Whole Genome Sequencing processes
whole-genome sequencing data using BWA for
alignment and GATK for variant detection.
Isaac Whole Genome Sequencing v2
ILLUMINA, INC.
The Isaac Whole Genome Sequencing workflow
performs read mapping using Isaac Genome
Alignment Software and Isaac Variant Detection
(SNVs, small indels, copy number anomalies and
structural variations).
HiSeq Isaac Human WGS Workflow
ILLUMINA INC.
Isaac Genome Alignment Software and Isaac
Variant Caller for human samples.
Free
Free
Free
36. 36
Reference genomes for about 12 species are available for alignment
Read length: 21-150 bp
(Isaac: 35-150 bp)
Supports paired-end runs
Does not support mate-pair reads
Reports detected CNVs and structural variants
[VCF file]
Isaac & BWA Whole Genome Sequencing
ILLUMINA, INC
Whole genome Analysis Apps on BaseSpace
Push-Button, Step by Step App Analysis
37. 37
Tumor/Normal Paired Analysis Apps on BaseSpace
Push-Button, Step by Step App Analysis
Tumor Normal
ILLUMINA, INC
The Tumor/Normal Sequencing App is designed to detect somatic
variants from a tumor and matched normal sample pair
Supports human hg19 only
Read length: 50-150 bp
Supports paired-end runs
Recommended coverage: 40X for the normal sample and 80X for the tumor
Detects somatic mutations in the tumor
Free
38. 38
16S Metagenomics
ILLUMINA, INC.
The 16S Metagenomics app performs taxonomic
classification of 16S rRNA targeted amplicon reads
using an Illumina-curated version of the
GreenGenes taxonomic database.
16S Metagenomics App Now Released on BaseSpace
Push-Button, Step by Step App Analysis
Free
39. 39
De novo assembly Apps in BaseSpace
Push-Button, Step by Step App Analysis
Align, assemble & analyze reads
DNASTAR, INC.
DNASTAR software for comprehensive next-gen
sequence assembly and analysis.
Assemble bacteria de novo - FREE
DNASTAR, INC.
DNASTAR SeqMan NGen allows you to perform
de novo assembly of bacterial genome
sequences.
40. 40
SPAdes
ALGORITHMIC BIOLOGY LAB
SPAdes 3.0 - St. Petersburg Genome Assembler -
is intended for both standard isolates and single-
cell MDA bacterial assemblies.
BayesHammer + SPAdes
BayesHammer – read error correction tool, which works well on both single-cell and standard data sets.
SPAdes – iterative short-read genome assembly module; by default it iterates consecutively through a
set of k-mer lengths chosen according to the read length.
Supports MDA (multiple displacement
amplification) single-cell bacterial
assemblies
Supports paired-end reads, mate-pairs
and unpaired reads.
De novo assembly Apps in BaseSpace
Push-Button, Step by Step App Analysis
Free
41. 41
The de Bruijn Graph Algorithm
You should set the k-mer length
for your assemblies
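To show why the k-mer length matters, here is a minimal Python sketch of a de Bruijn graph built from reads, with a simple walk along unambiguous edges. It ignores sequencing errors, coverage, and reverse complements, and the function names and toy reads are assumptions for illustration; it is not how SPAdes or Velvet are implemented.

```python
from collections import defaultdict


def de_bruijn(reads, k):
    """Map each (k-1)-mer to the set of (k-1)-mers that follow it in a k-mer."""
    graph = defaultdict(set)
    for read in reads:
        for i in range(len(read) - k + 1):
            kmer = read[i:i + k]
            graph[kmer[:-1]].add(kmer[1:])
    return graph


def walk(graph, start):
    """Extend a contig from start while each node has exactly one successor."""
    contig, node, seen = start, start, {start}
    while len(graph.get(node, ())) == 1:
        (nxt,) = graph[node]
        if nxt in seen:  # stop on a cycle
            break
        contig += nxt[-1]
        seen.add(nxt)
        node = nxt
    return contig


reads = ["ACGTTGCA", "TTGCATGG"]

# k = 4: every node has a single successor, so the reads merge into one contig.
contig_k4 = walk(de_bruijn(reads, 4), "ACG")

# k = 3: the node "TG" has two successors, so the walk stops at the branch.
contig_k3 = walk(de_bruijn(reads, 3), "AC")
```

With the shorter k the graph branches and the contig is cut short, which is one reason the choice of k affects the assembly.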
42. 42
New RNA-seq End-to-End Analysis Apps in “BaseSpace”
Software: TopHat2 v2.0.7
Aligner: Bowtie 0.12.9
Assembly & Gene Expression: Cufflinks 2.1.1
Variant Caller: Isaac Variant Caller 2.0.5
Alignment Statistics: Picard tools 1.72
What can the app do?
A. Alignment to the hg19 human genome
B. FPKM values for genes or transcripts
C. Splice junction and fusion gene detection
D. cSNP discovery
E. Differentially expressed gene discovery
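As a reminder of what an FPKM value means, here is a minimal Python sketch of the basic definition: fragments per kilobase of transcript per million mapped fragments. Cufflinks estimates abundances with a statistical model rather than this direct formula, so the function and its parameter names are illustrative assumptions only.

```python
def fpkm(fragments, gene_length_bp, total_mapped_fragments):
    """Fragments per kilobase of transcript per million mapped fragments."""
    return fragments * 1e9 / (gene_length_bp * total_mapped_fragments)


# 500 fragments on a 2,000 bp gene in a run with 10 million mapped fragments.
value = fpkm(500, 2000, 10_000_000)
```

The factor of 1e9 combines the per-kilobase (1e3) and per-million (1e6) scaling.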
TopHat Alignment Cufflinks Assembly & DE
Free
43. 43
Supports 3 species (human, mouse, rat)
Can call gene fusions
Can only trim TruSeq adapters
New RNA-seq End-to-End Analysis Apps in “BaseSpace”
44. 44
Biological Interpretation for RNA-seq Data in BaseSpace
iPathwayGuide (Supports Human datasets only)
Free
ADVAITA BIO
An extension of the Cufflinks Assembly & DE workflow, iPathwayGuide will
perform the following analyses:
DE Gene Analysis
Gene Ontology Analysis for Biological Processes, Molecular Functions,
and Cellular Components
Pathway Analysis with Impact Analysis modeled on KEGG Pathways
Coherent Cascade Analysis on Pathways
Downstream Gene Perturbation Analysis
Drug Interaction Analysis
Disease Analysis based on enrichment
45. 45
Overview of the Core Apps for BaseSpace
BWA Enrichment
BWA Whole Genome Sequencing
Tumor Normal Paired
TopHat Alignment
Cufflinks Assembly & DE
46. 46
BaseSpace Onsite System
Easy to use, from sample to answer
Secure, safe, and local environment
Push-button data processing
Two 6-core CPUs with 128 GB RAM
Currently supports LIMS for the NextSeq 500 only!!
(HiSeq and MiSeq systems to be supported in the future)
RNA-seq
Exome-seq
Whole genome Analysis
Tumor & Normal Paired
47. 47
Summary
Workflow           MSR Local Version   BaseSpace Version
Amplicon-DS        2.4                 N/A
Assembly           2.4                 2.2
Enrichment         2.4                 2.2
Generate FASTQ     2.4                 2.2
Library QC         2.4                 2.2
Metagenomics       2.4                 2.2
PCR Amplicon       2.4                 2.2
Resequencing       2.4                 2.2
Small RNA          2.4                 2.2
Targeted RNA       2.4                 N/A
TruSeq Amplicon    2.4                 2.2
BaseSpace Dual Mode Replicates Analysis Locally on MiSeq
• Selectable option in MCS
• Allows customers to compare and evaluate MSR Local vs. BaseSpace
• Retains a local copy of all files for customers reluctant to rely on 100% remote storage