LUGM-Update of the Illumina Analysis Pipeline

© 2011 Illumina, Inc. All rights reserved.
Illumina, illuminaDx, BeadArray, BeadXpress, cBot, CSPro, DASL, Eco, Genetic Energy, GAIIx, Genome Analyzer, GenomeStudio, GoldenGate, HiScan, HiSeq, Infinium, iSelect, MiSeq, Nextera,
Sentrix, Solexa, TruSeq, VeraCode, the pumpkin orange color, and the Genetic Energy streaming bases design are trademarks or registered trademarks of Illumina, Inc. All other brands and names
contained herein are the property of their respective owners.
Update of the Illumina
Analysis Pipeline
顏海威 Henry Yen
Bioinformatics FAS
均泰生物科技有限公司
techsupport@gtbiotech.com.tw
Slidegeneratedfrom
HenryYen

Course Objectives
By the end of this course, you will be able to:
Illumina Data Analysis Overview
The Workflow in MiSeq Reporter
Powerful Annotation Tool - VariantStudio
Illumina iCloud - BaseSpace
Illumina Data Analysis Overview
Data Visualization
Secondary Analysis
Primary Analysis
Data Analysis Workflow
Primary and Secondary Analysis Overview
Analysis Type: Sequencing (MCS/NCS/HCS), Primary Analysis (RTA), Secondary Analysis (MSR / BaseSpace)
Outputs: Images/TIFF files, Intensities, Base Calling, Alignments and Variant Detection
MiSeq Analysis Workflow
Sequencer ("I'm All-in-One")
Instrument Control Software (MCS): Images and Intensities
RTA: Base calls & Quality Scores
MiSeq Reporter: Alignment/FASTQ, Variants, Statistics
Workflows: Resequencing, Amplicon, Small RNA, De novo Assembly, 16S Metagenomics, Enrichment
Application-specific additional analysis
Limited Visualization via HTTP interface
Why We Use MiSeq Reporter
Automatic
– Starts automatically after sequencing
Simple
– Start-to-end workflow
Powerful
– Supports the different analyses required
Friendly
– Graphical user interface
The Workflow in MiSeq Reporter
Workflows from MiSeq Reporter
Assembly
Capture-based
Taxonomy
Reference
Non-Reference
Whole genome
Targeted-Seq
PCR-based
Resequencing
Library QC
Enrichment
Amplicon
Amplicon-DS
PCR-Amplicon
mtDNA
RNA
Small RNA
Targeted-RNA
De novo
Assembly
Metagenomics
MiSeq Reporter
Resequencing Workflows
Adapter Masking
Reads Demultiplexing
Resequencing workflow:
• Reads are aligned to the reference genome.
• Variants are called.
• Outputs FASTQ, BAM, VCF, and gVCF files.
• Reports the on-target rate, coverage, and a variant summary.
Alignment
Indel Realignment
Bin / Sort
Variant Calling
Report
FASTQ file
BAM file
VCF file
PDF file
Duplicate Flagging
Resequencing
Library QC Workflows
Adapter Masking
Library QC workflow:
• Data are analyzed with BWA.
• Reads are aligned to the reference genome.
• No variant calling.
• Outputs FASTQ and BAM files.
Alignment
Indel Realignment
Bin / Sort
Alignment Statistics
FASTQ file
BAM file
Duplicate Flagging
LibraryQC
Enrichment Workflows
Adapter Masking
Enrichment workflow:
• Reads are aligned to the targeted regions.
• Analyzes data from probe-captured libraries.
• Reports the alignment rate, on-target rate, coverage, and a variant summary.
Alignment
Indel Realignment
Bin / Sort
Variant Calling
Targeted Statistics
FASTQ file
BAM file
VCF file
CSV file
Duplicate Flagging
Targeted Region
Enrichment
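As a rough illustration of the on-target rate reported by the Enrichment workflow, the sketch below counts the fraction of aligned reads that overlap at least one targeted region. The function names, the (chrom, start, end) tuple layout, and the half-open coordinates are assumptions made for this example; MiSeq Reporter's exact definition (for instance, any padding around targets) is not stated on the slide.

```python
def overlaps(read, target):
    """True if two (chrom, start, end) half-open intervals share at least one base."""
    return read[0] == target[0] and read[1] < target[2] and read[2] > target[1]


def on_target_rate(reads, targets):
    """Fraction of aligned reads that overlap at least one target region."""
    if not reads:
        return 0.0
    hits = sum(1 for read in reads if any(overlaps(read, t) for t in targets))
    return hits / len(reads)


targets = [("chr1", 100, 200)]          # made-up target region
reads = [
    ("chr1", 150, 250),                 # overlaps the target
    ("chr1", 300, 400),                 # same chromosome, off target
    ("chr2", 150, 180),                 # different chromosome
    ("chr1", 90, 101),                  # overlaps the target by a single base
]
print(on_target_rate(reads, targets))   # 0.5
```

The pairwise comparison is quadratic and only meant for reading; a real implementation would typically index the target regions first.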
Amplicon Workflows
Adapter Masking
Amplicon workflow:
• Analyzes data from short-range PCR.
• Custom target design from Illumina.
Alignment
Indel Realignment
Bin / Sort
Variant Calling
FASTQ file
BAM file
VCF file
Targeted Region
TruSeq Amplicon
Amplicon Viewer
Excel file
Amplicon-DS Workflows
Adapter Masking
Amplicon-DS workflow:
• Analyzes data from TruSight Tumor.
• Variants are checked on both strands.
• Filters false-positive variants from FFPE samples.
Alignment
Indel Realignment
Bin / Sort
Variant Calling
(Somatic)
FASTQ file
BAM file
VCF file
Targeted Region
Variant Filtering
Amplicon-DS
DNA Deamination Bias Correction
Two manifest files:
1. Downstream locus-specific oligos (DLSO)
2. Upstream locus-specific oligos (ULSO)
• The Amplicon Double-Stranded workflow can remove the DNA deamination bias (C -> T) of FFPE samples.
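One way to picture the double-strand check: a variant is kept only when it is called independently in both strand-specific pools, so a C -> T deamination artifact seen on one strand only is dropped. This is a simplified sketch, not MiSeq Reporter's actual filter; the function name and the (chrom, pos, ref, alt) tuples are hypothetical, and the coordinates are made up.

```python
def concordant_variants(pool_a, pool_b):
    """Keep only the variants that were called in both strand-specific pools."""
    return sorted(set(pool_a) & set(pool_b))


# Hypothetical calls, reported in reference (forward-strand) coordinates.
pool_a = [("chr1", 1000, "C", "T"), ("chr2", 2000, "G", "A")]
pool_b = [("chr2", 2000, "G", "A")]

# The chr1 C -> T call is supported by one pool only and is filtered out.
print(concordant_variants(pool_a, pool_b))
```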
PCR Amplicon Workflows
Adapter Masking
PCR Amplicon workflow:
• Analyzes data from long-range PCR.
• Target design by the customer.
Alignment
Indel Realignment
Bin / Sort
Variant Calling
FASTQ file
BAM file
VCF file
Targeted Region
Duplicate Flagging
PCR Amplicon
mtDNA Workflows
Adapter Masking
mtDNA workflow:
• Analyzes data for forensic applications.
• Reads are aligned to the rCRS (revised Cambridge Reference Sequence).
• Outputs FASTQ, BAM, viewer, and Excel files.
• Can be used to trace maternal lineage.
Alignment with rCRS
Bin / Sort
Show in mtDNA viewer
FASTQ file
BAM file
Excel file
Viewer file generated
Viewer file
mtDNA
Small RNA Workflows
Adapter Masking
Small RNA workflow:
• Data are analyzed with Bowtie.
• Reads are aligned to miRBase.
• No variant calling.
• Outputs FASTQ and BAM files, a pie chart, and miRNA read counts.
Alignment
Bin / Sort
Read Counting
FASTQ file
BAM file
TXT file
Small RNA
Targeted RNA Workflows
Adapter Masking
Targeted RNA workflow:
• Reads are aligned against a custom manifest file (banded Smith-Waterman).
• Reports relative expression of genes and isoforms between several samples.
• Outputs FASTQ, BAM, and an HTML report.
Alignment
Bin / Sort
Differential Expression Analysis
FASTQ file
BAM file
HTML file
Targeted RNA
De novo Assembly Workflows
Adapter Masking
De novo Assembly workflow:
• Data are assembled with Velvet.
• Assembles small (<20 Mb) genomes from reads, without the use of a genomic reference.
• Outputs FASTQ and FASTA files and a dot plot.
Assembly
Indel Realignment
Dot plot
FASTQ file
FASTA file
De Novo Assembly
Metagenomics Workflows
Adapter Masking
Metagenomics workflow:
• Bacterial population analysis based on 16S rRNA amplicons.
• Assembly of small (<20 Mb) genomes from reads, without the use of a genomic reference.
• Outputs FASTQ and FASTA files and a dot plot.
Reads Classification
Current Taxonomy
Pie chart
FASTQ file
FASTA file
Metagenomics
16S Metagenomics in MiSeq Reporter 2.4
• Greengenes database 13.5 (May 2013) is used to perform taxonomic classification
– http://greengenes.lbl.gov/
– Illumina-curated version
– Entries with 16S length <1250 bp are filtered out
– Entries with incomplete annotation are filtered out
• Bayesian classification method to assign taxonomies
• RDP Naïve Bayesian Classifier (http://dx.doi.org/10.1128%2FAEM.00062-07)
• Short sub-sequences are extracted from each read and compared to the database by the classifier
• Uses full-length Illumina paired-end reads
• Classification down to genus/species level
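To make the idea of classifying reads by their short sub-sequences concrete, here is a toy naive Bayes classifier over k-mers. It is a minimal sketch, not the RDP Classifier or the MiSeq Reporter implementation: the function names, the tiny reference set, and the word size k=4 are assumptions for illustration, and the smoothing only loosely follows the scheme described in the cited RDP paper.

```python
from collections import defaultdict
from math import log


def kmers(seq, k):
    """Set of distinct k-base words in a sequence."""
    return {seq[i:i + k] for i in range(len(seq) - k + 1)}


def train(references, k):
    """references: list of (taxon, sequence) pairs."""
    total = len(references)
    word_count = defaultdict(int)                        # sequences containing each word
    taxon_size = defaultdict(int)                        # sequences per taxon
    taxon_word = defaultdict(lambda: defaultdict(int))   # per-taxon word counts
    for taxon, seq in references:
        taxon_size[taxon] += 1
        for word in kmers(seq, k):
            word_count[word] += 1
            taxon_word[taxon][word] += 1
    return total, word_count, taxon_size, taxon_word


def classify(read, model, k):
    """Return the taxon with the highest summed log-probability over the read's words."""
    total, word_count, taxon_size, taxon_word = model
    best, best_score = None, float("-inf")
    for taxon, size in taxon_size.items():
        score = 0.0
        for word in kmers(read, k):
            prior = (word_count.get(word, 0) + 0.5) / (total + 1)
            score += log((taxon_word[taxon].get(word, 0) + prior) / (size + 1))
        if score > best_score:
            best, best_score = taxon, score
    return best


# Made-up reference sequences for two hypothetical genera.
references = [("GenusA", "ACGTACGTACGTAAAA"), ("GenusB", "TTGGCCAATTGGCCAA")]
model = train(references, k=4)
print(classify("ACGTACGT", model, k=4))
```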
New HTML Output in Metagenomics Workflow
• Top 20 classification results
• Ordered by taxonomic level
Read Stitching in MiSeq Reporter
≥ 10 bp
Read 1
Read 2
Stitched Read
MiSeq Reporter can stitch paired-end (PE) reads
• Read 1 and Read 2 must overlap by at least 10 bp
• The base match score must be ≥ 0.9
• Base match score = 1 - [base mismatch rate]
• Overlapping PE reads are stitched into one read.
• PE reads that cannot be stitched are written as two single reads in the FASTQ file.
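The stitching rule above can be sketched as follows. This is an illustrative reading of the slide, not MiSeq Reporter's code: the function names are hypothetical, Read 2 is assumed to arrive as sequenced (so it is reverse-complemented first), and the tie-break (highest score, then longest overlap) is an assumption the slide does not specify.

```python
COMPLEMENT = str.maketrans("ACGTN", "TGCAN")


def reverse_complement(seq):
    return seq.translate(COMPLEMENT)[::-1]


def stitch(read1, read2, min_overlap=10, min_score=0.9):
    """Merge a read pair into one read, or return None if no overlap qualifies."""
    mate = reverse_complement(read2)
    best = None  # (score, overlap length)
    for overlap in range(min_overlap, min(len(read1), len(mate)) + 1):
        tail, head = read1[-overlap:], mate[:overlap]
        mismatches = sum(a != b for a, b in zip(tail, head))
        score = 1 - mismatches / overlap      # base match score
        if score >= min_score and (best is None or (score, overlap) > best):
            best = (score, overlap)
    if best is None:
        return None                           # keep as two single reads
    return read1 + mate[best[1]:]


# Made-up 30 bp fragment sequenced as two 20 bp reads with a 10 bp overlap.
fragment = "ACGGACGACATTGCATGCCATAGGCTAAGC"
read1 = fragment[:20]
read2 = reverse_complement(fragment[10:])
print(stitch(read1, read2) == fragment)
```

A production implementation would also carry the quality scores through the overlap; this sketch only handles the bases.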
Powerful Annotation Tool
VariantStudio
Illumina VariantStudio
Intuitive analysis and interpretation
Import Data → Annotate → Filter → Classify → Report
• Intuitive user interface
• Rich annotations
• Flexible and comprehensive set of filters
• Streamlined variant classification
• Easy and customizable report generation
Insight
Illumina VariantStudio Workflow
Data in, biological knowledge out
Import VCF or gVCF Files
Illumina VariantStudio Desktop Client
Export Report of interpreted variants
VariantStudio
Annotation Database
Annotation & Filtering
Leveraging a broad range of annotation sources to enrich data with biological context
NHLBI Exome Variant Server
1,000,000s Detected Variants
10,000s Coding Variants
100s Deleterious Variants
Few Causal Variants
Big Data
Easy to validate
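The narrowing from detected variants to a handful of candidates can be pictured as a chain of filters over annotated records. The sketch below is only an illustration: the field names (coding, deleterious, population_freq), the 1% frequency cutoff, and the records themselves are hypothetical and do not reflect VariantStudio's actual data model or defaults.

```python
def filter_funnel(variants, max_population_freq=0.01):
    """Apply successive filters; return per-step counts and the surviving candidates."""
    coding = [v for v in variants if v["coding"]]
    deleterious = [v for v in coding if v["deleterious"]]
    candidates = [v for v in deleterious if v["population_freq"] <= max_population_freq]
    counts = {
        "detected": len(variants),
        "coding": len(coding),
        "deleterious": len(deleterious),
        "candidates": len(candidates),
    }
    return counts, candidates


# Made-up annotated variants.
variants = [
    {"id": "v1", "coding": True, "deleterious": True, "population_freq": 0.001},
    {"id": "v2", "coding": True, "deleterious": True, "population_freq": 0.2},
    {"id": "v3", "coding": True, "deleterious": False, "population_freq": 0.0},
    {"id": "v4", "coding": False, "deleterious": False, "population_freq": 0.0},
]
counts, candidates = filter_funnel(variants)
print(counts)
```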
Clinical Panels and VariantStudio
Streamlined workflow from sample to report
Align + Call Variants → Annotate → Filter → Classify → Generate Report
Easy! Accurate! Rapid!
Illumina iCloud
BaseSpace
The Illumina Analysis iCloud: BaseSpace
BaseSpace Creates a Sequencing Ecosystem
Accelerates Analysis and Sharing of Genomic Data
Electronic Medical Record
Medical History
Drugs & Immunization
Patient Schedule
Reference Content
Lab Data
Genomic Data
Diagnostic Images
Scanned Charts
App Space
Public Databases
Runs and Projects
Run data is automatically sent to Projects in BaseSpace
Runs and Projects have separate permissions
Core labs will be able to transfer ownership of a project
Enrichment Apps Now Released on BaseSpace
Push-Button, Step-by-Step App Analysis
BWA Enrichment
ILLUMINA, INC.
The core algorithms in the BWA Enrichment workflow are the BWA Genome Alignment Software and the GATK Variant Caller.
Isaac Enrichment
ILLUMINA, INC.
The core algorithms in the Isaac Enrichment workflow are the Isaac Genome Alignment Software and the Isaac Variant Caller.
• Human hg19 only
• Read length of at least 32 bp
• Supports paired-end runs
Free
Resequencing Analysis Apps on BaseSpace
BWA Whole Genome Sequencing
ILLUMINA, INC.
BWA/GATK Whole Genome Sequencing processes whole-genome sequencing data using BWA for alignment and GATK for variant detection.
Isaac Whole Genome Sequencing v2
ILLUMINA, INC.
The Isaac Whole Genome Sequencing workflow performs read mapping using Isaac Genome Alignment Software and Isaac Variant Detection (SNVs, small indels, copy number anomalies and structural variations).
HiSeq Isaac Human WGS Workflow
ILLUMINA, INC.
Isaac Genome Alignment Software and Isaac Variant Caller for human samples.
Free (all three apps)
Whole Genome Analysis Apps on BaseSpace
Isaac & BWA Whole Genome Sequencing
ILLUMINA, INC.
• Reference genomes for about 12 species are available for alignment
• Read length 21-150 bp (Isaac: 35-150 bp)
• Supports paired-end runs
• Does not support mate-pair reads
• Detects CNVs and structural variants (VCF file)
Tumor/Normal Paired Analysis Apps on BaseSpace
Tumor Normal
ILLUMINA, INC.
The Tumor/Normal Sequencing App is designed to detect somatic variants from a tumor and matched normal sample pair.
• Supports human hg19 only
• Read length 50-150 bp
• Supports paired-end runs
• Recommended coverage: 40X for the normal sample and 80X for the tumor
• Detects somatic mutations in the tumor
Free
16S Metagenomics App Now Released on BaseSpace
16S Metagenomics
ILLUMINA, INC.
The 16S Metagenomics app performs taxonomic classification of 16S rRNA targeted amplicon reads using an Illumina-curated version of the GreenGenes taxonomic database.
Free
De novo Assembly Apps in BaseSpace
Align, assemble & analyze reads
DNASTAR, INC.
DNASTAR software for comprehensive next-gen sequence assembly and analysis.
Assemble bacteria de novo - FREE
DNASTAR, INC.
DNASTAR SeqMan NGen allows you to perform de novo assembly of bacterial genome sequences.
De novo Assembly Apps in BaseSpace
SPAdes
ALGORITHMIC BIOLOGY LAB
SPAdes 3.0 (St. Petersburg Genome Assembler) is intended for both standard isolates and single-cell MDA bacterial assemblies.
BayesHammer + SPAdes
BayesHammer: read error correction tool that works well on both single-cell and standard data sets.
SPAdes: iterative short-read genome assembly module; by default it iterates consecutively through a set of k-mer lengths chosen according to read length.
• Supports MDA (multiple displacement amplification) single-cell bacterial assemblies
• Supports paired-end reads, mate pairs, and unpaired reads
Free
The de Bruijn Graph Algorithm
You should set the k-mer size for your assemblies
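To show why the k-mer size matters, here is a minimal sketch of building a de Bruijn graph from reads: every k-mer contributes an edge from its first k-1 bases to its last k-1 bases. The function name and the toy read are assumptions for illustration; assemblers such as Velvet and SPAdes add error correction, graph simplification, and much more on top of this idea.

```python
from collections import defaultdict


def de_bruijn(reads, k):
    """Map each (k-1)-mer to the list of (k-1)-mers that follow it in some read."""
    graph = defaultdict(list)
    for read in reads:
        for i in range(len(read) - k + 1):
            kmer = read[i:i + k]
            graph[kmer[:-1]].append(kmer[1:])   # edge: prefix -> suffix
    return dict(graph)


# Made-up read; k=3 gives nodes of length 2.
graph = de_bruijn(["ACGTAC"], k=3)
for node, successors in graph.items():
    print(node, "->", successors)
```

A smaller k yields more overlaps between reads but collapses repeats into the same nodes, while a larger k resolves more repeats at the cost of needing higher coverage.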
New RNA-seq End-to-End Analysis Apps in “BaseSpace”
Software: TopHat2 v2.0.7
Aligner: Bowtie 0.12.9
Assembly & Gene Expression: Cufflinks 2.1.1
Variant Caller: Isaac Variant Caller 2.0.5
Alignment Statistics: Picard tools 1.72
What can the apps do?
A. Alignment to the hg19 human genome
B. FPKM values for genes or transcripts
C. Splice junction and fusion gene detection
D. cSNP discovery
E. Discovery of differentially expressed genes
TopHat Alignment, Cufflinks Assembly & DE
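For reference, FPKM (fragments per kilobase of transcript per million mapped fragments) can be computed from its basic definition as in the sketch below. The function name and the numbers are made up for illustration; Cufflinks estimates abundances with a statistical model, so its reported values will not generally equal this simple ratio.

```python
def fpkm(fragments, length_bp, total_mapped_fragments):
    """Fragments per kilobase of transcript per million mapped fragments."""
    return fragments * 1e9 / (length_bp * total_mapped_fragments)


# Hypothetical gene: 500 fragments, 2,000 bp long, 10 million mapped fragments in the sample.
print(fpkm(500, 2_000, 10_000_000))   # 25.0
```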
Free
New RNA-seq End-to-End Analysis Apps in “BaseSpace”
• Supports 3 species (human, mouse, rat)
• Can call gene fusions
• Can only trim TruSeq adapters
Biological Interpretation for RNA-seq Data in BaseSpace
iPathwayGuide (Supports Human datasets only)
ADVAITA BIO
An extension of the Cufflinks Assembly & DE workflow, iPathwayGuide will perform the following analyses:
• DE Gene Analysis
• Gene Ontology Analysis for Biological Processes, Molecular Functions, and Cellular Components
• Pathway Analysis with Impact Analysis modeled on KEGG Pathways
• Coherent Cascade Analysis on Pathways
• Downstream Gene Perturbation Analysis
• Drug Interaction Analysis
• Disease Analysis based on enrichment
Free
Overview of the Core Apps for BaseSpace
BWA Enrichment
BWA Whole Genome Sequencing
Tumor Normal Paired
TopHat Alignment
Cufflinks Assembly & DE
BaseSpace Onsite System
• Easy to use, from sample to answer
• Secure, safe, local environment
• Push-button data processing
Two 6-core CPUs with 128 GB RAM
LIMS is currently available only for the NextSeq 500
(HiSeq and MiSeq systems will be supported in the future)
• RNA-seq
• Exome-seq
• Whole genome analysis
• Tumor & normal paired
Summary
Workflow          MSR Local Version   BaseSpace Version
Amplicon-DS       2.4                 N/A
Assembly          2.4                 2.2
Enrichment        2.4                 2.2
Generate FASTQ    2.4                 2.2
Library QC        2.4                 2.2
Metagenomics      2.4                 2.2
PCR Amplicon      2.4                 2.2
Resequencing      2.4                 2.2
Small RNA         2.4                 2.2
Targeted RNA      2.4                 N/A
TruSeq Amplicon   2.4                 2.2
BaseSpace Dual Mode Replicates Analysis Locally on MiSeq
• Selectable option in MCS
• Allows customers to compare and evaluate MSR Local vs. BaseSpace
• Retains a local copy of all files for customers reluctant to rely on 100% remote storage
Questions?
…..or Tired?