© 2011 Illumina, Inc. All rights reserved.
Illumina, illuminaDx, BeadArray, BeadXpress, cBot, CSPro, DASL, Eco, Genetic Energy, GAIIx, Genome Analyzer, GenomeStudio, GoldenGate, HiScan, HiSeq, Infinium, iSelect, MiSeq, Nextera,
Sentrix, Solexa, TruSeq, VeraCode, the pumpkin orange color, and the Genetic Energy streaming bases design are trademarks or registered trademarks of Illumina, Inc. All other brands and names
contained herein are the property of their respective owners.
Update of the Illumina Analysis Pipeline (LUGM)
顏海威 Henry Yen
Bioinformatics FAS
均泰生物科技有限公司
techsupport@gtbiotech.com.tw
Slide generated from Henry Yen
Course Objectives
By the end of this course, you will be familiar with:
• Illumina data analysis overview
• The workflows in MiSeq Reporter
• A powerful annotation tool: VariantStudio
• The Illumina iCloud: BaseSpace
Illumina Data Analysis Overview
Data Analysis Workflow
Primary Analysis → Secondary Analysis → Data Visualization
Primary and Secondary Analysis Overview
• Sequencing (MCS/NCS/HCS): outputs images (TIFF files)
• Primary analysis (RTA): outputs intensities and base calls
• Secondary analysis (MSR / BaseSpace): outputs alignments and variant calls
MiSeq Analysis Workflow
• Instrument Control Software (MCS): images and intensities
• RTA: base calls and quality scores
• MiSeq Reporter: alignment/FASTQ, variants and statistics, plus application-specific additional analysis (Resequencing, Enrichment, Amplicon, Small RNA, De novo Assembly, 16S Metagenomics); limited visualization via an HTTP interface
"I'm an all-in-one sequencer!"
Why We Use MiSeq Reporter
• Automatic – starts automatically after sequencing
• Simple – start-to-end workflow
• Powerful – supports the different analyses you need
• Friendly – graphical user interface
The Workflow in MiSeq Reporter
Workflows in MiSeq Reporter
Reference-based
• Whole genome: Resequencing, Library QC
• Targeted-Seq (capture-based): Enrichment
• Targeted-Seq (PCR-based): Amplicon, Amplicon-DS, PCR Amplicon, mtDNA
• RNA: Small RNA, Targeted RNA
Non-reference
• Assembly: De novo Assembly
• Taxonomy: Metagenomics
Resequencing Workflows
Resequencing workflow:
• Reads are aligned to the reference genome.
• Variants are called.
• Outputs FASTQ, BAM, VCF and gVCF files.
• Reports on-target rate, coverage and a variant summary.
Pipeline: Adapter Masking → Reads Demultiplexing → Alignment → Indel Realignment → Bin / Sort → Variant Calling → Report (with duplicate flagging)
Output files: FASTQ, BAM, VCF, PDF
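Adapter masking, the first step in every workflow in this section, can be sketched as follows. This is a minimal illustration, not MiSeq Reporter's code: it assumes that masking means replacing adapter read-through at the 3' end of a read with N, it uses exact matching only, and the adapter prefix is just an example value.

```python
def mask_adapter(seq: str, adapter: str, min_overlap: int = 3) -> str:
    """Replace adapter read-through at the 3' end of `seq` with 'N' (exact match only)."""
    if min_overlap < 1:
        raise ValueError("min_overlap must be at least 1")
    # Case 1: the full adapter occurs inside the read; mask it and everything after it.
    idx = seq.find(adapter)
    if idx != -1:
        return seq[:idx] + "N" * (len(seq) - idx)
    # Case 2: only a prefix of the adapter is present at the very end of the read.
    longest = min(len(adapter) - 1, len(seq))
    for k in range(longest, min_overlap - 1, -1):
        if seq.endswith(adapter[:k]):
            return seq[:-k] + "N" * k
    return seq  # no adapter found; read is unchanged


ADAPTER = "AGATCGGAAGAGC"  # example adapter prefix, assumed for illustration
print(mask_adapter("ACGTACGTAGATCGGAAGAGCTT", ADAPTER))
```

Masking (rather than trimming) keeps every read the same length, which is convenient for downstream tools that expect fixed-length reads; a production implementation would also tolerate sequencing errors inside the adapter.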
Library QC Workflows
Library QC workflow:
• Reads are aligned to the reference genome with BWA.
• No variant calling.
• Outputs FASTQ and BAM files.
Pipeline: Adapter Masking → Reads Demultiplexing → Alignment → Indel Realignment → Bin / Sort → Alignment Statistics (with duplicate flagging)
Output files: FASTQ, BAM
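Duplicate flagging appears in several of these workflows. A common approach, assumed here rather than taken from MiSeq Reporter's documentation, treats single-end reads that share chromosome, 5' position and strand as duplicates and keeps only the one with the highest mapping quality. The `Read` record is a hypothetical simplification of a BAM record.

```python
from __future__ import annotations
from dataclasses import dataclass


@dataclass
class Read:
    name: str
    chrom: str
    pos: int          # 5' alignment position (hypothetical simplified field)
    strand: str       # '+' or '-'
    mapq: int
    duplicate: bool = False


def flag_duplicates(reads: list[Read]) -> float:
    """Flag duplicates in place and return the duplicate rate."""
    best: dict[tuple[str, int, str], Read] = {}
    for read in reads:
        key = (read.chrom, read.pos, read.strand)
        kept = best.get(key)
        if kept is None:
            best[key] = read
        elif read.mapq > kept.mapq:
            kept.duplicate = True      # previously kept read becomes the duplicate
            best[key] = read
        else:
            read.duplicate = True
    n_dup = sum(r.duplicate for r in reads)
    return n_dup / len(reads) if reads else 0.0


reads = [Read("r1", "chr1", 100, "+", 60), Read("r2", "chr1", 100, "+", 30),
         Read("r3", "chr1", 100, "-", 60), Read("r4", "chr2", 500, "+", 60)]
rate = flag_duplicates(reads)
```

Flagged reads stay in the BAM file; downstream steps such as coverage and variant calling can simply skip them.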
Enrichment Workflows
Enrichment workflow:
• Analyzes data from probe-captured libraries.
• Reads are aligned to the targeted regions.
• Outputs FASTQ, BAM, VCF and gVCF files.
• Reports aligned rate, on-target rate, coverage and a variant summary.
Pipeline: Adapter Masking → Reads Demultiplexing → Alignment → Indel Realignment → Bin / Sort → Variant Calling → Targeted Statistics (with duplicate flagging)
Input: targeted regions
Output files: FASTQ, BAM, VCF, CSV
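The aligned rate and on-target rate reported by the Enrichment workflow are simple ratios. A minimal sketch, assuming alignments are summarized as (chrom, start, end) intervals in 0-based half-open coordinates, that a read is on-target if it overlaps any target by at least one base, and that the on-target denominator is aligned reads; MiSeq Reporter's exact definitions may differ.

```python
def enrichment_stats(total_reads, alignments, targets):
    """Return (aligned_rate, on_target_rate).

    alignments: list of (chrom, start, end) for aligned reads, 0-based half-open.
    targets: dict chrom -> list of (start, end) target regions.
    """
    def on_target(chrom, start, end):
        # Two half-open intervals overlap if each starts before the other ends.
        return any(start < t_end and t_start < end
                   for t_start, t_end in targets.get(chrom, []))

    aligned = len(alignments)
    on = sum(on_target(*aln) for aln in alignments)
    aligned_rate = aligned / total_reads if total_reads else 0.0
    on_target_rate = on / aligned if aligned else 0.0
    return aligned_rate, on_target_rate


targets = {"chr1": [(1000, 1200), (5000, 5300)]}
alignments = [("chr1", 950, 1100), ("chr1", 1200, 1300),
              ("chr1", 5100, 5250), ("chr2", 10, 160)]
print(enrichment_stats(5, alignments, targets))
```

A linear scan over targets keeps the sketch short; for whole-exome target lists an interval tree or a sorted sweep would be the usual choice.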
Amplicon Workflows
Amplicon workflow (TruSeq Amplicon):
• Analyzes data from short-range PCR amplicons.
• Reads are aligned to the targeted regions.
• Targets are custom designs from Illumina.
• Outputs FASTQ, BAM, VCF and gVCF files.
Pipeline: Adapter Masking → Reads Demultiplexing → Alignment → Indel Realignment → Bin / Sort → Variant Calling
Input: targeted regions
Output files: FASTQ, BAM, VCF, Excel file (viewable in Amplicon Viewer)
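Reads demultiplexing, also shared by every workflow here, assigns each read to a sample by its index sequence. A minimal sketch, assuming single-index reads and a tolerance of one mismatch (a common setting, assumed here); reads that match no sample, or more than one, go to an "Undetermined" bin.

```python
def hamming(a: str, b: str) -> int:
    """Number of mismatching positions between two equal-length strings."""
    return sum(x != y for x, y in zip(a, b))


def demultiplex(reads, samples, max_mismatches=1):
    """reads: iterable of (read_id, index_seq); samples: dict sample_name -> index_seq."""
    bins = {name: [] for name in samples}
    bins["Undetermined"] = []
    for read_id, index in reads:
        hits = [name for name, expected in samples.items()
                if len(expected) == len(index)
                and hamming(expected, index) <= max_mismatches]
        # Ambiguous (several hits) or unmatched reads are not assigned to a sample.
        target = hits[0] if len(hits) == 1 else "Undetermined"
        bins[target].append(read_id)
    return bins


samples = {"S1": "ACGTACGT", "S2": "TTGCAAGC"}
bins = demultiplex([("r1", "ACGTACGT"), ("r2", "ACGTACGA"), ("r3", "GGGGGGGG")], samples)
```

Index sets are normally designed so that any two indexes differ at three or more positions, which keeps a one-mismatch tolerance unambiguous.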
Amplicon-DS Workflows
Amplicon-DS workflow:
• Analyzes data from TruSight Tumor.
• Checks variants on both DNA strands.
• Filters false-positive variants in FFPE samples.
• Outputs FASTQ, BAM, VCF and gVCF files.
Pipeline: Adapter Masking → Reads Demultiplexing → Alignment → Indel Realignment → Bin / Sort → Variant Calling (Somatic) → Variant Filtering
Input: targeted regions
Output files: FASTQ, BAM, VCF
Two manifest files:
1. Downstream locus-specific oligos (DLSO)
2. Upstream locus-specific oligos (ULSO)
DNA deamination bias correction
• The Amplicon Double-Stranded workflow can remove the DNA deamination bias (C → T) in FFPE samples.
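The double-strand check can be illustrated as a filter that keeps a candidate variant only when it is supported independently by reads from both oligo pools, which target opposite strands. Deamination artifacts typically arise on one strand only, so they fail this test. This is a simplified sketch, not MiSeq Reporter's somatic caller: the per-pool read counts, field names and `min_reads` threshold are hypothetical.

```python
from dataclasses import dataclass


@dataclass
class Candidate:
    pos: int
    ref: str
    alt: str
    alt_reads_pool1: int   # supporting reads from the first pool (hypothetical field)
    alt_reads_pool2: int   # supporting reads from the second pool (hypothetical field)


def double_strand_filter(candidates, min_reads=3):
    """Split candidates into (passed, filtered) by requiring support in both pools."""
    passed, filtered = [], []
    for c in candidates:
        if c.alt_reads_pool1 >= min_reads and c.alt_reads_pool2 >= min_reads:
            passed.append(c)
        else:
            filtered.append(c)   # single-strand support: likely artifact
    return passed, filtered


passed, filtered = double_strand_filter([
    Candidate(100, "G", "A", 25, 22),   # seen in both pools: kept
    Candidate(200, "C", "T", 30, 0),    # C>T seen in one pool only: filtered
])
```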
PCR Amplicon Workflows
PCR Amplicon workflow:
• Analyzes data from long-range PCR.
• Reads are aligned to the targeted regions.
• Targets are designed by the customer.
• Outputs FASTQ, BAM, VCF and gVCF files.
Pipeline: Adapter Masking → Reads Demultiplexing → Alignment → Indel Realignment → Bin / Sort → Variant Calling (with duplicate flagging)
Input: targeted regions
Output files: FASTQ, BAM, VCF
mtDNA Workflows
mtDNA workflow:
• Designed for forensic applications.
• Reads are aligned to the rCRS (revised Cambridge Reference Sequence).
• Outputs FASTQ, BAM, viewer and Excel files.
• Can be used to trace maternal lineage.
Pipeline: Adapter Masking → Reads Demultiplexing → Alignment to rCRS → Bin / Sort → Viewer File Generation (shown in mtDNA Viewer)
Output files: FASTQ, BAM, Excel, viewer file
Small RNA Workflows
Small RNA workflow:
• Reads are aligned to miRBase with Bowtie.
• No variant calling.
• Outputs FASTQ, BAM, a pie chart and read counts per miRNA.
Pipeline: Adapter Masking → Reads Demultiplexing → Alignment → Bin / Sort → Read Counting
Output files: FASTQ, BAM, TXT
Targeted RNA Workflows
Targeted RNA workflow:
• Reads are aligned against the custom manifest file (banded Smith-Waterman).
• Reports relative expression of genes and isoforms across several samples.
• Outputs FASTQ, BAM and an HTML report.
Pipeline: Adapter Masking → Reads Demultiplexing → Alignment → Bin / Sort → Differential Expression Analysis
Output files: FASTQ, BAM, HTML
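Banded Smith-Waterman restricts the local-alignment dynamic-programming matrix to cells near the diagonal, which is cheap when a read is expected to align close to a known position in its manifest target. A minimal score-only sketch with linear gap penalties; the scores (match 2, mismatch -1, gap -2) and the band width are illustrative assumptions, not MiSeq Reporter's parameters.

```python
def banded_smith_waterman(a: str, b: str, band: int = 5,
                          match: int = 2, mismatch: int = -1, gap: int = -2) -> int:
    """Best local alignment score of a vs b, filling only cells with |i - j| <= band."""
    n, m = len(a), len(b)
    NEG = float("-inf")              # marks cells outside the band
    prev = [0] * (m + 1)             # row 0 is all zeros in local alignment
    best = 0
    for i in range(1, n + 1):
        cur = [NEG] * (m + 1)
        cur[0] = 0
        lo, hi = max(1, i - band), min(m, i + band)
        for j in range(lo, hi + 1):
            diag = prev[j - 1] + (match if a[i - 1] == b[j - 1] else mismatch)
            up = prev[j] + gap       # gap in b
            left = cur[j - 1] + gap  # gap in a
            cur[j] = max(0, diag, up, left)
            best = max(best, cur[j])
        prev = cur
    return best


print(banded_smith_waterman("ACGTTCGT", "ACGTACGT"))
```

With a band of width w the work is roughly proportional to len(a) × (2w + 1) instead of len(a) × len(b); the trade-off is that alignments whose offset exceeds the band are missed.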
De novo Assembly Workflows
De novo Assembly workflow:
• Assembles the data with Velvet.
• Assembles small (<20 Mb) genomes from reads without a genomic reference.
• Outputs FASTQ, FASTA and a dot plot.
Pipeline: Adapter Masking → Reads Demultiplexing → Assembly → Indel Realignment → Dot Plot
Output files: FASTQ, FASTA
Metagenomics Workflows
Metagenomics workflow:
• Analyzes bacterial populations based on 16S rRNA amplicons.
• Classifies reads against the current taxonomy database.
• Outputs FASTQ, FASTA and a pie chart of the classification.
Pipeline: Adapter Masking → Reads Demultiplexing → Read Classification → Current Taxonomy → Pie Chart
Output files: FASTQ, FASTA
16S Metagenomics in MiSeq Reporter 2.4
• Uses the Greengenes database 13.5 (May 2013) for taxonomic classification
– http://greengenes.lbl.gov/
– Illumina-curated version
– Entries with 16S length <1250 bp are filtered out
– Entries with incomplete annotation are filtered out
• Uses a Bayesian classification method to assign taxonomies
– RDP Naïve Bayesian Classifier (http://dx.doi.org/10.1128%2FAEM.00062-07)
– Short subsequences are extracted from each read and compared to the database by the classifier
• Uses full-length Illumina paired-end reads
• Classifies down to genus/species level
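The RDP Naïve Bayesian Classifier scores a read by its overlapping 8-base words: for each genus it estimates the probability of each word from the reference sequences, then picks the genus with the highest product of word probabilities. The sketch follows that idea in simplified form. The word size and pseudocount formula follow the published method as understood here and should be treated as assumptions; the bootstrap confidence step is omitted, and the class name and toy sequences are hypothetical.

```python
from __future__ import annotations
import math
from collections import defaultdict

K = 8  # word size assumed from the RDP method


def words(seq: str, k: int = K) -> set[str]:
    return {seq[i:i + k] for i in range(len(seq) - k + 1)}


class NaiveBayes16S:
    def __init__(self, training: dict[str, list[str]]):
        """training: genus name -> list of reference 16S sequences."""
        all_seqs = [s for seqs in training.values() for s in seqs]
        self.N = len(all_seqs)
        n_w = defaultdict(int)               # sequences in the whole set containing w
        for s in all_seqs:
            for w in words(s):
                n_w[w] += 1
        # Word prior P_w = (n(w) + 0.5) / (N + 1)
        self.prior = {w: (c + 0.5) / (self.N + 1) for w, c in n_w.items()}
        self.genera = {}
        for genus, seqs in training.items():
            m_w = defaultdict(int)           # sequences in this genus containing w
            for s in seqs:
                for w in words(s):
                    m_w[w] += 1
            self.genera[genus] = (m_w, len(seqs))

    def log_prob(self, genus: str, query_words) -> float:
        """Sum of log P(w | genus) = log((m(w) + P_w) / (M + 1))."""
        m_w, M = self.genera[genus]
        total = 0.0
        for w in query_words:
            p_w = self.prior.get(w, 0.5 / (self.N + 1))   # unseen word
            total += math.log((m_w.get(w, 0) + p_w) / (M + 1))
        return total

    def classify(self, read: str) -> str:
        q = words(read)
        return max(self.genera, key=lambda g: self.log_prob(g, q))


clf = NaiveBayes16S({"GenusA": ["ACGTACGTTGCA" * 3], "GenusB": ["TTGGCCAATTGG" * 3]})
```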
New HTML Output in the Metagenomics Workflow
• Top 20 classification results
• Ordered by taxonomic level
Read Stitching in MiSeq Reporter
MiSeq Reporter can stitch overlapping paired-end reads:
• Read 1 and Read 2 must overlap by at least 10 bp.
• The Bases Match Score over the overlap must be ≥ 0.9, where Bases Match Score = 1 - (base mismatch rate).
• Overlapping PE reads are stitched into one read.
• PE reads that cannot be stitched are written as two single reads in the FASTQ file.
[Figure: Read 1 and Read 2 overlapping by ≥ 10 bp, merged into one stitched read]
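The stitching rule above can be sketched as follows. This is a minimal illustration using the two thresholds from the slide; the scan order (longest overlap first) and keeping Read 1's base where the reads disagree are assumptions, since a real tool would weigh base qualities.

```python
COMPLEMENT = str.maketrans("ACGTN", "TGCAN")


def reverse_complement(seq: str) -> str:
    return seq.translate(COMPLEMENT)[::-1]


def stitch(read1: str, read2: str, min_overlap: int = 10, min_match: float = 0.9):
    """Return the stitched read, or None if the pair cannot be stitched."""
    r2 = reverse_complement(read2)   # Read 2 is sequenced from the opposite strand
    for overlap in range(min(len(read1), len(r2)), min_overlap - 1, -1):
        tail, head = read1[-overlap:], r2[:overlap]
        matches = sum(a == b for a, b in zip(tail, head))
        # Bases Match Score = 1 - mismatch rate = matches / overlap
        if matches / overlap >= min_match:
            # Keep Read 1's bases in the overlap (assumption; no quality weighting).
            return read1 + r2[overlap:]
    return None   # caller keeps the pair as two single reads


fragment = "AAAAAAAAAA" + "CGTACGTACG" + "TTTTTTTTTT"
read1 = fragment[:20]
read2 = reverse_complement(fragment[10:])
print(stitch(read1, read2))
```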
Powerful Annotation Tool
VariantStudio
Illumina VariantStudio
Intuitive analysis and interpretation
Import Data → Annotate → Filter → Classify → Report → Insight
• Intuitive user interface
• Rich annotations
• Flexible and comprehensive set of filters
• Streamlined variant classification
• Easy and customizable report generation
Illumina VariantStudio Workflow
Data in, biological knowledge out
Import VCF or gVCF files → Illumina VariantStudio Desktop Client (with the VariantStudio Annotation Database) → Export a report of interpreted variants
Annotation & Filtering
Leveraging a broad range of annotation sources (e.g., the NHLBI Exome Variant Server) to enrich data with biological context
1,000,000s of detected variants → 10,000s of coding variants → 100s of deleterious variants → a few causal variants
From big data to a short list that is easy to validate
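The funnel on this slide narrows detected variants to a few candidates by stacking annotation-based filters. A minimal sketch, assuming each variant already carries annotations; the field names (`consequence`, `pop_af`, `predicted_deleterious`), the set of coding consequences and the 1% frequency cut-off are hypothetical placeholders, not VariantStudio's schema or defaults.

```python
from dataclasses import dataclass
from typing import Optional

# Hypothetical set of consequence labels treated as "coding"
CODING = {"missense", "nonsense", "frameshift", "splice_site", "inframe_indel"}


@dataclass
class Variant:
    name: str
    consequence: str              # e.g. "missense", "intronic" (hypothetical annotation)
    pop_af: Optional[float]       # population allele frequency; None if not in databases
    predicted_deleterious: bool


def filter_funnel(variants, max_af=0.01):
    """Apply filters in sequence and record how many variants survive each step."""
    steps = [
        ("coding", lambda v: v.consequence in CODING),
        ("rare", lambda v: v.pop_af is None or v.pop_af < max_af),
        ("deleterious", lambda v: v.predicted_deleterious),
    ]
    counts = [("detected", len(variants))]
    for name, keep in steps:
        variants = [v for v in variants if keep(v)]
        counts.append((name, len(variants)))
    return variants, counts


kept, counts = filter_funnel([
    Variant("v1", "missense", 0.0001, True),
    Variant("v2", "intronic", None, False),
    Variant("v3", "missense", 0.25, True),
    Variant("v4", "nonsense", None, True),
    Variant("v5", "synonymous", 0.001, False),
])
```

Recording the count after each step reproduces the shape of the funnel for a report.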
Clinical Panels and VariantStudio
Streamlined workflow from sample to report
Align + Call Variants → Annotate → Filter → Classify → Generate Report
Easy! Accurate! Rapid!
Illumina iCloud: BaseSpace
The Illumina Analysis iCloud: BaseSpace
BaseSpace Creates a Sequencing Ecosystem
Accelerates analysis and sharing of genomic data
[Diagram: genomic data and App Space connected with public databases, electronic medical records, medical history, drugs & immunization, patient schedule, reference content, lab data, diagnostic images and scanned charts]
Runs and Projects
• Run data is automatically sent to Projects in BaseSpace.
• Runs and Projects have separate permissions.
• Core labs will be able to transfer ownership of a project.
Enrichment Apps Released on BaseSpace
Push-button, step-by-step app analysis
BWA Enrichment (Illumina, Inc.)
The core algorithms in the BWA Enrichment workflow are the BWA Genome Alignment Software and the GATK Variant Caller.
Isaac Enrichment (Illumina, Inc.)
The core algorithms in the Isaac Enrichment workflow are the Isaac Genome Alignment Software and the Isaac Variant Caller.
• Human hg19 only
• Read length of at least 32 bp
• Supports paired-end runs
Free
Resequencing Analysis Apps on BaseSpace
Push-button, step-by-step app analysis
• BWA Whole Genome Sequencing (Illumina, Inc.), free: processes whole-genome sequencing data using BWA for alignment and GATK for variant detection.
• Isaac Whole Genome Sequencing v2 (Illumina, Inc.), free: performs read mapping with the Isaac Genome Alignment Software and variant detection with Isaac Variant Detection (SNVs, small indels, copy number anomalies and structural variations).
• HiSeq Isaac Human WGS Workflow (Illumina, Inc.), free: Isaac Genome Alignment Software and Isaac Variant Caller for human samples.
Whole Genome Analysis Apps on BaseSpace
Push-button, step-by-step app analysis
Isaac & BWA Whole Genome Sequencing (Illumina, Inc.)
• Reference genomes for about 12 species are available for alignment
• Read length 21–150 bp (Isaac: 35–150 bp)
• Supports paired-end runs
• Does not support mate-pair reads
• Reports detected CNVs and structural variants [VCF file]
Tumor/Normal Paired Analysis Apps on BaseSpace
Push-button, step-by-step app analysis
Tumor Normal (Illumina, Inc.), free
The Tumor/Normal Sequencing App is designed to detect somatic variants from a tumor and matched normal sample pair.
• Human hg19 only
• Read length 50–150 bp
• Supports paired-end runs
• Recommended depth: 40X for the normal sample and 80X for the tumor
• Detects somatic mutations in the tumor
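The recommended depths translate into a read budget through the usual mean-coverage relation C = N × L / G, for N reads of length L over a genome of size G. A rough sketch, assuming whole-genome sequencing of a genome of about 3.1 Gb with 2 × 150 bp paired-end reads, and ignoring duplicates, unmapped reads and uneven coverage (all assumptions for illustration).

```python
import math


def read_pairs_needed(coverage: float, genome_size: int, read_length: int) -> int:
    """Paired-end read pairs needed for a target mean depth, from C = N * L / G."""
    bases_needed = coverage * genome_size
    return math.ceil(bases_needed / (2 * read_length))   # each pair yields 2 reads


GENOME_SIZE = 3_100_000_000   # approximate human genome size, assumed
normal_pairs = read_pairs_needed(40, GENOME_SIZE, 150)
tumor_pairs = read_pairs_needed(80, GENOME_SIZE, 150)
print(normal_pairs, tumor_pairs)
```

In practice the budget is padded upward to compensate for duplicates and reads that fail to align.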
16S Metagenomics Apps Released on BaseSpace
Push-button, step-by-step app analysis
16S Metagenomics (Illumina, Inc.), free
The 16S Metagenomics app performs taxonomic classification of 16S rRNA targeted amplicon reads using an Illumina-curated version of the Greengenes taxonomic database.
De novo Assembly Apps in BaseSpace
Push-button, step-by-step app analysis
• Align, assemble & analyze reads (DNASTAR, Inc.): DNASTAR software for comprehensive next-gen sequence assembly and analysis.
• Assemble bacteria de novo (DNASTAR, Inc.), free: DNASTAR SeqMan NGen allows you to perform de novo assembly of bacterial genome sequences.
De novo Assembly Apps in BaseSpace
Push-button, step-by-step app analysis
SPAdes (Algorithmic Biology Lab), free
SPAdes 3.0 (St. Petersburg Genome Assembler) is intended for both standard isolates and single-cell MDA bacterial assemblies.
BayesHammer + SPAdes
• BayesHammer: read error correction tool that works well on both single-cell and standard data sets.
• SPAdes: iterative short-read genome assembly module; by default it iterates through a set of k-mer lengths chosen according to the read length.
• Supports MDA (multiple displacement amplification) single-cell bacterial assemblies
• Supports paired-end reads, mate pairs and unpaired reads
The Algorithm for the de Bruijn Graph
You should set the k-mer size for your assemblies.
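A minimal sketch of the idea, not the SPAdes or Velvet implementation: each read is cut into k-mers, and each k-mer becomes an edge from its (k-1)-mer prefix to its (k-1)-mer suffix; a contig is spelled by walking unambiguous edges. The walk here is simplified and only checks out-degree. The choice of k matters: a smaller k connects reads with short overlaps but collapses repeats longer than k, while a larger k resolves more repeats but needs higher coverage to keep the graph connected.

```python
from collections import defaultdict


def de_bruijn(reads, k):
    """Map each (k-1)-mer to the set of (k-1)-mers that follow it in some read."""
    graph = defaultdict(set)
    for read in reads:
        for i in range(len(read) - k + 1):
            kmer = read[i:i + k]
            graph[kmer[:-1]].add(kmer[1:])
    return graph


def walk_unambiguous(graph, start):
    """Extend a contig while the current node has exactly one successor."""
    contig, node, seen = start, start, {start}
    while len(graph.get(node, ())) == 1:
        (nxt,) = graph[node]
        if nxt in seen:
            break            # stop on cycles (e.g. tandem repeats shorter than k)
        contig += nxt[-1]
        seen.add(nxt)
        node = nxt
    return contig


graph = de_bruijn(["ACGTGCA", "GTGCATT"], k=4)
print(walk_unambiguous(graph, "ACG"))
```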
New RNA-seq End-to-End Analysis Apps in BaseSpace
• Software: TopHat2 v2.0.7
• Aligner: Bowtie 0.12.9
• Assembly & gene expression: Cufflinks 2.1.1
• Variant caller: Isaac Variant Caller 2.0.5
• Alignment statistics: Picard tools 1.72
What can the apps do?
A. Alignment to the hg19 human genome
B. FPKM values for genes or transcripts
C. Splice junction and gene fusion detection
D. cSNP discovery
E. Differential expression gene discovery
Apps: TopHat Alignment, Cufflinks Assembly & DE (free)
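FPKM normalizes a fragment count by transcript length (per kilobase) and sequencing depth (per million mapped fragments): FPKM = fragments × 10^9 / (length in bp × total mapped fragments). The sketch below is only this textbook formula with made-up numbers; Cufflinks itself estimates fragment assignments and effective transcript lengths statistically, so its values will not match a naive count.

```python
def fpkm(fragments: int, length_bp: int, total_mapped: int) -> float:
    """Fragments Per Kilobase of transcript per Million mapped fragments."""
    return fragments * 1e9 / (length_bp * total_mapped)


# Hypothetical gene: 500 fragments on a 2 kb transcript, 10 million mapped fragments.
print(fpkm(500, 2000, 10_000_000))
```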
New RNA-seq End-to-End Analysis Apps in BaseSpace
• Supports 3 species (human, mouse, rat)
• Can call gene fusions
• Can trim only TruSeq adapters
Biological Interpretation for RNA-seq Data in BaseSpace
iPathwayGuide (Advaita Bio), free; supports human datasets only
An extension of the Cufflinks Assembly & DE workflow, iPathwayGuide performs the following analyses:
• DE gene analysis
• Gene Ontology analysis for biological processes, molecular functions and cellular components
• Pathway analysis with impact analysis modeled on KEGG pathways
• Coherent cascade analysis on pathways
• Downstream gene perturbation analysis
• Drug interaction analysis
• Disease analysis based on enrichment
Overview of the Core Apps for BaseSpace
• BWA Enrichment
• BWA Whole Genome Sequencing
• Tumor Normal Paired
• TopHat Alignment
• Cufflinks Assembly & DE
BaseSpace Onsite System
• Easy to use from sample to answer
• Secure, safe and local environment
• Push-button data processing
Hardware: two 6-core CPUs with 128 GB RAM
Currently supports LIMS for the NextSeq 500 only (support for HiSeq and MiSeq systems planned)
Supported analyses:
• RNA-seq
• Exome-seq
• Whole genome analysis
• Tumor & normal paired
Summary
Workflow          MSR Local Version   BaseSpace Version
Amplicon-DS       2.4                 N/A
Assembly          2.4                 2.2
Enrichment        2.4                 2.2
Generate FASTQ    2.4                 2.2
Library QC        2.4                 2.2
Metagenomics      2.4                 2.2
PCR Amplicon      2.4                 2.2
Resequencing      2.4                 2.2
Small RNA         2.4                 2.2
Targeted RNA      2.4                 N/A
TruSeq Amplicon   2.4                 2.2

BaseSpace Dual Mode: Replicates Analysis Locally on MiSeq
• Selectable option in MCS
• Allows customers to compare and evaluate MSR Local vs. BaseSpace
• Retains a local copy of all files for customers reluctant to rely 100% on remote storage
Questions?
...or tired?

LUGM-Update of the Illumina Analysis Pipeline

  • 1. © 2011 Illumina, Inc. All rights reserved. Illumina, illuminaDx, BeadArray, BeadXpress, cBot, CSPro, DASL, Eco, Genetic Energy, GAIIx, Genome Analyzer, GenomeStudio, GoldenGate, HiScan, HiSeq, Infinium, iSelect, MiSeq, Nextera, Sentrix, Solexa, TruSeq, VeraCode, the pumpkin orange color, and the Genetic Energy streaming bases design are trademarks or registered trademarks of Illumina, Inc. All other brands and names contained herein are the property of their respective owners. Update of the Illumina Analysis Pipeline 顏海威 Henry Yen Bioinformatics FAS 均泰生物科技有限公司 techsupport@gtbiotech.com.tw Slidegeneratedfrom HenryYen
  • 2. 2 Course Objectives By the end of this course, you will be able to: Illumina Data Analysis Overview The Workflow in MiSeq Reporter Powerful Annotation Tool - VariantStudio Illumina iCloud - BaseSpace Slidegeneratedfrom HenryYen
  • 3. 3 Illumina Data Analysis Overview Slidegeneratedfrom HenryYen
  • 4. 4 Data Visualization Secondary Analysis Primary Analysis Data Analysis Workflow Slidegeneratedfrom HenryYen
  • 5. 5 Alignments and Variant Detection Images/TIFF files Base CallingIntensities Outputs Outputs Primary and Secondary Analysis Overview Analysis Type Primary Analysis (RTA) Secondary Analysis (MSR / BaseSpace) Sequencing (MCS/NCS/HCS) Slidegeneratedfrom HenryYen
  • 6. 6 MiSeq Analysis Workflow RTA Resequencing Amplicon Small RNA De novo Assembly 16S Metagenomics Base calls & Quality Scores Instrument Control Software (MCS) Images and Intensities Limited Visualization via HTTP interface Application-specific additional analysis Alignment/FASTQ, Variants, Statistics Enrichment MiSeq Reporter I’m All-in-One Sequencer Slidegeneratedfrom HenryYen
  • 7. 7 Why We use the MiSeq Reporter Automatic – Auto start after sequencing Simply – Start-to-end workflow Powerful – Support different analysis required Friendly – Graphical User Interface Slidegeneratedfrom HenryYen
  • 8. 8 The Workflow in MiSeq Reporter Slidegeneratedfrom HenryYen
  • 9. 9 Workflows from MiSeq Reporter AssemblyCapture-based Taxonomy Reference Non Reference Whole genome Targeted-Seq PCR-based Resequencing Library QC Enrichment Amplicon Amplicon-DS PCR-Amplicon mtDNA RNA Small RNA Targeted-RNA De novo Assembly Metagenomics MiSeq Reporter Slidegeneratedfrom HenryYen
  • 10. 10 Resequencing Workflows Adapter Masking Reads Demultiplexing Enrichment workflow:  Reads are aligned to reference genome.  Variants are noted  Output the fastq, .bam, .vcf, .gVCF  Report the on-targeted rate, coverage & variants summary Alignment Indel Realignment Bin / Sort Variants Calling Report Fastq file BAM file VCF file PDF file Duplicated Flag Resequencing Slidegeneratedfrom HenryYen
  • 11. 11 Library QC Workflows Adapter Masking Reads Demultiplexing PCR Amplicon workflow:  Analyzed the data by BWA.  Reads are aligned to reference genome.  Non Variants calling  Output the fastq, .bam, Alignment Indel Realignment Bin / Sort Alignment Statistics Fastq file BAM file Duplicated Flag LibraryQC Slidegeneratedfrom HenryYen
  • 12. 12 Enrichment Workflows Adapter Masking Reads Demultiplexing Enrichment workflow:  Reads are aligned to targeted region.  Analyzed data from probe captured  Output the fastq, .bam, .vcf, .gVCF  Report the aligned rate, on-targeted rate, coverage & variants summary Alignment Indel Realignment Bin / Sort Variants Calling Targeted Statistics Fastq file BAM file VCF file CSV file Duplicated Flag Targeted Region Enrichment Slidegeneratedfrom HenryYen
  • 13. 13 Amplicon Workflows Adapter Masking Reads Demultiplexing Amplicon workflow:  Analyzed the data from short-range PCR.  Reads are aligned to targeted region.  Customer targeted design from Illumina  Output the fastq, .bam, .vcf, .gVCF Alignment Indel Realignment Bin / Sort Variants Calling Fastq file BAM file VCF file Targeted Region TruSeq Amplicon Amplicon Viewer Excel file Slidegeneratedfrom HenryYen
  • 14. 14 Amplicon-DS Workflows Adapter Masking Reads Demultiplexing Amplicon-DS workflow:  Analyzed the data from TruSight Tumor.  Variants check by double strand.  Filtering FFPE sample false-positive variants  Output the fastq, .bam, .vcf, .gVCF Alignment Indel Realignment Bin / Sort Variants Calling (Somatic) Fastq file BAM file VCF file Targeted Region Variants filtering Amplicon-DS Slidegeneratedfrom HenryYen
  • 15. 15 Two manifest file : 1. downstream locus-specific oligos (DLSO) 2. upstream locus-specific oligos (ULSO) The DNA Deamination bias corrected  The Amplicon Double-Stranded workflow can remove the FFPE sample DNA deamination bias (C -> T)Slidegeneratedfrom HenryYen
  • 16. 16 PCR Amplicon Workflows Adapter Masking Reads Demultiplexing PCR Amplicon workflow:  Analyzed the data from long-range PCR.  Reads are aligned to targeted region.  Targeted design by customer  Output the fastq, .bam, .vcf, .gVCF Alignment Indel Realignment Bin / Sort Variants Calling Fastq file BAM file VCF file Targeted Region Duplicated Flag PCR AmpliconSlidegeneratedfrom HenryYen
  • 17. 17 mtDNA Workflows Adapter Masking Reads Demultiplexing mtDNA workflow:  Analyzed the data by forensic.  Reads are aligned to rRCS.  Output the fastq, .bam, viewer file & excel file  It can be used to trace maternal lineage Alignment with rRCS Bin / Sort Show by mtDNA viewer Fastq file BAM file Excel file Viewer file generated Viewer file mtDNA Slidegeneratedfrom HenryYen
  • 18. 18 Small RNA Workflows Adapter Masking Reads Demultiplexing Small RNA workflow:  Analyzed the data by Bowtie.  Reads are aligned to miRBase.  Non Variants calling  Output the fastq, .bam, pi chart & reads count for miRNA Alignment Bin / Sort Reads count Fastq file BAM file TXT file Small RNASlidegeneratedfrom HenryYen
  • 19. 19 Targeted RNA Workflows Adapter Masking Reads Demultiplexing Targeted RNA workflow:: Reads are aligned against custom manifest file (banded Smith-Waterman) Reports relative expression of genes and isoforms between several samples Outputs: FASTQ, BAM, HTML report Alignment Bin / Sort Different Expression Analysis Fastq file BAM file HTML file Targeted RNA Slidegeneratedfrom HenryYen
  • 20. 20 De novo assembly Workflows Adapter Masking Reads Demultiplexing De novo Assembly workflow:  The data Assembly by Velvet.  Assembly of small (<20MB) genome from reads, without the use of a genomic reference  Output the fastq, .fasta & dot plot Assembly Indel Realignment Dot plot Fastq file Fasta file De Novo Assembly Slidegeneratedfrom HenryYen
  • 21. 21 Metagenomics Workflows Adapter Masking Reads Demultiplexing Metagenomics workflow:  Bacteria population analysis based on 16S rRNA amplicons .  Assembly of small (<20MB) genome from reads, without the use of a genomic reference  Output the fastq, .fasta & dot plot Reads Classification Current Taxonomy Pi chart Fastq file Fasta file Metagenomics Slidegeneratedfrom HenryYen
  • 22. 22  Greengenes database 13.5 (May 2013) to perform taxonomic classification – http://greengenes.lbl.gov/ – Illumina-curated version – Filter entries with 16S length <1250 bp – Filter entries with incomplete annotation  Bayesian classification method to assign taxonomies  RDP Naïve Bayesian Classifier (http://dx.doi.org/10.1128%2FAEM.00062-07)  Short sub-sequences are extracted from each read and compared to the  database by the classifier  Uses full length Illumina paired-end reads  Classification down to genus/species-level 16S metagenomics in MiSeq Reporter 2.4 Slidegeneratedfrom HenryYen
  • 23. 23  Top 20 classification results  Ordered by Taxonomic level New HTML Output in Metagenomics Workflow Slidegeneratedfrom HenryYen
  • 24. 24 Read Stitch in MiSeq Reporter ≥ 10 bps Read 1 Read 2 Stitch Read MiSeq Reporter has the PE reads stitch function  Read 1 and Read 2 have minimum 10 bps overlapping  Bases Match Score need ≥ 0.9  Bases Match Score = 1- [Base Mismatch Rate]  Overlapping PE reads can be stitch one read.  Cannot be stitched PE reads are converted to two single reads in the FASTQfile. Slidegeneratedfrom HenryYen
  • 26. 26 Illumina VariantStudio Intuitive analysis and interpretation Import Data Annotate Filter Classify Report • Intuitive user interface • Rich annotations • Flexible and comprehensive set of filters • Streamlined variant classification • Easy and customizable report generation Insight Slidegeneratedfrom HenryYen
  • 27. 27 Illumina VariantStudio Workflow Data in, biological knowledge out Import VCF or gVCF Files Illumina VariantStudio Desktop ClientIllumina VariantStudio Desktop Client Export Report of interpreted variants VariantStudio Annotation Database Slidegeneratedfrom HenryYen
  • 28. Annotation & Filtering: leveraging a broad range of annotation sources (e.g. the NHLBI Exome Variant Server) to enrich data with biological context. Filtering funnel: 1,000,000s of detected variants → 10,000s of coding variants → 100s of deleterious variants → a few causal variants, turning big data into a short list that is easy to validate.
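The funnel above amounts to applying successive filters and counting what survives each one. The sketch below uses hypothetical annotation fields (`consequence`, `deleterious`, `pop_af`), an assumed set of coding consequence terms and an assumed 1% allele-frequency cutoff; VariantStudio's real filter names and thresholds are configured in its user interface and are not reproduced here.

```python
from dataclasses import dataclass


@dataclass
class Variant:
    gene: str
    consequence: str   # hypothetical field, e.g. "missense_variant"
    deleterious: bool  # hypothetical field, e.g. from a SIFT/PolyPhen-style prediction
    pop_af: float      # hypothetical field: population allele frequency (e.g. NHLBI EVS)


# Assumed set of coding consequence terms (Sequence Ontology names).
CODING = {"missense_variant", "synonymous_variant", "stop_gained", "frameshift_variant"}


def funnel(variants, max_af=0.01):
    """Apply filters in order; return surviving variants and per-stage counts."""
    stages = [
        ("coding", lambda v: v.consequence in CODING),
        ("deleterious", lambda v: v.deleterious),
        ("rare", lambda v: v.pop_af < max_af),
    ]
    remaining = list(variants)
    counts = [("detected", len(remaining))]
    for name, keep in stages:
        remaining = [v for v in remaining if keep(v)]
        counts.append((name, len(remaining)))
    return remaining, counts
```

Keeping the per-stage counts makes it easy to see which filter removes the most variants, which mirrors the funnel on the slide.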
  • 29. Clinical Panels and VariantStudio: a streamlined workflow from sample to report: Align + Call Variants → Annotate → Filter → Classify → Generate Report. Easy, correct and rapid.
  • 31. The Illumina Analysis iCloud: BaseSpace
  • 32. BaseSpace Creates a Sequencing Ecosystem: accelerates analysis and sharing of genomic data. (Diagram: genomic data in BaseSpace, connected to App Space and public databases, alongside the electronic medical record: medical history, drugs & immunization, patient schedule, reference content, lab data, diagnostic images and scanned charts.)
  • 33. Runs and Projects: run data is automatically sent to Projects in BaseSpace. Runs and Projects have separate permissions. Core labs will be able to transfer ownership of a project.
  • 34. Enrichment Apps Released on BaseSpace Now (push-button, step-by-step app analysis; free). BWA Enrichment (Illumina, Inc.): the core algorithms in the BWA Enrichment workflow are the BWA Genome Alignment Software and the GATK Variant Caller. Isaac Enrichment (Illumina, Inc.): the core algorithms in the Isaac Enrichment workflow are the Isaac Genome Alignment Software and the Isaac Variant Caller. Requirements: human hg19 only; read length of at least 32 bp; paired-end runs supported.
  • 35. Resequencing Analysis Apps on BaseSpace (push-button, step-by-step app analysis; all free). BWA Whole Genome Sequencing (Illumina, Inc.): processes whole-genome sequencing data using BWA for alignment and GATK for variant detection. Isaac Whole Genome Sequencing v2 (Illumina, Inc.): performs read mapping with the Isaac Genome Alignment Software and variant detection (SNVs, small indels, copy number anomalies and structural variations) with Isaac Variant Detection. HiSeq Isaac Human WGS Workflow (Illumina, Inc.): Isaac Genome Alignment Software and Isaac Variant Caller for human samples.
  • 36. Whole Genome Analysis Apps on BaseSpace: Isaac & BWA Whole Genome Sequencing (Illumina, Inc.; push-button, step-by-step app analysis). Reference genomes for about 12 species are available for alignment; read length 21–150 bp (35–150 bp for Isaac); paired-end runs supported; mate-pair reads not supported; detected CNVs and structural variants are reported in a VCF file.
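The read-length and run-type limits above can be expressed as a simple pre-flight check before launching an app. The sketch below is hypothetical: the dictionary keys are illustrative labels rather than BaseSpace identifiers, and the limits are copied from this slide (21–150 bp for BWA, 35–150 bp for Isaac, mate-pair reads not supported).

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class AppLimits:
    min_read_length: int
    max_read_length: int
    supports_mate_pair: bool


# Limits as listed on the slide; keys are illustrative labels, not BaseSpace IDs.
WGS_APPS = {
    "BWA Whole Genome Sequencing": AppLimits(21, 150, False),
    "Isaac Whole Genome Sequencing": AppLimits(35, 150, False),
}


def check_run(app, read_length, library_type):
    """Return a list of problems; an empty list means the run fits the app's limits.

    library_type is one of 'single', 'paired-end' or 'mate-pair'.
    """
    limits = WGS_APPS[app]
    problems = []
    if not limits.min_read_length <= read_length <= limits.max_read_length:
        problems.append(
            f"read length {read_length} bp outside "
            f"{limits.min_read_length}-{limits.max_read_length} bp"
        )
    if library_type == "mate-pair" and not limits.supports_mate_pair:
        problems.append("mate-pair reads are not supported")
    return problems
```

For example, a 2 × 30 bp paired-end run fits the BWA limits but falls below Isaac's 35 bp minimum.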
  • 37. Tumor/Normal Paired Analysis Apps on BaseSpace (push-button, step-by-step app analysis; free). Tumor Normal (Illumina, Inc.): the Tumor/Normal Sequencing App is designed to detect somatic variants from a tumor and matched normal sample pair. Requirements: human hg19 only; read length 50–150 bp; paired-end runs supported; recommended coverage 40X for the normal sample and 80X for the tumor. Output: somatic mutations detected in the tumor.
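The recommended depths translate into sequencing output through the usual mean-coverage relation C = N × L / G (N reads of length L over a genome of size G). The sketch below assumes whole-genome sequencing and ignores duplicates, unmapped reads and uneven coverage, so real runs need extra headroom; the ~3.1 Gb human genome size in the usage note is an approximation, and `reads_needed` is a hypothetical helper.

```python
import math


def reads_needed(target_coverage, genome_size_bp, read_length_bp):
    """Reads required for a mean coverage C, from C = N * L / G  =>  N = C * G / L."""
    return math.ceil(target_coverage * genome_size_bp / read_length_bp)
```

For example, `reads_needed(80, 3_100_000_000, 150)` gives about 1.65 billion 150 bp reads (roughly 827 million read pairs) for the 80X tumor sample, and half that for the 40X normal sample.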
  • 38. 16S Metagenomics Apps Released on BaseSpace Now (push-button, step-by-step app analysis; free). 16S Metagenomics (Illumina, Inc.): performs taxonomic classification of 16S rRNA targeted amplicon reads using an Illumina-curated version of the Greengenes taxonomic database.
  • 39. De novo Assembly Apps in BaseSpace (push-button, step-by-step app analysis). Align, assemble & analyze reads (DNASTAR, Inc.): DNASTAR software for comprehensive next-gen sequence assembly and analysis. Assemble bacteria de novo (DNASTAR, Inc.; free): DNASTAR SeqMan NGen allows you to perform de novo assembly of bacterial genome sequences.
  • 40. De novo Assembly Apps in BaseSpace (push-button, step-by-step app analysis; free). SPAdes (Algorithmic Biology Lab): SPAdes 3.0 (St. Petersburg genome assembler) is intended for both standard isolates and single-cell MDA bacterial assemblies. BayesHammer + SPAdes: BayesHammer is a read error correction tool that works well on both single-cell and standard data sets; SPAdes is an iterative short-read genome assembly module that, by default, iterates consecutively through a set of k-mer length values chosen according to the read length. Supports MDA (multiple displacement amplification) single-cell bacterial assemblies, and supports paired-end reads, mate-pairs and unpaired reads.
  • 41. The Algorithm for the de Bruijn Graph: you need to set the k-mer size for your assemblies.
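To see why the k-mer size matters, here is a minimal de Bruijn graph assembler: each distinct k-mer becomes an edge from its (k-1)-mer prefix to its (k-1)-mer suffix, and the sequence is read back off an Eulerian path found with Hierholzer's algorithm. This is a textbook sketch for error-free reads, not how SPAdes works internally (SPAdes adds error correction, multiple k values and graph simplification); it discards k-mer multiplicities, assumes a non-empty graph that has an Eulerian path, and the function names are hypothetical.

```python
from collections import defaultdict


def de_bruijn(reads, k):
    """Nodes are (k-1)-mers; each distinct k-mer adds one edge prefix -> suffix."""
    kmers = set()
    for read in reads:
        for i in range(len(read) - k + 1):
            kmers.add(read[i:i + k])
    graph = defaultdict(list)
    for kmer in sorted(kmers):
        graph[kmer[:-1]].append(kmer[1:])
    return graph


def eulerian_path(graph):
    """Hierholzer's algorithm; assumes an Eulerian path exists."""
    in_deg = defaultdict(int)
    for targets in graph.values():
        for v in targets:
            in_deg[v] += 1
    # Start at the node with one more outgoing than incoming edge, if any.
    start = next((u for u in graph if len(graph[u]) - in_deg[u] == 1), next(iter(graph)))
    adj = {u: list(vs) for u, vs in graph.items()}
    stack, path = [start], []
    while stack:
        u = stack[-1]
        if adj.get(u):
            stack.append(adj[u].pop())  # follow an unused edge
        else:
            path.append(stack.pop())    # dead end: add node to the path
    return path[::-1]


def assemble(reads, k):
    """Spell the sequence along the Eulerian path: first node, then one base per edge."""
    path = eulerian_path(de_bruijn(reads, k))
    return path[0] + "".join(node[-1] for node in path[1:])
```

If the genome contains a repeated (k-1)-mer, the graph branches or loops at that node and the path is no longer guaranteed to be unique; a larger k (within the read length) resolves shorter repeats, which is one reason SPAdes iterates over several k values.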
  • 42. New RNA-seq End-to-End Analysis Apps in BaseSpace: TopHat Alignment and Cufflinks Assembly & DE (free). Software: TopHat2 v2.0.7; aligner: Bowtie 0.12.9; assembly & gene expression: Cufflinks 2.1.1; variant caller: Isaac Variant Caller 2.0.5; alignment statistics: Picard tools 1.72. What can the apps do? A. Alignment to the hg19 human genome. B. FPKM values for genes or transcripts. C. Splice junction and gene fusion detection. D. cSNP finding. E. Differentially expressed gene discovery.
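For item B, FPKM (fragments per kilobase of transcript per million mapped fragments) normalizes a fragment count by feature length in kilobases and by sequencing depth in millions of mapped fragments: FPKM = count × 10^9 / (length in bp × total mapped fragments). The sketch below applies that textbook formula directly; Cufflinks itself estimates fragment counts with a statistical model and uses an effective transcript length, so its values will not match this naive calculation exactly. `fpkm` is a hypothetical helper.

```python
def fpkm(fragment_count, length_bp, total_mapped_fragments):
    """FPKM = count * 1e9 / (length_bp * total_mapped_fragments)."""
    return fragment_count * 1e9 / (length_bp * total_mapped_fragments)
```

For example, 1,000 fragments on a 2 kb transcript in a library of 1 million mapped fragments gives an FPKM of 500.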
  • 43. New RNA-seq End-to-End Analysis Apps in BaseSpace: support 3 species (human, mouse, rat); can call gene fusions; can only trim TruSeq adapters.
  • 44. Biological Interpretation for RNA-seq Data in BaseSpace. iPathwayGuide (Advaita Bio; free; supports human datasets only): an extension of the Cufflinks Assembly & DE workflow, iPathwayGuide performs the following analyses: DE gene analysis; Gene Ontology analysis for biological processes, molecular functions and cellular components; pathway analysis with impact analysis modeled on KEGG pathways; coherent cascade analysis on pathways; downstream gene perturbation analysis; drug interaction analysis; disease analysis based on enrichment.
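Several of these analyses (Gene Ontology, disease) rest on over-representation testing. A common generic approach is the one-sided hypergeometric test sketched below: given N genes in total, K of them annotated to a term, and n differentially expressed genes of which k fall in the term, it computes P(X ≥ k). This illustrates the general technique only and is not iPathwayGuide's documented method (its pathway impact analysis in particular goes beyond simple enrichment); the function and parameter names are illustrative.

```python
from math import comb


def enrichment_p_value(total_genes, genes_in_term, de_genes, de_genes_in_term):
    """P(X >= de_genes_in_term) for X ~ Hypergeometric(N=total_genes, K=genes_in_term, n=de_genes)."""
    N, K, n, k = total_genes, genes_in_term, de_genes, de_genes_in_term
    upper = min(n, K)
    # Sum the exact integer numerators first, then divide once by C(N, n).
    numerator = sum(comb(K, i) * comb(N - K, n - i) for i in range(k, upper + 1))
    return numerator / comb(N, n)
```

Summing integer binomial terms before dividing keeps the tail probability exact up to the final division; in practice the resulting p-values would also be corrected for the many terms tested.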
  • 45. Overview of the Core Apps for BaseSpace: BWA Enrichment, BWA Whole Genome Sequencing, Tumor Normal Paired, TopHat Alignment, Cufflinks Assembly & DE.
  • 46. BaseSpace Onsite System: easy to use from sample to answer; a secure, safe and local environment; push-button data processing. Hardware: two 6-core CPUs with 128 GB RAM. LIMS is currently available for the NextSeq 500 only (support for the HiSeq and MiSeq systems planned). Supported analyses: RNA-seq, exome-seq, whole-genome analysis, and tumor/normal paired analysis.
  • 47. Summary

    Workflow           MSR Local Version   BaseSpace Version
    Amplicon – DS      2.4                 N/A
    Assembly           2.4                 2.2
    Enrichment         2.4                 2.2
    Generate FASTQ     2.4                 2.2
    Library QC         2.4                 2.2
    Metagenomics       2.4                 2.2
    PCR Amplicon       2.4                 2.2
    Resequencing       2.4                 2.2
    Small RNA          2.4                 2.2
    Targeted RNA       2.4                 N/A
    TruSeq Amplicon    2.4                 2.2

    BaseSpace Dual Mode replicates analysis locally on the MiSeq: it is a selectable option in MCS, allows customers to compare and evaluate MSR Local vs. BaseSpace, and retains a local copy of all files for customers reluctant to rely 100% on remote storage.