Your SlideShare is downloading. ×
0
Sample & Assay Technologies

Next Generation Sequencing:
Data analysis for genetic profiling
Ravi Vijaya Satya, Ph.D.
Seni...
Sample & Assay Technologies

Welcome to the three-part webinar series
Next Generation Sequencing and its role in cancer bi...
Sample & Assay Technologies

Agenda

NGS Data Analysis
Read Mapping
Variant Calling
Variant Annotation
Targeted Enrichment...
Sample & Assay Technologies

Read Mapping
Reads mapped to a reference genome

Millions of reads
from a single run

Alignme...
Sample & Assay Technologies

Variant Calling
Determine if there is enough statistical support to call a variant

Reference...
Sample & Assay Technologies

Variant Representation
VCF – Variant Call Format

http://www.1000genomes.org/wiki/Analysis/Va...
Variant Annotation

Sample & Assay Technologies

dbSNP/COSMIC ID
Chro
m
chr1
chr1
chr1
chr1
chr1
chr1
chr1
chr1
chr1
chr1
...
Sample & Assay Technologies

GeneRead DNAseq Gene Panel: Targeted Sequencing

What is targeted sequencing?
Sequencing a su...
Sample & Assay Technologies

Target Enrichment - Methodology

Multiplex PCR
Small DNA input (< 100ng)
Short processing tim...
Sample & Assay Technologies

Variants Identifiable through Multiplex PCR

SNPs – single nucleotide polymorphisms
Indels
In...
Sample & Assay Technologies

GeneRead DNAseq Gene Panel

Multiplex PCR technology based targeted enrichment for DNA sequen...
Sample & Assay Technologies

GeneRead DNAseq Gene Panel
Focus on your Disease of Interest
Comprehensive Cancer Panel (124 ...
Sample & Assay Technologies

GeneRead DNAseq Custom Panel

13
Sample & Assay Technologies

Data Analysis for Targeted Sequencing
GeneRead data analysis work flow

Read
Mapping

Primer
...
Sample & Assay Technologies

Reads from Targeted Sequencing

Typical NGS raw read from targeted sequencing
Adapter

Barcod...
Sample & Assay Technologies

Read Mapping
Align reads to the reference genome

Reference sequence

Amplicon 1

Amplicon 2
...
Primer Trimming

Sample & Assay Technologies

Primer sequences must be trimmed for accurate variant calling
Reference sequ...
Sample & Assay Technologies

GeneRead Variant Calling Overview

BAM file
(w/ flow space info)

BED file for
amplicons used...
Sample & Assay Technologies

Indel realignment

DePristo MA, et al. A framework for variation discovery and genotyping usi...
Sample & Assay Technologies

Base Quality Recalibration

DePristo MA, et al. A framework for variation discovery and genot...
Sample & Assay Technologies

Variant Filtration

Variant Frequency
Somatic mode
SNPs with frequency < 4% and indels with f...
Sample & Assay Technologies

Specificity Analysis

Specificity: the percentage of sequences that map to the intended targe...
Sample & Assay Technologies

Sequencing Depth

Coverage depth (or depth of coverage): how many times each base has been se...
Sample & Assay Technologies

NGS Data Analysis: Uniformity

Coverage uniformity: measure the evenness of the coverage dept...
Sample & Assay Technologies

GeneRead Data Analysis Web Portal

FREE Complete & Easy to use Data Analysis with Web-based S...
Sample & Assay Technologies

GeneRead Data Analysis Web Portal

Title, Location, Date

26
Sample & Assay Technologies

Title, Location, Date

27
Sample & Assay Technologies

Title, Location, Date

28
Sample & Assay Technologies

Note: Runtimes depend on the number of reads in the input files. Typical runtimes are 20-60 m...
Sample & Assay Technologies

Title, Location, Date

30
Sample & Assay Technologies

Title, Location, Date

31
Sample & Assay Technologies

Title, Location, Date

32
Sample & Assay Technologies

Summary

Run Summary
Specificity
Coverage
Uniformity
Numbers of SNPs and Indels

Summary By G...
Sample & Assay Technologies

Features of Variant Report

SNP detection
Indel detection

34
Sample & Assay Technologies

QIAGEN’s GeneRead DNAseq Gene Panel System
FOCUS ON YOUR RELEVANT GENES
Focused:
Biologically...
Sample & Assay Technologies

Upcoming webinars
Next Generation Sequencing and its role in cancer biology

Webinar 3:
Date:...
Upcoming SlideShare
Loading in...5
×

Ngs part iii 2013

188

Published on

0 Comments
0 Likes
Statistics
Notes
  • Be the first to comment

  • Be the first to like this

No Downloads
Views
Total Views
188
On Slideshare
0
From Embeds
0
Number of Embeds
0
Actions
Shares
0
Downloads
12
Comments
0
Likes
0
Embeds 0
No embeds

No notes for slide

Transcript of "Ngs part iii 2013"

  1. 1. Sample & Assay Technologies Next Generation Sequencing: Data analysis for genetic profiling Ravi Vijaya Satya, Ph.D. Senior Scientist, R&D Ravi.VijayaSatya@QIAGEN.com
  2. 2. Sample & Assay Technologies Welcome to the three-part webinar series Next Generation Sequencing and its role in cancer biology Webinar 1: Next-generation sequencing, an introduction to technology and applications Date: April 4, 2013 Speaker: Quan Peng, Ph.D. Webinar 2: Date: Speaker: Next-generation sequencing for cancer research April 11, 2013 Vikram Devgan, Ph.D., MBA Webinar 3: Date: Speaker: Next-generation sequencing data analysis for genetic profiling April 18, 2013 Ravi Vijaya Satya, Ph.D. Title, Location, Date 2
  3. 3. Sample & Assay Technologies Agenda NGS Data Analysis Read Mapping Variant Calling Variant Annotation Targeted Enrichment GeneRead Gene Panels GeneRead Data Analysis Portal Background Workflow Data Interpretation 3
  4. 4. Sample & Assay Technologies Read Mapping Reads mapped to a reference genome Millions of reads from a single run Alignment Mapping Quality Programs for read-mapping Hash-based MAQ, ELAND, SOAP, Novoalign Suffix array/Burrows Wheeler Transform based BWA, BowTie, BowTie2, SOAP2 Title, Location, Date 4
  5. 5. Sample & Assay Technologies Variant Calling Determine if there is enough statistical support to call a variant Reference sequence ACAGTTAAGCCTGAACTAGACTAGGATCGTCCTAGATAGTCTCGATAGCTCGATATC Aligned reads AACTAGACTAGGATCGTCCTAGATAGTCTCG AACTAGACTAGGATCGTCCTACATAGTCTCG AACTAGACTAGGATCGTCCTACATAGTCTCG GATCGTCCTAGATAGTCTCGATAGCTCGAT GATCGTCCTAGATAGTCTCGATAGCTCGAT GATCGTCCTAGATAGTCTCGATAGCTCGAT Multiple factors are considered in calling variants No. of reads with the variant Mapping qualities of the reads Base qualities at the variant position Strand bias (variant is seen in only one of the strands) Variant Calling Software GATK Unified Genotyper, Torrent Variant Caller, SamTools, Mutect, … Title, Location, Date 5
  6. 6. Sample & Assay Technologies Variant Representation VCF – Variant Call Format http://www.1000genomes.org/wiki/Analysis/Variant%20Call%20Format/vcf-variant-call-format-version-41 Header lines ##fileformat=VCFv4.1 ##FORMAT=<ID=GT,Number=1,Type=String,Description="Genotype"> ##INFO=<ID=DP,Number=1,Type=Integer,Description="Approximate read depth; some reads may have been filtered"> ##INFO=<ID=FS,Number=1,Type=Float,Description="Phred-scaled p-value using Fisher's exact test to detect strand bias"> ##INFO=<ID=MQ,Number=1,Type=Float,Description="RMS Mapping Quality"> ##INFO=<ID=OND,Number=1,Type=Float,Description="Overall non-diploid ratio (alleles/(alleles+nonalleles))"> ##INFO=<ID=QD,Number=1,Type=Float,Description="Variant Confidence/Quality by Depth"> ##contig=<ID=chrM,length=16571,assembly=hg19> ##contig=<ID=chr1,length=249250621,assembly=hg19> ##contig=<ID=chr2,length=243199373,assembly=hg19> Column labels ##contig=<ID=chr3,length=198022430,assembly=hg19> ##contig=<ID=chr4,length=191154276,assembly=hg19> #CHROM POS ID REF ALT QUAL FILTER INFO FORMAT Sample chr1 11181327 rs11121691 C T 100.0 PASS DP=1000;MQ=87.67 GT:AD:DP 0/1:146,45:191 chr1 11190646 rs2275527 G A 100.0 PASS DP=1000;MQ=67.38 GT:AD:DP 0/1:462,121:583 chr1 11205058 rs1057079 C T 100.0 PASS DP=1000;MQ=79.57 GT:AD:DP 0/1:49,143:192 Variant calls Title, Location, Date 6
  7. 7. Variant Annotation Sample & Assay Technologies dbSNP/COSMIC ID Chro m chr1 chr1 chr1 chr1 chr1 chr1 chr1 chr1 chr1 chr1 chr1 chr2 R ef 11181327 rs11121691 C 11190646 rs2275527 G 11205058 rs1057079 C 11288758 rs1064261 G 11300344 rs191073707 C 11301714 rs1135172 A 11322628 rs2295080 G 186641626 rs2853805 G 186642429 rs2206593 A 186643058 rs5275 A 186645927 rs2066826 C 29415792 rs1728828 G Pos ID Alt T A T A T G T A G G T A Actual change and position within the codon or amino acid sequence Gene Mutation Name type MTOR SNP MTOR SNP MTOR SNP MTOR SNP MTOR SNP MTOR SNP MTOR SNP PTGS2 SNP PTGS2 SNP PTGS2 SNP PTGS2 SNP ALK SNP chr2 29416366 rs1881421 G C ALK SNP chr2 29416481 rs1881420 T C ALK SNP chr2 29416572 rs1670283 T C ALK SNP chr2 29419591 chr2 29445458 chr2 29446184 rs1670284 G T rs3795850 G T rs2276550 C G ALK ALK ALK SNP SNP SNP Effect of the variant on protein coding Codon AA Filtered Variant Allele Frequency snpEff Effect Change Change Coverage Frequency c.6909C>T p.L2303 C=0.761 T=0.239 SYNONYMOUS_CODING 1,924 0.239 c.5553G>A p.S1851 G=0.791 A=0.208 SYNONYMOUS_CODING 5,842 0.208 c.4731C>T p.A1577 C=0.254 T=0.746 SYNONYMOUS_CODING 1,928 0.746 c.2997G>A p.N999 G=0.212 A=0.788 SYNONYMOUS_CODING 5,186 0.788 C=0.924 T=0.076 INTRON 210 0.076 c.1437A>G p.D479 A=0.248 G=0.752 SYNONYMOUS_CODING 3,965 0.752 G=0.239 T=0.755 UPSTREAM 339 0.755 G=0.0 A=1.0 UTR_3_PRIME 97 1 A=0.167 G=0.833 UTR_3_PRIME 3,552 0.833 A=0.759 G=0.241 UTR_3_PRIME 237 0.241 C=0.88 T=0.12 INTRON 209 0.12 G=0.0 A=1.0 UTR_3_PRIME 2,520 1 NON_SYNONYMOUS_CODI c.4587G>C p.D1529E G=0.907 C=0.093 NG 4,361 0.093 NON_SYNONYMOUS_CODI c.4472T>C p.K1491R T=0.954 C=0.045 NG 3,061 0.045 NON_SYNONYMOUS_CODI c.4381T>C p.I1461V T=0.0 C=0.999 NG 5,834 0.999 G=0.093 T=0.907 INTRON 739 0.907 c.3375G>T p.G1125 G=0.917 T=0.082 SYNONYMOUS_CODING 1,776 0.082 C=0.895 G=0.105 INTRON 475 0.105 SIFT score Predicts the deleterious effect of an amino acid change based on how conserved the sequence is among related species Polyphen score Predicts the impact of the variant on protein structure Title, Location, Date 7
  8. 8. Sample & Assay Technologies GeneRead DNAseq Gene Panel: Targeted Sequencing What is targeted sequencing? Sequencing a sub set of regions in the whole-genome Why do we need targeted sequencing? Not all regions in the genome are of interest or relevant to specific study Exome Sequencing: sequencing most of the exonic regions of the genome (exome). Protein-coding regions constitute less than 2% of the entire genome Focused panel/hot spot sequencing: focused on the genes or regions of interest What are the advantages of focused panel sequencing? More coverage per sample, more sensitive mutation detection More samples per run, lower cost per sample Title, Location, Date 8
  9. 9. Sample & Assay Technologies Target Enrichment - Methodology Multiplex PCR Small DNA input (< 100ng) Short processing time (several hrs) Relatively small throughput (KB - MB region) Sample preparation (DNA isolation) PCR target enrichment (2 hours) Title, Location, Date Library construction Sequencing Data analysis 9
  10. 10. Sample & Assay Technologies Variants Identifiable through Multiplex PCR SNPs – single nucleotide polymorphisms Indels Indels < 20 bp in length Variants not callable Structural variants Large indels Inversions Copy Number Variants (CNVs) Large insertion Inversion CNV Title, Location, Date 10
  11. 11. Sample & Assay Technologies GeneRead DNAseq Gene Panel Multiplex PCR technology based targeted enrichment for DNA sequencing Cover all human exons (coding region + UTR) Division of gene primers sets into 4 tubes; up to 1200 plex in each tube 11
  12. 12. Sample & Assay Technologies GeneRead DNAseq Gene Panel Focus on your Disease of Interest Comprehensive Cancer Panel (124 genes) Disease Focused Gene Panels (20 genes) Breast cancer Colon Cancer Gastric cancer Leukemia Liver cancer Genes Involved in Disease Lung Cancer Ovarian Cancer Prostate Cancer Genes with High Relevance 12
  13. 13. Sample & Assay Technologies GeneRead DNAseq Custom Panel 13
  14. 14. Sample & Assay Technologies Data Analysis for Targeted Sequencing GeneRead data analysis work flow Read Mapping Primer Trimming Variant Calling Variant Annotation Read mapping Identify the possible position of the read within the reference Align the read sequence to reference sequences Primer trimming Remove primer sequences from the reads Variant calling Identify differences between the reference and reads Variant filtering and annotation Functional information about the variant Title, Location, Date 14
  15. 15. Sample & Assay Technologies Reads from Targeted Sequencing Typical NGS raw read from targeted sequencing Adapter Barcode Primer Insert sequence Primer Adapter -3’ 5’Removal of adapters and de-multiplexing Primer Insert sequence Primer -3’ 5’-3’ 5’Read length can vary: only part of the insert 5’or the 3’ primer may be present 5’- -3’ -3’ Title, Location, Date 15
  16. 16. Sample & Assay Technologies Read Mapping Align reads to the reference genome Reference sequence Amplicon 1 Amplicon 2 Aligned reads Title, Location, Date 16
  17. 17. Primer Trimming Sample & Assay Technologies Primer sequences must be trimmed for accurate variant calling Reference sequence Amplicon 1 Frequency of `C` without primer trimming = 4/13 = 31% C C C C Aligned reads Amplicon 2 Title, Location, Date Frequency of `C` after primer trimming = 4/7 = 57% 17
  18. 18. Sample & Assay Technologies GeneRead Variant Calling Overview BAM file (w/ flow space info) BED file for amplicons used Run Parameters Annotation Variant calling and filtering Torrent Variant Caller (TVC) vcf GATK Variant Annotator IonTorrent vcf snpEff (basic annotation) vcf Seq. platform vcf GATK Unified Genotyper MiSeq GATK Indel Realigner bam bam GATK Base Quality Recalibrator GATK Variant Filtration vcf Additional filtering (based on frequency and coverage) SnpSift (links to dbSNP, Cosmic and computation of Sift scores, etc.) VCF to excel dbSNP Cosmic dbNSFP Variants in excel format Title, Location, Date 18
  19. 19. Sample & Assay Technologies Indel realignment DePristo MA, et al. A framework for variation discovery and genotyping using next-generation DNA sequencing data. Nat Genet. 2011 May; 13(5):191-8. PMID: 21178889 Eliminates some false-positive variant calls around indels Read aligners can not eliminate these alignment errors since they align reads independently Multiple sequence alignment can identify these errors and correct them Title, Location, Date 19
  20. 20. Sample & Assay Technologies Base Quality Recalibration DePristo MA, et al. A framework for variation discovery and genotyping using next-generation DNA sequencing data. Nat Genet. 2011 May; 13(5):191-8. PMID: 21178889 Eliminates sequencer-specific biases Lane-specific/sample-specific biases Instrument-specific under-reporting/over-reporting of quality scores Systematic errors based on read position Di-nucleotide-specific sequencing errors Recalibrartion leads to improved variant calls Title, Location, Date 20
  21. 21. Sample & Assay Technologies Variant Filtration Variant Frequency Somatic mode SNPs with frequency < 4% and indels with frequency < 20% Germline mode SNPs with frequency < 20% and indels with frequency < 25% Strand Bias SNPs with FS ≤ 60 Indels with FS ≤ 200 Mapping Quality SNPs with MQ ≤ 40.0 C C C Haplotype Score SNPs with HaplotypeScore ≤ 13.0 Not applicable for pooled samples Title, Location, Date Strand Bias: variants that are present in reads from only one of the two strands 21
  22. 22. Sample & Assay Technologies Specificity Analysis Specificity: the percentage of sequences that map to the intended targets region of interest number of on-target reads / total number of reads Reference sequence ROI 1 ROI 2 NGS reads Off-target reads On-target reads Title, Location, Date On-target reads 22
  23. 23. Sample & Assay Technologies Sequencing Depth Coverage depth (or depth of coverage): how many times each base has been sequenced Unlike Sanger sequencing, in which each sample is sequenced 1-3 times to be confident of its nucleotide identity, NGS generally needs to cover each position many times to make a confident base call, due to relative high error rate (0.1 - 1% vs 0.001 – 0.01%) Increasing coverage depth is also helpful to identify low frequent mutations in heterogenous samples such as cancer sample Reference sequence NGS reads coverage depth = 4 Title, Location, Date coverage depth = 3 coverage depth = 2 23
  24. 24. Sample & Assay Technologies NGS Data Analysis: Uniformity Coverage uniformity: measure the evenness of the coverage depth of target position Reference sequence NGS reads coverage depth = 10 coverage depth = 3 coverage depth = 2 Title, Location, Date 24
  25. 25. Sample & Assay Technologies GeneRead Data Analysis Web Portal FREE Complete & Easy to use Data Analysis with Web-based Software 25
  26. 26. Sample & Assay Technologies GeneRead Data Analysis Web Portal Title, Location, Date 26
  27. 27. Sample & Assay Technologies Title, Location, Date 27
  28. 28. Sample & Assay Technologies Title, Location, Date 28
  29. 29. Sample & Assay Technologies Note: Runtimes depend on the number of reads in the input files. Typical runtimes are 20-60 minutes. Title, Location, Date 29
  30. 30. Sample & Assay Technologies Title, Location, Date 30
  31. 31. Sample & Assay Technologies Title, Location, Date 31
  32. 32. Sample & Assay Technologies Title, Location, Date 32
  33. 33. Sample & Assay Technologies Summary Run Summary Specificity Coverage Uniformity Numbers of SNPs and Indels Summary By Gene Specificity Coverage Uniformity # of SNPs and Indels 33
  34. 34. Sample & Assay Technologies Features of Variant Report SNP detection Indel detection 34
  35. 35. Sample & Assay Technologies QIAGEN’s GeneRead DNAseq Gene Panel System FOCUS ON YOUR RELEVANT GENES Focused: Biologically relevant content selection enables deep sequencing on relevant genes and identification of rare mutations Flexible: Mix and match any gene of interest NGS platform independent: Functionally validated for PGM, MiSeq/HiSeq Integrated controls: Enabling quality control of prepared library before sequencing Free, complete and easy of use data analysis tool
  36. 36. Sample & Assay Technologies Upcoming webinars Next Generation Sequencing and its role in cancer biology Webinar 3: Date: Speaker: Next-generation sequencing data analysis for genetic profiling April 18, 2013 Ravi Vijaya Satya, Ph.D. Webinar 1: Next-generation sequencing, an introduction to technology and applications Date: May 3, 2013 Speaker: Quan Peng, Ph.D. Webinar 2: Date: Speaker: Next-generation sequencing for cancer research May 10, 2013 Vikram Devgan, Ph.D., MBA Title, Location, Date 36
  1. A particular slide catching your eye?

    Clipping is a handy way to collect important slides you want to go back to later.

×