Analysis of PCR Duplicates and Library Diversity in RNA-Seq 
Smita Pathak1, Irina Khrebtukova1, Angelica Barr1, Felix Schlesinger1, Tim Hill1, Lisa Watson1, 
Abstract 
In DNA sequencing, duplicates (reads that map to the same position) are discarded, but in RNA sequencing (RNA-Seq)
these reads can represent highly expressed genes. The issue of duplicates in RNA-Seq is even more complicated in
low-input or degraded samples. Higher percentages of duplicates are routinely observed in very low-input and degraded
samples when using standard bioinformatics tools such as Picard, but the source of these duplicates is commonly
misunderstood. Under normal assay conditions and with recommended input levels, three different RNA-Seq assays give
different apparent numbers of duplicates on the same standard UHRR and Brain RNA samples. These differences are not
necessarily due to PCR artifacts; they occur because of the differences in complexity between the coding regions, the
mRNA, and the total RNA of a cell. When we measure true PCR duplicates using a molecular barcoding approach, it
becomes clear that the levels of potential PCR duplicates in standard RNA-Seq preps are much lower than the apparent
duplicate counts. However, when input amounts for any of these three assays are reduced to 10ng or less, we observe
dramatic increases in the percentage of duplicates. This value then becomes an important metric for the overall
efficacy of the experiment.
I. UMI Barcoding 
The standard TruSeq® forked adaptor was modified to include a Unique Molecular Index (UMI), a 5-base random (N)
sequence, in the read 1 index. The read 2 index was not modified, so pooled samples can be demultiplexed by the read 2
index alone. The UMI sequence was used in combination with alignment information to count true PCR duplicates: only
fragments that had the same alignment position and the same UMI sequence were considered true PCR duplicates. When
using the Illumina TopHat BaseSpace® Application, the duplicates metric is calculated at a read depth of 4M reads.
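The counting rule described above can be illustrated with a minimal Python sketch. It is not the BaseSpace implementation; the `Fragment` record and its field names are hypothetical, and the sketch assumes each aligned fragment has already been annotated with its UMI. It reports both the "apparent" duplicate fraction (same alignment position only) and the UMI duplicate fraction (same position and same UMI). Note that a 5-base UMI can take only 4^5 = 1024 values, so at very deeply covered positions two distinct molecules may occasionally share a UMI by chance.

```python
from collections import Counter
from typing import Iterable, NamedTuple, Tuple


class Fragment(NamedTuple):
    """Hypothetical minimal record for one aligned fragment (read pair)."""
    chrom: str
    start: int   # leftmost aligned position of the fragment
    end: int     # rightmost aligned position of the fragment
    strand: str  # "+" or "-"
    umi: str     # 5-base UMI taken from the read 1 index


def duplicate_rates(fragments: Iterable[Fragment]) -> Tuple[float, float]:
    """Return (apparent, umi) duplicate fractions.

    Apparent duplicates share an alignment position only.
    UMI duplicates share both the alignment position and the UMI.
    """
    by_position = Counter()
    by_position_and_umi = Counter()
    total = 0
    for frag in fragments:
        position = (frag.chrom, frag.start, frag.end, frag.strand)
        by_position[position] += 1
        by_position_and_umi[position + (frag.umi,)] += 1
        total += 1
    if total == 0:
        return 0.0, 0.0
    # Every fragment beyond the first in a group counts as a duplicate.
    apparent = (total - len(by_position)) / total
    umi = (total - len(by_position_and_umi)) / total
    return apparent, umi


example = [
    Fragment("chr12", 100, 300, "+", "ACGTA"),
    Fragment("chr12", 100, 300, "+", "ACGTA"),  # same position, same UMI
    Fragment("chr12", 100, 300, "+", "TTGCA"),  # same position, different UMI
    Fragment("chr1", 500, 720, "-", "ACGTA"),
]
apparent_rate, umi_rate = duplicate_rates(example)
```

In this toy input, two of the four fragments are apparent duplicates, but only one of them is a UMI duplicate.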
Figure 1: UMI Barcode in read1 Index of TruSeq forked adaptor allows separation of PCR duplicates 
from “apparent” duplicates 
Ryan Kelley1, Tatjana Singer1, and Gary P. Schroth1 
1Illumina, Inc. 5200 Illumina Way, San Diego, CA 92122 
[Figure 1 diagram: Modified Forked Adaptor, with the UMI shown as NNNNN]
II. Three RNA Sample Preparation Workflows 
UMIs were used to track individual molecules through three different sample preparation workflows. TruSeq Stranded
mRNA uses oligo-dT beads to capture the poly-A tails of RNA. TruSeq RNA Access enriches for the coding region of the
transcriptome using capture probes, followed by purification with magnetic streptavidin beads. The TruSeq Stranded
Total RNA workflow removes rRNA and mtRNA using specific cRNA probes, followed by removal with capture beads.
[Figure 2 diagram, panel A, labeled steps: Total RNA (TruSeq Total RNA, TruSeq RNA Access, TruSeq mRNA with poly-A tail) -> Fragmentation (Fresh Frozen RNA) or Fragmented RNA/FFPE -> Priming with random hexamers -> First Strand Synthesis (DNA-RNA hybrid) -> Second Strand Synthesis with dUTP (double-stranded cDNA) -> A-Tailing and Adaptor Ligation (cDNA with forked adaptor) -> PCR -> Final cDNA Library with Strand Specificity (p5 and p7 adaptors) -> Sequencing for mRNA/Total RNA]
[Figure 2 diagram, panel B, labeled steps: cDNA Library from Total RNA -> Hybridization with Biotinylated Exome Capture Probes -> Streptavidin-Magnetic Bead Binding and Capture of the Biotinylated Probe Hybrid -> Removal of unbound and nonspecifically bound material by heated washing -> Elution from Bead -> PCR -> Final exome-targeted cDNA Library (p5 and p7 adaptors) -> Sequencing for RNA Access]
Figure 2: Sample Preparation Workflows 
(A) Library Preparation for 3 different workflows: mRNA selects for coding regions of RNA via poly-A selection, RNA Access selects by enrichment, 
and the Total RNA workflow depletes rRNA/mtRNA. Sequencing is performed after library preparation for mRNA and Total RNA workflows. 
(B) Enrichment workflow for RNA Access only 
III. PCR Cycling Study 
To determine the effects of PCR cycling, we used the TruSeq Stranded mRNA workflow with the standard 100ng input and
varied the number of PCR cycles from 0 to 35 in increments of 5. All samples were sequenced on an Illumina NextSeq®
500 sequencing system using a 2 x 76 bp paired-end run. Universal Human Reference RNA (UHRR) and Human Brain RNA
(Brain) samples all had less than 6% duplicates as measured with the UMI across all cycling conditions, based on our
standard BaseSpace TopHat Alignment Application. Both showed the same trend: percent duplicates increased with PCR
cycles, from less than 1% UMI duplicates at 0 cycles to 6% at 10 cycles. Beyond 10 cycles, no further increase in
duplicates was observed. Note that for standard TruSeq RNA prep kits, we recommend only 15 cycles of PCR.
Figure 3: Duplicates from PCR Cycling Study 
(A) Duplicates and Yield for UHRR with increasing number of PCR cycles. Yield increases dramatically but % duplicates does not increase. 
(B) Duplicates and Yield for Brain with increasing number of PCR cycles. Yield increases but % duplicates does not. 
(C) Differential Expression of UHRR to Brain for 0 vs 35 cycles of PCR. 
(D-F) FPKM Correlation of low amounts of PCR (0 vs. 5 cycles), high amounts of PCR (20 vs. 25 cycles), and low vs. high (0 vs. 35 cycles). 
IV. Effect of Lower Input on PCR Duplicates 
To test the effect of input amount on duplicates, we pushed the limits of input for all of the protocols shown in
Figure 2. For instance, for the TruSeq Stranded mRNA kit, we overloaded the kit with 500% of the recommended input
amount (100ng) and under-loaded it with 3% of the recommended input amount. These experiments are summarized in
Table 1 below. All inputs were run with replicates for both UHRR and Brain. All of the samples were generated using an
automated version of the protocol on the Hamilton Star Liquid Handling Workstation.
Sample Prep Method        | Sample Type    | RNA Input          | Sequencing
TruSeq RNA Access         | UHRR and Brain | 0.3, 2.5, 10, 50ng | 2 x 76, NextSeq 500
TruSeq Stranded Total RNA | UHRR and Brain | 3, 25, 100, 500ng  | 2 x 76, NextSeq 500
TruSeq Stranded mRNA      | UHRR and Brain | 3, 25, 100, 500ng  | 2 x 76, NextSeq 500
Table 1: Summary of Low Input Experimental Conditions. The recommended input amount is highlighted. 
Figure 4: Duplicates in Low Input Conditions of three TruSeq Workflows 
(A) FPKM correlation of 3ng vs. 3ng replicate condition in TruSeq mRNA workflow (B) FPKM correlation of 100ng vs. 3ng condition 
in TruSeq mRNA workflow (C) FPKM correlation of 100ng vs. 100ng replicate condition in TruSeq mRNA workflow (D) Differential 
Expression correlation of 100ng input vs. 3ng input for TruSeq mRNA workflow (E) Plot of % Duplicates vs. Read Number for 
different input conditions (TruSeq mRNA workflow) 
V. A Closer Look at Duplicates 
To test whether duplicate removal makes a difference in the final data, we removed duplicates with a standard tool
(Picard Tools). We calculated differential expression ratios of UHRR to Brain and compared the data with and without
duplicate removal. For all input levels tested, we found good correlation between the data with and without removal
of duplicates. Finally, we show examples of two genes at different input levels, with and without duplicate removal,
in the Integrative Genomics Viewer (IGV).
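The comparison described above can be sketched in a few lines of Python. This is an illustration only, not the analysis pipeline used for the poster: the function names are hypothetical, the inputs are assumed to be per-gene FPKM values keyed by gene name, and the pseudocount of 1.0 (added to avoid division by zero for unexpressed genes) is our own assumption. The sketch computes log2(UHRR/Brain) per gene for each dataset and then the Pearson correlation between the fold changes obtained with and without duplicate removal.

```python
import math
from typing import Dict, List


def log2_fold_changes(uhrr: Dict[str, float], brain: Dict[str, float],
                      pseudocount: float = 1.0) -> Dict[str, float]:
    """log2(UHRR / Brain) for genes present in both samples.

    The pseudocount is an assumption of this sketch, not a value from the poster.
    """
    shared = uhrr.keys() & brain.keys()
    return {
        gene: math.log2((uhrr[gene] + pseudocount) / (brain[gene] + pseudocount))
        for gene in shared
    }


def pearson(xs: List[float], ys: List[float]) -> float:
    """Pearson correlation coefficient of two equal-length sequences."""
    if len(xs) != len(ys) or len(xs) < 2:
        raise ValueError("need two sequences of equal length >= 2")
    mean_x = sum(xs) / len(xs)
    mean_y = sum(ys) / len(ys)
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var_x = sum((x - mean_x) ** 2 for x in xs)
    var_y = sum((y - mean_y) ** 2 for y in ys)
    if var_x == 0 or var_y == 0:
        raise ValueError("correlation is undefined for constant input")
    return cov / math.sqrt(var_x * var_y)


def dedup_correlation(lfc_all_reads: Dict[str, float],
                      lfc_deduplicated: Dict[str, float]) -> float:
    """Correlate fold changes computed with and without duplicate removal."""
    genes = sorted(lfc_all_reads.keys() & lfc_deduplicated.keys())
    return pearson([lfc_all_reads[g] for g in genes],
                   [lfc_deduplicated[g] for g in genes])
```

A correlation close to 1 across input levels would correspond to the "good correlation" reported here; the sketch does not define a threshold for what counts as good.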
Figure 5: Comparison of Data with Duplicates Removed to Data Without Duplicates Removed 
(A) Differential Expression plots or log2(fold change) of UHRR to Brain of samples with duplicates removed compared to samples without 
duplicates removed at 3 different input conditions: 5100ng, 25ng and 3ng. Data shows that removing duplicates from the data still has good 
correlation with data without duplicates removed. Unique vs duplicate data. 
(B) IGV browser shots of two different genes (GAPDH and ApoE), sequenced at 40M reads, at 2 different input conditions: 100ng and 3ng. 
For each input condition, data is shown without duplicates removed, duplicates only, and with duplicates removed. For the 100ng condition, 
the “duplicates only” track represents 49% of the reads whereas the “no duplicates” track represents 51% of the reads. For the 3ng condition, 
the “duplicates only” track represents 82% of the reads whereas the “no duplicates” track represents 18% of the reads. Data shows that 
duplicates are not biased and are amplified uniformly by PCR. 
VI. Conclusions 
The issue of PCR duplicates in RNA-Seq has been a concern for the field for many years. Our study shows that PCR
cycling itself has very little effect on the absolute number of duplicates under recommended assay conditions
(Section III). Even under conditions where we create duplicates, such as low input, as described in Sections IV and V,
the duplicated data still accurately calls gene expression levels. Duplicates are amplified uniformly, and the percent
duplicates becomes more a measure of the lack of complexity of the input sample than a measure of PCR bias.
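The link between library complexity and percent duplicates can be illustrated with a simple sampling model. This is a sketch under strong assumptions and was not part of the poster analysis: it assumes every unique molecule in the library is amplified uniformly and that reads are drawn at random from the amplified pool, which ignores the wide range of expression levels in real RNA-Seq data. Under those assumptions, the expected number of distinct molecules observed in N reads from a library of C unique molecules is approximately C(1 - e^(-N/C)). The function name and the complexity values below are hypothetical; the 4M read depth matches the depth at which the duplicates metric is calculated.

```python
import math


def expected_duplicate_fraction(n_reads: float, n_unique_molecules: float) -> float:
    """Expected duplicate fraction under uniform random sampling.

    Simplifying assumption: all unique molecules are equally likely to be
    sequenced, so the number of PCR cycles does not enter the model.
    """
    if n_reads <= 0 or n_unique_molecules <= 0:
        raise ValueError("both arguments must be positive")
    # Expected number of distinct molecules seen at least once
    # (Poisson approximation); expm1 keeps precision when N/C is small.
    expected_unique = n_unique_molecules * -math.expm1(-n_reads / n_unique_molecules)
    return 1.0 - expected_unique / n_reads


if __name__ == "__main__":
    depth = 4_000_000  # read depth used for the duplicates metric
    for complexity in (1e5, 1e6, 1e7, 1e8):  # hypothetical library sizes
        fraction = expected_duplicate_fraction(depth, complexity)
        print(f"{complexity:.0e} unique molecules: {fraction:.1%} expected duplicates")
```

In this model, the duplicate fraction at a fixed read depth rises as the number of unique molecules falls, which is consistent with percent duplicates acting as a measure of input complexity.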
FOR RESEARCH USE ONLY © 2014 Illumina, Inc. All rights reserved. 
Illumina, HiSeq, MiSeq, Nextera, and the pumpkin orange color are trademarks of Illumina, Inc. and/or its affiliate(s) in the U.S. and/or other countries. All other names, logos, and other trademarks are the property of their respective owners.

More Related Content

What's hot

RNA sequencing: advances and opportunities
RNA sequencing: advances and opportunities RNA sequencing: advances and opportunities
RNA sequencing: advances and opportunities
Paolo Dametto
 
Multicopy reference assay (MRef) — a superior normalizer of sample input in D...
Multicopy reference assay (MRef) — a superior normalizer of sample input in D...Multicopy reference assay (MRef) — a superior normalizer of sample input in D...
Multicopy reference assay (MRef) — a superior normalizer of sample input in D...
QIAGEN
 
Dominique McCoy PCR paper
Dominique McCoy PCR paperDominique McCoy PCR paper
Dominique McCoy PCR paperDominique McCoy
 
An NGS workflow to detect down to 0.1% allelic frequency in cfDNA
An NGS workflow to detect down to 0.1% allelic frequency in cfDNAAn NGS workflow to detect down to 0.1% allelic frequency in cfDNA
An NGS workflow to detect down to 0.1% allelic frequency in cfDNA
Thermo Fisher Scientific
 
Use of TGIRT for ssDNA-seq
Use of TGIRT for ssDNA-seqUse of TGIRT for ssDNA-seq
Use of TGIRT for ssDNA-seq
Douglas Wu
 
19_21Translation
19_21Translation19_21Translation
19_21TranslationKaren Lewis
 
PCR Types
PCR TypesPCR Types
6236_protein-expression-vs-gfp
6236_protein-expression-vs-gfp6236_protein-expression-vs-gfp
6236_protein-expression-vs-gfpHimanshu Sethi
 
Illumina TruSeq Stranded mRNA_Biomek FXP Automated Workstation
Illumina TruSeq Stranded mRNA_Biomek FXP Automated WorkstationIllumina TruSeq Stranded mRNA_Biomek FXP Automated Workstation
Illumina TruSeq Stranded mRNA_Biomek FXP Automated WorkstationZachary Smith
 
A new specific and low cost technique to detect alk, ros, and ret rearrangeme...
A new specific and low cost technique to detect alk, ros, and ret rearrangeme...A new specific and low cost technique to detect alk, ros, and ret rearrangeme...
A new specific and low cost technique to detect alk, ros, and ret rearrangeme...
Christine Canet
 
How to do successful gene expression analysis - Siena 20100625
How to do successful gene expression analysis - Siena 20100625How to do successful gene expression analysis - Siena 20100625
How to do successful gene expression analysis - Siena 20100625
Biogazelle
 
Technical Guide to Qiagen PCR Arrays - Download the Guide
Technical Guide to Qiagen PCR Arrays - Download the GuideTechnical Guide to Qiagen PCR Arrays - Download the Guide
Technical Guide to Qiagen PCR Arrays - Download the Guide
QIAGEN
 
Lectut btn-202-ppt-l29. applications of pcr-i (1)
Lectut btn-202-ppt-l29. applications of pcr-i (1)Lectut btn-202-ppt-l29. applications of pcr-i (1)
Lectut btn-202-ppt-l29. applications of pcr-i (1)
Rishabh Jain
 
Orthogonal Verification of Oncomine cfDNA Data with Digital PCR Using TaqMan ...
Orthogonal Verification of Oncomine cfDNA Data with Digital PCR Using TaqMan ...Orthogonal Verification of Oncomine cfDNA Data with Digital PCR Using TaqMan ...
Orthogonal Verification of Oncomine cfDNA Data with Digital PCR Using TaqMan ...
Thermo Fisher Scientific
 
NEBNext Ultra Directional RNA Library Kit for Illumina NGS_Biomek FXP Automat...
NEBNext Ultra Directional RNA Library Kit for Illumina NGS_Biomek FXP Automat...NEBNext Ultra Directional RNA Library Kit for Illumina NGS_Biomek FXP Automat...
NEBNext Ultra Directional RNA Library Kit for Illumina NGS_Biomek FXP Automat...Zachary Smith
 
Extending miRQC’s dynamic range: amplifying the view of Limiting RNA samples ...
Extending miRQC’s dynamic range: amplifying the view of Limiting RNA samples ...Extending miRQC’s dynamic range: amplifying the view of Limiting RNA samples ...
Extending miRQC’s dynamic range: amplifying the view of Limiting RNA samples ...
QIAGEN
 
10 Tips to maximize your Real Time PCR Success - Download the Technical Note
10 Tips to maximize your Real Time PCR Success - Download the Technical Note10 Tips to maximize your Real Time PCR Success - Download the Technical Note
10 Tips to maximize your Real Time PCR Success - Download the Technical Note
QIAGEN
 
Examining gene expression and methylation with next gen sequencing
Examining gene expression and methylation with next gen sequencingExamining gene expression and methylation with next gen sequencing
Examining gene expression and methylation with next gen sequencing
Stephen Turner
 

What's hot (20)

RNA sequencing: advances and opportunities
RNA sequencing: advances and opportunities RNA sequencing: advances and opportunities
RNA sequencing: advances and opportunities
 
Q biomarkercn
Q biomarkercnQ biomarkercn
Q biomarkercn
 
Multicopy reference assay (MRef) — a superior normalizer of sample input in D...
Multicopy reference assay (MRef) — a superior normalizer of sample input in D...Multicopy reference assay (MRef) — a superior normalizer of sample input in D...
Multicopy reference assay (MRef) — a superior normalizer of sample input in D...
 
Dominique McCoy PCR paper
Dominique McCoy PCR paperDominique McCoy PCR paper
Dominique McCoy PCR paper
 
An NGS workflow to detect down to 0.1% allelic frequency in cfDNA
An NGS workflow to detect down to 0.1% allelic frequency in cfDNAAn NGS workflow to detect down to 0.1% allelic frequency in cfDNA
An NGS workflow to detect down to 0.1% allelic frequency in cfDNA
 
Use of TGIRT for ssDNA-seq
Use of TGIRT for ssDNA-seqUse of TGIRT for ssDNA-seq
Use of TGIRT for ssDNA-seq
 
19_21Translation
19_21Translation19_21Translation
19_21Translation
 
PCR Types
PCR TypesPCR Types
PCR Types
 
6236_protein-expression-vs-gfp
6236_protein-expression-vs-gfp6236_protein-expression-vs-gfp
6236_protein-expression-vs-gfp
 
Illumina TruSeq Stranded mRNA_Biomek FXP Automated Workstation
Illumina TruSeq Stranded mRNA_Biomek FXP Automated WorkstationIllumina TruSeq Stranded mRNA_Biomek FXP Automated Workstation
Illumina TruSeq Stranded mRNA_Biomek FXP Automated Workstation
 
A new specific and low cost technique to detect alk, ros, and ret rearrangeme...
A new specific and low cost technique to detect alk, ros, and ret rearrangeme...A new specific and low cost technique to detect alk, ros, and ret rearrangeme...
A new specific and low cost technique to detect alk, ros, and ret rearrangeme...
 
How to do successful gene expression analysis - Siena 20100625
How to do successful gene expression analysis - Siena 20100625How to do successful gene expression analysis - Siena 20100625
How to do successful gene expression analysis - Siena 20100625
 
JClinChem_2003
JClinChem_2003JClinChem_2003
JClinChem_2003
 
Technical Guide to Qiagen PCR Arrays - Download the Guide
Technical Guide to Qiagen PCR Arrays - Download the GuideTechnical Guide to Qiagen PCR Arrays - Download the Guide
Technical Guide to Qiagen PCR Arrays - Download the Guide
 
Lectut btn-202-ppt-l29. applications of pcr-i (1)
Lectut btn-202-ppt-l29. applications of pcr-i (1)Lectut btn-202-ppt-l29. applications of pcr-i (1)
Lectut btn-202-ppt-l29. applications of pcr-i (1)
 
Orthogonal Verification of Oncomine cfDNA Data with Digital PCR Using TaqMan ...
Orthogonal Verification of Oncomine cfDNA Data with Digital PCR Using TaqMan ...Orthogonal Verification of Oncomine cfDNA Data with Digital PCR Using TaqMan ...
Orthogonal Verification of Oncomine cfDNA Data with Digital PCR Using TaqMan ...
 
NEBNext Ultra Directional RNA Library Kit for Illumina NGS_Biomek FXP Automat...
NEBNext Ultra Directional RNA Library Kit for Illumina NGS_Biomek FXP Automat...NEBNext Ultra Directional RNA Library Kit for Illumina NGS_Biomek FXP Automat...
NEBNext Ultra Directional RNA Library Kit for Illumina NGS_Biomek FXP Automat...
 
Extending miRQC’s dynamic range: amplifying the view of Limiting RNA samples ...
Extending miRQC’s dynamic range: amplifying the view of Limiting RNA samples ...Extending miRQC’s dynamic range: amplifying the view of Limiting RNA samples ...
Extending miRQC’s dynamic range: amplifying the view of Limiting RNA samples ...
 
10 Tips to maximize your Real Time PCR Success - Download the Technical Note
10 Tips to maximize your Real Time PCR Success - Download the Technical Note10 Tips to maximize your Real Time PCR Success - Download the Technical Note
10 Tips to maximize your Real Time PCR Success - Download the Technical Note
 
Examining gene expression and methylation with next gen sequencing
Examining gene expression and methylation with next gen sequencingExamining gene expression and methylation with next gen sequencing
Examining gene expression and methylation with next gen sequencing
 

Similar to Ashg poster sp_compressed

A computational framework for large-scale analysis of TCRβ immune repertoire ...
A computational framework for large-scale analysis of TCRβ immune repertoire ...A computational framework for large-scale analysis of TCRβ immune repertoire ...
A computational framework for large-scale analysis of TCRβ immune repertoire ...
Thermo Fisher Scientific
 
Microarray validation
Microarray validationMicroarray validation
Microarray validationElsa von Licy
 
Transcriptomics approaches
Transcriptomics approachesTranscriptomics approaches
Transcriptomics approaches
CharupriyaChauhan1
 
Whole Transcriptome Analysis of Testicular Germ Cell Tumors
Whole Transcriptome Analysis of Testicular Germ Cell TumorsWhole Transcriptome Analysis of Testicular Germ Cell Tumors
Whole Transcriptome Analysis of Testicular Germ Cell Tumors
Thermo Fisher Scientific
 
Microbial Phylogenomics (EVE161) Class 17: Genomes from Uncultured
Microbial Phylogenomics (EVE161) Class 17: Genomes from UnculturedMicrobial Phylogenomics (EVE161) Class 17: Genomes from Uncultured
Microbial Phylogenomics (EVE161) Class 17: Genomes from Uncultured
Jonathan Eisen
 
PCR
PCRPCR
Comparing the early ciRNA papers
Comparing the early ciRNA papers Comparing the early ciRNA papers
Comparing the early ciRNA papers
Darya Vanichkina
 
Hotspot mutation and fusion transcript detection from the same non-small cell...
Hotspot mutation and fusion transcript detection from the same non-small cell...Hotspot mutation and fusion transcript detection from the same non-small cell...
Hotspot mutation and fusion transcript detection from the same non-small cell...
Thermo Fisher Scientific
 
Prabhakar singh ii sem-paper v-rt pcr
Prabhakar singh  ii sem-paper v-rt pcrPrabhakar singh  ii sem-paper v-rt pcr
Ascb 2007-pcr array-poster
Ascb 2007-pcr array-posterAscb 2007-pcr array-poster
Ascb 2007-pcr array-posterElsa von Licy
 
Aai 2007-pcr array-poster
Aai 2007-pcr array-posterAai 2007-pcr array-poster
Aai 2007-pcr array-posterElsa von Licy
 
Polymerase Chain Reaction - PCR
Polymerase Chain Reaction - PCRPolymerase Chain Reaction - PCR
Polymerase Chain Reaction - PCR
Ahmad Qudah
 
Real time pcr
Real time pcrReal time pcr
Real time pcr
ali h
 
Rapid Detection of Aneuploidy from Multiplexed Single Cell Samples
Rapid Detection of Aneuploidy from Multiplexed Single Cell SamplesRapid Detection of Aneuploidy from Multiplexed Single Cell Samples
Rapid Detection of Aneuploidy from Multiplexed Single Cell Samples
Thermo Fisher Scientific
 
Rt2 profilerbrochure
Rt2 profilerbrochureRt2 profilerbrochure
Rt2 profilerbrochureElsa von Licy
 
POLYMERASE CHAIN REACTION.pptx
POLYMERASE   CHAIN   REACTION.pptxPOLYMERASE   CHAIN   REACTION.pptx
POLYMERASE CHAIN REACTION.pptx
Harsharankaur36
 
Marker devt. workshop 27022012
Marker devt. workshop 27022012Marker devt. workshop 27022012
Marker devt. workshop 27022012
Koppolu Ravi
 
Polymerase chain reaction
Polymerase chain reactionPolymerase chain reaction
Polymerase chain reaction
KAVIRAJ M
 
Overview of the glossary related to pcr
Overview of the glossary related to pcrOverview of the glossary related to pcr
Overview of the glossary related to pcr
MohammadAtif41
 

Similar to Ashg poster sp_compressed (20)

A computational framework for large-scale analysis of TCRβ immune repertoire ...
A computational framework for large-scale analysis of TCRβ immune repertoire ...A computational framework for large-scale analysis of TCRβ immune repertoire ...
A computational framework for large-scale analysis of TCRβ immune repertoire ...
 
Microarray validation
Microarray validationMicroarray validation
Microarray validation
 
Transcriptomics approaches
Transcriptomics approachesTranscriptomics approaches
Transcriptomics approaches
 
Whole Transcriptome Analysis of Testicular Germ Cell Tumors
Whole Transcriptome Analysis of Testicular Germ Cell TumorsWhole Transcriptome Analysis of Testicular Germ Cell Tumors
Whole Transcriptome Analysis of Testicular Germ Cell Tumors
 
Microbial Phylogenomics (EVE161) Class 17: Genomes from Uncultured
Microbial Phylogenomics (EVE161) Class 17: Genomes from UnculturedMicrobial Phylogenomics (EVE161) Class 17: Genomes from Uncultured
Microbial Phylogenomics (EVE161) Class 17: Genomes from Uncultured
 
PCR
PCRPCR
PCR
 
Comparing the early ciRNA papers
Comparing the early ciRNA papers Comparing the early ciRNA papers
Comparing the early ciRNA papers
 
Hotspot mutation and fusion transcript detection from the same non-small cell...
Hotspot mutation and fusion transcript detection from the same non-small cell...Hotspot mutation and fusion transcript detection from the same non-small cell...
Hotspot mutation and fusion transcript detection from the same non-small cell...
 
Prabhakar singh ii sem-paper v-rt pcr
Prabhakar singh  ii sem-paper v-rt pcrPrabhakar singh  ii sem-paper v-rt pcr
Prabhakar singh ii sem-paper v-rt pcr
 
Ascb 2007-pcr array-poster
Ascb 2007-pcr array-posterAscb 2007-pcr array-poster
Ascb 2007-pcr array-poster
 
Aai 2007-pcr array-poster
Aai 2007-pcr array-posterAai 2007-pcr array-poster
Aai 2007-pcr array-poster
 
Polymerase Chain Reaction - PCR
Polymerase Chain Reaction - PCRPolymerase Chain Reaction - PCR
Polymerase Chain Reaction - PCR
 
Som aacr2011poster
Som aacr2011posterSom aacr2011poster
Som aacr2011poster
 
Real time pcr
Real time pcrReal time pcr
Real time pcr
 
Rapid Detection of Aneuploidy from Multiplexed Single Cell Samples
Rapid Detection of Aneuploidy from Multiplexed Single Cell SamplesRapid Detection of Aneuploidy from Multiplexed Single Cell Samples
Rapid Detection of Aneuploidy from Multiplexed Single Cell Samples
 
Rt2 profilerbrochure
Rt2 profilerbrochureRt2 profilerbrochure
Rt2 profilerbrochure
 
POLYMERASE CHAIN REACTION.pptx
POLYMERASE   CHAIN   REACTION.pptxPOLYMERASE   CHAIN   REACTION.pptx
POLYMERASE CHAIN REACTION.pptx
 
Marker devt. workshop 27022012
Marker devt. workshop 27022012Marker devt. workshop 27022012
Marker devt. workshop 27022012
 
Polymerase chain reaction
Polymerase chain reactionPolymerase chain reaction
Polymerase chain reaction
 
Overview of the glossary related to pcr
Overview of the glossary related to pcrOverview of the glossary related to pcr
Overview of the glossary related to pcr
 

Recently uploaded

Leaf Initiation, Growth and Differentiation.pdf
Leaf Initiation, Growth and Differentiation.pdfLeaf Initiation, Growth and Differentiation.pdf
Leaf Initiation, Growth and Differentiation.pdf
RenuJangid3
 
Body fluids_tonicity_dehydration_hypovolemia_hypervolemia.pptx
Body fluids_tonicity_dehydration_hypovolemia_hypervolemia.pptxBody fluids_tonicity_dehydration_hypovolemia_hypervolemia.pptx
Body fluids_tonicity_dehydration_hypovolemia_hypervolemia.pptx
muralinath2
 
The ASGCT Annual Meeting was packed with exciting progress in the field advan...
The ASGCT Annual Meeting was packed with exciting progress in the field advan...The ASGCT Annual Meeting was packed with exciting progress in the field advan...
The ASGCT Annual Meeting was packed with exciting progress in the field advan...
Health Advances
 
Citrus Greening Disease and its Management
Citrus Greening Disease and its ManagementCitrus Greening Disease and its Management
Citrus Greening Disease and its Management
subedisuryaofficial
 
Comparing Evolved Extractive Text Summary Scores of Bidirectional Encoder Rep...
Comparing Evolved Extractive Text Summary Scores of Bidirectional Encoder Rep...Comparing Evolved Extractive Text Summary Scores of Bidirectional Encoder Rep...
Comparing Evolved Extractive Text Summary Scores of Bidirectional Encoder Rep...
University of Maribor
 
Astronomy Update- Curiosity’s exploration of Mars _ Local Briefs _ leadertele...
Astronomy Update- Curiosity’s exploration of Mars _ Local Briefs _ leadertele...Astronomy Update- Curiosity’s exploration of Mars _ Local Briefs _ leadertele...
Astronomy Update- Curiosity’s exploration of Mars _ Local Briefs _ leadertele...
NathanBaughman3
 
Mammalian Pineal Body Structure and Also Functions
Mammalian Pineal Body Structure and Also FunctionsMammalian Pineal Body Structure and Also Functions
Mammalian Pineal Body Structure and Also Functions
YOGESH DOGRA
 
insect taxonomy importance systematics and classification
insect taxonomy importance systematics and classificationinsect taxonomy importance systematics and classification
insect taxonomy importance systematics and classification
anitaento25
 
Richard's aventures in two entangled wonderlands
Richard's aventures in two entangled wonderlandsRichard's aventures in two entangled wonderlands
Richard's aventures in two entangled wonderlands
Richard Gill
 
(May 29th, 2024) Advancements in Intravital Microscopy- Insights for Preclini...
(May 29th, 2024) Advancements in Intravital Microscopy- Insights for Preclini...(May 29th, 2024) Advancements in Intravital Microscopy- Insights for Preclini...
(May 29th, 2024) Advancements in Intravital Microscopy- Insights for Preclini...
Scintica Instrumentation
 
THE IMPORTANCE OF MARTIAN ATMOSPHERE SAMPLE RETURN.
THE IMPORTANCE OF MARTIAN ATMOSPHERE SAMPLE RETURN.THE IMPORTANCE OF MARTIAN ATMOSPHERE SAMPLE RETURN.
THE IMPORTANCE OF MARTIAN ATMOSPHERE SAMPLE RETURN.
Sérgio Sacani
 
RNA INTERFERENCE: UNRAVELING GENETIC SILENCING
RNA INTERFERENCE: UNRAVELING GENETIC SILENCINGRNA INTERFERENCE: UNRAVELING GENETIC SILENCING
RNA INTERFERENCE: UNRAVELING GENETIC SILENCING
AADYARAJPANDEY1
 
Structures and textures of metamorphic rocks
Structures and textures of metamorphic rocksStructures and textures of metamorphic rocks
Structures and textures of metamorphic rocks
kumarmathi863
 
Hemoglobin metabolism_pathophysiology.pptx
Hemoglobin metabolism_pathophysiology.pptxHemoglobin metabolism_pathophysiology.pptx
Hemoglobin metabolism_pathophysiology.pptx
muralinath2
 
In silico drugs analogue design: novobiocin analogues.pptx
In silico drugs analogue design: novobiocin analogues.pptxIn silico drugs analogue design: novobiocin analogues.pptx
In silico drugs analogue design: novobiocin analogues.pptx
AlaminAfendy1
 
Richard's entangled aventures in wonderland
Richard's entangled aventures in wonderlandRichard's entangled aventures in wonderland
Richard's entangled aventures in wonderland
Richard Gill
 
Deep Behavioral Phenotyping in Systems Neuroscience for Functional Atlasing a...
Deep Behavioral Phenotyping in Systems Neuroscience for Functional Atlasing a...Deep Behavioral Phenotyping in Systems Neuroscience for Functional Atlasing a...
Deep Behavioral Phenotyping in Systems Neuroscience for Functional Atlasing a...
Ana Luísa Pinho
 
GBSN- Microbiology (Lab 3) Gram Staining
GBSN- Microbiology (Lab 3) Gram StainingGBSN- Microbiology (Lab 3) Gram Staining
GBSN- Microbiology (Lab 3) Gram Staining
Areesha Ahmad
 
platelets_clotting_biogenesis.clot retractionpptx
platelets_clotting_biogenesis.clot retractionpptxplatelets_clotting_biogenesis.clot retractionpptx
platelets_clotting_biogenesis.clot retractionpptx
muralinath2
 
4. An Overview of Sugarcane White Leaf Disease in Vietnam.pdf
4. An Overview of Sugarcane White Leaf Disease in Vietnam.pdf4. An Overview of Sugarcane White Leaf Disease in Vietnam.pdf
4. An Overview of Sugarcane White Leaf Disease in Vietnam.pdf
ssuserbfdca9
 

Recently uploaded (20)

Leaf Initiation, Growth and Differentiation.pdf
Leaf Initiation, Growth and Differentiation.pdfLeaf Initiation, Growth and Differentiation.pdf
Leaf Initiation, Growth and Differentiation.pdf
 
Body fluids_tonicity_dehydration_hypovolemia_hypervolemia.pptx
Body fluids_tonicity_dehydration_hypovolemia_hypervolemia.pptxBody fluids_tonicity_dehydration_hypovolemia_hypervolemia.pptx
Body fluids_tonicity_dehydration_hypovolemia_hypervolemia.pptx
 
The ASGCT Annual Meeting was packed with exciting progress in the field advan...
The ASGCT Annual Meeting was packed with exciting progress in the field advan...The ASGCT Annual Meeting was packed with exciting progress in the field advan...

Ashg poster sp_compressed

Analysis of PCR Duplicates and Library Diversity in RNA-Seq
Smita Pathak1, Irina Khrebtukova1, Angelica Barr1, Felix Schlesinger1, Tim Hill1, Lisa Watson1,

Abstract
In DNA sequencing, duplicates or reads that map to the same position are discarded but in RNA sequencing (RNA-Seq), these reads can represent highly expressed genes. The issue of duplicates in RNA-Seq is even more complicated in low input or degraded samples. Higher percentages of duplicates in very low input and degraded samples are routinely observed in RNA-Seq using standard bioinformatics tools such as Picard but the source of duplicates is commonly misunderstood. Under normal assay conditions, and with recommended input levels, three different RNA-Seq assays give different apparent numbers of duplicates on the same standard UHRR and Brain RNA samples. These differences are not necessarily due to PCR artifacts but occur because of the differences in complexity between the coding regions, the mRNA, and the total RNA of a cell. When we measure true PCR duplicates using a molecular barcoding approach, it becomes clear that there are much lower levels of potential PCR duplicates in standard RNA-Seq preps. However, we find that when reducing input amounts for any of these three assays to 10ng or less, we observe dramatic increases in percentage of duplicates. This value then becomes an important metric for overall efficacy of the experiment.

I. UMI Barcoding
The standard TruSeq® forked adaptor was modified to include a Unique Molecular Index (UMI), a 5 base random N sequence in the index of read 1. The read 2 index was not modified, allowing pooled samples to demultiplex by read 2 only. The sequence of the UMI tag was used in combination with alignment information to count true PCR duplicates. Only fragments that had the same alignment position and the same UMI sequence were considered true PCR duplicates.
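The counting rule described in this section can be sketched in a few lines of Python. This is a minimal illustration, not the pipeline used for the poster: the `Fragment` record, its field names, and the `duplicate_rates` function are hypothetical, and the sketch assumes each aligned fragment has already been reduced to its coordinates, strand, and the 5-base UMI read from the read 1 index. It reports two fractions: "apparent" duplicates (same alignment position only) and UMI duplicates (same alignment position and same UMI).

```python
from collections import Counter
from typing import Iterable, NamedTuple, Tuple


class Fragment(NamedTuple):
    """Hypothetical record for one aligned fragment (field names are assumptions)."""
    chrom: str
    start: int   # leftmost aligned position of the fragment
    end: int     # rightmost aligned position of the fragment
    strand: str
    umi: str     # 5-base UMI taken from the read 1 index


def duplicate_rates(fragments: Iterable[Fragment]) -> Tuple[float, float]:
    """Return (apparent_fraction, umi_fraction) of duplicate fragments.

    Apparent duplicates share an alignment position.
    UMI duplicates share an alignment position AND a UMI sequence.
    The first fragment in each group is counted as the original.
    """
    by_position = Counter()
    by_position_and_umi = Counter()
    total = 0
    for f in fragments:
        position = (f.chrom, f.start, f.end, f.strand)
        by_position[position] += 1
        by_position_and_umi[position + (f.umi,)] += 1
        total += 1
    if total == 0:
        return 0.0, 0.0
    apparent = (total - len(by_position)) / total
    umi_based = (total - len(by_position_and_umi)) / total
    return apparent, umi_based


# Toy data, invented purely for illustration.
frags = [
    Fragment("chr12", 100, 300, "+", "ACGTA"),
    Fragment("chr12", 100, 300, "+", "ACGTA"),  # same position, same UMI
    Fragment("chr12", 100, 300, "+", "TTGCA"),  # same position, different UMI
    Fragment("chr19", 500, 700, "-", "ACGTA"),  # different position
]
apparent, umi_based = duplicate_rates(frags)
print(f"apparent duplicates: {apparent:.0%}, UMI duplicates: {umi_based:.0%}")
```

In the toy data, two of the four fragments are apparent duplicates but only one shares both position and UMI. A 5-base random UMI has only 4^5 = 1024 possible sequences, so two independent molecules at the same position can still carry the same UMI by chance; the sketch makes no correction for such collisions or for sequencing errors in the UMI.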
When using the Illumina TopHat BaseSpace® Application, the duplicates metric is calculated at a read depth of 4M reads.

Figure 1: UMI Barcode in read1 Index of TruSeq forked adaptor allows separation of PCR duplicates from “apparent” duplicates. (Diagram label: Modified Forked Adaptor, NNNNN.)

Ryan Kelley1, Tatjana Singer1, and Gary P. Schroth1
1Illumina, Inc. 5200 Illumina Way, San Diego, CA 92122

II. Three RNA Sample Preparation Workflows
UMIs were used to track individual molecules through three different sample preparation workflows. TruSeq Stranded mRNA uses oligo dT beads to capture poly-A tails of RNA, TruSeq RNA Access uses enrichment to select for the coding region of the transcriptome using capture probes followed by purification with magnetic streptavidin beads, and the TruSeq Stranded Total RNA workflow removes rRNA and mtRNA via specific cRNA probes and removal with capture beads.

Figure 2, panel A labels (TruSeq Total RNA, TruSeq RNA Access, TruSeq mRNA): Total RNA; Fragmentation (Fresh Frozen RNA); Fragmented RNA/FFPE; Priming with random hexamers; First Strand Synthesis (DNA-RNA Hybrid); Second Strand Synthesis with dUTP (Double Stranded cDNA); A-Tailing and Adaptor Ligation (cDNA with Forked Adaptor); PCR; Final cDNA Library with Strand Specificity (p5 and p7 Adaptors); Sequencing for mRNA/Total RNA.
Figure 2, panel B labels: cDNA Library from Total RNA; Hybridization with Biotinylated Exome Capture Probes; Streptavidin - Magnetic Bead Binding (Biotinylated Probe Hybrid Capture); Removal of unbound and nonspecifically bound material by heated washing; Elution from Bead; PCR; Final exome-targeted cDNA Library (p5 and p7 Adaptors); Sequencing for RNA Access.

Figure 2: Sample Preparation Workflows (A) Library Preparation for 3 different workflows: mRNA selects for coding regions of RNA via poly-A selection, RNA Access selects by enrichment, and the Total RNA workflow depletes rRNA/mtRNA.
Sequencing is performed after library preparation for mRNA and Total RNA workflows. (B) Enrichment workflow for RNA Access only.

III. PCR Cycling Study
In order to determine the effects of PCR cycling, we used the TruSeq Stranded mRNA workflow with the standard 100ng input and varied the number of PCR cycles from 0 to 35 cycles in increments of 5 cycles. All samples were sequenced on an Illumina NextSeq® 500 sequencing system, using a 2 x 76 bp paired-end run. Universal Human Reference RNA (UHRR) and Human Brain RNA (Brain) samples all had less than 6% duplicates as measured with the UMI across all cycling conditions, based on our standard BaseSpace TopHat Alignment Application. Both showed the same trend of increasing percent duplicates with increasing PCR cycles from less than 1% UMI duplicates at 0 cycles to 6% at 10 cycles. After 10 cycles, no increase in duplicates was observed. Note that in standard TruSeq RNA prep kits, we only recommend 15 cycles of PCR.

Figure 3: Duplicates from PCR Cycling Study (A) Duplicates and Yield for UHRR with increasing number of PCR cycles. Yield increases dramatically but % duplicates does not increase. (B) Duplicates and Yield for Brain with increasing number of PCR cycles. Yield increases but % duplicates does not. (C) Differential Expression of UHRR to Brain for 0 vs 35 cycles of PCR. (D-F) FPKM Correlation of low amounts of PCR (0 vs. 5 cycles), high amounts of PCR (20 vs. 25 cycles), and low vs. high (0 vs. 35 cycles).

IV. Effect of Lower Input on PCR Duplicates
In order to test the effect of duplicates, we pushed the lower limits of input for all of the protocols shown in Figure 2. For instance, for the TruSeq Stranded mRNA kit, we overloaded the kit with 500% of the recommended input amount (100ng) and under-loaded it with 3% of the recommended input amount. These experiments are summarized in Table 1 below. All inputs were run with replicates for both UHRR and Brain.
All of the samples were generated using an automated version of the protocol on the Hamilton Star Liquid Handling Workstation.

Sample Prep Method           Sample Type       RNA Input              Sequencing
TruSeq RNA Access            UHRR and Brain    0.3, 2.5, 10, 50ng     2 x 76, NextSeq 500
TruSeq Stranded Total RNA    UHRR and Brain    3, 25, 100, 500ng      2 x 76, NextSeq 500
TruSeq Stranded mRNA         UHRR and Brain    3, 25, 100, 500ng      2 x 76, NextSeq 500

Table 1: Summary of Low Input Experimental Conditions. The recommended input amount is highlighted.

Figure 4: Duplicates in Low Input Conditions of three TruSeq Workflows (A) FPKM correlation of 3ng vs. 3ng replicate condition in TruSeq mRNA workflow (B) FPKM correlation of 100ng vs. 3ng condition in TruSeq mRNA workflow (C) FPKM correlation of 100ng vs. 100ng replicate condition in TruSeq mRNA workflow (D) Differential Expression correlation of 100ng input vs. 3ng input for TruSeq mRNA workflow (E) Plot of % Duplicates vs. Read Number for different input conditions (TruSeq mRNA workflow)

V. A Closer Look at Duplicates
In order to test whether or not duplicate removal makes a difference in the final data, we used a standard tool to remove duplicates (Picard Tools). We calculated differential expression ratios of UHRR to Brain and compared the data with or without duplicate removal. For all input levels tested, we found good correlation of the data with or without removal of duplicates. Finally, we show examples of two genes at different input levels with or without duplicate removal in the Integrative Genomics Viewer (IGV).

Figure 5: Comparison of Data with Duplicates Removed to Data Without Duplicates Removed (A) Differential Expression plots of log2(fold change) of UHRR to Brain of samples with duplicates removed compared to samples without duplicates removed at 3 different input conditions: 100ng, 25ng and 3ng.
Data with duplicates removed correlates well with data without duplicates removed (plot label: Unique vs duplicate data). (B) IGV browser shots of two different genes (GAPDH and ApoE), sequenced at 40M reads, at 2 different input conditions: 100ng and 3ng. For each input condition, data is shown without duplicates removed, duplicates only, and with duplicates removed. For the 100ng condition, the “duplicates only” track represents 49% of the reads whereas the “no duplicates” track represents 51% of the reads. For the 3ng condition, the “duplicates only” track represents 82% of the reads whereas the “no duplicates” track represents 18% of the reads. Data shows that duplicates are not biased and are amplified uniformly by PCR.

VI. Conclusions
The issue of PCR duplicates in RNA-Seq has been a concern for the field for many years. Our study shows that PCR cycling itself has very little effect on the absolute number of duplicates under recommended assay conditions (Section III). Even under conditions where we create duplicates, such as low input, as described in Sections IV and V, the duplicated data still accurately calls gene expression levels. Duplicates are amplified uniformly and the percent duplicates becomes more of a measure of lack of complexity of the input sample than a measure of PCR bias.

FOR RESEARCH USE ONLY
© 2014 Illumina, Inc. All rights reserved. Illumina, HiSeq, MiSeq, Nextera, and the pumpkin orange color are trademarks of Illumina, Inc. and/or its affiliate(s) in the U.S. and/or other countries. All other names, logos, and other trademarks are the property of their respective owners.