Analysis of PCR Duplicates and Library Diversity in RNA-Seq 
Smita Pathak1, Irina Khrebtukova1, Angelica Barr1, Felix Schlesinger1, Tim Hill1, Lisa Watson1, 
Abstract 
In DNA sequencing, duplicates (reads that map to the same position) are discarded, but in RNA sequencing (RNA-Seq) 
these reads can represent highly expressed genes. The issue of duplicates in RNA-Seq is even more complicated in low 
input or degraded samples. Higher percentages of duplicates in very low input and degraded samples are routinely 
observed in RNA-Seq using standard bioinformatics tools such as Picard but the source of duplicates is commonly 
misunderstood. Under normal assay conditions, and with recommended input levels, three different RNA-Seq assays give 
different apparent numbers of duplicates on the same standard UHRR and Brain RNA samples. These differences are not 
necessarily due to PCR artifacts but occur because of the differences in complexity between the coding regions, the 
mRNA, and the total RNA of a cell. When we measure true PCR duplicates using a molecular barcoding approach, it 
becomes clear that there are much lower levels of potential PCR duplicates in standard RNA-Seq preps. However, we find 
that when reducing input amounts for any of these three assays to 10ng or less, we observe dramatic increases in 
percentage of duplicates. This value then becomes an important metric for overall efficacy of the experiment. 
I. UMI Barcoding 
The standard TruSeq® forked adaptor was modified to include a Unique Molecular Index (UMI), a 5-base random (N) 
sequence in the index of read 1. The read 2 index was not modified, allowing pooled samples to be demultiplexed by the 
read 2 index only. The sequence of the UMI tag was used in combination with alignment information to count true PCR duplicates. Only 
fragments that had the same alignment position and the same UMI sequence were considered true PCR duplicates. When 
using the Illumina TopHat BaseSpace® Application, the duplicates metric is calculated at a read depth of 4M reads. 
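The counting rule described above can be illustrated with a minimal Python sketch. It is not the BaseSpace implementation: the function name, the tuple layout (chromosome, start, strand, UMI), and the example fragments are assumptions made for illustration only. Note that a 5-base UMI allows 4^5 = 1024 distinct tags.

```python
from collections import Counter


def count_duplicates(fragments):
    """Count apparent and UMI-confirmed duplicates.

    fragments: iterable of (chrom, start, strand, umi) tuples, one per
    aligned fragment. This layout is a hypothetical convenience format.
    """
    fragments = list(fragments)
    total = len(fragments)

    # Apparent duplicates: fragments sharing an alignment position (UMI ignored).
    positions = Counter((chrom, start, strand) for chrom, start, strand, _ in fragments)
    apparent = total - len(positions)

    # True PCR duplicates: same alignment position AND same UMI sequence.
    position_umi = Counter(fragments)
    umi_dups = total - len(position_umi)

    def pct(n):
        return 100.0 * n / total if total else 0.0

    return {
        "total": total,
        "apparent_duplicates": apparent,
        "umi_duplicates": umi_dups,
        "pct_apparent": pct(apparent),
        "pct_umi": pct(umi_dups),
    }


if __name__ == "__main__":
    # Made-up fragments: three share a position, but only two share a UMI.
    example = [
        ("chr1", 100, "+", "ACGTA"),
        ("chr1", 100, "+", "ACGTA"),
        ("chr1", 100, "+", "TTGCA"),
        ("chr2", 500, "-", "ACGTA"),
    ]
    print(count_duplicates(example))
```

In this toy input, position-only counting flags two of four fragments as duplicates, while adding the UMI leaves one, which is the distinction between "apparent" and true PCR duplicates drawn in Figure 1.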
Figure 1: UMI Barcode in read1 Index of TruSeq forked adaptor allows separation of PCR duplicates 
from “apparent” duplicates 
Ryan Kelley1, Tatjana Singer1, and Gary P. Schroth1 
1Illumina, Inc. 5200 Illumina Way, San Diego, CA 92122 
[Figure 1 diagram label: Modified Forked Adaptor, with the UMI shown as NNNNN]
II. Three RNA Sample Preparation Workflows 
UMIs were used to track individual molecules through three different sample preparation workflows. TruSeq Stranded 
mRNA uses oligo dT beads to capture poly-A tails of RNA, TruSeq RNA Access uses enrichment to select for the coding 
region of the transcriptome using capture probes followed by purification with magnetic streptavidin beads, and the 
TruSeq Stranded Total RNA workflow removes rRNA and mtRNA via specific cRNA probes and removal with capture 
beads. 
[Figure 2 diagram. Panel A labels (Total RNA; TruSeq Total RNA, TruSeq RNA Access, and TruSeq mRNA; cDNA library from total RNA): Fragmentation (Fresh Frozen RNA); Fragmented RNA/FFPE; Priming with random hexamers; First Strand Synthesis (DNA-RNA hybrid); Second Strand Synthesis with dUTP (double stranded cDNA); A-Tailing and Adaptor Ligation (cDNA with forked adaptor); PCR; Final cDNA Library with Strand Specificity (p5 and p7 adaptors); Sequencing for mRNA/Total RNA. Panel B labels: Hybridization with Biotinylated Exome Capture Probes; Streptavidin Magnetic Bead Binding; Biotinylated Probe Hybrid Capture; Removal of unbound and nonspecifically bound material by heated washing; Elution from Bead; PCR; Final exome-targeted cDNA Library (p5 and p7 adaptors); Sequencing for RNA Access.]
Figure 2: Sample Preparation Workflows 
(A) Library Preparation for 3 different workflows: mRNA selects for coding regions of RNA via poly-A selection, RNA Access selects by enrichment, 
and the Total RNA workflow depletes rRNA/mtRNA. Sequencing is performed after library preparation for mRNA and Total RNA workflows. 
(B) Enrichment workflow for RNA Access only 
III. PCR Cycling Study 
In order to determine the effects of PCR cycling, we used the TruSeq Stranded mRNA workflow with the standard 100ng 
input and varied the number of PCR cycles from 0 to 35 cycles in increments of 5 cycles. All samples were sequenced 
on an Illumina NextSeq® 500 sequencing system, using a 2 x 76 bp paired-end run. Universal Human Reference RNA 
(UHRR) and Human Brain RNA (Brain) samples all had less than 6% duplicates as measured with the UMI across all 
cycling conditions, based on our standard BaseSpace TopHat Alignment Application. Both showed the same trend of 
increasing percent duplicates with increasing PCR cycles from less than 1% UMI duplicates at 0 cycles to 6% at 10 cycles. 
After 10 cycles, no increase in duplicates was observed. Note that in standard TruSeq RNA prep kits, we only recommend 
15 cycles of PCR. 
Figure 3: Duplicates from PCR Cycling Study 
(A) Duplicates and Yield for UHRR with increasing number of PCR cycles. Yield increases dramatically but % duplicates does not increase. 
(B) Duplicates and Yield for Brain with increasing number of PCR cycles. Yield increases but % duplicates does not. 
(C) Differential Expression of UHRR to Brain for 0 vs 35 cycles of PCR. 
(D-F) FPKM Correlation of low amounts of PCR (0 vs. 5 cycles), high amounts of PCR (20 vs. 25 cycles), and low vs. high (0 vs. 35 cycles). 
IV. Effect of Lower Input on PCR Duplicates 
In order to test the effect of input amount on duplicates, we pushed the lower limits of input for all of the protocols shown in Figure 2. For 
instance, for the TruSeq Stranded mRNA kit, we overloaded the kit with 500ng, which is 500% of the recommended input amount 
(100ng), and under-loaded it with 3ng, which is 3% of the recommended input amount. These experiments are summarized in Table 
1 below. All inputs were run with replicates for both UHRR and Brain. All of the samples were generated using an 
automated version of the protocol on the Hamilton Star Liquid Handling Workstation. 
Sample Prep Method          Sample Type      RNA Input             Sequencing
TruSeq RNA Access           UHRR and Brain   0.3, 2.5, 10, 50ng    2 x 76, NextSeq 500
TruSeq Stranded Total RNA   UHRR and Brain   3, 25, 100, 500ng     2 x 76, NextSeq 500
TruSeq Stranded mRNA        UHRR and Brain   3, 25, 100, 500ng     2 x 76, NextSeq 500
Table 1: Summary of Low Input Experimental Conditions. The recommended input amount is highlighted. 
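To give some intuition for why lower input can raise the percentage of duplicates at a fixed read depth, the following Python sketch uses a deliberately simplified model that is not part of this study: reads are assumed to be drawn uniformly at random, with replacement, from a library of N unique molecules. Real RNA-Seq libraries have highly non-uniform transcript abundance, so the numbers are illustrative only. The function name and the example library sizes are hypothetical.

```python
import math


def expected_duplicate_fraction(unique_molecules, reads):
    """Expected fraction of reads that are duplicates.

    Assumption (not from the poster): reads are sampled uniformly with
    replacement from `unique_molecules` distinct library molecules, so the
    expected number of distinct molecules seen is about N * (1 - exp(-M / N)).
    """
    if unique_molecules <= 0 or reads <= 0:
        raise ValueError("unique_molecules and reads must be positive")
    # -expm1(-x) equals 1 - exp(-x), computed accurately for small x.
    expected_distinct = unique_molecules * -math.expm1(-reads / unique_molecules)
    return 1.0 - expected_distinct / reads


if __name__ == "__main__":
    depth = 4_000_000  # read depth at which the duplicates metric is calculated
    # Hypothetical library complexities, from high to low.
    for n_unique in (400_000_000, 40_000_000, 4_000_000, 400_000):
        frac = expected_duplicate_fraction(n_unique, depth)
        print(f"{n_unique:>12,d} unique molecules: {100 * frac:.1f}% duplicates")
```

Under this model the duplicate fraction depends only on the ratio of reads to unique molecules, which is consistent with reading percent duplicates as a measure of library complexity rather than of PCR cycling.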
[Figure 4 panel titles. (A) FPKM Correlation: 3ng vs. 3ng; (B) FPKM Correlation: 100ng vs. 3ng; (C) FPKM Correlation: 100ng vs. 100ng]
Figure 4: Duplicates in Low Input Conditions of three TruSeq Workflows 
(A) FPKM correlation of 3ng vs. 3ng replicate condition in TruSeq mRNA workflow (B) FPKM correlation of 100ng vs. 3ng condition 
in TruSeq mRNA workflow (C) FPKM correlation of 100ng vs. 100ng replicate condition in TruSeq mRNA workflow (D) Differential 
Expression correlation of 100ng input vs. 3ng input for TruSeq mRNA workflow (E) Plot of % Duplicates vs. Read Number for 
different input conditions (TruSeq mRNA workflow) 
V. A Closer Look at Duplicates 
In order to test whether or not duplicate removal makes a difference in the final data, we used a standard tool to remove 
duplicates (Picard Tools). We calculated differential expression ratios of UHRR to Brain and compared the data with or 
without duplicate removal. For all input levels tested, we found good correlation of the data with or without removal of 
duplicates. Finally, we show examples of two genes at different input levels with or without duplicate removal in the 
Integrative Genomics Viewer (IGV). 
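The comparison described above can be sketched in Python as follows. This is an illustration under stated assumptions, not the analysis pipeline used for the poster: the input dictionaries (gene name mapped to a pair of UHRR and Brain expression values), the pseudocount of 1, and all function names are hypothetical, and the expression values are assumed to be already normalized for sequencing depth.

```python
import math


def log2_fold_change(uhrr, brain, pseudocount=1.0):
    """log2 ratio of UHRR to Brain; the pseudocount (an assumption) avoids log(0)."""
    return math.log2((uhrr + pseudocount) / (brain + pseudocount))


def pearson(xs, ys):
    """Pearson correlation coefficient of two equal-length sequences."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var_x = sum((x - mean_x) ** 2 for x in xs)
    var_y = sum((y - mean_y) ** 2 for y in ys)
    return cov / math.sqrt(var_x * var_y)


def fold_change_correlation(with_dups, without_dups):
    """Correlate log2(UHRR/Brain) computed with and without duplicate removal.

    Each argument maps gene -> (uhrr_value, brain_value).
    Only genes present in both inputs are compared.
    """
    genes = sorted(set(with_dups) & set(without_dups))
    lfc_with = [log2_fold_change(*with_dups[g]) for g in genes]
    lfc_without = [log2_fold_change(*without_dups[g]) for g in genes]
    return pearson(lfc_with, lfc_without)


if __name__ == "__main__":
    # Made-up values for illustration only.
    with_dups = {"geneA": (300.0, 100.0), "geneB": (50.0, 400.0), "geneC": (80.0, 75.0)}
    without_dups = {"geneA": (160.0, 55.0), "geneB": (30.0, 210.0), "geneC": (45.0, 41.0)}
    print(fold_change_correlation(with_dups, without_dups))
```

A correlation close to 1 would indicate that duplicate removal leaves the differential expression ratios essentially unchanged.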
Figure 5: Comparison of Data with Duplicates Removed to Data Without Duplicates Removed 
(A) Differential Expression plots or log2(fold change) of UHRR to Brain of samples with duplicates removed compared to samples without 
duplicates removed at 3 different input conditions: 5100ng, 25ng and 3ng. Data shows that removing duplicates from the data still has good 
correlation with data without duplicates removed. Unique vs duplicate data. 
(B) IGV browser shots of two different genes (GAPDH and ApoE), sequenced at 40M reads, at 2 different input conditions: 100ng and 3ng. 
For each input condition, data is shown without duplicates removed, duplicates only, and with duplicates removed. For the 100ng condition, 
the “duplicates only” track represents 49% of the reads whereas the “no duplicates” track represents 51% of the reads. For the 3ng condition, 
the “duplicates only” track represents 82% of the reads whereas the “no duplicates” track represents 18% of the reads. Data shows that 
duplicates are not biased and are amplified uniformly by PCR. 
VI. Conclusions 
The issue of PCR duplicates in RNA-Seq has been a concern for the field for many years. Our study shows that PCR 
cycling itself has very little effect on the absolute number of duplicates under recommended assay conditions (Section III). 
Even under conditions where we create duplicates, such as low input, as described in Sections IV and V, the duplicated 
data still accurately calls gene expression levels. Duplicates are amplified uniformly and the percent duplicates becomes 
more of a measure of lack of complexity of the input sample than a measure of PCR bias. 
FOR RESEARCH USE ONLY © 2014 Illumina, Inc. All rights reserved. 
Illumina, HiSeq, MiSeq, Nextera, and the pumpkin orange color are trademarks of Illumina, Inc. and/or its affiliate(s) in the U.S. and/or other countries. All other names, logos, and other trademarks are the property of their respective owners.

More Related Content

What's hot

RNA sequencing: advances and opportunities
RNA sequencing: advances and opportunities RNA sequencing: advances and opportunities
RNA sequencing: advances and opportunities Paolo Dametto
 
Multicopy reference assay (MRef) — a superior normalizer of sample input in D...
Multicopy reference assay (MRef) — a superior normalizer of sample input in D...Multicopy reference assay (MRef) — a superior normalizer of sample input in D...
Multicopy reference assay (MRef) — a superior normalizer of sample input in D...QIAGEN
 
Dominique McCoy PCR paper
Dominique McCoy PCR paperDominique McCoy PCR paper
Dominique McCoy PCR paperDominique McCoy
 
An NGS workflow to detect down to 0.1% allelic frequency in cfDNA
An NGS workflow to detect down to 0.1% allelic frequency in cfDNAAn NGS workflow to detect down to 0.1% allelic frequency in cfDNA
An NGS workflow to detect down to 0.1% allelic frequency in cfDNAThermo Fisher Scientific
 
Use of TGIRT for ssDNA-seq
Use of TGIRT for ssDNA-seqUse of TGIRT for ssDNA-seq
Use of TGIRT for ssDNA-seqDouglas Wu
 
19_21Translation
19_21Translation19_21Translation
19_21TranslationKaren Lewis
 
6236_protein-expression-vs-gfp
6236_protein-expression-vs-gfp6236_protein-expression-vs-gfp
6236_protein-expression-vs-gfpHimanshu Sethi
 
Illumina TruSeq Stranded mRNA_Biomek FXP Automated Workstation
Illumina TruSeq Stranded mRNA_Biomek FXP Automated WorkstationIllumina TruSeq Stranded mRNA_Biomek FXP Automated Workstation
Illumina TruSeq Stranded mRNA_Biomek FXP Automated WorkstationZachary Smith
 
A new specific and low cost technique to detect alk, ros, and ret rearrangeme...
A new specific and low cost technique to detect alk, ros, and ret rearrangeme...A new specific and low cost technique to detect alk, ros, and ret rearrangeme...
A new specific and low cost technique to detect alk, ros, and ret rearrangeme...Christine Canet
 
How to do successful gene expression analysis - Siena 20100625
How to do successful gene expression analysis - Siena 20100625How to do successful gene expression analysis - Siena 20100625
How to do successful gene expression analysis - Siena 20100625Biogazelle
 
Technical Guide to Qiagen PCR Arrays - Download the Guide
Technical Guide to Qiagen PCR Arrays - Download the GuideTechnical Guide to Qiagen PCR Arrays - Download the Guide
Technical Guide to Qiagen PCR Arrays - Download the GuideQIAGEN
 
Lectut btn-202-ppt-l29. applications of pcr-i (1)
Lectut btn-202-ppt-l29. applications of pcr-i (1)Lectut btn-202-ppt-l29. applications of pcr-i (1)
Lectut btn-202-ppt-l29. applications of pcr-i (1)Rishabh Jain
 
Orthogonal Verification of Oncomine cfDNA Data with Digital PCR Using TaqMan ...
Orthogonal Verification of Oncomine cfDNA Data with Digital PCR Using TaqMan ...Orthogonal Verification of Oncomine cfDNA Data with Digital PCR Using TaqMan ...
Orthogonal Verification of Oncomine cfDNA Data with Digital PCR Using TaqMan ...Thermo Fisher Scientific
 
NEBNext Ultra Directional RNA Library Kit for Illumina NGS_Biomek FXP Automat...
NEBNext Ultra Directional RNA Library Kit for Illumina NGS_Biomek FXP Automat...NEBNext Ultra Directional RNA Library Kit for Illumina NGS_Biomek FXP Automat...
NEBNext Ultra Directional RNA Library Kit for Illumina NGS_Biomek FXP Automat...Zachary Smith
 
Extending miRQC’s dynamic range: amplifying the view of Limiting RNA samples ...
Extending miRQC’s dynamic range: amplifying the view of Limiting RNA samples ...Extending miRQC’s dynamic range: amplifying the view of Limiting RNA samples ...
Extending miRQC’s dynamic range: amplifying the view of Limiting RNA samples ...QIAGEN
 
10 Tips to maximize your Real Time PCR Success - Download the Technical Note
10 Tips to maximize your Real Time PCR Success - Download the Technical Note10 Tips to maximize your Real Time PCR Success - Download the Technical Note
10 Tips to maximize your Real Time PCR Success - Download the Technical NoteQIAGEN
 
Examining gene expression and methylation with next gen sequencing
Examining gene expression and methylation with next gen sequencingExamining gene expression and methylation with next gen sequencing
Examining gene expression and methylation with next gen sequencingStephen Turner
 

What's hot (20)

RNA sequencing: advances and opportunities
RNA sequencing: advances and opportunities RNA sequencing: advances and opportunities
RNA sequencing: advances and opportunities
 
Q biomarkercn
Q biomarkercnQ biomarkercn
Q biomarkercn
 
Multicopy reference assay (MRef) — a superior normalizer of sample input in D...
Multicopy reference assay (MRef) — a superior normalizer of sample input in D...Multicopy reference assay (MRef) — a superior normalizer of sample input in D...
Multicopy reference assay (MRef) — a superior normalizer of sample input in D...
 
Dominique McCoy PCR paper
Dominique McCoy PCR paperDominique McCoy PCR paper
Dominique McCoy PCR paper
 
An NGS workflow to detect down to 0.1% allelic frequency in cfDNA
An NGS workflow to detect down to 0.1% allelic frequency in cfDNAAn NGS workflow to detect down to 0.1% allelic frequency in cfDNA
An NGS workflow to detect down to 0.1% allelic frequency in cfDNA
 
Use of TGIRT for ssDNA-seq
Use of TGIRT for ssDNA-seqUse of TGIRT for ssDNA-seq
Use of TGIRT for ssDNA-seq
 
19_21Translation
19_21Translation19_21Translation
19_21Translation
 
PCR Types
PCR TypesPCR Types
PCR Types
 
6236_protein-expression-vs-gfp
6236_protein-expression-vs-gfp6236_protein-expression-vs-gfp
6236_protein-expression-vs-gfp
 
Illumina TruSeq Stranded mRNA_Biomek FXP Automated Workstation
Illumina TruSeq Stranded mRNA_Biomek FXP Automated WorkstationIllumina TruSeq Stranded mRNA_Biomek FXP Automated Workstation
Illumina TruSeq Stranded mRNA_Biomek FXP Automated Workstation
 
A new specific and low cost technique to detect alk, ros, and ret rearrangeme...
A new specific and low cost technique to detect alk, ros, and ret rearrangeme...A new specific and low cost technique to detect alk, ros, and ret rearrangeme...
A new specific and low cost technique to detect alk, ros, and ret rearrangeme...
 
How to do successful gene expression analysis - Siena 20100625
How to do successful gene expression analysis - Siena 20100625How to do successful gene expression analysis - Siena 20100625
How to do successful gene expression analysis - Siena 20100625
 
JClinChem_2003
JClinChem_2003JClinChem_2003
JClinChem_2003
 
Technical Guide to Qiagen PCR Arrays - Download the Guide
Technical Guide to Qiagen PCR Arrays - Download the GuideTechnical Guide to Qiagen PCR Arrays - Download the Guide
Technical Guide to Qiagen PCR Arrays - Download the Guide
 
Lectut btn-202-ppt-l29. applications of pcr-i (1)
Lectut btn-202-ppt-l29. applications of pcr-i (1)Lectut btn-202-ppt-l29. applications of pcr-i (1)
Lectut btn-202-ppt-l29. applications of pcr-i (1)
 
Orthogonal Verification of Oncomine cfDNA Data with Digital PCR Using TaqMan ...
Orthogonal Verification of Oncomine cfDNA Data with Digital PCR Using TaqMan ...Orthogonal Verification of Oncomine cfDNA Data with Digital PCR Using TaqMan ...
Orthogonal Verification of Oncomine cfDNA Data with Digital PCR Using TaqMan ...
 
NEBNext Ultra Directional RNA Library Kit for Illumina NGS_Biomek FXP Automat...
NEBNext Ultra Directional RNA Library Kit for Illumina NGS_Biomek FXP Automat...NEBNext Ultra Directional RNA Library Kit for Illumina NGS_Biomek FXP Automat...
NEBNext Ultra Directional RNA Library Kit for Illumina NGS_Biomek FXP Automat...
 
Extending miRQC’s dynamic range: amplifying the view of Limiting RNA samples ...
Extending miRQC’s dynamic range: amplifying the view of Limiting RNA samples ...Extending miRQC’s dynamic range: amplifying the view of Limiting RNA samples ...
Extending miRQC’s dynamic range: amplifying the view of Limiting RNA samples ...
 
10 Tips to maximize your Real Time PCR Success - Download the Technical Note
10 Tips to maximize your Real Time PCR Success - Download the Technical Note10 Tips to maximize your Real Time PCR Success - Download the Technical Note
10 Tips to maximize your Real Time PCR Success - Download the Technical Note
 
Examining gene expression and methylation with next gen sequencing
Examining gene expression and methylation with next gen sequencingExamining gene expression and methylation with next gen sequencing
Examining gene expression and methylation with next gen sequencing
 

Similar to Ashg poster sp_compressed

A computational framework for large-scale analysis of TCRβ immune repertoire ...
A computational framework for large-scale analysis of TCRβ immune repertoire ...A computational framework for large-scale analysis of TCRβ immune repertoire ...
A computational framework for large-scale analysis of TCRβ immune repertoire ...Thermo Fisher Scientific
 
Microarray validation
Microarray validationMicroarray validation
Microarray validationElsa von Licy
 
Whole Transcriptome Analysis of Testicular Germ Cell Tumors
Whole Transcriptome Analysis of Testicular Germ Cell TumorsWhole Transcriptome Analysis of Testicular Germ Cell Tumors
Whole Transcriptome Analysis of Testicular Germ Cell TumorsThermo Fisher Scientific
 
Microbial Phylogenomics (EVE161) Class 17: Genomes from Uncultured
Microbial Phylogenomics (EVE161) Class 17: Genomes from UnculturedMicrobial Phylogenomics (EVE161) Class 17: Genomes from Uncultured
Microbial Phylogenomics (EVE161) Class 17: Genomes from UnculturedJonathan Eisen
 
Comparing the early ciRNA papers
Comparing the early ciRNA papers Comparing the early ciRNA papers
Comparing the early ciRNA papers Darya Vanichkina
 
Hotspot mutation and fusion transcript detection from the same non-small cell...
Hotspot mutation and fusion transcript detection from the same non-small cell...Hotspot mutation and fusion transcript detection from the same non-small cell...
Hotspot mutation and fusion transcript detection from the same non-small cell...Thermo Fisher Scientific
 
Aai 2007-pcr array-poster
Aai 2007-pcr array-posterAai 2007-pcr array-poster
Aai 2007-pcr array-posterElsa von Licy
 
Ascb 2007-pcr array-poster
Ascb 2007-pcr array-posterAscb 2007-pcr array-poster
Ascb 2007-pcr array-posterElsa von Licy
 
Polymerase Chain Reaction - PCR
Polymerase Chain Reaction - PCRPolymerase Chain Reaction - PCR
Polymerase Chain Reaction - PCRAhmad Qudah
 
Real time pcr
Real time pcrReal time pcr
Real time pcrali h
 
Rapid Detection of Aneuploidy from Multiplexed Single Cell Samples
Rapid Detection of Aneuploidy from Multiplexed Single Cell SamplesRapid Detection of Aneuploidy from Multiplexed Single Cell Samples
Rapid Detection of Aneuploidy from Multiplexed Single Cell SamplesThermo Fisher Scientific
 
Rt2 profilerbrochure
Rt2 profilerbrochureRt2 profilerbrochure
Rt2 profilerbrochureElsa von Licy
 
POLYMERASE CHAIN REACTION.pptx
POLYMERASE   CHAIN   REACTION.pptxPOLYMERASE   CHAIN   REACTION.pptx
POLYMERASE CHAIN REACTION.pptxHarsharankaur36
 
Marker devt. workshop 27022012
Marker devt. workshop 27022012Marker devt. workshop 27022012
Marker devt. workshop 27022012Koppolu Ravi
 
Polymerase chain reaction
Polymerase chain reactionPolymerase chain reaction
Polymerase chain reactionKAVIRAJ M
 
Overview of the glossary related to pcr
Overview of the glossary related to pcrOverview of the glossary related to pcr
Overview of the glossary related to pcrMohammadAtif41
 

Similar to Ashg poster sp_compressed (20)

A computational framework for large-scale analysis of TCRβ immune repertoire ...
A computational framework for large-scale analysis of TCRβ immune repertoire ...A computational framework for large-scale analysis of TCRβ immune repertoire ...
A computational framework for large-scale analysis of TCRβ immune repertoire ...
 
Microarray validation
Microarray validationMicroarray validation
Microarray validation
 
Transcriptomics approaches
Transcriptomics approachesTranscriptomics approaches
Transcriptomics approaches
 
Whole Transcriptome Analysis of Testicular Germ Cell Tumors
Whole Transcriptome Analysis of Testicular Germ Cell TumorsWhole Transcriptome Analysis of Testicular Germ Cell Tumors
Whole Transcriptome Analysis of Testicular Germ Cell Tumors
 
Microbial Phylogenomics (EVE161) Class 17: Genomes from Uncultured
Microbial Phylogenomics (EVE161) Class 17: Genomes from UnculturedMicrobial Phylogenomics (EVE161) Class 17: Genomes from Uncultured
Microbial Phylogenomics (EVE161) Class 17: Genomes from Uncultured
 
PCR
PCRPCR
PCR
 
Comparing the early ciRNA papers
Comparing the early ciRNA papers Comparing the early ciRNA papers
Comparing the early ciRNA papers
 
Hotspot mutation and fusion transcript detection from the same non-small cell...
Hotspot mutation and fusion transcript detection from the same non-small cell...Hotspot mutation and fusion transcript detection from the same non-small cell...
Hotspot mutation and fusion transcript detection from the same non-small cell...
 
Prabhakar singh ii sem-paper v-rt pcr
Prabhakar singh  ii sem-paper v-rt pcrPrabhakar singh  ii sem-paper v-rt pcr
Prabhakar singh ii sem-paper v-rt pcr
 
Aai 2007-pcr array-poster
Aai 2007-pcr array-posterAai 2007-pcr array-poster
Aai 2007-pcr array-poster
 
Ascb 2007-pcr array-poster
Ascb 2007-pcr array-posterAscb 2007-pcr array-poster
Ascb 2007-pcr array-poster
 
Polymerase Chain Reaction - PCR
Polymerase Chain Reaction - PCRPolymerase Chain Reaction - PCR
Polymerase Chain Reaction - PCR
 
Som aacr2011poster
Som aacr2011posterSom aacr2011poster
Som aacr2011poster
 
Real time pcr
Real time pcrReal time pcr
Real time pcr
 
Rapid Detection of Aneuploidy from Multiplexed Single Cell Samples
Rapid Detection of Aneuploidy from Multiplexed Single Cell SamplesRapid Detection of Aneuploidy from Multiplexed Single Cell Samples
Rapid Detection of Aneuploidy from Multiplexed Single Cell Samples
 
Rt2 profilerbrochure
Rt2 profilerbrochureRt2 profilerbrochure
Rt2 profilerbrochure
 
POLYMERASE CHAIN REACTION.pptx
POLYMERASE   CHAIN   REACTION.pptxPOLYMERASE   CHAIN   REACTION.pptx
POLYMERASE CHAIN REACTION.pptx
 
Marker devt. workshop 27022012
Marker devt. workshop 27022012Marker devt. workshop 27022012
Marker devt. workshop 27022012
 
Polymerase chain reaction
Polymerase chain reactionPolymerase chain reaction
Polymerase chain reaction
 
Overview of the glossary related to pcr
Overview of the glossary related to pcrOverview of the glossary related to pcr
Overview of the glossary related to pcr
 

Recently uploaded

9654467111 Call Girls In Raj Nagar Delhi Short 1500 Night 6000
9654467111 Call Girls In Raj Nagar Delhi Short 1500 Night 60009654467111 Call Girls In Raj Nagar Delhi Short 1500 Night 6000
9654467111 Call Girls In Raj Nagar Delhi Short 1500 Night 6000Sapana Sha
 
Stunning ➥8448380779▻ Call Girls In Panchshil Enclave Delhi NCR
Stunning ➥8448380779▻ Call Girls In Panchshil Enclave Delhi NCRStunning ➥8448380779▻ Call Girls In Panchshil Enclave Delhi NCR
Stunning ➥8448380779▻ Call Girls In Panchshil Enclave Delhi NCRDelhi Call girls
 
All-domain Anomaly Resolution Office U.S. Department of Defense (U) Case: “Eg...
All-domain Anomaly Resolution Office U.S. Department of Defense (U) Case: “Eg...All-domain Anomaly Resolution Office U.S. Department of Defense (U) Case: “Eg...
All-domain Anomaly Resolution Office U.S. Department of Defense (U) Case: “Eg...Sérgio Sacani
 
Chromatin Structure | EUCHROMATIN | HETEROCHROMATIN
Chromatin Structure | EUCHROMATIN | HETEROCHROMATINChromatin Structure | EUCHROMATIN | HETEROCHROMATIN
Chromatin Structure | EUCHROMATIN | HETEROCHROMATINsankalpkumarsahoo174
 
Biopesticide (2).pptx .This slides helps to know the different types of biop...
Biopesticide (2).pptx  .This slides helps to know the different types of biop...Biopesticide (2).pptx  .This slides helps to know the different types of biop...
Biopesticide (2).pptx .This slides helps to know the different types of biop...RohitNehra6
 
STERILITY TESTING OF PHARMACEUTICALS ppt by DR.C.P.PRINCE
STERILITY TESTING OF PHARMACEUTICALS ppt by DR.C.P.PRINCESTERILITY TESTING OF PHARMACEUTICALS ppt by DR.C.P.PRINCE
STERILITY TESTING OF PHARMACEUTICALS ppt by DR.C.P.PRINCEPRINCE C P
 
Disentangling the origin of chemical differences using GHOST
Disentangling the origin of chemical differences using GHOSTDisentangling the origin of chemical differences using GHOST
Disentangling the origin of chemical differences using GHOSTSérgio Sacani
 
Recombination DNA Technology (Nucleic Acid Hybridization )
Recombination DNA Technology (Nucleic Acid Hybridization )Recombination DNA Technology (Nucleic Acid Hybridization )
Recombination DNA Technology (Nucleic Acid Hybridization )aarthirajkumar25
 
Broad bean, Lima Bean, Jack bean, Ullucus.pptx
Broad bean, Lima Bean, Jack bean, Ullucus.pptxBroad bean, Lima Bean, Jack bean, Ullucus.pptx
Broad bean, Lima Bean, Jack bean, Ullucus.pptxjana861314
 
Natural Polymer Based Nanomaterials
Natural Polymer Based NanomaterialsNatural Polymer Based Nanomaterials
Natural Polymer Based NanomaterialsAArockiyaNisha
 
Chemistry 4th semester series (krishna).pdf
Chemistry 4th semester series (krishna).pdfChemistry 4th semester series (krishna).pdf
Chemistry 4th semester series (krishna).pdfSumit Kumar yadav
 
Recombinant DNA technology (Immunological screening)
Recombinant DNA technology (Immunological screening)Recombinant DNA technology (Immunological screening)
Recombinant DNA technology (Immunological screening)PraveenaKalaiselvan1
 
Botany krishna series 2nd semester Only Mcq type questions
Botany krishna series 2nd semester Only Mcq type questionsBotany krishna series 2nd semester Only Mcq type questions
Botany krishna series 2nd semester Only Mcq type questionsSumit Kumar yadav
 
TEST BANK For Radiologic Science for Technologists, 12th Edition by Stewart C...
TEST BANK For Radiologic Science for Technologists, 12th Edition by Stewart C...TEST BANK For Radiologic Science for Technologists, 12th Edition by Stewart C...
TEST BANK For Radiologic Science for Technologists, 12th Edition by Stewart C...ssifa0344
 
Hubble Asteroid Hunter III. Physical properties of newly found asteroids
Hubble Asteroid Hunter III. Physical properties of newly found asteroidsHubble Asteroid Hunter III. Physical properties of newly found asteroids
Hubble Asteroid Hunter III. Physical properties of newly found asteroidsSérgio Sacani
 
Traditional Agroforestry System in India- Shifting Cultivation, Taungya, Home...
Traditional Agroforestry System in India- Shifting Cultivation, Taungya, Home...Traditional Agroforestry System in India- Shifting Cultivation, Taungya, Home...
Traditional Agroforestry System in India- Shifting Cultivation, Taungya, Home...jana861314
 
VIRUSES structure and classification ppt by Dr.Prince C P
VIRUSES structure and classification ppt by Dr.Prince C PVIRUSES structure and classification ppt by Dr.Prince C P
VIRUSES structure and classification ppt by Dr.Prince C PPRINCE C P
 
Pests of cotton_Borer_Pests_Binomics_Dr.UPR.pdf
Pests of cotton_Borer_Pests_Binomics_Dr.UPR.pdfPests of cotton_Borer_Pests_Binomics_Dr.UPR.pdf
Pests of cotton_Borer_Pests_Binomics_Dr.UPR.pdfPirithiRaju
 

Recently uploaded (20)

9654467111 Call Girls In Raj Nagar Delhi Short 1500 Night 6000
9654467111 Call Girls In Raj Nagar Delhi Short 1500 Night 60009654467111 Call Girls In Raj Nagar Delhi Short 1500 Night 6000
9654467111 Call Girls In Raj Nagar Delhi Short 1500 Night 6000
 
Stunning ➥8448380779▻ Call Girls In Panchshil Enclave Delhi NCR
Stunning ➥8448380779▻ Call Girls In Panchshil Enclave Delhi NCRStunning ➥8448380779▻ Call Girls In Panchshil Enclave Delhi NCR
Stunning ➥8448380779▻ Call Girls In Panchshil Enclave Delhi NCR
 
All-domain Anomaly Resolution Office U.S. Department of Defense (U) Case: “Eg...
All-domain Anomaly Resolution Office U.S. Department of Defense (U) Case: “Eg...All-domain Anomaly Resolution Office U.S. Department of Defense (U) Case: “Eg...
All-domain Anomaly Resolution Office U.S. Department of Defense (U) Case: “Eg...
 
Chromatin Structure | EUCHROMATIN | HETEROCHROMATIN
Chromatin Structure | EUCHROMATIN | HETEROCHROMATINChromatin Structure | EUCHROMATIN | HETEROCHROMATIN
Chromatin Structure | EUCHROMATIN | HETEROCHROMATIN
 
Biopesticide (2).pptx .This slides helps to know the different types of biop...
Biopesticide (2).pptx  .This slides helps to know the different types of biop...Biopesticide (2).pptx  .This slides helps to know the different types of biop...

Ashg poster sp_compressed

Analysis of PCR Duplicates and Library Diversity in RNA-Seq
Smita Pathak1, Irina Khrebtukova1, Angelica Barr1, Felix Schlesinger1, Tim Hill1, Lisa Watson1,
Abstract
In DNA sequencing, duplicates or reads that map to the same position are discarded but in RNA sequencing (RNA-Seq), these reads can represent highly expressed genes. The issue of duplicates in RNA-Seq is even more complicated in low input or degraded samples. Higher percentages of duplicates in very low input and degraded samples are routinely observed in RNA-Seq using standard bioinformatics tools such as Picard but the source of duplicates is commonly misunderstood. Under normal assay conditions, and with recommended input levels, three different RNA-Seq assays give different apparent numbers of duplicates on the same standard UHRR and Brain RNA samples. These differences are not necessarily due to PCR artifacts but occur because of the differences in complexity between the coding regions, the mRNA, and the total RNA of a cell. When we measure true PCR duplicates using a molecular barcoding approach, it becomes clear that there are much lower levels of potential PCR duplicates in standard RNA-Seq preps. However, we find that when reducing input amounts for any of these three assays to 10ng or less, we observe dramatic increases in percentage of duplicates. This value then becomes an important metric for overall efficacy of the experiment.
I. UMI Barcoding
The standard TruSeq® forked adaptor was modified to include a Unique Molecular Index (UMI), a 5 base random N sequence in the index of read 1. The read 2 index was not modified, allowing pooled samples to demultiplex by read 2 only. The sequence of the UMI tag was used in combination with alignment information to count true PCR duplicates. Only fragments that had the same alignment position and the same UMI sequence were considered true PCR duplicates.
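The counting rule described above (same alignment position and same UMI) can be sketched in Python. This is a minimal illustration, not the pipeline used for the poster: the `Fragment` record, the `duplicate_fractions` function, the shape of the position key, and the example coordinates are all hypothetical. It matches UMIs exactly and makes no allowance for sequencing errors in the UMI.

```python
from collections import Counter
from typing import Hashable, Iterable, NamedTuple, Tuple


class Fragment(NamedTuple):
    """Hypothetical record for one aligned fragment (not a Picard or BaseSpace type)."""
    position: Hashable  # assumed key, e.g. (chrom, start, strand)
    umi: str            # the 5-base random UMI taken from the read 1 index


def duplicate_fractions(fragments: Iterable[Fragment]) -> Tuple[float, float]:
    """Return (apparent, umi_confirmed) duplicate fractions.

    apparent:      fragments that share a position with an earlier fragment.
    umi_confirmed: fragments that share both position and UMI with an earlier one.
    """
    by_position: Counter = Counter()
    by_position_and_umi: Counter = Counter()
    total = 0
    for frag in fragments:
        by_position[frag.position] += 1
        by_position_and_umi[(frag.position, frag.umi)] += 1
        total += 1
    if total == 0:
        return 0.0, 0.0
    # Each group of size n contributes n - 1 duplicates; the first copy counts as unique.
    apparent = total - len(by_position)
    confirmed = total - len(by_position_and_umi)
    return apparent / total, confirmed / total


# Illustrative input only; the coordinates are made up.
example = [
    Fragment(("chr12", 6534512, "+"), "ACGTA"),
    Fragment(("chr12", 6534512, "+"), "ACGTA"),  # same position, same UMI
    Fragment(("chr12", 6534512, "+"), "TTGCA"),  # same position, different UMI
    Fragment(("chr19", 44905796, "+"), "ACGTA"),
]
apparent, confirmed = duplicate_fractions(example)
print(apparent, confirmed)  # 0.5 0.25
```

In this toy input, half of the fragments are apparent duplicates by position alone, but only a quarter are confirmed by the UMI. A 5-base UMI has only 4^5 = 1024 possible sequences, so two distinct molecules at a heavily covered position can share a UMI by chance. That is consistent with the abstract describing the UMI-based count as a measure of potential PCR duplicates.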
When using the Illumina TopHat BaseSpace® Application, the duplicates metric is calculated at a read depth of 4M reads.
Figure 1: UMI Barcode in read1 Index of TruSeq forked adaptor allows separation of PCR duplicates from “apparent” duplicates
[Figure 1 diagram labels: Modified Forked Adaptor, NNNNN]
Ryan Kelley1, Tatjana Singer1, and Gary P. Schroth1
1Illumina, Inc. 5200 Illumina Way, San Diego, CA 92122
II. Three RNA Sample Preparation Workflows
UMIs were used to track individual molecules through three different sample preparation workflows. TruSeq Stranded mRNA uses oligo-dT beads to capture the poly-A tails of RNA. TruSeq RNA Access enriches for the coding region of the transcriptome using capture probes, followed by purification with magnetic streptavidin beads. The TruSeq Stranded Total RNA workflow removes rRNA and mtRNA using specific cRNA probes and capture beads.
[Figure 2 diagram, panel A (starting material: total RNA for TruSeq Total RNA and TruSeq RNA Access, poly-A RNA for TruSeq mRNA, fragmented RNA/FFPE): fragmentation (fresh frozen RNA) and priming with random hexamers; first strand synthesis (DNA-RNA hybrid); second strand synthesis with dUTP (double stranded cDNA); A-tailing and adaptor ligation (cDNA with forked adaptor); PCR; final cDNA library with strand specificity (p5 and p7 adaptors); sequencing for mRNA/Total RNA. Panel B (starting from the cDNA library from total RNA): hybridization with biotinylated exome capture probes; streptavidin magnetic bead binding; removal of unbound and nonspecifically bound material by heated washing; elution from bead; PCR; final exome-targeted cDNA library; sequencing for RNA Access.]
Figure 2: Sample Preparation Workflows (A) Library Preparation for 3 different workflows: mRNA selects for coding regions of RNA via poly-A selection, RNA Access selects by enrichment, and the Total RNA workflow depletes rRNA/mtRNA.
Sequencing is performed after library preparation for mRNA and Total RNA workflows. (B) Enrichment workflow for RNA Access only.
III. PCR Cycling Study
To determine the effects of PCR cycling, we used the TruSeq Stranded mRNA workflow with the standard 100ng input and varied the number of PCR cycles from 0 to 35 in increments of 5 cycles. All samples were sequenced on an Illumina NextSeq® 500 sequencing system, using a 2 x 76 bp paired-end run. Universal Human Reference RNA (UHRR) and Human Brain RNA (Brain) samples all had less than 6% duplicates as measured with the UMI across all cycling conditions, based on our standard BaseSpace TopHat Alignment Application. Both showed the same trend of increasing percent duplicates with increasing PCR cycles, from less than 1% UMI duplicates at 0 cycles to 6% at 10 cycles. After 10 cycles, no increase in duplicates was observed. Note that in standard TruSeq RNA prep kits, we only recommend 15 cycles of PCR.
Figure 3: Duplicates from PCR Cycling Study (A) Duplicates and Yield for UHRR with increasing number of PCR cycles. Yield increases dramatically but % duplicates does not increase. (B) Duplicates and Yield for Brain with increasing number of PCR cycles. Yield increases but % duplicates does not. (C) Differential Expression of UHRR to Brain for 0 vs. 35 cycles of PCR. (D-F) FPKM Correlation of low amounts of PCR (0 vs. 5 cycles), high amounts of PCR (20 vs. 25 cycles), and low vs. high (0 vs. 35 cycles).
IV. Effect of Lower Input on PCR Duplicates
To test the effect of input amount on duplicates, we pushed the lower limits of input for all of the protocols shown in Figure 2. For instance, for the TruSeq Stranded mRNA kit, whose recommended input amount is 100ng, we overloaded the kit with 500% of the recommended input (500ng) and under-loaded it with 3% of the recommended input (3ng). These experiments are summarized in Table 1 below. All inputs were run with replicates for both UHRR and Brain.
All of the samples were generated using an automated version of the protocol on the Hamilton Star Liquid Handling Workstation.

Sample Prep Method        | Sample Type    | RNA Input         | Sequencing
TruSeq RNA Access         | UHRR and Brain | 0.3, 2.5, 10, 50ng | 2 x 76, NextSeq 500
TruSeq Stranded Total RNA | UHRR and Brain | 3, 25, 100, 500ng  | 2 x 76, NextSeq 500
TruSeq Stranded mRNA      | UHRR and Brain | 3, 25, 100, 500ng  | 2 x 76, NextSeq 500

Table 1: Summary of Low Input Experimental Conditions. The recommended input amount is highlighted in the original poster.
Figure 4: Duplicates in Low Input Conditions of three TruSeq Workflows (A) FPKM correlation of 3ng vs. 3ng replicate condition in TruSeq mRNA workflow (B) FPKM correlation of 100ng vs. 3ng condition in TruSeq mRNA workflow (C) FPKM correlation of 100ng vs. 100ng replicate condition in TruSeq mRNA workflow (D) Differential Expression correlation of 100ng input vs. 3ng input for TruSeq mRNA workflow (E) Plot of % Duplicates vs. Read Number for different input conditions (TruSeq mRNA workflow)
V. A Closer Look at Duplicates
To test whether duplicate removal makes a difference in the final data, we used a standard tool to remove duplicates (Picard Tools). We calculated differential expression ratios of UHRR to Brain and compared the data with and without duplicate removal. For all input levels tested, the data with and without duplicate removal correlated well. Finally, we show examples of two genes at different input levels, with and without duplicate removal, in the Integrative Genomics Viewer (IGV).
Figure 5: Comparison of Data with Duplicates Removed to Data Without Duplicates Removed (A) Differential Expression plots of log2(fold change) of UHRR to Brain for samples with duplicates removed compared to samples without duplicates removed at 3 different input conditions: 5100ng, 25ng and 3ng.
The data with duplicates removed still correlate well with the data without duplicates removed. (B) IGV browser shots of two different genes (GAPDH and ApoE), sequenced at 40M reads, at 2 different input conditions: 100ng and 3ng. For each input condition, data is shown without duplicates removed, duplicates only, and with duplicates removed. For the 100ng condition, the “duplicates only” track represents 49% of the reads whereas the “no duplicates” track represents 51% of the reads. For the 3ng condition, the “duplicates only” track represents 82% of the reads whereas the “no duplicates” track represents 18% of the reads. The data show that duplicates are not biased and are amplified uniformly by PCR.
VI. Conclusions
The issue of PCR duplicates in RNA-Seq has been a concern for the field for many years. Our study shows that PCR cycling itself has very little effect on the absolute number of duplicates under recommended assay conditions (Section III). Even under conditions where we create duplicates, such as low input, as described in Sections IV and V, the duplicated data still accurately calls gene expression levels. Duplicates are amplified uniformly, and the percent duplicates becomes more a measure of the lack of complexity of the input sample than a measure of PCR bias.
FOR RESEARCH USE ONLY
© 2014 Illumina, Inc. All rights reserved. Illumina, HiSeq, MiSeq, Nextera, and the pumpkin orange color are trademarks of Illumina, Inc. and/or its affiliate(s) in the U.S. and/or other countries. All other names, logos, and other trademarks are the property of their respective owners.