
Improved Algorithm for Amplicon Sequencing Assay Designs

Ion AmpliSeq™ sequencing is one of the most promising applications
of the Ion Torrent NGS platform. It involves multiplex PCR for target
enrichment. Thermo Fisher offers the online Ion AmpliSeq Designer to
help customers design assays. As more people adopt Ion AmpliSeq
technologies, challenges in assay design have started to emerge.
Here we present bioinformatics approaches to
improve the following areas of assay design: 1) assay specificity; 2)
primer quality control; 3) SNP under primer; and 4) flexibility to adapt
to different applications of Ion AmpliSeq sequencing including variant
calling, copy number variation detection, RNA expression, gene fusion
detection, and metagenomics. Design algorithms are developed to
ensure high coverage while controlling the risks of poor amplification
efficiency, off-target reads, and SNP effects. With the optimized design algorithm,
numerous custom and community research panels have been
created, including the Ion AmpliSeq Exome Panel, TP53 Panel, and
CFTR Panel.

Published in: Science

Guoying Liu, Manimozhi Manivannan, Heinz Breu, Adam Broomer, Alexander Atkins, Kate Rhodes, Cristina Van Loy, Fiona Hyland, Mark Andersen
Thermo Fisher Scientific, 200 Oyster Point Blvd, South San Francisco, CA, USA, 94080

I. Ion AmpliSeq DNA Design Overview

Design input:
1. A piece of DNA sequence
2. The part of the sequence (targets) to amplify

Translates to: one set of primer pairs that specifically amplify the targets.

Design targets can be submitted as:
1) Gene symbol, SNP rsID, or COSMIC mutation ID from the human or mouse genome
2) Region/SNP coordinates from any pre-loaded genome
3) Region/SNP coordinates from customer-uploaded DNA sequences, which can be from any genome or even artificially built

Primer Specificity Search:
1) Against a reference genome pre-loaded in AmpliSeq Designer
2) Against the set of sequence contigs submitted for AmpliSeq design

Check for Known Polymorphic Sites at Primer Binding Sites:
1) SNPs in dbSNP for reference genomes pre-loaded in AmpliSeq
2) SNPs in customer-submitted sequence contigs, specified by the customer

Figure 1. Diagram for an overview of the DNA design workflow.

Table 1. Scenarios of Design Input

Scenario_1: human, mouse
  Type of design: 1) DNA 2) DNA Hotspot 3) RNA
  Submission of design targets: 1) chromosomal coordinates 2) gene names 3) rsID or COSMIC ID
  Primer specificity check: human or mouse reference genome
  SNPs to avoid: common human or mouse SNPs from dbSNP

Scenario_2: cow, chicken, pig, sheep, rice, maize, soybean, and tomato (case I)
  Type of design: 1) DNA 2) DNA Hotspot
  Submission of design targets: chromosomal coordinates
  Primer specificity check: respective reference genome
  SNPs to avoid: SNPs (if publicly available) for the respective genome

Scenario_3: human, mouse, cow, chicken, pig, sheep, rice, maize, soybean, and tomato (case II)
  Type of design: 1) DNA 2) DNA Hotspot
  Submission of design targets: custom proprietary sequence contigs plus targets listed as contig coordinates
  Primer specificity check: respective reference genome
  SNPs to avoid: 1) SNPs or variation regions on custom contigs 2) None

Scenario_4: custom reference contigs from other genomes
  Type of design: 1) DNA 2) DNA Hotspot
  Submission of design targets: custom reference sequence contigs plus targets listed as contig coordinates
  Primer specificity check: 1) one of the 10 supported reference genomes as proxy 2) None (specificity check against custom contigs only)
  SNPs to avoid: 1) SNPs or variation regions on custom contigs 2) None

II. Avoid SNPs for Primer Design

[Figure 2. Axes: Read Count, Normalized SNP Position, Category. Panels: Category 0, Category 1, Category 2.]

Figure 2. How the existence of a SNP at a primer binding site affects read count. Category 0: the SNP is homozygous in NA12878 and the primer sequence matches the genomic DNA; Category 1: the SNP is heterozygous in NA12878 and the primer sequence matches half of the genomic DNA; Category 2: the SNP is homozygous in NA12878 and the primer sequence does not match the genomic DNA. Normalized SNP Position: the SNP position in the primer sequence, counting from the 3’ end, normalized to a theoretical 33 bp primer and binned by 3.

III. Control Primer Specificity to Avoid Off-target Reads

[Figure 3. Axis: Cutoff of Similarity-Hits (as defined below).]

Figure 3. Effect of primer specificity on off-target reads. Primer specificity means two things: 1) the number of locations in background DNA to which the primer binds, even if imperfectly, termed Similarity-Hits; 2) how well the primer binds to non-target DNA. The figure illustrates how off-target reads can be controlled by limiting primer Similarity-Hits.

IV. Ion AmpliSeq Designer Tiling and Pooling

IV-1. Tiling/Pooling for DNA region design

Identify one set of amplicons with:
1. Maximum coverage of target;
2. Minimum overall amplicon cost (the lower the cost, the better the amplicon quality)

[Figure 4 diagram labels: Target; Tiling; Pooling; Pool 1; Pool 2; Unpooled.]

Figure 4. Ion AmpliSeq Designer tiling/pooling scheme for regular DNA region and gene designs. A) a diagram illustrating the process; B) an example of selected amplicons covering a region target.

IV-2. Tiling/Pooling for single pool DNA hotspot design

Figure 5. Ion AmpliSeq Designer tiling/pooling scheme for one-pool DNA Hotspot designs. A) and B) show how an amplicon would be selected by the tiling/pooling scheme shown in Figure 4; C) shows the amplicon selected by the tiling scheme specified for one-pool DNA Hotspot design.

IV-3. Tiling/Pooling for DNA designs with subsetting

Retain input amplicons meant to be “must-have”. Tiling and pooling in one step.

[Figure 6 diagram labels: Target; Pool 1; Pool 2.]

Figure 6. Ion AmpliSeq Designer tiling/pooling scheme when a set of pre-selected amplicons (shown in red in the graph) is specified to be included in a new panel.

Conclusions

Design algorithms of Ion AmpliSeq Designer are continuously improved to ensure that amplicon sequencing designs lead to successful next-gen sequencing applications such as variant calling and copy number analysis. More information can be found at AmpliSeq.com.

Acknowledgements

The authors would like to thank Niranjan Vissa, Dong Kim, Annie Titus, Chris Lasher, Ryan Kumsher, Winston Cheng, Poorva Soni, Antonio Martinez Alcantara, Denise Topacio, Nisha Mulakken, Nriti Garg, Pius Brzoska, Fangqi Hu, Francisco Hernandez-Guzman, David Kopp, Arvind Kothandaraman and Anup Parikh for their contributions and support.

Thermo Fisher Scientific • 5791 Van Allen Way • Carlsbad, CA 92008 • www.lifetechnologies.com
© 2015 Thermo Fisher Scientific Inc. All rights reserved. All trademarks are the property of Thermo Fisher Scientific and its subsidiaries unless otherwise specified. For Research Use Only. Not for use in diagnostic procedures.
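The Figure 2 caption defines Normalized SNP Position only in words (counted from the 3’ end, normalized to a theoretical 33 bp primer, binned by 3). The sketch below is one possible reading of that definition, not the authors' code: the linear scaling, the ceiling-based binning, and the function name `normalized_snp_bin` are all assumptions made for illustration.

```python
REFERENCE_PRIMER_LENGTH = 33  # theoretical primer length named in the Figure 2 caption
BIN_WIDTH = 3                 # bin width named in the Figure 2 caption


def normalized_snp_bin(snp_offset_from_3prime: int, primer_length: int) -> int:
    """Return the 1-based bin (1..11) of a SNP after scaling the primer to 33 bp.

    snp_offset_from_3prime is 1-based: 1 means the SNP is on the 3'-most base.
    Assumption: positions are scaled linearly by 33 / primer_length and then
    grouped into consecutive bins of width 3 (ceiling division).
    """
    if primer_length <= 0:
        raise ValueError("primer_length must be positive")
    if not 1 <= snp_offset_from_3prime <= primer_length:
        raise ValueError("SNP offset must lie inside the primer")
    # Integer ceiling of (offset * 33 / primer_length) / 3, avoiding float rounding.
    numerator = snp_offset_from_3prime * REFERENCE_PRIMER_LENGTH
    denominator = primer_length * BIN_WIDTH
    return (numerator + denominator - 1) // denominator
```

Under these assumptions, a SNP on the third base from the 3’ end of a 22 bp primer scales to position 4.5 and falls in bin 2, while the 5’-most base of any primer falls in bin 11.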
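Section IV states the tiling goal (maximum target coverage at minimum overall amplicon cost), the split into pools, and the retention of “must-have” amplicons, but the poster does not spell out the procedure. The sketch below is a simple greedy heuristic written only to illustrate those goals; it is not the Ion AmpliSeq Designer algorithm. The `Amplicon` type, its `cost` field, and the functions `tile` and `pool` are hypothetical names, and a single interval stands in for both the covered insert and the primer footprint, which a real design would treat separately.

```python
from dataclasses import dataclass
from typing import List, Sequence, Tuple


@dataclass(frozen=True)
class Amplicon:
    start: int   # inclusive coordinate (hypothetical simplification)
    end: int     # exclusive coordinate
    cost: float  # hypothetical quality score; lower is better


def tile(target: Tuple[int, int], candidates: Sequence[Amplicon],
         must_have: Sequence[Amplicon] = ()) -> List[Amplicon]:
    """Greedy left-to-right tiling of the half-open interval `target`."""
    _, target_end = target
    chosen = list(must_have)  # pre-selected amplicons are always retained
    position = target[0]
    while position < target_end:
        # Coverage already provided by a retained amplicon is used first.
        covering = [a for a in chosen if a.start <= position < a.end]
        if covering:
            position = max(a.end for a in covering)
            continue
        usable = [a for a in candidates if a.start <= position < a.end]
        if not usable:
            # Gap in coverage: jump to the next amplicon start, if any.
            later = [a.start for a in list(candidates) + chosen
                     if a.start > position]
            if not later:
                break
            position = min(later)
            continue
        # Pick the lowest cost per newly covered target base.
        best = min(usable,
                   key=lambda a: a.cost / (min(a.end, target_end) - position))
        chosen.append(best)
        position = best.end
    return sorted(chosen, key=lambda a: a.start)


def pool(amplicons: Sequence[Amplicon],
         n_pools: int = 2) -> Tuple[List[List[Amplicon]], List[Amplicon]]:
    """Put each amplicon in the first pool where it overlaps no earlier member."""
    pools: List[List[Amplicon]] = [[] for _ in range(n_pools)]
    unpooled: List[Amplicon] = []
    for amp in sorted(amplicons, key=lambda a: a.start):
        for members in pools:
            # Members are added in start order, so the last one ends latest.
            if not members or members[-1].end <= amp.start:
                members.append(amp)
                break
        else:
            unpooled.append(amp)  # overlaps something in every pool
    return pools, unpooled
```

A typical call would be `pools, unpooled = pool(tile((0, 300), candidates))`. Passing `must_have=[...]` to `tile` mirrors the subsetting case of Figure 6, where pre-selected amplicons are kept and only the remaining gaps are filled.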
