
Use of NCBI Databases in qPCR Assay Design


Published in: Science, Technology

  1. 1. Integrated DNA Technologies: Use of NCBI Databases in qPCR Assay Design. Elisabeth Wagner, PhD, Scientific Applications Specialist
  2. 2. Session Outcomes. You will:
      • Learn which NCBI tools are useful for designing qPCR assays
      • Become proficient using tools for qPCR design in the IDT SciTools® suite
      • Navigate the features and tools available on the NCBI website
      • Obtain sequence information for your gene of interest
      • Perform a BLAST search for assay specificity
      • Search for SNPs
      • Understand how to proceed with a basic qPCR design
  3. 3. qPCR Design Covers a Lot of Ground. There are many uses for quantitative PCR, for example:
      • Gene expression
      • Copy number variation
      • Genotyping
      • Multi-species analysis
      • Splice variant specific (or common) expression
      We will address the general design considerations in this session and cover more specific examples later this afternoon.
  4. 4. SciTools® Overview. Several tools are available in the IDT SciTools® suite to assist with qPCR design:
      1. RealTime PCR Tool
      2. PrimerQuest® Tool
      3. OligoAnalyzer® Tool
      4. PrimeTime® Predesigned qPCR Assay Database
  5. 5. NCBI Databases Overview:
      1. Obtain sequence information for your gene of interest: NCBI Nucleotide or Gene
      2. Perform a BLAST search for assay specificity: NCBI BLAST
      3. Search for SNPs: NCBI dbSNP
      NCBI gives you access to all of the information needed for design in one location.
  6. 6. Using NCBI Databases for Custom qPCR Assay Design
  7. 7. NCBI Overview (National Center for Biotechnology Information)
      • Founded in 1988 as part of the United States National Library of Medicine
      • Houses a series of databases relevant to biotechnology and biomedicine
      • Curates GenBank, a database of over 1 × 10^12 bp of DNA sequences
      • Maintains the Gene database, which integrates gene-specific information from numerous species
      • Maintains dbSNP, a database of reported single nucleotide polymorphisms (SNPs)
      • Provides the BLAST sequence similarity search program
      • Maintains PubMed, a journal database for biomedical literature
      • Much, much more information!
  8. 8. NCBI Database Search: Sequence Information for qPCR Assay Design
  9. 9. NCBI Sequence Files
      Files:
      • Can be entered by anyone
      • May or may not be checked for accuracy
      • May contain contaminated sequence (plasmid or other)
      • May contain annotation errors
      Accession numbers:
      • Letters at the beginning indicate the type of file
      • Nucleotide sequence accessions start with 1 or 2 letters
  10. 10. The RefSeq Database
      • Non-redundant
      • Explicitly linked nucleotide and protein sequences
      • Ongoing curation by NCBI staff and collaborators, with reviewed records indicated
      • Includes data validation and format consistency
      • Distinct accession numbers: all accessions include an underscore '_' character
      • Different versions are tracked
  11. 11. RefSeq Accession Numbers
      mRNAs and proteins:
      • NM_123456: curated mRNA
      • NP_123456: curated protein
      • NR_123456: curated non-coding RNA
      • XM_123456: predicted mRNA
      • XP_123456: predicted protein
      • XR_123456: predicted non-coding RNA
      Gene records:
      • NG_123456: reference genomic sequence
      Chromosomes:
      • NC_123455: microbial replicons, organelle genomes, human chromosomes
      • AC_123455: alternate assemblies
      Assemblies:
      • NT_123456: contig
      • NW_123456: WGS supercontig
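The prefix table on the RefSeq Accession Numbers slide lends itself to a small lookup. The following is a minimal sketch, not an NCBI tool: the function name `describe_accession` is made up for illustration, and the mapping contains only the prefixes listed on the slide.

```python
# Prefix meanings copied from the RefSeq Accession Numbers slide.
# Prefixes that the slide does not list are reported as unknown.
REFSEQ_PREFIXES = {
    "NM": "curated mRNA",
    "NP": "curated protein",
    "NR": "curated non-coding RNA",
    "XM": "predicted mRNA",
    "XP": "predicted protein",
    "XR": "predicted non-coding RNA",
    "NG": "reference genomic sequence",
    "NC": "chromosome (microbial replicon, organelle genome, human chromosome)",
    "AC": "alternate assembly",
    "NT": "contig",
    "NW": "WGS supercontig",
}


def describe_accession(accession: str) -> str:
    """Describe a RefSeq-style accession by its prefix (hypothetical helper)."""
    prefix, underscore, _rest = accession.strip().upper().partition("_")
    if not underscore:
        # All RefSeq accessions contain an underscore, so this is something else.
        return "no underscore: not a RefSeq accession"
    return REFSEQ_PREFIXES.get(prefix, "RefSeq prefix not listed on the slide")


print(describe_accession("NM_123456"))    # curated mRNA
print(describe_accession("XR_123456.2"))  # predicted non-coding RNA
print(describe_accession("AB123456"))     # no underscore: not a RefSeq accession
```

For qPCR design, curated records (NM_, NR_) are usually the safer starting point than predicted ones (XM_, XR_).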
  12. 12. Accessing Sequence Information in NCBI
  13. 13. NCBI Gene Database Information: Gene Search
  14. 14. Sequence Data Searches Using Nucleotide
      • Sequence files
      • mRNA and genomic
      • Transcript variants
  15. 15. GenBank Information
  16. 16. Data Retrieval: Graphics View
  17. 17. Data Retrieval: FASTA Sequence Format
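FASTA is a plain-text format: a header line starting with `>` followed by one or more lines of sequence. As a rough sketch of reading a sequence saved from NCBI, the parser below uses only the standard library; the function name `parse_fasta` and the example record are illustrative and not taken from the slides.

```python
def parse_fasta(text: str) -> dict[str, str]:
    """Parse FASTA text into {header: sequence}; sequence lines are joined."""
    chunks: dict[str, list[str]] = {}
    header = None
    for line in text.splitlines():
        line = line.strip()
        if not line:
            continue  # skip blank lines
        if line.startswith(">"):
            header = line[1:]  # drop the leading '>'
            chunks[header] = []
        elif header is not None:
            chunks[header].append(line)
    return {name: "".join(parts).upper() for name, parts in chunks.items()}


# Hypothetical record, for illustration only.
example = """>NM_123456.1 hypothetical example record
cgatcgggcatcacacaaag
ttatgtagtagaaat
"""
records = parse_fasta(example)
for name, seq in records.items():
    print(name, len(seq))
```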
  18. 18. Using PrimerQuest® Tool for Custom qPCR Designs
  19. 19. PrimerQuest® Tool for Generating Custom qPCR Designs: a highly customizable tool
  20. 20. You Can Use an NCBI Accession Number or a FASTA Sequence
  21. 21. Once the Sequence Is Entered, 3 Defaults Become Available. Often you will need to adjust the parameters of the tool to meet experimental design requirements.
  22. 22. PrimerQuest® Tool Assay Output
  23. 23. Which Parameters to Change Depends on the Assay Required. Before changing anything, make sure you have selected the correct assay. Sometimes you simply need to increase the number of designs returned. It is unlikely that you will need to change these parameters.
  24. 24. Directing the Design to a Specific Region. Target a particular “junction”.
  25. 25. Examples: excluded region 260-280; excluded region (probe) 260-280; target region 260-280
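The three region settings in the examples above can be thought of as interval checks on a finished design. The sketch below is an interpretation, not PrimerQuest code. It assumes that an excluded region must not overlap any oligo, that a probe-excluded region must not overlap the probe, and that a target region must lie inside the amplicon. All coordinates are hypothetical, 1-based, and inclusive.

```python
def overlaps(a_start: int, a_end: int, b_start: int, b_end: int) -> bool:
    """True if two inclusive intervals share at least one position."""
    return a_start <= b_end and b_start <= a_end


def design_respects_region(fwd, rev, probe, region=(260, 280), mode="excluded"):
    """Check a design against one region constraint (assumed semantics).

    fwd, rev, probe and region are (start, end) tuples on the target sequence.
    """
    if mode == "excluded":
        # No primer or probe may sit on the excluded region.
        return not any(overlaps(*oligo, *region) for oligo in (fwd, rev, probe))
    if mode == "excluded_probe":
        # Only the probe is kept off the region.
        return not overlaps(*probe, *region)
    if mode == "target":
        # The amplicon (forward primer start to reverse primer end) must contain the region.
        return fwd[0] <= region[0] and region[1] <= rev[1]
    raise ValueError(f"unknown mode: {mode}")


fwd, probe, rev = (200, 220), (230, 255), (300, 320)
for mode in ("excluded", "excluded_probe", "target"):
    print(mode, design_respects_region(fwd, rev, probe, mode=mode))
```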
  26. 26. Changing Primer/Probe Parameters. If the target is particularly biased (AT- or GC-rich), you may need to change primer/probe parameters (e.g., length).
  27. 27. Once the Initial Design Is Completed, Go Back to NCBI
      Use NCBI tools to:
      • Check whether the assay is specific (BLAST)
      • Ensure there are no SNPs to worry about (dbSNP)
      Use the IDT OligoAnalyzer® Tool to:
      • Check primers (and probe) for secondary structure and dimer formation
  28. 28. Using NCBI BLAST to Check for Primer Specificity
  29. 29. What is BLAST? Getting to BLAST
  30. 30. What is BLAST (Basic Local Alignment Search Tool)?
      • BLAST stands for Basic Local Alignment Search Tool and is provided by the National Center for Biotechnology Information (NCBI)
      • Aligns a user-defined query (sequence) to a wide variety of databases
      • Can translate the query or the database to align sequences
      • Can align 2 or more sequences together
      • Uses a heuristic algorithm to create alignments very quickly
      • Breaks sequences into “words” and searches the database for matches
      • Reassembles these matches based on the criteria entered
  31. 31. What is BLAST? Basic BLAST
  32. 32. How BLAST Works: Words
      • BLAST divides the query sequence into subsets called “words”, which the algorithm uses to perform the alignment
      • Example (35 nt sequence): CGATCGGGCATCACACAAAGTTATGTAGTAGAAAT
      • All possible words that can be generated from the sequence are used for the alignment
      • With a word size of 7, this sequence yields at most 29 words (35 - 7 + 1)
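The word count on this slide follows from sliding a window of the word size along the query one base at a time. A short sketch of that step only (not of BLAST's seeding or extension) is shown here. The function name `query_words` is illustrative.

```python
def query_words(sequence: str, word_size: int = 7) -> list[str]:
    """Return every overlapping word of length word_size in the sequence."""
    return [
        sequence[i:i + word_size]
        for i in range(len(sequence) - word_size + 1)
    ]


query = "CGATCGGGCATCACACAAAGTTATGTAGTAGAAAT"  # 35 nt example from the slide
words = query_words(query)
print(len(query), len(words))  # 35 29
print(words[:3])               # ['CGATCGG', 'GATCGGG', 'ATCGGGC']
```

A sequence of length L contains L - w + 1 words of size w, which gives 35 - 7 + 1 = 29 here.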
  33. 33. Overview: Definitions
      • Hit: a sequence to which the query is aligned and that is returned in the BLAST results
      • Identity: the extent of exact matches between 2 sequences (e.g., ACGT and ACGG have 75% identity)
      • Similarity = Positives (in BLAST scoring)
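The identity figure in the definition above is a simple position-by-position count. A minimal sketch for two already aligned, equal-length sequences without gaps follows. The function name is illustrative.

```python
def percent_identity(a: str, b: str) -> float:
    """Percent of positions that match exactly in two aligned sequences."""
    if not a or len(a) != len(b):
        raise ValueError("sequences must be non-empty and of equal length")
    matches = sum(x == y for x, y in zip(a.upper(), b.upper()))
    return 100.0 * matches / len(a)


print(percent_identity("ACGT", "ACGG"))  # 75.0
```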
  34. 34. How BLAST Works: Scores
      • The BLAST raw score is converted to a bit score for each alignment using parameters based on the statistics described in Karlin and Altschul (1990)
      • A high score does not necessarily indicate that the query is unique
      • The bit score depends only on the alignment and the scoring system; the lengths of the query and the database enter through the E-value
      • The E-value is the number of alignments with an equivalent or better score that are expected to occur by chance
      • It is calculated from the Max bit score and the lengths of the query and the database
      • It tells you the relative strength of the alignment
      • Shorter sequences have higher E-values because a short sequence is more likely to match by chance
      • A low E-value does not mean you have a unique match!
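The relationship between bit score, sequence lengths, and E-value can be written as E = m × n × 2^(-S'), where S' is the bit score, m the query length, and n the database length. The sketch below applies that formula directly. It is a simplification: BLAST uses effective lengths that are slightly shorter than the real ones, and the numbers in the example are hypothetical.

```python
def expect_value(bit_score: float, query_len: float, db_len: float) -> float:
    """E = m * n * 2**(-S'), using raw lengths instead of BLAST's effective lengths."""
    return query_len * db_len * 2.0 ** (-bit_score)


db_len = 3e9  # hypothetical database size in bases

# Same 20 nt query: each extra bit of score halves the E-value.
print(expect_value(38.0, 20, db_len))
print(expect_value(39.0, 20, db_len))

# Same bit score: a larger search space raises the E-value proportionally.
print(expect_value(38.0, 20, 2 * db_len))
```

A short primer cannot reach a high bit score, which is one reason primer-length queries return large E-values even for perfect matches.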
  35. 35. BLAST Assessment for qPCR Primers
      • Go to the BLAST server
      • Enter the primer sequences separated by 7 or more Ns
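Joining the primers with a run of Ns lets one BLAST query report hits for both primers at once. A minimal sketch of building that query string follows. The primer sequences are hypothetical and the function name is illustrative.

```python
def build_primer_query(forward: str, reverse: str, spacer_len: int = 7) -> str:
    """Concatenate two primers with a spacer of at least 7 Ns."""
    if spacer_len < 7:
        raise ValueError("use a spacer of 7 or more Ns")
    return forward.upper() + "N" * spacer_len + reverse.upper()


# Hypothetical primers, for illustration only.
forward_primer = "CGATCGGGCATCACACAAAG"
reverse_primer = "ATTTCTACTACATAACTTTG"
print(build_primer_query(forward_primer, reverse_primer, spacer_len=10))
```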
  36. 36. Select the Correct Database. “Others” is the most general but contains a lot of sequences. If possible, use the Human or Mouse specific databases. For species with completed genome projects, consider using “NCBI Genomes” to limit BLAST results.
  37. 37. Change the Parameters of the BLAST Scoring. Select a less rigorous algorithm. Change Word size to “7”.
  38. 38. Looking at the Results. The Graphic Summary can immediately give you a sense of the overall results. Hover over each result in the graphic to identify the sequence name.
  39. 39. Then Look at the Results List. Look at E-value and Query Coverage. Look for jumps in either or both. The assay looks specific to a single gene by transcript. Ignore the “alternate” chromosome assemblies.
  40. 40. Investigate Details of the Alignment. Check the distance between primer binding sites if looking at mRNA. Open the Graphics result in a new tab/window.
  41. 41. BLAST Shows Primer Aligned to Sequence. Zoom out with the “-” sign. You can grab within the window and drag the sequence side to side.
  42. 42. The Target Gene is on Chromosome 6. This looks promising, with primers on different exons.
  43. 43. But We Had Other Chromosomal Hits... Labels: “real” transcript; pseudogene (doesn’t look transcribed); primers (red bar indicates mismatch).
  44. 44. And Another One... Another pseudogene. But what’s this? Intron of a transcribed gene, so potentially present in RNA samples. Recommend avoiding if possible.
  45. 45. Using NCBI to Check for SNPs
  46. 46. While Assessing BLAST Results, Also Assess for SNPs
  47. 47. Investigate SNPs in Primer Binding Sites
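Once SNP positions for the target are known, checking them against primer binding sites is an interval test. The sketch below also reports how far each SNP lies from the primer's 3' end, since mismatches near the 3' end generally interfere most with extension. The coordinates and SNP positions are hypothetical, and all positions are on the plus strand of the target.

```python
def snps_in_primer(primer_start, primer_end, strand, snp_positions):
    """Return (snp_position, distance_from_3prime_end) for SNPs inside a primer.

    primer_start <= primer_end are plus-strand coordinates; strand is "+" for a
    forward primer (3' end at primer_end) or "-" for a reverse primer
    (3' end at primer_start).
    """
    if strand not in ("+", "-"):
        raise ValueError("strand must be '+' or '-'")
    three_prime = primer_end if strand == "+" else primer_start
    return [
        (pos, abs(three_prime - pos))
        for pos in sorted(snp_positions)
        if primer_start <= pos <= primer_end
    ]


# Hypothetical design and SNP positions.
print(snps_in_primer(101, 120, "+", [95, 118, 240]))  # [(118, 2)]
print(snps_in_primer(200, 221, "-", [201, 230]))      # [(201, 1)]
```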
  48. 48. Assessing SNP Data. Tells you it’s a single base substitution. Indicates alternate forms (here recorded on the opposite strand). Indicates allele frequency if known. Sometimes there is more frequency data at the bottom of the page.
  49. 49. SNP Data Roughly Divided by Risk. Labels: trusted source; very low frequency; no data, likely not going to be problematic; significant risk, look to redesign if possible.
  50. 50. Using OligoAnalyzer® Tool to Check Primers and Probes
  51. 51. Checking Primers with OligoAnalyzer® Tool
      • PrimerQuest® design tools give you the “best” assays for the region specified
      • They check for self- and heterodimers, but this is only part of the scoring system used
      • An assay may be “better” even with dimer issues if it scores well on other parameters
      • Go to the OligoAnalyzer Tool
      • Perform self-dimer checks for primers and probe
      • Perform heterodimer checks on all primer/probe combinations (especially important to include all combinations when multiplexing)
      • Check hairpin structures
      • Look for stability of < -9 kcal/mol
      • Or multiple hairpins forming with < -4 kcal/mol
  52. 52. Assessing Dimer Data
      • Looks stable (< -9 kcal/mol), but this is not “dangerous”; avoid if possible, but OK
      • Looks stable (< -9 kcal/mol); not extendable, not a problem
      • Doesn’t look stable (> -9 kcal/mol); danger of extension, exponential amplification!
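One way to read the three dimer examples is that both stability and 3' extendability matter, with extendability being the more serious concern. The sketch below encodes that reading as a rule of thumb. It is an interpretation of the slide rather than an IDT algorithm: the -9 kcal/mol threshold comes from the slides, while the function name, the wording of the verdicts, and the decision to flag every extendable dimer are assumptions.

```python
STABLE_DIMER_DG = -9.0  # kcal/mol, threshold quoted on the slides


def assess_dimer(delta_g: float, extendable_3prime: bool) -> str:
    """Classify a predicted dimer from its delta G and whether a 3' end can be extended."""
    if extendable_3prime:
        # Assumption: an extendable 3' end is flagged even when the dimer looks weak.
        return "risk: 3' end can be extended, redesign if possible"
    if delta_g < STABLE_DIMER_DG:
        return "stable but not extendable: avoid if possible, usually acceptable"
    return "weak and not extendable: not a concern"


print(assess_dimer(-10.2, extendable_3prime=False))
print(assess_dimer(-6.5, extendable_3prime=True))
print(assess_dimer(-4.0, extendable_3prime=False))
```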
  53. 53. Assessing Hairpin Structures. Based on UNAFold predictions.
  54. 54. IDT PrimeTime® Predesigned qPCR Database
  55. 55. Primer and Probe Design Criteria for PrimeTime® Assays
      Primers:
      • Equal Tm (60–63 °C)
      • 15–30 bases in length
      • No runs of 4 or more Gs
      • Amplicon size 50–150 bp (max 400 bp)
      Probe:
      • Length no longer than 30–35 bases
      • Tm value 4–10 °C higher than primers
      • No runs of 4 or more consecutive Gs
      • G+C content 30–80%
      • No G at the 5′ end
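The criteria on this slide are easy to turn into a checklist. The sketch below is not IDT's design algorithm. It does not calculate Tm, so Tm values are passed in (for example, as reported by OligoAnalyzer), and the sequences used in the example are hypothetical. The probe length limit is taken as 35 bases, the upper end of the range on the slide.

```python
def check_primer(seq: str, tm: float) -> list[str]:
    """Return the slide criteria that a primer fails (empty list means it passes)."""
    seq = seq.upper()
    problems = []
    if not 15 <= len(seq) <= 30:
        problems.append("length outside 15-30 bases")
    if not 60.0 <= tm <= 63.0:
        problems.append("Tm outside 60-63 C")
    if "GGGG" in seq:
        problems.append("run of 4 or more Gs")
    return problems


def check_probe(seq: str, tm: float, primer_tm: float) -> list[str]:
    """Return the slide criteria that a probe fails (empty list means it passes)."""
    seq = seq.upper()
    if not seq:
        raise ValueError("empty probe sequence")
    problems = []
    if len(seq) > 35:
        problems.append("longer than 35 bases")
    if not 4.0 <= tm - primer_tm <= 10.0:
        problems.append("Tm not 4-10 C above primer Tm")
    if "GGGG" in seq:
        problems.append("run of 4 or more Gs")
    gc_percent = 100.0 * (seq.count("G") + seq.count("C")) / len(seq)
    if not 30.0 <= gc_percent <= 80.0:
        problems.append("G+C content outside 30-80%")
    if seq.startswith("G"):
        problems.append("G at the 5' end")
    return problems


def check_amplicon(length: int) -> str:
    """Classify amplicon length against the 50-150 bp range (400 bp maximum)."""
    if 50 <= length <= 150:
        return "within preferred range"
    if 150 < length <= 400:
        return "above preferred range but within maximum"
    return "outside allowed range"


# Hypothetical oligos and Tm values.
print(check_primer("ACGTACGTACGTACGTAC", tm=61.0))
print(check_probe("GACGTACGTACGTACGTACGTA", tm=68.0, primer_tm=61.0))
print(check_amplicon(120))
```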
  56. 56. PrimeTime Results
  57. 57. Questions?