Molecular marker and its application to genome mapping and molecular breeding
Molecular Marker and Its Application to Genome Mapping and Molecular Breeding
Binying Fu
Institute of Crop Sciences, The Chinese Academy of Agricultural Sciences
Beijing 100081, China
Nov-14-2012
Definition of Biological Marker
Biological markers can be anything that distinguishes one individual or population from another.
Can be phenotypic:
• Color: yellow vs white, etc.
• Texture: smooth vs rough, etc.
• Shape: round vs irregular, etc.
Can be a biochemical or genetic difference.
Phenotypic Markers
Figure source: http://cgil.uoguelph.ca/QTL/Fig2_3.htm
Weakness: unstable, with limited number and polymorphism.
Cytological Marker
Any distinct and heritable feature of chromosome structure that can be used to follow (usually by microscopy) that chromosome or chromosome region in breeding experiments.
Weakness: side effects, and the need for special techniques.
Biochemical Marker: Isozyme and Protein
Weakness: limited number, spatio-temporal expression, and the need for special techniques such as starch gel with special staining.
Characteristics of Ideal Markers
• Polymorphism
• Stability, no influences from the environment
• Wide dispersion through the genome
• Simplicity of observation
• Low cost
• Mendelian heritability
• Co-dominancy
• Reproducibility
• Portability between species
Molecular Markers
Definition: A molecular selection technique of DNA signposts that allows the identification of differences in the nucleotide sequences of the DNA in different individuals. Or: any genetic element (locus, allele, DNA sequence, or chromosome feature) that can be readily detected by phenotype, cytological, or molecular techniques, and used to follow a chromosome or chromosomal segment during genetic analysis. (Also called DNA marker.)
Agriculture: a tool that allows crop geneticists and breeders to locate on a plant chromosome the genes for a trait of interest. It is considered more efficient than conventional breeding, as it has the potential to greatly reduce development times and substitutes laboratory selection for much of the fieldwork. MAS or MDB!
Molecular, or DNA-based, markers have become increasingly important in plant breeding because of their features: phenotypic stability (not affected by environment), useful polymorphism, and ease of development.
Where does the molecular marker come from?
Mutation = heritable (at the cell level) changes in DNA sequence, regardless of whether the change produces any detectable effect on a gene product. Mutations are the source of new variation (polymorphism) upon which natural selection works. Inherited mutations that are dispersed through a population can become polymorphisms.
Polymorphism = presence in the same population of two or more alternative forms of a DNA sequence, with the most common allele having a frequency of 99% or less. Any two individuals have a polymorphic difference every 1,000-10,000 base pairs.
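The 99% criterion in the definition above can be checked mechanically. The following is a minimal sketch, assuming allele calls for one locus are available as a plain list; the function name and the `max_major_freq` parameter are illustrative, not from the slides.

```python
from collections import Counter

def is_polymorphic(alleles, max_major_freq=0.99):
    """Return True if the locus has at least two alleles and the most
    common allele has a frequency of max_major_freq or less."""
    counts = Counter(alleles)
    major_freq = max(counts.values()) / len(alleles)
    return len(counts) >= 2 and major_freq <= max_major_freq

# 90 copies of allele "A" and 10 of allele "G": major allele frequency is 0.90
print(is_polymorphic(["A"] * 90 + ["G"] * 10))
# 999 copies of "A" and 1 of "G": the variant is too rare to count
print(is_polymorphic(["A"] * 999 + ["G"]))
```

Under this definition a rare variant is still a mutation, but the locus is only called polymorphic once the variant has spread to more than about 1% of the sampled alleles.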
Comparison of Mutation Frequencies
Class of Mutation      Mechanism                  Frequency                                Example
Genome mutation        Chromosome missegregation  10^-2/cell division                      Aneuploidy
Chromosome mutation    Chromosome rearrangement   6x10^-4/cell division                    Translocation
Gene mutation          Base-pair mutation         10^-10/base pair/cell division;          Point mutation
                                                  10^-5 to 10^-6/locus/generation
Humans have ~10^9 base pairs per haploid genome; therefore each person will have 1-100 new mutations.
1 in 20 people will have a new gene mutation.
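The step from the per-base rate to "1-100 new mutations" is simple multiplication. The sketch below is only a worked illustration: the rate and genome size are the slide's figures, while the numbers of cell divisions (10 and 1,000) are assumptions chosen here to show how a range of 1 to 100 can arise.

```python
def expected_new_mutations(rate_per_bp_per_division, genome_size_bp, cell_divisions):
    """Expected number of new base-pair mutations accumulated in one
    haploid genome over the given number of cell divisions."""
    return rate_per_bp_per_division * genome_size_bp * cell_divisions

RATE = 1e-10        # per base pair per cell division (from the slide)
GENOME_SIZE = 1e9   # base pairs per haploid genome (from the slide)

# The division counts below are assumed values for illustration only.
for divisions in (10, 1000):
    print(divisions, expected_new_mutations(RATE, GENOME_SIZE, divisions))
```

With these inputs each cell division contributes about 0.1 new mutation per haploid genome, so the total scales directly with the number of divisions separating one generation from the next.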
Types of Mutations (2)
Nucleotide Substitutions Altering Gene Expression
• RNA processing mutations (destruction of splice sites, cap sites, poly A sites, or creation of cryptic sites)
• Regulatory mutations (promoter mutations)
Types of Mutations (3)
Deletions and Insertions (InDels)
• Insertion or deletion of a small number of bases. If the number of bases involved is not a multiple of 3, it causes a frameshift. If the number of bases involved is a multiple of 3, it causes loss or gain of codons.
• Larger deletions, inversions, and duplications. Can create gene syndromes.
Brief Summary
The term MARKER is usually used for "LOCUS MARKER". Each gene has a particular place along the chromosome called a LOCUS. Due to mutations, genes can be modified into several mutually exclusive forms called ALLELES (or allelic forms). All allelic forms of a gene occur at the same locus on homologous chromosomes. When the allelic forms at one locus are identical, the genotype is called HOMOZYGOTE (at this locus), whereas different allelic forms constitute a HETEROZYGOTE. In diploid organisms, the GENOTYPE is constituted by the two allelic forms on the homologous chromosomes.
Thus, MOLECULAR MARKERS are all locus markers related to DNA (sometimes biochemical or morphological markers are included).
Molecular Marker Classes
First Generation: 1980s
- Based on DNA-DNA hybridization, such as RFLP.
Second Generation: 1990s
- Based on PCR:
  Using random primers: RAPD, DAF, ISSR
  Using specific primers: SSR, SCAR, STS
- Based on PCR and restriction cutting: AFLP, CAPS
Third Generation: recently
- Based on DNA point mutations (SNP); can be detected by SSCP, DASH, DNA chip, sequencing, etc.
The Evolution of Markers
[Timeline figure, listed here from earliest to most recent; two hallmark events are marked on the figure.]
Morphological variants (pre-1950s)
Protein scene: gel electrophoresis (1950s); allozymes (1960s)
DNA-hybridization scene (pre-PCR): restriction (1968) and Southern blotting (1975); RFLPs (1980)
Oligo scene: PCR (1986); gene-specific PCR; microsatellites (SSRs, 1989); RAPDs (1990); SSCPs; CAPs (1993); SCARs; cDNA sequencing (cSSR); AFLPs (1996)
Genomic era: complete genomic sequence (1998); high-throughput marker analysis; AFLPs on automated sequencers; SNPs; automation; AFLPs on microarrays (2000); SNPs on chips
DNA Markers
• Simple Sequence Repeats (SSRs)
• Single Nucleotide Polymorphisms (SNPs)
• Single Feature Polymorphisms (SFPs)
Microsatellites
What are microsatellites?
• Simple sequence repeats (SSRs) or microsatellites are tandemly repeated mono-, di-, tri-, tetra-, penta-, and hexa-nucleotide motifs.
• SSR length polymorphisms are caused by differences in the number of repeats.
• SSR loci are "individually amplified by PCR using pairs of oligonucleotide primers specific to unique DNA sequences flanking the SSR sequence".
Example
Mononucleotide SSR   (A)11     AAAAAAAAAAA
Dinucleotide SSR     (GT)6     GTGTGTGTGTGT
Trinucleotide SSR    (CTG)4    CTGCTGCTGCTG
Tetranucleotide SSR  (ACTC)4   ACTCACTCACTCACTC
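A tandem-repeat scan like the examples above can be sketched with a regular expression. This is a minimal illustration, not a production SSR finder: the function name, the `min_repeats` threshold of 4, and the motif length limit of 6 are assumptions made here.

```python
import re

def find_ssrs(sequence, min_repeats=4, max_motif_len=6):
    """Return (start, motif, repeat_count) for each tandem repeat found.
    The motif group is lazy, so the shortest repeating unit is reported."""
    pattern = re.compile(
        r"(([ACGT]{1,%d}?)\2{%d,})" % (max_motif_len, min_repeats - 1)
    )
    hits = []
    for match in pattern.finditer(sequence.upper()):
        motif = match.group(2)
        repeat_count = len(match.group(1)) // len(motif)
        hits.append((match.start(), motif, repeat_count))
    return hits

# The dinucleotide example (GT)6 embedded in unique flanking sequence
print(find_ssrs("TTGCA" + "GTGTGTGTGTGT" + "CCATG"))
```

The unique sequence on either side of each hit is where the flanking PCR primers mentioned above would be placed.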
Microsatellites
Features of SSR Markers
• SSRs tend to be highly polymorphic.
• SSRs are highly abundant and randomly dispersed throughout most genomes.
• Most SSR markers are co-dominant and locus specific.
• Genotyping throughput is high and can be automated.
Microsatellites
Where are microsatellites found?
The majority are in non-coding regions.
Microsatellites
Repeat Motifs
• AC repeats tend to be more abundant than other di-nucleotide repeat motifs in animals.
• The most abundant di-nucleotide repeat motifs in plants, in descending order, are AT, AG, and AC. Because AT repeats self-anneal, AT-enrichment methods have not been developed.
• Typically, SSRs are developed for di-, tri-, and tetra-nucleotide repeat motifs.
• CA and GA have been widely used in plants.
• SSR markers have been developed for a variety of tri- and tetra-nucleotide repeats in plants.
• Tetra-nucleotide repeats have the potential to be very highly polymorphic.
SSR-Containing Sequences from BAC Ends
[Pie chart: SSRs by repeat unit length (2 bp, 3 bp, 4 bp, 5-6 bp); labelled slices 76%, 21%, and 3%.]
SSR-containing sequences account for 1% of BAC-end sequences in corn and 0.6% in soybean. Among these, most are dinucleotide repeats.
Trinucleotide Repeats in Soy BAC-End Sequences
[Pie chart: trinucleotide motifs AAT, AAC, AAG, ATG, ATC, AGG, ACT, CCT, CGT, ACC, CTG; labelled slices 48%, 25%, 15%, and 5%.]
In the soybean genome, most of the trinucleotide repeats in BAC-end sequences are AAT repeats; one quarter of them are AAC repeats.
Simple sequence repeats (SSRs). SSRs are particularly useful for developing genetic markers. They are believed to vary through DNA replication slippage, and are related to genetic instability. In Table 2, we describe SSR content for two sectors, n = 6 to 11 units and n > 11 units, to emphasize that the number of SSRs dropped substantially after 11 units. The SSR content for 93-11 was 1.7% of the genome, lower than in the human, where it was 3%. The overwhelming majority of rice SSRs were mononucleotides, primarily (A)n or (T)n, with n from 6 to 11. In contrast, for the human, the greatest contributions came from dinucleotides.
From Nipponbare (Goff et al., 2002, Science): the most prevalent SSR is tri-nucleotide. The most frequent 2-SSR is AG, 3-SSR is CGG, and 4-SSR is CGAT.
Microsatellites
How do microsatellites mutate?
• Replication slippage
• Unequal crossing-over during meiosis
Replication Slippage
When the DNA replicates, the polymerase loses track of its place, and either leaves out repeat units or adds too many repeat units. This is called "polymerase slippage" or "slipped-strand mispairing."
Replication slippage is a commonly observed replication error that occurs at repetitive sequences when the new strand mispairs with the template strand. Microsatellite polymorphism is mainly caused by replication slippage. If the mutation occurs in a coding region, it could produce abnormal proteins, leading to diseases.
Unequal crossing-over during meiosis
This is thought to explain more drastic changes in numbers of repeats. In this diagram, chromosome A obtained too many repeats during crossing-over, and chromosome B obtained too few repeats.
Microsatellites
Why do microsatellites exist?
• "Junk" DNA, and the variation is mostly neutral
• A necessary source of genetic variation
• Regulate gene expression and protein function
Moxon, E. R., Wills, C. 1999. "DNA microsatellites: Agents of Evolution?" Scientific American. Jan., pp. 72-77.
Kashi, Y. and M. Soller. 1999. "Functional Roles of Microsatellites and Minisatellites." In: Microsatellites: Evolution and Applications. Edited by Goldstein and Schlotterer. Oxford University Press.
Models of Microsatellite Mutation (1)
1. Stepwise Mutation Model (SMM)
This model holds that when microsatellites mutate, they only gain or lose one repeat. This implies that two alleles that differ by one repeat are more closely related (have a more recent common ancestor) than alleles that differ by many repeats. In other words, size matters when doing statistical tests of population substructuring. The SMM is generally the preferred model when calculating relatedness between individuals and population substructuring, although there is the problem of homoplasy.
Models of Microsatellite Mutation (2)
2. Infinite Alleles Model (IAM)
Each mutation can create any new allele randomly. A 15-repeat allele could be just as closely related to a 10-repeat allele as an 11-repeat allele. All that matters is that they are different alleles. In other words, size isn't important.
[Diagram: 15-repeat, 11-repeat, 10-repeat, and 8-repeat alleles.]
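The contrast between the two models can be made concrete with two toy distance functions. This is a deliberately simplified sketch (both function names are illustrative); real population-genetic statistics built on these models are more involved.

```python
def smm_distance(repeats_a, repeats_b):
    """Stepwise view: the distance is the number of single-repeat
    steps separating the two alleles, so size matters."""
    return abs(repeats_a - repeats_b)

def iam_distance(repeats_a, repeats_b):
    """Infinite-alleles view: alleles are either identical (0) or
    different (1); how much they differ in size is ignored."""
    return 0 if repeats_a == repeats_b else 1

# Compare a 10-repeat allele with 11- and 15-repeat alleles
for other in (11, 15):
    print(other, smm_distance(10, other), iam_distance(10, other))
```

Under the stepwise view the 11-repeat allele is closer to the 10-repeat allele than the 15-repeat allele is, whereas the infinite-alleles view scores both comparisons identically.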
Conventional Development Steps of SSR Markers
[Flow diagram] Genomic DNA > SSR DNA library > screening with SSR probes > positive clones > sequencing of positive DNA clones > specific SSR > PCR test using diverse genotypes
Four Assay Methods
1. The customary method for SSR genotyping is denaturing polyacrylamide gel electrophoresis using silver-stained PCR products. These assays can usually distinguish alleles differing by 4 bp and may distinguish alleles differing by 2 bp.
2. Semi-automated SSR genotyping can be performed by assaying fluorescently labelled PCR products for length variants on an automated DNA sequencer. Several instruments have been developed (e.g., Applied Biosystems and Li-Cor). Alleles differing by 2 to 4 bp can usually be distinguished.
3. SSR length polymorphisms can be assayed using non-denaturing high performance liquid chromatography (Marino et al. 1998). Alleles differing by 2 to 4 bp can usually be distinguished.
4. SSR alleles differing by several repeat units can often be distinguished on agarose gels.
SSRs assayed on polyacrylamide gels typically show a characteristic "stuttering". Stutter bands are artifacts produced by DNA polymerase slippage. Typically, the most prominent stutter bands are +1 and -1 repeat (e.g., + or - 2 bp for a di-nucleotide repeat), and, if visible, the next most prominent stutter bands are +2 and -2 repeats.
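The expected stutter positions follow directly from the motif length. A minimal sketch, assuming the true allele size and motif length are known (the function and parameter names are illustrative):

```python
def stutter_band_sizes(allele_size_bp, motif_len_bp, max_offset_repeats=2):
    """Map each repeat offset (-2, -1, +1, +2 by default) to the
    fragment size in bp at which a stutter band would be expected."""
    return {
        offset: allele_size_bp + offset * motif_len_bp
        for offset in range(-max_offset_repeats, max_offset_repeats + 1)
        if offset != 0
    }

# A 150 bp allele of a di-nucleotide repeat
print(stutter_band_sizes(150, 2))
```

For a di-nucleotide repeat the nearest stutter bands sit only 2 bp from the true allele, which is why the resolution of the assay method matters when scoring these markers.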
Weaknesses
• The development of SSRs is labor intensive (not so in sequence-based SSR development).
• SSR marker development costs are very high.
• SSR markers are taxa specific.
• Start-up costs are high for automated SSR assay methods.
• Developing PCR multiplexes is difficult and expensive. Some markers may not multiplex.
Single Nucleotide Polymorphisms
• SNPs are the molecular basis for most phenotypic differences between individuals.
• SNPs are the most common genetic variation.
• SNPs are highly abundant, stable, and distributed throughout the genome.
• SNP assays are amenable to automation and high throughput.
• SNPs are biallelic.
Example:
GATTTAGATCGCGATAGAG
GATTTAGATCTCGATAGAG
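The single difference between the two example sequences can be located with a position-by-position comparison. A minimal sketch, assuming the two sequences are already aligned and of equal length (the function name is illustrative):

```python
def find_snps(seq_a, seq_b):
    """Return (position, base_a, base_b) for every mismatch between two
    aligned, equal-length sequences. Positions are 1-based."""
    if len(seq_a) != len(seq_b):
        raise ValueError("sequences must be aligned to the same length")
    return [
        (index + 1, base_a, base_b)
        for index, (base_a, base_b) in enumerate(zip(seq_a, seq_b))
        if base_a != base_b
    ]

print(find_snps("GATTTAGATCGCGATAGAG", "GATTTAGATCTCGATAGAG"))
# → [(11, 'G', 'T')]
```

This only handles substitutions; insertions and deletions would first require a proper alignment step.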
Single Nucleotide Polymorphisms
SNPs in intergenic regions may:
• Have no genetic effect
• Affect genetic regulatory signals
• Interfere with RNA splice sites
SNPs in coding regions (cSNPs) may:
• Synonymously change the codon of an amino acid, which may have no further effect, or may influence e.g. codon bias.
• Non-synonymously alter the encoded amino acid (nsSNP) by a conservative exchange or a non-conservative (radical) mutation.
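Whether a coding SNP is synonymous can be decided by translating the codon before and after the change. The sketch below builds the standard genetic code table and compares the two amino acids; the function name is illustrative, and it does not attempt the further split into conservative and radical exchanges.

```python
from itertools import product

# Standard genetic code, codons ordered with bases T, C, A, G
BASES = "TCAG"
AMINO_ACIDS = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODON_TABLE = {
    "".join(codon): amino_acid
    for codon, amino_acid in zip(product(BASES, repeat=3), AMINO_ACIDS)
}

def classify_coding_snp(ref_codon, alt_codon):
    """Return (kind, ref_amino_acid, alt_amino_acid) for two codons
    that differ at exactly one base. '*' denotes a stop codon."""
    differences = sum(a != b for a, b in zip(ref_codon, alt_codon))
    if len(ref_codon) != 3 or len(alt_codon) != 3 or differences != 1:
        raise ValueError("codons must be 3 bases long and differ at one base")
    ref_aa = CODON_TABLE[ref_codon]
    alt_aa = CODON_TABLE[alt_codon]
    kind = "synonymous" if ref_aa == alt_aa else "non-synonymous"
    return kind, ref_aa, alt_aa

print(classify_coding_snp("CTT", "CTC"))  # third-base change, Leu stays Leu
print(classify_coding_snp("GAA", "GTA"))  # second-base change, Glu becomes Val
```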
SNP Variation in Maize and Soybean
[Bar chart: percentage (0-40%) of each SNP type (CT, GA, GC, AC, GT, AT, Del) in maize and soy.]
Frequency of Candidate SNPs from Different Sources in Maize and Soy
Region         Maize      Soy
EST (5' end)   1/1.5 kb   1/1.9 kb
Genomic        1/640 bp   1/750 bp
3' UTR         1/441 bp   1/416 bp
Identification of Target-Specific SNPs
Steps:
1. Amplify the genes of interest with PCR
2. Scan for mutations with various methods
   - Conformation-based mutation scanning
   - Single-strand conformation polymorphism analysis
   - Gel electrophoresis
   - Chemical and enzymatic mismatch cleavage detection
   - Denaturing gradient gel electrophoresis
   - Denaturing HPLC
3. Sequence positive PCR products
   - Sequence multiple individuals
   - Sequence heterozygotes
4. Align sequences from different sources to find SNPs
Technologies for Detecting Known SNPs
Gel-Based Methods
- PCR-restriction fragment length polymorphism analysis
- PCR-based allele-specific amplification
- Oligonucleotide ligation assay genotyping
- Minisequencing (10-20 bases)
Non-Gel-Based High-Throughput Genotyping Technologies
- Solution hybridization using fluorescence dyes
- Allele-specific ligation
- Allele-specific nucleotide incorporation
  1. High resolution separation
  2. Chemical color reaction
- DNA microarray genotyping
Oligo Ligation Assay (OLA)
Two allele-specific oligonucleotide probes (one specific for the wild-type allele and the other specific for the variant allele) and a fluorescent common probe are used in each assay. The 3' ends of the allele-specific probes are immediately adjacent to the 5' end of the common probe. In the presence of thermally stable DNA ligase, ligation of the fluorescently labeled probe to the allele-specific probe(s) occurs only when there is a perfect match between the variant or the wild-type probe and the PCR product template. These ligation products are then separated by electrophoresis, which permits the recognition of the wild-type genotypes, the variants, the heterozygotes, and the unligated probes.
Allele-Specific Codominant PCR Strategy
Figure. Schematic representation of the allele-specific codominant PCR strategy. Oligonucleotide primers with 3' nucleotides that correspond to an SNP site are used to preferentially amplify specific alleles. A, Primer P1 forms a perfect match with allele 1 but forms a mismatch at the 3' terminus with the DNA sequence of allele 2. Primer P2 similarly forms a perfect match with allele 2 and a 3' terminus mismatch with allele 1. B, Schematic of agarose gel analysis showing the expected outcome for the amplification of organisms homozygous and heterozygous for both alleles using primers P1 and P2. P1, primer 1; P2, primer 2; A1, allele 1; A2, allele 2.
Eliana Drenkard et al. 2000 Plant Physiol 124: 1483-1492
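The logic of this strategy can be sketched in two small functions: one builds a primer pair whose 3' base is the SNP base, and one reads the genotype from which primer gave a product. This is an illustration only, with hypothetical function names; real primer design also has to consider strand, melting temperature, and specificity, none of which is modelled here.

```python
def allele_specific_primers(upstream_flank, allele1_base, allele2_base, primer_len=20):
    """Build primers P1 and P2 that share the sequence immediately
    upstream of the SNP and differ only in their 3' terminal base."""
    if primer_len < 2:
        raise ValueError("primer_len must be at least 2")
    stem = upstream_flank[-(primer_len - 1):]
    return stem + allele1_base, stem + allele2_base

def call_genotype(p1_product, p2_product):
    """Interpret the presence or absence of a PCR product from each primer."""
    if p1_product and p2_product:
        return "A1/A2"   # heterozygote: both primers amplify
    if p1_product:
        return "A1/A1"   # homozygous for allele 1
    if p2_product:
        return "A2/A2"   # homozygous for allele 2
    return None          # no product from either primer: no call

p1, p2 = allele_specific_primers("GATTTAGATC", "G", "T", primer_len=8)
print(p1, p2)
print(call_genotype(True, False), call_genotype(True, True))
```

Because each homozygote and the heterozygote give a distinct band pattern, the marker behaves codominantly.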
SNP Detection: Allele-Specific Oligo Hybridization
Principle: A 1 bp mismatch in the center of a 15mer will change the Tm by 5-10 degrees; therefore a SNP in the middle of a 15mer can be genotyped using paired ASOs.
• PCR amplify target gene (different individuals) in 96-well format
• Prepare dot-blot on nylon filter
• Hybridize to allele-specific 15mer and detect the signal
• Wash at stringency temperature
• Repeat for alternate allele and other SNPs
Single-Strand Conformation Polymorphism Analysis
Single-stranded DNAs are generated by denaturation of the PCR products and separated on a nondenaturing polyacrylamide gel. A fragment with a single-base modification generally forms a different conformer and migrates differently when compared with wild-type DNA.
Size <200 bp: accuracy 70%-95%
Size >400 bp: accuracy 50%
1% false positives
SNP Genotyping Using Oligo Chip
Oligo chip: a set of 15-nucleotide probes made up of overlapping probe sets, with adjacent probes overlapping by 14 nucleotides. Among the four probes in one set, the sequences are the same except at one position, which is A, G, C, or T.
[Figure labels: T genotype, C genotype.]
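The probe layout described above can be sketched as a sliding window. This is a minimal illustration under one stated assumption: the variable position is taken to be the central base of each 15-mer, which the slide does not say explicitly. The function name is hypothetical.

```python
def tiling_probe_sets(reference, probe_len=15):
    """For every window of probe_len bases (step of 1, so adjacent
    windows overlap by probe_len - 1), build four probes that differ
    only at the central base. Returns (interrogated_index, probes)."""
    center = probe_len // 2
    probe_sets = []
    for start in range(len(reference) - probe_len + 1):
        window = reference[start:start + probe_len]
        probes = {
            base: window[:center] + base + window[center + 1:]
            for base in "ACGT"
        }
        probe_sets.append((start + center, probes))
    return probe_sets

sets = tiling_probe_sets("GATTTAGATCGCGATAGAG")
position, probes = sets[0]
print(len(sets), position)
for base, probe in probes.items():
    print(base, probe)
```

In each set, the probe that matches the sample perfectly is expected to give the strongest hybridization signal, which identifies the base at the interrogated position.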
Direct Sequencing - New Sequencing Technology
Pyrosequencing technology offers rapid and accurate genotyping, allowing for dependable SNP and mutation analysis. This technology utilizes an enzyme cascade system that results in the production of measurable light whenever a nucleotide forms a base pair with its complementary base in a DNA template strand.
Solexa/Illumina Sequencing
Munroe & Harris (2010). Third-generation sequencing fireworks at Marco Island. Nature Biotechnology 28: 426–428.
Use of SNPs
1. Markers for linkage mapping
   - Discover SNPs that contribute to agronomic traits
2. Trace the origin of introgression
3. Markers for association studies (linkage disequilibrium)
4. Markers for population genetic analysis
Further Reading:
McNally et al., 2009. Genomewide SNP variation reveals relationships among landraces and modern varieties of rice. PNAS 106(30): 12273-8.
Jones et al., 2009. Development of single nucleotide polymorphism (SNP) markers for use in commercial maize (Zea mays L.) germplasm. Mol Breeding 24(2): 165-176.
Varshney et al., 2007. Single nucleotide polymorphisms in rye (Secale cereale L.): discovery, frequency, and applications for genome mapping and diversity studies. TAG 114(6): 1105-1116.
Wu et al., 2010. SNP discovery by high-throughput sequencing in soybean. BMC Genomics 11: 469.
Single Feature Polymorphisms (SFPs)
SFPs are a consequence either of insertion/deletion (InDel) polymorphisms or of multiple SNPs across the complementary sequences.
SFPs are identified through hybridization of genomic DNA to whole-genome tiling arrays (i.e., Affymetrix GeneChips) or home-made microarrays.
References
Yeast: Wodicka, L., H. Dong, M. Mittmann, M.H. Ho, and D.J. Lockhart. 1997. Nat Biotechnol 15: 1359-1367.
Arabidopsis: Borevitz, J.O., D. Liang, D. Plouffe, H.S. Chang, T. Zhu, D. Weigel, C.C. Berry, E. Winzeler, and J. Chory. 2003. Genome Res 13: 513-523.
Further reading:
Kumar et al., 2007. Single Feature Polymorphism Discovery in Rice. PLoS ONE 2(3): e284.
Principle of Microarray-Based Genotyping of Single Feature Polymorphisms (SFPs) by Oligo Chip.
Classification of DNA Markers
A. Mutation at restriction sites (RFLP, CAPS, AFLP) or PCR primer sites (RAPD, DAF, AP-PCR, SSR, ISSR)
B. Insertion or deletion between restriction sites (RFLP, CAPS, AFLP) or PCR primer sites (RAPD, DAF, AP-PCR, SSR, ISSR)
C. Changes in the number of repeat units between restriction sites or PCR primer sites: SSR, VNTR, ISSR
D. Mutations at single nucleotides: SNP
Summary of Common Molecular Markers: Single Locus Detection
Marker                                                          Detection
RFLP (restriction fragment length polymorphism)                 Hybridization
CAPS (cleaved amplified polymorphic sequences)                  PCR
SSLP (simple sequence length polymorphism)
---- VNTR (variable number of tandem repeats)                   Hybridization or PCR
---- SSR/STR (simple sequence repeats/short tandem repeats)     PCR
SCAR (sequence characterized amplified region)                  PCR
SNP (single nucleotide polymorphism)
---- DASH (dynamic allele-specific hybridization)               Hybridization
---- SSCP (single strand conformation polymorphism)             Conformation
Summary of Common Molecular Markers: Multiple Loci Detection
Marker                                                          Detection
AFLP (amplified fragment length polymorphism)                   PCR
RAPD (random amplified polymorphic DNA)                         PCR
AP-PCR (arbitrarily primed PCR)                                 PCR
DAF (DNA amplification fingerprinting)                          PCR
SSLP (simple sequence length polymorphism)                      PCR, when multiple pairs of primers are used
ISSR (inter-simple sequence repeat)                             PCR
SNP (single nucleotide polymorphism)
---- SSCP (single strand conformation polymorphism)             Conformation, when used to scan for randomly located SNPs
Conclusion
Not all molecular markers are equal. None is ideal. Some are better for some purposes than others. However, all are generally preferable to morphological markers for mapping and marker-assisted selection.