The document discusses the use of SNP haplotype maps in plant breeding. It begins with an introduction to genetic variations like SNPs and haplotypes. It then discusses topics like haplotype construction, inference, and factors affecting them. The document presents two case studies, one on developing a hapmap for pepper and the other on introducing novel diversity in Brassica using a concept called Heterotic Haplotype Capture. Key outputs of the studies include genome-scale data and immortal heterotic populations for genomic prediction and understanding heterosis.
3. Genetic Variations
• The genetic variations in DNA sequences (e.g.,
insertions, deletions, and mutations) have a
major impact on genotypic and phenotypic
differences.
– All humans share about 99.9% of their DNA sequence.
– A genetic variation in a coding region may
change a codon and thereby alter the
amino acid sequence.
2/8/2017 PG seminar
4. Allelic variation within the genome of a single species:
1. Differences in the number of tandem repeats at a locus - SSRs
2. Segmental/nucleotide insertions/deletions - InDels
3. Single nucleotide polymorphisms - SNPs
Marker types by detection method and throughput:
(1) Low-throughput, hybridization-based markers such as RFLPs
(2) Medium-throughput, PCR-based markers such as RAPDs, AFLPs, and SSRs
(3) High-throughput (HTP), sequence-based markers: SNPs
5. ►A SNP is defined as a single base change in a DNA
sequence that occurs in a significant proportion (more than 1
percent) of a large population.
►SNPs are found in
coding and (mostly) noncoding regions.
►SNPs occur at very high frequency:
from about 1 in 1,000 bases to 1 in 100-300 bases.
►The abundance of SNPs and the ease with which they can be
measured make these genetic variations significant.
►A SNP close to a particular gene acts as a marker for that gene.
6. Single Nucleotide Polymorphism
• A single nucleotide polymorphism (SNP), pronounced “snip,”
is a genetic variation in which a single nucleotide (i.e., A, T, C, or G)
is altered and the change is passed on through heredity.
– SNP: single DNA base variation found in >1% of the population
– Mutation: single DNA base variation found in <1% of the population
[Figure: two sequence variants at one site, with their population frequencies]
SNP:      C T T A G C T T (94%)    vs.  C T T A G T T T (6%)
Mutation: C T T A G C T T (99.9%)  vs.  C T T A G T T T (0.1%)
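The 1% rule above can be sketched in a few lines of Python. This is only an illustration: the function names and the sample allele lists are assumptions, and the two samples simply mirror the 94%/6% and 99.9%/0.1% frequencies on the slide.

```python
from collections import Counter

def minor_allele_frequency(alleles):
    """Return the frequency of the second most common allele at one site."""
    counts = Counter(alleles).most_common()
    if len(counts) < 2:
        return 0.0  # only one allele observed
    return counts[1][1] / len(alleles)

def classify_variant(alleles, threshold=0.01):
    """Label a site using the slide's rule: SNP if the minor allele is >1%."""
    maf = minor_allele_frequency(alleles)
    if maf == 0.0:
        return "monomorphic"
    return "SNP" if maf > threshold else "mutation"

# Hypothetical samples matching the frequencies shown on the slide.
common = ["C"] * 94 + ["T"] * 6
rare = ["C"] * 999 + ["T"] * 1
print(classify_variant(common))  # SNP
print(classify_variant(rare))    # mutation
```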
7. Sequence Overlap SNP discovery
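Overlapping sequences showing a SNP (A/G at one position):
GTTTAAATAATACTGATCA
GTTTAAATAATACTGATCA
GTTTAAATAGTACTGATCA
GTTTAAATAGTACTGATCA
[Figure: sources of overlapping sequence. Genomic DNA: BAC library (BAC overlap) and RRS library or sampling (shotgun overlap). mRNA: cDNA library (EST overlap).]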
SNP maps
►Sequence genomes of a
large number of individuals
►Compare the base sequences
to discover SNPs.
►Generate a single map of the
genome containing all
possible SNPs => SNP maps
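The comparison step can be sketched as a column-by-column scan of aligned sequences. This is a minimal sketch that assumes the sequences are already aligned and of equal length; the function name is illustrative, and real SNP calling also has to handle sequencing errors and coverage. The four sequences are the ones shown on the slide.

```python
def find_snp_positions(aligned_seqs):
    """Return {position: sorted alleles} for columns with more than one base."""
    length = len(aligned_seqs[0])
    if any(len(s) != length for s in aligned_seqs):
        raise ValueError("sequences must be aligned to equal length")
    snps = {}
    for pos, column in enumerate(zip(*aligned_seqs)):
        alleles = sorted(set(column))
        if len(alleles) > 1:  # a variable column is a candidate SNP
            snps[pos] = alleles
    return snps

reads = [
    "GTTTAAATAATACTGATCA",
    "GTTTAAATAATACTGATCA",
    "GTTTAAATAGTACTGATCA",
    "GTTTAAATAGTACTGATCA",
]
print(find_snp_positions(reads))  # {9: ['A', 'G']} (0-based position)
```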
8. What do we know?
• SNPs physically close to one another tend to be inherited
together
• Recombination breaks apart haplotypes and slowly erodes
correlation between neighboring alleles
• Since SNPs are bi-allelic, each SNP defines a partition on the
population sample.
9. Haplotype:
A haplotype is a group of genes in an organism that are
inherited together from a single parent.
In terms of SNPs:
A haplotype is a set of linked SNPs on the same
chromosome that are not easily separated by recombination.
Within each block, recombination is rare due to tight linkage,
and only a few haplotypes actually occur.
10. Haplotypes
• Haplotype: A set of closely linked genetic markers present on one
chromosome which tend to be inherited together (not easily
separable by recombination).
• A haplotype can be simply considered as a binary string since
each SNP is binary.
              SNP1  SNP2  SNP3
Haplotype 1    C     A     T      -A C T T A G C T T-
Haplotype 2    A     T     C      -A A T T T G C T C-
Haplotype 3    C     T     C      -A C T T T G C T C-
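The slide's example can be reproduced with a short sketch: read the alleles at the three SNP positions of each chromosome, then recode them as a binary string. The function names are illustrative, and coding the major allele as 0 and the minor allele as 1 is an assumption (the slide only says each SNP is binary).

```python
from collections import Counter

def extract_haplotypes(sequences, snp_positions):
    """Keep only the alleles at the given (0-based) SNP positions."""
    return ["".join(seq[p] for p in snp_positions) for seq in sequences]

def to_binary(haplotypes):
    """Encode each haplotype as a 0/1 string: 0 = major allele, 1 = minor."""
    columns = list(zip(*haplotypes))
    majors = [Counter(col).most_common(1)[0][0] for col in columns]
    return ["".join("0" if allele == major else "1"
                    for allele, major in zip(hap, majors))
            for hap in haplotypes]

# The three chromosomes from the slide; SNP1, SNP2, SNP3 sit at indices 1, 4, 8.
chromosomes = ["ACTTAGCTT", "AATTTGCTC", "ACTTTGCTC"]
haps = extract_haplotypes(chromosomes, [1, 4, 8])
print(haps)             # ['CAT', 'ATC', 'CTC']
print(to_binary(haps))  # ['011', '100', '000']
```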
11. Haplotype
• Multiple loci on the same chromosome that are
inherited together
• Usually a string of linked SNPs
[Figure labels: alleles, locus, haplotypes]
13. Why Haplotypes
•Haplotypes are more powerful discriminators
between cases and controls in association studies
•Use of haplotypes in association studies reduces the
number of tests to be carried out.
•With haplotypes we can conduct evolutionary studies
•Haplotypes are necessary for linkage analysis
14. Genotypes
• The use of haplotype information has been limited because many
genomes are diploid.
– In large sequencing projects, genotypes are collected instead of
haplotypes because of cost considerations.
Haplotype data (alleles linked along each chromosome):
  SNP1  SNP2
   A     T
   C     G
Genotype data (unordered allele pair at each SNP):
  SNP1  SNP2
  A/C   G/T
15. Problems of Genotypes
• Genotypes only tell us the alleles at each SNP
locus.
– But we don’t know the connection of alleles at
different SNP loci.
– There could be several possible haplotypes for the
same genotype.
Genotype data:
  SNP1  SNP2
  A/C   G/T
Possible haplotype pairs:
  SNP1  SNP2         SNP1  SNP2
   A     T     or     A     G
   C     G            C     T
We don’t know which haplotype pair is real.
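The ambiguity can be made concrete by enumerating every haplotype pair that is consistent with a genotype. In this sketch a genotype is represented as a list of unordered allele pairs, one per SNP; that representation and the function name are assumptions made for illustration.

```python
from itertools import product

def compatible_haplotype_pairs(genotype):
    """List the unordered haplotype pairs that could produce a genotype."""
    pairs = set()
    for choice in product((0, 1), repeat=len(genotype)):
        # choice[k] says which allele of site k goes on the first chromosome
        h1 = "".join(site[c] for site, c in zip(genotype, choice))
        h2 = "".join(site[1 - c] for site, c in zip(genotype, choice))
        pairs.add(tuple(sorted((h1, h2))))
    return sorted(pairs)

genotype = [("A", "C"), ("G", "T")]  # SNP1 = A/C, SNP2 = G/T
print(compatible_haplotype_pairs(genotype))
# [('AG', 'CT'), ('AT', 'CG')]
```

A genotype with k heterozygous SNPs has 2^(k-1) compatible pairs (for k ≥ 1), which is why phase cannot be read off genotype data directly.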
17. Haplotype blocks
• Low recombination rate in the region
• Strong linkage disequilibrium
• A small number of SNPs in the block (tag SNPs) is enough to identify
the common haplotypes
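Linkage disequilibrium between two SNPs is commonly summarised by D = p(AB) - p(A)p(B) and r² = D² / (p(A)(1 - p(A)) p(B)(1 - p(B))). The sketch below computes both from phased haplotypes; the function name and the two toy samples are illustrative assumptions, not data from the seminar.

```python
def linkage_disequilibrium(haplotypes, i, j, allele_i, allele_j):
    """Return (D, r2) for allele_i at site i and allele_j at site j."""
    n = len(haplotypes)
    p_a = sum(h[i] == allele_i for h in haplotypes) / n
    p_b = sum(h[j] == allele_j for h in haplotypes) / n
    p_ab = sum(h[i] == allele_i and h[j] == allele_j for h in haplotypes) / n
    d = p_ab - p_a * p_b
    denom = p_a * (1 - p_a) * p_b * (1 - p_b)
    r2 = d * d / denom if denom > 0 else float("nan")  # undefined if a site is fixed
    return d, r2

# Complete LD: allele A at SNP1 always travels with allele T at SNP2.
linked = ["AT"] * 5 + ["CG"] * 5
print(linkage_disequilibrium(linked, 0, 1, "A", "T"))    # (0.25, 1.0)

# Linkage equilibrium: all four combinations are equally frequent.
unlinked = ["AT", "AG", "CT", "CG"]
print(linkage_disequilibrium(unlinked, 0, 1, "A", "T"))  # (0.0, 0.0)
```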
18. Block detection methods
• Four-gamete test (Hudson and Kaplan, Genetics, 1985)
– A segment of SNPs is a block if, for every pair of SNPs (alleles a/A and b/B),
at most 3 of the 4 possible gametes (ab, aB, Ab, AB) are observed.
• P-value test
– A segment of SNPs is a block if, for 95% of the pairs of SNPs,
we can reject the hypothesis (with P-value 0.05 or 0.001)
that they are in linkage equilibrium.
• LD-based (Gabriel et al., Science, 2002, 296:2225-9)
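The four-gamete test lends itself to a short sketch. The version below assumes phased, bi-allelic haplotypes given as equal-length strings; the function name and the example haplotypes are illustrative.

```python
from itertools import combinations

def passes_four_gamete_test(haplotypes):
    """True if no pair of sites shows all four gametes (bi-allelic sites assumed)."""
    n_sites = len(haplotypes[0])
    for i, j in combinations(range(n_sites), 2):
        gametes = {(h[i], h[j]) for h in haplotypes}
        if len(gametes) == 4:  # all four combinations seen: evidence of recombination
            return False
    return True

block = ["CAT", "ATC", "CTC"]
print(passes_four_gamete_test(block))            # True
print(passes_four_gamete_test(block + ["ATT"]))  # False: sites 1 and 3 show 4 gametes
```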
19. Research Directions of SNPs and Haplotypes in Recent Years
[Diagram topics: haplotype inference, tag SNP selection, maximum parsimony, perfect phylogeny, statistical methods, haplotype block, LD bin, prediction accuracy, SNP database]
20. Haplotype Blocks and Tag SNPs
• Recent studies have shown that the chromosome can be
partitioned into haplotype blocks interspersed by recombination
hotspots (Daly et al., 2001; Patil et al., 2001).
– Within a haplotype block, there is little or no recombination.
– The SNPs within a haplotype block tend to be inherited
together.
• Within a haplotype block, a small subset of SNPs (called tag SNPs)
is sufficient to distinguish each pair of haplotype patterns in the
block.
– We only need to genotype tag SNPs instead of all SNPs within
a haplotype block.
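Tag SNP selection can be illustrated with an exhaustive search for the smallest set of SNPs that still tells every haplotype pattern apart. This is a sketch only: the function name and the four 0/1 patterns are made up for illustration, and the brute-force search is practical only for small blocks.

```python
from itertools import combinations

def select_tag_snps(haplotypes):
    """Return the smallest set of site indices that distinguishes all patterns."""
    patterns = sorted(set(haplotypes))
    n_sites = len(patterns[0])
    for size in range(n_sites + 1):
        for subset in combinations(range(n_sites), size):
            reduced = {tuple(p[i] for i in subset) for p in patterns}
            if len(reduced) == len(patterns):  # every pattern still unique
                return list(subset)
    # Unreachable: the full set of sites always separates distinct patterns.
    raise AssertionError("no distinguishing subset found")

# Hypothetical block: 4 haplotype patterns over 6 SNPs (0 = major, 1 = minor).
block_patterns = ["000000", "001110", "110001", "111111"]
print(select_tag_snps(block_patterns))  # [0, 2]
```

Here two of the six SNPs are enough, so only those two would need to be genotyped to identify which of the four patterns a chromosome carries.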
21. Recombination Hotspots and Haplotype Blocks
[Figure: a chromosome divided into haplotype blocks separated by recombination hotspots. Rows are SNP loci S1 to S12, columns are haplotype patterns P1 to P4, and each cell is marked as major or minor allele.]
22. Three Problems
1. Estimation of frequency of all possible
haplotypes
2. Reconstruction of haplotype for individuals
3. Detection of all possible haplotypes in a
population
23. Haplotype construction
• Family-based haplotype construction
– Linkage analysis software: Simwalk, Merlin,
Genehunter, Allegro...
• Population-based haplotype construction
– Not as reliable as family-based
24. Haplotype reconstruction for individuals
[Figure: a heterozygous diploid individual is sequenced to give the genotype TC TG AA (pairs of alleles, with the association of alleles to chromosomes unknown). The haplotype h(h1, h2) is one of the possible associations of alleles to the chromosomes Cp and Cm, for example C T A and T G A.]
This is a mixture modeling problem!
25. Haplotype Inference
• The problem of inferring the haplotypes from a
set of genotypes is called haplotype inference.
• Most combinatorial methods consider the
maximum parsimony model to solve this
problem.
– This model assumes that the number of distinct haplotypes in a
natural population is small.
– The solution of this problem is a minimum set of
haplotypes that can explain the given genotypes.
26. Maximum Parsimony
Genotypes:
      SNP1  SNP2
G1    A/C   G/T
G2    A/A   T/T
Possible resolutions:
G1 = h3 (A G) + h4 (C T)   or   G1 = h1 (A T) + h2 (C G)
G2 = h1 (A T) + h1 (A T)
• Find a minimum set of
haplotypes to explain the
given genotypes.
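For the two genotypes on this slide, the maximum parsimony solution can be found by brute force: try every combination of compatible haplotype pairs and keep the one that uses the fewest distinct haplotypes. This is a sketch under stated assumptions: the function names and the genotype representation (a list of allele pairs per SNP) are illustrative, and exhaustive search is only feasible for tiny inputs.

```python
from itertools import product

def compatible_pairs(genotype):
    """All unordered haplotype pairs consistent with one genotype."""
    pairs = set()
    for choice in product((0, 1), repeat=len(genotype)):
        h1 = "".join(site[c] for site, c in zip(genotype, choice))
        h2 = "".join(site[1 - c] for site, c in zip(genotype, choice))
        pairs.add(tuple(sorted((h1, h2))))
    return sorted(pairs)

def max_parsimony(genotypes):
    """Pick one pair per genotype so that the fewest distinct haplotypes are used."""
    best = None
    options = [compatible_pairs(g) for g in genotypes]
    for resolution in product(*options):
        used = {h for pair in resolution for h in pair}
        if best is None or len(used) < len(best[0]):
            best = (used, resolution)
    return sorted(best[0]), list(best[1])

g1 = [("A", "C"), ("G", "T")]  # G1 from the slide
g2 = [("A", "A"), ("T", "T")]  # G2 from the slide
haps, resolution = max_parsimony([g1, g2])
print(haps)        # ['AT', 'CG']
print(resolution)  # [('AT', 'CG'), ('AT', 'AT')]
```

Resolving G1 as h1 + h2 reuses h1 from G2, so two haplotypes explain both genotypes, whereas h3 + h4 would need three.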
27. Haplotype analysis algorithms
• Given a random sample of multilocus genotypes at a set of SNPs
the following actions can be taken:
– Estimate the frequencies of all possible haplotypes.
– Infer the haplotypes of all individuals.
• Haplotyping Algorithms:
– Clark algorithm
– EM algorithm
• Haplotyping programs:
– HAPINFEREX (Clark parsimony algorithm)
– EM-Decoder (EM algorithm)
– PHASE (Gibbs sampler)
– HAPLOTYPER
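The EM approach to estimating haplotype frequencies can be sketched as follows. This is a textbook-style illustration assuming Hardy-Weinberg proportions, not the implementation used by EM-Decoder or any other program listed above; the function names, the genotype representation, and the toy sample are assumptions.

```python
from collections import defaultdict
from itertools import product

def compatible_pairs(genotype):
    """All unordered haplotype pairs consistent with one genotype."""
    pairs = set()
    for choice in product((0, 1), repeat=len(genotype)):
        h1 = "".join(site[c] for site, c in zip(genotype, choice))
        h2 = "".join(site[1 - c] for site, c in zip(genotype, choice))
        pairs.add(tuple(sorted((h1, h2))))
    return sorted(pairs)

def em_haplotype_frequencies(genotypes, n_iter=50):
    """Estimate haplotype frequencies from unphased genotypes by EM."""
    options = [compatible_pairs(g) for g in genotypes]
    haps = sorted({h for opts in options for pair in opts for h in pair})
    freq = {h: 1.0 / len(haps) for h in haps}  # start from uniform frequencies
    n_chrom = 2 * len(genotypes)
    for _ in range(n_iter):
        counts = defaultdict(float)
        for opts in options:
            # E-step: posterior weight of each compatible pair under HWE
            weights = [freq[a] * freq[b] * (1 if a == b else 2) for a, b in opts]
            total = sum(weights)
            for (a, b), w in zip(opts, weights):
                counts[a] += w / total
                counts[b] += w / total
        # M-step: expected haplotype counts become the new frequencies
        freq = {h: counts[h] / n_chrom for h in haps}
    return freq

# Toy sample: one ambiguous genotype (A/C, G/T) and two homozygous A/A, T/T.
sample = [[("A", "C"), ("G", "T")],
          [("A", "A"), ("T", "T")],
          [("A", "A"), ("T", "T")]]
for hap, f in em_haplotype_frequencies(sample).items():
    print(hap, round(f, 3))
```

In this toy sample the homozygous individuals make AT common, so the ambiguous genotype is increasingly attributed to the pair AT + CG rather than AG + CT.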
28. Comparison between algorithms
• Clark
– Intuitive
– Fast
• EM
– Complete solution
– Slightly more
accurate than Clark
– Robust to
ambiguity
• PHASE
– Complete solution
– Slightly more accurate
than EM
– Slow version
• Haplotyper (partition-ligation)
– Fast
– Better than Clark
– Less accurate than EM
or PHASE
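Why Clark's method is described as intuitive and fast can be seen from a simplified sketch: haplotypes that are unambiguous (at most one heterozygous SNP) are collected first, and each ambiguous genotype is then resolved as a known haplotype plus its complement. The function name and genotype representation are assumptions, and this version omits details of the published algorithm such as the dependence on processing order.

```python
def clark_phase(genotypes):
    """Simplified Clark-style phasing; returns (resolved pairs by index, known haplotypes)."""
    def complement(genotype, hap):
        """The partner haplotype if hap is compatible with genotype, else None."""
        other = []
        for (a, b), h in zip(genotype, hap):
            if h == a:
                other.append(b)
            elif h == b:
                other.append(a)
            else:
                return None
        return "".join(other)

    known, resolved = [], {}
    # Step 1: genotypes with at most one heterozygous site are unambiguous.
    for idx, g in enumerate(genotypes):
        if sum(a != b for a, b in g) <= 1:
            pair = ("".join(a for a, _ in g), "".join(b for _, b in g))
            resolved[idx] = pair
            for h in pair:
                if h not in known:
                    known.append(h)
    # Step 2: repeatedly explain unresolved genotypes with a known haplotype.
    progress = True
    while progress:
        progress = False
        for idx, g in enumerate(genotypes):
            if idx in resolved:
                continue
            for h in known:
                other = complement(g, h)
                if other is not None:
                    resolved[idx] = (h, other)
                    if other not in known:
                        known.append(other)
                    progress = True
                    break
    return resolved, known

sample = [[("A", "A"), ("T", "T")],   # unambiguous: AT + AT
          [("A", "C"), ("G", "T")]]   # ambiguous
resolved, known = clark_phase(sample)
print(resolved)  # {0: ('AT', 'AT'), 1: ('AT', 'CG')}
print(known)     # ['AT', 'CG']
```

Genotypes that match no known haplotype stay unresolved in this sketch, which is one reason the other methods on this slide are described as giving a complete solution.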
29. Factors affecting
• SNP allele frequency distribution
• Haplotype allele numbers
• Linkage disequilibrium (LD)
• Difference in power
• Overlap in results of marker types
30. Benefits of haplotypes instead of
individual SNPs
• Information content is higher
• Gene function may depend on more than one SNP
• A smaller number of markers is required
– The number of false-positive associations is reduced
• Missing genotypes can be imputed by computational methods
• Elimination of genotyping errors
• Challenges:
– Haplotypes are difficult to determine directly in the lab, so computational
methods are needed
– Defining block borders is ambiguous; several different
algorithms exist
31. Haplotype v/s SNP
1. When the genome has a large number of SNPs
(Hamblin and Jannink, 2011)
2. When the genome has a small number of SNPs
32. HAPLOTYPE CORRELATION WITH PHENOTYPE
Associating haplotype frequencies with the frequency of a
desired phenotype in the population helps exploit the
full potential of SNPs as markers.
The “haplotype-centric” approach combines the information
of adjacent SNPs into composite multilocus haplotypes.
Haplotypes are not only more informative than single SNPs but also
capture regional LD information, which is assumed to make the
analysis more robust and powerful.
34. Case study: 1
Aim:
1. To resequence the pepper genome and systematically
assess the diversity within Capsicum spp.
2. To develop a complete HapMap using SNPs
3. To annotate the identified SNPs to genes
35. Lines with different chile and bell pepper phenotypes
DNA extraction
Sequencing using Illumina HiSeq 2000
SNP calling using the Infinium array technique
Development of the PepperSNP16K array
Genotyping with the array
Cluster map developed
38. Utility of the study:
Conclusion for the case study:
39. Case study: 2
Aim:
• To capture untapped novel diversity in the Brassica sp.
• To introduce the new concept of Heterotic Haplotype
Capture (HHC)
41. Mixing up the gene pool by de novo allopolyploidisation
• Generated synthetic B. napus by de novo interspecific
hybridisation
• De novo synthesis of synthetic B. napus increases recombination
Intergenomic chromosomal rearrangements as a driver for heterosis
• Identified large numbers of homoeologous chromosome exchanges by
using SNP haplotypes
• Large-scale deletions, duplications, and copy-number variation
• Structural chromosome variants can also have a significant influence on
heterotic potential within and between heterotic pools
43. Output of study:
1. Genome-scale data available for the NAM and HHC
populations enable the identification (in any given NAM line)
of haplotype blocks that are predicted to be heterozygous in
combination with a genotyped maternal tester.
2. HHC-like approaches benefit plant breeding based on genomic
prediction
3. The availability of immortal heterotic populations provides a
powerful resource for genome-scale investigations into the
genetic basis of heterosis for yield and other important
agronomic traits.
44. Vinay Kumar et al. Plant Biotechnology Journal (2016), pp. 1–9
A few other examples: