Use of SNP-HapMaps in plant breeding

Seminar
on
USE OF SNP-HapMaps IN
PLANT BREEDING
2/8/2017 PG seminar
Anilkumar, C.
PALB 5062
PhD scholar

In this session:
• Introduction
• Haplotype construction
• Haplotype inference
• Factors affecting
• Case studies

Genetic Variations
• The genetic variations in DNA sequences (e.g.,
insertions, deletions, and mutations) have a
major impact on genotypic and phenotypic
differences.
– All humans share more than 99% of their DNA sequence.
– Genetic variations in a coding region may change a codon and so alter the amino acid sequence.

Allelic variations within the genome of a species:
1. Differences in the number of tandem repeats at a locus - SSRs
2. Segmental/nucleotide insertions/deletions - InDels
3. Single nucleotide polymorphisms - SNPs
Marker classes by detection method and throughput:
(1) Low-throughput, hybridization-based markers such as RFLPs
(2) Medium-throughput, PCR-based markers such as RAPDs, AFLPs, and SSRs
(3) High-throughput (HTP), sequence-based markers: SNPs

►A SNP is defined as a single base change in a DNA sequence that occurs in a significant proportion (more than 1 percent) of a large population.
►SNPs are found in coding and (mostly) noncoding regions.
►SNPs occur at a very high frequency, from about 1 in 1000 bases up to 1 in 100 to 300 bases.
►The abundance of SNPs and the ease with which they can be measured make these genetic variations significant.
►A SNP close to a particular gene acts as a marker for that gene.

Single Nucleotide Polymorphism
• A single nucleotide polymorphism (SNP), pronounced “snip,” is a genetic variation in which a single nucleotide (i.e., A, T, C, or G) is altered and the change is passed on through heredity.
– SNP: single DNA base variation found in more than 1% of the population
– Mutation: single DNA base variation found in less than 1% of the population
SNP:
C T T A G C T T   94%
C T T A G T T T   6%
Mutation:
C T T A G C T T   99.9%
C T T A G T T T   0.1%
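The 1% rule on this slide can be written as a small check. This is only a sketch: the function names and the allele counts are made up for illustration, and the slide does not say how a frequency of exactly 1% should be treated (here it is labelled a mutation).

```python
def minor_allele_frequency(allele_counts):
    """Frequency of the rarer allele at a biallelic site."""
    total = sum(allele_counts.values())
    return min(allele_counts.values()) / total


def classify_variant(allele_counts, threshold=0.01):
    """Label a site 'SNP' if the rarer allele exceeds the threshold.

    Assumption: exactly 1% is treated as a mutation; the slide leaves it open.
    """
    maf = minor_allele_frequency(allele_counts)
    return "SNP" if maf > threshold else "mutation"


# Hypothetical counts matching the percentages on the slide.
print(classify_variant({"C": 94, "T": 6}))    # rarer allele at 6%
print(classify_variant({"C": 999, "T": 1}))   # rarer allele at 0.1%
```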

Sequence Overlap SNP discovery
GTTTAAATAATACTGATCA
GTTTAAATAATACTGATCA
GTTTAAATAGTACTGATCA
GTTTAAATAGTACTGATCA
Sources of overlapping sequence:
• Genomic DNA → BAC library → BAC overlap
• Genomic DNA → RRS library or sampling → shotgun overlap
• mRNA → cDNA library → EST overlap
SNP maps
►Sequence the genomes of a large number of individuals.
►Compare the base sequences to discover SNPs.
►Generate a single map of the genome containing all possible SNPs => SNP maps
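The "compare the base sequences" step can be sketched as a column-by-column scan of aligned reads. The function name `find_snps` is hypothetical, and the sketch assumes the reads are already aligned and of equal length; real SNP calling also has to handle sequencing errors and coverage.

```python
from collections import Counter


def find_snps(aligned_reads):
    """Return (1-based position, allele counts) for every polymorphic column."""
    snps = []
    for position, column in enumerate(zip(*aligned_reads), start=1):
        counts = Counter(column)
        if len(counts) > 1:  # more than one base observed at this position
            snps.append((position, dict(counts)))
    return snps


# The four overlapping reads shown on the slide.
reads = [
    "GTTTAAATAATACTGATCA",
    "GTTTAAATAATACTGATCA",
    "GTTTAAATAGTACTGATCA",
    "GTTTAAATAGTACTGATCA",
]
print(find_snps(reads))
```

For these reads the scan reports a single A/G difference at position 10.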

What do we know?
• SNPs physically close to one another tend to be inherited
together
• Recombination breaks apart haplotypes and slowly erodes
correlation between neighboring alleles
• Since SNPs are bi-allelic, each SNP defines a partition on the
population sample.

Haplotype:
A haplotype is a group of genes in an organism that are inherited together from a single parent.
In terms of SNPs:
A haplotype is a set of linked SNPs on the same chromosome that are not easily separated by recombination.
Within each haplotype block, recombination is rare due to tight linkage, and only a few haplotypes actually occur.

Haplotypes
• Haplotype: A set of closely linked genetic markers present on one
chromosome which tend to be inherited together (not easily
separable by recombination).
• A haplotype can be simply considered as a binary string since
each SNP is binary.
Sequence              SNP1 SNP2 SNP3
-A C T T A G C T T-   C    A    T     Haplotype 1
-A A T T T G C T C-   A    T    C     Haplotype 2
-A C T T T G C T C-   C    T    C     Haplotype 3
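The binary-string view can be illustrated with the three haplotypes above. This sketch assumes that 0 stands for the allele that is more frequent in the sample and 1 for the rarer one; the slide does not fix a coding convention, and `encode_binary` is a made-up name.

```python
from collections import Counter


def encode_binary(haplotypes):
    """Encode each haplotype as a 0/1 string (0 = major allele, 1 = minor allele)."""
    # Most frequent allele in each SNP column of the sample.
    majors = [Counter(column).most_common(1)[0][0] for column in zip(*haplotypes)]
    return [
        "".join("0" if allele == major else "1" for allele, major in zip(hap, majors))
        for hap in haplotypes
    ]


haplotypes = ["CAT", "ATC", "CTC"]  # Haplotypes 1 to 3 at SNP1, SNP2, SNP3
print(encode_binary(haplotypes))
```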

Haplotype
• Multiple loci in the same chromosome that are
inherited together
• Usually a string of SNPs that are linked
[Figure: alleles at each locus and the resulting haplotypes]

SNP-Haplotype
No.  DNA sequence       Phenotype
1    GATATTCGTACGGA-T   BLACK EYE
2    GATGTTCGTACTGAAT   BROWN EYE
3    GATATTCGTACGGA-T   BLACK EYE
4    GATATTCGTACCGAAT   BLUE EYE
5    GATGTTCGTACTGAAT   BROWN EYE
6    GATGTTCGTACTGAAT   BROWN EYE
Haplotypes (frequency, phenotype)
AG-  2/6  (BLACK EYE)
GT   3/6  (BROWN EYE)
AC   1/6  (BLUE EYE)
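The haplotype frequencies in this example can be recovered by reading off the variable columns of the six sequences. The sketch below is illustrative only: the helper names are invented, and it keeps every variable column, including the insertion/deletion site, so each haplotype string has three characters.

```python
from collections import Counter


def variable_columns(sequences):
    """0-based indices of alignment columns where the sequences differ."""
    return [i for i, column in enumerate(zip(*sequences)) if len(set(column)) > 1]


def haplotype_counts(sequences):
    """Count haplotypes formed by the alleles at the variable columns."""
    columns = variable_columns(sequences)
    return Counter("".join(seq[i] for i in columns) for seq in sequences)


sequences = [
    "GATATTCGTACGGA-T",
    "GATGTTCGTACTGAAT",
    "GATATTCGTACGGA-T",
    "GATATTCGTACCGAAT",
    "GATGTTCGTACTGAAT",
    "GATGTTCGTACTGAAT",
]
for haplotype, count in haplotype_counts(sequences).most_common():
    print(haplotype, f"{count}/{len(sequences)}")
```

Pairing each haplotype with the phenotype column of the table then gives the haplotype-phenotype correspondence shown on the slide.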

Why Haplotypes
• Haplotypes are more powerful discriminators between cases and controls in association studies
• Use of haplotypes in association studies reduces the number of tests to be carried out
• With haplotypes we can conduct evolutionary studies
• Haplotypes are necessary for linkage analysis

Genotypes
• The use of haplotype information has been limited because many
genomes are diploid.
– In large sequencing projects, genotypes instead of haplotypes are
collected due to cost consideration.
Haplotype data (phase known):
SNP1 SNP2
A    T
C    G
Genotype data (phase unknown):
SNP1 SNP2
A/C  G/T

Problems of Genotypes
• Genotypes only tell us the alleles at each SNP
locus.
– But we don’t know the connection of alleles at
different SNP loci.
– There could be several possible haplotypes for the
same genotype.
Genotype data:
SNP1 SNP2
A/C  G/T
Possible haplotype pairs:
SNP1 SNP2
A    T
C    G
or
SNP1 SNP2
A    G
C    T
We don’t know which haplotype pair is real.
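The ambiguity can be made concrete by listing every haplotype pair that is consistent with a genotype. This is a sketch with an invented function name; it represents a genotype as a list of unordered allele pairs, one per SNP.

```python
from itertools import product


def compatible_pairs(genotype):
    """List the unordered haplotype pairs consistent with an unphased genotype.

    genotype: one (allele, allele) tuple per SNP, e.g. [("A", "C"), ("G", "T")].
    """
    pairs = set()
    for choice in product((0, 1), repeat=len(genotype)):
        first = "".join(site[c] for site, c in zip(genotype, choice))
        second = "".join(site[1 - c] for site, c in zip(genotype, choice))
        pairs.add(tuple(sorted((first, second))))
    return sorted(pairs)


print(compatible_pairs([("A", "C"), ("G", "T")]))
```

With k heterozygous SNPs there are 2^(k-1) such pairs, so the number of candidates doubles with each additional heterozygous site.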

Steps in map construction

Haplotype blocks
• Low recombination rate in the region
• Strong linkage disequilibrium (LD)
• A small number of SNPs in the block (tag SNPs) is enough to identify the common haplotypes
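Linkage disequilibrium between two SNPs is commonly summarised by r², computed from the disequilibrium coefficient D = p(AB) - p(A)p(B). The sketch below uses this standard formula on phased haplotypes; the function name and the example data are made up, and both sites are assumed to be biallelic and polymorphic.

```python
def ld_r2(haplotypes, i, j):
    """Squared allelic correlation r^2 between biallelic sites i and j.

    Assumes both sites are polymorphic; a monomorphic site would divide by zero.
    """
    n = len(haplotypes)
    ref_i, ref_j = haplotypes[0][i], haplotypes[0][j]  # arbitrary reference alleles
    p_i = sum(h[i] == ref_i for h in haplotypes) / n
    p_j = sum(h[j] == ref_j for h in haplotypes) / n
    p_ij = sum(h[i] == ref_i and h[j] == ref_j for h in haplotypes) / n
    d = p_ij - p_i * p_j  # disequilibrium coefficient D
    return d * d / (p_i * (1 - p_i) * p_j * (1 - p_j))


# Hypothetical samples: complete LD, then no LD.
print(ld_r2(["AT", "AT", "CG", "CG"], 0, 1))
print(ld_r2(["AT", "AG", "CT", "CG"], 0, 1))
```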

Block detection methods
• Four-gamete test (Hudson and Kaplan, Genetics, 1985)
– A segment of SNPs is a block if, for every pair of SNPs (with alleles a/A and b/B), at most 3 of the 4 possible gametes (ab, aB, Ab, AB) are observed.
• P-value test
– A segment of SNPs is a block if, for 95% of the pairs of SNPs, we can reject the hypothesis (with P-value 0.05 or 0.001) that they are in linkage equilibrium.
• LD-based (Gabriel et al., Science, 2002, 296:2225-9)
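A minimal sketch of the four-gamete criterion, assuming phased, biallelic haplotypes given as strings of equal length. The function name and the example haplotypes are hypothetical.

```python
from itertools import combinations


def passes_four_gamete_test(haplotypes):
    """True if no pair of sites shows all four gametes among the haplotypes."""
    n_sites = len(haplotypes[0])
    for i, j in combinations(range(n_sites), 2):
        gametes = {(h[i], h[j]) for h in haplotypes}
        if len(gametes) == 4:  # all of ab, aB, Ab, AB observed
            return False
    return True


print(passes_four_gamete_test(["000", "011", "100"]))          # at most 3 gametes per pair
print(passes_four_gamete_test(["000", "011", "100", "110"]))   # sites 0 and 1 show all four
```

Observing all four gametes at a pair of sites suggests a past recombination between them (or a recurrent mutation), which is why such a segment is not called a block.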

Research Directions of SNPs and
Haplotypes in Recent Years
• Haplotype Inference: Maximum Parsimony, Perfect Phylogeny, Statistical Methods
• Tag SNP Selection: Haplotype block, LD bin, Prediction Accuracy
• SNP Database

Haplotype Blocks and Tag SNPs
• Studies have shown that the chromosome can be partitioned into haplotype blocks interspersed by recombination hotspots (Daly et al., 2001; Patil et al., 2001).
– Within a haplotype block, there is little or no recombination.
– The SNPs within a haplotype block tend to be inherited
together.
• Within a haplotype block, a small subset of SNPs (called tag SNPs)
is sufficient to distinguish each pair of haplotype patterns in the
block.
– We only need to genotype tag SNPs instead of all SNPs within
a haplotype block.
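One way to picture tag SNP selection is an exhaustive search for the smallest set of SNPs whose alleles still tell all haplotype patterns in the block apart. This is a brute-force sketch for small blocks, not the method of any particular tool; the function name and the four example patterns are invented.

```python
from itertools import combinations


def minimal_tag_snps(haplotypes):
    """Smallest set of site indices that distinguishes all distinct haplotypes."""
    n_sites = len(haplotypes[0])
    distinct = set(haplotypes)
    for size in range(1, n_sites + 1):
        for subset in combinations(range(n_sites), size):
            projected = {tuple(h[i] for i in subset) for h in distinct}
            if len(projected) == len(distinct):  # every pattern still unique
                return subset
    return tuple(range(n_sites))


# Hypothetical block of four SNPs with four haplotype patterns.
patterns = ["0000", "0011", "1100", "1111"]
print(minimal_tag_snps(patterns))
```

Here two of the four SNPs are enough, so only those two would need to be genotyped. The search is exponential in the number of SNPs, so larger blocks call for heuristic methods.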

Recombination Hotspots and Haplotype
Blocks
[Figure: a chromosome divided into haplotype blocks separated by recombination hotspots; haplotype patterns P1 to P4 are shown across SNP loci S1 to S12, with major and minor alleles marked.]

Three Problems
1. Estimation of frequency of all possible
haplotypes
2. Reconstruction of haplotype for individuals
3. Detection of all possible haplotypes in a
population

Haplotype construction
• Family-based haplotype construction
– Linkage analysis software: Simwalk, Merlin, Genehunter, Allegro...
• Population-based haplotype construction
– Not as reliable as family-based

Haplotype reconstruction for individuals
Heterozygous diploid individual → sequencing → genotype → haplotype
• Genotype (e.g., TC TG AA): pairs of alleles with association of alleles to chromosomes unknown
• Haplotype h = (h1, h2): possible associations of alleles to chromosomes, e.g., Cp: C T A and Cm: T G A
• This is a mixture modeling problem!

Haplotype Inference
• The problem of inferring haplotypes from a set of genotypes is called haplotype inference.
• Most combinatorial methods use the maximum parsimony model to solve this problem.
– This model assumes that the number of distinct haplotypes in a natural population is small.
– The solution is a minimum set of haplotypes that can explain the given genotypes.

Maximum Parsimony
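• Find a minimum set of haplotypes to explain the given genotypes.
G1 (SNP1: A/C, SNP2: G/T) can be resolved as h1 = A T and h2 = C G, or as h3 = A G and h4 = C T.
G2 (SNP1: A/A, SNP2: T/T) can only be resolved as h1 = A T and h1 = A T.
Choosing h1 and h2 for G1 explains both genotypes with only two distinct haplotypes (A T and C G).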
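For a handful of genotypes the maximum parsimony idea can be checked by brute force: try every combination of phasings and keep the one that uses the fewest distinct haplotypes. The function names are hypothetical and the search is exponential, so this is a teaching sketch rather than a practical solver.

```python
from itertools import product


def phasings(genotype):
    """Unordered haplotype pairs consistent with one unphased genotype."""
    result = set()
    for choice in product((0, 1), repeat=len(genotype)):
        first = "".join(site[c] for site, c in zip(genotype, choice))
        second = "".join(site[1 - c] for site, c in zip(genotype, choice))
        result.add(tuple(sorted((first, second))))
    return sorted(result)


def max_parsimony(genotypes):
    """Return (haplotype set, chosen pairs) using the fewest distinct haplotypes."""
    best = None
    for resolution in product(*(phasings(g) for g in genotypes)):
        used = {hap for pair in resolution for hap in pair}
        if best is None or len(used) < len(best[0]):
            best = (used, resolution)
    return best


g1 = [("A", "C"), ("G", "T")]  # heterozygous at both SNPs
g2 = [("A", "A"), ("T", "T")]  # homozygous at both SNPs
used, resolution = max_parsimony([g1, g2])
print(sorted(used), resolution)
```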

Haplotype analysis algorithms
• Given a random sample of multilocus genotypes at a set of SNPs
the following actions can be taken:
– Estimate the frequencies of all possible haplotypes.
– Infer the haplotypes of all individuals.
• Haplotyping Algorithms:
– Clark algorithm
– EM algorithm
• Haplotyping programs:
– HAPINFEREX (Clark parsimony algorithm)
– EM-Decoder (EM algorithm)
– PHASE (Gibbs sampler)
– HAPLOTYPER
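The EM approach to estimating haplotype frequencies can be sketched in a few lines. This is a simplified illustration that assumes Hardy-Weinberg equilibrium and a fixed number of iterations; it is not the implementation used by EM-Decoder or any other program listed above, and all names in it are made up.

```python
from collections import defaultdict
from itertools import product


def phasings(genotype):
    """Unordered haplotype pairs consistent with one unphased genotype."""
    result = set()
    for choice in product((0, 1), repeat=len(genotype)):
        first = "".join(site[c] for site, c in zip(genotype, choice))
        second = "".join(site[1 - c] for site, c in zip(genotype, choice))
        result.add(tuple(sorted((first, second))))
    return sorted(result)


def em_haplotype_frequencies(genotypes, n_iter=20):
    """Estimate haplotype frequencies from unphased genotypes by EM."""
    options = [phasings(g) for g in genotypes]
    haps = sorted({h for opts in options for pair in opts for h in pair})
    freq = {h: 1.0 / len(haps) for h in haps}  # uniform starting point
    for _ in range(n_iter):
        expected = defaultdict(float)
        for opts in options:
            # E-step: probability of each phasing given current frequencies.
            weights = [freq[a] * freq[b] * (1 if a == b else 2) for a, b in opts]
            total = sum(weights)
            for (a, b), weight in zip(opts, weights):
                expected[a] += weight / total
                expected[b] += weight / total
        # M-step: new frequencies are expected counts per chromosome.
        freq = {h: expected[h] / (2 * len(genotypes)) for h in haps}
    return freq


g1 = [("A", "C"), ("G", "T")]
g2 = [("A", "A"), ("T", "T")]
freq = em_haplotype_frequencies([g1, g2])
print({h: round(f, 3) for h, f in freq.items()})
```

With these two genotypes the estimate concentrates on A T and C G, because the homozygous individual makes A T the better-supported explanation for the ambiguous one.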

Comparison between algorithms
• Clark
– Intuitive
– Fast
• EM
– Complete solution
– Slightly more accurate than Clark
– Robust to ambiguity
• PHASE
– Complete solution
– Slightly more accurate than EM
– Slow version
• Haplotyper (Ligation)
– Fast
– Better than Clark
– Less accurate than EM or PHASE

Factors affecting
• SNP allele frequency distribution
• Haplotype allele numbers
• Linkage disequilibrium (LD)
• Difference in power
• Overlap in results of marker types

Benefits of haplotypes instead of
individual SNPs
• Information content is higher
• Gene function may depend on more than one SNP
• Fewer markers are required
– The number of false-positive associations is reduced
• Missing genotypes can be imputed by computational methods
• Genotyping errors can be detected and eliminated
• Challenges:
– Haplotypes are difficult to determine directly in the lab, so computational methods are needed
– Defining block borders is ambiguous; several different algorithms exist

Haplotype vs. SNP
1. When the number of SNPs in the genome is large
(Hamblin and Jannink, 2011)
2. When the number of SNPs in the genome is small

HAPLOTYPE CORRELATION WITH PHENOTYPE
• Associating haplotype frequencies with the frequencies of desired phenotypes in the population helps to use the full potential of SNPs as markers.
• The “haplotype-centric” approach combines the information of adjacent SNPs into composite multilocus haplotypes.
• Haplotypes are not only more informative but also capture the regional LD information, which is assumed to be robust and powerful.

Source: International HapMap Project

Case study: 1
Aim :
1. To resequence the pepper genome and to systematically assess the diversity within Capsicum spp.
2. To develop a complete HapMap using SNPs
3. To annotate the identified SNPs to genes

Lines with different chile and bell pepper phenotypes
DNA extraction
Sequencing using Illumina HiSeq 2000
SNP calling using the Infinium array technique
Development of the PepperSNP16K array
Genotyping with the array
Cluster map developed

Utility of the study:
Conclusion for the case study:

Case study: 2
Aim:
• To capture untapped novel diversity in Brassica spp.
• To introduce the new concept of Heterotic Haplotype Capture (HHC)

Mixing up the gene pool by de novo allopolyploidisation
• Generated synthetic B. napus by de novo interspecific hybridisation
• De novo synthesis of synthetic B. napus increases recombination
Intergenomic chromosomal rearrangements as a driver for heterosis
• Identified large numbers of homoeologous chromosome exchanges using SNP haplotypes
• Large-scale deletions, duplications, and copy-number variation
• Structural chromosome variants can also have a significant influence on heterotic potential within and between heterotic pools

Rejuvenating a depleted breeding pool with novel species diversity

Output of study:
1. Genome-scale data available for the NAM and HHC
populations enable the identification (in any given NAM line)
of haplotype blocks that are predicted to be heterozygous in
combination with a genotyped maternal tester.
2. HHC-like approaches benefit genomic prediction based plant
breeding
3. The availability of immortal heterotic populations provides a powerful resource for genome-scale investigations into the genetic basis of heterosis for yield and other important agronomic traits.

Vinay Kumar et al. Plant Biotechnology Journal (2016), pp. 1–9
A few other examples:
