Hong Gao, Bioinformatics, Affymetrix. An automated algorithm that reduces genotype analysis to ~1 hour is described and applied to hexaploid wheat data.
Allopolyploid Genotyping Algorithm on Affymetrix' Axiom Arrays
Hong Gao, Ali Pirani, Laurent Bellon, and Teresa A. Webster
Affymetrix, Inc. Santa Clara, CA 95051 USA
ABSTRACT
Background: Many plant species offer unique challenges to accurate genotyping
due to complexity associated with polyploid genomes that have a high density of
segmental duplications. In allopolyploid species, SNPs segregating in one
subgenome tend to exhibit particular clustering behaviors due to the allele-
dosage contribution of the alternate subgenomes. As a result, such allopolyploid
SNPs demonstrate three major genotype clustering patterns: the diploid-like
pattern, where the subgenomes’ effects counteract each other (denoted as
AB/(AA)n/(BB)n), and the two compressed, shifted diploid patterns (denoted as
AB/(AA)n or AB/(BB)n). AxiomGT1 is an automated clustering algorithm used by
the Axiom® Genotyping Console™ Software, and features adaptive, dynamic
clustering. However, such allopolyploid patterns make accurate sample
genotyping difficult. Therefore, there is a need for continuing improvement
of statistical methods for accurate genotyping of allopolyploid species.
Method: We have developed a statistical approach called “FitAllo” that is
compatible with Affymetrix’ Axiom® Genotyping Solution. FitAllo is derived from
“fitTetra” (Voorrips, et al., BMC Bioinformatics 12:172, 2011) and employs the
Baum-Welch algorithm to ascertain specific SNP patterns [AB/(AA)n/(BB)n or
AB/(AA)n or AB/(BB)n] in order to derive the center locations of each genotype
cluster. These cluster locations can then be used as SNP-specific priors for the
AxiomGT1 algorithm.
Results
We developed a statistical approach called “FitAllo”, derived from “fitTetra”
(Voorrips, et al., BMC Bioinformatics 12:172, 2011). FitAllo chooses the
best fit among multiple pre-specified allo-generic priors via the
Baum-Welch algorithm (shown in Figure 2). After ascertaining the specific SNP
pattern [AB/(AA)n/(BB)n or AB/(AA)n or AB/(BB)n], FitAllo derives the
center locations of each genotype cluster, which can then be used as
SNP-specific priors for the AxiomGT1 algorithm. Using the prior information
discovered by FitAllo allows AxiomGT1 to correct mislabeled
genotypes and to improve the overall genotyping accuracy.
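The idea of choosing the best-fitting prior among a few pre-specified candidates can be sketched as follows. This is not the FitAllo implementation: it swaps in a plain expectation-maximization fit of a one-dimensional Gaussian mixture on contrast values, with the cluster centers held fixed at each candidate prior, and picks the candidate with the highest log-likelihood. The candidate center values, the simulated contrasts, and the names `fit_fixed_centers` and `candidate_priors` are all hypothetical, and the direction of each shift on the contrast axis is an assumption.

```python
import math
import random


def normal_pdf(x, mean, sd):
    """Density of a normal distribution at x."""
    z = (x - mean) / sd
    return math.exp(-0.5 * z * z) / (sd * math.sqrt(2.0 * math.pi))


def fit_fixed_centers(xs, centers, n_iter=50):
    """EM for a 1-D Gaussian mixture with means held at `centers`.

    Only the mixing weights and one shared standard deviation are
    re-estimated. Returns (log_likelihood, weights, sd).
    """
    k = len(centers)
    weights = [1.0 / k] * k
    sd = 1.0
    for _ in range(n_iter):
        # E-step: responsibility of each cluster for each sample.
        resp = []
        for x in xs:
            dens = [w * normal_pdf(x, m, sd) for w, m in zip(weights, centers)]
            total = sum(dens) or 1e-300  # guard against underflow
            resp.append([d / total for d in dens])
        # M-step: update the weights and the shared standard deviation.
        weights = [sum(r[j] for r in resp) / len(xs) for j in range(k)]
        var = sum(r[j] * (x - centers[j]) ** 2
                  for x, r in zip(xs, resp) for j in range(k)) / len(xs)
        sd = max(math.sqrt(var), 1e-3)
    loglik = sum(
        math.log(sum(w * normal_pdf(x, m, sd)
                     for w, m in zip(weights, centers)) or 1e-300)
        for x in xs)
    return loglik, weights, sd


# Hypothetical candidate priors: contrast centers for each clustering pattern.
candidate_priors = {
    "AB/(AA)n/(BB)n": (-2.0, 0.0, 2.0),   # diploid-like
    "AB/(AA)n": (0.5, 1.5, 2.5),          # compressed, assumed shifted toward A
    "AB/(BB)n": (-2.5, -1.5, -0.5),       # compressed, assumed shifted toward B
}

# Simulated contrast values for 89 samples (not data from the poster).
rng = random.Random(0)
contrasts = [rng.gauss(center, 0.15)
             for center, n in ((-2.5, 20), (-1.5, 40), (-0.5, 29))
             for _ in range(n)]

fits = {name: fit_fixed_centers(contrasts, centers)
        for name, centers in candidate_priors.items()}
best_pattern = max(fits, key=lambda name: fits[name][0])
print(best_pattern, candidate_priors[best_pattern])
```

Because every candidate here has the same number of free parameters, comparing raw log-likelihoods is a fair selection rule; candidates of different complexity would need a penalized criterion instead.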
Application to bread wheat
We applied FitAllo to genotype data from bread wheat varietals (allohexaploid)
generated on Axiom® custom genotyping arrays in collaboration with the University of
Bristol, UK. This data set contains 11,603 SNPs genotyped across 89
samples. Figure 3 shows two examples of SNPs rescued by FitAllo. The two
SNP cluster plots on the left were initially classified as “No Minor
Homozygous” (by the SNPolisher™ R package) after genotyping with the AxiomGT1
algorithm. After using FitAllo, these two SNPs fall into the “Polymorphic
High-Resolution” category, shown on the right.
Figure 3: Two examples of FitAllo correcting miscalled heterozygote genotypes.
Both homozygotes and heterozygotes are accurately called after using FitAllo.
[Figure 3 panels. Left column, AxiomGT1: homozygote clusters miscalled as AB-AA-AA and AB-BB-BB; SNPs classified as “No Minor Homozygous”. Right column, FitAllo + AxiomGT1: the same clusters correctly called as BB-AA-AA and AA-BB-BB; SNPs classified as “Polymorphic High-Resolution”.]
Results and conclusions: We applied FitAllo to bread wheat varietal
(allohexaploid) genotype data generated on Axiom® custom genotyping arrays
and demonstrated substantially improved genotyping accuracy in allopolyploids.
We anticipate FitAllo to be widely applicable in plant genotyping in conjunction
with Axiom® Genotyping Solution.
Background
In allopolyploid species, SNPs tend to segregate in one subgenome and often
show the two compressed, shifted diploid patterns (Figure 1)
besides the diploid-like pattern. This usually leads to miscalling homozygotes as
heterozygotes in polyploid genotyping.
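A small numerical illustration of why a homozygote in the segregating subgenome can land near the heterozygote position. The sketch rests on two assumptions that the poster does not state: each allele's signal is taken to be proportional to its total copy number plus a small constant background, and Contrast and Size are taken to be the log2 ratio and the mean log2 of the two allele signals. The function name and the background value are hypothetical.

```python
import math


def contrast_and_size(a_copies, b_copies, background=0.1):
    """Idealised (Contrast, Size) for a genotype with the given allele copies.

    Assumption (not from the poster): each allele's signal is proportional
    to its copy number plus a small constant background.
    """
    a_signal = a_copies + background
    b_signal = b_copies + background
    contrast = math.log2(a_signal / b_signal)
    size = (math.log2(a_signal) + math.log2(b_signal)) / 2.0
    return contrast, size


# (A copies, B copies) summed over all subgenomes.
diploid = {"AA": (2, 0), "AB": (1, 1), "BB": (0, 2)}
# Allohexaploid SNP segregating in one subgenome; the other two are fixed BB.
hexaploid = {"AA-BB-BB": (2, 4), "AB-BB-BB": (1, 5), "BB-BB-BB": (0, 6)}

for label, genotypes in (("diploid", diploid), ("allohexaploid", hexaploid)):
    for name, (a, b) in genotypes.items():
        contrast, _ = contrast_and_size(a, b)
        print(f"{label:14s} {name:9s} contrast = {contrast:+.2f}")
```

Under these assumptions the AA-BB-BB cluster sits much closer to the diploid AB position than to the diploid AA position, which is the kind of shift that a diploid prior would read as a heterozygote.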
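Figure 1: Two examples of allohexaploid SNPs with miscalled heterozygote genotypes.
The figure on the left shows that the homozygote cluster AA-BB-BB is miscalled as AB-BB-BB by the genotyping algorithm as the homozygote cluster moves to the heterozygote location. Likewise, the figure on the right shows that the homozygote cluster BB-AA-AA is miscalled as AB-AA-AA.
[Figure 1 cluster labels. Left: AA-BB-BB (miscalled as AB-BB-BB), BB-BB-BB. Right: BB-AA-AA (miscalled as AB-AA-AA), AA-AA-AA.]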
Table 1: SNP classification table of bread wheat for comparison between
AxiomGT1 algorithm and FitAllo+AxiomGT1 algorithm.
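Figure 2: Schematic illustration of FitAllo analysis pipeline compared with AxiomGT1.
[Figure 2 schematic. Labels recovered from the figure: Data; Process; Diploid Generic Prior; BRLMM‐P; "Select among 12 Hex" (truncated); Run FitAllo to select the best fit among 12 Allo generic priors; Output Allo SNP-specific priors; Run AxiomGT1. Cluster-plot axes: Contrast (-3 to 3) and Size (10 to 11).]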
Improved performance of FitAllo
After genotyping the bread wheat data, we applied the SNPolisher R package to
post-process the genotyping results, which included evaluating SNP
quality control metrics and classifying SNPs into six categories (Table 1). FitAllo
increased the share of Polymorphic High-Resolution SNPs from 34.4% to 49.3%
by rescuing SNPs that AxiomGT1 had placed in the No Minor Homozygous
category. Therefore, FitAllo substantially improved
genotyping accuracy in allopolyploids.
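The category shares can be recomputed from the counts reported in Table 1. The short sketch below does so; the dictionary layout and the helper name `shares` are illustrative choices and are not part of SNPolisher.

```python
# Counts per SNP category as (AxiomGT1, FitAllo + AxiomGT1), from Table 1.
table1_counts = {
    "PolyHighResolution": (3996, 5717),
    "MonoHighResolution": (1484, 1344),
    "NoMinorHom": (3592, 24),
    "Off-Target Variant": (559, 508),
    "CRbelowThreshold": (307, 1041),
    "Other": (1665, 2969),
}


def shares(counts, column):
    """Percentage of SNPs per category for one column, rounded to 0.1%."""
    total = sum(row[column] for row in counts.values())
    return {name: round(100.0 * row[column] / total, 1)
            for name, row in counts.items()}


axiomgt1_pct = shares(table1_counts, 0)
fitallo_pct = shares(table1_counts, 1)

for name in table1_counts:
    print(f"{name:20s} {axiomgt1_pct[name]:5.1f}%  ->  {fitallo_pct[name]:5.1f}%")
```

Both columns sum to the same 11,603 SNPs, so the percentages are directly comparable between the two workflows.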
Bread Wheat
                      AxiomGT1               FitAllo + AxiomGT1
SNP Category          Count    Percentage    Count    Percentage
PolyHighResolution     3996        34.4%      5717        49.3%
MonoHighResolution     1484        12.8%      1344        11.6%
NoMinorHom             3592        31.0%        24         0.2%
Off‐Target Variant      559         4.8%       508         4.4%
CRbelowThreshold        307         2.6%      1041         9.0%
Other                  1665        14.3%      2969        25.6%
Total                 11603       100.0%     11603       100.0%

1. Voorrips R. E., Gort G., Vosman B. Genotype calling in tetraploid species from bi-allelic marker data using mixture models.
BMC Bioinformatics 12:172 (2011).