The document describes the Axiom Genome-Wide PanAFR Array Set, which was designed to maximize coverage of common and rare genetic variants in populations of Yoruba ancestry. The array contains over 2.2 million markers selected from several genome projects to provide high coverage of the genome as well as specific regions and gene categories. Testing showed the array performs well with an average call rate over 99% and high concordance and reproducibility of calls.
Recombination DNA Technology (Nucleic Acid Hybridization )
SNP genotyping using the Affymetrix® Axiom® Genome-Wide Pan-African (PanAFR) Array Set
1. Target SNPs
YRI SNPs from three SNP
discovery projects
N
Y
2.2M selected
markers
provide 90%
coverage of
target SNPs
Update selected SNP list.
SNP selection rule
START
Can we
cover more
target
SNPs?
Selected SNPs
Choose available SNPs that contribute maximally to
coverage, within a narrow range of coverage (r2)
values. Of these, choose the most robust.
STO
P
Candidate SNPs
Validated and available to
be placed on the array
Applica t i o n Brief
Abstract
Genome-wide association studies (GWAS) in African populations pose unique challenges, given the extensive genetic heterogeneity
that has been observed in these populations (Figures 1 and 2). Adding to the GWAS challenge is the increase in the number of known
SNP and insertion/deletion (in/del) markers (driven by sequencing in novel populations). Population-optimized SNP arrays that include
this novel sequencing content and overall high genomic coverage offer great power to detect disease-associated markers. To this end,
the Axiom® Genome-Wide PanAFR Array Set was designed to maximize coverage of novel common and rare variants in populations of
Yoruba (YRI) ancestry. We have also demonstrated the coverage of this array in the following HapMap populations: Africans from the
Southwestern USA (ASW), Maasai (MKK), Luhya (LWK), European (CEU), and Asian (Chinese and Japanese, ASI) (Figure 3).
The Axiom®
Genome-Wide PanAFR Array Set contains ~2.2 million markers (SNPs and insertion/deletion) obtained from HapMap,
dbSNP, the 1000 Genomes Project, and the Southern African Genomes Project (Figure 5). In addition to high genome-wide coverage,
these markers were also selected for coverage of the following categories: chromosomes X and Y, mitochondria, coding regions, recom-
bination hotspots, ADME, miRNA, and disease-associated regions (Figure 4). The array set has been tested using manual and automated
workflows with multiple sample types such as cell line, saliva, and blood-derived DNA. Assay performance was evaluated and shown
to have an average sample call rate greater than 99.0 percent; trio concordance and average sample concordance to independent
DNA genotype information (HapMap) were greater than 99.5 percent, and intra- and inter-run reproducibility were greater than
99.8 percent (Figure 6).
1
SNP genotyping using the Affymetrix® Axiom®
Genome-Wide Pan-African (PanAFR) Array Set
Because African populations
are genetically more diverse
than European and Asian
populations, ~2.5 times more
SNPs are required to provide
the same statistical power
for a GWAS.1
Figure 1: Marker selection strategy for the GWAS challenge
2. 2
Minor allele frequency (%)
Coverage
YRI ASWLWK MKK CEU ASI
1
0.9
0.8
0.7
0.6
0.5
0.4
Y axis = number of intersecting SNPs (million)
YRI + LWK + MKK YRI + LWK YRI + MKK YRI
2
1.5
1
0.5
0
A
YRI + CEU + ASI YRI + CEU YRI + ASI YRI
2
1
0
B
YRI + ASW YRI
2
1.5
1
0.5
0
C
Most YRI SNPs
are also polymorphic
in two other African
populations
Many YRI SNPs are
private relative
to the out-of-Africa
populations
Most YRI SNPs are
also polymorphic
in ASW
Figure 2: Intersection of YRI SNPs with various populations
Figure 3: Overall genetic coverage
Intersection of Axiom Pan-African
markers that exhibit polymorphism
in various populations. Counts of
polymorphic SNPs were determined by
genotyping the ASI, ASW, CEU, LWK,
MKK, and YRI HapMap samples with the
Axiom® Genome-Wide PanAFR Array Set.
Panel A illustrates that the vast majority of
SNPs (~2M) are polymorphic in all three
African populations (YRI, LWK, MKK) while
only a very small subset are polymorphic in
only YRI and LWK but not MKK.
Panel B illustrates that the large majority of
SNPs (2M) are polymorphic in both the YRI
and ASW populations while a small number
(less than 0.25M) are only polymorphic in
YRI but not ASW.
Panel C illustrates that greater than 1M SNPs
are polymorphic in the YRI and CEU and ASI
populations while slightly less than 1M SNPs
are polymorphic in only the YRI population
and not the CEU and ASI populations. A
small subset of SNPs are polymorphic in YRI
and CEU but not ASI and are polymorphic in
YRI and ASI but not CEU.
Genetic coverage: Single-marker coverage is the proportion of SNPs and in/dels with r2 ≥0.8 to a marker on the
Axiom Genome-Wide PanAFR Array.
3. 3
Minor allele frequency (%)
Coverage
1
0.9
0.8
0.7
0.6
0.5
0.4
12M target SNPs comprised of:
1. All HapMap.
2. 1000G Pilot 1 low coverage
SNPs that were validated by
other projects (Affymetrix or
1000 Genomes Trio project).2
3. Southern African Genomes
Project, validated by Affymetrix.
SNP origins: The great majority of SNPs on Axiom GW PanAFR Array were discovered by the 1000 Genomes Pilot Projects.
8.8M candidate SNPs
Subset of target SNPs with
robust genotype cluster
properties and evidence
of polymorphism
2.2M selected SNPs
Figure 4: Coverage of markers in biological categories
Figure 5: Markers for the Axiom® Genome-Wide PanAFR Array are from three discovery projects
Coverage of biologically relevant SNPs: Single-marker coverage (r2 ≥0.8 for YRI) of SNPs and in/dels associated with
selected genes and regions.
1000
Genomes
1.2M
Southern
African
Genomes
Project
13K
51K
Other
821K 92K
2.2K
HapMap
100
10K
Cardio- Immunity and NHGRI Sanger
ADME vascular inflamation MHC disease cancer