SlideShare a Scribd company logo
1 of 1
try to select X markers
Designing GWAS arrays for efficient
imputation-based coverage
Yiping Zhan, Yontao Lu, Teresa Webster, Laurent Bellon, and Jeanette Schmidt
Affymetrix, Inc. 3420 Central Expressway Santa Clara, CA 95051
Results for chr20 at different cutoff points
ABSTRACT
Imputation has become increasingly popular as a standard step of GWAS data
analysis. As a result, imputation-based coverage, instead of coverage
based on pairwise tagging only, is becoming a more relevant metric
for the value of a GWAS array with respect to the statistical power it
offers for genome-wide association tests. However, traditionally
whole-genome genotyping arrays have been designed via a tagging
marker selection strategy, aiming for good pairwise tagging results,
which do not align very well with the goal of having efficient
imputation-based coverage.
Taking advantage of the genotype data generated by the 1000 Genomes
Project1, the latest development of imputation tools, and our flexible
in-house greedy algorithm for selecting markers based on LD, we
came up with a GWAS array design strategy that iterates between an
LD-based marker selection step and an imputation-based coverage
evaluation step. The coverage evaluation step identifies and removes
variants from the target set of markers when their genotypes are
imputed well enough based on already selected markers, enabling the
selection of markers more valuable for imputation-based coverage in
the next iteration. This strategy generates markers for designing
genotyping arrays that are better optimized for imputation-based
coverage when compared with a set of markers selected with the
pairwise-tagging-based selection strategy. Our results show that the
new marker selection method makes it possible to design powerful
yet efficient GWAS arrays with better imputation-based coverage
compared to existing arrays that contain much larger sets of markers.
Greedy marker selection algorithm based on
pairwise tagging
Magenta and blue data points correspond to markers selected using the greedy
method and the improved method, respectively. The right-most data points
correspond to the genome-wide marker lists of size 399K. Marker selection was
carried out for CEU but results are shown in both CEU and GBR. The solid
horizontal line corresponds to the coverage achievable with all 215K chr20
candidate markers available for selection. The dashed line shows the coverage by
the 18.5K chr20 markers in OmniExpress.
References
1. The 1000 Genomes Project Consortium. An integrated map of genetic variation
from 1,092 human genomes (to be published)
2. B. N. Howie, P. Donnelly, J. Marchini. A flexible and accurate genotype imputation
method for the next generation of genome-wide association studies. PLoS
Genetics 5(6):e1000529 (2009).
3. B. Howie, J. Marchini, M. Stephens. Genotype imputation with thousands of
genomes. G3: Genes, Genomics, Genetics 1(6):457-470 (2011).
4. B. Howie, C. Fuchsberger, M. Stephens, J. Marchini, G. R. Abecasis. Fast and
accurate genotype imputation in genome-wide association studies through pre-
phasing. Nature Genetics 44(8):955-959 (2012).
target marker set for coverage
(can be different for different
populations)
candidate marker set
(markers that can be selected
for genotyping)
pre-selected marker set
(may be empty)
greedy
marker
selection
ranked markers for
a GWAS panel
Given a target set of markers for each population of interest, a candidate set of
markers to select markers from, and a pre-selected marker set, our in-house
greedy marker selection algorithm considers LD data for each population as well
as other possible factors, like the amount of space it takes to genotype a marker,
to greedily select the most valuable marker to be included in a GWAS panel in
each iteration until pairwise-based coverage is maximized in target populations,
generating a ranked marker list. This process is very efficient in designing GWAS
panels with optimized pairwise-tagging-based coverage (coverage hereby defined
as the fraction of target markers with maximal r2 ≥0.8).
GWAS panels designed with the greedy marker selection algorithm have good
imputation-based coverage as well. However, the marker selection can be further
improved when the objective is to optimize for imputation-based coverage with a
limited number of markers that can be genotyped directly.
Marker selection strategy further optimized for
imputation-based coverage
ranked marker list
pre-selected set target set(s) candidate set
if pairwise-based coverage is
maximized with <X markers,
marker selection finishes
using greedy marker
selection based on
pairwise tagging
if X markers are selected
update the pre-selected set update the target set to remove
to include the newly markers covered by either pairwise
selected markers tagging or imputation
Marker selection strategy further optimized for
imputation-based coverage
 Imputation-based coverage is calculated using IMPUTE2 v2.2.22,3,4. Pre-phased
1000 Genomes Project phase 1 integrated release v3 haplotypes (except
haplotypes for the samples we try to impute genotypes for) are used as the
reference panel for imputing genotypes for about 10% of the samples in the
population of interest at a time. This “cross-validated” imputation approach is
carried out 10 times so that imputed genotypes are available for all samples in a
given population. The r2 is then calculated for each marker based on the
correlation between imputed allele dosage and the genotypes from the 1000
Genomes Project for these samples.
 The number of markers selected in each “cycle” of the greedy marker selection
(X) is determined so that the average inter-marker distance is reasonably large
compared to typical haplotype block lengths. It is also not too small so that only
a reasonable number of cycles (typically about 12-15) are needed for a marker
selection run.
 Using a computer cluster with ~100 virtual nodes (~3GHz each) to carry out the
imputation-based r2 calculations, a genome-wide marker selection process
starting from scratch takes about 4 days.
Marker selection targeting coverage of common
(MAF ≥0.05) 1000 Genomes markers in CEU
Marker selection produced a ranked list of 399K entries for all autosomal
chromosomes and chrX. Compared to the top 399K markers selected using greedy
marker selection only, the achieved imputation-based coverage (using r2=0.8
cutoff) is higher in both CEU and GBR, a 1000 Genomes population that is also
Caucasian but not considered directly during marker selection. The coverage is also
higher/comparable to that achieved with the OmniExpress genotyping panel, which
consists of 727K markers.
Marker panel
Number of markers
in panel
Imputation‐based
coverage in CEU
Imputation‐based
coverage in GBR
Based on greedy
marker selection
Based on the
improved strategy
OmniExpress
399K
399K
727K
0.871
0.903
0.873
0.850
0.871
0.860

More Related Content

Viewers also liked

Introduction to Cassandra and Data Modeling
Introduction to Cassandra and Data ModelingIntroduction to Cassandra and Data Modeling
Introduction to Cassandra and Data Modelingnickmbailey
 
ελβετία
ελβετία ελβετία
ελβετία thodoros
 
Newsletter #53 - Le Hibou Agence .V. du 31 mai 2013
Newsletter #53 - Le Hibou Agence .V. du 31 mai 2013Newsletter #53 - Le Hibou Agence .V. du 31 mai 2013
Newsletter #53 - Le Hibou Agence .V. du 31 mai 2013Le Hibou
 
Memoria abstracta mayas y redes
Memoria abstracta mayas y redesMemoria abstracta mayas y redes
Memoria abstracta mayas y redesLizeth Gonzalez
 

Viewers also liked (6)

Terry AFAM
Terry AFAMTerry AFAM
Terry AFAM
 
Introduction to Cassandra and Data Modeling
Introduction to Cassandra and Data ModelingIntroduction to Cassandra and Data Modeling
Introduction to Cassandra and Data Modeling
 
ελβετία
ελβετία ελβετία
ελβετία
 
SALESRESUME15
SALESRESUME15SALESRESUME15
SALESRESUME15
 
Newsletter #53 - Le Hibou Agence .V. du 31 mai 2013
Newsletter #53 - Le Hibou Agence .V. du 31 mai 2013Newsletter #53 - Le Hibou Agence .V. du 31 mai 2013
Newsletter #53 - Le Hibou Agence .V. du 31 mai 2013
 
Memoria abstracta mayas y redes
Memoria abstracta mayas y redesMemoria abstracta mayas y redes
Memoria abstracta mayas y redes
 

Similar to Designing GWAS arrays for efficient imputation-based coverage

Gene gain and loss: aCGH. ISACGH
Gene gain and loss: aCGH. ISACGHGene gain and loss: aCGH. ISACGH
Gene gain and loss: aCGH. ISACGHRafael C. Jimenez
 
GENETIC GAIN BY GENOMIC SELECTION PPT.pptx
GENETIC GAIN BY GENOMIC SELECTION PPT.pptxGENETIC GAIN BY GENOMIC SELECTION PPT.pptx
GENETIC GAIN BY GENOMIC SELECTION PPT.pptxPABOLU TEJASREE
 
Mitigating genotyping application note
Mitigating genotyping application noteMitigating genotyping application note
Mitigating genotyping application noteAffymetrix
 
MULTI OBJECTIVE SIMULATED ANNEALING-BASED CLUSTERING OF TISSUE SAMPLES FOR CA...
MULTI OBJECTIVE SIMULATED ANNEALING-BASED CLUSTERING OF TISSUE SAMPLES FOR CA...MULTI OBJECTIVE SIMULATED ANNEALING-BASED CLUSTERING OF TISSUE SAMPLES FOR CA...
MULTI OBJECTIVE SIMULATED ANNEALING-BASED CLUSTERING OF TISSUE SAMPLES FOR CA...SREEKUTTY SREEKUMAR
 
Distributed maximum likelihood classification of linear modulations over noni...
Distributed maximum likelihood classification of linear modulations over noni...Distributed maximum likelihood classification of linear modulations over noni...
Distributed maximum likelihood classification of linear modulations over noni...ieeepondy
 
CSCI 6505 Machine Learning Project
CSCI 6505 Machine Learning ProjectCSCI 6505 Machine Learning Project
CSCI 6505 Machine Learning Projectbutest
 
Integrative analysis of transcriptomics and proteomics data with ArrayMining ...
Integrative analysis of transcriptomics and proteomics data with ArrayMining ...Integrative analysis of transcriptomics and proteomics data with ArrayMining ...
Integrative analysis of transcriptomics and proteomics data with ArrayMining ...Natalio Krasnogor
 
ON THE PREDICTION ACCURACIES OF THREE MOST KNOWN REGULARIZERS : RIDGE REGRESS...
ON THE PREDICTION ACCURACIES OF THREE MOST KNOWN REGULARIZERS : RIDGE REGRESS...ON THE PREDICTION ACCURACIES OF THREE MOST KNOWN REGULARIZERS : RIDGE REGRESS...
ON THE PREDICTION ACCURACIES OF THREE MOST KNOWN REGULARIZERS : RIDGE REGRESS...ijaia
 
Comparing prediction accuracy for machine learning and
Comparing prediction accuracy for machine learning andComparing prediction accuracy for machine learning and
Comparing prediction accuracy for machine learning andAlexander Decker
 
Comparing prediction accuracy for machine learning and
Comparing prediction accuracy for machine learning andComparing prediction accuracy for machine learning and
Comparing prediction accuracy for machine learning andAlexander Decker
 
GIAB for AMP GeT-RM Forum
GIAB for AMP GeT-RM ForumGIAB for AMP GeT-RM Forum
GIAB for AMP GeT-RM ForumGenomeInABottle
 
Microarray Data Classification Using Support Vector Machine
Microarray Data Classification Using Support Vector MachineMicroarray Data Classification Using Support Vector Machine
Microarray Data Classification Using Support Vector MachineCSCJournals
 
Giab for jax long read 190917
Giab for jax long read 190917Giab for jax long read 190917
Giab for jax long read 190917GenomeInABottle
 
[2017-05-29] DNASmartTagger
[2017-05-29] DNASmartTagger [2017-05-29] DNASmartTagger
[2017-05-29] DNASmartTagger Eli Kaminuma
 
In-silico structure activity relationship study of toxicity endpoints by QSAR...
In-silico structure activity relationship study of toxicity endpoints by QSAR...In-silico structure activity relationship study of toxicity endpoints by QSAR...
In-silico structure activity relationship study of toxicity endpoints by QSAR...Kamel Mansouri
 
An Integrated Approach to Uncover Drivers of Cancer
An Integrated Approach to Uncover Drivers of CancerAn Integrated Approach to Uncover Drivers of Cancer
An Integrated Approach to Uncover Drivers of CancerRaunak Shrestha
 

Similar to Designing GWAS arrays for efficient imputation-based coverage (20)

Data preprocessing
Data preprocessingData preprocessing
Data preprocessing
 
Gene gain and loss: aCGH. ISACGH
Gene gain and loss: aCGH. ISACGHGene gain and loss: aCGH. ISACGH
Gene gain and loss: aCGH. ISACGH
 
GENETIC GAIN BY GENOMIC SELECTION PPT.pptx
GENETIC GAIN BY GENOMIC SELECTION PPT.pptxGENETIC GAIN BY GENOMIC SELECTION PPT.pptx
GENETIC GAIN BY GENOMIC SELECTION PPT.pptx
 
Ijetr042111
Ijetr042111Ijetr042111
Ijetr042111
 
Mitigating genotyping application note
Mitigating genotyping application noteMitigating genotyping application note
Mitigating genotyping application note
 
C0451823
C0451823C0451823
C0451823
 
MULTI OBJECTIVE SIMULATED ANNEALING-BASED CLUSTERING OF TISSUE SAMPLES FOR CA...
MULTI OBJECTIVE SIMULATED ANNEALING-BASED CLUSTERING OF TISSUE SAMPLES FOR CA...MULTI OBJECTIVE SIMULATED ANNEALING-BASED CLUSTERING OF TISSUE SAMPLES FOR CA...
MULTI OBJECTIVE SIMULATED ANNEALING-BASED CLUSTERING OF TISSUE SAMPLES FOR CA...
 
Distributed maximum likelihood classification of linear modulations over noni...
Distributed maximum likelihood classification of linear modulations over noni...Distributed maximum likelihood classification of linear modulations over noni...
Distributed maximum likelihood classification of linear modulations over noni...
 
CSCI 6505 Machine Learning Project
CSCI 6505 Machine Learning ProjectCSCI 6505 Machine Learning Project
CSCI 6505 Machine Learning Project
 
Integrative analysis of transcriptomics and proteomics data with ArrayMining ...
Integrative analysis of transcriptomics and proteomics data with ArrayMining ...Integrative analysis of transcriptomics and proteomics data with ArrayMining ...
Integrative analysis of transcriptomics and proteomics data with ArrayMining ...
 
ON THE PREDICTION ACCURACIES OF THREE MOST KNOWN REGULARIZERS : RIDGE REGRESS...
ON THE PREDICTION ACCURACIES OF THREE MOST KNOWN REGULARIZERS : RIDGE REGRESS...ON THE PREDICTION ACCURACIES OF THREE MOST KNOWN REGULARIZERS : RIDGE REGRESS...
ON THE PREDICTION ACCURACIES OF THREE MOST KNOWN REGULARIZERS : RIDGE REGRESS...
 
Comparing prediction accuracy for machine learning and
Comparing prediction accuracy for machine learning andComparing prediction accuracy for machine learning and
Comparing prediction accuracy for machine learning and
 
Comparing prediction accuracy for machine learning and
Comparing prediction accuracy for machine learning andComparing prediction accuracy for machine learning and
Comparing prediction accuracy for machine learning and
 
GIAB for AMP GeT-RM Forum
GIAB for AMP GeT-RM ForumGIAB for AMP GeT-RM Forum
GIAB for AMP GeT-RM Forum
 
Microarray Data Classification Using Support Vector Machine
Microarray Data Classification Using Support Vector MachineMicroarray Data Classification Using Support Vector Machine
Microarray Data Classification Using Support Vector Machine
 
Microarray Analysis
Microarray AnalysisMicroarray Analysis
Microarray Analysis
 
Giab for jax long read 190917
Giab for jax long read 190917Giab for jax long read 190917
Giab for jax long read 190917
 
[2017-05-29] DNASmartTagger
[2017-05-29] DNASmartTagger [2017-05-29] DNASmartTagger
[2017-05-29] DNASmartTagger
 
In-silico structure activity relationship study of toxicity endpoints by QSAR...
In-silico structure activity relationship study of toxicity endpoints by QSAR...In-silico structure activity relationship study of toxicity endpoints by QSAR...
In-silico structure activity relationship study of toxicity endpoints by QSAR...
 
An Integrated Approach to Uncover Drivers of Cancer
An Integrated Approach to Uncover Drivers of CancerAn Integrated Approach to Uncover Drivers of Cancer
An Integrated Approach to Uncover Drivers of Cancer
 

More from Affymetrix

Axiom™ Genome-Wide CEU 1 Array Plate
Axiom™ Genome-Wide CEU 1 Array PlateAxiom™ Genome-Wide CEU 1 Array Plate
Axiom™ Genome-Wide CEU 1 Array PlateAffymetrix
 
Axiom® Genome-Wide CHB 1 & CHB 2 Array Plate Set
Axiom® Genome-Wide CHB 1 & CHB 2 Array Plate SetAxiom® Genome-Wide CHB 1 & CHB 2 Array Plate Set
Axiom® Genome-Wide CHB 1 & CHB 2 Array Plate SetAffymetrix
 
Axiom™ Genome-Wide ASI 1 Array Plate
Axiom™ Genome-Wide ASI 1 Array PlateAxiom™ Genome-Wide ASI 1 Array Plate
Axiom™ Genome-Wide ASI 1 Array PlateAffymetrix
 
Axiom® Biobank Genotyping Arrays
Axiom® Biobank Genotyping ArraysAxiom® Biobank Genotyping Arrays
Axiom® Biobank Genotyping ArraysAffymetrix
 
SNP genotyping using the Affymetrix® Axiom® Genome-Wide Pan-African (PanAFR) ...
SNP genotyping using the Affymetrix® Axiom® Genome-Wide Pan-African (PanAFR) ...SNP genotyping using the Affymetrix® Axiom® Genome-Wide Pan-African (PanAFR) ...
SNP genotyping using the Affymetrix® Axiom® Genome-Wide Pan-African (PanAFR) ...Affymetrix
 
A SNP array for human population genetics studies
A SNP array for human population genetics studiesA SNP array for human population genetics studies
A SNP array for human population genetics studiesAffymetrix
 
Axiom® Genome-Wide LAT 1 Array World Array 4
Axiom®  Genome-Wide LAT 1 Array World Array 4Axiom®  Genome-Wide LAT 1 Array World Array 4
Axiom® Genome-Wide LAT 1 Array World Array 4Affymetrix
 
Axiom® Genome-Wide AFR 1 Array World Array 3
Axiom®  Genome-Wide AFR 1 Array World Array 3Axiom®  Genome-Wide AFR 1 Array World Array 3
Axiom® Genome-Wide AFR 1 Array World Array 3Affymetrix
 
FFPE Applications Solutions brochure
FFPE Applications Solutions brochureFFPE Applications Solutions brochure
FFPE Applications Solutions brochureAffymetrix
 
Solutions for Personalized Medicine brochure
Solutions for Personalized Medicine brochureSolutions for Personalized Medicine brochure
Solutions for Personalized Medicine brochureAffymetrix
 
Download our publication archive list
Download our publication archive listDownload our publication archive list
Download our publication archive listAffymetrix
 
Integrating arrays and RNA-Seq
Integrating arrays and RNA-Seq Integrating arrays and RNA-Seq
Integrating arrays and RNA-Seq Affymetrix
 
Use of Affymetrix Arrays (GeneChip® Human Transcriptome 2.0 Array and Cytosca...
Use of Affymetrix Arrays (GeneChip® Human Transcriptome 2.0 Array and Cytosca...Use of Affymetrix Arrays (GeneChip® Human Transcriptome 2.0 Array and Cytosca...
Use of Affymetrix Arrays (GeneChip® Human Transcriptome 2.0 Array and Cytosca...Affymetrix
 
From trials evaluating drugs to trials evaluating treatment algorithms – Focu...
From trials evaluating drugs to trials evaluating treatment algorithms – Focu...From trials evaluating drugs to trials evaluating treatment algorithms – Focu...
From trials evaluating drugs to trials evaluating treatment algorithms – Focu...Affymetrix
 
Affymetrix OncoScan®* data analysis with Nexus Copy Number™
Affymetrix OncoScan®* data analysis with Nexus Copy Number™Affymetrix OncoScan®* data analysis with Nexus Copy Number™
Affymetrix OncoScan®* data analysis with Nexus Copy Number™Affymetrix
 
Phenotypic identification of subclones in multiple myeloma with different gen...
Phenotypic identification of subclones in multiple myeloma with different gen...Phenotypic identification of subclones in multiple myeloma with different gen...
Phenotypic identification of subclones in multiple myeloma with different gen...Affymetrix
 
Statistical methods for off-target variant genotyping on Affymetrix' Axiom Ar...
Statistical methods for off-target variant genotyping on Affymetrix' Axiom Ar...Statistical methods for off-target variant genotyping on Affymetrix' Axiom Ar...
Statistical methods for off-target variant genotyping on Affymetrix' Axiom Ar...Affymetrix
 
Allopolyploid Genotyping Algorithm on Affymetrix' Axiom Arrays
Allopolyploid Genotyping Algorithm on Affymetrix' Axiom ArraysAllopolyploid Genotyping Algorithm on Affymetrix' Axiom Arrays
Allopolyploid Genotyping Algorithm on Affymetrix' Axiom ArraysAffymetrix
 
SNP genotyping of markers with nearby secondary polymorphisms using Affymetri...
SNP genotyping of markers with nearby secondary polymorphisms using Affymetri...SNP genotyping of markers with nearby secondary polymorphisms using Affymetri...
SNP genotyping of markers with nearby secondary polymorphisms using Affymetri...Affymetrix
 
Development of a high-throughput high-density SNP genotyping array for bovine
Development of a high-throughput high-density SNP genotyping array for bovineDevelopment of a high-throughput high-density SNP genotyping array for bovine
Development of a high-throughput high-density SNP genotyping array for bovineAffymetrix
 

More from Affymetrix (20)

Axiom™ Genome-Wide CEU 1 Array Plate
Axiom™ Genome-Wide CEU 1 Array PlateAxiom™ Genome-Wide CEU 1 Array Plate
Axiom™ Genome-Wide CEU 1 Array Plate
 
Axiom® Genome-Wide CHB 1 & CHB 2 Array Plate Set
Axiom® Genome-Wide CHB 1 & CHB 2 Array Plate SetAxiom® Genome-Wide CHB 1 & CHB 2 Array Plate Set
Axiom® Genome-Wide CHB 1 & CHB 2 Array Plate Set
 
Axiom™ Genome-Wide ASI 1 Array Plate
Axiom™ Genome-Wide ASI 1 Array PlateAxiom™ Genome-Wide ASI 1 Array Plate
Axiom™ Genome-Wide ASI 1 Array Plate
 
Axiom® Biobank Genotyping Arrays
Axiom® Biobank Genotyping ArraysAxiom® Biobank Genotyping Arrays
Axiom® Biobank Genotyping Arrays
 
SNP genotyping using the Affymetrix® Axiom® Genome-Wide Pan-African (PanAFR) ...
SNP genotyping using the Affymetrix® Axiom® Genome-Wide Pan-African (PanAFR) ...SNP genotyping using the Affymetrix® Axiom® Genome-Wide Pan-African (PanAFR) ...
SNP genotyping using the Affymetrix® Axiom® Genome-Wide Pan-African (PanAFR) ...
 
A SNP array for human population genetics studies
A SNP array for human population genetics studiesA SNP array for human population genetics studies
A SNP array for human population genetics studies
 
Axiom® Genome-Wide LAT 1 Array World Array 4
Axiom®  Genome-Wide LAT 1 Array World Array 4Axiom®  Genome-Wide LAT 1 Array World Array 4
Axiom® Genome-Wide LAT 1 Array World Array 4
 
Axiom® Genome-Wide AFR 1 Array World Array 3
Axiom®  Genome-Wide AFR 1 Array World Array 3Axiom®  Genome-Wide AFR 1 Array World Array 3
Axiom® Genome-Wide AFR 1 Array World Array 3
 
FFPE Applications Solutions brochure
FFPE Applications Solutions brochureFFPE Applications Solutions brochure
FFPE Applications Solutions brochure
 
Solutions for Personalized Medicine brochure
Solutions for Personalized Medicine brochureSolutions for Personalized Medicine brochure
Solutions for Personalized Medicine brochure
 
Download our publication archive list
Download our publication archive listDownload our publication archive list
Download our publication archive list
 
Integrating arrays and RNA-Seq
Integrating arrays and RNA-Seq Integrating arrays and RNA-Seq
Integrating arrays and RNA-Seq
 
Use of Affymetrix Arrays (GeneChip® Human Transcriptome 2.0 Array and Cytosca...
Use of Affymetrix Arrays (GeneChip® Human Transcriptome 2.0 Array and Cytosca...Use of Affymetrix Arrays (GeneChip® Human Transcriptome 2.0 Array and Cytosca...
Use of Affymetrix Arrays (GeneChip® Human Transcriptome 2.0 Array and Cytosca...
 
From trials evaluating drugs to trials evaluating treatment algorithms – Focu...
From trials evaluating drugs to trials evaluating treatment algorithms – Focu...From trials evaluating drugs to trials evaluating treatment algorithms – Focu...
From trials evaluating drugs to trials evaluating treatment algorithms – Focu...
 
Affymetrix OncoScan®* data analysis with Nexus Copy Number™
Affymetrix OncoScan®* data analysis with Nexus Copy Number™Affymetrix OncoScan®* data analysis with Nexus Copy Number™
Affymetrix OncoScan®* data analysis with Nexus Copy Number™
 
Phenotypic identification of subclones in multiple myeloma with different gen...
Phenotypic identification of subclones in multiple myeloma with different gen...Phenotypic identification of subclones in multiple myeloma with different gen...
Phenotypic identification of subclones in multiple myeloma with different gen...
 
Statistical methods for off-target variant genotyping on Affymetrix' Axiom Ar...
Statistical methods for off-target variant genotyping on Affymetrix' Axiom Ar...Statistical methods for off-target variant genotyping on Affymetrix' Axiom Ar...
Statistical methods for off-target variant genotyping on Affymetrix' Axiom Ar...
 
Allopolyploid Genotyping Algorithm on Affymetrix' Axiom Arrays
Allopolyploid Genotyping Algorithm on Affymetrix' Axiom ArraysAllopolyploid Genotyping Algorithm on Affymetrix' Axiom Arrays
Allopolyploid Genotyping Algorithm on Affymetrix' Axiom Arrays
 
SNP genotyping of markers with nearby secondary polymorphisms using Affymetri...
SNP genotyping of markers with nearby secondary polymorphisms using Affymetri...SNP genotyping of markers with nearby secondary polymorphisms using Affymetri...
SNP genotyping of markers with nearby secondary polymorphisms using Affymetri...
 
Development of a high-throughput high-density SNP genotyping array for bovine
Development of a high-throughput high-density SNP genotyping array for bovineDevelopment of a high-throughput high-density SNP genotyping array for bovine
Development of a high-throughput high-density SNP genotyping array for bovine
 

Recently uploaded

Nightside clouds and disequilibrium chemistry on the hot Jupiter WASP-43b
Nightside clouds and disequilibrium chemistry on the hot Jupiter WASP-43bNightside clouds and disequilibrium chemistry on the hot Jupiter WASP-43b
Nightside clouds and disequilibrium chemistry on the hot Jupiter WASP-43bSérgio Sacani
 
Recombination DNA Technology (Nucleic Acid Hybridization )
Recombination DNA Technology (Nucleic Acid Hybridization )Recombination DNA Technology (Nucleic Acid Hybridization )
Recombination DNA Technology (Nucleic Acid Hybridization )aarthirajkumar25
 
Behavioral Disorder: Schizophrenia & it's Case Study.pdf
Behavioral Disorder: Schizophrenia & it's Case Study.pdfBehavioral Disorder: Schizophrenia & it's Case Study.pdf
Behavioral Disorder: Schizophrenia & it's Case Study.pdfSELF-EXPLANATORY
 
Traditional Agroforestry System in India- Shifting Cultivation, Taungya, Home...
Traditional Agroforestry System in India- Shifting Cultivation, Taungya, Home...Traditional Agroforestry System in India- Shifting Cultivation, Taungya, Home...
Traditional Agroforestry System in India- Shifting Cultivation, Taungya, Home...jana861314
 
Nanoparticles synthesis and characterization​ ​
Nanoparticles synthesis and characterization​  ​Nanoparticles synthesis and characterization​  ​
Nanoparticles synthesis and characterization​ ​kaibalyasahoo82800
 
Isotopic evidence of long-lived volcanism on Io
Isotopic evidence of long-lived volcanism on IoIsotopic evidence of long-lived volcanism on Io
Isotopic evidence of long-lived volcanism on IoSérgio Sacani
 
Discovery of an Accretion Streamer and a Slow Wide-angle Outflow around FUOri...
Discovery of an Accretion Streamer and a Slow Wide-angle Outflow around FUOri...Discovery of an Accretion Streamer and a Slow Wide-angle Outflow around FUOri...
Discovery of an Accretion Streamer and a Slow Wide-angle Outflow around FUOri...Sérgio Sacani
 
Scheme-of-Work-Science-Stage-4 cambridge science.docx
Scheme-of-Work-Science-Stage-4 cambridge science.docxScheme-of-Work-Science-Stage-4 cambridge science.docx
Scheme-of-Work-Science-Stage-4 cambridge science.docxyaramohamed343013
 
Orientation, design and principles of polyhouse
Orientation, design and principles of polyhouseOrientation, design and principles of polyhouse
Orientation, design and principles of polyhousejana861314
 
A relative description on Sonoporation.pdf
A relative description on Sonoporation.pdfA relative description on Sonoporation.pdf
A relative description on Sonoporation.pdfnehabiju2046
 
Disentangling the origin of chemical differences using GHOST
Disentangling the origin of chemical differences using GHOSTDisentangling the origin of chemical differences using GHOST
Disentangling the origin of chemical differences using GHOSTSérgio Sacani
 
Call Girls in Munirka Delhi 💯Call Us 🔝8264348440🔝
Call Girls in Munirka Delhi 💯Call Us 🔝8264348440🔝Call Girls in Munirka Delhi 💯Call Us 🔝8264348440🔝
Call Girls in Munirka Delhi 💯Call Us 🔝8264348440🔝soniya singh
 
Artificial Intelligence In Microbiology by Dr. Prince C P
Artificial Intelligence In Microbiology by Dr. Prince C PArtificial Intelligence In Microbiology by Dr. Prince C P
Artificial Intelligence In Microbiology by Dr. Prince C PPRINCE C P
 
Physiochemical properties of nanomaterials and its nanotoxicity.pptx
Physiochemical properties of nanomaterials and its nanotoxicity.pptxPhysiochemical properties of nanomaterials and its nanotoxicity.pptx
Physiochemical properties of nanomaterials and its nanotoxicity.pptxAArockiyaNisha
 
PossibleEoarcheanRecordsoftheGeomagneticFieldPreservedintheIsuaSupracrustalBe...
PossibleEoarcheanRecordsoftheGeomagneticFieldPreservedintheIsuaSupracrustalBe...PossibleEoarcheanRecordsoftheGeomagneticFieldPreservedintheIsuaSupracrustalBe...
PossibleEoarcheanRecordsoftheGeomagneticFieldPreservedintheIsuaSupracrustalBe...Sérgio Sacani
 
Labelling Requirements and Label Claims for Dietary Supplements and Recommend...
Labelling Requirements and Label Claims for Dietary Supplements and Recommend...Labelling Requirements and Label Claims for Dietary Supplements and Recommend...
Labelling Requirements and Label Claims for Dietary Supplements and Recommend...Lokesh Kothari
 
GFP in rDNA Technology (Biotechnology).pptx
GFP in rDNA Technology (Biotechnology).pptxGFP in rDNA Technology (Biotechnology).pptx
GFP in rDNA Technology (Biotechnology).pptxAleenaTreesaSaji
 
Lucknow 💋 Russian Call Girls Lucknow Finest Escorts Service 8923113531 Availa...
Lucknow 💋 Russian Call Girls Lucknow Finest Escorts Service 8923113531 Availa...Lucknow 💋 Russian Call Girls Lucknow Finest Escorts Service 8923113531 Availa...
Lucknow 💋 Russian Call Girls Lucknow Finest Escorts Service 8923113531 Availa...anilsa9823
 

Recently uploaded (20)

Nightside clouds and disequilibrium chemistry on the hot Jupiter WASP-43b
Nightside clouds and disequilibrium chemistry on the hot Jupiter WASP-43bNightside clouds and disequilibrium chemistry on the hot Jupiter WASP-43b
Nightside clouds and disequilibrium chemistry on the hot Jupiter WASP-43b
 
Recombination DNA Technology (Nucleic Acid Hybridization )
Recombination DNA Technology (Nucleic Acid Hybridization )Recombination DNA Technology (Nucleic Acid Hybridization )
Recombination DNA Technology (Nucleic Acid Hybridization )
 
Behavioral Disorder: Schizophrenia & it's Case Study.pdf
Behavioral Disorder: Schizophrenia & it's Case Study.pdfBehavioral Disorder: Schizophrenia & it's Case Study.pdf
Behavioral Disorder: Schizophrenia & it's Case Study.pdf
 
Traditional Agroforestry System in India- Shifting Cultivation, Taungya, Home...
Traditional Agroforestry System in India- Shifting Cultivation, Taungya, Home...Traditional Agroforestry System in India- Shifting Cultivation, Taungya, Home...
Traditional Agroforestry System in India- Shifting Cultivation, Taungya, Home...
 
Nanoparticles synthesis and characterization​ ​
Nanoparticles synthesis and characterization​  ​Nanoparticles synthesis and characterization​  ​
Nanoparticles synthesis and characterization​ ​
 
Isotopic evidence of long-lived volcanism on Io
Isotopic evidence of long-lived volcanism on IoIsotopic evidence of long-lived volcanism on Io
Isotopic evidence of long-lived volcanism on Io
 
9953056974 Young Call Girls In Mahavir enclave Indian Quality Escort service
9953056974 Young Call Girls In Mahavir enclave Indian Quality Escort service9953056974 Young Call Girls In Mahavir enclave Indian Quality Escort service
9953056974 Young Call Girls In Mahavir enclave Indian Quality Escort service
 
Discovery of an Accretion Streamer and a Slow Wide-angle Outflow around FUOri...
Discovery of an Accretion Streamer and a Slow Wide-angle Outflow around FUOri...Discovery of an Accretion Streamer and a Slow Wide-angle Outflow around FUOri...
Discovery of an Accretion Streamer and a Slow Wide-angle Outflow around FUOri...
 
Scheme-of-Work-Science-Stage-4 cambridge science.docx
Scheme-of-Work-Science-Stage-4 cambridge science.docxScheme-of-Work-Science-Stage-4 cambridge science.docx
Scheme-of-Work-Science-Stage-4 cambridge science.docx
 
Orientation, design and principles of polyhouse
Orientation, design and principles of polyhouseOrientation, design and principles of polyhouse
Orientation, design and principles of polyhouse
 
A relative description on Sonoporation.pdf
A relative description on Sonoporation.pdfA relative description on Sonoporation.pdf
A relative description on Sonoporation.pdf
 
Engler and Prantl system of classification in plant taxonomy
Engler and Prantl system of classification in plant taxonomyEngler and Prantl system of classification in plant taxonomy
Engler and Prantl system of classification in plant taxonomy
 
Disentangling the origin of chemical differences using GHOST
Disentangling the origin of chemical differences using GHOSTDisentangling the origin of chemical differences using GHOST
Disentangling the origin of chemical differences using GHOST
 
Call Girls in Munirka Delhi 💯Call Us 🔝8264348440🔝
Call Girls in Munirka Delhi 💯Call Us 🔝8264348440🔝Call Girls in Munirka Delhi 💯Call Us 🔝8264348440🔝
Call Girls in Munirka Delhi 💯Call Us 🔝8264348440🔝
 
Artificial Intelligence In Microbiology by Dr. Prince C P
Artificial Intelligence In Microbiology by Dr. Prince C PArtificial Intelligence In Microbiology by Dr. Prince C P
Artificial Intelligence In Microbiology by Dr. Prince C P
 
Physiochemical properties of nanomaterials and its nanotoxicity.pptx
Physiochemical properties of nanomaterials and its nanotoxicity.pptxPhysiochemical properties of nanomaterials and its nanotoxicity.pptx
Physiochemical properties of nanomaterials and its nanotoxicity.pptx
 
PossibleEoarcheanRecordsoftheGeomagneticFieldPreservedintheIsuaSupracrustalBe...
PossibleEoarcheanRecordsoftheGeomagneticFieldPreservedintheIsuaSupracrustalBe...PossibleEoarcheanRecordsoftheGeomagneticFieldPreservedintheIsuaSupracrustalBe...
PossibleEoarcheanRecordsoftheGeomagneticFieldPreservedintheIsuaSupracrustalBe...
 
Labelling Requirements and Label Claims for Dietary Supplements and Recommend...
Labelling Requirements and Label Claims for Dietary Supplements and Recommend...Labelling Requirements and Label Claims for Dietary Supplements and Recommend...
Labelling Requirements and Label Claims for Dietary Supplements and Recommend...
 
GFP in rDNA Technology (Biotechnology).pptx
GFP in rDNA Technology (Biotechnology).pptxGFP in rDNA Technology (Biotechnology).pptx
GFP in rDNA Technology (Biotechnology).pptx
 
Lucknow 💋 Russian Call Girls Lucknow Finest Escorts Service 8923113531 Availa...
Lucknow 💋 Russian Call Girls Lucknow Finest Escorts Service 8923113531 Availa...Lucknow 💋 Russian Call Girls Lucknow Finest Escorts Service 8923113531 Availa...
Lucknow 💋 Russian Call Girls Lucknow Finest Escorts Service 8923113531 Availa...
 

Designing GWAS arrays for efficient imputation-based coverage

  • 1. try to select X markers Designing GWAS arrays for efficient imputation-based coverage Yiping Zhan, Yontao Lu, Teresa Webster, Laurent Bellon, and Jeanette Schmidt Affymetrix, Inc. 3420 Central Expressway Santa Clara, CA 95051 Results for chr20 at different cutoff points ABSTRACT Imputation has become increasingly popular as a standard step of GWAS data analysis. As a result, imputation-based coverage, instead of coverage based on pairwise tagging only, is becoming a more relevant metric for the value of a GWAS array with respect to the statistical power it offers for genome-wide association tests. However, traditionally whole-genome genotyping arrays have been designed via a tagging marker selection strategy, aiming for good pairwise tagging results, which do not align very well with the goal of having efficient imputation-based coverage. Taking advantage of the genotype data generated by the 1000 Genomes Project1, the latest development of imputation tools, and our flexible in-house greedy algorithm for selecting markers based on LD, we came up with a GWAS array design strategy that iterates between an LD-based marker selection step and an imputation-based coverage evaluation step. The coverage evaluation step identifies and removes variants from the target set of markers when their genotypes are imputed well enough based on already selected markers, enabling the selection of markers more valuable for imputation-based coverage in the next iteration. This strategy generates markers for designing genotyping arrays that are better optimized for imputation-based coverage when compared with a set of markers selected with the pairwise-tagging-based selection strategy. Our results show that the new marker selection method makes it possible to design powerful yet efficient GWAS arrays with better imputation-based coverage compared to existing arrays that contain much larger sets of markers. Greedy marker selection algorithm based on pairwise tagging Magenta and blue data points correspond to markers selected using the greedy method and the improved method, respectively. The right-most data points correspond to the genome-wide marker lists of size 399K. Marker selection was carried out for CEU but results are shown in both CEU and GBR. The solid horizontal line corresponds to the coverage achievable with all 215K chr20 candidate markers available for selection. The dashed line shows the coverage by the 18.5K chr20 markers in OmniExpress. References 1. The 1000 Genomes Project Consortium. An integrated map of genetic variation from 1,092 human genomes (to be published) 2. B. N. Howie, P. Donnelly, J. Marchini. A flexible and accurate genotype imputation method for the next generation of genome-wide association studies. PLoS Genetics 5(6):e1000529 (2009). 3. B. Howie, J. Marchini, M. Stephens. Genotype imputation with thousands of genomes. G3: Genes, Genomics, Genetics 1(6):457-470 (2011). 4. B. Howie, C. Fuchsberger, M. Stephens, J. Marchini, G. R. Abecasis. Fast and accurate genotype imputation in genome-wide association studies through pre- phasing. Nature Genetics 44(8):955-959 (2012). target marker set for coverage (can be different for different populations) candidate marker set (markers that can be selected for genotyping) pre-selected marker set (may be empty) greedy marker selection ranked markers for a GWAS panel Given a target set of markers for each population of interest, a candidate set of markers to select markers from, and a pre-selected marker set, our in-house greedy marker selection algorithm considers LD data for each population as well as other possible factors, like the amount of space it takes to genotype a marker, to greedily select the most valuable marker to be included in a GWAS panel in each iteration until pairwise-based coverage is maximized in target populations, generating a ranked marker list. This process is very efficient in designing GWAS panels with optimized pairwise-tagging-based coverage (coverage hereby defined as the fraction of target markers with maximal r2 ≥0.8). GWAS panels designed with the greedy marker selection algorithm have good imputation-based coverage as well. However, the marker selection can be further improved when the objective is to optimize for imputation-based coverage with a limited number of markers that can be genotyped directly. Marker selection strategy further optimized for imputation-based coverage ranked marker list pre-selected set target set(s) candidate set if pairwise-based coverage is maximized with <X markers, marker selection finishes using greedy marker selection based on pairwise tagging if X markers are selected update the pre-selected set update the target set to remove to include the newly markers covered by either pairwise selected markers tagging or imputation Marker selection strategy further optimized for imputation-based coverage  Imputation-based coverage is calculated using IMPUTE2 v2.2.22,3,4. Pre-phased 1000 Genomes Project phase 1 integrated release v3 haplotypes (except haplotypes for the samples we try to impute genotypes for) are used as the reference panel for imputing genotypes for about 10% of the samples in the population of interest at a time. This “cross-validated” imputation approach is carried out 10 times so that imputed genotypes are available for all samples in a given population. The r2 is then calculated for each marker based on the correlation between imputed allele dosage and the genotypes from the 1000 Genomes Project for these samples.  The number of markers selected in each “cycle” of the greedy marker selection (X) is determined so that the average inter-marker distance is reasonably large compared to typical haplotype block lengths. It is also not too small so that only a reasonable number of cycles (typically about 12-15) are needed for a marker selection run.  Using a computer cluster with ~100 virtual nodes (~3GHz each) to carry out the imputation-based r2 calculations, a genome-wide marker selection process starting from scratch takes about 4 days. Marker selection targeting coverage of common (MAF ≥0.05) 1000 Genomes markers in CEU Marker selection produced a ranked list of 399K entries for all autosomal chromosomes and chrX. Compared to the top 399K markers selected using greedy marker selection only, the achieved imputation-based coverage (using r2=0.8 cutoff) is higher in both CEU and GBR, a 1000 Genomes population that is also Caucasian but not considered directly during marker selection. The coverage is also higher/comparable to that achieved with the OmniExpress genotyping panel, which consists of 727K markers. Marker panel Number of markers in panel Imputation‐based coverage in CEU Imputation‐based coverage in GBR Based on greedy marker selection Based on the improved strategy OmniExpress 399K 399K 727K 0.871 0.903 0.873 0.850 0.871 0.860