Exploring GWAS, Genomic Prediction, and Imputation using SVS
Steve Hystad - Field Application Scientist
Questions during the presentation
Use the Questions pane in your GoToWebinar window
SNP & Variation Suite (SVS)
Core Features
• Powerful Data Management
• Rich Visualizations
• Robust Statistics
• Flexible
Applications (Packages)
• Genotype Analysis
• DNA sequence analysis
• CNV Analysis
• RNA-seq differential expression
GenomeBrowse
• Powerful visualization software for DNA and RNA sequencing data
• Supports most standard bioinformatics file formats
• Fast and responsive for interactive analysis
• Intuitive controls
• Stream data from the cloud and from your own remote data servers
Approximate Agenda
1 GWAS workflow, correction of data
2 GBLUP
3 Imputation
Simulated Cattle Data
• 402 Bos taurus cattle from Bovine HapMap project
• Illumina 50k genotypes
• Simple oligogenic trait simulation
- 5 SNPs with independent additive effects
- About 62% of trait explained by simulated genetic effect
Breed composition (pie chart):
Angus             5%
BeefMaster        4%
Brown Swiss       7%
Charolais         6%
Guernsey          6%
Hereford          4%
Holstein         15%
Jersey            7%
Limousin          7%
N'Dama            7%
Norwegian Red     5%
Piedmontese       7%
Red Angus         2%
Romagnola         7%
Santa Gertrudis   7%
Sheko             5%
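The trait simulation described on this slide (5 SNPs with independent additive effects, about 62% of trait variance explained genetically) can be sketched as follows. This is a minimal illustration, not the procedure used to build the webinar data set: the random genotypes, the number of markers, the equal effect sizes, and the function name simulate_trait are all assumptions made for the example.

```python
import random
import statistics

def simulate_trait(genotypes, causal_idx, effects, h2, rng):
    """Additive trait: y = sum(effect * genotype over causal SNPs) + noise.

    The noise is rescaled so that var(g) / (var(g) + var(e)) equals h2
    in this sample (hypothetical helper, not an SVS function).
    """
    g = [sum(b * row[j] for j, b in zip(causal_idx, effects)) for row in genotypes]
    var_g = statistics.pvariance(g)
    noise = [rng.gauss(0.0, 1.0) for _ in genotypes]
    target_sd = (var_g * (1.0 - h2) / h2) ** 0.5
    scale = target_sd / statistics.pstdev(noise)
    e = [scale * x for x in noise]
    y = [gi + ei for gi, ei in zip(g, e)]
    return y, g, e

rng = random.Random(1)
n_samples, n_snps = 402, 200           # 402 matches the slide; 200 markers is arbitrary
freqs = [rng.uniform(0.1, 0.9) for _ in range(n_snps)]
# Genotype = count of the alternate allele (0, 1 or 2) drawn from two alleles.
genotypes = [[(rng.random() < p) + (rng.random() < p) for p in freqs]
             for _ in range(n_samples)]
causal_idx = rng.sample(range(n_snps), 5)   # 5 causal SNPs, as on the slide
effects = [1.0] * 5                         # equal effects: an assumption
y, g, e = simulate_trait(genotypes, causal_idx, effects, 0.62, rng)
```

Rescaling the noise fixes the genetic share of variance at 0.62 in this particular sample; drawing noise at a fixed variance would give a share that only approximates 0.62.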
Demonstration
Q-Q Plots
λ = 2.19
λ = 0.979
λ = 1.049
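The λ values on the Q-Q plots are genomic inflation factors. The slide does not say how they were computed; the sketch below assumes the usual definition, the median observed 1-df chi-square statistic divided by the median expected under the null (about 0.455). The function name genomic_inflation and the synthetic p-values are illustrative only.

```python
from statistics import NormalDist, median

# Median of a chi-square distribution with 1 degree of freedom (about 0.4549).
CHI2_1DF_MEDIAN = NormalDist().inv_cdf(0.75) ** 2

def genomic_inflation(p_values):
    """Genomic inflation factor from p-values of 1-df association tests.

    Each p-value (0 < p <= 1) is mapped back to its chi-square statistic via
    chi2 = z^2, where z is the normal quantile at p / 2.
    """
    nd = NormalDist()
    chi2 = [nd.inv_cdf(p / 2.0) ** 2 for p in p_values]
    return median(chi2) / CHI2_1DF_MEDIAN

n = 1001
uniform_p = [(i + 0.5) / n for i in range(n)]       # idealized null p-values
lam_null = genomic_inflation(uniform_p)             # close to 1
lam_inflated = genomic_inflation([p ** 2 for p in uniform_p])  # well above 1
```

A value near 1 indicates test statistics that follow the null distribution overall, while a value well above 1 points to inflation such as uncorrected population stratification.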
Overview of Methods
Naïve GWAS: Corr/Trend Test
• Quality Control of Samples and Markers
GWAS + Correcting for Population Stratification: PCA Correction (Eigenstrat, Price 2006)
• Direct correction of genotype and phenotype data
• Adding PCs as covariates to regression model
Mixed Model Approach: EMMAX (Kang 2010)
• Using the Genomic Relationship Matrix (IBD) to account for stratification
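The "direct correction of genotype and phenotype data" idea can be illustrated with a small sketch. It assumes the EIGENSTRAT-style statistic (N - K - 1) r², where r is the correlation between genotype and phenotype after both have been adjusted along K ancestry axes. The axes are taken as given here rather than computed by PCA, and the toy data, the two-group ancestry axis, and the function names are hypothetical.

```python
from statistics import fmean

def center(v):
    m = fmean(v)
    return [x - m for x in v]

def dot(a, b):
    return sum(x * y for x, y in zip(a, b))

def residualize(v, axes):
    """Remove the projection of v onto each axis.

    Assumes the axes are centered and mutually orthogonal, as principal
    components of centered data are.
    """
    v = center(v)
    for a in axes:
        coef = dot(v, a) / dot(a, a)
        v = [x - coef * ai for x, ai in zip(v, a)]
    return v

def trend_chi2(genotype, phenotype, axes=()):
    """(N - K - 1) * r^2 on ancestry-adjusted genotype and phenotype."""
    axes = [center(a) for a in axes]
    g = residualize(genotype, axes)
    y = residualize(phenotype, axes)
    r2 = dot(g, y) ** 2 / (dot(g, g) * dot(y, y))
    return (len(g) - len(axes) - 1) * r2

# Toy data: two subpopulations differing in allele frequency and trait mean.
genotype = [0, 0, 1, 0, 2, 1, 2, 2]
phenotype = [1.0, 1.2, 0.9, 1.1, 3.0, 3.1, 2.9, 3.2]
ancestry_axis = [-1, -1, -1, -1, 1, 1, 1, 1]   # stand-in for a first PC

naive = trend_chi2(genotype, phenotype)
adjusted = trend_chi2(genotype, phenotype, [ancestry_axis])
```

In this toy case the marker has no effect within either group, so the statistic drops once the ancestry axis is removed from both variables; the naive value is driven by the difference between the groups.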
Genomic Prediction Methods Available in SVS
• GBLUP
- Assumes all loci contribute to the phenotype
• Bayes C
- Estimates effects of gene loci together with the parameters required to define their probability distribution
- Prior probability that any SNP will have no effect is fixed
• Bayes C-pi
- Prior probability that any SNP will have no effect is unknown and allowed to vary
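The difference between Bayes C and Bayes C-pi can be written as a mixture prior on each SNP effect. The notation below is an assumption for illustration (β_j for the effect of SNP j, π for the prior probability of no effect, σ_β² for the common effect variance) and is not taken from the slides.

```latex
\beta_j \mid \pi, \sigma_\beta^2 \sim
\begin{cases}
0 & \text{with probability } \pi, \\
N(0, \sigma_\beta^2) & \text{with probability } 1 - \pi
\end{cases}
```

Under this sketch, Bayes C treats π as a constant chosen in advance, while Bayes C-pi places a prior on π (commonly uniform on (0, 1)) and estimates it together with the SNP effects.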
Why Use Genomic Prediction?
• Calculate genomic estimated breeding values (GEBV) for all subjects in a population
- May be more accurate than breeding selection based only on pedigree and trait data
• Predict breeding values for subjects with unknown phenotypes
- May avoid costly and lengthy field trials
- May not always be possible to measure the phenotype
• Identify genetic markers with best predictive power for a trait
- Assist in development of predictive tests and other assays
GBLUP
• Assumes all loci contribute to phenotype
• Incorporates genomic relationship matrix (GRM) in mixed linear model framework to account for relatedness among samples
• Calculates allele substitution effect (ASE) for each SNP
• Computes genomic estimated breeding values (GEBV) and predicted phenotypes for all samples
• Also calculates:
- Pseudo-heritability of trait
- Genetic component of trait variance
- Error component of trait variance
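The quantities listed on this slide fit the standard mixed linear model form of GBLUP. The notation is an assumption for illustration, not taken from the slides: y is the phenotype vector, X and β the fixed-effect design and coefficients, Z the incidence matrix for samples, u the random genetic effects, G the GRM, and σ_g² and σ_e² the genetic and error variance components.

```latex
\begin{aligned}
y &= X\beta + Zu + \varepsilon, \\
u &\sim N(0, \sigma_g^2 G), \qquad \varepsilon \sim N(0, \sigma_e^2 I), \\
h^2_{\text{pseudo}} &= \frac{\sigma_g^2}{\sigma_g^2 + \sigma_e^2}
\end{aligned}
```

In this formulation the predicted u gives the breeding values, and the pseudo-heritability is the share of trait variance attributed to the genetic component.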
Why Imputation
• Fill in Missing Genotypes
- Improve quality of GT calls
• Harmonizing Arrays
- Facilitate meta-analyses that combine studies genotyped on different sets of variants
• Increase Genotypes
- Increases the power and resolution of genetic association studies
- Find candidate susceptibility variants to guide fine-mapping
Questions or more info:
• Email info@goldenhelix.com
• Request an evaluation of the software at www.goldenhelix.com
• Check out our abstract competition!

A Walk Through GWAS
