I was invited to give a presentation at the journal club of Debora Marks's lab. Here are the slides for that presentation.
Please visit my website to learn more about this presentation: https://yosuketanigawa.com/talks/2022-01-28-jclub-Marks-lab
4. PRS predictions are sometimes useful
- Differences in (overlapping) PRS distributions are sometimes useful for
population stratification.
- PRSs can be used as instrumental variables in causal inference
N. R. Wray et al., JAMA Psychiatry (2020); Sakaue*, Kanai*, et al., Nat Med (2020).
[Figure: PRS (biomarker) associations with lifespan]
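To make the instrumental-variable idea concrete, here is a small simulation sketch on toy data with assumed effect sizes. A PRS that shifts an exposure (say, a biomarker) and has no other path to the outcome lets the Wald ratio cov(PRS, outcome) / cov(PRS, exposure) recover the causal effect, while the naive regression slope is biased by the unobserved confounder `u`. Real Mendelian randomization analyses must justify the IV assumptions (e.g., no pleiotropic path from the PRS to the outcome); this toy setup satisfies them by construction.

```python
import numpy as np

rng = np.random.default_rng(0)
n = 50_000

# Simulated toy data (assumed effect sizes, not real cohort values)
prs = rng.normal(size=n)          # instrument: affects the outcome only via the exposure
u = rng.normal(size=n)            # unobserved confounder of exposure and outcome
exposure = 0.5 * prs + u + rng.normal(size=n)
true_effect = 0.3
outcome = true_effect * exposure + u + rng.normal(size=n)

# Naive regression slope of outcome on exposure: biased upward by u
naive = np.cov(exposure, outcome)[0, 1] / np.var(exposure, ddof=1)

# Wald-ratio IV estimate: cov(PRS, outcome) / cov(PRS, exposure)
iv = np.cov(prs, outcome)[0, 1] / np.cov(prs, exposure)[0, 1]

print(f"naive: {naive:.3f}  IV: {iv:.3f}  true: {true_effect}")
```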
5. PRS models often contain many variants
- One challenge in PRS modeling is the linkage disequilibrium (LD) structure
- Bayesian regression with GWAS summary statistics + an LD reference panel
has been successful
- The genome-wide polygenic risk score (Khera et al.) uses 6M+ variants
- Yet we don’t expect 6M causal variants for common complex traits
Khera, et al., Nat Gen (2018).
6. Sparse regression model with Lasso
- One alternative: regularized regression on individual-level data
- e.g., Lasso
- Challenge: the dataset is large (n = 300k, p = 1M+)
- It does not fit in memory, etc.
- We developed Batch screening iterative Lasso (BASIL)
- Efficient screening based on the “strong rule” (Tibshirani et al. 2012)
- Solves the Lasso via an iterative procedure
Junyang Qian
Qian, Tanigawa, et al. PLOS Gen. (2020).
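The sequential strong rule that BASIL's screening builds on can be sketched in a few lines. This is a minimal NumPy illustration of the rule from Tibshirani et al. (2012), assuming the lasso objective with a 1/(2n) loss scaling (other scalings shift λ by a constant). It is not snpnet code, and the helper name `strong_rule_keep` is hypothetical.

```python
import numpy as np

def strong_rule_keep(X, residual, lam_prev, lam_new):
    """Sequential strong rule for the lasso objective
    (1/2n) * ||y - X b||^2 + lam * ||b||_1.

    `residual` is y - X b_hat(lam_prev). Predictor j is discarded when
    |x_j^T residual| / n < 2 * lam_new - lam_prev; the indices of the
    surviving predictors are returned.
    """
    n = X.shape[0]
    score = np.abs(X.T @ residual) / n
    return np.flatnonzero(score >= 2 * lam_new - lam_prev)

# Toy data: 5 causal predictors among 2,000 (simulated, not real genotypes)
rng = np.random.default_rng(1)
n, p = 500, 2000
X = rng.standard_normal((n, p))
X = (X - X.mean(axis=0)) / X.std(axis=0)     # standardize columns
beta = np.zeros(p)
beta[:5] = 1.0
y = X @ beta + rng.standard_normal(n)
y = y - y.mean()

# At lam_max the lasso solution is all zeros, so the residual is y itself
lam_max = np.max(np.abs(X.T @ y)) / n
keep = strong_rule_keep(X, y, lam_max, 0.9 * lam_max)
print(f"{keep.size} of {p} predictors survive screening: {keep}")
```

The strong rule is a heuristic and can occasionally discard a variable that belongs in the model, which is why it is paired with a KKT check.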
7. Batch screening iterative Lasso (BASIL)
BASIL (= BAtch Screening Iterative Lasso) in the R snpnet package
3 steps per iteration
1. Screening
2. Lasso Fit (glmnet)
3. KKT Check
Qian, Tanigawa, et al. PLOS Gen. (2020).
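The screen, fit, and KKT-check cycle can be illustrated with a toy, single-λ version. This sketch assumes standardized columns, uses a plain coordinate-descent solver in place of glmnet, and handles one λ at a time instead of BASIL's batches along the λ path; `lasso_cd` and `screen_fit_check` are hypothetical names.

```python
import numpy as np

def soft_threshold(z, t):
    return np.sign(z) * np.maximum(np.abs(z) - t, 0.0)

def lasso_cd(X, y, lam, n_sweeps=200):
    """Cyclic coordinate descent for (1/2n)||y - X b||^2 + lam * ||b||_1.
    Assumes standardized columns (x_j^T x_j / n == 1)."""
    n, p = X.shape
    b = np.zeros(p)
    r = y.copy()
    for _ in range(n_sweeps):
        for j in range(p):
            r += X[:, j] * b[j]                        # drop j from the fit
            b[j] = soft_threshold(X[:, j] @ r / n, lam)
            r -= X[:, j] * b[j]                        # put the updated j back
    return b

def screen_fit_check(X, y, b_prev, lam_prev, lam_new, tol=1e-6):
    """One lambda step of a screen -> fit -> KKT-check loop (toy, no batching)."""
    n, p = X.shape
    # 1. Screening: sequential strong rule, plus previously active variables
    score = np.abs(X.T @ (y - X @ b_prev)) / n
    keep = set(np.flatnonzero(score >= 2 * lam_new - lam_prev).tolist())
    keep |= set(np.flatnonzero(b_prev).tolist())
    while True:
        idx = np.array(sorted(keep), dtype=int)
        # 2. Lasso fit on the screened subset only
        b = np.zeros(p)
        b[idx] = lasso_cd(X[:, idx], y, lam_new)
        # 3. KKT check on all variables: an excluded j needs |x_j^T r| / n <= lam
        score = np.abs(X.T @ (y - X @ b)) / n
        violators = set(np.flatnonzero(score > lam_new * (1 + tol)).tolist()) - keep
        if not violators:
            return b
        keep |= violators                              # add violators and refit

rng = np.random.default_rng(2)
n, p = 300, 200
X = rng.standard_normal((n, p))
X = (X - X.mean(axis=0)) / X.std(axis=0)
y = X[:, :5] @ np.ones(5) + rng.standard_normal(n)
y = y - y.mean()

lam_max = np.max(np.abs(X.T @ y)) / n
lam = 0.8 * lam_max
b = screen_fit_check(X, y, np.zeros(p), lam_max, lam)
print("non-zero coefficients:", np.flatnonzero(b))
```

Because the KKT check runs over every variable, anything wrongly dropped by screening is added back and refit, so the returned coefficients solve the full lasso problem at that λ (up to solver tolerance) while the solver only ever sees the screened columns.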
8. BASIL/snpnet models are sparse, yet have
comparable predictive performance
- The snpnet PRS models (Lasso & Elastic-Net) have predictive performance
comparable to SBayesR
- Standing height is one of the most polygenic traits.
- The height PRS model has 47k variants with non-zero BETAs (~5% of the candidate variants)
Qian, Tanigawa, et al. PLOS Gen. (2020); Tanigawa, Qian, et al. medRxiv (2021)
[Figure panels: hold-out test set R² and hold-out test set AUC]
9. Genetics of 35 biomarkers study in UK Biobank
349 rare (MAF < 1%) non-synonymous variant associations
1,381 (1,134 novel) associations on non-synonymous variants
Biomarker categories: Cardiovascular, Bone and Joint, Diabetes, Liver, Hormone, Renal
Sinnott-Armstrong*, Tanigawa*, et al. Nat Gen. 2021
10. Genetics of 35 biomarkers study in UK Biobank
Biomarker categories: Cardiovascular, Bone and Joint, Diabetes, Liver, Hormone, Renal
Polygenic risk scores (PRSs) for 35 biomarkers
Sinnott-Armstrong*, Tanigawa*, et al. Nat Gen. 2021
11. Polygenic risk scores (PRSs) for 35 biomarkers
• Created a 70% training / 10% validation / 20% test split for white British individuals
• Evaluated the PRSs in 4 additional UKB sub-populations of different ancestries
• PRSs showed limited trans-ethnic predictive performance
Sinnott-Armstrong*, Tanigawa*, et al. Nat Gen. 2021
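A 70/10/20 split could be implemented as below. This is a sketch that assumes individuals are assigned completely at random; presumably the validation set is used to choose λ and the test set is held out for the final evaluation. The helper `split_70_10_20` is hypothetical.

```python
import numpy as np

def split_70_10_20(ids, seed=0):
    """Randomly assign individuals to 70% training / 10% validation / 20% test."""
    rng = np.random.default_rng(seed)
    shuffled = rng.permutation(ids)
    n = len(shuffled)
    n_train = (7 * n) // 10          # integer arithmetic avoids float rounding
    n_val = n // 10
    return (shuffled[:n_train],
            shuffled[n_train:n_train + n_val],
            shuffled[n_train + n_val:])

train, val, test = split_70_10_20(np.arange(1_000))
print(len(train), len(val), len(test))  # 700 100 200
```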
12. Disease cases are enriched in PRS tails
Take individuals at the extremes of biomarker PRSs
Compare the odds ratio for each disease outcome relative to the 40th-60th percentile bin
Applied a PheWAS across ~160 diseases
Lewis, C. M. & Vassos, E. Genome Medicine (2020).
Sinnott-Armstrong*, Tanigawa*, et al. Nat Gen. 2021
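The tail comparison can be sketched as an odds ratio between two PRS percentile bins. In this toy version the top and bottom 1% bins are an arbitrary choice for illustration, disease status comes from a simulated liability-threshold model, and no covariates are adjusted for (a real PheWAS would typically fit a regression with covariates). `bin_odds_ratio` is a hypothetical helper.

```python
import numpy as np

def bin_odds_ratio(prs, is_case, bin_q, ref_q=(0.4, 0.6)):
    """Odds ratio of disease in a PRS percentile bin relative to a reference bin.

    bin_q, ref_q: (lower, upper) quantiles, e.g. (0.99, 1.0) for the top 1%.
    """
    def odds(lo, hi):
        lo_v, hi_v = np.quantile(prs, [lo, hi])
        mask = (prs >= lo_v) & (prs <= hi_v)
        n_case = np.sum(is_case[mask])
        return n_case / (mask.sum() - n_case)
    return odds(*bin_q) / odds(*ref_q)

# Simulated cohort: PRS explains part of a liability; cases exceed a threshold
rng = np.random.default_rng(3)
n = 200_000
prs = rng.standard_normal(n)
liability = 0.5 * prs + np.sqrt(0.75) * rng.standard_normal(n)
is_case = liability > 1.645                 # ~5% prevalence in this simulation

or_top = bin_odds_ratio(prs, is_case, (0.99, 1.0))
or_bottom = bin_odds_ratio(prs, is_case, (0.0, 0.01))
print(f"top 1%: OR = {or_top:.1f}; bottom 1%: OR = {or_bottom:.2f}")
```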
13. Disease cases are enriched in PRS tails
Take individuals at the extremes of biomarker PRSs
Identify diseases associated with biomarker PRSs
Compare the odds ratio for each disease outcome relative to the 40th-60th percentile bin
Applied a PheWAS across ~160 diseases
Sinnott-Armstrong*, Tanigawa*, et al. Nat Gen. 2021
14. Multi-PRS - a linear combination of a disease
PRS and biomarker PRSs
- Multiple observations suggest “biomarkers → disease” links
- PRS-PheWAS analysis
- Biomarkers are more heritable than diseases
- Mendelian Randomization
- Multi-PRS is a weighted sum of PRSs, i.e. $w_1 \cdot \mathrm{PRS}_1 + w_2 \cdot \mathrm{PRS}_2 + w_3 \cdot \mathrm{PRS}_3 + \cdots$
Sinnott-Armstrong*, Tanigawa*, et al. Nat Gen. 2021
15. Multi-PRS weights come from Lasso
Multi-PRS: $w_1 \cdot \mathrm{PRS}_1 + w_2 \cdot \mathrm{PRS}_2 + w_3 \cdot \mathrm{PRS}_3 + \cdots$
Sinnott-Armstrong*, Tanigawa*, et al. Nat Gen. 2021
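As a sketch of how Lasso can set the multi-PRS weights, the code below regresses an outcome on a matrix whose columns are a disease PRS and several biomarker PRSs, with an L1 penalty solved by proximal gradient descent (ISTA). The squared-error loss, the simulated weights, and λ = 0.05 are assumptions for illustration; for a binary disease a penalized logistic model would be the more natural choice.

```python
import numpy as np

def soft_threshold(z, t):
    """Elementwise soft-thresholding, the proximal operator of t * |.|."""
    return np.sign(z) * np.maximum(np.abs(z) - t, 0.0)

def lasso_ista(P, y, lam, n_iter=2000):
    """Proximal gradient (ISTA) for (1/2n) * ||y - P w||^2 + lam * ||w||_1."""
    n = P.shape[0]
    step = n / np.linalg.norm(P, 2) ** 2        # 1 / Lipschitz constant of the gradient
    w = np.zeros(P.shape[1])
    for _ in range(n_iter):
        grad = P.T @ (P @ w - y) / n
        w = soft_threshold(w - step * grad, step * lam)
    return w

# Toy data: column 0 = disease PRS, columns 1-3 = biomarker PRSs (simulated)
rng = np.random.default_rng(6)
n = 20_000
P = rng.standard_normal((n, 4))
true_w = np.array([0.5, 0.3, 0.2, 0.0])        # the last biomarker PRS is irrelevant
y = P @ true_w + rng.standard_normal(n)

w = lasso_ista(P, y, lam=0.05)
multi_prs = P @ w                                # w1*PRS1 + w2*PRS2 + w3*PRS3 + ...
print("weights:", np.round(w, 3))
```

The irrelevant PRS receives a weight of exactly zero, which is what makes the L1 penalty convenient for deciding which biomarker PRSs enter the multi-PRS.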
16. multi-PRS improves disease prevalence prediction
[Figure panels: chronic kidney disease (CKD); other diseases in UK Biobank]
Sinnott-Armstrong*, Tanigawa*, et al. Nat Gen. 2021
17. multi-PRS models improve incident disease
prediction in FinnGen
The multi-PRS model was replicated in a Finnish cohort (FinnGen)
Nina Mars
Sinnott-Armstrong*, Tanigawa*, et al. Nat Gen. 2021
18. What did we learn from multi-PRS?
- Two complementary approaches to improve predictive performance
  - 1) Sample size → increase in power
  - 2) Multi-trait analysis
- Why does multi-PRS work?
  - Quantitative traits have more power (J. Yang et al. 2010)
  - Genetic correlation between biomarkers and disease
  - Phenotyping challenges for some diseases
- When does multi-PRS work best?
  - The exact conditions are not fully clear (yet)
- The multi-phenotype model
  - multi-PRS: Genetics → Biomarkers (molecular traits) → Disease
  - Alternatives (other models):
    - “Genetic component”-based model
19. Extreme polygenicity & pleiotropy in
the genetics of common complex traits
[Figure: genetic variants and complex traits]
- Polygenicity: many variants - one trait
- Pleiotropy: one variant - many traits
- Large number of associations in
population-based cohorts
- Can we group them together for enhanced interpretation?
21. Low-rank representation of association summary
statistics provides latent components
1. Genome- & phenome-wide association summary statistic matrix
2. Truncated singular value decomposition (TSVD)
3. Quantify the variant & trait loadings on each component
“paint” the disease genetics with components!
[Figure label: summary statistics from association analysis (beta or log odds ratio)]
Tanigawa*, Li*, et al. Nat Comm (2019).
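The TSVD step and the loading summaries can be sketched with NumPy. The score definitions below (squared singular-vector entries as per-component contributions, and squared cosines as each trait's share across the retained components) are simplified analogues assumed for illustration; the exact DeGAs definitions, and its preprocessing of the summary-statistic matrix, may differ. `tsvd_loadings` is a hypothetical name.

```python
import numpy as np

def tsvd_loadings(Z, k):
    """Rank-k truncated SVD of a variants x traits summary-statistic matrix Z,
    plus simple loading summaries (assumed simplifications of DeGAs scores)."""
    U, s, Vt = np.linalg.svd(Z, full_matrices=False)
    U, s, V = U[:, :k], s[:k], Vt[:k].T
    variant_contrib = U ** 2          # each column sums to 1 over variants
    trait_contrib = V ** 2            # each column sums to 1 over traits
    trait_proj = V * s                # trait coordinates on the k components (Z^T U)
    # each row sums to 1: trait's share of signal per retained component
    trait_cos2 = trait_proj ** 2 / np.sum(trait_proj ** 2, axis=1, keepdims=True)
    return U, s, V, variant_contrib, trait_contrib, trait_cos2

# Toy matrix: 500 variants x 20 traits driven by 2 latent components plus noise
rng = np.random.default_rng(4)
Z = rng.standard_normal((500, 2)) @ rng.standard_normal((2, 20))
Z += 0.1 * rng.standard_normal((500, 20))

U, s, V, variant_contrib, trait_contrib, trait_cos2 = tsvd_loadings(Z, k=2)
top_traits = np.argsort(trait_contrib[:, 0])[::-1][:3]
top_variants = np.argsort(variant_contrib[:, 0])[::-1][:5]
print("component 1 top traits:", top_traits, "top variants:", top_variants)
```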
22. Biplot annotation helps interpretation of
DeGAs latent components
Tanigawa*, Li*, et al. Nat Comm (2019).
23. DeGAs was subsequently extended to PRS models
- DeGAs-PRS (dPRS)
- Derive “component” scores
- Disease PRS as a sum of component scores
- It offers better interpretation
Aguirre, Tanigawa, et al. Eur J Hum Gen (2021).
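Writing a disease PRS as a sum of component scores can be checked numerically. This sketch assumes that a component score is the genotype matrix projected onto the variant loadings (G @ U) and that the disease PRS uses the rank-k TSVD approximation of the effect-size matrix; under those assumptions the two ways of computing the PRS agree exactly. All sizes and data are toy values.

```python
import numpy as np

rng = np.random.default_rng(5)
n, m, t, k = 200, 300, 10, 3       # individuals, variants, traits, components (toy sizes)
G = rng.binomial(2, 0.3, size=(n, m)).astype(float)   # toy genotype dosages (0/1/2)
Z = rng.standard_normal((m, t))                        # toy variant x trait effect sizes

U, s, Vt = np.linalg.svd(Z, full_matrices=False)
U, s, V = U[:, :k], s[:k], Vt[:k].T                    # rank-k TSVD

component_scores = G @ U           # n x k: one score per individual and component
d = 0                              # index of the disease of interest
weights = s * V[d]                 # loading of disease d on each component
per_component = component_scores * weights   # each component's share of the PRS
prs_from_components = per_component.sum(axis=1)

# The same PRS computed directly from the rank-k approximated effect sizes
beta_rank_k = U @ (s * V[d])
prs_direct = G @ beta_rank_k
print(np.allclose(prs_from_components, prs_direct))   # True
```

The `per_component` matrix is what makes the decomposition interpretable: for each individual it shows how much of the disease PRS is attributable to each component.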
24. Sparse reduced-rank regression (SRRR) in the
multiSnpnet package bridges them all
1. BASIL/snpnet (Lasso) – sparse PRS models
2. multi-PRS – linear combination of snpnet PRSs: $w_1 \cdot \mathrm{PRS}_1 + w_2 \cdot \mathrm{PRS}_2 + w_3 \cdot \mathrm{PRS}_3 + \cdots$
3. DeGAs-PRS – genetic component-based PRSs: $w_1 \cdot \mathrm{cPRS}_1 + w_2 \cdot \mathrm{cPRS}_2 + w_3 \cdot \mathrm{cPRS}_3 + \cdots$
cPRS comes from tSVD of GWAS associations
SRRR/multiSnpnet fits a penalized multivariate (multi-response) model
25. Sparse reduced-rank regression (SRRR) in the
multiSnpnet package bridges the two approaches
- One can show that (1) and (2) are equivalent. Note: the problem is NOT convex
- Group lasso penalty
- We select features that influence multiple responses (traits)
- The DeGAs (tSVD)-based approach offers interpretation
Qian, Tanigawa, et al. Ann Appl Stat (in press).
[Slide equations (1) and (2): two equivalent SRRR formulations]
Junyang Qian
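For reference, here is one standard way to write sparse reduced-rank regression. We assume, but cannot confirm from the slide text, that the slide's equations (1) and (2) are of this kind. Y is the n × q response matrix (traits), X the n × p genotype matrix, r the rank (assumed r ≤ min(p, q)), and C_{j·}, U_{j·} denote j-th rows. The last line shows why the two forms match: with C = UVᵀ and orthonormal V, the group-lasso row norms coincide.

```latex
% (A) rank-constrained form with a row-wise group lasso on C
\min_{C \in \mathbb{R}^{p \times q}} \; \frac{1}{2n}\lVert Y - XC \rVert_F^2
  + \lambda \sum_{j=1}^{p} \lVert C_{j\cdot} \rVert_2
  \quad \text{s.t.} \quad \operatorname{rank}(C) \le r

% (B) factored form with orthonormal V
\min_{U \in \mathbb{R}^{p \times r},\; V \in \mathbb{R}^{q \times r}} \;
  \frac{1}{2n}\lVert Y - XUV^\top \rVert_F^2
  + \lambda \sum_{j=1}^{p} \lVert U_{j\cdot} \rVert_2
  \quad \text{s.t.} \quad V^\top V = I_r

% Link: with C = U V^\top and V^\top V = I_r, each row norm is preserved
\lVert C_{j\cdot} \rVert_2^2 = U_{j\cdot} V^\top V \, U_{j\cdot}^\top = \lVert U_{j\cdot} \rVert_2^2
```

Any C of rank at most r can be factored as UVᵀ with orthonormal V (for example, from its SVD), so (A) and (B) reach the same optimum. The rank and orthogonality constraints define non-convex sets, which is the source of the non-convexity.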
26. multiSnpnet/SRRR applied to UK Biobank
- Asthma & clinically related traits
- Predictive performance improvements
for asthma & basophil count
- SVD of the coefficients offers interpretation
Qian, Tanigawa, et al. Ann Appl Stat (in press).
27. Summary & future directions
Summary
- Polygenic risk score models (PRSs) compute the genetic liability of
diseases by aggregating effects across multiple genetic variants
- Sparse snpnet PRS models have competitive performance
- Multi-trait-aware PRSs can improve predictive power
Future direction & discussion
- Integrate fine-mapping, conservation, and variant/gene annotations?
- Incorporate (cell-type-specific) biological knowledge as a prior
- It may help improve the predictive performance / transferability?
- Machine-learning-based PRS models
- Non-linear combination of multiple traits
- Incorporate biological priors
28. Acknowledgements
Dept. Biomedical Data Science
- Matthew Aguirre
- Manuel A. Rivas
- the Rivas lab
Dept. Statistics
- Junyang Qian
- Trevor Hastie
- Rob Tibshirani
Dept. Genetics, Stanford
- Nasa Sinnott-Armstrong
- Jonathan Pritchard
University of Helsinki
- Nina Mars
- Samuli Ripatti
Funding support:
Nasa Sinnott-Armstrong
Junyang Qian
29. References
- Sinnott-Armstrong*, Tanigawa*, et al. Nat Gen. (2021). (PMID: 33462484)
- Genetics of 35 biomarkers, multi-PRS
- Qian, Tanigawa, et al. PLOS Gen. (2020). (PMID: 33095761)
- Batch screening iterative Lasso (BASIL) & R snpnet package
- Qian, Tanigawa, et al. Ann Appl Stat. (in press). (doi: 10.1101/2020.05.30.125252)
- Sparse reduced rank regression (SRRR) & R multiSnpnet package
- Tanigawa, Li, et al. Nat Comm (2019). (PMID: 31492854)
- DeGAs - decomposition of genetic associations
- Aguirre, Tanigawa, et al. Eur J Hum Genet. (2021). (PMID: 33558700)
- DeGAs-PRS (dPRS)
- Tanigawa, Qian, et al. medRxiv (2021). (doi: 10.1101/2021.09.02.21262942)
- Phenome-wide application of BASIL/snpnet
31. multiSnpnet efficiently solves SRRR
BASIL-like iterative procedure
3 steps per iteration
1. Screening
2. Fitting (SVD & group lasso)
3. KKT Check
Qian, Tanigawa, et al. Ann Appl Stat (in press).
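A bare-bones version of the fitting step (2. Fitting, SVD & group lasso) can be written as alternating minimization. With V fixed and orthonormal, the loss reduces to a group-lasso regression of YV on X; with U fixed, V solves an orthogonal Procrustes problem via an SVD. This sketch omits the screening and KKT steps, assumes a 1/(2n)-scaled objective, and is not the multiSnpnet implementation; all names are hypothetical.

```python
import numpy as np

def group_soft_threshold(U, t):
    """Row-wise group soft-thresholding: shrink each row's L2 norm by t (small rows -> 0)."""
    norms = np.linalg.norm(U, axis=1, keepdims=True)
    scale = np.maximum(0.0, 1.0 - t / np.maximum(norms, 1e-12))
    return U * scale

def srrr_alternating(X, Y, rank, lam, n_outer=50, n_inner=100):
    """Alternating minimization for
    (1/2n) * ||Y - X U V^T||_F^2 + lam * sum_j ||U_j||_2  s.t.  V^T V = I."""
    n, p = X.shape
    V = np.linalg.svd(Y, full_matrices=False)[2][:rank].T   # init: top right singular vectors of Y
    U = np.zeros((p, rank))
    step = n / np.linalg.norm(X, 2) ** 2
    for _ in range(n_outer):
        # U-step: group lasso of Y V on X via proximal gradient
        target = Y @ V
        for _ in range(n_inner):
            grad = X.T @ (X @ U - target) / n
            U = group_soft_threshold(U - step * grad, step * lam)
        # V-step: orthogonal Procrustes, maximize tr(V^T Y^T X U) over orthonormal V
        P, _, Qt = np.linalg.svd(Y.T @ X @ U, full_matrices=False)
        V = P @ Qt
    return U, V

# Toy data: 6 traits, 50 variants, only the first 5 variants matter, true rank 2
rng = np.random.default_rng(7)
n, p, q, r = 500, 50, 6, 2
X = rng.standard_normal((n, p))
U_true = np.zeros((p, r))
U_true[:5] = rng.standard_normal((5, r))
V_true = np.linalg.qr(rng.standard_normal((q, r)))[0]
Y = X @ U_true @ V_true.T + 0.5 * rng.standard_normal((n, q))

U, V = srrr_alternating(X, Y, rank=r, lam=0.2)
print("selected variants:", np.flatnonzero(np.linalg.norm(U, axis=1)))
```

The group penalty zeroes whole rows of U, so a variant is either dropped for all traits or kept for all of them, which matches the goal of selecting features that influence multiple responses.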
32. Variant prioritization w/ predicted consequence
does not improve predictive performance
- Prioritize variants via Lasso penalty factors
- Penalty factor = 0 → no regularization on the variable
- Protein-truncating and known pathogenic variants = 0.5
- Protein-altering and known likely-pathogenic variants = 0.75
Tanigawa, Qian, et al. medRxiv (2021)
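Penalty factors enter the Lasso by scaling λ per variable, so the coordinate update soft-thresholds variable j at λ × pf_j. The sketch below uses the factors from the slide (0, 0.5, 0.75) and assumes a factor of 1.0 for all other variants; with equal true effects, smaller factors mean less shrinkage. The function name and simulated data are hypothetical.

```python
import numpy as np

def soft_threshold(z, t):
    return np.sign(z) * np.maximum(np.abs(z) - t, 0.0)

def weighted_lasso_cd(X, y, lam, penalty_factor, n_sweeps=100):
    """Coordinate descent for (1/2n)||y - X b||^2 + lam * sum_j pf_j * |b_j|.
    Assumes standardized columns (x_j^T x_j / n == 1)."""
    n, p = X.shape
    b = np.zeros(p)
    r = y.copy()
    for _ in range(n_sweeps):
        for j in range(p):
            r += X[:, j] * b[j]
            # pf_j = 0 gives a plain least-squares update (no shrinkage)
            b[j] = soft_threshold(X[:, j] @ r / n, lam * penalty_factor[j])
            r -= X[:, j] * b[j]
    return b

# Toy data: four variants with identical true effects (simulated)
rng = np.random.default_rng(8)
n = 20_000
X = rng.standard_normal((n, 4))
X = (X - X.mean(axis=0)) / X.std(axis=0)
y = X @ np.full(4, 0.5) + rng.standard_normal(n)
y = y - y.mean()

# 0 = unpenalized, 0.5 = protein-truncating / pathogenic,
# 0.75 = protein-altering / likely pathogenic, 1.0 = all other variants (assumed default)
pf = np.array([0.0, 0.5, 0.75, 1.0])
b = weighted_lasso_cd(X, y, lam=0.4, penalty_factor=pf)
print(np.round(b, 2))
```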
34. Improved genetic prediction of testosterone
levels with sex-specific PRS models
Sex-specific polygenic risk model for testosterone outperforms polygenic
risk scores that combine males and females
Flynn, Tanigawa, et al. EJHG (2021).