Multi-trait modeling in polygenic scores
Yosuke Tanigawa
Postdoc @ Computational Biology Lab
(PI: Prof. Manolis Kellis), MIT CSAIL
2022/1/28 (Fri.) 2:30 pm (ET) @ Zoom
Debora Marks Lab Journal Club
@yk_tani
https://yosuketanigawa.com/
The main paper for journal club presentation
Sinnott-Armstrong*, Tanigawa*, et al. Nat Gen. 2021
Joint work w/ Nasa Sinnott-Armstrong
Polygenic risk scores (PRSs) combine
genetic associations across many variants
- Large-scale cohorts enabled discovery of GWAS associations
- Polygenic risk score (PRS)
(or polygenic score [PGS])
“Inference” to “Prediction”
$\mathrm{PRS}_i = \sum_j G_{ij} \beta_j$
i: individual; j: variant; G: genotype; β: effect size
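As a minimal sketch of the score these symbols define (for each individual, a sum over variants of genotype times effect size), the snippet below computes it in plain Python. The genotype dosages and effect sizes are made-up toy values, and `polygenic_score` is a hypothetical helper name, not a function from any package mentioned in the talk.

```python
from typing import List


def polygenic_score(genotypes: List[List[float]], betas: List[float]) -> List[float]:
    """Return one score per individual: PRS_i = sum_j G_ij * beta_j."""
    scores = []
    for row in genotypes:
        if len(row) != len(betas):
            raise ValueError("each genotype row needs one entry per variant")
        scores.append(sum(g * b for g, b in zip(row, betas)))
    return scores


# Toy data (assumed): 3 individuals x 4 variants, dosages coded 0/1/2.
G = [
    [0, 1, 2, 0],
    [1, 1, 0, 2],
    [2, 0, 1, 1],
]
beta = [0.10, -0.05, 0.20, 0.00]

prs = polygenic_score(G, beta)
print([round(s, 3) for s in prs])
```

A variant with an effect size of zero (the last one here) contributes nothing, which is why sparse models can drop most variants from the sum.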
PRS predictions are sometimes useful
- Differences in (overlapping) PRS distributions are sometimes useful for
stratifying a population by risk.
- A PRS can be used as an instrumental variable in causal inference
N. R. Wray et al., JAMA Psychiatry (2020); Sakaue*, Kanai*, et al., Nat Med (2020).
PRS(biomarker)
associations with lifespan
PRS models often contain many variants
- One challenge in PRS modeling is the LD structure
- Bayesian regression with GWAS summary statistics + LD reference
has been successful
- Genome-wide polygenic risk score (Khera et al) with 6M+ variants
- We don’t assume 6M causal variants for common complex traits
Khera, et al., Nat Gen (2018).
Sparse regression model with Lasso
- One alternative: regularized regression on individual-level data
- e.g. Lasso
- Challenge: dataset is large (n = 300k, p = 1M+)
- Does not fit in memory, etc.
- We developed Batch screening iterative Lasso (BASIL)
- Efficient screening based on “strong rule” (Tibshirani et al 2012)
- Solves Lasso via iterative procedure
Junyang Qian
Qian, Tanigawa, et al. PLOS Gen. (2020).
Batch screening iterative Lasso (BASIL)
BASIL (= BAtch Screening Iterative Lasso) in the R snpnet package
3 steps per iteration
1. Screening
2. Lasso Fit (glmnet)
3. KKT Check
Qian, Tanigawa, et al. PLOS Gen. (2020).
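The three steps per iteration can be sketched in plain Python as follows. This is a deliberately simplified illustration, not the snpnet implementation: it handles a single value of the penalty `lam` rather than a path of values, screens by ranking variants on their absolute inner product with the current residual, and uses a small hand-written coordinate-descent solver in place of glmnet. All function names, the batch size, and the toy data are assumptions made for the example.

```python
import random


def dot(a, b):
    return sum(x * y for x, y in zip(a, b))


def soft_threshold(z, lam):
    if z > lam:
        return z - lam
    if z < -lam:
        return z + lam
    return 0.0


def lasso_cd(cols, y, lam, active, beta, n_sweeps=1000, tol=1e-12):
    """Coordinate-descent Lasso restricted to the `active` variants.

    Minimizes (1/2n)||y - X beta||^2 + lam * ||beta||_1 with every
    coefficient outside `active` held at zero.
    """
    n = len(y)
    beta = list(beta)
    resid = [y[i] - sum(cols[j][i] * beta[j] for j in active) for i in range(n)]
    for _ in range(n_sweeps):
        biggest = 0.0
        for j in active:
            scale = dot(cols[j], cols[j]) / n
            if scale == 0.0:
                continue
            rho = dot(cols[j], resid) / n + scale * beta[j]
            new = soft_threshold(rho, lam) / scale
            step = new - beta[j]
            if step != 0.0:
                resid = [r - step * x for r, x in zip(resid, cols[j])]
                beta[j] = new
            biggest = max(biggest, abs(step))
        if biggest < tol:
            break
    return beta, resid


def basil_one_lambda(cols, y, lam, batch=3, kkt_tol=1e-9):
    """Screen -> fit -> KKT check, repeated until no left-out variant violates KKT."""
    n, p = len(y), len(cols)
    beta, resid, active = [0.0] * p, list(y), set()
    while True:
        # 1. Screening: add the `batch` left-out variants most correlated with the residual.
        score = {j: abs(dot(cols[j], resid)) / n for j in range(p) if j not in active}
        active.update(sorted(score, key=score.get, reverse=True)[:batch])
        # 2. Lasso fit on the screened variants only.
        beta, resid = lasso_cd(cols, y, lam, sorted(active), beta)
        # 3. KKT check: a left-out variant must satisfy |x_j . r| / n <= lam.
        violators = [j for j in range(p) if j not in active
                     and abs(dot(cols[j], resid)) / n > lam + kkt_tol]
        if not violators:
            return beta, sorted(active)


# Toy data (assumed): 80 individuals, 12 variants, two of which are causal.
rng = random.Random(0)
n, p = 80, 12
cols = [[rng.gauss(0, 1) for _ in range(n)] for _ in range(p)]
y = [2.0 * cols[0][i] - 1.5 * cols[3][i] + rng.gauss(0, 0.1) for i in range(n)]

beta_hat, active_set = basil_one_lambda(cols, y, lam=0.1)
print(active_set, [round(b, 2) for b in beta_hat])
```

The point of the design is that the solver only ever touches the screened columns, while the KKT check certifies that the restricted solution is also a solution of the full problem.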
BASIL/snpnet models are sparse, yet have
comparable predictive performance
- The snpnet PRS models (Lasso & Elastic-Net) have predictive
performance comparable to SBayesR
- Standing height was one of the most polygenic traits.
- The height PRS model has 47k variants (non-zero BETAs for 5% of variants)
Qian, Tanigawa, et al. PLOS Gen. (2020); Tanigawa, Qian, et al. medRxiv (2021).
[Figure: hold-out test set R² and hold-out test set AUC]
Genetics of 35 biomarkers study in UK Biobank
349 rare (MAF < 1%) non-synonymous variant associations
1,381 (1,134 novel) associations on non-synonymous variants
Biomarker categories: Cardiovascular, Bone and Joint, Diabetes, Liver, Hormone, Renal
Sinnott-Armstrong*, Tanigawa*, et al. Nat Gen. 2021
Polygenic risk scores (PRSs) for 35 biomarkers
• Created a 70% training / 10% validation / 20% test split for white British individuals
• Tested 4 additional UKB sub-populations of different ancestries
• Limited trans-ethnic predictive performance of PRSs
Sinnott-Armstrong*, Tanigawa*, et al. Nat Gen. 2021
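A 70/10/20 split of this kind can be sketched with the standard library alone. The helper name `split_ids`, the seed, and the use of integer IDs are assumptions for illustration; the slide does not say how the actual split was drawn.

```python
import random


def split_ids(ids, fractions=(0.7, 0.1, 0.2), seed=0):
    """Shuffle individual IDs and cut them into train / validation / test sets."""
    ids = list(ids)
    random.Random(seed).shuffle(ids)
    n_train = int(round(fractions[0] * len(ids)))
    n_val = int(round(fractions[1] * len(ids)))
    # Whatever remains after train and validation goes to the test set.
    return ids[:n_train], ids[n_train:n_train + n_val], ids[n_train + n_val:]


train_ids, val_ids, test_ids = split_ids(range(1000))
print(len(train_ids), len(val_ids), len(test_ids))
```

The validation set is what one would typically use to choose the penalty strength, leaving the test set untouched for the final performance estimate.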
Disease cases are enriched in PRS tails
Take individuals at the extremes of each biomarker PRS
Compare the odds ratio for the disease
outcome relative to the 40-60th percentile bin
Applied PheWAS to ~160 diseases
Lewis, C. M. & Vassos, E.
Genome Medicine (2020).
Sinnott-Armstrong*, Tanigawa*, et al. Nat Gen. 2021
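The tail-versus-middle comparison can be illustrated with a raw 2×2 odds ratio. This sketch assumes a top-10% tail (the slide does not give the cut-off), applies no covariate adjustment, and uses a hand-built toy cohort; `odds_ratio_vs_reference` is a hypothetical helper name.

```python
def odds_ratio_vs_reference(prs, is_case, tail=(0.9, 1.0), reference=(0.4, 0.6)):
    """Odds of disease in a PRS tail divided by the odds in a reference bin.

    Bins are half-open quantile intervals [lower, upper) of the PRS ranking.
    """
    n = len(prs)
    order = sorted(range(n), key=lambda i: prs[i])
    quantile = [0.0] * n
    for rank, i in enumerate(order):
        quantile[i] = rank / n

    def odds(lower, upper):
        members = [i for i in range(n) if lower <= quantile[i] < upper]
        cases = sum(is_case[i] for i in members)
        controls = len(members) - cases
        if cases == 0 or controls == 0:
            raise ValueError("bin needs at least one case and one control")
        return cases / controls

    return odds(*tail) / odds(*reference)


# Toy cohort (assumed): 100 individuals whose PRS equals their index.
prs = [float(i) for i in range(100)]
is_case = [0] * 100
for i in (40, 45, 50, 55, 90, 92, 94, 96, 98, 99):
    is_case[i] = 1

# Top 10%: 6 cases / 4 controls; 40-60% bin: 4 cases / 16 controls.
print(odds_ratio_vs_reference(prs, is_case))  # 6.0
```

In a real analysis one would more likely estimate this with a regression that adjusts for covariates such as age, sex, and ancestry components.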
Identify diseases with biomarker PRS
associations
Multi-PRS - a linear combination of a disease
PRS and biomarker PRSs
- Multiple observations suggest “biomarkers → disease” links
- PRS-PheWAS analysis
- Biomarkers are more heritable than disease
- Mendelian Randomization
- Multi-PRS is a weighted sum of PRSs,
i.e. $w_1 \mathrm{PRS}_1 + w_2 \mathrm{PRS}_2 + w_3 \mathrm{PRS}_3 + \dots$
Sinnott-Armstrong*, Tanigawa*, et al. Nat Gen. 2021
Weights of multi-PRS come from Lasso
Multi-PRS: $w_1 \mathrm{PRS}_1 + w_2 \mathrm{PRS}_2 + w_3 \mathrm{PRS}_3 + \dots$
Sinnott-Armstrong*, Tanigawa*, et al. Nat Gen. 2021
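To make the idea concrete, here is a minimal sketch that learns the weights of a weighted sum of PRSs with a Lasso fitted by coordinate descent. It is an illustration under stated assumptions, not the procedure from the paper: the outcome is continuous (for a binary disease outcome a penalized logistic model would be the more natural choice), the PRS columns and outcome are simulated and roughly centered so no intercept is fitted, and `lasso_weights` and the penalty value are hypothetical.

```python
import random


def soft_threshold(z, lam):
    if z > lam:
        return z - lam
    if z < -lam:
        return z + lam
    return 0.0


def lasso_weights(columns, y, lam, n_sweeps=200):
    """Minimize (1/2n)||y - sum_k w_k * PRS_k||^2 + lam * sum_k |w_k|.

    Assumes no column is identically zero.
    """
    n = len(y)
    w = [0.0] * len(columns)
    resid = list(y)
    for _ in range(n_sweeps):
        for k, col in enumerate(columns):
            scale = sum(c * c for c in col) / n
            rho = sum(c * r for c, r in zip(col, resid)) / n + scale * w[k]
            new_w = soft_threshold(rho, lam) / scale
            shift = new_w - w[k]
            resid = [r - shift * c for r, c in zip(resid, col)]
            w[k] = new_w
    return w


# Simulated inputs (assumed): one disease PRS and two biomarker PRSs.
rng = random.Random(1)
n = 200
prs_disease = [rng.gauss(0, 1) for _ in range(n)]
prs_biomarker_a = [rng.gauss(0, 1) for _ in range(n)]
prs_biomarker_b = [rng.gauss(0, 1) for _ in range(n)]  # unrelated to the outcome
outcome = [0.8 * d + 0.3 * a + rng.gauss(0, 0.3)
           for d, a in zip(prs_disease, prs_biomarker_a)]

columns = [prs_disease, prs_biomarker_a, prs_biomarker_b]
weights = lasso_weights(columns, outcome, lam=0.05)

# The multi-PRS is the weighted sum of the individual PRSs.
multi_prs = [sum(w * col[i] for w, col in zip(weights, columns)) for i in range(n)]
print([round(w, 3) for w in weights])
```

The L1 penalty shrinks the weight of a PRS that carries no additional signal toward zero, so uninformative biomarker PRSs tend to drop out of the sum.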
multi-PRS improves disease prevalence prediction
Chronic kidney
disease (CKD)
Other diseases in
UK Biobank
Sinnott-Armstrong*, Tanigawa*, et al. Nat Gen. 2021
Multi-PRS models improve incident disease
prediction in FinnGen
The multi-PRS model is replicated in a Finnish cohort (FinnGen)
Nina Mars
Sinnott-Armstrong*, Tanigawa*, et al. Nat Gen. 2021
What did we learn from multi-PRS?
- Two complementary approaches to improve predictive performance
- 1) Sample size → increase in power
- 2) Multi-trait analysis
- Why does multi-PRS work?
- Quantitative traits have more power (J. Yang et al 2010)
- Genetic correlation between biomarkers and disease
- Phenotyping challenges in some disease phenotypes
- When does multi-PRS work best?
- Exact conditions are not fully clear (yet)
- The multi-phenotype model
- multi-PRS:
Genetics → Biomarkers (Molecular traits) → Disease
- Alternatives (other models):
- “Genetic component”-based model
Extreme polygenicity & pleiotropy in
the genetics of common complex traits
[Figure: genetic variants and complex traits]
- Polygenicity: many variants - one trait
- Pleiotropy: one variant - many traits
- Large number of associations in
population-based cohorts
- Can we group them together for enhanced interpretation?
Decomposition of genetic associations (DeGAs)
Tanigawa*, Li*, et al. Nat Comm (2019).
Low-rank representation of association summary
statistics provides latent components
1. Genome- & phenome-wide association summary statistic matrix
(beta or log odds ratio from association analysis)
2. Truncated singular value decomposition (TSVD)
3. Quantify the variant & trait loadings on each component
“Paint” the disease genetics with components!
Tanigawa*, Li*, et al. Nat Comm (2019).
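The decomposition step can be sketched without any numerical library by extracting the leading singular triplets with power iteration and deflation. This is only an illustration of truncated SVD on a tiny made-up matrix (rows as traits, columns as variants); it is not the DeGAs code, and a real analysis would use an optimized SVD routine. The function names and the matrix are assumptions.

```python
import math


def matvec(A, x):
    return [sum(a * b for a, b in zip(row, x)) for row in A]


def rmatvec(A, y):
    """Multiply the transpose of A by y."""
    return [sum(A[i][j] * y[i] for i in range(len(A))) for j in range(len(A[0]))]


def norm(v):
    return math.sqrt(sum(x * x for x in v))


def truncated_svd(A, k, n_iter=200):
    """Return up to k triplets (singular value, trait loadings u, variant loadings v)."""
    A = [list(row) for row in A]  # work on a copy because of deflation
    n_rows, n_cols = len(A), len(A[0])
    components = []
    for c in range(k):
        v = [1.0 / (j + 1 + c) for j in range(n_cols)]  # deterministic start vector
        for _ in range(n_iter):
            v = rmatvec(A, matvec(A, v))  # one power-iteration step on A^T A
            length = norm(v)
            if length == 0.0:
                break
            v = [x / length for x in v]
        u = matvec(A, v)
        s = norm(u)
        if s == 0.0:
            break
        u = [x / s for x in u]
        components.append((s, u, v))
        # Deflate: remove the component just found before looking for the next one.
        for i in range(n_rows):
            for j in range(n_cols):
                A[i][j] -= s * u[i] * v[j]
    return components


# Toy summary-statistic matrix (assumed): 3 traits x 4 variants.
W = [
    [1.8, 2.4, 0.0, 0.0],
    [0.0, 0.0, 1.0, 0.0],
    [0.0, 0.0, 0.0, 0.0],
]

components = truncated_svd(W, k=2)
for s, u, v in components:
    print(round(s, 3), [round(x, 3) for x in u], [round(x, 3) for x in v])
```

Each triplet pairs a set of trait loadings with a set of variant loadings, which is what allows a latent component to be annotated from both sides.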
Biplot annotation helps interpretation of
DeGAs latent components
Tanigawa*, Li*, et al. Nat Comm (2019).
DeGAs was subsequently extended to PRS models
- DeGAs-PRS (dPRS)
- Derive “component” scores
- Disease PRS as a sum of
component scores
- It offers better interpretation
Aguirre, Tanigawa, et al. Eur J Hum Gen (2021).
Sparse reduced-rank regression (SRRR) in the
multiSnpnet package bridges them all
1. BASIL/snpnet (Lasso) – sparse PRS models
2. multi-PRS – linear combination of snpnet PRSs
$w_1 \mathrm{PRS}_1 + w_2 \mathrm{PRS}_2 + w_3 \mathrm{PRS}_3 + \dots$
3. DeGAs-PRS – genetic component-based PRSs
$w_1 \mathrm{cPRS}_1 + w_2 \mathrm{cPRS}_2 + w_3 \mathrm{cPRS}_3 + \dots$
cPRS comes from the tSVD of GWAS associations
SRRR/multiSnpnet fits a penalized multivariate (multi-response) model
Sparse reduced-rank regression (SRRR) in the
multiSnpnet package bridges the two approaches
- One can show (1) and (2) are equivalent. Note: it’s NOT convex
- Group lasso penalty
- We select features that influence multiple responses (traits)
- DeGAs (tSVD)-based approach offers interpretation
Qian, Tanigawa, et al. Ann Appl Stat (in press).
(1)
(2)
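The two formulations labelled (1) and (2) were shown as images and are not present in this transcript. For orientation only, a common way to write sparse reduced-rank regression is given below; it is an assumption that the slide used this form, and the notation ($X$ for the $n \times p$ genotype matrix, $Y$ for the $n \times q$ trait matrix, rank $r$, penalty $\lambda$) is introduced here for illustration.

```latex
% (1) rank-constrained form with a row-wise group-lasso penalty
\min_{C \in \mathbb{R}^{p \times q}} \;
  \tfrac{1}{2}\,\lVert Y - XC \rVert_F^2
  + \lambda \sum_{j=1}^{p} \lVert C_{j\cdot} \rVert_2
\quad \text{s.t.} \quad \operatorname{rank}(C) \le r

% (2) factored form, writing C = U V^\top
\min_{U \in \mathbb{R}^{p \times r},\; V \in \mathbb{R}^{q \times r}} \;
  \tfrac{1}{2}\,\lVert Y - X U V^\top \rVert_F^2
  + \lambda \sum_{j=1}^{p} \lVert U_{j\cdot} \rVert_2
\quad \text{s.t.} \quad V^\top V = I_r
```

Under this reading, the penalties match because $V$ has orthonormal columns, so $\lVert C_{j\cdot} \rVert_2 = \lVert U_{j\cdot} V^\top \rVert_2 = \lVert U_{j\cdot} \rVert_2$. The row-wise penalty removes a variant from all traits at once, and the rank constraint is what makes the problem non-convex.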
Junyang Qian
multiSnpnet/SRRR applied on UK Biobank
- Asthma & clinically related traits
- Predictive performance improvements
for asthma & basophil count
- SVD of the coefficients offers interpretation
Qian, Tanigawa, et al. Ann Appl Stat (in press).
Summary & future directions
Summary
- Polygenic risk score (PRS) models compute the genetic liability to
diseases by aggregating effects across multiple genetic variants
- Sparse snpnet PRS models have competitive performance
- Multi-trait aware PRS can improve the predictive power
Future direction & discussion
- Integrate with fine-mapping, conservation, variant and gene annotations?
- Incorporate (cell-type-specific) biological knowledge as prior
- It may help improve the predictive performance / transferability?
- Machine-learning-based PRS models
- Non-linear combination of multiple traits
- Incorporate biological priors
Acknowledgements
Dept. Biomedical Data Science
- Matthew Aguirre
- Manuel A. Rivas
- the Rivas lab
Dept. Statistics
- Junyang Qian
- Trevor Hastie
- Rob Tibshirani
Dept. Genetics, Stanford
- Nasa Sinnott-Armstrong
- Jonathan Pritchard
University of Helsinki
- Nina Mars
- Samuli Ripatti
References
- Sinnott-Armstrong*, Tanigawa*, et al. Nat Gen. (2021). (PMID: 33462484)
- Genetics of 35 biomarkers, multi-PRS
- Qian, Tanigawa, et al. PLoS Gen. (2020). (PMID: 33095761)
- Batch screening iterative Lasso (BASIL) & R snpnet package
- Qian, Tanigawa, et al. Ann Appl Stat. (in press). (doi: 10.1101/2020.05.30.125252)
- Sparse reduced rank regression (SRRR) & R multiSnpnet package
- Tanigawa, Li, et al. Nat Comm (2019). (PMID: 31492854)
- DeGAs - decomposition of genetic associations
- Aguirre, Tanigawa, et al. Eur J Hum Genet. (2021). (PMID: 33558700)
- DeGAs-PRS (dPRS)
- Tanigawa, Qian, et al. medRxiv (2021) (doi: 10.1101/2021.09.02.21262942)
- Phenome-wide application of BASIL/snpnet
multiSnpnet efficiently solves SRRR
BASIL-like iterative procedure
3 steps per iteration
1. Screening
2. Fitting (SVD & group lasso)
3. KKT Check
Qian, Tanigawa, et al. Ann Appl Stat (in press).
Variant prioritization w/ predicted consequence
does not help improve predictive performance
- Lasso penalty factor.
- Penalty factor = 0 → no regularization on the variable
- Protein-truncating and known pathogenic variants = 0.5
- Protein-altering and known likely-pathogenic variants = 0.75
Tanigawa, Qian, et al. medRxiv (2021)
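The effect of a per-variant penalty factor can be shown with a single soft-thresholding update, where the threshold for each variant is the global penalty multiplied by its factor. The 0.5 and 0.75 values come from the slide; the default of 1.0 for all other variants, the annotation labels, and the helper names are assumptions for this sketch.

```python
# Penalty factors from the slide, keyed by hypothetical annotation labels.
PENALTY_FACTOR = {
    "protein_truncating_or_pathogenic": 0.5,
    "protein_altering_or_likely_pathogenic": 0.75,
}
DEFAULT_FACTOR = 1.0  # assumption: value for all other variants is not stated on the slide


def penalty_factors(annotations):
    """Map each variant's annotation to its Lasso penalty factor."""
    return [PENALTY_FACTOR.get(a, DEFAULT_FACTOR) for a in annotations]


def soft_threshold(z, threshold):
    if z > threshold:
        return z - threshold
    if z < -threshold:
        return z + threshold
    return 0.0


def penalized_update(z, lam, factor):
    """One coordinate update for a standardized variant with its own penalty factor."""
    return soft_threshold(z, lam * factor)


annotations = [
    "protein_truncating_or_pathogenic",
    "protein_altering_or_likely_pathogenic",
    "intergenic",
]
factors = penalty_factors(annotations)

# Same raw signal (0.08) and global penalty (0.1) for all three variants.
updates = [penalized_update(0.08, 0.1, f) for f in factors]
print(factors, [round(u, 3) for u in updates])
```

With identical evidence, the variant with the smallest factor keeps the largest coefficient, while the unprioritized variant is set to zero; a factor of 0 would leave the coefficient unshrunk.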
Sex-specific genetic effects for testosterone
Emily Flynn
Flynn, Tanigawa, et al. EJHG (2021).
Improved genetic prediction of testosterone
levels with sex-specific PRS models
Sex-specific polygenic risk model for testosterone outperforms polygenic
risk scores that combine males and females
Flynn, Tanigawa, et al. EJHG (2021).

 
Transaction Management in Database Management System
Transaction Management in Database Management SystemTransaction Management in Database Management System
Transaction Management in Database Management System
 
MULTIDISCIPLINRY NATURE OF THE ENVIRONMENTAL STUDIES.pptx
MULTIDISCIPLINRY NATURE OF THE ENVIRONMENTAL STUDIES.pptxMULTIDISCIPLINRY NATURE OF THE ENVIRONMENTAL STUDIES.pptx
MULTIDISCIPLINRY NATURE OF THE ENVIRONMENTAL STUDIES.pptx
 
YOUVE GOT EMAIL_FINALS_EL_DORADO_2024.pptx
YOUVE GOT EMAIL_FINALS_EL_DORADO_2024.pptxYOUVE GOT EMAIL_FINALS_EL_DORADO_2024.pptx
YOUVE GOT EMAIL_FINALS_EL_DORADO_2024.pptx
 
Activity 2-unit 2-update 2024. English translation
Activity 2-unit 2-update 2024. English translationActivity 2-unit 2-update 2024. English translation
Activity 2-unit 2-update 2024. English translation
 
ECONOMIC CONTEXT - PAPER 1 Q3: NEWSPAPERS.pptx
ECONOMIC CONTEXT - PAPER 1 Q3: NEWSPAPERS.pptxECONOMIC CONTEXT - PAPER 1 Q3: NEWSPAPERS.pptx
ECONOMIC CONTEXT - PAPER 1 Q3: NEWSPAPERS.pptx
 
AUDIENCE THEORY -CULTIVATION THEORY - GERBNER.pptx
AUDIENCE THEORY -CULTIVATION THEORY -  GERBNER.pptxAUDIENCE THEORY -CULTIVATION THEORY -  GERBNER.pptx
AUDIENCE THEORY -CULTIVATION THEORY - GERBNER.pptx
 
Difference Between Search & Browse Methods in Odoo 17
Difference Between Search & Browse Methods in Odoo 17Difference Between Search & Browse Methods in Odoo 17
Difference Between Search & Browse Methods in Odoo 17
 

Multi-trait modeling in polygenic scores, journal club talk at Debora Marks lab

  • 1. Multi-trait modeling in polygenic scores. Yosuke Tanigawa, Postdoc @ Computational Biology Lab (PI: Prof. Manolis Kellis), MIT CSAIL. 2022/1/28 (Fri.) 2:30 pm (ET) @ Zoom, Debora Marks Lab Journal Club. @yk_tani https://yosuketanigawa.com/
  • 2. The main paper for journal club presentation: Sinnott-Armstrong*, Tanigawa*, et al. Nat Gen. (2021). Joint work w/ Nasa Sinnott-Armstrong
  • 3. Polygenic risk scores (PRSs) combine genetic associations across many variants - Large-scale cohorts enabled discovery of GWAS associations - Polygenic risk score (PRS) (or polygenic score [PGS]) - From “inference” to “prediction” - Notation: i-th individual, j-th variant, G: genotype, β: effect size
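The formula on this slide did not survive extraction. Given the notation the slide lists (individual i, variant j, genotype G, effect size β), the score is presumably the standard additive form below; treat it as a reconstruction under that assumption rather than the slide's exact equation.

```latex
\mathrm{PRS}_i = \sum_{j} G_{ij}\,\beta_j
```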
  • 4. PRS predictions are sometimes useful - Differences in (overlapping) PRS distributions are sometimes useful for population stratification. - PRS can be used as an instrumental variable in causal inference - Figure: PRS(biomarker) associations with lifespan. N. R. Wray et al., JAMA Psychiatry (2020); Sakaue*, Kanai*, et al., Nat Med (2020).
  • 5. PRS models often contain many variants - One challenge in PRS modeling is the LD structure - Bayesian regression with GWAS summary statistics + LD reference has been successful - Genome-wide polygenic risk score (Khera et al.) with 6M+ variants - We don’t assume 6M causal variants for common complex traits. Khera, et al., Nat Gen (2018).
  • 6. Sparse regression model with Lasso - One alternative: regularized regression on individual-level data - e.g. Lasso - Challenge: dataset is large (n = 300k, p = 1M+) - Does not fit in memory, etc. - We developed Batch screening iterative Lasso (BASIL) - Efficient screening based on “strong rule” (Tibshirani et al. 2012) - Solves Lasso via iterative procedure. Junyang Qian. Qian, Tanigawa, et al. PLOS Gen. (2020).
  • 7. Batch screening iterative Lasso (BASIL) - BASIL (= BAtch Screening Iterative Lasso) in R snpnet package - 3 steps per iteration: 1. Screening 2. Lasso Fit (glmnet) 3. KKT Check. Qian, Tanigawa, et al. PLOS Gen. (2020).
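The three steps per iteration named on slide 7 can be sketched in plain Python. This is a simplified illustration for a single penalty value `lam`, not the snpnet implementation: the real method works along a path of penalties, uses the strong rule for screening, and calls glmnet for the fit. The function names (`basil_sketch`, `lasso_fit`, `gradient`) and the `batch` parameter are hypothetical.

```python
def soft_threshold(z, t):
    """Shrink z toward zero by t (the Lasso proximal step)."""
    if z > t:
        return z - t
    if z < -t:
        return z + t
    return 0.0


def gradient(X, resid, j):
    """Absolute correlation |x_j . resid| / n between feature j and the residual."""
    n = len(resid)
    return abs(sum(X[i][j] * resid[i] for i in range(n))) / n


def lasso_fit(X, y, cols, lam, beta, sweeps=200):
    """Coordinate descent for (1/2n)||y - Xb||^2 + lam*||b||_1 on `cols` only.

    Updates `beta` in place and returns the residual vector.
    """
    n = len(y)
    resid = [y[i] - sum(X[i][j] * beta[j] for j in cols) for i in range(n)]
    for _ in range(sweeps):
        for j in cols:
            norm = sum(X[i][j] ** 2 for i in range(n)) / n
            if norm == 0.0:
                continue
            # Correlation of feature j with the residual that excludes feature j.
            rho = sum(X[i][j] * resid[i] for i in range(n)) / n + norm * beta[j]
            new = soft_threshold(rho, lam) / norm
            delta = new - beta[j]
            if delta != 0.0:
                for i in range(n):
                    resid[i] -= X[i][j] * delta
                beta[j] = new
    return resid


def basil_sketch(X, y, lam, batch=2, max_iter=100, tol=1e-8):
    """Screen, fit, and KKT-check until no left-out feature violates the KKT condition."""
    p = len(X[0])
    beta = [0.0] * p
    screened = []
    resid = list(y)
    for _ in range(max_iter):
        # 1. Screening: add the `batch` unscreened features with the largest gradient.
        rest = sorted((j for j in range(p) if j not in screened),
                      key=lambda j: -gradient(X, resid, j))
        screened.extend(rest[:batch])
        # 2. Lasso fit restricted to the screened features.
        resid = lasso_fit(X, y, screened, lam, beta)
        # 3. KKT check: every left-out feature must satisfy |gradient| <= lam.
        if all(gradient(X, resid, j) <= lam + tol
               for j in range(p) if j not in screened):
            return beta
    return beta
```

The point of the design is that the expensive fit only ever sees the screened columns, while the KKT check over the left-out columns guarantees the restricted solution is also a solution of the full problem.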
  • 8. BASIL/snpnet models are sparse, yet have comparable predictive performance - The snpnet PRS models (Lasso & Elastic-Net) have predictive performance comparable to SBayesR - Standing height was one of the most polygenic traits. - The height PRS model has 47k variants (5% of non-zero BETAs) - Figure axes: hold-out test set R², hold-out test set AUC. Qian, Tanigawa, et al. PLOS Gen. (2020); Tanigawa, Qian, et al. medRxiv (2021).
  • 9. Genetics of 35 biomarkers study in UK Biobank - 349 rare (MAF < 1%) non-synonymous variant associations - 1,381 (1,134 novel) associations on non-synonymous variants - Biomarker categories: Cardiovascular, Bone and Joint, Diabetes, Liver, Hormone, Renal. Sinnott-Armstrong*, Tanigawa*, et al. Nat Gen. (2021).
  • 10. Genetics of 35 biomarkers study in UK Biobank - Biomarker categories: Cardiovascular, Bone and Joint, Diabetes, Liver, Hormone, Renal - Polygenic risk scores (PRSs) for 35 biomarkers. Sinnott-Armstrong*, Tanigawa*, et al. Nat Gen. (2021).
  • 11. Polygenic risk scores (PRSs) for 35 biomarkers - Created 70% training / 10% validation / 20% test split for white British - Tested 4 additional UKB sub-populations of different ancestries - Limited trans-ethnic predictive performance of PRSs. Sinnott-Armstrong*, Tanigawa*, et al. Nat Gen. (2021).
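A 70% / 10% / 20% split like the one on slide 11 can be sketched as follows. The function name, the seed, and the use of a simple random shuffle are assumptions for illustration; the slide does not say how individuals were assigned.

```python
import random


def split_individuals(ids, percents=(70, 10, 20), seed=0):
    """Shuffle individual IDs and cut them into training/validation/test sets."""
    ids = list(ids)
    random.Random(seed).shuffle(ids)
    n_train = len(ids) * percents[0] // 100
    n_val = len(ids) * percents[1] // 100
    return {
        "train": ids[:n_train],
        "val": ids[n_train:n_train + n_val],
        "test": ids[n_train + n_val:],  # whatever remains, roughly 20%
    }
```

Model fitting would use `train`, the penalty would be chosen on `val`, and performance would be reported on `test` only.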
  • 12. Disease cases are enriched in PRS tails - Take the extremes of the PRS for biomarkers - Compare odds ratio for disease outcome relative to 40-60%ile bin - Applied PheWAS for ~160 diseases. Lewis, C. M. & Vassos, E. Genome Medicine (2020). Sinnott-Armstrong*, Tanigawa*, et al. Nat Gen. (2021).
  • 13. Disease cases are enriched in PRS tails - Take the extremes of the PRS for biomarkers - Identify diseases with biomarker PRS associations - Compare odds ratio for disease outcome relative to 40-60%ile bin - Applied PheWAS for ~160 diseases. Sinnott-Armstrong*, Tanigawa*, et al. Nat Gen. (2021).
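The tail comparison on slides 12 and 13 (odds of disease in a PRS extreme relative to the 40-60th percentile bin) can be sketched like this. The function name and the `top_pct` parameter are hypothetical, and the sketch ignores covariate adjustment, which a real analysis would likely include.

```python
def tail_odds_ratio(scores, is_case, top_pct=1):
    """Odds ratio of disease in the top `top_pct` percent of the PRS vs the 40-60% bin."""
    n = len(scores)
    order = sorted(range(n), key=lambda i: scores[i])  # indices from lowest to highest PRS

    def odds(indices):
        cases = sum(1 for i in indices if is_case[i])
        controls = len(indices) - cases
        return cases / controls  # raises ZeroDivisionError if a bin has no controls

    top = order[n * (100 - top_pct) // 100:]
    middle = order[n * 40 // 100: n * 60 // 100]
    return odds(top) / odds(middle)
```

Using the middle of the distribution as the reference, rather than everyone outside the tail, keeps the comparison group free of individuals from the opposite extreme.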
  • 14. Multi-PRS: a linear combination of a disease PRS and biomarker PRSs - Multiple observations suggest “biomarkers → disease” links - PRS-PheWAS analysis - Biomarkers are more heritable than disease - Mendelian Randomization - Multi-PRS is a weighted sum of PRSs, i.e. w1 (PRS1) + w2 (PRS2) + w3 (PRS3) + …. Sinnott-Armstrong*, Tanigawa*, et al. Nat Gen. (2021).
  • 15. Weights of multi-PRS come from Lasso - Multi-PRS: w1 (PRS1) + w2 (PRS2) + w3 (PRS3) + …. Sinnott-Armstrong*, Tanigawa*, et al. Nat Gen. (2021).
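The weighted sum on slides 14 and 15 is simple to write down once the weights are known. In the sketch below the weights are made-up numbers for illustration; per the slide they would come from a Lasso fit of the disease outcome on the individual PRSs. The function and trait names are hypothetical.

```python
def multi_prs(prs_by_trait, weights):
    """Per-individual weighted sum w1*PRS1 + w2*PRS2 + ... over the traits in `weights`."""
    traits = list(weights)
    n = len(prs_by_trait[traits[0]])
    return [sum(weights[t] * prs_by_trait[t][i] for t in traits) for i in range(n)]


# Illustrative values only: two individuals, one disease PRS and one biomarker PRS.
example_prs = {"disease": [1.0, 0.0], "biomarker": [0.0, 2.0]}
example_weights = {"disease": 0.5, "biomarker": 0.25}
example_scores = multi_prs(example_prs, example_weights)
```

Because Lasso can set a weight to exactly zero, a biomarker PRS that adds nothing to the disease prediction simply drops out of the sum.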
  • 16. multi-PRS improves disease prevalence prediction - Figure panels: Chronic kidney disease (CKD); Other diseases in UK Biobank. Sinnott-Armstrong*, Tanigawa*, et al. Nat Gen. (2021).
  • 17. multi-PRS models improve incident disease prediction in FinnGen - The multi-PRS model is replicated in a Finnish cohort (FinnGen). Nina Mars. Sinnott-Armstrong*, Tanigawa*, et al. Nat Gen. (2021).
  • 18. What did we learn from multi-PRS? - Two complementary approaches to improve predictive performance - 1) Sample size → increase in power - 2) Multi-trait analysis - Why does multi-PRS work? - Quantitative traits have more power (J. Yang et al. 2010) - Genetic correlation between biomarkers and disease - Phenotyping challenges in some disease phenotypes - When does multi-PRS work the best? - Exact conditions are not fully clear (yet) - The multi-phenotype model - multi-PRS: Genetics → Biomarkers (Molecular traits) → Disease - Alternatives (other models): - “Genetic component”-based model
  • 19. Extreme polygenicity & pleiotropy in the genetics of common complex traits - Figure: genetic variants linked to complex traits - Polygenicity: many variants, one trait - Pleiotropy: one variant, many traits - Large number of associations in population-based cohorts - Can we group them together for enhanced interpretation?
  • 20. Decomposition of genetic associations (DeGAs). Tanigawa*, Li*, et al. Nat Comm (2019).
  • 21. Low-rank representation of association summary statistics provides latent components - 1. Genome- & phenome-wide association summary statistic matrix (beta or log odds ratio from association analysis) - 2. Truncated singular value decomposition (TSVD) - 3. Quantify the variant & trait loadings on each component - “Paint” the disease genetics with components! Tanigawa*, Li*, et al. Nat Comm (2019).
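Step 2 on slide 21 can be written out as a rank-K approximation. The symbols below are assumptions chosen for illustration (the slide's own notation is not in the transcript): W is the summary statistic matrix with N traits in rows and M variants in columns, and K is the number of latent components kept.

```latex
W \approx U_K\, S_K\, V_K^{\top}, \qquad
W \in \mathbb{R}^{N \times M},\quad
U_K \in \mathbb{R}^{N \times K},\quad
S_K = \operatorname{diag}(s_1, \dots, s_K),\quad
V_K \in \mathbb{R}^{M \times K}
```

Under this orientation, the k-th column of U_K holds the trait loadings and the k-th column of V_K holds the variant loadings for component k, which is what step 3 quantifies.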
  • 22. Biplot annotation helps interpretation of DeGAs latent components. Tanigawa*, Li*, et al. Nat Comm (2019).
  • 23. DeGAs was subsequently extended to PRS models - DeGAs-PRS (dPRS) - Derive a “component” score - Disease PRS as a sum of component scores - It offers better interpretation. Aguirre, Tanigawa, et al. Eur J Hum Gen (2021).
  • 24. Sparse reduced-rank regression (SRRR) in multiSnpnet package bridges them all - 1. BASIL/snpnet (Lasso) – sparse PRS models - 2. multi-PRS – linear combination of snpnet PRSs: w1 (PRS1) + w2 (PRS2) + w3 (PRS3) + … - 3. DeGAs-PRS – genetic component-based PRSs: w1 (cPRS1) + w2 (cPRS2) + w3 (cPRS3) + …, where cPRS comes from tSVD of GWAS associations - SRRR/multiSnpnet fits a penalized multivariate multi-response model
  • 25. Sparse reduced-rank regression (SRRR) in multiSnpnet package bridges the two approaches - One can show that formulations (1) and (2) are equivalent (the equations were shown as images on the slide). Note: it’s NOT convex - Group lasso penalty - We select features that influence multiple responses (traits) - DeGAs (tSVD)-based approach offers interpretation. Junyang Qian. Qian, Tanigawa, et al. Ann Appl Stat (in press).
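The two formulations that slide 25 calls equivalent are not in the transcript. One common way to write sparse reduced-rank regression with a group lasso penalty is given below as an assumption; the labels (1) and (2), the symbols, and the scaling may not match the slide. Here Y is the response matrix, X the genotype matrix, C the coefficient matrix with rows C_j, and r the target rank.

```latex
\begin{aligned}
(1)\quad & \min_{C}\ \tfrac{1}{2}\,\lVert Y - XC \rVert_F^2
  + \lambda \sum_{j} \lVert C_{j\cdot} \rVert_2
  \quad \text{s.t.}\ \operatorname{rank}(C) \le r \\
(2)\quad & \min_{A,\,B}\ \tfrac{1}{2}\,\lVert Y - XAB^{\top} \rVert_F^2
  + \lambda \sum_{j} \lVert A_{j\cdot} \rVert_2
  \quad \text{s.t.}\ B^{\top}B = I_r
\end{aligned}
```

In this form the equivalence follows from setting C = AB^T: since B has orthonormal columns, each row norm of C equals the corresponding row norm of A, so the penalty is unchanged. The row-wise penalty is what selects a variant for all traits at once.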
  • 26. multiSnpnet/SRRR applied on UK Biobank - Asthma & clinically related traits - Predictive performance improvements for asthma & basophil count - SVD of the coefficients offers interpretation. Qian, Tanigawa, et al. Ann Appl Stat (in press).
  • 27. Summary & future directions - Summary: - Polygenic risk score models (PRSs) compute genetic liability of diseases by aggregating effects across multiple genetic variants - Sparse snpnet PRS models have competitive performance - Multi-trait aware PRS can improve the predictive power - Future direction & discussion: - Integrate with fine-mapping, conservation, variant, gene annotation? - Incorporate (cell-type-specific) biological knowledge as prior - It may help improve the predictive performance / transferability? - Machine-learning-based PRS models - Non-linear combination of multiple traits - Incorporate biological priors
  • 28. Acknowledgements - Dept. Biomedical Data Science: Matthew Aguirre, Manuel A. Rivas, the Rivas lab - Dept. Statistics: Junyang Qian, Trevor Hastie, Rob Tibshirani - Dept. Genetics, Stanford: Nasa Sinnott-Armstrong, Jonathan Pritchard - University of Helsinki: Nina Mars, Samuli Ripatti - Funding supports: Nasa Sinnott-Armstrong Junyang Qian
  • 29. References - Sinnott-Armstrong*, Tanigawa*, et al. Nat Gen. (2021). (PMID: 33462484): Genetics of 35 biomarkers, multi-PRS - Qian, Tanigawa, et al. PLOS Gen. (2020). (PMID: 33095761): Batch screening iterative Lasso (BASIL) & R snpnet package - Qian, Tanigawa, et al. Ann Appl Stat. (in press). (doi: 10.1101/2020.05.30.125252): Sparse reduced rank regression (SRRR) & R multiSnpnet package - Tanigawa*, Li*, et al. Nat Comm (2019). (PMID: 31492854): DeGAs, decomposition of genetic associations - Aguirre, Tanigawa, et al. Eur J Hum Genet. (2021). (PMID: 33558700): DeGAs-PRS (dPRS) - Tanigawa, Qian, et al. medRxiv (2021). (doi: 10.1101/2021.09.02.21262942): Phenome-wide application of BASIL/snpnet
  • 31. multiSnpnet efficiently solves SRRR with a BASIL-like iterative procedure - 3 steps per iteration: 1. Screening 2. Fitting (SVD & group lasso) 3. KKT Check. Qian, Tanigawa, et al. Ann Appl Stat (in press).
  • 32. Variant prioritization w/ predicted consequence does not help improve the performance - Lasso penalty factor - Penalty factor = 0 → no regularization on the variable - Protein-truncating and known pathogenic variants = 0.5 - Protein-altering and known likely-pathogenic variants = 0.75. Tanigawa, Qian, et al. medRxiv (2021).
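A per-variant penalty factor scales the Lasso penalty applied to that variant, as in the sketch below. The 0.5 and 0.75 values are from slide 32; the class names, the function names, and the factor of 1.0 for all other variants are assumptions (1.0 is the usual default, but the slide does not state it).

```python
PENALTY_FACTOR = {
    "protein_truncating_or_pathogenic": 0.5,        # from the slide
    "protein_altering_or_likely_pathogenic": 0.75,  # from the slide
    "other": 1.0,                                   # assumed default, not on the slide
}


def soft_threshold(z, t):
    """Shrink z toward zero by t (the Lasso proximal step)."""
    if z > t:
        return z - t
    if z < -t:
        return z + t
    return 0.0


def penalized_update(rho, lam, variant_class):
    """Coordinate update where the variant's penalty is lam times its penalty factor."""
    return soft_threshold(rho, lam * PENALTY_FACTOR[variant_class])
```

With the same signal `rho`, a variant with a smaller factor is shrunk less and enters the model earlier; a factor of 0 would leave it unpenalized.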
  • 33. Sex-specific genetic effects for testosterone. Emily Flynn. Flynn, Tanigawa, et al. EJHG (2021).
  • 34. Improved genetic prediction of testosterone levels with sex-specific PRS models - Sex-specific polygenic risk model for testosterone outperforms polygenic risk scores that combine males and females. Flynn, Tanigawa, et al. EJHG (2021).