Patient Similarity Networks for Precision Medicine
Thi Nguyen, Ph.D. Candidate
Graduate Biomedical Sciences | Immunology Theme
University of Alabama at Birmingham (UAB)
kimthi@uab.edu
Clinical Informatics Journal Club
October 23rd, 2018
Outline
• Current landscape in building a predictive risk model
• Patient similarity network (PSN) – emerging paradigm for clinical prediction
• Advantages of PSN
• Two PSN examples: Similarity Network Fusion and netDx
• Challenges of PSN analytics
• Vision for a PSN-based tool in the future clinic
Disease risk calculator
http://www.cvriskcalculator.com/
• ASCVD calculator = 10-year risk of heart disease/stroke
• 13 pieces of information: gender, age, blood lipid levels,
blood pressure, history.
• Result of 50 years of development and refinement
• Continues to be adjusted
Risk calculator = a set of risk factors -> calculated disease risk, to support monitoring,
diagnosis and treatment.
Fig. 1. Developing risk calculators
Ideal model:
• accurate
• generalizable
• reasonable time
• interpretable by clinicians
Methods used in clinical risk models
Genomics in clinical risk models
The rise of the genomic era
Predictive risk models – current needs
• Integrate diverse data types (genomics, metabolomics, imaging, EHR ...)
• Interpretable
• Handle sparse or missing data
• Protect patient privacy
• Scale up: keep pace with the scale and complexity of the data
Network science
• New scientific discipline with a broadly interdisciplinary approach
to studying complex systems
• Developed its formalism from graph theory and uses statistical
physics as its conceptual framework.
• Key concept: Regardless of the domain (computer, social,
biological), all networks are driven by the same fundamental
organizing principles.
• Provides a common set of mathematical tools to explore these systems.
http://networksciencebook.com/
by A.-L. Barabási.
Why network science for new predictive risk models?
• Handles heterogeneous data
• Handles missing data naturally
• Easy visualization: when data are presented as a network, groupings and decision
boundaries can be seen directly
• Intuitive: analogous to clinical diagnosis, where physicians relate a patient’s case to
previous patients they have seen (a mental database)
• A PSN does not use direct patient data -> preserves patient privacy -> easier to scale up
• Many existing network science methods allow data integration by fusing
networks.
• NetDX: uses biological pathway-based features to improve accuracy
and generalization and to increase the interpretability of genomic data.
Patient similarity networks
• Each node = an individual patient
• Each edge = pairwise similarity for a given feature
• Labelled patients can be grouped (clustering/ unsupervised classification), and
a patient with unknown status can be assigned to a group based on their
similarity to the members of that group.
• Each feature (= view) is represented as a network of pairwise patient similarities
• Views can be integrated/fused to identify subgroups or predict outcomes.
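The bullets above can be made concrete with a small sketch. It builds a single view (age) as a network of pairwise similarities and assigns an unlabelled patient to the group they are, on average, most similar to. The patient IDs, ages, group labels, and the mean-similarity rule are illustrative assumptions, not taken from the talk or from any published PSN method.

```python
from statistics import mean

def age_similarity(a, b, age_range):
    """Normalized age difference turned into a similarity in [0, 1]."""
    return 1.0 - abs(a - b) / age_range

def build_network(ages):
    """One view: {(patient_i, patient_j): similarity} for every pair."""
    ids = sorted(ages)
    age_range = max(ages.values()) - min(ages.values())
    return {
        (i, j): age_similarity(ages[i], ages[j], age_range)
        for n, i in enumerate(ids)
        for j in ids[n + 1:]
    }

def edge(network, i, j):
    """Edges are undirected, so look the pair up in either order."""
    return network[(i, j)] if (i, j) in network else network[(j, i)]

def assign_group(network, labels, patient):
    """Assign the patient to the group with the highest mean similarity."""
    scores = {}
    for group in set(labels.values()):
        members = [p for p, g in labels.items() if g == group]
        scores[group] = mean(edge(network, patient, m) for m in members)
    return max(scores, key=scores.get), scores

# Hypothetical toy data: four labelled patients and one unlabelled patient.
ages = {"p1": 34, "p2": 38, "p3": 71, "p4": 75, "new": 69}
labels = {"p1": "low_risk", "p2": "low_risk",
          "p3": "high_risk", "p4": "high_risk"}

network = build_network(ages)
predicted, scores = assign_group(network, labels, "new")
print(predicted, scores)
```

A real PSN would hold one such network per feature and combine them before classification; this sketch stops at a single view.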
Similarity Network Fusion
“Similarity network fusion for aggregating data types on a genomic scale.” Nature Methods 2014
1. Construct similarity network for each data type
2. Fuse these networks into a single network using a nonlinear combination method
• Data types: mRNA, DNA methylation and miRNA
• Singular value decomposition -> cosine similarity -> fuse networks by iterative boosting
• This method has been applied to subtype medulloblastoma and pancreatic ductal
adenocarcinoma tumors, and to identify subtypes of diabetes.
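As a rough illustration of the two steps above, the sketch below fuses two toy patient similarity matrices by simplified cross-diffusion: each view is row-normalized, then repeatedly updated using the average of the other views, and the results are averaged and symmetrized. This is a simplification under stated assumptions. The published method also uses sparse nearest-neighbour kernels, which are omitted here, and the matrices and iteration count are made up.

```python
def matmul(a, b):
    """Plain-Python matrix product for small dense matrices."""
    return [[sum(x * y for x, y in zip(row, col)) for col in zip(*b)]
            for row in a]

def transpose(m):
    return [list(col) for col in zip(*m)]

def row_normalize(m):
    """Scale each row so that it sums to 1."""
    return [[v / sum(row) for v in row] for row in m]

def mean_matrix(ms):
    """Element-wise mean of a list of equally sized matrices."""
    n = len(ms[0])
    return [[sum(m[i][j] for m in ms) / len(ms) for j in range(n)]
            for i in range(n)]

def fuse(views, n_iter=3):
    """Simplified cross-diffusion over two or more similarity matrices."""
    kernels = [row_normalize(v) for v in views]
    status = list(kernels)
    for _ in range(n_iter):
        # Every view is updated from the previous state of the other views.
        status = [
            matmul(matmul(kernels[v],
                          mean_matrix(status[:v] + status[v + 1:])),
                   transpose(kernels[v]))
            for v in range(len(views))
        ]
    fused = mean_matrix(status)
    return mean_matrix([fused, transpose(fused)])  # symmetrize

# Hypothetical views for 4 patients: patients 0-1 and 2-3 form two groups.
view_a = [[1.0, 0.9, 0.1, 0.1],
          [0.9, 1.0, 0.1, 0.1],
          [0.1, 0.1, 1.0, 0.9],
          [0.1, 0.1, 0.9, 1.0]]
view_b = [[1.0, 0.6, 0.3, 0.3],
          [0.6, 1.0, 0.3, 0.3],
          [0.3, 0.3, 1.0, 0.6],
          [0.3, 0.3, 0.6, 1.0]]

fused = fuse([view_a, view_b])
for row in fused:
    print([round(v, 3) for v in row])
```

In this toy case the group structure shared by both views survives fusion: within-group entries stay larger than between-group entries. Without the sparse local kernels, many more iterations would smooth the matrix toward uniform, which is one reason the sketch keeps `n_iter` small.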
Similarity Network Fusion
“Similarity network fusion for aggregating data types on a genomic scale.” Nature Methods 2014
node = patient
node size = survival
edge thickness = similarity
Panels: mRNA, miRNA, DNA methylation, SNF-combined
n = 215 patients with GBM
NetDX- a supervised patient classification framework
WORKFLOW
https://www.biorxiv.org/content/early/2018/05/25/084418
NetDX- a supervised patient classification framework
WORKFLOW
https://www.biorxiv.org/content/early/2018/05/25/084418
Network integration:
• Uses GeneMANIA, a network integration algorithm that reduces redundant networks and
weights networks according to their discriminatory power -> linear combination
-> composite network
Input data design:
• Any kind of data, as long as a measure of patient similarity can be defined
(Pearson correlation, cosine similarity, normalized age difference)
• To address the curse of omics data (too many features, overfitting), measurements are
grouped into biological pathways (~2000), which also increases interpretability.
Feature selection:
• cross-validation to measure sensitivity and specificity
Class prediction:
• The patient is assigned to the class in which they rank highest, i.e. the class to
which the patient is most similar.
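The integration and prediction steps above can be sketched as follows: networks are combined linearly using per-network weights, and a new patient is assigned to the class whose members they are most similar to in the composite network. The weights are set by hand here purely for illustration, whereas in netDx they come from GeneMANIA. The network names, similarity values, and the mean-similarity score are assumptions, and no netDx or GeneMANIA API is used.

```python
def combine(networks, weights):
    """Weighted linear combination of networks given as {(i, j): similarity}."""
    total = sum(weights.values())
    composite = {}
    for name, net in networks.items():
        for pair, sim in net.items():
            composite[pair] = composite.get(pair, 0.0) + weights[name] * sim / total
    return composite

def class_score(composite, patient, members):
    """Mean similarity of the patient to the members of one class.

    A missing edge counts as zero similarity.
    """
    sims = [composite.get((patient, m), composite.get((m, patient), 0.0))
            for m in members]
    return sum(sims) / len(sims)

def predict(composite, labels, patient):
    """Assign the patient to the class with the highest score."""
    scores = {
        c: class_score(composite, patient,
                       [p for p, lab in labels.items() if lab == c])
        for c in sorted(set(labels.values()))
    }
    return max(scores, key=scores.get), scores

# Hypothetical similarities between a new patient and four labelled patients.
networks = {
    "pathway_A": {("new", "a1"): 0.9, ("new", "a2"): 0.8,
                  ("new", "b1"): 0.2, ("new", "b2"): 0.1},
    "age":       {("new", "a1"): 0.4, ("new", "a2"): 0.5,
                  ("new", "b1"): 0.7, ("new", "b2"): 0.6},
}
weights = {"pathway_A": 3.0, "age": 1.0}  # hand-set, not learned
labels = {"a1": "subtype_A", "a2": "subtype_A",
          "b1": "subtype_B", "b2": "subtype_B"}

composite = combine(networks, weights)
predicted, scores = predict(composite, labels, "new")
print(predicted, scores)
```

Because the pathway network carries three times the weight of the age network, it dominates the composite, which is the effect that weighting by discriminatory power is meant to achieve.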
NetDX to predict ependymoma subtypes
• microarray data + clinical data
• Pearson correlation = similarity
• regression to correct batch effects
• Lasso regression in cross validation to
prefilter genes
Pathway-based design:
• Genes were grouped into 2118 networks, one per pathway
• Pathway information was aggregated from HumanCyc, IOB’s NetPath, Reactome, NCI
curated pathways, mSigDB and Panther.
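A minimal sketch of the pathway-based design: for each pathway, patients are compared by the Pearson correlation of their expression values over that pathway's genes, giving one similarity network per pathway. The gene names, pathway definitions, and expression values are invented for illustration. The real analysis used 2118 curated pathways and prefiltered genes.

```python
from math import sqrt

def pearson(x, y):
    """Pearson correlation of two equal-length lists (assumes nonzero variance)."""
    mx, my = sum(x) / len(x), sum(y) / len(y)
    cov = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sx = sqrt(sum((a - mx) ** 2 for a in x))
    sy = sqrt(sum((b - my) ** 2 for b in y))
    return cov / (sx * sy)

def pathway_networks(expression, pathways):
    """Build one patient similarity network per pathway.

    expression: {patient: {gene: value}}
    pathways:   {pathway_name: [gene, ...]}
    returns:    {pathway_name: {(patient_i, patient_j): correlation}}
    """
    patients = sorted(expression)
    nets = {}
    for name, genes in pathways.items():
        nets[name] = {
            (p, q): pearson([expression[p][g] for g in genes],
                            [expression[q][g] for g in genes])
            for n, p in enumerate(patients)
            for q in patients[n + 1:]
        }
    return nets

# Hypothetical expression values for two patients over six genes.
expression = {
    "p1": {"g1": 1, "g2": 2, "g3": 3, "g4": 5, "g5": 1, "g6": 3},
    "p2": {"g1": 2, "g2": 4, "g3": 6, "g4": 1, "g5": 5, "g6": 3},
}
pathways = {"pathway_X": ["g1", "g2", "g3"],
            "pathway_Y": ["g4", "g5", "g6"]}

nets = pathway_networks(expression, pathways)
print(nets)
```

The two patients look alike through pathway_X and opposite through pathway_Y, which shows why a per-pathway view can be more informative than a single correlation over all genes.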
Challenges for PSN analytics
• large data sizes (thousands of genomes)
• improve feature selection
• improve signal-to-noise ratio automatically
• characterize patient heterogeneity (disease subtypes)
• make best use of complex genomic layers (tissue-specific
variants)
• tune parameters
• build on prior knowledge and data, e.g. known gene-gene
interactions, epigenetic information.
Vision for network-based classification tool
for precision medicine
Conclusions
• Patient similarity networks are an emerging method for building predictive risk models
• Many advantages compared to other approaches: they integrate heterogeneous data types,
tolerate missing data, maintain patient privacy, and are readily interpretable.
• Since it is a new paradigm, many implementation challenges remain
• Similarity Network Fusion and NetDX are two frameworks that have implemented PSN
successfully
• Opportunities
Questions/ Thoughts/ Comments
• Can pairwise comparison capture all the complexity of gene expression in each
patient? Is it a valid question for PSN?
• To what extent should we reduce the dimensions to make sense of the data without
stripping it of its important nuances?
• Does combining the networks (fusing them) smooth out or preserve the heterogeneity
underlying the structure of each type of data?
• Does the PSN actually group patients in a way similar to how a clinician
would?
• Are there data types that cannot be integrated?
