Network-based machine learning approach
for aggregating multi-modal data
So Yeon Kim
December 12, 2019
Advisor: Professor Kyung-Ah Sohn
Ph.D. Dissertation Defense
Multi-modality everywhere
* Images from Di Minin et al. Front. Environ. Sci. (2015); Sun et al. Adv Genet. (2016)
Introduction
Integrating multi-modal data
• How to effectively integrate multi-modal data?
• We can aggregate multi-modal data to better represent the data
• Aggregate similar data into clusters
• Transform them into a high-level feature matrix
* Images from Wikipedia, http://blog.voicebase.com/big-data-aggregation
Key challenges
• Data heterogeneity
• Noisy data can be included in some views
• Inconsistent results (correlated data in one view may not stay
correlated under other views)
• Complex inter- and intra-relationships between data in multiple views
• Developing a model that is robust to noise and effectively handles
complexity and heterogeneity is the key!
Network-based integrative approach
• Complex and heterogeneous information can be utilized in a network
* Image from Twitter, OmicsNet (https://www.omicsnet.ca/)
Network clustering
• Goal: given the similarity graph built from the affinity matrix W of the data, find a
partition of the graph that corresponds to k clusters
Background
[Figure: data points → affinity matrix W → similarity graph G(V, E, W) (undirected weighted graph) → spectral clustering (graph partitioning algorithm) into clusters A and B; example networks: social-tag network, image network, tagged-image network]
$W(i, j) = \exp\left(-\frac{\rho(v_i, v_j)}{2\alpha^2}\right)$
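As a rough illustration of this slide, the sketch below builds an affinity matrix with a kernel of the form W(i, j) = exp(-ρ(v_i, v_j) / (2α²)) and splits the graph into two clusters using the sign of the Fiedler vector of the unnormalized Laplacian. The Euclidean distance for ρ, the value of α, the toy points, and the function names are assumptions made for the example. For general k, one would typically run k-means on the first k eigenvectors instead.

```python
import numpy as np

def gaussian_affinity(points, alpha):
    """W(i, j) = exp(-rho(v_i, v_j) / (2 * alpha**2)); rho is assumed to be Euclidean."""
    diff = points[:, None, :] - points[None, :, :]
    rho = np.sqrt((diff ** 2).sum(axis=-1))
    return np.exp(-rho / (2 * alpha ** 2))

def spectral_bipartition(w):
    """Split an undirected weighted graph in two using the Fiedler vector."""
    laplacian = np.diag(w.sum(axis=1)) - w      # unnormalized Laplacian L = D - W
    _, vecs = np.linalg.eigh(laplacian)         # eigenvalues come back in ascending order
    fiedler = vecs[:, 1]                        # eigenvector of the second-smallest eigenvalue
    return (fiedler > 0).astype(int)

# Toy data (hypothetical): two well-separated groups of three points each.
points = np.array([[0, 0], [0, 1], [1, 0], [10, 10], [10, 11], [11, 10]], dtype=float)
labels = spectral_bipartition(gaussian_affinity(points, alpha=1.0))
print(labels)
```

Which group receives label 0 depends on the arbitrary sign of the eigenvector, so only the grouping itself is meaningful.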
Network-based pathway activity inference
• Genomic data can be represented in a network using pathway knowledge
[Figure: a genes × samples matrix (g1, g2, g3, …, gk) and pathways P1, P2, P3, …; the genes g1, g2, g3 of pathway P1 are drawn as a network]
Network-based pathway activity inference
• Pathway activity inference
• Transforms a single genomic profile into a pathway profile using an
activity scoring measure
• Early methods simply summarized the expression values of genes
within pathways
• e.g. PLAGE (Tomfohr et al. BMC Bioinformatics 2005), CORG (Lee et al. PLoS Comput Biol
2008)
• Network-based pathway activity inference
• DART estimates pathway activities based on relevance network
topology (Jiao et al. BMC Bioinformatics, 2011)
• DRW infers pathway activities using a directed random walk-based
method (Liu et al. Bioinformatics, 2013)
[Figure: a genes × samples profile (g1, g2, g3, …, gk) is converted into a pathways × samples profile (P1, P2, P3, …, Pn) through the pathway-based gene-gene graph]
Thesis goal
• Develop a general and flexible network-based integrative model to
effectively aggregate multi-modal data
• Focus on an algorithm that is robust to noise and can handle
complexity and data heterogeneity
• Facilitate an integrative analysis based on the multi-modal network
Thesis outline
• Network-based multi-modal data aggregation techniques
[Figure: multi-view network clustering over data views View 1, View 2, …, View n; multi-layered network based pathway activity inference over Layer 1, Layer 2, …, Layer n, built from data and knowledge (e.g. pathway)]
Thesis outline
1) Multi-view network clustering
▪ Experiment: Social-tagged landmark image clustering
2) Multi-layered network based pathway activity inference
▪ Experiment I: Pathway-driven integrative network analysis for a better cancer prognosis
▪ Experiment II: Robust predictive model on the integrated pathway-based gene-gene network
▪ Experiment III: Urologic cancer integrative analysis on the multi-layered gene-gene network
Multi-view network clustering
2017 IEEE International Conference on Image Processing (ICIP 2017)
[Figure: multi-view network clustering over data represented in View 1, View 2, …, View n]
Experiment: Social-tagged landmark image clustering
• Apply similarity-based multi-view network clustering to social-tagged
landmark image clustering
[Figure: tagged-image data (landmark images with tags such as "roma, italy, vatican, church, cathedral"; "italy, pisa, tower, campo, attraction"; "louvre, paris, france, palace, architecture"; "wurzburg, germany, palace, architecture") give textual features (social-tag network) and visual features (image network); multi-view network clustering combines the two networks with Similarity Network Fusion (SNF) followed by spectral clustering]
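A simplified two-view version of the fusion step can be sketched as follows. This is not the reference SNF implementation: the plain row normalization, the neighbourhood size k, the iteration count, the final symmetrization, and all function names are assumptions chosen to keep the example short. Each view's status matrix is repeatedly updated by diffusing the other view's matrix through its own k-nearest-neighbour kernel.

```python
import numpy as np

def row_normalize(m):
    """Scale every row so that it sums to 1."""
    return m / m.sum(axis=1, keepdims=True)

def knn_kernel(w, k):
    """Keep the k largest similarities per row (self included), then row-normalize."""
    s = np.zeros_like(w)
    idx = np.argsort(-w, axis=1)[:, :k]
    rows = np.arange(w.shape[0])[:, None]
    s[rows, idx] = w[rows, idx]
    return row_normalize(s)

def fuse_two_views(w1, w2, k=2, n_iter=10):
    """Cross-diffuse two similarity networks defined on the same nodes."""
    p1, p2 = row_normalize(w1), row_normalize(w2)
    s1, s2 = knn_kernel(w1, k), knn_kernel(w2, k)
    for _ in range(n_iter):
        p1_next = row_normalize(s1 @ p2 @ s1.T)   # view 1 absorbs information from view 2
        p2_next = row_normalize(s2 @ p1 @ s2.T)   # view 2 absorbs information from view 1
        p1, p2 = p1_next, p2_next
    fused = (p1 + p2) / 2
    return (fused + fused.T) / 2                  # symmetrize for downstream clustering

# Hypothetical similarity matrices for four items: view 2 has a noisy link between items 1 and 2.
w_tags = np.array([[1.0, 0.9, 0.1, 0.1], [0.9, 1.0, 0.1, 0.1],
                   [0.1, 0.1, 1.0, 0.9], [0.1, 0.1, 0.9, 1.0]])
w_image = np.array([[1.0, 0.8, 0.2, 0.1], [0.8, 1.0, 0.7, 0.1],
                    [0.2, 0.7, 1.0, 0.8], [0.1, 0.1, 0.8, 1.0]])
fused = fuse_two_views(w_tags, w_image)
print(np.round(fused, 3))
```

The fused matrix can then be passed to a spectral clustering routine in place of a single-view affinity matrix.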
Performance comparison between different algorithms
Method             αv    αw    λ     ARI      NMI
Min-disagreement   0.7   0.4   -     0.7318   0.8539
Pairwise co-reg.   0.2   0.7   0.2   0.7528   0.8666
Centroid co-reg.   0.2   1     0.1   0.7555   0.8734
Proposed method    0.3   0.2   -     0.7972   0.8968
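For readers unfamiliar with the two scores in this comparison, the sketch below computes the adjusted Rand index (ARI) and normalized mutual information (NMI) for two labelings. The slides do not say which NMI normalization was used; the arithmetic mean of the two entropies is an assumption here, and the function names and toy labels are illustrative.

```python
from collections import Counter
from math import comb, log

def adjusted_rand_index(labels_true, labels_pred):
    """ARI from the pair-counting contingency table."""
    n = len(labels_true)
    sum_ij = sum(comb(c, 2) for c in Counter(zip(labels_true, labels_pred)).values())
    sum_a = sum(comb(c, 2) for c in Counter(labels_true).values())
    sum_b = sum(comb(c, 2) for c in Counter(labels_pred).values())
    expected = sum_a * sum_b / comb(n, 2)
    max_index = (sum_a + sum_b) / 2
    if max_index == expected:          # degenerate case, e.g. a single cluster on both sides
        return 1.0
    return (sum_ij - expected) / (max_index - expected)

def entropy(labels):
    n = len(labels)
    return -sum((c / n) * log(c / n) for c in Counter(labels).values())

def normalized_mutual_info(labels_true, labels_pred):
    """NMI with arithmetic-mean normalization (an assumed convention)."""
    n = len(labels_true)
    count_a, count_b = Counter(labels_true), Counter(labels_pred)
    mi = 0.0
    for (a, b), c in Counter(zip(labels_true, labels_pred)).items():
        mi += (c / n) * log((c * n) / (count_a[a] * count_b[b]))
    denom = (entropy(labels_true) + entropy(labels_pred)) / 2
    return mi / denom if denom > 0 else 1.0

truth = [0, 0, 1, 1]
print(adjusted_rand_index(truth, [1, 1, 0, 0]), normalized_mutual_info(truth, [1, 1, 0, 0]))
print(adjusted_rand_index(truth, [0, 1, 0, 1]), normalized_mutual_info(truth, [0, 1, 0, 1]))
```

Both scores ignore how the clusters are named, so a relabeled but otherwise identical partition scores 1.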
Multi-view network analysis
• Robust to noise when combining different types of networks
[Figure: images tagged “Wurzburg Residence Germany” under social-tag network clustering, image network clustering, and multi-view network-based tagged-image clustering (social-tag network, image network, tagged-image network)]
Discussion
• Effective at capturing the structure of multiple types of
networks for a challenging clustering problem in social media data
• Does not require knowing the exact network structure of each view
• Robust to noise when combining different types of data
• SNF assumes that the networks of all views share the same nodes and
differ only in representation
• Not scalable to large networks
Multi-layered network based pathway activity inference
BMC Medical Genomics (2018); Biology Direct (2019); Bioinformatics (in preparation)
Pathway activity inference on multi-layered network
[Figure: genes × samples profiles are mapped onto the pathway-based multi-layered gene-gene graph with layers G1, G2, G3, a directed unweighted graph G(V, E, X), and converted into a pathways × samples profile (P1, P2, P3, …, Pn)]
Integrative directed random walk-based pathway activity
inference (iDRW) on multi-layered network
Random Walk with Restart (RWR) on directed graph
$W_0 = -\log(p_v + \epsilon)$
$W_{t+1} = (1 - r) M^T W_t + r W_0$
For the $n_i$ significant genes $v_k$ within pathway $i$ ($p_v < 0.05$):
$P_i = \frac{\sum_{k=1}^{n_i} W(v_k) \times \mathrm{sgn}(z_k) \times x_k}{\sqrt{\sum_{k=1}^{n_i} W(v_k)^2}}$
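The two steps on this slide, the restart iteration W_{t+1} = (1 - r) M^T W_t + r W_0 and the weighted pathway score P_i, can be sketched with NumPy as below. Several details are assumptions rather than the author's settings: M is taken to be the row-normalized adjacency matrix, W_0 is scaled to unit sum, r = 0.7 and ε = 1e-10 are placeholder values, the denominator uses the square root of the summed squared weights, and the toy graph, p-values, z statistics, and profile values are invented for illustration.

```python
import numpy as np

def transition_matrix(adj):
    """Row-normalize a directed adjacency matrix; nodes without out-edges keep a zero row."""
    out_deg = adj.sum(axis=1, keepdims=True)
    return adj / np.where(out_deg > 0, out_deg, 1.0)

def random_walk_with_restart(adj, p_values, r=0.7, eps=1e-10, tol=1e-10, max_iter=1000):
    """Iterate W_{t+1} = (1 - r) M^T W_t + r W_0 until the L1 change drops below tol."""
    w0 = -np.log(p_values + eps)
    w0 = w0 / w0.sum()                      # assumption: W_0 scaled to unit sum
    m = transition_matrix(adj)
    w = w0.copy()
    for _ in range(max_iter):
        w_next = (1 - r) * (m.T @ w) + r * w0
        if np.abs(w_next - w).sum() < tol:
            return w_next
        w = w_next
    return w

def pathway_activity(w, z, x, members):
    """Weighted, sign-adjusted sum of profile values over a pathway's significant genes."""
    wk = w[members]
    return (wk * np.sign(z[members]) * x[members]).sum() / np.sqrt((wk ** 2).sum())

# Hypothetical 3-gene graph with edges 0->1, 0->2, 1->2.
adj = np.array([[0, 1, 1], [0, 0, 1], [0, 0, 0]], dtype=float)
p_values = np.array([0.01, 0.03, 0.20])
z = np.array([2.5, -1.8, 0.4])              # per-gene test statistics (toy values)
x = np.array([1.2, 0.7, -0.3])              # one sample's profile values (toy values)

weights = random_walk_with_restart(adj, p_values)
significant = np.where(p_values < 0.05)[0]  # genes with p_v < 0.05
print(pathway_activity(weights, z, x, significant))
```

Repeating the last call for every sample and every pathway would fill a pathways × samples matrix.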
Experiment I
• Investigate the causal relationships between gene expression and
DNA methylation on the pathway-based gene-gene graph in breast
cancer data
[Figure: mean AUC and mean accuracy (%) by method]
The iDRW-based method outperformed the baseline methods
iDRW identifies breast cancer-related pathways and genes
Pathway name                                            Freq   Total  EXP  Meth
Dorso-ventral axis formation                            10/50  27     4    0
Pancreatic secretion                                    8/50   65     26   3
Neurotrophin signaling pathway                          7/50   90     47   3
Prion diseases                                          7/50   30     12   0
One carbon pool by folate                               5/50   33     6    1
alpha-Linolenic acid metabolism                         5/50   23     8    1
Pyruvate metabolism                                     5/50   96     7    1
PPAR signaling pathway                                  5/50   61     13   1
T cell receptor signaling pathway                       5/50   85     52   8
Focal adhesion                                          5/50   148    83   11
Ribosome                                                5/50   143    1    0
Glioma                                                  5/50   52     27   0
Circadian rhythm – fly                                  5/50   8      4    1
Tropane, piperidine and pyridine alkaloid biosynthesis  5/50   26     1    0
Experiment II
• Investigate the effectiveness of iDRW considering the interactions
between gene expression and copy number variation in breast cancer
and neuroblastoma datasets
[Figure: survival group classification performance; panels: Breast cancer, Neuroblastoma]
iDRW outperformed the benchmark methods in survival group
classification on both cancer datasets
iDRW's predictive power was robust to the number of
pathway features (k) and samples (n)
iDRW identifies cancer-associated pathways and genes
Pathway name                                          Total  EXP  CNA
Breast cancer (k = 25)
Olfactory transduction                                419    54   268
Ras signaling pathway                                 232    68   164
Rap1 signaling pathway                                206    64   142
Melanogenesis                                         101    37   73
Neurotrophin signaling pathway                        119    38   84
Pathways in cancer                                    526    166  359
AGE-RAGE signaling pathway in diabetic complications  99     37   67
Tight junction                                        170    53   107
Focal adhesion                                        199    76   125
Neuroactive ligand-receptor interaction               278    64   193
Hepatocellular carcinoma                              168    56   112
Calcium signaling pathway                             182    59   136
cAMP signaling pathway                                198    58   139
Neuroblastoma (k = 5)
Bile secretion                                        71     13   5
Alcoholism                                            180    22   7
Metabolic pathways                                    1273   43   93
Neuroactive ligand-receptor interaction               278    21   24
PI3K-Akt signaling pathway                            352    19   31
Experiment III
• TCGA urologic cancer integrative analysis
[Figure: iDRW workflow. Gene expression, CNV and methylation profiles (genes × samples) form a multi-layered gene-gene graph with within-layer and between-layer edges; directed random walks with restart on the multi-layered graph drive pathway activity inference for pathways with significant genes (p < 0.05), turning the genomic profile into a pathway profile (pathways × samples, P1, P2, P3, …, Pn); the top-k pathway features are used for survival prediction (survival time) and metastasis prediction (TNM staging: primary tumor T, regional metastasis N, distant metastasis M; M0 vs. M+, N0 samples)]
iDRW contributes to better cancer survival prediction in
both cancer datasets
Median C-index
Model        BLCA     KIRC
CORG(E)      0.6473   0.7841
PLAGE(E)     0.6651   0.7930
DART(E)      0.7012   0.8048
DRW(E)       0.7400   0.8199
DRW(C)       0.7213   0.7918
DRW(M)       0.7208   0.8141
iDRW(CM)     0.7430   0.8253
iDRW(EC)     0.7414   0.8254
iDRW(EM)     0.7488   0.8251
iDRW(ECM)    0.7518   0.8321
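The survival comparison is scored with the C-index. As a reminder of what that number measures, here is a minimal sketch of Harrell's concordance index for right-censored data. How the slides obtain the median (for example, over repeated cross-validation runs) is not stated, and the function name and toy data are illustrative assumptions.

```python
def concordance_index(times, events, risks):
    """Fraction of comparable pairs in which the shorter survivor has the higher risk score.

    A pair (i, j) is comparable when subject i had the event and times[i] < times[j].
    Tied risk scores count as half-concordant.
    """
    concordant = 0.0
    comparable = 0
    for i in range(len(times)):
        if not events[i]:
            continue                      # censored subjects cannot anchor a pair
        for j in range(len(times)):
            if times[i] < times[j]:
                comparable += 1
                if risks[i] > risks[j]:
                    concordant += 1.0
                elif risks[i] == risks[j]:
                    concordant += 0.5
    return concordant / comparable

times = [2, 4, 6, 8]                      # toy survival times
events = [1, 1, 0, 1]                     # 1 = event observed, 0 = censored
print(concordance_index(times, events, [0.9, 0.7, 0.4, 0.1]))
print(concordance_index(times, events, [0.7, 0.9, 0.4, 0.1]))
```

A value of 0.5 corresponds to random ranking and 1.0 to a perfect ordering of risk against survival time.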
iDRW contributes to better cancer metastasis prediction in
both cancer datasets
Area under the precision-recall curve
Regional metastasis: any T / N+ / M0; any metastasis: any T / N+ / M1; distant metastasis: any T / N0 / M1
Model        BLCA regional   BLCA any        KIRC any        KIRC distant
CORG(E)      0.8531          0.8553          0.9445          0.8911
PLAGE(E)     0.8555          0.8903          0.9509          0.8684
DART(E)      0.8993          0.9122          0.9515          0.8802
DRW(E)       0.9291          0.9315          0.9615          0.9372
DRW(C)       0.8887          0.9315          0.9292          0.9180
DRW(M)       0.9007          0.9167          0.9707          0.9610
iDRW(CM)     0.9207          0.9401          0.9650          0.9658
iDRW(EC)     0.9278          0.9309          0.9703          0.9442
iDRW(EM)     0.9140          0.9344          0.9689          0.9531
iDRW(ECM)    0.9279          0.9472          0.9686          0.9621
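The metastasis comparison is scored by the area under the precision-recall curve. One common way to estimate that area is average precision, sketched below. Whether the slides used this estimator or an interpolated curve is not stated, so the choice is an assumption, as are the function name and toy data.

```python
def average_precision(y_true, scores):
    """Step-wise area under the precision-recall curve (average precision).

    Samples are ranked by decreasing score; precision is accumulated at every
    rank where a positive sample is retrieved, then averaged over the positives.
    """
    order = sorted(range(len(scores)), key=lambda i: -scores[i])
    n_pos = sum(y_true)
    hits = 0
    total = 0.0
    for rank, i in enumerate(order, start=1):
        if y_true[i]:
            hits += 1
            total += hits / rank          # precision at this recall step
    return total / n_pos

y_true = [1, 0, 1, 0]                     # toy labels: 1 = metastasis, 0 = no metastasis
print(average_precision(y_true, [0.9, 0.8, 0.7, 0.1]))
```

Unlike ROC AUC, this score depends on the class balance, which makes it informative when positive cases are rare.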
iDRW prioritizes risk pathways and genes associated with
cancer survival or metastasis
Pathway name                                                             Freq.  Total  Exp.  CNV  Meth.
BLCA, Survival
Phototransduction                                                        88     28     4     1    0
Necroptosis                                                              73     164    20    19   15
Intestinal immune network for IgA production                             66     49     5     2    1
Neuroactive ligand-receptor interaction                                  55     278    33    15   29
BLCA, Regional metastasis (any T / N+ / M0)
Olfactory transduction                                                   78     419    25    39   7
Purine metabolism                                                        67     175    15    13   16
Inflammatory bowel disease (IBD)                                         65     65     4     5    8
Leishmaniasis                                                            64     74     6     6    9
Apoptosis                                                                63     138    17    10   17
MicroRNAs in cancer                                                      61     299    30    25   12
Complement and coagulation cascades                                      60     79     20    5    3
Amyotrophic lateral sclerosis (ALS)                                      60     51     4     3    3
Cholesterol metabolism                                                   57     50     4     3    5
Bile secretion                                                           56     71     8     11   6
Glycosaminoglycan degradation                                            53     19     4     2    2
PI3K-Akt signaling pathway                                               53     352    64    33   26
Endocytosis                                                              52     244    39    20   34
Serotonergic synapse                                                     51     113    10    12   6
Hepatitis C                                                              51     131    15    5    13
Pantothenate and CoA biosynthesis                                        50     19     2     5    4
Cytokine-cytokine receptor interaction                                   50     270    37    25   11
Toxoplasmosis                                                            50     113    13    9    14
BLCA, Any metastasis (any T / N+ / M1)
Olfactory transduction                                                   59     419    37    48   14
Fat digestion and absorption                                             53     41     3     0    6
KIRC, Distant metastasis (any T / N0 / M1)
Fatty acid biosynthesis                                                  80     13     0     3    3
Biotin metabolism                                                        69     3      0     0    1
Caffeine metabolism                                                      60     5      0     0    1
Glycosaminoglycan biosynthesis - chondroitin sulfate / dermatan sulfate  57     20     0     3    0
Phenylalanine, tyrosine and tryptophan biosynthesis                      56     5      0     0    1
iDRW facilitates the integrative gene-gene network analysis
Discussion
• Multi-layered gene-gene network based pathway activity inference
transforms multiple genomic profiles into a single pathway profile
• Showed the effectiveness of iDRW in various experimental settings
for several types of cancer data
• Contributes to improved outcome prediction performance
• Jointly identifies cancer-associated pathways and genes
• Facilitates integrative network analysis on the multi-omics network
Conclusion
▪ Network-based integrative approaches for aggregating multi-modal data
• Multi-view network clustering
• Multi-layered network-based pathway activity inference
▪ Thesis contributions
• Effectively aggregate heterogeneous information by utilizing the network-based
interactions between different modalities of data
• Facilitate integrated network analysis, as they represent multi-modal data on the
integrated network
• Generally applicable to any number and type of data in various domains
▪ Future directions
• The approaches can be applied to network-represented multi-modal data
in other domains
• A hybrid approach that aggregates multi-modal data into clusters and
generates a new input matrix that uses the clusters as features
• A multi-modal data network that scales to larger networks and considers
different types of modalities
Publications
• “Multi-view network-based social-tagged landmark image clustering”
So Yeon Kim and Kyung-Ah Sohn. In Proceedings of ICIP 2017
• “Integrative Pathway based Survival Prediction utilizing Interaction between Gene Expression and
DNA Methylation in Breast Cancer” So Yeon Kim, Tae Rim Kim, Hyun-Hwan Jeong, Kyung-Ah Sohn.
BMC Medical Genomics 2018 (presented at TBC 2017)
• “Robust Pathway-based Multi-Omics Data Integration using Directed Random Walk for Survival
Prediction in Multiple Cancer Studies” So Yeon Kim, Hyun-Hwan Jeong, Jaesik Kim, Jeong-Hyeon
Moon, Kyung-Ah Sohn. Biology Direct 2019 (presented at CAMDA 2018 – ISMB COSI track)
• “iDRW: integrative directed random walks on multi-layered gene-gene graph to infer pathway
activity for outcome prediction in urologic cancer” So Yeon Kim, Eun Kyung Choe, Manu
Shivakumar, Dokyoon Kim, Kyung-Ah Sohn. Bioinformatics (in preparation) (presented at MABC
2019 / ASHG 2019)
Thank you!
Multi-view network analysis
[Figure: tagged-image network with landmarks Abbey of Saint Gall; Louvre pyramid, Paris; Canadian National Vimy Memorial, France; George Washington Birthplace, Virginia; Hosios Loukas Monastery, Greece; Casas Grandes Chihuahua, Mexico]
Incorporating pathway information improved survival group
classification performance
[Figure: mean AUC and mean accuracy (%)]
iDRW shows distinctive pathway activity patterns across cancers
