Eccb

Inferring Cancer Subnetwork Markers
using Density-Constrained Biclustering

Presenters: Phuong Dao (1), Alexander Schonhuth (2)

(1) School of Computing Science, Simon Fraser University
(2) Algorithmic Computational Biology Group, CWI, Netherlands


Guideline

Introduction
Personalized Medicine
Biomarker Discovery

Methods
Motivations
Our approach

Experimental Results
Data
Classifier Performance
Markers


Introduction

• Exact determination of disease status based on
patient genetics/genomics
• Goal: Specific, individual choice of treatment
• Necessary: Reliable disease markers


Biomarker Discovery

• Single gene markers: Each gene is ranked according to
its ability to distinguish samples of different classes
• Multigenic markers: Each subset S of genes is ranked
according to the aggregate ability of all genes in S to
distinguish samples of different classes
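
The two ranking schemes above can be sketched as follows. This is a minimal illustration, assuming Welch's t-statistic as the measure of discriminative ability and the per-sample mean as the aggregation over a gene set; the slides do not prescribe either choice, and the function names and toy expression values are hypothetical.

```python
from math import sqrt
from statistics import mean, variance


def t_statistic(case_values, control_values):
    """Welch's t-statistic between case and control expression values."""
    standard_error = sqrt(variance(case_values) / len(case_values)
                          + variance(control_values) / len(control_values))
    return (mean(case_values) - mean(control_values)) / standard_error


def score_profile(values, labels):
    """Absolute t-statistic of one expression profile (label 1 = case, 0 = control)."""
    cases = [v for v, y in zip(values, labels) if y == 1]
    controls = [v for v, y in zip(values, labels) if y == 0]
    return abs(t_statistic(cases, controls))


def rank_single_genes(expression, labels):
    """Single gene markers: sort genes by decreasing score."""
    return sorted(expression,
                  key=lambda gene: score_profile(expression[gene], labels),
                  reverse=True)


def score_gene_set(genes, expression, labels):
    """Multigenic marker: score the per-sample mean profile of a gene set."""
    aggregate = [mean(column) for column in zip(*(expression[g] for g in genes))]
    return score_profile(aggregate, labels)


# Hypothetical toy data: three controls followed by three cases.
labels = [0, 0, 0, 1, 1, 1]
expression = {
    "Gene 1": [0.1, 0.2, 0.0, 1.9, 2.1, 2.0],
    "Gene 2": [1.0, 0.9, 1.1, 1.0, 1.1, 0.9],
    "Gene 3": [0.5, 0.4, 0.6, 1.2, 1.0, 1.4],
}
ranking = rank_single_genes(expression, labels)
set_score = score_gene_set(["Gene 1", "Gene 3"], expression, labels)
```

Any other class-separation score (for example mutual information) could replace the t-statistic without changing the structure of the sketch.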


Single Gene Markers

[Figure: expression heatmaps over the samples Control 1 to 3 and Case 1 to 3. Left panel: Genes 1 to 6 in their original order. Right panel: the same genes regrouped under the labels "Differentially Expressed" and "Non-Differentially Expressed".]


Multigenic Markers
Subnetwork Markers

[Chuang et al., Mol. Syst. Biol. (2007)]:
• Predicting progression of breast
cancer
• Subnetwork markers are
connected subnetworks whose
aggregate expression profiles
correlate most strongly with the
labels of the samples
• Greedy heuristics for searching
for optimal subnetwork markers


Multigenic Markers
Subnetwork Markers

[Chowdhury et al., PSB 2010]:
• Predicting colon cancer subtypes
• Each marker is a small connected subnetwork N such that genes
in N cover all disease samples (gene g covers sample s if g is
differentially expressed in s)
• Greedy heuristics for searching for the smallest subnetwork
markers
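
The covering criterion in the bullets above can be illustrated with a short check. This is only a sketch of the stated definition, not the NetCover implementation: it leaves out the connectivity requirement and the greedy search, and the function name and toy data are hypothetical.

```python
def covers_all_samples(subnetwork, diff_expressed, disease_samples):
    """True if every disease sample is covered by at least one gene of the subnetwork.

    diff_expressed maps each gene to the set of samples in which it is
    differentially expressed (gene g covers sample s if s is in that set).
    """
    covered = set()
    for gene in subnetwork:
        covered |= diff_expressed.get(gene, set())
    return disease_samples <= covered


# Hypothetical toy data.
diff_expressed = {"g1": {"s1", "s2"}, "g2": {"s3"}, "g3": {"s2"}}
disease_samples = {"s1", "s2", "s3"}

full_cover = covers_all_samples({"g1", "g2"}, diff_expressed, disease_samples)
partial_cover = covers_all_samples({"g1", "g3"}, diff_expressed, disease_samples)
```

Here the pair g1, g2 covers all three samples, while g1, g3 misses s3.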


Motivations
Heterogeneity of Cancer Genomes

• Cancer genomes evolve
(many cells in one
patient have different
genomes)
• No two cancer cells of
two different patients
are the same

[Hampton et al., Genome Research (2009)]


Motivations
Proximity of Disease Related Genes in PPI Network

[Goh et al., PNAS (2007)]:
• The protein products of genes related to the same disease tend to
interact with one another
• Genes related to a disease have coherent functions with respect to the
Gene Ontology hierarchy


Our Approach

Each of our subnetwork markers:
• includes genes that interact with one another more
than expected (densely connected subnetworks)
• contains genes that are differentially expressed in a
fraction of all the samples from cancer tissues
(partially differential expression)


Methods


Densely Connected Subnetworks
Properties

Let $G = (V, E)$ be a network with edge weights $w_e$, $e \in E$.
• The density $\theta(G)$ of $G$ is

$$\theta(G) := \frac{\sum_{e \in E} w_e}{\binom{|V|}{2}}$$

where $\binom{|V|}{2}$ is the number of possible edges in $G$.
• $G$ is called $\alpha$-dense if $\theta(G) \ge \alpha$.
• An $\alpha$-dense, connected network $G$ is called $\alpha$-densely
connected.
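
The definitions above translate directly into code. The following is a minimal sketch, assuming edges are stored as a dictionary from unordered gene pairs to weights; the function names and the example weights are hypothetical and not taken from the slides.

```python
from itertools import combinations


def density(nodes, weights):
    """Sum of edge weights inside `nodes` divided by the number of possible edges.

    Requires at least two nodes; missing edges count as weight 0.
    """
    pairs = list(combinations(sorted(nodes), 2))
    return sum(weights.get(frozenset(pair), 0.0) for pair in pairs) / len(pairs)


def is_connected(nodes, weights):
    """Depth-first search restricted to `nodes`, using only edges present in `weights`."""
    nodes = set(nodes)
    start = next(iter(nodes))
    seen, stack = {start}, [start]
    while stack:
        current = stack.pop()
        for other in nodes - seen:
            if frozenset((current, other)) in weights:
                seen.add(other)
                stack.append(other)
    return seen == nodes


def is_alpha_densely_connected(nodes, weights, alpha):
    """Check both conditions of the definition: connected and alpha-dense."""
    return is_connected(nodes, weights) and density(nodes, weights) >= alpha


# Hypothetical weighted network: a path G1 - G2 - G3 - G4.
weights = {
    frozenset(("G1", "G2")): 0.9,
    frozenset(("G2", "G3")): 0.8,
    frozenset(("G3", "G4")): 0.3,
}
small = is_alpha_densely_connected({"G1", "G2", "G3"}, weights, alpha=0.5)
large = is_alpha_densely_connected({"G1", "G2", "G3", "G4"}, weights, alpha=0.5)
```

The three-gene set has density 1.7 / 3 and passes the threshold 0.5, while the four-gene set has density 2.0 / 6 and fails it even though it is connected.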


Partially Differential Expression

[Figure: confidence-weighted interaction network over the genes G1 to G7 with two highlighted subnetworks, each shown next to its binary differential-expression matrix over the samples S1 to S3.]

Subnetwork 1:
      S1  S2  S3
G1     1   1   0
G2     1   1   1
G3     1   1   0
G4     1   1   1

Subnetwork 2:
      S1  S2  S3
G4     1   1   1
G5     0   1   1
G6     0   1   1
G7     0   1   1

Compute all densely connected subnetworks whose genes are differentially
expressed in a subset of patients of size at least k (here: k = 2).
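
The expression constraint can be checked as sketched below. The binary values are the ones readable from the slide's two matrices (columns taken in the order S1, S2, S3); the function names are hypothetical, and the density and connectivity constraints are left out here.

```python
def shared_samples(genes, diff_expressed):
    """Samples in which every gene of `genes` is differentially expressed."""
    return set.intersection(*(diff_expressed[gene] for gene in genes))


def is_partially_differential(genes, diff_expressed, k):
    """True if the genes are jointly differentially expressed in at least k samples."""
    return len(shared_samples(genes, diff_expressed)) >= k


# Binary matrix from the slide, stored as gene -> set of samples with value 1.
diff_expressed = {
    "G1": {"S1", "S2"},
    "G2": {"S1", "S2", "S3"},
    "G3": {"S1", "S2"},
    "G4": {"S1", "S2", "S3"},
    "G5": {"S2", "S3"},
    "G6": {"S2", "S3"},
    "G7": {"S2", "S3"},
}
first = shared_samples(["G1", "G2", "G3", "G4"], diff_expressed)
second = shared_samples(["G4", "G5", "G6", "G7"], diff_expressed)
all_genes_ok = is_partially_differential(sorted(diff_expressed), diff_expressed, k=2)
```

With k = 2, both four-gene subnetworks qualify (samples S1, S2 and S2, S3 respectively), whereas all seven genes together share only S2 and do not.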


Density Constrained Biclustering
Search Strategy

Theorem: Let α ≥ 0.5. Every α-densely connected network of size n
contains an α-densely connected subnetwork of size n − 1.

[Figure: lattice of candidate subnetworks over the nodes A, B, C, D with edge weights 0.4, 0.6, 0.9, and 0.8, built up from node pairs to the full network. Candidates are labelled "Not Dense", "Not Connected", "wDCB", or "maximal wDCB".]

Density of the full network: (0.8 + 0.9 + 0.6 + 0.4) / 6 = 0.45 (not dense)

Figure: Toy example for computation of densely connected subnetworks,
density threshold α = 0.5.
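
The theorem suggests a bottom-up search: every α-densely connected subnetwork of size n can be reached by adding one adjacent node to an α-densely connected subnetwork of size n − 1. Below is a minimal sketch of that idea for the density constraint alone (the expression constraint is omitted). The assignment of the four weights to the edges A-B, A-C, A-D, and B-D is an assumption reconstructed from the toy figure, and all function names are hypothetical.

```python
from itertools import combinations

EPS = 1e-9  # tolerance for floating-point comparisons against the threshold


def density(nodes, weights):
    """Sum of edge weights inside `nodes` divided by the number of possible edges."""
    pairs = list(combinations(sorted(nodes), 2))
    return sum(weights.get(frozenset(pair), 0.0) for pair in pairs) / len(pairs)


def neighbors(nodes, weights):
    """Nodes outside `nodes` that share an edge with some node inside it."""
    result = set()
    for edge in weights:
        if len(edge & nodes) == 1:
            result |= edge - nodes
    return result


def dense_connected_subnetworks(weights, alpha):
    """Enumerate alpha-densely connected node sets level by level (needs alpha >= 0.5)."""
    # Level 2: single edges whose weight already meets the threshold.
    level = {edge for edge, weight in weights.items() if weight >= alpha - EPS}
    found = set()
    while level:
        found |= level
        next_level = set()
        for nodes in level:
            # Adding an adjacent node keeps the set connected; only density is rechecked.
            for node in neighbors(nodes, weights):
                candidate = nodes | {node}
                if density(candidate, weights) >= alpha - EPS:
                    next_level.add(candidate)
        level = next_level
    return found


def maximal(node_sets):
    """Keep only the sets that are not strictly contained in another found set."""
    return {s for s in node_sets if not any(s < t for t in node_sets)}


# Assumed edge assignment for the toy example (not confirmed by the slide text).
weights = {
    frozenset("AB"): 0.4,
    frozenset("AC"): 0.6,
    frozenset("AD"): 0.9,
    frozenset("BD"): 0.8,
}
found = dense_connected_subnetworks(weights, alpha=0.5)
maximal_sets = maximal(found)
```

Under the assumed edge assignment, the full network has density 0.45 and is rejected, and the maximal dense connected subnetworks are {A, B, D} and {A, C, D}.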


Classifier Construction

1. Rank density constrained biclusters according to density significance
2. Keep only high-ranked subnetworks with little overlap
3. Feature space dimension = number of markers
4. SVM classification

[Figure: two subnetwork markers in the weighted network over G1 to G7; a gene expression profile is averaged per marker into an average gene expression profile.]

Gene Expression Profile:
Gene 1  1.25
Gene 2  1.5
Gene 3  1.0
Gene 4  1.25
Gene 5  0.5
Gene 6  0.0
Gene 7  0.25

Average Gene Expression Profile:
Marker 1  1.25
Marker 2  0.5
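
The averaging step that turns a gene expression profile into one feature per marker can be sketched as follows. The expression values are those on the slide; the marker memberships (Genes 1 to 4 and Genes 4 to 7) are an assumption that reproduces the slide's averages of 1.25 and 0.5, and the function name is hypothetical. The resulting vectors, one per sample, would then serve as input to the SVM.

```python
def marker_features(gene_expression, markers):
    """Average the expression of each marker's genes into one feature per marker."""
    return [sum(gene_expression[gene] for gene in genes) / len(genes)
            for genes in markers]


# Expression profile of one sample, values as shown on the slide.
sample = {
    "Gene 1": 1.25, "Gene 2": 1.5, "Gene 3": 1.0, "Gene 4": 1.25,
    "Gene 5": 0.5, "Gene 6": 0.0, "Gene 7": 0.25,
}
# Assumed marker memberships; Gene 4 belongs to both markers.
markers = [
    ["Gene 1", "Gene 2", "Gene 3", "Gene 4"],
    ["Gene 4", "Gene 5", "Gene 6", "Gene 7"],
]
features = marker_features(sample, markers)
```

The feature vector has one entry per marker, so its dimension equals the number of markers kept in step 2.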


Experimental Results


Network Data

Confidence-scored PPI network
[STRING, von Mering et al., NAR 2009]

• Edges reflect physical
protein-protein interactions
• Confidence scores reflect the
probability that the interaction is
associated with a cellular
phenomenon (and not an
experimental artifact)
• Scoring system based on KEGG
pathways

[Figure: example interaction network with edge confidence scores ranging from 0.25 to 0.95.]


Gene Expression Data

Three experiments on colon cancer

• GSE8671, 32 patients / tissue pairs
• GSE10950, 24 patients / tissue pairs
• GSE6988, 123 samples across several cancer subtypes

One experiment on breast cancer

• GSE3494, 251 patients with different mutation status (wildtype vs.
mutant)


GSE 8671 → GSE 10950

[Figure: AUC (y-axis, 0.75 to 1) against the number of subnetworks/genes used (x-axis, 0 to 50) for SGM, GMI, NETCOVER, and wDCB. Panel title: GSE8671 >> GSE10950.]


GSE 8671 → GSE 6988 - Colon Cancer

[Figure: AUC (y-axis, 0.6 to 1) against the number of subnetworks/genes used (x-axis, 0 to 50) for SGM, GMI, NETCOVER, and wDCB. Panel title: GSE8671 >> GSE6988.]


GSE 3494 - Breast Cancer


Subnetwork Marker Statistics

              8671 Subnetworks                        10950 Subnetworks
        #    ER-50  Avg AUC 6988  Avg AUC 10950   #    ER-50  Avg AUC 6988  Avg AUC 8671
GMI     806  0.38   0.86          0.95            755  0.34   0.84          0.99
NC      923  0.12   0.87          0.99            N/A  N/A    0.86          N/A
wDCB    282  0.76   0.91          1.00            216  0.74   0.91          1.00

GMI = Greedy Mutual Information (Chuang et al.)
NC = NetCover (Chowdhury et al.)
wDCB = weighted Density Constrained Biclustering
# = total number of subnetworks computed
ER-50 = enrichment rate of the top-50 markers


Top Marker 8671

• DNA replication
initiation
• DNA metabolic
process
• TP53, BRCA1: tumor
suppressor genes
• Minichromosome
maintenance (MCM)
complex
• Protein kinase CDC7
phosphorylates
MCM2


Top Marker 10950

• Nucleotide excision
• DNA clamp (PCNA)
loader activity
• Polymorphisms in
WRN ↔ colon cancer
• DNMT1: methyl
transferase, silences
cell growth repressors


Future Work

1. Compare subnetwork signatures of different cancers or subtypes of a
particular cancer
2. Extend the interaction network with, for example, ncRNA-protein interactions
3. Design novel methods that work with real-valued, continuous phenotype
variables


Thanks for the attention!
