Expression Networks for
Cancer Gene Markers
DIMITRIOS-APOSTOLOS CHALEPAKIS-NTELLIS
TECHNICAL UNIVERSITY OF CRETE
SCHOOL OF ELECTRONIC AND COMPUTER ENGINEERING
DIGITAL SIGNAL & IMAGE PROCESSING LAB
Supervisor: Prof. M.Zervakis
Assoc. Prof. A.Mania
Principal Inv. D.Kafetzopoulos
July 2015
Presentation Structure
1. Aim of Research
2. Background
3. Method
◦ Structure Learning
◦ Structure Analysis
◦ Finding Hubs
◦ Clustering
4. Results & Conclusion
1. Aim of Research
◦ Create Bayesian networks with two kinds of variables (discrete and
continuous) based on a small number of genes (part of a gene
signature of breast cancer)
◦ Find new, or confirm known, interactions related to breast cancer
◦ Study the properties of Bayesian networks and how they compare
with biological and other networks.
◦ Find significant genes, modules and pathways in these networks.
2. Background
Why Networks?
◦ Network biology is a multidisciplinary field at the intersection of
mathematics, computer science and biology.
◦ Network biology provides valuable frameworks that:
◦ help analyze high-throughput data
◦ have significantly altered our understanding of biological systems
◦ are embedded in important applications in practical medicine.
2. Background
Bayesian Networks(1)
◦ A Bayesian network specifies a joint distribution in a structured form
◦ Represent dependence/independence via a directed graph
◦ Nodes = random variables (these variables can be the expression levels of
different genes)
◦ Edges = direct dependence
◦ The graph must be acyclic (no directed cycles)
◦ Two components to a Bayesian network
◦ The graph structure
◦ The numerical probabilities (for each variable given its parents)
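The acyclicity requirement above can be checked mechanically. The following is a minimal Python sketch, assuming a hypothetical `parents` dictionary that maps every node to the list of its parents. It tests the structure with Kahn's topological-sort algorithm. Node names in the usage are placeholders, not learned gene interactions.

```python
from collections import deque


def topological_order(parents):
    """Return a topological order of the nodes, or None if there is a directed cycle.

    `parents` (hypothetical layout) maps every node to the list of its parents;
    every parent must itself be a key of the dictionary.
    """
    children = {v: [] for v in parents}
    indegree = {v: len(ps) for v, ps in parents.items()}
    for child, ps in parents.items():
        for p in ps:
            children[p].append(child)

    # Start from the nodes without parents and release children as their parents are placed.
    queue = deque(v for v, d in indegree.items() if d == 0)
    order = []
    while queue:
        v = queue.popleft()
        order.append(v)
        for c in children[v]:
            indegree[c] -= 1
            if indegree[c] == 0:
                queue.append(c)
    # Nodes that were never released lie on a directed cycle.
    return order if len(order) == len(parents) else None


def is_valid_bayesian_structure(parents):
    return topological_order(parents) is not None
```

For example, `{"A": [], "B": ["A"], "C": ["A", "B"]}` is a valid structure, while `{"A": ["B"], "B": ["A"]}` is rejected.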
2. Background
Bayesian Networks(2)
◦ The graph encodes the Markov assumption: each variable is
independent of its non-descendants, given its parents.
◦ Any joint distribution that satisfies the Markov assumption can be factorized into the product
form:
$$P(X_1, \ldots, X_n) = \prod_{i=1}^{n} P\big(X_i \mid \mathrm{Pa}^{G}(X_i)\big)$$
X_i : random variables
Pa^G(X_i) : the set of parents of X_i in graph G
◦ To fully determine a joint distribution we need to determine each of the conditional
probabilities in the product form.
◦ The functional form of the conditional distribution can be:
1. Multinomial – for discrete variables
2. Linear Gaussian – for continuous variables
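The product form and the two conditional families can be illustrated on a toy network. The following is a minimal Python sketch, assuming a hypothetical three-node network A → B, A → C with binary states and made-up multinomial tables. It evaluates the joint probability as the product of the conditional probabilities. A separate helper gives the log-density of one linear Gaussian conditional.

```python
import itertools
import math

# Hypothetical network A -> B, A -> C with binary states (illustrative numbers only).
parents = {"A": [], "B": ["A"], "C": ["A"]}
# Multinomial CPDs: cpt[node][tuple of parent values] = distribution over the node's values.
cpt = {
    "A": {(): {0: 0.7, 1: 0.3}},
    "B": {(0,): {0: 0.9, 1: 0.1}, (1,): {0: 0.2, 1: 0.8}},
    "C": {(0,): {0: 0.6, 1: 0.4}, (1,): {0: 0.5, 1: 0.5}},
}


def joint_probability(assignment):
    """P(x_1, ..., x_n) = product over i of P(x_i | parents(x_i)) for a full assignment."""
    p = 1.0
    for node, pa in parents.items():
        key = tuple(assignment[q] for q in pa)
        p *= cpt[node][key][assignment[node]]
    return p


def linear_gaussian_logpdf(x, parent_values, intercept, weights, sigma):
    """log N(x; intercept + sum_j w_j * u_j, sigma^2): one linear Gaussian conditional."""
    mean = intercept + sum(w * u for w, u in zip(weights, parent_values))
    return -0.5 * math.log(2 * math.pi * sigma ** 2) - (x - mean) ** 2 / (2 * sigma ** 2)


# Summing the factorized joint over all 2^3 assignments gives 1 (up to rounding).
total = sum(
    joint_probability(dict(zip("ABC", values)))
    for values in itertools.product([0, 1], repeat=3)
)
```

For instance, P(A=1, B=1, C=0) = 0.3 · 0.8 · 0.5 = 0.12.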
3. Methodology
Data
◦ High-throughput data: gene expression values of 4174 genes
◦ 529 samples
◦ 425 cancer and 104 control samples
◦ We used 82 of the 4174 genes for our research
◦ 77 genes are part of a gene signature (Nikos Chlis) + 5 control genes
◦ The known interactions among these 82 genes compose our initial network
◦ Our initial network consists of 12 biologically verified interactions
◦ 15 genes participate in these 12 interactions
3. Methodology
Structure Learning
◦ Structure learning is the process that induces Bayesian networks from data
◦ We get different networks if we change the parameters:
◦ Variable type (discrete, continuous)
◦ Data (cancer and control samples)
◦ Discrete variables – discretization based on 2 thresholds
◦ Continuous variables – require (approximately) Gaussian-distributed data
3. Methodology
Finding Thresholds - Discretization(1)
◦ The 900 most differentially expressed genes (out of 4174)
◦ Means of the expression values of these 900 genes, computed separately for cancer and control samples
◦ Creating two classes – max class and min class
◦ Comparing the mean values of cancer and control samples, we group the genes into these two classes
◦ Creating the histograms of these two classes
◦ Finding thresholds from the joint Gaussian fit
3. Methodology
Finding Thresholds - Discretization(2)
Threshold                          Discrete value
Expression value < 1.131           Underexpressed
1.131 < Expression value < 3.48    Normal
Expression value > 3.48            Overexpressed
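The threshold table maps each expression value to one of three discrete states. Below is a minimal Python sketch of that mapping. The table does not say where values exactly equal to 1.131 or 3.48 belong. The sketch assumes they count as "Normal".

```python
LOW_THRESHOLD = 1.131   # from the threshold table
HIGH_THRESHOLD = 3.48


def discretize(value, low=LOW_THRESHOLD, high=HIGH_THRESHOLD):
    """Map one expression value to a discrete state.

    Assumption: values exactly equal to a threshold are treated as 'Normal'.
    """
    if value < low:
        return "Underexpressed"
    if value > high:
        return "Overexpressed"
    return "Normal"


def discretize_gene(values):
    """Discretize all expression values of one gene (one entry per sample)."""
    return [discretize(v) for v in values]
```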
3. Methodology
Structure Learning – Gaussian Variables
◦ Each variable takes the expression values of one gene
◦ The expression values of each gene are approximately normally (Gaussian) distributed because a log2
transformation has been applied to them.
◦ We can check this by creating the histogram of the expression values for each gene
◦ We observe that our data are approximately normally distributed
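The slides check normality visually with histograms. As a numeric complement, the Python sketch below log2-transforms raw intensities and reports two moment-based summaries. Both are close to 0 for Gaussian data. This is an illustrative assumption, not a formal normality test and not the procedure used in the thesis.

```python
import math
import statistics


def log2_transform(raw_values):
    """log2-transform raw (strictly positive) expression intensities."""
    return [math.log2(v) for v in raw_values]


def skewness_and_excess_kurtosis(values):
    """Moment-based sample skewness and excess kurtosis (both about 0 for Gaussian data)."""
    mean = statistics.fmean(values)
    m2 = statistics.fmean((v - mean) ** 2 for v in values)
    m3 = statistics.fmean((v - mean) ** 3 for v in values)
    m4 = statistics.fmean((v - mean) ** 4 for v in values)
    return m3 / m2 ** 1.5, m4 / m2 ** 2 - 3.0
```

For example, the raw values 2, 4, 8, 16, 32 become 1, 2, 3, 4, 5 after the transformation. The result is perfectly symmetric (skewness 0), although five evenly spaced points are flatter than a Gaussian (excess kurtosis -1.3).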
3. Methodology
Structure Learning
◦ We used the K2 algorithm, which is a score-based algorithm.
◦ It attempts to select the network structure that maximizes the network's posterior
probability given the experimental data.
◦ The K2 search
◦ starts by assuming that a node has no parents
◦ incrementally adds, from the nodes that precede it in a given ordering, the parent whose addition increases the score of
the resulting structure the most.
◦ stops adding parents to the node when the score no longer increases.
◦ The final score of the network is obtained by multiplying the individual scores of the
nodes.
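The greedy search described above can be sketched for discrete data. The following is a minimal Python version using the log of the Cooper-Herskovits (K2) node score. Because the network score is a product of node scores, the code sums their logarithms. This is an illustrative sketch, not the implementation used in the thesis. The data layout (a list of dicts of state indices) and the function names are assumptions.

```python
import math
from collections import Counter


def k2_node_log_score(data, node, parent_set, arity):
    """log of the K2 (Cooper-Herskovits) score of `node` given `parent_set`.

    data: list of dicts mapping variable -> state index in 0..arity[var]-1 (assumed layout).
    """
    r = arity[node]
    n_ij = Counter()   # counts per parent configuration j
    n_ijk = Counter()  # counts per (parent configuration j, node state k)
    for row in data:
        j = tuple(row[p] for p in parent_set)
        n_ij[j] += 1
        n_ijk[(j, row[node])] += 1
    score = 0.0
    for count in n_ij.values():
        # log[(r-1)! / (N_ij + r - 1)!]; unobserved configurations contribute 0.
        score += math.lgamma(r) - math.lgamma(count + r)
    for count in n_ijk.values():
        score += math.lgamma(count + 1)  # log(N_ijk!)
    return score


def k2_search(data, order, arity, max_parents):
    """Greedy K2 search; returns (parents per node, log of the network score)."""
    parents, total = {}, 0.0
    for idx, node in enumerate(order):
        current = []
        best = k2_node_log_score(data, node, current, arity)
        candidates = list(order[:idx])  # only predecessors in the ordering
        while len(current) < max_parents:
            scored = [
                (k2_node_log_score(data, node, current + [c], arity), c)
                for c in candidates if c not in current
            ]
            if not scored:
                break
            new_score, new_parent = max(scored)
            if new_score <= best:
                break  # no single addition increases the score
            current.append(new_parent)
            best = new_score
        parents[node] = current
        total += best  # sum of logs = log of the product of node scores
    return parents, total
```

With data in which B always copies A, the search makes A a parent of B. With independent data it adds no parent.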
3. Methodology
Structure Implementation
◦ The K2 algorithm needs:
◦ The maximum number of parents per node (set to the total number of genes, 82)
◦ An ordering of the nodes (variables), which reduces the computational complexity
◦ We used two kinds of node ordering:
1. MWST. The ordering obtained by applying the MWST (Maximum Weight
Spanning Tree) algorithm followed by a topological sort
2. CUSTOM. The first 15 slots of the custom ordering (the 15 nodes that participate in the initial 12 interactions)
come from the MWST ordering, and the rest are in random order.
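The two orderings can be sketched as follows, under stated assumptions. The Python code below builds a maximum weight spanning tree with Kruskal's algorithm and orients it away from a chosen root. A breadth-first pass then yields a topological order. The edge weights (for example, pairwise mutual information), the root choice and the `custom_order` helper are hypothetical. `custom_order` is one reading of the CUSTOM rule, not the thesis code.

```python
import random
from collections import deque


def maximum_weight_spanning_tree(nodes, weights):
    """Kruskal's algorithm; weights maps frozenset({u, v}) -> weight (assumed layout)."""
    parent = {v: v for v in nodes}

    def find(v):
        while parent[v] != v:
            parent[v] = parent[parent[v]]  # path halving
            v = parent[v]
        return v

    tree = {v: [] for v in nodes}
    for edge, _ in sorted(weights.items(), key=lambda item: item[1], reverse=True):
        u, v = tuple(edge)
        ru, rv = find(u), find(v)
        if ru != rv:  # skip edges that would close a cycle
            parent[ru] = rv
            tree[u].append(v)
            tree[v].append(u)
    return tree


def mwst_order(nodes, weights, root):
    """Orient the tree away from `root`; BFS order is then a topological order.

    Nodes not connected to `root` by any weighted edge are not included.
    """
    tree = maximum_weight_spanning_tree(nodes, weights)
    order, seen, queue = [], {root}, deque([root])
    while queue:
        v = queue.popleft()
        order.append(v)
        for w in sorted(tree[v]):
            if w not in seen:
                seen.add(w)
                queue.append(w)
    return order


def custom_order(mwst_ordering, priority_nodes, seed=0):
    """Hypothetical CUSTOM rule: priority nodes first (in MWST order), the rest shuffled."""
    head = [v for v in mwst_ordering if v in priority_nodes]
    tail = [v for v in mwst_ordering if v not in priority_nodes]
    random.Random(seed).shuffle(tail)
    return head + tail
```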
Samples
◦ Cancer – discrete variables – MWST ordering / CUSTOM ordering
◦ Cancer – continuous variables – MWST ordering / CUSTOM ordering
◦ Healthy – continuous variables – MWST ordering / CUSTOM ordering
◦ Healthy – discrete variables – MWST ordering / CUSTOM ordering
3. Methodology
Structure Implementation
◦ In total we learned 8 structures (2 sample groups × 2 variable types × 2 orderings).
◦ To study the networks, we created the following unions:
1. Cancer Samples – Discrete Variables (CD)
2. Cancer Samples – Gaussian Variables (CG)
3. Control Samples – Discrete Variables (HD)
4. Control Samples – Gaussian Variables (HG)
5. Union Cancer Samples (CU)
6. Union Control Samples (HU)
Network #Nodes #Edges
CD 74 142
CG 82 765
HD 77 102
HG 82 562
CU 82 843
HU 82 605
In the CD and HD networks, not all
nodes (genes) participate in an
interaction, possibly because of
information lost during
discretization.
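The unions in the list above combine the edge sets of the learned structures. Below is a minimal Python sketch, assuming each network is a set of directed (parent, child) edges and that A→B and B→A count as different edges. The node count covers only genes that appear in at least one edge. That matches how CD and HD can report fewer than 82 nodes.

```python
def union_network(*edge_sets):
    """Union of directed edge sets; returns (#nodes, #edges, edges).

    Assumption: edges are (parent, child) tuples; only nodes that appear in
    some edge are counted.
    """
    edges = set().union(*edge_sets)
    nodes = {v for edge in edges for v in edge}
    return len(nodes), len(edges), edges
```

For example, the union of {A→B} and {A→B, B→C} has 3 nodes and 2 edges.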
Figure: Cancer Union (CU) Bayesian network
3. Methodology
Structure Analysis
◦ Small-world and scale-free structure are properties that real networks often have.
◦ A small-world network is a type of mathematical graph in which
◦ most nodes are not neighbors of one another, but most nodes can be reached from every other node in a
small number of hops or steps.
◦ the typical distance L between two randomly chosen nodes (the number of steps required) grows
proportionally to the logarithm of the number of nodes N in the network
◦ Small-world networks are characterized by
◦ a small average path length
◦ a large clustering coefficient
◦ A scale-free network is a network whose degree distribution follows a power law
◦ The most notable characteristic of a scale-free network is the relative commonness of
vertices with a degree that greatly exceeds the average. The highest-degree nodes are often
called "hubs" and are thought to serve specific purposes in their networks, although this
depends greatly on the domain.
3. Methodology
Structure Analysis - Examples
Small-World Network Example (hubs are highlighted)
Average Path Length = 1.8
Clustering Coefficient = 0.522
Random Network Example
Average Path Length = 2.1
Clustering Coefficient = 0.167
Scale-Free Network Example (hubs are highlighted)
3. Methodology
Structure Analysis – Small World
Network        C       Crand   l         ln(n)    n
Cancer (CU)    0.152   0.126   2.37308   4.4067   82
Control (HU)   0.120   0.099   2.47681   4.4067   82
C = clustering coefficient of the current network
Crand = clustering coefficient of an equivalent randomized network
l = average path length of the current network
n = number of nodes
• To characterize a network as Small-World:
• C > Crand and l ≈ ln(n)
• The clustering coefficients of our networks are only slightly higher than those of the
random networks.
• Average path lengths are much lower than the logarithm of n.
• So the networks are not Small-World
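The quantities C and l in the table can be computed directly. Below is a minimal Python sketch, assuming the network is treated as undirected and stored as a dict from node to neighbor set. The average path length is taken over connected pairs only. Node labels must be comparable, for example strings.

```python
from collections import deque


def average_clustering(adj):
    """Mean local clustering coefficient; adj maps node -> set of neighbours (undirected)."""
    total = 0.0
    for v, nbrs in adj.items():
        k = len(nbrs)
        if k < 2:
            continue  # coefficient taken as 0 for nodes with fewer than 2 neighbours
        links = sum(1 for u in nbrs for w in nbrs if u < w and w in adj[u])
        total += links / (k * (k - 1) / 2)
    return total / len(adj)


def average_path_length(adj):
    """Mean shortest-path length over all connected ordered pairs (BFS from each node)."""
    total, pairs = 0, 0
    for s in adj:
        dist = {s: 0}
        queue = deque([s])
        while queue:
            v = queue.popleft()
            for w in adj[v]:
                if w not in dist:
                    dist[w] = dist[v] + 1
                    queue.append(w)
        total += sum(dist.values())
        pairs += len(dist) - 1
    return total / pairs
```

For a triangle a-b-c with an extra node d attached to c, C = 7/12 and l = 4/3. For the 82-node networks, `math.log(82)` ≈ 4.4067 is the ln(n) column in the table.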
3. Methodology
Structure Analysis – Scale Free
Figure: degree distributions of the Cancer Union (CU) and Control Union (HU) networks
• The degree distributions of
our networks follow a power
law.
• So the networks can be
categorized as Scale-Free.
• This is in accordance with
other studies.
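The slides do not state how the power-law fit was assessed. As one option, the Python sketch below computes the empirical degree distribution. It then estimates the exponent α in P(k) ∝ k^(-α) with the discrete maximum-likelihood approximation popularized by Clauset, Shalizi and Newman. The choice of `k_min` is left to the user.

```python
import math
from collections import Counter


def degree_distribution(degrees):
    """Empirical P(k): fraction of nodes with degree k."""
    counts = Counter(degrees)
    n = len(degrees)
    return {k: c / n for k, c in sorted(counts.items())}


def power_law_alpha(degrees, k_min=1):
    """Approximate discrete MLE of alpha using only degrees k >= k_min (k_min >= 1)."""
    tail = [k for k in degrees if k >= k_min]
    return 1.0 + len(tail) / sum(math.log(k / (k_min - 0.5)) for k in tail)
```

A visual check on a log-log plot of `degree_distribution` remains advisable. A straight-line fit alone is known to be unreliable for deciding whether data follow a power law.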
3. Methodology
Centralities
◦ Centralities are topological indices that rank nodes in order to
identify the most important nodes in a network model.
◦ Degree
◦ The degree of a node is the number of links (edges) incident on the node. If a network
is directed, meaning that edges point in one direction from one node to another node,
then nodes have two different degrees: the in-degree, which is the number of incoming
edges, and the out-degree, which is the number of outgoing edges.
◦ Betweenness centrality
◦ determines the relative importance of a node by measuring the amount of traffic flowing
through that node to other nodes in the network. This is done by measuring, over all pairs of nodes,
the fraction of shortest paths connecting them that pass through the node of interest.
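Both centralities can be computed for a directed, unweighted network. The Python sketch below uses Brandes' algorithm for betweenness and returns unnormalized values. The adjacency layout is an assumption: every node is a key mapped to the list of its successors, sinks included.

```python
from collections import deque


def in_out_degrees(adj):
    """adj maps each node to the list of its successors (directed graph)."""
    out_deg = {v: len(ws) for v, ws in adj.items()}
    in_deg = {v: 0 for v in adj}
    for ws in adj.values():
        for w in ws:
            in_deg[w] += 1
    return in_deg, out_deg


def betweenness(adj):
    """Brandes' algorithm: unnormalized betweenness on an unweighted directed graph."""
    bc = {v: 0.0 for v in adj}
    for s in adj:
        stack, preds = [], {v: [] for v in adj}
        sigma = dict.fromkeys(adj, 0)   # number of shortest s-v paths
        dist = dict.fromkeys(adj, -1)
        sigma[s], dist[s] = 1, 0
        queue = deque([s])
        while queue:  # BFS counting shortest paths
            v = queue.popleft()
            stack.append(v)
            for w in adj[v]:
                if dist[w] < 0:
                    dist[w] = dist[v] + 1
                    queue.append(w)
                if dist[w] == dist[v] + 1:
                    sigma[w] += sigma[v]
                    preds[w].append(v)
        delta = dict.fromkeys(adj, 0.0)
        while stack:  # accumulate dependencies in reverse BFS order
            w = stack.pop()
            for v in preds[w]:
                delta[v] += sigma[v] / sigma[w] * (1 + delta[w])
            if w != s:
                bc[w] += delta[w]
    return bc
```

In the chain a→b→c, node b lies on the only shortest path from a to c and gets betweenness 1. In the diamond a→b→d, a→c→d, nodes b and c each get 0.5.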
3. Methodology
Finding Hubs
◦ We need to find the hubs in the network because they play a significant role in it.
◦ Hubs are the highest-degree nodes of a network and usually have great biological
significance.
◦ Degree is a local node metric
◦ Betweenness is a global node metric
◦ So, to find the significant (central) nodes, we used a combination of these metrics
◦ How to find hubs in a network:
◦ Draw histograms and cumulative distributions of the node degrees and betweenness for the network
◦ In the cumulative distribution, find the point where the curve starts flattening
◦ We call this value the minimum hub node degree (or betweenness) value
◦ The nodes with the highest degrees and betweenness values are the most significant in our network
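The slides locate the flattening point by eye. The Python sketch below automates it with a common heuristic, which is an assumption and not the thesis procedure. It sorts the values in descending order and takes the point farthest from the straight line (chord) joining the first and last points. It then keeps nodes that reach both minimum values. Requiring both metrics at once is an assumed way of combining them.

```python
def knee_value(values):
    """Value at the 'knee' of the descending-sorted curve (max distance to the chord)."""
    ys = sorted(values, reverse=True)
    n = len(ys)
    if n < 3:
        return ys[-1]
    slope = (ys[-1] - ys[0]) / (n - 1)
    # Vertical distance to the chord is proportional to the perpendicular distance.
    gaps = [abs(ys[0] + slope * i - y) for i, y in enumerate(ys)]
    return ys[gaps.index(max(gaps))]


def find_hubs(degree, betweenness, min_degree, min_betweenness):
    """Nodes whose degree AND betweenness reach the minimum hub values (assumed rule)."""
    return sorted(
        v for v in degree
        if degree[v] >= min_degree and betweenness[v] >= min_betweenness
    )
```

Typical use is `find_hubs(deg, bc, knee_value(deg.values()), knee_value(bc.values()))`, run separately on in-degree and out-degree if needed.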
3. Methodology
Finding Hubs
Figure: cumulative distributions of in-degree, out-degree and betweenness
• Cancer Union (CU) network: 7 hubs found
• Control Union (HU) network: 11 hubs found
3. Methodology
Network Comparison
◦ We need to compare our networks to find their degree of similarity.
◦ We compared the networks with a method that estimates the ratio of
correctness of one network with respect to another. This measure ranges
between 0 and 1, where 0 is the lowest validity and 1 the highest.
◦ This method is based on distance levels between nodes (shortest paths).
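The slides do not give the formula of this validity measure. The following Python sketch assumes one plausible, precision-like definition for illustration only. At level k, it computes the fraction of ordered node pairs within shortest-path distance k in network A that are also within distance k in network B. The function names and the definition are hypothetical.

```python
from collections import deque


def pairs_within(adj, level):
    """Ordered pairs (u, v), u != v, with shortest-path distance <= level.

    adj maps every node to the list of its successors (sinks included as keys).
    """
    pairs = set()
    for s in adj:
        dist = {s: 0}
        queue = deque([s])
        while queue:
            v = queue.popleft()
            if dist[v] == level:
                continue  # do not expand beyond the requested level
            for w in adj[v]:
                if w not in dist:
                    dist[w] = dist[v] + 1
                    queue.append(w)
        pairs.update((s, t) for t in dist if t != s)
    return pairs


def validity(adj_a, adj_b, level):
    """Hypothetical validity of A with respect to B: share of A's level-k pairs found in B."""
    pa, pb = pairs_within(adj_a, level), pairs_within(adj_b, level)
    return len(pa & pb) / len(pa) if pa else 0.0
```

Under this definition the value is non-decreasing in neither network by construction. It tends to grow with the level as more pairs fall within reach in both networks, which is the same qualitative trend as in the reported V1 to V4.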
3. Methodology
Network Comparison
• We compare the CU and HU networks.
• We obtained four validity values (V1-V4), one for each level.
• There are four levels because 4 is the
maximum shortest-path length in the networks.
• The value of V indicates how correct one
network is with respect to the other.
• The best similarity is observed at levels 3 and
4, where we get about 45-51% similarity.
• Similarity increases with the level.
Validity   CU       HU
V1         0.0992   0.1339
V2         0.3215   0.4124
V3         0.4491   0.5088
V4         0.4591   0.5138
3. Methodology
Conclusion
◦ Until now:
◦ The networks that we are working with are the Cancer Union (CU) and Control Union (HU)
◦ Our networks are Scale-Free
◦ There are hubs for each network
◦ What’s next?
◦ We want to find significant modules and molecular complexes in our networks
◦ We apply two clustering algorithms (MCODE and jActiveModules)
◦ We create a difference network and study some centralities of its clusters
3. Methodology
Clustering with MCODE algorithm
◦ MCODE uses a clustering-coefficient-based algorithm to identify molecular complexes in a large
protein interaction network derived from heterogeneous experimental sources.
◦ The idea of this algorithm is that highly interconnected, or dense, regions of the network may
represent complexes.
◦ The algorithmic stages are:
◦ Vertex weighting
◦ which weights all of the nodes based on their local network density, using the highest k-core of the
vertex neighborhood.
◦ Molecular complex prediction
◦ starting with the highest-weighted node, recursively move outwards, adding to the complex the nodes whose weight is
above a given threshold.
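The two MCODE stages can be sketched on an undirected graph. In the Python code below, each vertex weight is k times the density of the highest k-core of its immediate neighborhood (the vertex included). A complex then grows from a seed by adding neighbors whose weight exceeds (1 - vwp) × the seed weight. This is a simplified illustration, not the Cytoscape MCODE implementation. The `vwp` parameter name and its default are assumptions, and post-processing options are omitted.

```python
def k_core(adj, nodes, k):
    """Vertices of `nodes` left after repeatedly removing those with < k neighbours inside."""
    core, changed = set(nodes), True
    while changed:
        changed = False
        for v in list(core):
            if v in core and sum(1 for w in adj[v] if w in core) < k:
                core.discard(v)
                changed = True
    return core


def highest_k_core(adj, nodes):
    """Largest k with a non-empty k-core, and that core."""
    k, best = 0, set(nodes)
    while True:
        core = k_core(adj, nodes, k + 1)
        if not core:
            return k, best
        k, best = k + 1, core


def density(adj, nodes):
    """Edges among `nodes` divided by the maximum possible number (no self-loops)."""
    n = len(nodes)
    if n < 2:
        return 0.0
    edges = sum(1 for v in nodes for w in adj[v] if w in nodes) / 2
    return edges / (n * (n - 1) / 2)


def vertex_weights(adj):
    """Stage 1: weight = k * density of the highest k-core of the closed neighbourhood."""
    weights = {}
    for v in adj:
        k, core = highest_k_core(adj, set(adj[v]) | {v})
        weights[v] = k * density(adj, core)
    return weights


def predict_complex(adj, weights, seed, vwp=0.2):
    """Stage 2: grow a complex outwards from `seed` through sufficiently heavy neighbours."""
    threshold = (1 - vwp) * weights[seed]
    members, stack = {seed}, [seed]
    while stack:
        v = stack.pop()
        for w in adj[v]:
            if w not in members and weights[w] > threshold:
                members.add(w)
                stack.append(w)
    return members
```

On a 4-clique {a, b, c, d} with an extra node e attached to a, the clique nodes get weight 3 and e gets weight 1. Growing from a therefore returns the clique.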
3. Methodology
Clustering with MCODE algorithm
Figure: MCODE clusters in the Union Bayesian Network with Control Samples (HU) and in the Union Bayesian Network with Cancer Samples (CU)
3. Methodology
Clustering with jActiveModules algorithm
◦ A general method for searching the network to find active subnetworks
◦ This algorithm uses
◦ a statistical scoring system that captures the amount of gene expression change in a given
subnetwork.
◦ a search algorithm for identifying the highest-scoring subnetworks.
◦ The algorithmic stages are:
◦ Basic z-score calculation
◦ Transform p-values (significance of differential expression for each gene) to z-score
◦ Calibrating z against the background distribution
◦ Searching for high-scoring subnetworks via simulated annealing
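The scoring stages listed above can be sketched in Python as follows. The code converts per-gene p-values to z-scores with the inverse normal CDF and aggregates a subnetwork as z_A = (Σ z_i) / √k. It then calibrates z_A against random gene sets of the same size by Monte Carlo sampling. This is an illustrative sketch, not jActiveModules code. The number of samples and the seed are arbitrary choices, and the simulated-annealing search is omitted.

```python
import math
import random
import statistics
from statistics import NormalDist


def p_to_z(p):
    """z_i = Phi^{-1}(1 - p_i); requires 0 < p < 1. Small p-values give large z."""
    return NormalDist().inv_cdf(1.0 - p)


def aggregate_z(z_values):
    """Subnetwork score z_A = (1 / sqrt(k)) * sum of the member z-scores."""
    return sum(z_values) / math.sqrt(len(z_values))


def calibrated_score(subnet_z, all_z, samples=1000, seed=0):
    """Standardize z_A against random gene sets of the same size (Monte Carlo background)."""
    rng = random.Random(seed)
    k = len(subnet_z)
    background = [aggregate_z(rng.sample(all_z, k)) for _ in range(samples)]
    mu, sigma = statistics.fmean(background), statistics.stdev(background)
    return (aggregate_z(subnet_z) - mu) / sigma
```

A subnetwork whose genes are all strongly differentially expressed gets a large positive calibrated score. A search procedure would then try to maximize that score over connected subnetworks.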
3. Methodology
Clustering with jActiveModules algorithm
Figure: jActiveModules clusters in the Union Bayesian Network with Cancer Samples (CU) and in the Union Bayesian Network with Control Samples (HU)
Biological Results
Evaluation of Interactions of Bayesian Networks
Current knowledge: 12 experimentally verified interactions among the 82 genes
◦ Control Union interactions: FN1-CDKN2A, FN1-KRT16, FN1-COMP, ERBB2-NRG1 and FGFR3-FGF18
◦ Cancer Union interactions: ACTA1-CDKN2A, FN1-COMP and KRT16-IGHG1
◦ A number of known interactions are recovered by the networks (Control 5/12, Cancer 3/12)
◦ New interactions are proposed (Control 600, Cancer 840) that can be experimentally verified
◦ A more compact model is needed to examine the biological significance of the interactions identified
in the constructed networks:
◦ Construction of a Differentiating Network from the cancer and control Bayesian Networks
◦ Identification of enriched pathways within MCODE clusters
◦ Construction of Differentiating Sub-Networks (cancer and control)
Differentiating Network
3. Methodology: Differentiating Network
Significant (p≤0.05) pathways of the Differentiating Network that are associated with breast cancer
are listed below. For each MCODE cluster, and for each pathway within an MCODE cluster, the average
betweenness centrality and the average degree centrality were computed.

MCODE - CLUSTERS
Cluster | Average Betweenness Centrality | Average Degree Centrality
MCODE - Cluster 1 | 18 | 12.5
MCODE - Cluster 2 | 23.78 | 7.14
MCODE - Cluster 3 | 5.7 | 6
MCODE - Cluster 4 | 2.3 | 2.7

KEGG & WikiPathways in Differentiating Network
MCODE - PATHWAYS | Pathway | Gene Symbols | Average Betweenness Centrality | Average Degree Centrality
Pathway 1_1 | Pathways in cancer | ERBB2, FGFR3, CDKN2A, FGF18, EGF | 23.92 | 14.2
Pathway 1_2 | Focal adhesion | ERBB2, COL11A1, COMP, EGF | 23.73 | 13.75
Pathway 1_3 | ErbB signaling pathway | NRG2, ERBB2, EGF | 27.52 | 14.66
Pathway 1_4 | Regulation of actin cytoskeleton | FGFR3, FGF18, EGF | 21.79 | 13.33
Pathway 1_5 | MAPK signaling pathway | FGFR3, FGF18, EGF | 21.79 | 13.33
Pathway 1_6 | EGF-EGFR Signaling Pathway | ERBB2, REPS2, EGF | 27.38 | 14.33
Pathway 1_7 | ECM-receptor interaction | COL11A1, COMP | 20.33 | 13
Pathway 1_8 | Endocytosis | FGFR3, EGF | 26.95 | 14
Pathway 1_9 | Endochondral Ossification | FGFR3, FGF18 | 21.52 | 13.5
Pathway 2_1 | Protein digestion and absorption | COL17A1, CPA3 | 39.45 | 9
Pathway 2_2 | Chemokine signaling pathway | CCL19, CCL18 | 15.35 | 6
Pathway 2_3 | Metabolic pathways | TAT, ATP6V0A4, HSD17B2 | 33.48 | 8.66
Pathway 2_4 | Androgen receptor signaling pathway | BRCA1, PARK7 | 19.47 | 6

Cancer and Control hubs are included in the differentiating network.
3. Methodology
Differentiating Network
◦ Create the difference network of CU and HU
◦ Apply the MCODE algorithm on this network to observe the clusters
◦ Analyze the centralities of the clusters and their pathways
◦ The centralities of these pathways were obtained by averaging the centralities of all genes
enriched in each pathway
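The steps above can be sketched in Python under stated assumptions. The slides do not define the "difference network" precisely. The code assumes it holds the edges present in exactly one of CU and HU (the symmetric difference of the edge sets). The per-pathway centrality is the average over the pathway's genes, which matches the "Average" columns in the tables.

```python
def difference_network(edges_a, edges_b):
    """Hypothetical definition: edges present in exactly one of the two networks."""
    return set(edges_a) ^ set(edges_b)


def average_centrality(genes, centrality):
    """Average of a per-gene centrality over the genes of a cluster or pathway.

    Genes missing from `centrality` are ignored; returns NaN if none are present.
    """
    values = [centrality[g] for g in genes if g in centrality]
    return sum(values) / len(values) if values else float("nan")
```

The MCODE clustering itself would run on the resulting edge set. `average_centrality` would then be applied once per cluster and once per enriched pathway.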
Figure: bar charts of the average Degree and the average Betweenness Centrality for MCODE Clusters 1-4 and Pathways 1_1-1_9 and 2_1-2_4
Biological Results
Construction of Differentiating Sub-Networks
By considering:
◦ Significant pathways of the Differentiating Network
◦ 'cancer' and 'healthy' hubs
◦ adjacent edges of 'cancer' or 'healthy' hubs
Figure: Differentiating 'Cancer' Subnetwork and Differentiating 'Healthy' Subnetwork
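The three criteria above can be combined as a simple edge filter. The Python sketch below assumes one possible rule, which is not necessarily the thesis procedure. It keeps every edge of the differentiating network that touches a hub, plus edges whose two endpoints both belong to significant pathways. Running it once with the 'cancer' hubs and once with the 'healthy' hubs gives the two sub-networks.

```python
def differentiating_subnetwork(diff_edges, hubs, pathway_genes):
    """Assumed rule: edges adjacent to a hub, plus edges between significant-pathway genes."""
    hubs, pathway_genes = set(hubs), set(pathway_genes)
    return {
        (u, v) for (u, v) in diff_edges
        if u in hubs or v in hubs or (u in pathway_genes and v in pathway_genes)
    }
```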
4. Conclusions
◦ Networks from Cancer and Control samples are Scale-Free.
◦ Both the Cancer and Control networks contain significant nodes (hubs) of biological
relevance.
◦ There are significant pathways in the network complexes (MCODE clusters).
4. Conclusions
• The differentiating network contains all hub nodes, so the hub genes can be considered
potential gene markers for breast cancer
• FN1, TTYH1 and OGN are the common hubs of the cancer and control differentiating sub-networks
• There are fewer interactions in the cancer differentiating sub-network than in the control
differentiating sub-network, even though the number of genes remains constant
The constructed Networks and Sub-networks can give an insight into the
molecular alterations taking place in different conditions (cancer and control)
The differentiating sub-networks might be considered as local models to test
a biological hypothesis and are more convenient for experimental design
 
Data Mining Module 3 Business Analtics..pdf
Data Mining Module 3 Business Analtics..pdfData Mining Module 3 Business Analtics..pdf
Data Mining Module 3 Business Analtics..pdfJayanti Pande
 
Neural Networks in Data Mining - “An Overview”
Neural Networks  in Data Mining -   “An Overview”Neural Networks  in Data Mining -   “An Overview”
Neural Networks in Data Mining - “An Overview”Dr.(Mrs).Gethsiyal Augasta
 

Similar to Thesis Presentation (20)

A survey on weighted clustering techniques in manets
A survey on weighted clustering techniques in manetsA survey on weighted clustering techniques in manets
A survey on weighted clustering techniques in manets
 
A clonal based algorithm for the reconstruction of genetic network using s sy...
A clonal based algorithm for the reconstruction of genetic network using s sy...A clonal based algorithm for the reconstruction of genetic network using s sy...
A clonal based algorithm for the reconstruction of genetic network using s sy...
 
A clonal based algorithm for the reconstruction of
A clonal based algorithm for the reconstruction ofA clonal based algorithm for the reconstruction of
A clonal based algorithm for the reconstruction of
 
C04511822
C04511822C04511822
C04511822
 
Implementation of energy efficient coverage aware routing protocol for wirele...
Implementation of energy efficient coverage aware routing protocol for wirele...Implementation of energy efficient coverage aware routing protocol for wirele...
Implementation of energy efficient coverage aware routing protocol for wirele...
 
Wireless sensor networks, clustering, Energy efficient protocols, Particles S...
Wireless sensor networks, clustering, Energy efficient protocols, Particles S...Wireless sensor networks, clustering, Energy efficient protocols, Particles S...
Wireless sensor networks, clustering, Energy efficient protocols, Particles S...
 
Nitant_Choksi_CAP6545_Presentation_Slides.pptx
Nitant_Choksi_CAP6545_Presentation_Slides.pptxNitant_Choksi_CAP6545_Presentation_Slides.pptx
Nitant_Choksi_CAP6545_Presentation_Slides.pptx
 
Pizza club - March 2017 - Gaia
Pizza club - March 2017 - GaiaPizza club - March 2017 - Gaia
Pizza club - March 2017 - Gaia
 
CCC-Bicluster Analysis for Time Series Gene Expression Data
CCC-Bicluster Analysis for Time Series Gene Expression DataCCC-Bicluster Analysis for Time Series Gene Expression Data
CCC-Bicluster Analysis for Time Series Gene Expression Data
 
Data aggregation in wireless sensor networks
Data aggregation in wireless sensor networksData aggregation in wireless sensor networks
Data aggregation in wireless sensor networks
 
A New Method for Reducing Energy Consumption in Wireless Sensor Networks usin...
A New Method for Reducing Energy Consumption in Wireless Sensor Networks usin...A New Method for Reducing Energy Consumption in Wireless Sensor Networks usin...
A New Method for Reducing Energy Consumption in Wireless Sensor Networks usin...
 
Clustering-based Analysis for Heavy-Hitter Flow Detection
Clustering-based Analysis for Heavy-Hitter Flow DetectionClustering-based Analysis for Heavy-Hitter Flow Detection
Clustering-based Analysis for Heavy-Hitter Flow Detection
 
WIRELESS SENSOR NETWORK CLUSTERING USING PARTICLES SWARM OPTIMIZATION FOR RED...
WIRELESS SENSOR NETWORK CLUSTERING USING PARTICLES SWARM OPTIMIZATION FOR RED...WIRELESS SENSOR NETWORK CLUSTERING USING PARTICLES SWARM OPTIMIZATION FOR RED...
WIRELESS SENSOR NETWORK CLUSTERING USING PARTICLES SWARM OPTIMIZATION FOR RED...
 
Developmental Mega Sample: Exploring Inter-Individual Variation
Developmental Mega Sample: Exploring Inter-Individual VariationDevelopmental Mega Sample: Exploring Inter-Individual Variation
Developmental Mega Sample: Exploring Inter-Individual Variation
 
Masters Thesis Defense: Minimum Complexity Echo State Networks For Genome and...
Masters Thesis Defense: Minimum Complexity Echo State Networks For Genome and...Masters Thesis Defense: Minimum Complexity Echo State Networks For Genome and...
Masters Thesis Defense: Minimum Complexity Echo State Networks For Genome and...
 
Comparative analysis of dynamic programming algorithms to find similarity in ...
Comparative analysis of dynamic programming algorithms to find similarity in ...Comparative analysis of dynamic programming algorithms to find similarity in ...
Comparative analysis of dynamic programming algorithms to find similarity in ...
 
Comparative analysis of dynamic programming
Comparative analysis of dynamic programmingComparative analysis of dynamic programming
Comparative analysis of dynamic programming
 
Ameliorate the performance using soft computing approaches in wireless networks
Ameliorate the performance using soft computing approaches  in wireless networksAmeliorate the performance using soft computing approaches  in wireless networks
Ameliorate the performance using soft computing approaches in wireless networks
 
Data Mining Module 3 Business Analtics..pdf
Data Mining Module 3 Business Analtics..pdfData Mining Module 3 Business Analtics..pdf
Data Mining Module 3 Business Analtics..pdf
 
Neural Networks in Data Mining - “An Overview”
Neural Networks  in Data Mining -   “An Overview”Neural Networks  in Data Mining -   “An Overview”
Neural Networks in Data Mining - “An Overview”
 

Thesis Presentation

  • 1. Expression Networks for Cancer Gene Markers. DIMITRIOS-APOSTOLOS CHALEPAKIS-NTELLIS, TECHNICAL UNIVERSITY OF CRETE, SCHOOL OF ELECTRONIC AND COMPUTER ENGINEERING, DIGITAL SIGNAL & IMAGE PROCESSING LAB. Supervisor: Prof. M. Zervakis; Assoc. Prof. A. Mania; Principal Inv. D. Kafetzopoulos. July 2015
  • 2. Presentation Structure 1. Aim of Research 2. Background 3. Method ◦ Structure Learning ◦ Structure Analysis ◦ Finding Hubs ◦ Clustering 4. Results & Conclusion
  • 3. 1. Aim of Research ◦ Create Bayesian networks with two kinds of variables (discrete and continuous) based on a small number of genes (part of a breast cancer gene signature) ◦ Find new interactions related to breast cancer, or confirm known ones ◦ Study the properties of Bayesian networks and how they compare with biological and other networks ◦ Find significant genes, modules and pathways in these networks
  • 4. 2. Background Why Networks? ◦ Network biology is a multidisciplinary intersection of mathematics, computer science and biology. ◦ Network biology provides valuable frameworks that: ◦ help analyze high-throughput data ◦ have significantly altered our understanding of biological systems ◦ are embedded in important applications in practical medicine.
  • 5. 2. Background Bayesian Networks (1) ◦ A Bayesian network specifies a joint distribution in a structured form ◦ It represents dependence/independence via a directed graph ◦ Nodes = random variables (these variables can be the expression levels of different genes) ◦ Edges = direct dependence ◦ The graph must be acyclic (no directed cycles) ◦ A Bayesian network has two components: ◦ The graph structure ◦ The numerical probabilities (for each variable given its parents)
  • 6. 2. Background Bayesian Networks (2) ◦ The graph encodes the local Markov assumptions: each variable is independent of its non-descendants, given its parents. ◦ Any joint distribution that satisfies these Markov assumptions can be factorized into the product form $P(X_1, \ldots, X_n) = \prod_{i=1}^{n} P(X_i \mid \mathrm{Pa}_G(X_i))$, where the $X_i$ are the random variables and $\mathrm{Pa}_G(X_i)$ is the set of parents of $X_i$ in the graph $G$. ◦ To fully determine the joint distribution, we need to specify each of the conditional probabilities in the product form. ◦ The functional form of each conditional distribution can be: 1. Multinomial – for discrete variables 2. Linear Gaussian – for continuous variables
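To make the product form concrete, the following minimal sketch evaluates $P(X_1, \ldots, X_n)$ for a toy discrete network by multiplying one conditional probability per node. The three-gene network (`G1 → G2`, `G1 → G3`), its binary states and all probability values are hypothetical, chosen only for illustration; they are not taken from this study.

```python
from itertools import product

# Hypothetical toy network: G1 -> G2 and G1 -> G3, each gene with states 0/1.
parents = {"G1": [], "G2": ["G1"], "G3": ["G1"]}

# Hypothetical multinomial CPTs: cpt[node][parent_values][state] = probability.
cpt = {
    "G1": {(): {0: 0.6, 1: 0.4}},
    "G2": {(0,): {0: 0.7, 1: 0.3}, (1,): {0: 0.2, 1: 0.8}},
    "G3": {(0,): {0: 0.9, 1: 0.1}, (1,): {0: 0.5, 1: 0.5}},
}

def joint_probability(assignment):
    """P(X1..Xn) = product over i of P(Xi | Pa_G(Xi))."""
    prob = 1.0
    for node, pa in parents.items():
        pa_values = tuple(assignment[p] for p in pa)
        prob *= cpt[node][pa_values][assignment[node]]
    return prob

# Summing the factorized joint over all 2**3 assignments must give 1.
total = sum(
    joint_probability(dict(zip(parents, states)))
    for states in product([0, 1], repeat=len(parents))
)
print(round(total, 10))  # 1.0
print(round(joint_probability({"G1": 1, "G2": 1, "G3": 0}), 10))  # 0.16
```

Only one table per node is stored, which is why the structure (which parents each node has) determines how many parameters must be estimated.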
  • 7. 3. Methodology Data ◦ High-throughput data: gene expression values of 4174 genes ◦ 529 samples ◦ 425 cancer and 104 control samples ◦ We used 82 of the 4174 genes for our research ◦ 77 genes are part of a gene signature (Nikos Chlis), plus 5 control genes ◦ The known interactions among these 82 genes form our initial network ◦ Our initial network consists of 12 biologically verified interactions ◦ 15 genes participate in these 12 interactions
  • 8. 3. Methodology Structure Learning ◦ Structure learning is the process of inducing Bayesian networks from data ◦ We obtain different networks by changing the following parameters: ◦ Variable type (discrete, continuous) ◦ Data (cancer or control samples) ◦ Discrete variables – discretization based on 2 thresholds ◦ Continuous variables – require (approximately) Gaussian-distributed data
  • 9. 3. Methodology Finding Thresholds - Discretization (1) ◦ Select the 900 most differentially expressed of the 4174 genes ◦ Compute the mean expression value of each of these 900 genes, separately for cancer and control samples ◦ Create two classes – a max class and a min class ◦ By comparing each gene's cancer and control means, group the means into these two classes ◦ Create the histograms of these two classes ◦ Find the thresholds from a joint Gaussian fit
  • 10. 3. Methodology Finding Thresholds - Discretization (2) ◦ Threshold → Discrete value:
    ◦ Expression value < 1.131 → Underexpressed
    ◦ 1.131 < Expression value < 3.48 → Normal
    ◦ Expression value > 3.48 → Overexpressed
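Applying the two thresholds is a simple binning step. The sketch below uses the threshold values from this slide with NumPy's `np.digitize`; the function name `discretize` and the 0/1/2 coding are illustrative choices. The slide uses strict inequalities and does not say how values exactly equal to a threshold are handled, so assigning them to the upper class here is an assumption.

```python
import numpy as np

# Thresholds reported on this slide (log2 expression scale).
LOW, HIGH = 1.131, 3.48
LABELS = np.array(["Underexpressed", "Normal", "Overexpressed"])

def discretize(expr):
    """Map expression values to 0 = underexpressed, 1 = normal, 2 = overexpressed."""
    expr = np.asarray(expr, dtype=float)
    # With increasing bins, np.digitize returns 0 for x < LOW,
    # 1 for LOW <= x < HIGH and 2 for x >= HIGH.
    return np.digitize(expr, [LOW, HIGH])

codes = discretize([0.5, 2.0, 4.1])
print(codes.tolist())          # [0, 1, 2]
print(LABELS[codes].tolist())  # ['Underexpressed', 'Normal', 'Overexpressed']
```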
  • 11. 3. Methodology Structure Learning – Gaussian Variables ◦ Each variable takes the expression values of one gene ◦ The expression values of each gene are approximately normally (Gaussian) distributed because a log2 transformation has been applied to them ◦ We can check this by plotting the histogram of the expression values of each gene ◦ We observe that our data are approximately normally distributed
  • 12. 3. Methodology Structure Learning ◦ We used the K2 algorithm, a score-based algorithm. ◦ It attempts to select the network structure that maximizes the network's posterior probability given the experimental data. ◦ For each node, the K2 search: ◦ starts by assuming that the node has no parents ◦ incrementally adds the parent, among the node's predecessors in a given ordering, whose addition most increases the score of the resulting structure ◦ stops adding parents to the node when no addition increases the score ◦ The final score of the network is the product of the individual node scores.
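The greedy search described above can be sketched for discrete data as follows. This is a minimal illustration, not the implementation used in the study (the slides do not name a toolbox). It assumes rows are dicts mapping variable names to integer states, uses the Cooper–Herskovits (K2) metric with uniform priors, and works in log space, so the network's log-score is the sum of the node log-scores (the log of the product mentioned above). The continuous (Gaussian) case needs a different score and is not covered.

```python
from collections import Counter
from math import lgamma

def k2_node_score(data, child, parents, arity):
    """Log Cooper-Herskovits score of `child` given `parents` (discrete data)."""
    r = arity[child]
    # N_ijk: counts of (parent configuration j, child state k).
    n_ijk = Counter((tuple(row[p] for p in parents), row[child]) for row in data)
    n_ij = Counter()
    for (j, _state), count in n_ijk.items():
        n_ij[j] += count
    # log[(r-1)! / (N_ij + r - 1)!] per configuration, plus log N_ijk! per cell.
    score = sum(lgamma(r) - lgamma(n + r) for n in n_ij.values())
    score += sum(lgamma(count + 1) for count in n_ijk.values())
    return score

def k2(data, order, arity, max_parents):
    """Greedy K2 search: parents of each node are chosen among its predecessors in `order`."""
    parents = {node: [] for node in order}
    for pos, node in enumerate(order):
        best = k2_node_score(data, node, parents[node], arity)
        candidates = list(order[:pos])
        while len(parents[node]) < max_parents:
            scored = [(k2_node_score(data, node, parents[node] + [c], arity), c)
                      for c in candidates if c not in parents[node]]
            if not scored:
                break
            new_score, new_parent = max(scored)
            if new_score <= best:
                break  # no single addition increases the score
            best = new_score
            parents[node].append(new_parent)
    return parents

# Toy data: B copies A, C is independent of both.
rows = [{"A": a, "B": a, "C": c} for a in (0, 1) for c in (0, 1) for _ in range(10)]
print(k2(rows, ["A", "B", "C"], {"A": 2, "B": 2, "C": 2}, max_parents=2))
# {'A': [], 'B': ['A'], 'C': []}
```

Because candidate parents are restricted to earlier nodes in the ordering, the result is acyclic by construction, which is why K2 needs a node ordering as input.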
  • 13. 3. Methodology Structure Implementation ◦ The K2 algorithm requires: ◦ The maximum number of parents per node (set to the total number of genes, 82) ◦ An ordering of the nodes (variables), which reduces computational complexity ◦ We used two kinds of node orderings: 1. MWST. The ordering obtained by applying the MWST (Maximum Weight Spanning Tree) algorithm followed by a topological sort 2. CUSTOM. The first 15 slots of the custom ordering (the 15 nodes that participate in the 12 initial interactions) come from the MWST ordering, and the remaining nodes are in random order.
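One way to obtain an MWST ordering is sketched below. The slides do not state the edge weights or the root used, so weighting edges by empirical mutual information (a Chow–Liu-style tree) and choosing the root by hand are assumptions. The tree is built with Prim's algorithm; orienting its edges away from the root makes a breadth-first traversal a valid topological order.

```python
from collections import Counter, deque
from itertools import combinations
from math import log

def mutual_information(data, a, b):
    """Empirical mutual information (in nats) between discrete variables a and b."""
    n = len(data)
    pa = Counter(row[a] for row in data)
    pb = Counter(row[b] for row in data)
    pab = Counter((row[a], row[b]) for row in data)
    return sum(c / n * log(c * n / (pa[x] * pb[y])) for (x, y), c in pab.items())

def mwst_order(data, variables, root):
    """Maximum weight spanning tree (Prim) on pairwise MI, then a BFS order from `root`."""
    weight = {frozenset(p): mutual_information(data, *p) for p in combinations(variables, 2)}
    in_tree, adjacency = {root}, {v: [] for v in variables}
    while len(in_tree) < len(variables):
        # Heaviest edge leaving the current tree (ties: first in `variables` order).
        u, v = max(((u, v) for u in variables if u in in_tree
                    for v in variables if v not in in_tree),
                   key=lambda e: weight[frozenset(e)])
        adjacency[u].append(v)
        adjacency[v].append(u)
        in_tree.add(v)
    # Edges oriented away from the root: BFS order is a topological sort.
    order, seen, queue = [], {root}, deque([root])
    while queue:
        node = queue.popleft()
        order.append(node)
        for nb in adjacency[node]:
            if nb not in seen:
                seen.add(nb)
                queue.append(nb)
    return order

# Toy data: B copies A, C agrees with A 80% of the time, D is independent.
rows = [{"A": a, "B": a, "C": a if k < 4 else 1 - a, "D": d}
        for a in (0, 1) for d in (0, 1) for k in range(5)]
print(mwst_order(rows, ["D", "C", "B", "A"], root="C"))  # ['C', 'B', 'D', 'A']
```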
  • 14. 3. Methodology Structure Implementation (Diagram: Samples → Cancer / Healthy (control); each → Discrete variables / Continuous variables; each → MWST ordering / CUSTOM ordering) ◦ In total we learned 8 structures. ◦ To study the networks, we created the following unions: 1. Cancer Samples – Discrete Variables (CD) 2. Cancer Samples – Gaussian Variables (CG) 3. Control Samples – Discrete Variables (HD) 4. Control Samples – Gaussian Variables (HG) 5. Union Cancer Samples (CU) 6. Union Control Samples (HU)
    ◦ Network: #Nodes, #Edges
    ◦ CD: 74, 142
    ◦ CG: 82, 765
    ◦ HD: 77, 102
    ◦ HG: 82, 562
    ◦ CU: 82, 843
    ◦ HU: 82, 605
    ◦ In the CD and HD networks, not all nodes (genes) participate in an interaction, possibly because of information loss from discretization.
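A union of learned structures can be formed by merging their edge sets. In this minimal sketch the node count includes only nodes that appear in at least one edge, which matches the observation that CD and HD have fewer than 82 nodes. The function name and the toy edges are hypothetical. Note that a union of DAGs is not necessarily acyclic (for example, `A→B` from one network and `B→A` from another).

```python
def union_network(*edge_sets):
    """Union of directed edge sets; returns (nodes, edges)."""
    edges = set().union(*edge_sets)
    # Only nodes that take part in at least one edge are counted.
    nodes = {n for edge in edges for n in edge}
    return nodes, edges

# Hypothetical structures learned with the MWST and CUSTOM orderings.
mwst_edges = {("FN1", "COMP"), ("ERBB2", "NRG1")}
custom_edges = {("FN1", "COMP"), ("FGFR3", "FGF18")}
nodes, edges = union_network(mwst_edges, custom_edges)
print(len(nodes), len(edges))  # 6 3
```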
  • 15. Bayesian Network – Cancer Union (CU) (figure)
  • 16. 3. Methodology Structure Analysis ◦ Small-world and scale-free are properties that real networks often have. ◦ A small-world network is a type of mathematical graph in which: ◦ most nodes are not neighbors of one another, but most nodes can be reached from every other node by a small number of hops or steps ◦ the typical distance L between two randomly chosen nodes (the number of steps required) grows proportionally to the logarithm of the number of nodes N in the network ◦ The small-world model is characterized by: ◦ a small average path length ◦ a large clustering coefficient ◦ A scale-free network is a network whose degree distribution follows a power law. ◦ The most notable characteristic of a scale-free network is the relative commonness of vertices with a degree that greatly exceeds the average. The highest-degree nodes are often called "hubs" and are thought to serve specific purposes in their networks, although this depends greatly on the domain.
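The two small-world quantities can be computed with plain breadth-first search. This sketch assumes edge directions are ignored for both metrics, that nodes with fewer than two neighbors contribute a clustering coefficient of 0, and that disconnected node pairs are skipped when averaging path lengths; the slides do not state these conventions.

```python
from collections import deque
from itertools import combinations

def to_undirected(edges):
    """Adjacency sets of the undirected version of a directed edge list."""
    adj = {}
    for u, v in edges:
        if u == v:
            continue
        adj.setdefault(u, set()).add(v)
        adj.setdefault(v, set()).add(u)
    return adj

def average_clustering(adj):
    """Mean local clustering coefficient (nodes with degree < 2 contribute 0)."""
    total = 0.0
    for node, nbrs in adj.items():
        k = len(nbrs)
        if k < 2:
            continue
        links = sum(1 for a, b in combinations(nbrs, 2) if b in adj[a])
        total += 2 * links / (k * (k - 1))
    return total / len(adj)

def average_path_length(adj):
    """Mean shortest-path length over all connected ordered node pairs (BFS)."""
    total, pairs = 0, 0
    for source in adj:
        dist = {source: 0}
        queue = deque([source])
        while queue:
            u = queue.popleft()
            for v in adj[u]:
                if v not in dist:
                    dist[v] = dist[u] + 1
                    queue.append(v)
        total += sum(dist.values())
        pairs += len(dist) - 1
    return total / pairs

# Toy graph: triangle A-B-C with a pendant node D attached to C.
adj = to_undirected([("A", "B"), ("B", "C"), ("C", "A"), ("C", "D")])
print(round(average_clustering(adj), 4))   # 0.5833
print(round(average_path_length(adj), 4))  # 1.3333
```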
  • 17. 3. Methodology Structure Analysis - Examples (figures) ◦ Small-world network example (hubs highlighted): average path length = 1.8, clustering coefficient = 0.522 ◦ Random network example: average path length = 2.1, clustering coefficient = 0.167 ◦ Scale-free network example (hubs highlighted)
  • 18. 3. Methodology Structure Analysis – Small World
    ◦ Cancer (CU): C = 0.152, Crand = 0.126, l = 2.37308, log(n) = 4.4067, n = 82
    ◦ Control (HU): C = 0.120, Crand = 0.099, l = 2.47681, log(n) = 4.4067, n = 82
    ◦ C = clustering coefficient of the current network; Crand = clustering coefficient of an equivalent randomized network; l = average path length of the current network; n = number of nodes (log is the natural logarithm: ln 82 ≈ 4.4067)
    ◦ To characterize a network as small-world: C > Crand and l ≈ log(n)
    ◦ The clustering coefficients of our networks are only slightly higher than those of the equivalent random networks.
    ◦ The average path lengths are much lower than log(n).
    ◦ So the networks are not small-world.
  • 19. 3. Methodology Structure Analysis – Scale Free (Figures: degree distributions of the Cancer Union (CU) and Control Union (HU) networks) ◦ The degree distributions of our networks follow a power law. ◦ So the networks can be categorized as scale-free. ◦ This is in accordance with other studies.
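A quick way to check for a power law is to fit a straight line to the degree distribution on log-log axes. The sketch below does this with `np.polyfit`; it is a crude estimator (maximum-likelihood fitting is generally more reliable) and is not necessarily how the figures above were produced. The synthetic degrees in the example are hypothetical.

```python
from collections import Counter
import numpy as np

def power_law_exponent(degrees):
    """Estimate gamma in P(k) ~ k**(-gamma) by least squares on log-log axes."""
    counts = Counter(d for d in degrees if d > 0)  # log(0) is undefined
    k = np.array(sorted(counts), dtype=float)
    p = np.array([counts[d] for d in sorted(counts)], dtype=float) / sum(counts.values())
    slope, _intercept = np.polyfit(np.log(k), np.log(p), 1)
    return -slope

# Synthetic degrees whose frequencies fall off roughly as k**-2.
degrees = [k for k in range(1, 11) for _ in range(round(1000 * k ** -2))]
print(round(power_law_exponent(degrees), 2))
```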
  • 20. 3. Methodology Centralities ◦ Centralities are topological indices that produce rankings intended to identify the most important nodes in a network model. ◦ Degree ◦ The degree of a node is the number of links (edges) incident on the node. If a network is directed, meaning that edges point in one direction from one node to another, then each node has two different degrees: the in-degree, which is the number of incoming edges, and the out-degree, which is the number of outgoing edges. ◦ Betweenness centrality ◦ determines the relative importance of a node by measuring the amount of traffic flowing through that node to other nodes in the network. This is done by measuring, over all pairs of nodes, the fraction of shortest paths connecting them that pass through the node of interest.
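Both centralities can be sketched in pure Python. For betweenness, the sketch uses Brandes' algorithm on a directed, unweighted graph and returns unnormalized values; whether the study normalized betweenness or ignored edge directions is not stated, so those choices are assumptions.

```python
from collections import deque

def degrees(nodes, edges):
    """In-degree and out-degree of every node of a directed graph."""
    indeg = {n: 0 for n in nodes}
    outdeg = {n: 0 for n in nodes}
    for u, v in edges:
        outdeg[u] += 1
        indeg[v] += 1
    return indeg, outdeg

def betweenness(nodes, edges):
    """Brandes' algorithm: unnormalized betweenness on a directed, unweighted graph."""
    succ = {n: [] for n in nodes}
    for u, v in edges:
        succ[u].append(v)
    bc = dict.fromkeys(nodes, 0.0)
    for s in nodes:
        # BFS from s, counting shortest paths (sigma) and recording predecessors.
        stack, preds = [], {n: [] for n in nodes}
        sigma = dict.fromkeys(nodes, 0)
        sigma[s] = 1
        dist = dict.fromkeys(nodes, -1)
        dist[s] = 0
        queue = deque([s])
        while queue:
            v = queue.popleft()
            stack.append(v)
            for w in succ[v]:
                if dist[w] < 0:
                    dist[w] = dist[v] + 1
                    queue.append(w)
                if dist[w] == dist[v] + 1:
                    sigma[w] += sigma[v]
                    preds[w].append(v)
        # Accumulate pair dependencies in order of decreasing distance from s.
        delta = dict.fromkeys(nodes, 0.0)
        while stack:
            w = stack.pop()
            for v in preds[w]:
                delta[v] += sigma[v] / sigma[w] * (1 + delta[w])
            if w != s:
                bc[w] += delta[w]
    return bc

# Diamond A->B->D, A->C->D: the two A-to-D shortest paths split between B and C.
diamond = [("A", "B"), ("A", "C"), ("B", "D"), ("C", "D")]
print(betweenness(list("ABCD"), diamond))  # {'A': 0.0, 'B': 0.5, 'C': 0.5, 'D': 0.0}
```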
  • 21. 3. Methodology Finding Hubs ◦ We need to find the hubs of a network because they play a significant role in it. ◦ Hubs are the highest-degree nodes of a network and usually have great biological significance. ◦ Degree is a local node metric ◦ Betweenness is a global node metric ◦ So, to find the significant nodes (central proteins), we used a combination of these metrics ◦ How to find hubs in a network: ◦ Draw histograms and cumulative distributions of the node degrees and betweenness values of the network ◦ In the cumulative distribution, find the point where the curve starts flattening ◦ We call this value the minimum hub node degree (or betweenness) value ◦ The nodes with the highest degrees and betweenness values are the most significant in our network
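The slides locate the flattening point by inspecting the cumulative distribution. A simple automatic stand-in is sketched below: after normalizing both axes, it picks the value where the empirical CDF lies farthest above the chord joining its first and last points, and it treats nodes strictly above that value as hubs. This knee heuristic and the strict cut-off are assumptions, not the procedure used in the study.

```python
def knee_threshold(values):
    """Value at which the empirical CDF starts to flatten (max height above the chord)."""
    xs = sorted(set(values))
    if len(xs) < 3:
        return xs[-1]
    n = len(values)
    cdf = [sum(v <= x for v in values) / n for x in xs]
    x_span, f_span = xs[-1] - xs[0], cdf[-1] - cdf[0]
    gaps = [(f - cdf[0]) / f_span - (x - xs[0]) / x_span for x, f in zip(xs, cdf)]
    return xs[gaps.index(max(gaps))]

def hubs(metric):
    """Nodes whose metric value lies strictly above the knee of its CDF."""
    threshold = knee_threshold(list(metric.values()))
    return {node for node, value in metric.items() if value > threshold}

# Hypothetical degrees: many low-degree genes and two high-degree ones.
degree = {f"g{i}": d for i, d in enumerate([1] * 10 + [2] * 6 + [3] * 2 + [10, 12])}
print(knee_threshold(list(degree.values())))  # 3
print(sorted(hubs(degree)))                   # ['g18', 'g19']
```

To combine the metrics, one could for instance keep `hubs(degree) & hubs(betweenness)`; whether the study intersected or merged the two sets is not stated.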
  • 22. 3. Methodology Finding Hubs (Figures: cumulative distributions of in-degree, out-degree and betweenness for the Cancer Union (CU) and Control Union (HU) networks) ◦ CU network: 7 hubs found ◦ HU network: 11 hubs found
  • 23. 3. Methodology Network Comparison ◦ We need to compare our networks to find their degree of similarity. ◦ We compared the networks with a method that estimates the ratio of correctness of one network with respect to another. This measure ranges between 0 and 1, where 0 is the lowest validity and 1 the highest. ◦ This method is based on distance levels between nodes (shortest paths).
  • 24. 3. Methodology Network Comparison ◦ We compare the CU and HU networks. ◦ We obtained four validity values V, one for each distance level. ◦ There are four levels because 4 is the longest shortest path in the networks. ◦ The value of V indicates how correct one network is with respect to the other. ◦ The best similarity is observed at levels 4 and 3, where we get roughly 45-50% similarity. ◦ Higher levels give higher similarity.
    ◦ Validity: CU, HU
    ◦ V1: 0.0992, 0.1339
    ◦ V2: 0.3215, 0.4124
    ◦ V3: 0.4491, 0.5088
    ◦ V4: 0.4591, 0.5138
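The slides do not give the formula for V. As one plausible reading of a distance-level validity, the hypothetical measure sketched below counts the fraction of edges of network A whose endpoints lie within `level` hops of each other in network B (directions ignored). Like the reported values, it ranges from 0 to 1 and can only grow as the level increases; it should not be taken as the method actually used.

```python
from collections import deque

def bfs_distances(adj, source, max_depth):
    """Hop distances from `source`, exploring at most `max_depth` levels."""
    dist = {source: 0}
    queue = deque([source])
    while queue:
        u = queue.popleft()
        if dist[u] == max_depth:
            continue
        for v in adj.get(u, ()):
            if v not in dist:
                dist[v] = dist[u] + 1
                queue.append(v)
    return dist

def validity(edges_a, edges_b, level):
    """Hypothetical V_level: share of A's edges whose endpoints are <= `level` hops apart in B."""
    adj_b = {}
    for u, v in edges_b:
        adj_b.setdefault(u, set()).add(v)
        adj_b.setdefault(v, set()).add(u)
    hits = sum(1 for u, v in edges_a if v in bfs_distances(adj_b, u, level))
    return hits / len(edges_a)

net_a = [("a", "b"), ("a", "c"), ("a", "d")]
net_b = [("a", "b"), ("b", "c")]
print([round(validity(net_a, net_b, k), 3) for k in (1, 2, 3)])  # [0.333, 0.667, 0.667]
```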
  • 25. 3. Methodology Conclusion ◦ So far: ◦ The networks we work with are the Cancer Union (CU) and Control Union (HU) networks ◦ Our networks are scale-free ◦ Each network has hubs ◦ What's next? ◦ We want to find significant modules and molecular complexes in our networks ◦ We apply two clustering algorithms (MCODE and jActiveModules) ◦ We create a difference network and study some centralities of its clusters
  • 26. 3. Methodology Clustering with the MCODE algorithm ◦ MCODE uses a clustering-coefficient-based algorithm to identify molecular complexes in a large protein interaction network derived from heterogeneous experimental sources. ◦ The idea behind this algorithm is that highly interconnected, or dense, regions of the network may represent complexes. ◦ The algorithmic stages are: ◦ Vertex weighting ◦ weights all nodes based on their local network density, using the highest k-core of the vertex neighborhood. ◦ Molecular complex prediction ◦ starting with the highest-weighted node, recursively move outward, adding to the complex the nodes whose weights are above a given threshold.
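The two stages can be sketched on an undirected graph as follows. This is a simplified approximation, not the Cytoscape plugin: it assumes the density of a core is its edge count divided by |V|(|V|-1)/2, uses a vertex weight percentage `vwp` of 0.2 for the seed-relative cut-off, keeps only complexes with at least 3 nodes, and omits MCODE's post-processing options (such as "haircut" and "fluff"). All three parameter choices are assumptions.

```python
from itertools import combinations

def build_adjacency(edges):
    """Undirected adjacency sets from an edge list."""
    adj = {}
    for u, v in edges:
        adj.setdefault(u, set()).add(v)
        adj.setdefault(v, set()).add(u)
    return adj

def k_core(adj, nodes, k):
    """Nodes of the k-core of the subgraph induced by `nodes`."""
    core = set(nodes)
    changed = True
    while changed:
        changed = False
        for v in list(core):
            if len(adj[v] & core) < k:
                core.discard(v)
                changed = True
    return core

def vertex_weight(adj, v):
    """k of the highest k-core of v's closed neighborhood times that core's density."""
    k, core = 1, k_core(adj, adj[v] | {v}, 1)
    while core:
        nxt = k_core(adj, core, k + 1)
        if not nxt:
            break
        k, core = k + 1, nxt
    if len(core) < 2:
        return 0.0
    edges = sum(1 for a, b in combinations(core, 2) if b in adj[a])
    return k * edges / (len(core) * (len(core) - 1) / 2)

def mcode_complexes(adj, vwp=0.2, min_size=3):
    """Grow complexes outward from the heaviest unused seeds."""
    weights = {v: vertex_weight(adj, v) for v in adj}
    seen, complexes = set(), []
    for seed in sorted(adj, key=lambda v: -weights[v]):
        if seed in seen or weights[seed] == 0:
            continue
        cutoff = (1 - vwp) * weights[seed]
        cluster, stack = {seed}, [seed]
        while stack:
            u = stack.pop()
            for w in adj[u]:
                if w not in seen and w not in cluster and weights[w] >= cutoff:
                    cluster.add(w)
                    stack.append(w)
        seen |= cluster
        if len(cluster) >= min_size:
            complexes.append(cluster)
    return complexes

# Toy graph: a 4-clique A-B-C-D with a tail D-E-F.
adj = build_adjacency(list(combinations("ABCD", 2)) + [("D", "E"), ("E", "F")])
print(vertex_weight(adj, "A"))            # 3.0
print(sorted(mcode_complexes(adj)[0]))    # ['A', 'B', 'C', 'D']
```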
  • 27. 3. Methodology Clustering with the MCODE algorithm (Figures: MCODE clusters in the Union Bayesian Network with Control Samples (HU) and the Union Bayesian Network with Cancer Samples (CU))
  • 28. 3. Methodology Clustering with the jActiveModules algorithm ◦ A general method for searching the network to find active subnetworks ◦ This algorithm uses: ◦ a statistical scoring system that captures the amount of gene expression change in a given subnetwork ◦ a search algorithm for identifying the highest-scoring subnetworks ◦ The algorithmic stages are: ◦ Basic z-score calculation ◦ Transform the p-values (significance of differential expression of each gene) to z-scores ◦ Calibrate z against the background distribution ◦ Search for high-scoring subnetworks via simulated annealing
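The first two scoring stages can be sketched with the standard library. The sketch assumes the common formulation in which each gene's p-value becomes $z_i = \Phi^{-1}(1 - p_i)$, a subnetwork of $k$ genes scores $z_A = \sum_i z_i / \sqrt{k}$, and calibration standardizes $z_A$ against random gene sets of the same size. The simulated-annealing search is not shown, and the function names, sample count and toy z-scores are hypothetical.

```python
import random
from math import sqrt
from statistics import NormalDist, mean, stdev

def gene_z(p_value):
    """z_i = Phi^{-1}(1 - p_i); requires 0 < p_value < 1."""
    return NormalDist().inv_cdf(1 - p_value)

def aggregate_z(z_scores):
    """Subnetwork score z_A = sum(z_i) / sqrt(k)."""
    return sum(z_scores) / sqrt(len(z_scores))

def calibrated_score(subnet_genes, z_by_gene, samples=1000, seed=0):
    """s_A = (z_A - mu_k) / sigma_k, with mu_k, sigma_k from random gene sets of size k."""
    rng = random.Random(seed)
    k = len(subnet_genes)
    pool = list(z_by_gene)
    background = [aggregate_z([z_by_gene[g] for g in rng.sample(pool, k)])
                  for _ in range(samples)]
    z_a = aggregate_z([z_by_gene[g] for g in subnet_genes])
    return (z_a - mean(background)) / stdev(background)

print(round(gene_z(0.025), 3))  # 1.96

# Toy data: 50 unchanged genes plus three strongly changed ones.
z_by_gene = {f"g{i}": 0.0 for i in range(50)}
z_by_gene.update({"h0": 3.0, "h1": 3.0, "h2": 3.0})
print(calibrated_score(["h0", "h1", "h2"], z_by_gene) > 0)  # True
```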
  • 29. 3. Methodology Clustering with the jActiveModules algorithm (Figures: active modules in the Union Bayesian Network with Cancer Samples (CU) and the Union Bayesian Network with Control Samples (HU))
  • 30. Biological Results: Evaluation of Interactions of Bayesian Networks ◦ Current knowledge: 12 experimentally verified interactions among the 82 genes ◦ Control Union interactions: FN1-CDKN2A, FN1-KRT16, FN1-COMP, ERBB2-NRG1 and FGFR3-FGF18 ◦ Cancer Union interactions: ACTA1-CDKN2A, FN1-COMP and KRT16-IGHG1 ◦ Some of the known interactions are validated (Control 5/12, Cancer 3/12) ◦ New interactions are provided (Control 600, Cancer 840) that can be experimentally verified ◦ A more compact model is needed to examine the biological significance of the interactions identified in the constructed networks: ◦ Construction of a Differentiating Network from the cancer and control Bayesian Networks ◦ Identification of enriched pathways within MCODE clusters ◦ Construction of Differentiating Sub-Networks (cancer and control)
  • 32. 3. Methodology: Differentiating Network ◦ Significant (p ≤ 0.05) pathways of the Differentiating Network that are associated with breast cancer are provided. ◦ For each MCODE cluster, and for each pathway within an MCODE cluster, the average betweenness centrality and the average degree centrality were computed. ◦ Cancer and control hubs are included in the differentiating network.
    ◦ MCODE clusters (average betweenness centrality; average degree centrality):
      ◦ Cluster 1: 18; 12.5
      ◦ Cluster 2: 23.78; 7.14
      ◦ Cluster 3: 5.7; 6
      ◦ Cluster 4: 2.3; 2.7
    ◦ KEGG & WikiPathways within MCODE clusters (pathway: genes; average betweenness centrality; average degree centrality):
      ◦ Pathway 1_1 Pathways in cancer: ERBB2, FGFR3, CDKN2A, FGF18, EGF; 23.92; 14.2
      ◦ Pathway 1_2 Focal adhesion: ERBB2, COL11A1, COMP, EGF; 23.73; 13.75
      ◦ Pathway 1_3 ErbB signaling pathway: NRG2, ERBB2, EGF; 27.52; 14.66
      ◦ Pathway 1_4 Regulation of actin cytoskeleton: FGFR3, FGF18, EGF; 21.79; 13.33
      ◦ Pathway 1_5 MAPK signaling pathway: FGFR3, FGF18, EGF; 21.79; 13.33
      ◦ Pathway 1_6 EGF-EGFR Signaling Pathway: ERBB2, REPS2, EGF; 27.38; 14.33
      ◦ Pathway 1_7 ECM-receptor interaction: COL11A1, COMP; 20.33; 13
      ◦ Pathway 1_8 Endocytosis: FGFR3, EGF; 26.95; 14
      ◦ Pathway 1_9 Endochondral Ossification: FGFR3, FGF18; 21.52; 13.5
      ◦ Pathway 2_1 Protein digestion and absorption: COL17A1, CPA3; 39.45; 9
      ◦ Pathway 2_2 Chemokine signaling pathway: CCL19, CCL18; 15.35; 6
      ◦ Pathway 2_3 Metabolic pathways: TAT, ATP6V0A4, HSD17B2; 33.48; 8.66
      ◦ Pathway 2_4 Androgen receptor signaling pathway: BRCA1, PARK7; 19.47; 6
  • 33. 3. Methodology Differentiating Network ◦ Create the difference network of CU and HU ◦ Apply the MCODE algorithm to this network to find its clusters ◦ Analyze the centralities of the clusters and their pathways ◦ The centralities of each pathway were obtained by averaging the centralities of all genes enriched in that pathway (Bar charts: average degree and average betweenness centrality of Clusters 1–4 and Pathways 1_1–1_9 and 2_1–2_4)
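The slides do not define how the difference network is built. The sketch below assumes it keeps the edges present in exactly one of the two union networks (the symmetric difference of their edge sets) and then averages a node centrality over the genes of a cluster or pathway, as done for the bar charts. The function names and all numeric values are hypothetical.

```python
from statistics import mean

def difference_network(edges_cancer, edges_control):
    """Edges present in exactly one of the two networks (assumed definition)."""
    return set(edges_cancer) ^ set(edges_control)

def average_centrality(centrality, genes):
    """Average a node centrality over the genes of a cluster or pathway present in the network."""
    values = [centrality[g] for g in genes if g in centrality]
    return mean(values) if values else 0.0

diff = difference_network({("A", "B"), ("B", "C")}, {("B", "C"), ("C", "D")})
print(sorted(diff))  # [('A', 'B'), ('C', 'D')]

# Hypothetical degree values; FGFR3 is absent and therefore ignored.
degree = {"ERBB2": 10, "EGF": 20}
print(average_centrality(degree, ["ERBB2", "EGF", "FGFR3"]))  # 15
```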
  • 34. Biological Results: Construction of Differentiating Sub-Networks, by considering: ◦ significant pathways of the Differentiating Network ◦ 'cancer' and 'healthy' hubs ◦ edges adjacent to 'cancer' or 'healthy' hubs (Figures: Differentiating 'Cancer' Subnetwork; Differentiating 'Healthy' Subnetwork)
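The slide lists three criteria but not how they are combined. As one possible reading, the hypothetical filter below keeps an edge of the differentiating network only if it touches a hub of the given condition and both of its endpoints are hubs or genes from the significant pathways. The function name and toy inputs are illustrative only.

```python
def differentiating_subnetwork(diff_edges, hubs, pathway_genes):
    """Edges adjacent to a hub whose endpoints are all hubs or significant-pathway genes."""
    hubs = set(hubs)
    keep = hubs | set(pathway_genes)
    return {(u, v) for u, v in diff_edges
            if (u in hubs or v in hubs) and u in keep and v in keep}

# Hypothetical differentiating-network edges, hubs and pathway genes.
diff_edges = {("FN1", "EGF"), ("FN1", "X"), ("Y", "Z"), ("EGF", "ERBB2")}
print(differentiating_subnetwork(diff_edges, {"FN1"}, {"EGF", "ERBB2"}))  # {('FN1', 'EGF')}
```

Running the filter once with the cancer hubs and once with the healthy hubs would give the two sub-networks.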
  • 35. 4. Conclusions ◦ The networks built from cancer and control samples are scale-free. ◦ The cancer and control networks contain nodes of biological significance. ◦ There are significant pathways in the network complexes.
  • 36. 4. Conclusions ◦ The differentiating network contains all hub nodes, so hub genes can be considered potential gene markers for breast cancer ◦ FN1, TTYH1 and OGN are the hubs common to the cancer and control differentiating sub-networks ◦ The cancer differentiating sub-network has fewer interactions than the control differentiating sub-network, even though the number of genes is the same ◦ The constructed networks and sub-networks can give insight into the molecular alterations taking place under different conditions (cancer and control) ◦ The differentiating sub-networks might be considered local models for testing biological hypotheses and are more convenient for experimental design