Urszula Czerwinska, Laurence Calzone, Emmanuel Barillot and Andrei Zinovyev
Atelier Grands Graphes et Bioinformatique - EGC 2016
DEDAL CYTOSCAPE 3 APP
FOR PRODUCING AND MORPHING
DATA-DRIVEN AND STRUCTURE-DRIVEN
NETWORK LAYOUTS
U900 Computational Systems Biology for Cancer
OUTLINE
FROM EXTRACTION OF KNOWLEDGE
TO INTELLIGENT LAYOUT
COMBINING MULTIDIMENSIONAL DATA
AND NETWORK STRUCTURE
SUMMARY
DEDAL CYTOSCAPE 3.0 APP.
FROM EXTRACTION OF
KNOWLEDGE TO INTELLIGENT
LAYOUT
EXTRACTION OF KNOWLEDGE
IN BIOLOGY
molecular biology interactions -> networks
knowledge extraction: the creation of knowledge from
structured and unstructured sources;
the resulting knowledge needs to be in a
machine-readable format;
protein 1 - interaction - protein 2
node - edge - node
NETWORKS
Moldovan and D’Andrea 2009
Peri et al., 2004
LAYOUTS
circular
hierarchical
organic
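The organic layout belongs to the force-directed family described in the editor's notes: nodes repel each other, edges act as springs, and the drawing relaxes toward low energy. Below is a minimal sketch of the classic Fruchterman-Reingold spring embedder, assuming an undirected graph given as a dense NumPy adjacency matrix. It only illustrates the paradigm and is not the proprietary yFiles Organic algorithm; the function name and parameters are hypothetical.

```python
import numpy as np


def force_directed_layout(adj, iterations=200, seed=0):
    """Fruchterman-Reingold style spring embedder (minimal sketch).

    adj: symmetric (n, n) 0/1 adjacency matrix of an undirected graph.
    Returns an (n, 2) array of node coordinates.
    """
    adj = np.asarray(adj, dtype=float)
    n = adj.shape[0]
    rng = np.random.default_rng(seed)
    pos = rng.random((n, 2))
    k = np.sqrt(1.0 / n)      # ideal edge length for a unit-area drawing
    temperature = 0.1         # maximum displacement per step, cooled linearly

    for it in range(iterations):
        delta = pos[:, None, :] - pos[None, :, :]      # pairwise difference vectors
        dist = np.maximum(np.linalg.norm(delta, axis=-1), 1e-9)
        # Repulsion k^2/d between all pairs, attraction d^2/k along edges;
        # both act along the unit vector delta/d, hence the extra 1/d factor.
        coeff = k**2 / dist**2 - adj * dist / k
        np.fill_diagonal(coeff, 0.0)                   # no self-interaction
        disp = (delta * coeff[:, :, None]).sum(axis=1)
        length = np.maximum(np.linalg.norm(disp, axis=1, keepdims=True), 1e-9)
        pos += disp / length * np.minimum(length, temperature)
        temperature = 0.1 * (1 - (it + 1) / iterations)
    return pos
```

Because attraction acts only along edges, connected nodes end up closer together than unconnected ones, which is what makes tightly connected modules visible as clusters.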
COMBINING MULTIDIMENSIONAL DATA
AND NETWORK
THERE IS NOT A SINGLE LAYOUT
1. mapping data on top of a pre-defined biological network layout
2. identifying subnetworks of a global network that possess certain properties computed from the data
3. using biological network structure for pre-processing the high-throughput data
T-test statistics
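The T-test statistics mapped onto the networks in these slides are per-gene scores comparing two groups of samples. A minimal sketch of such a score follows, assuming a genes × samples expression matrix and a boolean group label per sample. The slides do not say which t-test variant was used, so Welch's unequal-variance form here is an assumption, and `welch_t` is a hypothetical name.

```python
import numpy as np


def welch_t(expr, group):
    """Per-gene Welch two-sample t-statistic.

    expr: (n_genes, n_samples) expression matrix.
    group: boolean array of length n_samples (True = group A, False = group B).
    Returns n_genes t-statistics (positive = higher mean in group A).
    """
    expr = np.asarray(expr, dtype=float)
    group = np.asarray(group, dtype=bool)
    a, b = expr[:, group], expr[:, ~group]
    mean_diff = a.mean(axis=1) - b.mean(axis=1)
    # Unbiased sample variances (ddof=1), each divided by its group size.
    se = np.sqrt(a.var(axis=1, ddof=1) / a.shape[1] + b.var(axis=1, ddof=1) / b.shape[1])
    return mean_diff / se
```

Each resulting value is attached to the corresponding network node and drives its color.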
1. MAPPING DATA ON TOP OF THE NETWORK
Moldovan and D’Andrea 2009
Peri et al., 2004
TCGA, 2012
Ulitsky and Shamir, 2007
2. IDENTIFYING SUBNETWORKS
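A deliberately naive sketch of the idea behind approach 2 keeps the genes whose score passes a threshold and reports the connected components they form in the interaction network. It is only an illustration: Ulitsky and Shamir (2007) search for jointly active connected subnetworks with a more elaborate method, which also admits "back" nodes that are not active themselves (see the editor's notes). The function name and threshold are hypothetical.

```python
from collections import deque


def active_subnetworks(edges, scores, threshold):
    """Connected components of the subgraph induced by 'active' nodes.

    edges: iterable of (u, v) pairs of an undirected interaction network.
    scores: dict node -> score (e.g. a per-gene t-statistic).
    threshold: a node is active when abs(score) >= threshold.
    Returns a list of sets, one per component with at least two nodes.
    """
    active = {n for n, s in scores.items() if abs(s) >= threshold}
    neighbours = {n: set() for n in active}
    for u, v in edges:
        if u in active and v in active:      # keep only active-active edges
            neighbours[u].add(v)
            neighbours[v].add(u)

    seen, components = set(), []
    for start in active:
        if start in seen:
            continue
        comp, queue = {start}, deque([start])
        seen.add(start)
        while queue:                         # breadth-first search
            node = queue.popleft()
            for nb in neighbours[node]:
                if nb not in seen:
                    seen.add(nb)
                    comp.add(nb)
                    queue.append(nb)
        if len(comp) >= 2:
            components.append(comp)
    return components
```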
3. USING BIOLOGICAL NETWORK STRUCTURE FOR
PRE-PROCESSING THE HIGH THROUGHPUT DATA
Hofree et al., 2013
MULTIDIMENSIONAL SCALING
represent (dis)similarities as distances
dimension reduction,
e.g. Principal Component Analysis
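Classical (Torgerson) multidimensional scaling makes the link between distances and dimension reduction concrete. It double-centres the matrix of squared distances to obtain a Gram matrix and keeps its leading eigenvectors. When the distances are Euclidean, the result coincides, up to sign, with projecting the points on their principal components. The following is a minimal NumPy sketch; the function name is hypothetical.

```python
import numpy as np


def classical_mds(dist, n_components=2):
    """Classical (Torgerson) MDS from an (n, n) matrix of pairwise distances."""
    d2 = np.asarray(dist, dtype=float) ** 2
    n = d2.shape[0]
    j = np.eye(n) - np.ones((n, n)) / n          # centring matrix
    b = -0.5 * j @ d2 @ j                        # Gram matrix of the centred points
    eigval, eigvec = np.linalg.eigh(b)           # eigenvalues in ascending order
    order = np.argsort(eigval)[::-1][:n_components]
    scale = np.sqrt(np.maximum(eigval[order], 0.0))
    return eigvec[:, order] * scale              # (n, n_components) coordinates
```

For points that really live in two dimensions, the returned coordinates reproduce the input distances exactly, up to a rotation or reflection.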
EXAMPLE: GOlorize Cytoscape App.
Garcia et al., 2007
DeDaL Cytoscape 3.0 App
DATA-DRIVEN NETWORK LAYOUTS
& MORPHING
Moldovan and D’Andrea 2009
Peri et al., 2004
TCGA, 2012
CYTOSCAPE: ORGANIC LAYOUT WITH MAPPED EXPRESSION DATA
[Figure: organic layout with mapped T-test statistics; PCA layout (axes PC1, PC2) with mapped T-test statistics]
DEDAL: PRINCIPAL COMPONENT ANALYSIS DRIVEN LAYOUT
Moldovan and D’Andrea 2009
Peri et al., 2004
TCGA, 2012
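In a PCA-driven layout each node (gene) is described by its vector of values across samples, and its 2D position is its projection on the first two principal components (PC1, PC2). A minimal NumPy sketch follows, assuming a matrix whose rows are in the same order as the network nodes. DeDaL's own handling of scaling and missing values is not reproduced, and `pca_layout` is a hypothetical name.

```python
import numpy as np


def pca_layout(data, n_components=2):
    """Node coordinates = projections of the node data vectors on the first PCs.

    data: (n_nodes, n_features) matrix, one row per network node.
    Returns (coords, explained_variance_ratio of the kept components).
    """
    x = np.asarray(data, dtype=float)
    x = x - x.mean(axis=0)                       # centre each feature (sample)
    _, s, vt = np.linalg.svd(x, full_matrices=False)
    coords = x @ vt[:n_components].T             # same as U[:, :k] * s[:k]
    explained = s**2 / np.sum(s**2)
    return coords, explained[:n_components]
```

The explained-variance ratio indicates how faithfully the 2D layout reflects the multidimensional distances between nodes.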
MORPHING DATA-DRIVEN AND STRUCTURE-BASED LAYOUT
o Pure network structure based layout
o Combination of network structure and data layout
o Purely data-driven layout: PCA or Elmap (PMs)
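Morphing can be pictured as a continuous path from one layout to the other. The minimal sketch below assumes two steps. First, the data-driven coordinates are superimposed on the structure-based ones by an orthogonal Procrustes alignment (translation, rotation or reflection, and uniform scaling). Second, the two layouts are linearly interpolated with a parameter t in [0, 1]. This illustrates the idea only and is not necessarily DeDaL's exact scheme; both function names are hypothetical.

```python
import numpy as np


def procrustes_align(source, target):
    """Rotate/reflect, scale and translate `source` (n, 2) to best match `target`."""
    src = np.asarray(source, dtype=float)
    tgt = np.asarray(target, dtype=float)
    src_c = src - src.mean(axis=0)
    tgt_c = tgt - tgt.mean(axis=0)
    u, s, vt = np.linalg.svd(src_c.T @ tgt_c)
    rotation = u @ vt                          # optimal orthogonal map
    scale = s.sum() / np.sum(src_c**2)         # optimal uniform scaling
    return scale * src_c @ rotation + tgt.mean(axis=0)


def morph(structure_layout, data_layout, t):
    """Intermediate layout: t=0 -> structure-based, t=1 -> aligned data-driven."""
    if not 0.0 <= t <= 1.0:
        raise ValueError("t must lie in [0, 1]")
    start = np.asarray(structure_layout, dtype=float)
    end = procrustes_align(data_layout, start)
    return (1.0 - t) * start + t * end
```

Aligning first keeps nodes from swinging across the screen merely because the two layouts differ by an arbitrary rotation or scale.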
MORE ADVANCED FEATURES
OF DEDAL
ADVANCED FEATURES OF DEDAL
Pre-processing of the data:
o Smoothing
o Double centering
o Quality check
Post-processing of the layout:
o Alignment
o Overlapping
o Missing values
o Outliers
Data-driven layout: PCA or nPMs
Morphing
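Double centering, listed among the pre-processing options, is usually defined as removing both the row means and the column means of a matrix and adding back the grand mean, so that every row and every column averages to zero. A minimal sketch follows, assuming that standard definition; the slides do not detail DeDaL's variant.

```python
import numpy as np


def double_center(x):
    """Subtract row and column means and add back the grand mean."""
    x = np.asarray(x, dtype=float)
    row_means = x.mean(axis=1, keepdims=True)   # one mean per gene (row)
    col_means = x.mean(axis=0, keepdims=True)   # one mean per sample (column)
    return x - row_means - col_means + x.mean()
```

For example, `double_center([[1, 2], [3, 8]])` gives `[[1, -1], [-1, 1]]`: the row means (1.5, 5.5), the column means (2, 5) and the grand mean 3.5 are all removed.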
NONLINEAR PRINCIPAL MANIFOLDS
Elastic map algorithms
Principal manifolds approximation
Linear PCA vs nonlinear principal manifolds for
visualisation of breast cancer microarray data:
a) 3D PCA linear manifold
b) ELMap2D
c) PCA2D
Gorban A.N., Zinovyev A. 2010
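The elastic map idea can be illustrated in its simplest, one-dimensional form. A chain of nodes is fitted to the data by alternating two steps: (i) each point is assigned to its nearest node, and (ii) a quadratic energy is minimised. That energy adds the approximation error to a stretching penalty (weight `lam`) and a bending penalty (weight `mu`); minimising it reduces to solving a linear system. This sketch rests on those assumptions. ELMap2D uses a two-dimensional grid with its own parameter choices, which are not reproduced here, and `elastic_curve` is a hypothetical name.

```python
import numpy as np


def elastic_curve(x, n_nodes=10, lam=0.01, mu=0.1, max_iter=100):
    """Fit a 1D elastic principal curve (chain of nodes) to points x (N, d).

    Minimises (1/N) * sum_i ||x_i - y_k(i)||^2
              + lam * sum_j ||y_{j+1} - y_j||^2              (stretching)
              + mu  * sum_j ||y_{j-1} - 2 y_j + y_{j+1}||^2  (bending)
    by alternating nearest-node assignment and a linear solve.
    """
    x = np.asarray(x, dtype=float)
    n = x.shape[0]
    # Initialise the chain along the first principal component.
    mean = x.mean(axis=0)
    _, s, vt = np.linalg.svd(x - mean, full_matrices=False)
    spread = s[0] / np.sqrt(n)
    y = mean + np.linspace(-spread, spread, n_nodes)[:, None] * vt[0]

    # First- and second-difference operators along the chain.
    d1 = np.diff(np.eye(n_nodes), axis=0)        # (K-1, K)
    d2 = np.diff(np.eye(n_nodes), n=2, axis=0)   # (K-2, K)
    penalty = lam * d1.T @ d1 + mu * d2.T @ d2

    labels = None
    for _ in range(max_iter):
        dist = np.linalg.norm(x[:, None, :] - y[None, :, :], axis=-1)
        new_labels = dist.argmin(axis=1)
        if labels is not None and np.array_equal(new_labels, labels):
            break                                # assignments are stable
        labels = new_labels
        counts = np.bincount(labels, minlength=n_nodes) / n
        sums = np.zeros_like(y)
        np.add.at(sums, labels, x)               # per-node sum of assigned points
        # Setting the gradient to zero gives (diag(n_j/N) + penalty) Y = S / N.
        y = np.linalg.solve(np.diag(counts) + penalty, sums / n)
    return y
```

As `lam` and `mu` shrink, the chain follows the data more closely. Larger values make it stiffer and push it toward the straight PCA line.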
NETWORK SMOOTHING
A data sample in the initial space is projected onto the subspace of functions that are smooth on a gene network; the basis vectors of this subspace are eigenvectors of the graph Laplacian.
Rapaport et al., 2007
courtesy of A. Zinovyev
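Following the slide and the editor's notes, smoothing a sample on a gene network amounts to expanding it in the eigenvectors of the graph Laplacian L = D - A and attenuating the high-frequency (large-eigenvalue) components. The sketch below keeps the `n_keep` lowest-frequency components, which projects the sample onto the subspace of smooth functions. Softer attenuation functions, for example exponential decay with the eigenvalue, are equally possible, and the exact filter DeDaL applies is not specified here. Both function names are hypothetical.

```python
import numpy as np


def laplacian(adj):
    """Combinatorial graph Laplacian L = D - A of an undirected graph."""
    adj = np.asarray(adj, dtype=float)
    return np.diag(adj.sum(axis=1)) - adj


def smooth_on_network(sample, adj, n_keep):
    """Project a node-indexed sample onto the n_keep smoothest Laplacian eigenvectors."""
    _, eigvec = np.linalg.eigh(laplacian(adj))          # ascending eigenvalues
    basis = eigvec[:, :n_keep]                          # low-frequency basis
    coeffs = basis.T @ np.asarray(sample, dtype=float)  # spectral coefficients
    return basis @ coeffs                               # projected (smoothed) sample
```

The eigendecomposition depends only on the network, so when many samples are smoothed it can be computed once and reused. For a connected network, `n_keep=1` collapses the sample to its mean, and keeping all components returns it unchanged.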
[Figure: the same network (nodes A, B, C, D, E, F, G, H, I, K) shown twice: smooth distribution, balanced dosage vs. non-smooth distribution, unbalanced dosage]
courtesy of A. Zinovyev
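The smooth/non-smooth contrast can be quantified with the Laplacian quadratic form x^T L x, which equals the sum over edges (i, j) of (x_i - x_j)^2. It is small when interacting nodes carry similar amounts (balanced dosage) and large when they differ. A small sketch on a hypothetical four-node chain:

```python
import numpy as np


def dirichlet_energy(values, edges):
    """Laplacian quadratic form x^T L x = sum over edges of (x_i - x_j)^2."""
    x = np.asarray(values, dtype=float)
    return float(sum((x[i] - x[j]) ** 2 for i, j in edges))


# Hypothetical chain A-B-C-D encoded with integer node indices 0..3.
edges = [(0, 1), (1, 2), (2, 3)]
smooth = [1.0, 1.2, 1.1, 1.3]        # neighbours carry similar amounts
non_smooth = [1.0, 3.0, 0.5, 2.5]    # neighbours differ strongly
print(dirichlet_energy(smooth, edges), dirichlet_energy(non_smooth, edges))
```

Network smoothing lowers exactly this quantity, because it removes the eigenvector components with large Laplacian eigenvalues.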
NETWORK SMOOTHING
BIG NETWORK: 1047 nodes, 1986 edges
p = 10^-23 | p = 10^-5
Pearson coef. = 0.3 | Pearson coef. = 0.1
Czerwinska et al., 2015
SCALABILITY
Czerwinska et al., 2015
[Figure: computation time [sec] (log scale, 10^-2 to 10^6) vs. number of nodes (log scale, 10^2 to 10^4) for network smoothing, network smoothing reusing matrix, principal manifold, and PCA with 10 components]
TUTORIAL
bioinfo-out.curie.fr/projects/dedal/
PUBLICATION
SUMMARY
There is a need to combine networks
and -omics data in biology
Network layout should be adapted
to the analysis
The DeDaL Cytoscape App performs
o different types of data-driven layouts
o morphing between structure-based and
data-driven layouts
o pre-processing of data such as double
centering and network smoothing
THANK YOU
Urszula Czerwinska
urszula.czerwinska@curie.fr
@UlaLaParis

Editor's Notes

  1. Knowledge extraction is the creation of knowledge from structured (relational databases, XML) and unstructured (text, documents, images) sources. The resulting knowledge needs to be in a machine-readable and machine-interpretable format and must represent knowledge in a manner that facilitates inferencing. Although it is methodologically similar to information extraction (NLP) and ETL (data warehouse), the main criterion is that the extraction result goes beyond the creation of structured information or the transformation into a relational schema. It requires either the reuse of existing formal knowledge (reusing identifiers or ontologies) or the generation of a schema based on the source data.
  2. Force-directed graph drawing algorithms are a class of algorithms for drawing graphs in an aesthetically pleasing way. Their purpose is to position the nodes of a graph in two-dimensional or three-dimensional space so that all the edges are of more or less equal length and there are as few crossing edges as possible. They do this by assigning forces among the set of edges and the set of nodes, based on their relative positions, and then using these forces either to simulate the motion of the edges and nodes or to minimize their energy.[1] The yFiles Organic layout (ORL) is a proprietary closed-source implementation of the force-directed placement paradigm, which combines elements from several layout algorithms to facilitate identification of clusters of tightly connected network modules.
  3. Integrating network structure and high-throughput data. The existing layouts are based on network structure; quantitative data has so far been visualized through node coloring.
  4. Toy input example. A toy example of an input problem with two distinct JACSs and with front and back nodes. Both JACSs (circled) are connected in the interaction network and heavy in the similarity graph. Note that the four front nodes in the left JACS form a connected subgraph only after the addition of the back node.
  5. (a) Flowchart of the approach. (b) Example illustrating smoothing of patient somatic mutation profiles over a molecular interaction network. Mutated genes are shown in yellow (patient 1) and blue (patient 2) in the context of a gene interaction network. Following smoothing, the mutational activity of a gene is a continuous value reflected in the intensity of yellow or blue; genes with high scores in both patients appear in green (dashed oval). (c) Clustering mutation profiles using non-negative matrix factorization (NMF) regularized by a network. The input data matrix (F) is decomposed into the product of two matrices: one of subtype prototypes (W) and the other of assignments of each mutation profile to the prototypes (H). The decomposition attempts to minimize the objective function shown, which includes a network influence constraint L on the subtype prototypes. k, predefined number of subtypes. (d) The final tumor subtypes are obtained from the consensus (majority) assignments of each tumor after 1,000 applications of the procedures in b and c to samples of the original data set. A darker blue color in the matrix coincides with higher co-clustering for pairs of patients.
  6. 548 patients
  7. Graph Laplacian and dosage balance in interaction networks. Assumption: interacting molecular entities A and B should be balanced in their concentrations (local amounts in space and time). Examples from molecular biology: 1) A and B form a complex, and A or B alone is toxic; 2) A and B form a functional complex, and production of A or B is expensive; 3) A is a scaffold for B and C, and the complex A:B:C performs a function; 4) A regulates B (catalyzes, titrates, ...); 5) A and B compete for a resource, and this competition is decisive for cell fate. The approach is formally based on the spectral decomposition of the gene expression measurements with respect to the gene network seen as a graph, followed by an attenuation of the high-frequency components of the expression vectors with respect to the topology of the graph.
  8. Using DeDaL for visualizing the network and RNA-Seq expression data of tissue-specific genes. An RNA-Seq dataset for 27 healthy human tissues was used to define a subnetwork of the HPRD PPI database enriched in tissue-specific genes (see the text for explanations). Network smoothing followed by computation of a principal manifold was applied to produce the data-driven network layout (DDL). Patterns of gene expression for two selected tissues (brain and spleen) are shown on top of the constructed DDL; red denotes higher expression and green lower expression. Node sizes are proportional to their connectivity degree in this network. The top left panel shows the Force Directed layout for comparison. The bottom left panel shows the results of a quantitative comparison between the multidimensional distance representation in DeDaL and in the Force Directed layout. The most representative distances between the genes in the initial multidimensional space (see [28] for details) are ranked here from the largest to the smallest values.