Atelier_GGB_v2

Urszula Czerwinska

Urszula Czerwinska, Laurence Calzone, Emmanuel Barillot and Andrei Zinovyev
Atelier Grands Graphes et Bioinformatique - EGC 2016
DEDAL CYTOSCAPE 3 APP
FOR PRODUCING AND MORPHING
DATA-DRIVEN AND STRUCTURE-DRIVEN
NETWORK LAYOUTS
U900 Computational Systems Biology for Cancer

OUTLINE
FROM EXTRACTION OF KNOWLEDGE
TO INTELLIGENT LAYOUT
COMBINING MULTIDIMENSIONAL DATA
AND NETWORK STRUCTURE
DEDAL CYTOSCAPE 3.0 APP
SUMMARY

FROM EXTRACTION OF
KNOWLEDGE TO INTELLIGENT
LAYOUT

EXTRACTION OF KNOWLEDGE
IN BIOLOGY
molecular biology interactions -> networks
Knowledge extraction: the creation of knowledge from
structured and unstructured sources;
the resulting knowledge needs to be in a
machine-readable format.

NETWORKS
protein 1 - interaction - protein 2
node - edge - node

NETWORKS
Moldovan and D’Andrea 2009
Peri et al., 2004

LAYOUTS
circular
hierarchical
organic

COMBINING MULTIDIMENSIONAL DATA
AND NETWORK

THERE IS NOT A SINGLE LAYOUT
1. mapping data on top of a pre-defined
biological network layout
2. identifying subnetworks of a global
network possessing certain properties
computed from the data
3. using biological network structure for
pre-processing the high-throughput data

1. MAPPING DATA ON TOP OF THE NETWORK
[Figure legend: T-test statistics]
Moldovan and D’Andrea 2009
Peri et al., 2004
TCGA, 2012

2. IDENTIFYING SUBNETWORKS
Ulitsky and Shamir, 2007

3. USING BIOLOGICAL NETWORK STRUCTURE FOR
PRE-PROCESSING THE HIGH-THROUGHPUT DATA
Hofree et al., 2013

MULTIDIMENSIONAL SCALING
represent (dis)similarities as distances
dimension reduction
e.g. Principal Component Analysis
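
The idea of representing dissimilarities as distances can be written compactly. The block below is a sketch of the standard stress criterion and of classical (Torgerson) scaling; the symbols d_ij, y_i, J, B, V and Lambda are notation assumed here for illustration and do not come from the slides.

```latex
% Stress: how badly low-dimensional points y_i reproduce the dissimilarities d_ij
\[
\sigma(y_1,\dots,y_n) \;=\; \sum_{i<j} \bigl(d_{ij} - \lVert y_i - y_j \rVert\bigr)^2
\]
% Classical MDS: double-center the squared dissimilarities, then eigendecompose
\[
B \;=\; -\tfrac{1}{2}\, J\, D^{(2)} J,
\qquad J = I - \tfrac{1}{n}\mathbf{1}\mathbf{1}^{\top},
\qquad D^{(2)}_{ij} = d_{ij}^{2}
\]
\[
B = V \Lambda V^{\top},
\qquad Y = V_k\, \Lambda_k^{1/2}
\quad\text{(rows of } Y \text{ are the } k\text{-dimensional coordinates)}
\]
```

When the d_ij are Euclidean distances between the rows of a data matrix, the coordinates Y coincide with the PCA scores up to the sign of each axis, which is why PCA appears on the slide as an example of this kind of dimension reduction.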

EXAMPLE: GOlorize Cytoscape App.
Garcia et al., 2007

DeDaL Cytoscape 3.0 App
DATA DRIVEN NETWORK LAYOUTS
& MORPHING

CYTOSCAPE: ORGANIC LAYOUT WITH MAPPED EXPRESSION DATA
[Figure legend: T-test statistics]
Moldovan and D’Andrea 2009
Peri et al., 2004
TCGA, 2012

DEDAL: PRINCIPAL COMPONENT ANALYSIS DRIVEN LAYOUT
[Figure: PCA layout; axes PC1 and PC2; legend: T-test statistics]
Moldovan and D’Andrea 2009
Peri et al., 2004
TCGA, 2012
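
A PCA-driven layout can be sketched in a few lines: each node carries a vector of values across samples, and its position is given by its scores on PC1 and PC2. The code below is a minimal pure-Python illustration under that assumption, not DeDaL's implementation; the function names, the power-iteration solver and the toy gene values are all hypothetical.

```python
import math
import random


def center_columns(rows):
    """Subtract each column (sample) mean so the node cloud is centered."""
    n, m = len(rows), len(rows[0])
    means = [sum(r[j] for r in rows) / n for j in range(m)]
    return [[r[j] - means[j] for j in range(m)] for r in rows]


def covariance(rows):
    """Sample covariance matrix of already centered rows."""
    n, m = len(rows), len(rows[0])
    return [[sum(r[i] * r[j] for r in rows) / (n - 1) for j in range(m)]
            for i in range(m)]


def mat_vec(matrix, v):
    return [sum(a * b for a, b in zip(row, v)) for row in matrix]


def top_eigenvector(matrix, iterations=500, seed=0):
    """Power iteration for the dominant eigenvector of a symmetric PSD matrix."""
    rng = random.Random(seed)
    v = [rng.random() + 0.1 for _ in matrix]
    for _ in range(iterations):
        w = mat_vec(matrix, v)
        norm = math.sqrt(sum(x * x for x in w))
        if norm < 1e-12:
            break
        v = [x / norm for x in w]
    return v


def pca_layout(values):
    """Map {node: [value per sample]} to {node: (PC1 score, PC2 score)}."""
    nodes = sorted(values)
    x = center_columns([values[n] for n in nodes])
    cov = covariance(x)
    pc1 = top_eigenvector(cov)
    lam1 = sum(a * b for a, b in zip(pc1, mat_vec(cov, pc1)))
    m = len(pc1)
    # Deflation: remove the PC1 direction, then find the next component.
    deflated = [[cov[i][j] - lam1 * pc1[i] * pc1[j] for j in range(m)]
                for i in range(m)]
    pc2 = top_eigenvector(deflated, seed=1)
    return {n: (sum(a * b for a, b in zip(row, pc1)),
                sum(a * b for a, b in zip(row, pc2)))
            for n, row in zip(nodes, x)}


# Hypothetical toy data: four genes measured in three samples.
expression = {
    "geneA": [1.0, 2.0, 3.0],
    "geneB": [2.0, 4.1, 5.9],
    "geneC": [-1.0, -2.0, -3.2],
    "geneD": [0.5, -0.5, 0.3],
}
layout = pca_layout(expression)
```

The resulting coordinates could then be assigned to node positions in any network viewer; the sign of each axis is arbitrary, as in any PCA.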

MORPHING DATA-DRIVEN AND STRUCTURE-BASED LAYOUT
[Figure label: PMS]
Pure network structure based layout
Purely data-driven layout: PCA or ElMap
Combination of network structure and data layout
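
One simple way to combine a structure-based and a data-driven layout is to move every node along a straight line between its two positions. The sketch below assumes that both layouts are already aligned and expressed as dictionaries of 2D coordinates; the function name and the parameter t are illustrative and are not taken from DeDaL.

```python
def morph_layout(structure_layout, data_layout, t):
    """Linear interpolation between two layouts.

    t = 0.0 gives the pure structure-based layout,
    t = 1.0 gives the purely data-driven layout,
    values in between give a combination of the two.
    """
    if not 0.0 <= t <= 1.0:
        raise ValueError("t must lie in [0, 1]")
    shared = structure_layout.keys() & data_layout.keys()
    return {
        node: tuple((1.0 - t) * s + t * d
                    for s, d in zip(structure_layout[node], data_layout[node]))
        for node in shared
    }


# Hypothetical coordinates for two nodes in each layout.
structure = {"geneA": (0.0, 0.0), "geneB": (4.0, 2.0)}
data_driven = {"geneA": (2.0, 4.0), "geneB": (0.0, 2.0)}
halfway = morph_layout(structure, data_driven, 0.5)
```

Sweeping t from 0 to 1 produces a sequence of intermediate layouts, which is what makes the transition between the two views easy to follow by eye.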

MORPHING DATA-DRIVEN AND STRUCTURE-BASED LAYOUT
[Figure legend: T-test statistics]

ADVANCED FEATURES OF DEDAL
Pre-processing of the data:
o Smoothing
o Double centering
o Quality check
Post-processing of the layout:
o Alignment
o Overlapping
o Missing values
o Outliers
Data-driven layout: PCA or nPMs
Morphing
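
Double centering, listed among the pre-processing options above, is commonly defined as removing both row means and column means from a data matrix. The sketch below follows that common definition; whether DeDaL applies exactly this formula is an assumption, and the function name and toy matrix are illustrative.

```python
def double_center(matrix):
    """Return a copy of the matrix with zero row means and zero column means.

    Each entry becomes x_ij - row_mean_i - col_mean_j + grand_mean.
    """
    n_rows, n_cols = len(matrix), len(matrix[0])
    row_means = [sum(row) / n_cols for row in matrix]
    col_means = [sum(row[j] for row in matrix) / n_rows for j in range(n_cols)]
    grand_mean = sum(row_means) / n_rows
    return [
        [matrix[i][j] - row_means[i] - col_means[j] + grand_mean
         for j in range(n_cols)]
        for i in range(n_rows)
    ]


# Hypothetical genes-by-samples matrix.
data = [[1.0, 2.0, 6.0],
        [4.0, 5.0, 9.0],
        [7.0, 8.0, 3.0]]
centered = double_center(data)
```

After this step, neither a gene with a uniformly high level nor a sample with a global offset dominates the first principal component.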

NONLINEAR PRINCIPAL MANIFOLDS
Elastic map algorithms
Principal manifolds approximation
Linear PCA vs nonlinear Principal Manifolds for
visualisation of breast cancer microarray data
a) 3D PCA linear manifold
b) ELMap2D
c) PCA2D
Gorban A.N., Zinovyev A. 2010

NETWORK SMOOTHING
a data sample in the initial space is projected onto
the subspace of functions that are smooth
on a gene network
basis vectors are eigenvectors of the
graph Laplacian
Rapaport et al., 2007
courtesy of A. Zinovyev
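
The effect of network smoothing can be illustrated without computing eigenvectors explicitly. The sketch below swaps in a diffusion-style approximation: repeated steps x <- x - step * L x shrink the component along each Laplacian eigenvector by a factor (1 - step * lambda), so high-frequency components fade fastest. This is not the explicit projection shown on the slide, and the function names, step size and toy network are assumptions.

```python
def laplacian_apply(adjacency, x):
    """Return L x for the graph Laplacian L = D - A of an undirected graph.

    adjacency maps each node to the set of its neighbours (symmetric).
    """
    return {u: len(nbrs) * x[u] - sum(x[v] for v in nbrs)
            for u, nbrs in adjacency.items()}


def smooth_on_network(adjacency, x, step=0.1, n_steps=10):
    """Attenuate high-frequency components of x by explicit diffusion steps.

    step should stay below 1 / (2 * maximum degree) so that every
    component is damped rather than amplified.
    """
    for _ in range(n_steps):
        lx = laplacian_apply(adjacency, x)
        x = {u: x[u] - step * lx[u] for u in x}
    return x


def roughness(adjacency, x):
    """x^T L x, i.e. the sum of squared differences over all edges."""
    return sum((x[u] - x[v]) ** 2
               for u in adjacency for v in adjacency[u]) / 2.0


# Hypothetical path network A - B - C - D with an alternating (non-smooth) signal.
network = {"A": {"B"}, "B": {"A", "C"}, "C": {"B", "D"}, "D": {"C"}}
signal = {"A": 1.0, "B": -1.0, "C": 1.0, "D": -1.0}
smoothed = smooth_on_network(network, signal)
```

The total of the values is preserved because the constant vector lies in the null space of the Laplacian, while differences between neighbouring nodes shrink with every step.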

NETWORK SMOOTHING
[Figure: two copies of a network with nodes A, B, C, D, E, F, G, H, I, K]
Smooth distribution,
balanced dosage
Non-smooth distribution,
unbalanced dosage
courtesy of A. Zinovyev

BIG NETWORK
Czerwinska et al., 2015
1047 nodes
1986 edges
[Figure panels: Pearson coef. = 0.3 (p = 10^-23); Pearson coef. = 0.1 (p = 10^-5)]

SCALABILITY
Czerwinska et al., 2015
[Plot: computation time [sec] (10^-2 to 10^6) vs. number of nodes (10^2 to 10^4)]
Curves: network smoothing; network smoothing, reusing matrix;
principal manifold; PCA 10 components

TUTORIAL
bioinfo-out.curie.fr/projects/dedal/

SUMMARY
There is a need to combine networks
and -omics data in biology
Network layout should be adapted
to the analysis
The DeDaL Cytoscape app performs
o different types of data-driven layouts
o morphing between structure-based and
data-driven layouts
o pre-processing of data, such as double
centering and network smoothing

THANK YOU
Urszula Czerwinska
urszula.czerwinska@curie.fr
@UlaLaParis

Editor's Notes

Knowledge extraction is the creation of knowledge from structured (relational databases, XML) and unstructured (text, documents, images) sources. The resulting knowledge needs to be in a machine-readable and machine-interpretable format and must represent knowledge in a manner that facilitates inferencing. Although it is methodically similar to information extraction (NLP) and ETL (data warehouse), the main criterion is that the extraction result goes beyond the creation of structured information or the transformation into a relational schema. It requires either the reuse of existing formal knowledge (reusing identifiers or ontologies) or the generation of a schema based on the source data.
Force-directed graph drawing algorithms are a class of algorithms for drawing graphs in an aesthetically pleasing way. Their purpose is to position the nodes of a graph in two-dimensional or three-dimensional space so that all the edges are of more or less equal length and there are as few crossing edges as possible. They do this by assigning forces among the set of edges and the set of nodes, based on their relative positions, and then using these forces either to simulate the motion of the edges and nodes or to minimize their energy. The yFiles Organic layout (ORL) is a proprietary closed-source implementation of the force-directed placement paradigm, which combines elements from several layout algorithms to facilitate identification of clusters of tightly connected network modules.
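
The force-directed principle described in this note can be sketched as follows. This is a simplified spring-embedder in the spirit of the Fruchterman-Reingold scheme, written for illustration only; it is not the yFiles Organic layout, and the constants, function name and toy graph are assumptions.

```python
import math
import random


def force_directed_layout(nodes, edges, iterations=200, seed=0):
    """Place nodes in 2D: all pairs repel, connected pairs attract."""
    rng = random.Random(seed)
    pos = {n: [rng.uniform(-1.0, 1.0), rng.uniform(-1.0, 1.0)] for n in nodes}
    k = 1.0 / math.sqrt(len(nodes))   # preferred edge length
    temperature = 0.1                 # maximum displacement per iteration
    for _ in range(iterations):
        disp = {n: [0.0, 0.0] for n in nodes}
        # Repulsive force between every pair of nodes.
        for i, u in enumerate(nodes):
            for v in nodes[i + 1:]:
                dx = pos[u][0] - pos[v][0]
                dy = pos[u][1] - pos[v][1]
                dist = max(math.hypot(dx, dy), 1e-9)
                force = k * k / dist
                disp[u][0] += dx / dist * force
                disp[u][1] += dy / dist * force
                disp[v][0] -= dx / dist * force
                disp[v][1] -= dy / dist * force
        # Attractive force along every edge.
        for u, v in edges:
            dx = pos[u][0] - pos[v][0]
            dy = pos[u][1] - pos[v][1]
            dist = max(math.hypot(dx, dy), 1e-9)
            force = dist * dist / k
            disp[u][0] -= dx / dist * force
            disp[u][1] -= dy / dist * force
            disp[v][0] += dx / dist * force
            disp[v][1] += dy / dist * force
        # Move each node, capped by the current temperature, then cool down.
        for n in nodes:
            length = max(math.hypot(disp[n][0], disp[n][1]), 1e-9)
            step = min(length, temperature)
            pos[n][0] += disp[n][0] / length * step
            pos[n][1] += disp[n][1] / length * step
        temperature *= 0.98
    return {n: (p[0], p[1]) for n, p in pos.items()}


# Hypothetical four-node network.
node_list = ["A", "B", "C", "D"]
edge_list = [("A", "B"), ("B", "C"), ("C", "D")]
positions = force_directed_layout(node_list, edge_list)
```

Note that only the network structure enters this computation; no measured data influences the positions, which is the limitation the data-driven layouts in this talk address.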
Integrating network structure and high-throughput data: the existing layouts are based on network structure. Quantitative data has so far been visualized with node coloring.
Toy input example. A toy example of an input problem with two distinct JACSs and with front and back nodes. Both JACSs (circled) are connected in the interaction network and heavy in the similarity graph. Note that the four front nodes in the left JACS form a connected subgraph only after the addition of the back node.
(a) Flowchart of the approach. (b) Example illustrating smoothing of patient somatic mutation profiles over a molecular interaction network. Mutated genes are shown in yellow (patient 1) and blue (patient 2) in the context of a gene interaction network. Following smoothing, the mutational activity of a gene is a continuous value reflected in the intensity of yellow or blue; genes with high scores in both patients appear in green (dashed oval). (c) Clustering mutation profiles using non-negative matrix factorization (NMF) regularized by a network. The input data matrix (F) is decomposed into the product of two matrices: one of subtype prototypes (W) and the other of assignments of each mutation profile to the prototypes (H). The decomposition attempts to minimize the objective function shown, which includes a network influence constraint L on the subtype prototypes. k, predefined number of subtypes. (d) The final tumor subtypes are obtained from the consensus (majority) assignments of each tumor after 1,000 applications of the procedures in b and c to samples of the original data set. A darker blue color in the matrix coincides with higher co-clustering for pairs of patients.
548 patients
Graph Laplacian and dosage balance in interaction networks
Assumption: interacting molecular entities A and B should be balanced in their concentrations (local amounts in space and time).
Examples from molecular biology:
1) A and B form a complex; A or B alone is toxic
2) A and B form a functional complex; production of A or B is expensive
3) A is a scaffold for B and C; the complex A:B:C performs a function
4) A regulates B (catalyzes, titrates, ..)
5) A and B compete for a resource; this competition is decisive for cell fate
The approach is formally based on the spectral decomposition of the gene expression measurements with respect to the gene network seen as a graph, followed by an attenuation of the high-frequency components of the expression vectors with respect to the topology of the graph.
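
The spectral view in this note can be written out as a short derivation. The notation below (A for the adjacency matrix, D for the degree matrix, x for an expression vector, phi for the attenuation function) is assumed for illustration, for an unweighted undirected network; the two example filters are common choices and are not necessarily the ones used in the cited work.

```latex
% Graph Laplacian and its spectral decomposition
\[
L = D - A,
\qquad
L = \sum_{i=1}^{n} \lambda_i\, u_i u_i^{\top},
\qquad
0 = \lambda_1 \le \lambda_2 \le \dots \le \lambda_n
\]
% Dosage imbalance of x over the edge set E, expressed in the eigenbasis
\[
x^{\top} L\, x
\;=\; \sum_{(a,b) \in E} (x_a - x_b)^2
\;=\; \sum_{i=1}^{n} \lambda_i\, \hat{x}_i^{\,2},
\qquad
\hat{x}_i = u_i^{\top} x
\]
% Smoothing: attenuate components with large eigenvalue (high frequency)
\[
S_{\phi}(x) \;=\; \sum_{i=1}^{n} \phi(\lambda_i)\, \hat{x}_i\, u_i,
\qquad
\phi \text{ non-increasing, e.g. }
\phi(\lambda) = e^{-\beta \lambda}
\ \text{ or } \
\phi(\lambda_i) = \mathbf{1}[\, i \le k \,]
\]
```

The middle identity makes the link to dosage balance explicit: a vector with similar values on interacting entities has a small quadratic form, so most of its weight sits on eigenvectors with small eigenvalues, which are exactly the components the filter keeps.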
Using DeDaL for visualizing the network and RNA-Seq expression data of tissue-specific genes. An RNA-Seq dataset for 27 healthy human tissues was used to define a subnetwork of the HPRD PPI database enriched in tissue-specific genes (see the text for explanations). Network smoothing followed by computation of a principal manifold was applied to produce the data-driven network layout (DDL). Patterns of gene expression for two selected tissues (brain and spleen) are shown on top of the constructed DDL; red denotes higher expression and green denotes lower expression. The sizes of the nodes are proportional to their connectivity degree in this network. On the top left panel, the Force Directed layout is shown for comparison. On the bottom left panel, the results of a quantitative comparison between the multidimensional distance representation in DeDaL and in the Force Directed layout are shown. The most representative distances between the genes in the initial multidimensional space (see [28] for details) are ranked here from the largest to the smallest values.
