Clustered graph, visualization, hierarchical visualization


February 21st, 2012 Dagstuhl seminar "Information visualization, visual data mining and machine learning", Schloss Dagstuhl, Germany

Nathalie Villa-Vialaneix
http://www.nathalievilla.org
SAMM (Université Paris 1)
2012/02/21 - Dagstuhl seminar 12081
Dagstuhl Seminar 12081 (2012/02/21) Graph visualization & clustering Nathalie Villa-Vialaneix 1 / 4

Full vs simplified visualization
Framework: static graph visualization.
Standard approach (force-directed placement, FDP): visualize the whole graph.
It aims at being aesthetic ⇒ it tends to place the hubs in the center of the
figure (edges of uniform length) and does not emphasize dense groups.
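The standard approach can be illustrated with a minimal force-directed layout. This sketch assumes that "FDP" means force-directed placement in the style of Fruchterman and Reingold: every pair of nodes repels with force k²/d, every edge pulls its endpoints together with force d²/k (k being the ideal edge length), and a decreasing "temperature" caps each move. The function name, the unit-square frame and the parameter values are illustrative assumptions, not taken from the slides.

```python
import math
import random


def force_directed_layout(adj, iterations=200, seed=0):
    """Fruchterman-Reingold-style layout in the unit square (illustrative sketch).

    adj: dict mapping each node to an iterable of its neighbours (undirected).
    Returns a dict node -> (x, y) with coordinates in [0, 1].
    """
    rng = random.Random(seed)
    nodes = list(adj)
    k = 1.0 / math.sqrt(len(nodes))  # ideal edge length for a unit-area frame
    pos = {u: [rng.random(), rng.random()] for u in nodes}
    edges = {frozenset((u, v)) for u in adj for v in adj[u] if u != v}
    t0 = 0.1  # initial temperature: largest displacement allowed per iteration
    for it in range(iterations):
        disp = {u: [0.0, 0.0] for u in nodes}
        # Repulsion k^2 / d between every pair of nodes (O(n^2) per iteration).
        for i, u in enumerate(nodes):
            for v in nodes[i + 1:]:
                dx, dy = pos[u][0] - pos[v][0], pos[u][1] - pos[v][1]
                d = max(math.hypot(dx, dy), 1e-9)
                f = k * k / d
                disp[u][0] += dx / d * f
                disp[u][1] += dy / d * f
                disp[v][0] -= dx / d * f
                disp[v][1] -= dy / d * f
        # Attraction d^2 / k along every edge.
        for edge in edges:
            u, v = tuple(edge)
            dx, dy = pos[u][0] - pos[v][0], pos[u][1] - pos[v][1]
            d = max(math.hypot(dx, dy), 1e-9)
            f = d * d / k
            disp[u][0] -= dx / d * f
            disp[u][1] -= dy / d * f
            disp[v][0] += dx / d * f
            disp[v][1] += dy / d * f
        # Move each node by at most the current temperature, then keep it in the frame.
        t = t0 * (1 - it / iterations)  # linear cooling
        for u in nodes:
            dx, dy = disp[u]
            d = math.hypot(dx, dy)
            if d > 0:
                step = min(d, t)
                pos[u][0] = min(1.0, max(0.0, pos[u][0] + dx / d * step))
                pos[u][1] = min(1.0, max(0.0, pos[u][1] + dy / d * step))
    return {u: (x, y) for u, (x, y) in pos.items()}


# A star: hub 0 linked to six leaves.
star = {0: [1, 2, 3, 4, 5, 6], **{leaf: [0] for leaf in range(1, 7)}}
layout = force_directed_layout(star)
```

On the star graph, the hub typically ends up surrounded by its leaves. The all-pairs repulsion costs O(n²) per iteration, one practical reason why drawing the whole graph scales poorly.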
Simplified approach: find communities, represent each one by a glyph,
and investigate their sub-structure with a hierarchical clustering.
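A simplified view can be obtained by collapsing each community into one glyph and aggregating the edges: glyph size can encode the number of nodes, glyph shading the internal edge density, and the links between glyphs the number of edges between communities. The sketch assumes the communities are already known (a node → label dictionary); `simplify_graph` and `glyph_density` are hypothetical helper names.

```python
from collections import Counter


def simplify_graph(edges, membership):
    """Collapse each community into a single glyph (illustrative sketch).

    edges: iterable of (u, v) pairs of an undirected simple graph.
    membership: dict node -> community label (labels must be sortable).
    Returns (sizes, internal, between):
      sizes[c]         number of nodes in community c (glyph size),
      internal[c]      number of edges inside c,
      between[(c, d)]  number of edges between c and d, c < d (glyph link weight).
    """
    sizes = Counter(membership.values())
    internal = Counter()
    between = Counter()
    for u, v in edges:
        cu, cv = membership[u], membership[v]
        if cu == cv:
            internal[cu] += 1
        else:
            between[tuple(sorted((cu, cv)))] += 1
    return sizes, internal, between


def glyph_density(sizes, internal):
    """Edge density inside each community, e.g. to shade its glyph."""
    return {c: (internal[c] / (n * (n - 1) / 2) if n > 1 else 0.0) for c, n in sizes.items()}


# Two triangles joined by a single edge, split into the two triangles.
edges = [(0, 1), (1, 2), (0, 2), (3, 4), (4, 5), (3, 5), (2, 3)]
membership = {0: "A", 1: "A", 2: "A", 3: "B", 4: "B", 5: "B"}
sizes, internal, between = simplify_graph(edges, membership)
```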

Basic description
1. Search for communities: node clustering (e.g., modularity optimization)
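For reference, the quantity maximized in modularity optimization is, for an undirected graph with m edges, Q = Σ_c [L_c / m − (d_c / 2m)²], where L_c is the number of edges inside community c and d_c the sum of the degrees of its nodes. The minimal sketch below only scores a given partition (the optimization itself is usually done with a heuristic such as the Louvain method); the adjacency-dictionary input format is an assumption.

```python
def modularity(adj, membership):
    """Newman-Girvan modularity Q = sum_c [L_c / m - (d_c / 2m)^2].

    adj: dict node -> list of neighbours (undirected, no self-loops,
         each edge listed in both directions).
    membership: dict node -> community label.
    """
    two_m = sum(len(nbrs) for nbrs in adj.values())
    if two_m == 0:
        return 0.0
    inner_ends = {}  # 2 * L_c: edge endpoints whose edge stays inside c
    degree_sum = {}  # d_c: total degree of the nodes of c
    for u, nbrs in adj.items():
        c = membership[u]
        degree_sum[c] = degree_sum.get(c, 0) + len(nbrs)
        inner_ends[c] = inner_ends.get(c, 0) + sum(1 for v in nbrs if membership[v] == c)
    return sum(inner_ends[c] / two_m - (degree_sum[c] / two_m) ** 2 for c in degree_sum)


# Two triangles joined by one edge: m = 7, Q = 2 * (3/7 - (7/14)^2) = 5/14.
adj = {0: [1, 2], 1: [0, 2], 2: [0, 1, 3], 3: [2, 4, 5], 4: [3, 5], 5: [3, 4]}
q = modularity(adj, {0: "A", 1: "A", 2: "A", 3: "B", 4: "B", 5: "B"})
```

Putting every node in the same community always gives Q = 0, which is the baseline a meaningful partition has to beat.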
Is the clustering relevant / significant?
Possible answer: generate N random graphs with the same degree
distribution as the observed graph and compare the observed optimal
modularity with the distribution of the optimal modularity over the N random graphs
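The randomization test above can be sketched as follows, under assumptions that are not in the slides: the "optimal" modularity is approximated by a greedy heuristic (nodes are moved one at a time to the neighbouring community with the largest modularity gain, i.e. only the first phase of the Louvain method), random graphs with the same degree sequence are drawn by repeated double-edge swaps that forbid self-loops and multiple edges, and the empirical p-value is (1 + #{random Q ≥ observed Q}) / (N + 1). Function names and default values are hypothetical.

```python
import random


def modularity(adj, comm):
    """Q = sum_c [L_c / m - (d_c / 2m)^2] for an undirected simple graph."""
    two_m = sum(len(nbrs) for nbrs in adj.values())
    inner, tot = {}, {}
    for u, nbrs in adj.items():
        c = comm[u]
        tot[c] = tot.get(c, 0) + len(nbrs)
        inner[c] = inner.get(c, 0) + sum(comm[v] == c for v in nbrs)
    return sum(inner[c] / two_m - (tot[c] / two_m) ** 2 for c in tot)


def greedy_modularity(adj, seed=0):
    """Local moving heuristic; returns (partition, Q). Q is only a lower bound
    on the optimal modularity."""
    rng = random.Random(seed)
    two_m = sum(len(nbrs) for nbrs in adj.values())
    deg = {u: len(nbrs) for u, nbrs in adj.items()}
    comm = {u: u for u in adj}  # start from singletons
    tot = dict(deg)             # total degree of each community
    moved = True
    while moved:
        moved = False
        order = list(adj)
        rng.shuffle(order)
        for u in order:
            old = comm[u]
            links = {}          # edges from u to each neighbouring community
            for v in adj[u]:
                links[comm[v]] = links.get(comm[v], 0) + 1
            tot[old] -= deg[u]  # take u out of its community
            best = old
            best_gain = links.get(old, 0) - tot[old] * deg[u] / two_m
            for c, k_uc in links.items():
                gain = k_uc - tot[c] * deg[u] / two_m
                if gain > best_gain + 1e-12:
                    best, best_gain = c, gain
            comm[u] = best
            tot[best] += deg[u]
            moved = moved or best != old
    return comm, modularity(adj, comm)


def degree_preserving_shuffle(adj, n_swaps, rng):
    """Random simple graph with the same degrees, via double-edge swaps
    (nodes are assumed sortable so that each edge is listed once)."""
    edges = [(u, v) for u in adj for v in adj[u] if u < v]
    present = {frozenset(e) for e in edges}
    for _ in range(n_swaps):
        i, j = rng.sample(range(len(edges)), 2)
        a, b = edges[i]
        c, d = edges[j]
        if rng.random() < 0.5:
            c, d = d, c
        # Replace a-b and c-d by a-d and c-b: every endpoint keeps its degree.
        if len({a, b, c, d}) < 4 or {frozenset((a, d)), frozenset((c, b))} & present:
            continue  # would create a self-loop or a duplicate edge
        present -= {frozenset((a, b)), frozenset((c, d))}
        present |= {frozenset((a, d)), frozenset((c, b))}
        edges[i], edges[j] = (a, d), (c, b)
    shuffled = {u: [] for u in adj}
    for u, v in edges:
        shuffled[u].append(v)
        shuffled[v].append(u)
    return shuffled


def modularity_test(adj, n_random=19, n_swaps=1000, seed=0):
    """Observed greedy modularity vs. N degree-preserving random graphs."""
    rng = random.Random(seed)
    _, q_obs = greedy_modularity(adj, seed)
    q_random = [greedy_modularity(degree_preserving_shuffle(adj, n_swaps, rng), seed)[1]
                for _ in range(n_random)]
    p_value = (1 + sum(q >= q_obs for q in q_random)) / (n_random + 1)
    return q_obs, q_random, p_value


# Example: two 5-cliques joined by a single edge (4-5).
adj = {u: [v for v in range(5) if v != u] for u in range(5)}
adj.update({u: [v for v in range(5, 10) if v != u] for u in range(5, 10)})
adj[4].append(5)
adj[5].append(4)
q_obs, q_random, p_value = modularity_test(adj)
```

With N = 19 the smallest attainable p-value is 1/20 = 0.05; a larger N gives a finer test at a proportional computational cost.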
2. Iterate the clustering within each class in a hierarchical way.
When should the process stop? Is the clustering relevant / significant?
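A sketch of the hierarchical iteration, using a greedy node-moving heuristic (the first phase of the Louvain method) as a stand-in for the clustering step. The stopping rule is an assumption chosen for illustration: a class is not split further when it has fewer than `min_size` nodes, when the heuristic returns a single community, or when the modularity of its best split is below `min_q`. The randomization test described for step 1 could replace the fixed `min_q` threshold.

```python
import random


def modularity(adj, comm):
    """Q = sum_c [L_c / m - (d_c / 2m)^2] for an undirected simple graph."""
    two_m = sum(len(nbrs) for nbrs in adj.values())
    inner, tot = {}, {}
    for u, nbrs in adj.items():
        c = comm[u]
        tot[c] = tot.get(c, 0) + len(nbrs)
        inner[c] = inner.get(c, 0) + sum(comm[v] == c for v in nbrs)
    return sum(inner[c] / two_m - (tot[c] / two_m) ** 2 for c in tot)


def greedy_modularity(adj, seed=0):
    """Local moving heuristic; returns (partition, Q)."""
    rng = random.Random(seed)
    two_m = sum(len(nbrs) for nbrs in adj.values())
    deg = {u: len(nbrs) for u, nbrs in adj.items()}
    comm, tot = {u: u for u in adj}, dict(deg)
    moved = True
    while moved:
        moved = False
        order = list(adj)
        rng.shuffle(order)
        for u in order:
            old = comm[u]
            links = {}
            for v in adj[u]:
                links[comm[v]] = links.get(comm[v], 0) + 1
            tot[old] -= deg[u]
            best, best_gain = old, links.get(old, 0) - tot[old] * deg[u] / two_m
            for c, k_uc in links.items():
                gain = k_uc - tot[c] * deg[u] / two_m
                if gain > best_gain + 1e-12:
                    best, best_gain = c, gain
            comm[u] = best
            tot[best] += deg[u]
            moved = moved or best != old
    return comm, modularity(adj, comm)


def hierarchical_communities(adj, nodes=None, min_size=4, min_q=0.3, seed=0):
    """Re-cluster every community inside itself until the stopping rule fires.

    Returns a tree of dicts {"nodes": sorted node list, "children": [sub-trees]}.
    """
    nodes = list(adj) if nodes is None else nodes
    node_set = set(nodes)
    sub = {u: [v for v in adj[u] if v in node_set] for u in nodes}  # induced subgraph
    leaf = {"nodes": sorted(nodes), "children": []}
    if len(nodes) < min_size or not any(sub.values()):
        return leaf
    comm, q = greedy_modularity(sub, seed)
    groups = {}
    for u, c in comm.items():
        groups.setdefault(c, []).append(u)
    if len(groups) < 2 or q < min_q:
        return leaf  # stopping rule: no (strong enough) sub-structure
    children = [hierarchical_communities(adj, g, min_size, min_q, seed) for g in groups.values()]
    return {"nodes": sorted(nodes), "children": children}


# Two 5-cliques joined by a single edge (4-5).
adj = {u: [v for v in range(5) if v != u] for u in range(5)}
adj.update({u: [v for v in range(5, 10) if v != u] for u in range(5, 10)})
adj[4].append(5)
adj[5].append(4)
tree = hierarchical_communities(adj)
```

On this example the root splits into the two cliques, and each clique becomes a leaf because the heuristic keeps it as a single community.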
3. Visualize the graph (in a simplified way) at various levels of the clustering hierarchy.

How can the representations be kept consistent, so that a cluster and its subclusters are displayed at approximately the same place?
How can the space needed by a cluster of the last level of the hierarchy be taken into account in the representation at any level?

Possible solution: recursively estimate the space needed by each cluster in the hierarchy (as a circle enclosing the visualization of all its sub-clusters) ⇒ over-estimation of the space needed.
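The recursive estimate can be sketched on a hierarchy given as nested lists. Two assumptions are made here: a last-level cluster of n nodes needs a disc of radius proportional to √n, and an inner cluster lines its children up side by side, so a circle whose radius is the sum of the children's radii encloses them all. With this crude rule the bound equals the sum of the radii of all last-level clusters, whatever the depth; comparing it with a single disc holding all the nodes (a lower bound for any packing) makes the over-estimation visible.

```python
import math


def enclosing_radius(tree, node_radius=1.0):
    """Recursive upper bound on the radius a cluster needs in the drawing.

    tree: an int (a last-level cluster with that many nodes) or a list of sub-trees.
    Assumption: a last-level cluster of n nodes is drawn in a disc of radius
    node_radius * sqrt(n) (area proportional to n). An inner cluster lines its
    children up side by side; the circle of radius sum(child radii), centred on
    the middle of that line, then encloses every child disc.
    """
    if isinstance(tree, int):
        return node_radius * math.sqrt(tree)
    return sum(enclosing_radius(child, node_radius) for child in tree)


def count_nodes(tree):
    """Total number of nodes below a (sub-)tree."""
    return tree if isinstance(tree, int) else sum(count_nodes(child) for child in tree)


# Two top-level clusters: one split into sub-clusters of 4 and 9 nodes, one of 16 nodes.
hierarchy = [[4, 9], [16]]
bound = enclosing_radius(hierarchy)        # (2 + 3) + 4 = 9.0
ideal = math.sqrt(count_nodes(hierarchy))  # one disc holding all 29 nodes
overestimation = bound / ideal
```

Here the bound is 9 while the lower bound is √29 ≈ 5.39, an over-estimation factor of about 1.67.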

Include information about the quality of the clustering in the representation (as a warning to the user)?
Example: color and weight the edges between clusters according to their contribution to the modularity.
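One possible way to quantify the contribution of the edges between two clusters c and d to the modularity, assumed here rather than taken from the slides, is the modularity change ΔQ_cd = 2(e_cd − a_c a_d) that merging c and d would cause, where e_cd is the number of edges between c and d divided by 2m and a_c = d_c / 2m. A negative value means fewer edges than expected for the given degrees (a well-separated pair); a positive value flags a pair whose merging would increase the modularity. `edge_style` is a hypothetical mapping from ΔQ to an edge color and width.

```python
from collections import Counter


def between_cluster_contributions(edges, membership):
    """Modularity change 2 * (e_cd - a_c * a_d) that merging clusters c and d would cause.

    edges: list of (u, v) pairs of an undirected simple graph with m edges.
    membership: dict node -> cluster label (labels must be sortable).
    e_cd = (number of edges between c and d) / 2m and a_c = (total degree of c) / 2m.
    """
    two_m = 2 * len(edges)
    degree_sum = Counter()
    between = Counter()
    for u, v in edges:
        cu, cv = membership[u], membership[v]
        degree_sum[cu] += 1
        degree_sum[cv] += 1
        if cu != cv:
            between[tuple(sorted((cu, cv)))] += 1
    return {
        (c, d): 2 * (n_cd / two_m - (degree_sum[c] / two_m) * (degree_sum[d] / two_m))
        for (c, d), n_cd in between.items()
    }


def edge_style(delta_q, max_width=5.0):
    """Hypothetical mapping: grey for well-separated pairs, red when merging would raise Q."""
    color = "red" if delta_q > 0 else "grey"
    return color, max_width * min(1.0, abs(delta_q))


# Two triangles joined by one edge.
edges = [(0, 1), (1, 2), (0, 2), (3, 4), (4, 5), (3, 5), (2, 3)]
membership = {0: "A", 1: "A", 2: "A", 3: "B", 4: "B", 5: "B"}
contrib = between_cluster_contributions(edges, membership)
```

For the two triangles, ΔQ for the pair is −5/14: merging them would lower the modularity from 5/14 to 0, so the link between the two glyphs is drawn grey.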

Open issues
• Clustering: what is a meaningful clustering? When should the hierarchy stop?
• Clustering hierarchy representation: how can the space needed to represent the finest levels be anticipated at a given level?
• Including an estimate of the clustering quality in the representation: at the node level (“quality” of the clustering for the cluster? What does that mean?) or at the edge level (contribution of the edges between clusters to the modularity?)
