Differential analyses of structures in HiC data
Nathalie Vialaneix, INRAE/MIAT
Chrocogen, November 13th, 2020
1 / 29
      
2 / 29
Description of the scope of the articles
3 / 29
Topic
(What is this presentation about?)
When two sets of Hi-C matrices have been collected in two different
conditions, what are the available methods to compare the matrices and
identify regions that are significantly different between the conditions?
Comparison usually means: at the bin-pair level, but here: at the structure level
(differences between TADs or TAD boundaries)
5 / 29
TADpole
6 / 29
Main features
R package available on GitHub (only?)
Main purpose of TADpole: represent the hierarchical structure of TADs,
sub-TADs and meta-TADs. Can secondarily be used to detect differences
between TADs
7 / 29
Method 1/3: TADpole for one HiC matrix
remove "bad columns"
compute correlation matrix $\Sigma$
perform PCA using the rows of $\Sigma$ as a representation of the bins $\Rightarrow$ extract $N_p$ eigenvectors ($\sim$ representation of bins as elements of $\mathbb{R}^{N_p}$)
Warning: Since 1/ $\Sigma$ is not sparse and 2/ PCA is expensive, the approach is performed on half chromosomes (centromere is estimated using the correlation)
8 / 29
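The three steps of Method 1/3 can be sketched in Python with NumPy. This is only an illustration and not TADpole's own (R) code: the function name `bin_embedding`, the rule used for "bad columns" (bins without any contact) and the toy matrix are assumptions made for the example.

```python
import numpy as np

def bin_embedding(hic, n_components):
    """Represent each bin by its coordinates on the first principal
    components of the rows of the bin-by-bin correlation matrix."""
    hic = np.asarray(hic, dtype=float)
    # Step 1 (assumed rule): drop "bad columns", here bins with no contact.
    keep = hic.sum(axis=0) > 0
    hic = hic[np.ix_(keep, keep)]
    # Step 2: Pearson correlation between the contact profiles of the bins.
    sigma = np.corrcoef(hic)
    # Step 3: PCA on the rows of sigma (center the columns, then SVD).
    centered = sigma - sigma.mean(axis=0)
    u, s, _ = np.linalg.svd(centered, full_matrices=False)
    return keep, u[:, :n_components] * s[:n_components]

rng = np.random.default_rng(0)
# Toy symmetric contact matrix made of two blocks of 5 bins plus noise.
blocks = np.kron(np.eye(2), np.ones((5, 5))) * 10
noise = rng.random((10, 10))
toy = blocks + noise + noise.T

keep, scores = bin_embedding(toy, n_components=2)
print(scores.shape)
print(np.sign(scores[:, 0]))  # the first component separates the two blocks
```

The correlation matrix is dense even when the contact matrix is sparse, which is the cost issue raised in the warning above.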
Method 2/3: TADpole for one HiC matrix
Perform Ward's constrained HAC on the eigenvectors to represent HiC as a
dendrogram!! (package rioja is used :'( )
Cut the dendrogram with a broken stick heuristic (not model! :'( ): this
gives TADs
Ratio between intra and inter cluster variance is used to find the most
relevant dimension $N_p$ (and also the optimal number of clusters/TADs...?)
9 / 29
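As an illustration of what "constrained HAC with Ward's criterion" means, here is a minimal Python sketch in which only adjacent segments of bins can be merged. It is not the rioja implementation used by TADpole; the function name `constrained_ward` and the toy data are hypothetical, and the dendrogram is cut at a fixed number of clusters instead of using the broken stick heuristic.

```python
import numpy as np

def constrained_ward(points, n_clusters):
    """Merge only adjacent segments, choosing at each step the merge with
    the smallest increase in within-cluster variance (Ward criterion)."""
    points = np.asarray(points, dtype=float)
    # Each segment is (start, end) with end exclusive; start with singletons.
    segments = [(i, i + 1) for i in range(len(points))]
    while len(segments) > n_clusters:
        costs = []
        for (a0, a1), (b0, b1) in zip(segments[:-1], segments[1:]):
            na, nb = a1 - a0, b1 - b0
            gap = points[a0:a1].mean(axis=0) - points[b0:b1].mean(axis=0)
            # Ward cost of merging two clusters of sizes na and nb.
            costs.append(na * nb / (na + nb) * float(gap @ gap))
        best = int(np.argmin(costs))
        merged = (segments[best][0], segments[best + 1][1])
        segments[best:best + 2] = [merged]
    return segments

# Toy 1D "eigenvector" with three contiguous groups of bins.
toy = np.array([[0.0], [0.1], [0.2], [5.0], [5.1], [9.0], [9.2]])
print(constrained_ward(toy, 3))  # [(0, 3), (3, 5), (5, 7)]
```

Because merges are restricted to neighbours, every cluster is a contiguous genomic interval, which is what makes the output readable as TADs.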
Method 3/3: TADpole for one HiC matrix
10 / 29
How to use TADpole for comparing matrices?
Framework: 2 matrices, one in each condition. TADpole has been used on
each of them
Computing a difference index between matrix $H^1$ and matrix $H^2$ for a given
bin $b$:
$$D(H^1, H^2)(b) = \sum_{i \leq b} \sum_{j=1}^{p} \left| \tilde{h}^1_{ij} - \tilde{h}^2_{ij} \right|$$
where:
$\tilde{h}^k_{ij}$: entry $(i, j)$ of binarized HiC matrix $k$ ($i$ and $j$ are in the same cluster)
this quantity is normalized to stay between 0 and 1
Personal note: I don't get why the quantity is summed over the beginning of
the matrix... ($\sum_{i \leq b}$)
11 / 29
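A small Python sketch of the difference index may help to see what is summed. Partitions are given as one cluster label per bin. The normalization is not detailed on the slide, so dividing by the number of entries summed so far is an assumption of this example, as are the function names.

```python
import numpy as np

def binarize(labels):
    """Binarized matrix: 1 where bins i and j share a cluster, else 0."""
    labels = np.asarray(labels)
    return (labels[:, None] == labels[None, :]).astype(int)

def diff_index(labels1, labels2):
    """Difference index for every bin b (0-based: rows 0..b are summed)."""
    delta = np.abs(binarize(labels1) - binarize(labels2))
    cumulative = np.cumsum(delta.sum(axis=1))
    p = len(cumulative)
    # Assumed normalization: divide by the number of entries summed so far,
    # which keeps the index between 0 and 1.
    return cumulative / (p * np.arange(1, p + 1))

part1 = [0, 0, 0, 1, 1, 1]  # boundary after the third bin
part2 = [0, 0, 1, 1, 1, 1]  # boundary after the second bin
print(diff_index(part1, part2))
```

In this toy case the index jumps at the third bin, the one that changes cluster, and is then diluted by the following rows, which illustrates the cumulative effect questioned in the personal note.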
p-value derivation
Random test:
generate $10^4$ random partitions (clusters)
compute the Diff statistics between $H^1$ and the random partitions $\Rightarrow$ $p$-value for bin $b$ (in practice used only on a 2Gb portion of the genome)
Note: this is not symmetric between $H^1$ and $H^2$!
12 / 29
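The random test can be sketched as follows in Python. Several choices are assumptions of this example and not statements about TADpole: random partitions are contiguous and have the same number of clusters as the second partition, the $p$-value counts random statistics at least as large as the observed one, and an add-one correction is used. All function names are hypothetical.

```python
import numpy as np

def binarize(labels):
    """1 where two bins share a cluster, 0 otherwise."""
    labels = np.asarray(labels)
    return (labels[:, None] == labels[None, :]).astype(int)

def diff_stat(labels1, labels2, b):
    """Unnormalized difference index summed over rows 0..b (inclusive)."""
    delta = np.abs(binarize(labels1) - binarize(labels2))
    return int(delta[: b + 1].sum())

def random_contiguous_partition(p, n_clusters, rng):
    """Assumed null model: n_clusters contiguous blocks with random cuts."""
    cuts = rng.choice(np.arange(1, p), size=n_clusters - 1, replace=False)
    starts = np.zeros(p, dtype=int)
    starts[cuts] = 1
    return np.cumsum(starts)

def random_test_p_value(labels1, labels2, b, n_random=1000, seed=0):
    rng = np.random.default_rng(seed)
    observed = diff_stat(labels1, labels2, b)
    p, k = len(labels1), len(set(labels2))
    # Only the first partition is compared with the random ones.
    null = [diff_stat(labels1, random_contiguous_partition(p, k, rng), b)
            for _ in range(n_random)]
    exceed = sum(stat >= observed for stat in null)
    return (1 + exceed) / (1 + n_random)

part1 = [0, 0, 0, 1, 1, 1, 2, 2, 2, 2]
part2 = [0, 0, 0, 0, 0, 1, 1, 2, 2, 2]
print(random_test_p_value(part1, part2, b=5))
```

Swapping `part1` and `part2` changes the null distribution, which is the asymmetry pointed out in the note above.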
Evaluation
1. One HiC dataset transformed into 24 HiC matrices (four resolutions $\times$ (2 normalizations + raw data), and 12 down-samplings of one of the matrices) used for comparing several TAD callers (as in [Zufferey et al, 2018]):
by comparing domains across different resolutions
by measuring the concordance between two partitions (MI measure)
by assessing the computational performances of the tools
by using biological evidence (histone mark or structural protein profiles, FC at TAD boundaries, ratio of TAD boundaries hosting a SP, ratio of ChIP-seq signals in TAD bodies)
2. Two cHiC experiments (one chromosome, one genomic interval), based on the two homozygous strains (mouse, embryonic), one WT and one mutant
13 / 29
Results
TADpole gives replicable results
14 / 29
Results
TADpole is in accordance with biological evidence
15 / 29
Results
TADpole can recover a breakpoint between two conditions
16 / 29
TADcompare
18 / 29
Main features
R package available on GitHub (and submitted to Bioconductor)
by the same authors as HiCcompare and multiHiCcompare (based on
MD corrections)
Main purpose of TADcompare: represent the HiC matrices as networks
and derive a bin gap score to detect boundaries (exactly the same idea as in
[Cresswell et al., 2020] on SpectralTAD). Use that score to derive
differential boundaries
19 / 29
Method: connection to spectral clustering
Main idea: HiC matrix is a graph so use tools dedicated to graphs.
Laplacian of a graph: $L = D^{-1/2} (D - H) D^{-1/2}$ (where $H$ is the HiC matrix without its diagonal and $D = \mathrm{Diag}(\mathbf{1}_p^\top H)$)
Laplacian, graph structure and spectral clustering (see [von Luxburg, 2007]):
eigenvectors associated to eigenvalue 0 give the connected components of the graph
other eigenvalues are $0 < \lambda_1 < \lambda_2 < \ldots < \lambda_{p-k}$ and corresponding eigenvectors provide increasingly noisy information about the main structures (clusters) in the graph
spectral clustering: take the first $d$ eigenvectors (smallest eigenvalues) and use them as representations of graph nodes (here, bins) in $\mathbb{R}^d$ for $k$-means
21 / 29
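The link between the Laplacian spectrum and the graph structure can be checked numerically. The Python sketch below builds the normalized Laplacian defined above for a toy graph made of two disconnected blocks; the function names and the toy matrix are assumptions of the example, and the final $k$-means step is left out.

```python
import numpy as np

def normalized_laplacian(hic):
    """L = D^{-1/2} (D - H) D^{-1/2}, with H the matrix without diagonal."""
    h = np.array(hic, dtype=float)  # copy, the input is left untouched
    np.fill_diagonal(h, 0.0)
    degree = h.sum(axis=0)
    # Assumes every bin has at least one off-diagonal contact (degree > 0).
    d_inv_sqrt = 1.0 / np.sqrt(degree)
    return d_inv_sqrt[:, None] * (np.diag(degree) - h) * d_inv_sqrt[None, :]

def spectral_embedding(hic, d):
    """Eigenvalues (ascending) and eigenvectors for the d smallest ones."""
    eigenvalues, eigenvectors = np.linalg.eigh(normalized_laplacian(hic))
    return eigenvalues[:d], eigenvectors[:, :d]

# Two disconnected blocks of 4 bins: eigenvalue 0 should appear twice.
toy = np.kron(np.eye(2), np.ones((4, 4)))
values, vectors = spectral_embedding(toy, d=2)
print(np.round(values, 6))
print(vectors.shape)  # one row per bin, usable as input for k-means
```

With real HiC data the graph is connected, so only one eigenvalue is exactly 0 and the block structure shows up in the next eigenvectors instead.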
TADcompare method
compute eigen-decomposition of $L$ and extract the first 2 eigenvectors $(v_1, v_2)$ (length: $p$)
replace the HiC matrix by a representation of the bins with $[v_1, v_2]$ (so bin $i$ is in $\mathbb{R}^2$, $v^i$)
cuisine: normalization: $z^i = \frac{v^i}{\|v^i\|}$ (I guess but very unclear in both articles)
distance between bins $i$ and $i - 1$ (called gap score of $i$): $D_i = \|v^i - v^{i-1}\|$ (again, very unclearly written)
magic trick: this is distributed as a log-normal... $\log D_i \sim \mathcal{N}(\mu, \sigma)$ $\Rightarrow$
boundary scores: $B_i = \frac{\log D_i - \mu}{\sigma^2}$ (said to follow $\mathcal{N}(0, 1)$ which is not true... because $\frac{\log D_i - \mu}{\sigma}$ would be the proper score)
more cuisine: spectral decomposition is performed with sliding windows of 15 bins to avoid having to handle a large spectral decomposition
22 / 29
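A possible reading of the gap score and boundary score computation is sketched below in Python. Since the articles are unclear on this point, two choices here are assumptions: the distance is computed on the normalized rows $z^i$, and the log gap scores are standardized with $\sigma$ (the "proper" score mentioned above) and not $\sigma^2$. The sliding windows are ignored and the toy embedding is made up.

```python
import numpy as np

def gap_scores(embedding):
    """Distance between consecutive bins after row normalization."""
    v = np.asarray(embedding, dtype=float)
    z = v / np.linalg.norm(v, axis=1, keepdims=True)
    # Entry i of the result is the gap between bin i + 1 and bin i.
    return np.linalg.norm(z[1:] - z[:-1], axis=1)

def boundary_scores(gaps):
    """Standardized log gap scores (assumes all gaps are > 0)."""
    log_gaps = np.log(gaps)
    return (log_gaps - log_gaps.mean()) / log_gaps.std(ddof=1)

# Toy 2D embedding: bins 0-2 point in one direction, bins 3-5 in another.
toy = np.array([[1.0, 0.10], [1.0, 0.12], [1.0, 0.09],
                [0.10, 1.0], [0.12, 1.0], [0.10, 0.95]])
gaps = gap_scores(toy)
scores = boundary_scores(gaps)
print(np.round(gaps, 3))
print(int(np.argmax(scores)) + 1)  # bin with the largest boundary score
```

In this toy case the largest score is obtained for bin 3, the first bin of the second group, which is where a boundary would be called.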
Using the approach to detect differential TADs
two matrices: $D_i^k$ are gap scores of bin $i$ for matrix $k \in \{1, 2\}$
pseudo-maths: $\log(D_i^1) - \log(D_i^2) \sim \mathcal{N}(\mu_1 - \mu_2, \sigma_1^2 + \sigma_2^2)$ (note: this is not true in general...)
new differential boundary scores: $DB_i = \frac{\sigma_1^2 B_i^1 - \sigma_2^2 B_i^2}{\sigma_1^2 + \sigma_2^2}$ (also said to follow $\mathcal{N}(0, 1)$) $\Rightarrow$ $p$-values
Time course version: monitor medians of differential scores with $t = 0$ across multiple replicates and identify breaks in these values
23 / 29
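The differential boundary score can be written in a few lines of Python, following the formulas exactly as they appear on the slides (boundary scores divided by $\sigma^2$, and $p$-values derived from the claimed $\mathcal{N}(0, 1)$ distribution). This is a sketch to make the computation explicit, not the TADcompare implementation; function names and toy gap scores are assumptions, and the criticism above about the distributional claims applies to the resulting $p$-values.

```python
import math
import statistics

def differential_boundary_scores(gaps1, gaps2):
    """DB_i = (s1^2 B1_i - s2^2 B2_i) / (s1^2 + s2^2), slide notation."""
    log1 = [math.log(g) for g in gaps1]
    log2 = [math.log(g) for g in gaps2]
    mu1, mu2 = statistics.fmean(log1), statistics.fmean(log2)
    var1, var2 = statistics.variance(log1), statistics.variance(log2)
    scores = []
    for l1, l2 in zip(log1, log2):
        b1 = (l1 - mu1) / var1  # boundary score as written on the slide
        b2 = (l2 - mu2) / var2
        scores.append((var1 * b1 - var2 * b2) / (var1 + var2))
    return scores

def two_sided_p_values(scores):
    """Two-sided p-values under the claimed N(0, 1) distribution."""
    return [math.erfc(abs(s) / math.sqrt(2.0)) for s in scores]

# Toy gap scores: matrix 1 has a boundary at index 2, matrix 2 at index 4.
gaps1 = [0.10, 0.12, 0.90, 0.11, 0.10]
gaps2 = [0.10, 0.11, 0.12, 0.10, 0.95]
db = differential_boundary_scores(gaps1, gaps2)
print([round(s, 3) for s in db])
print([round(p, 3) for p in two_sided_p_values(db)])
```

Positive scores point to boundaries that are stronger in the first matrix, negative scores to boundaries that are stronger in the second one.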
Evaluation
The method is evaluated for:
boundary discovery (enrichment in proteins with permutation tests)
boundary difference discovery (also colocalized boundaries enrichment
with permutation tests)
Data: from [Forcato et al, 2017] (repository), time course data from human
colon cancer cell line at four time points after auxin treatment
Scripts: R package on Bioconductor + scripts in a repository
24 / 29
Results
SpectralTAD detects TAD boundaries more clearly
25 / 29
Results
Boundaries are mostly consistent between technical/biological replicates
26 / 29
Results
ND boundaries are more enriched in biological marks (well, of course...?)
27 / 29
Results
Consensus boundary score (sum of log scores) improves biological relevance
28 / 29
References
Cresswell KG, Dozmorov MG (2020) TADCompare: an R package for differential and temporal analysis of topologically associated domains. Frontiers in Genetics 11: 158
Cresswell KG, Stansfield JC, Dozmorov MG (2020) SpectralTAD: an R package for defining a hierarchy of topologically associated domains using spectral clustering. BMC Bioinformatics 21: 319
Forcato M, Nicoletti C, Pal K, Livi CM, Ferrari F, Bicciato S (2017) Comparison of computational methods for Hi-C data analysis. Nature Methods 14: 679-685
von Luxburg U (2007) A Tutorial on Spectral Clustering. Statistics and Computing 17(4): 395-416
Soler-Vila P, Cuscó P, Farabella I, Di Stefano M, Marti-Renom MA (2020) Hierarchical chromatin organization detected by TADpole. Nucleic Acids Research 48(7): e39
Zufferey M, Tavernari D, Oricchio E, Ciriello G (2018) Comparison of computational methods for the identification of topologically associated domains. Genome Biology 19: 217-234
29 / 29

Unlocking the Potential: Deep dive into ocean of Ceramic Magnets.pptxanandsmhk
 
❤Jammu Kashmir Call Girls 8617697112 Personal Whatsapp Number 💦✅.
❤Jammu Kashmir Call Girls 8617697112 Personal Whatsapp Number 💦✅.❤Jammu Kashmir Call Girls 8617697112 Personal Whatsapp Number 💦✅.
❤Jammu Kashmir Call Girls 8617697112 Personal Whatsapp Number 💦✅.Nitya salvi
 
Discovery of an Accretion Streamer and a Slow Wide-angle Outflow around FUOri...
Discovery of an Accretion Streamer and a Slow Wide-angle Outflow around FUOri...Discovery of an Accretion Streamer and a Slow Wide-angle Outflow around FUOri...
Discovery of an Accretion Streamer and a Slow Wide-angle Outflow around FUOri...Sérgio Sacani
 
CALL ON ➥8923113531 🔝Call Girls Kesar Bagh Lucknow best Night Fun service 🪡
CALL ON ➥8923113531 🔝Call Girls Kesar Bagh Lucknow best Night Fun service  🪡CALL ON ➥8923113531 🔝Call Girls Kesar Bagh Lucknow best Night Fun service  🪡
CALL ON ➥8923113531 🔝Call Girls Kesar Bagh Lucknow best Night Fun service 🪡anilsa9823
 
Green chemistry and Sustainable development.pptx
Green chemistry  and Sustainable development.pptxGreen chemistry  and Sustainable development.pptx
Green chemistry and Sustainable development.pptxRajatChauhan518211
 
fundamental of entomology all in one topics of entomology
fundamental of entomology all in one topics of entomologyfundamental of entomology all in one topics of entomology
fundamental of entomology all in one topics of entomologyDrAnita Sharma
 
Botany 4th semester file By Sumit Kumar yadav.pdf
Botany 4th semester file By Sumit Kumar yadav.pdfBotany 4th semester file By Sumit Kumar yadav.pdf
Botany 4th semester file By Sumit Kumar yadav.pdfSumit Kumar yadav
 
Physiochemical properties of nanomaterials and its nanotoxicity.pptx
Physiochemical properties of nanomaterials and its nanotoxicity.pptxPhysiochemical properties of nanomaterials and its nanotoxicity.pptx
Physiochemical properties of nanomaterials and its nanotoxicity.pptxAArockiyaNisha
 

Recently uploaded (20)

Forensic Biology & Its biological significance.pdf
Forensic Biology & Its biological significance.pdfForensic Biology & Its biological significance.pdf
Forensic Biology & Its biological significance.pdf
 
All-domain Anomaly Resolution Office U.S. Department of Defense (U) Case: “Eg...
All-domain Anomaly Resolution Office U.S. Department of Defense (U) Case: “Eg...All-domain Anomaly Resolution Office U.S. Department of Defense (U) Case: “Eg...
All-domain Anomaly Resolution Office U.S. Department of Defense (U) Case: “Eg...
 
Isotopic evidence of long-lived volcanism on Io
Isotopic evidence of long-lived volcanism on IoIsotopic evidence of long-lived volcanism on Io
Isotopic evidence of long-lived volcanism on Io
 
Recombination DNA Technology (Nucleic Acid Hybridization )
Recombination DNA Technology (Nucleic Acid Hybridization )Recombination DNA Technology (Nucleic Acid Hybridization )
Recombination DNA Technology (Nucleic Acid Hybridization )
 
Botany 4th semester series (krishna).pdf
Botany 4th semester series (krishna).pdfBotany 4th semester series (krishna).pdf
Botany 4th semester series (krishna).pdf
 
Biopesticide (2).pptx .This slides helps to know the different types of biop...
Biopesticide (2).pptx  .This slides helps to know the different types of biop...Biopesticide (2).pptx  .This slides helps to know the different types of biop...
Biopesticide (2).pptx .This slides helps to know the different types of biop...
 
GBSN - Microbiology (Unit 1)
GBSN - Microbiology (Unit 1)GBSN - Microbiology (Unit 1)
GBSN - Microbiology (Unit 1)
 
VIRUSES structure and classification ppt by Dr.Prince C P
VIRUSES structure and classification ppt by Dr.Prince C PVIRUSES structure and classification ppt by Dr.Prince C P
VIRUSES structure and classification ppt by Dr.Prince C P
 
Formation of low mass protostars and their circumstellar disks
Formation of low mass protostars and their circumstellar disksFormation of low mass protostars and their circumstellar disks
Formation of low mass protostars and their circumstellar disks
 
Lucknow 💋 Russian Call Girls Lucknow Finest Escorts Service 8923113531 Availa...
Lucknow 💋 Russian Call Girls Lucknow Finest Escorts Service 8923113531 Availa...Lucknow 💋 Russian Call Girls Lucknow Finest Escorts Service 8923113531 Availa...
Lucknow 💋 Russian Call Girls Lucknow Finest Escorts Service 8923113531 Availa...
 
Nightside clouds and disequilibrium chemistry on the hot Jupiter WASP-43b
Nightside clouds and disequilibrium chemistry on the hot Jupiter WASP-43bNightside clouds and disequilibrium chemistry on the hot Jupiter WASP-43b
Nightside clouds and disequilibrium chemistry on the hot Jupiter WASP-43b
 
Unlocking the Potential: Deep dive into ocean of Ceramic Magnets.pptx
Unlocking  the Potential: Deep dive into ocean of Ceramic Magnets.pptxUnlocking  the Potential: Deep dive into ocean of Ceramic Magnets.pptx
Unlocking the Potential: Deep dive into ocean of Ceramic Magnets.pptx
 
❤Jammu Kashmir Call Girls 8617697112 Personal Whatsapp Number 💦✅.
❤Jammu Kashmir Call Girls 8617697112 Personal Whatsapp Number 💦✅.❤Jammu Kashmir Call Girls 8617697112 Personal Whatsapp Number 💦✅.
❤Jammu Kashmir Call Girls 8617697112 Personal Whatsapp Number 💦✅.
 
Discovery of an Accretion Streamer and a Slow Wide-angle Outflow around FUOri...
Discovery of an Accretion Streamer and a Slow Wide-angle Outflow around FUOri...Discovery of an Accretion Streamer and a Slow Wide-angle Outflow around FUOri...
Discovery of an Accretion Streamer and a Slow Wide-angle Outflow around FUOri...
 
CALL ON ➥8923113531 🔝Call Girls Kesar Bagh Lucknow best Night Fun service 🪡
CALL ON ➥8923113531 🔝Call Girls Kesar Bagh Lucknow best Night Fun service  🪡CALL ON ➥8923113531 🔝Call Girls Kesar Bagh Lucknow best Night Fun service  🪡
CALL ON ➥8923113531 🔝Call Girls Kesar Bagh Lucknow best Night Fun service 🪡
 
Green chemistry and Sustainable development.pptx
Green chemistry  and Sustainable development.pptxGreen chemistry  and Sustainable development.pptx
Green chemistry and Sustainable development.pptx
 
fundamental of entomology all in one topics of entomology
fundamental of entomology all in one topics of entomologyfundamental of entomology all in one topics of entomology
fundamental of entomology all in one topics of entomology
 
Botany 4th semester file By Sumit Kumar yadav.pdf
Botany 4th semester file By Sumit Kumar yadav.pdfBotany 4th semester file By Sumit Kumar yadav.pdf
Botany 4th semester file By Sumit Kumar yadav.pdf
 
Physiochemical properties of nanomaterials and its nanotoxicity.pptx
Physiochemical properties of nanomaterials and its nanotoxicity.pptxPhysiochemical properties of nanomaterials and its nanotoxicity.pptx
Physiochemical properties of nanomaterials and its nanotoxicity.pptx
 
CELL -Structural and Functional unit of life.pdf
CELL -Structural and Functional unit of life.pdfCELL -Structural and Functional unit of life.pdf
CELL -Structural and Functional unit of life.pdf
 

Differential analyses of structures in HiC data

  • 1. Differential analyses of structures in HiC data. Nathalie Vialaneix, INRAE/MIAT. Chrocogen, November 13th, 2020.
  • 3. Description of the scope of the articles
  • 5. Topic (What is this presentation about?) When two sets of Hi-C matrices have been collected in two different conditions, what are the available methods to compare the matrices and identify regions that are significantly different between the conditions? Comparison usually means comparison at the bin-pair level; here, it means comparison at the structure level (differences between TADs or TAD boundaries).
  • 7. Main features. R package available on GitHub (only?). Main purpose of TADpole: represent the hierarchical structure of TADs, sub-TADs and meta-TADs. It can secondarily be used to detect differences between TADs.
  • 8. Method 1/3: TADpole for one HiC matrix. (1) Remove "bad columns". (2) Compute the correlation matrix $\Sigma$. (3) Perform PCA using the rows of $\Sigma$ as a representation of the bins $\Rightarrow$ extract $N_p$ eigenvectors ($\sim$ representation of bins as elements of $\mathbb{R}^{N_p}$). Warning: since 1/ $\Sigma$ is not sparse and 2/ PCA is expensive, the approach is performed on half chromosomes (the centromere is estimated using the correlation).
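The three steps on this slide can be sketched as follows. This is a minimal illustration with NumPy, not TADpole's own code: the function name `bin_embedding` is hypothetical, and the rule used for "bad columns" (bins with no contact at all) is an assumption, since the slide does not define it.

```python
import numpy as np

def bin_embedding(hic, n_components):
    """Represent each bin by its scores on the leading principal
    components of the correlation matrix of a symmetric Hi-C matrix."""
    hic = np.asarray(hic, dtype=float)
    # Assumed definition of "bad columns": bins with no contact at all.
    keep = hic.sum(axis=0) > 0
    hic = hic[np.ix_(keep, keep)]
    # Correlation matrix between bin contact profiles (dense, p x p).
    sigma = np.corrcoef(hic)
    # PCA with the rows of sigma as observations: centre, then SVD.
    centred = sigma - sigma.mean(axis=0)
    u, s, _ = np.linalg.svd(centred, full_matrices=False)
    # Principal component scores: one row per bin, n_components columns.
    return u[:, :n_components] * s[:n_components], keep
```

Because `sigma` is dense, both memory and the SVD grow quickly with the number of bins, which is consistent with the slide's remark that the approach is run on half chromosomes.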
  • 9. Method 2/3: TADpole for one HiC matrix. Perform Ward's constrained HAC on the eigenvectors to represent HiC as a dendrogram!! (package rioja is used :'( ). Cut the dendrogram with a broken stick heuristic (not model! :'( ): this gives TADs. The ratio between intra- and inter-cluster variance is used to find the most relevant dimension $N_p$ (and also the optimal number of clusters/TADs...?).
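To make the "constrained" part concrete, here is a small sketch of Ward clustering in which only clusters that are adjacent along the chromosome may merge. It is a naive greedy version written for illustration; it does not reproduce the rioja implementation, and the broken stick cut is replaced by an explicit `n_clusters` argument (an assumption made to keep the example short).

```python
import numpy as np

def constrained_ward(x, n_clusters):
    """Greedy Ward clustering where only neighbouring clusters may merge.

    x: (p, d) array whose rows are bins ordered along the chromosome.
    Returns a list of (start, end) index pairs, end exclusive.
    """
    x = np.asarray(x, dtype=float)
    # Each cluster is stored as (start, end, size, centroid).
    clusters = [(i, i + 1, 1, x[i].copy()) for i in range(len(x))]
    while len(clusters) > n_clusters:
        # Ward cost of merging neighbours a and b:
        # n_a * n_b / (n_a + n_b) * ||c_a - c_b||^2
        costs = [
            a[2] * b[2] / (a[2] + b[2]) * np.sum((a[3] - b[3]) ** 2)
            for a, b in zip(clusters[:-1], clusters[1:])
        ]
        k = int(np.argmin(costs))
        a, b = clusters[k], clusters[k + 1]
        size = a[2] + b[2]
        centroid = (a[2] * a[3] + b[2] * b[3]) / size
        # Replace the two neighbours by their union.
        clusters[k:k + 2] = [(a[0], b[1], size, centroid)]
    return [(c[0], c[1]) for c in clusters]
```

Each returned interval is a contiguous run of bins, which is what allows the clusters to be read as TADs.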
  • 10. Method 3/3: TADpole for one HiC matrix
  • 11. How to use TADpole for comparing matrices? Framework: 2 matrices, one in each condition; TADpole has been used on each of them. Computing a difference index between matrix $H^1$ and matrix $H^2$ for a given bin $b$: $D(H^1, H^2)(b) = \sum_{i \leq b} \sum_{j=1}^{p} | \tilde{h}^1_{ij} - \tilde{h}^2_{ij} |$, where $\tilde{h}^k_{ij}$ is entry $(i, j)$ of the binarized HiC matrix (it indicates whether $i$ and $j$ are in the same cluster). This quantity is normalized to stay between 0 and 1. Personal note: I don't get why the quantity is summed over the beginning of the matrix... ($\sum_{i \leq b}$)
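The difference index above can be computed directly from two partitions of the same bins. The sketch below is only an illustration: `difference_index` is a hypothetical name, bins are 0-based, and the normalization (dividing by the number of entries summed so far) is an assumption, because the slide only says that the quantity is kept between 0 and 1.

```python
def difference_index(labels1, labels2):
    """Cumulative difference between two partitions of the same p bins.

    labels1, labels2: cluster label of each bin in condition 1 and 2.
    Returns a list D where D[b] sums |h1_ij - h2_ij| over rows i <= b.
    """
    p = len(labels1)
    if len(labels2) != p:
        raise ValueError("both partitions must cover the same bins")
    scores, running = [], 0
    for i in range(p):
        for j in range(p):
            # Binarized entries: are bins i and j in the same cluster?
            same1 = labels1[i] == labels1[j]
            same2 = labels2[i] == labels2[j]
            running += same1 != same2
        # Assumed normalization: share of differing entries so far.
        scores.append(running / ((i + 1) * p))
    return scores
```

For example, `difference_index([0, 0, 1, 1], [0, 0, 0, 1])` compares a partition with a boundary after the second bin to one with a boundary after the third bin.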
  • 12. p-value derivation. Random test: generate $10^4$ random partitions (clusters); compute the Diff statistics between $H^1$ and the random partitions $\Rightarrow$ $p$-value for bin $b$ (in practice used only on a 2Gb portion of the genome). Note: this is not symmetric between $H^1$ and $H^2$!
  • 13. Evaluation. 1. One HiC dataset transformed into 24 HiC matrices (four resolutions $\times$ (2 normalizations + raw data) and 12 down-samplings of one of the matrices), used for comparing several TAD callers (as in [Zufferey et al., 2018]): by comparing domains across different resolutions; by measuring the concordance between two partitions (MI measure); by assessing the computational performance of the tools; by using biological evidence (histone mark or structural protein profiles, FC at TAD boundaries, ratio of TAD boundaries hosting an SP, ratio of ChIP-seq signals in TAD bodies). 2. Two cHiC experiments (one chromosome, one genomic interval), based on the two homozygous strains (mouse, embryonic), one WT and one mutant.
  • 15. Results: TADpole is in accordance with biological evidence
  • 16. Results: TADpole can recover a breakpoint between two conditions
  • 19. Main features. R package available on GitHub (and submitted to Bioconductor), by the same authors as HiCcompare and multiHiCcompare (based on MD corrections). Main purpose of TADCompare: represent the HiC matrices as networks and derive a bin gap score to detect boundaries (exactly the same idea as in [Cresswell et al., 2020] on SpectralTAD). Use that score to derive differential boundaries.
  • 21. Method: connection to spectral clustering. Main idea: the HiC matrix is a graph, so use tools dedicated to graphs. Laplacian of a graph: $L = D^{-1/2} (D - H) D^{-1/2}$ (where $H$ is the HiC matrix without its diagonal and $D = \mathrm{Diag}(\mathbf{1}_p^\top H)$). Laplacian, graph structure and spectral clustering (see [von Luxburg, 2007]): eigenvectors associated with eigenvalue 0 give the connected components of the graph; the other eigenvalues are $0 < \lambda_1 < \lambda_2 < \ldots < \lambda_{p-k}$ and the corresponding eigenvectors provide increasingly noisy information about the main structures (clusters) in the graph; spectral clustering: take the first $d$ eigenvectors (smallest eigenvalues) and use them as representations of the graph nodes (here, bins) in $\mathbb{R}^d$ for $k$-means.
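As a small illustration of the Laplacian defined on this slide, the following NumPy sketch builds $L = D^{-1/2} (D - H) D^{-1/2}$ and returns the eigenvectors attached to the $d$ smallest eigenvalues. The function name `spectral_embedding` is hypothetical, every bin is assumed to have at least one off-diagonal contact, and the final $k$-means step is left out.

```python
import numpy as np

def spectral_embedding(hic, d):
    """Eigenvalues and eigenvectors of the normalized Laplacian
    associated with the d smallest eigenvalues."""
    h = np.array(hic, dtype=float)
    np.fill_diagonal(h, 0.0)          # H: HiC matrix without its diagonal
    degree = h.sum(axis=0)            # diagonal of D = Diag(1^T H)
    if np.any(degree <= 0):
        raise ValueError("every bin needs at least one contact")
    inv_sqrt = 1.0 / np.sqrt(degree)
    # D^{-1/2} (D - H) D^{-1/2} = I - D^{-1/2} H D^{-1/2}
    lap = np.eye(len(h)) - inv_sqrt[:, None] * h * inv_sqrt[None, :]
    # eigh handles symmetric matrices and sorts eigenvalues ascending.
    eigval, eigvec = np.linalg.eigh(lap)
    return eigval[:d], eigvec[:, :d]
```

On a matrix made of two disconnected blocks, the two smallest eigenvalues are numerically zero, which matches the statement that eigenvalue 0 encodes the connected components.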
  • 22. TADCompare method. Compute the eigen-decomposition of $L$ and extract the first 2 eigenvectors $(v_1, v_2)$ (length: $p$); replace the HiC matrix by a representation of the bins with $[v_1, v_2]$ (so bin $i$ is in $\mathbb{R}^2$, $v^i$). Cuisine: normalization: $z^i = \frac{v^i}{\| v^i \|}$ (I guess, but very unclear in both articles). Distance between bins $i$ and $i-1$ (called gap score of $i$): $D_i = \| v^i - v^{i-1} \|$ (again, very unclearly written). Magic trick: this is distributed as a log-normal... $\log D_i \sim \mathcal{N}(\mu, \sigma^2)$ $\Rightarrow$ boundary scores: $B_i = \frac{\log D_i - \mu}{\sigma^2}$ (said to follow $\mathcal{N}(0, 1)$, which is not true... because $\frac{\log D_i - \mu}{\sigma}$ would be the proper score). More cuisine: the spectral decomposition is performed with sliding windows of 15 bins to avoid having to handle a large spectral decomposition.
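The gap and boundary scores can be sketched as below. Several choices here are assumptions rather than the package's behaviour: the distance is taken between the normalized rows $z^i$ (the slide itself notes that the articles are unclear on this point), $\mu$ and $\sigma$ are estimated by the sample mean and standard deviation of the log gaps, and the score divides by $\sigma$, i.e. the standardization that the slide calls the proper one. The name `boundary_scores` is hypothetical.

```python
import numpy as np

def boundary_scores(v):
    """Gap scores and standardized boundary scores.

    v: (p, 2) array; row i holds the coordinates of bin i on the
    first two eigenvectors. Consecutive rows must differ, otherwise
    the log of a zero gap is undefined.
    """
    v = np.asarray(v, dtype=float)
    # Normalization: z_i = v_i / ||v_i||
    z = v / np.linalg.norm(v, axis=1, keepdims=True)
    # Gap score of bin i: distance to the previous bin (i = 1..p-1).
    gap = np.linalg.norm(z[1:] - z[:-1], axis=1)
    log_gap = np.log(gap)
    # Assumed estimators of mu and sigma: sample mean and sd.
    mu, sigma = log_gap.mean(), log_gap.std(ddof=1)
    return gap, (log_gap - mu) / sigma
```

The sliding windows of 15 bins mentioned on the slide are not handled here; the function would be applied to the embedding obtained within each window.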
  • 23. Using the approach to detect differential TADs. Two matrices: $D_i^k$ are the gap scores of bin $i$ for matrix $k \in \{1, 2\}$. Pseudo-maths: $\log(D_i^1) - \log(D_i^2) \sim \mathcal{N}(\mu_1 - \mu_2, \sigma_1^2 + \sigma_2^2)$ (note: this is not true in general...). New differential boundary scores: $DB_i = \frac{\sigma_1^2 B_i^1 - \sigma_2^2 B_i^2}{\sigma_1^2 + \sigma_2^2}$ (also said to follow $\mathcal{N}(0, 1)$) $\Rightarrow$ $p$-values. Time course version: monitor the medians of the differential scores with $t = 0$ across multiple replicates and identify breaks in these values.
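The differential boundary score is a simple weighted difference, so a few lines of standard-library Python are enough to illustrate it. The formula is copied as written on the slide (including the caveats raised there); the function name and the use of a two-sided $p$-value are assumptions.

```python
import math

def differential_boundary_scores(b1, b2, var1, var2):
    """Differential boundary scores and N(0, 1) p-values.

    b1, b2: boundary scores of the same bins in matrices 1 and 2.
    var1, var2: variances sigma_1^2 and sigma_2^2 of the log gap scores.
    """
    scores = [
        (var1 * x - var2 * y) / (var1 + var2)
        for x, y in zip(b1, b2)
    ]
    # Two-sided p-value under N(0, 1): P(|Z| >= |s|) = erfc(|s| / sqrt(2))
    p_values = [math.erfc(abs(s) / math.sqrt(2)) for s in scores]
    return scores, p_values
```

With equal variances, two bins whose boundary scores are +2 and -2 in the two conditions get a differential score of 2, while identical scores give 0 and a $p$-value of 1.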
  • 24. Evaluation. The method is evaluated for: boundary discovery (enrichment in proteins with permutation tests); boundary difference discovery (also colocalized boundaries enrichment with permutation tests). Data: from [Forcato et al., 2017] (repository), and time course data from a human colon cancer cell line at four time points after auxin treatment. Scripts: R package on Bioconductor + scripts in a repository.
  • 25. Results: SpectralTAD detects TAD boundaries more clearly
  • 26. Results: Boundaries are mostly consistent between technical/biological replicates
  • 27. Results: ND boundaries are more enriched in biological marks (well, of course...?)
  • 28. Results: Consensus boundary score (sum of log scores) improves biological relevance
  • 29. References
    Cresswell KG, Dozmorov MG (2020) TADCompare: an R package for differential and temporal analysis of topologically associated domains. Frontiers in Genetics 11: 158
    Cresswell KG, Stansfield JC, Dozmorov MG (2020) SpectralTAD: an R package for defining a hierarchy of topologically associated domains using spectral clustering. BMC Bioinformatics 21: 319
    Forcato M, Nicoletti C, Pal K, Livi CM, Ferrari F, Bicciato S (2017) Comparison of computational methods for Hi-C data analysis. Nature Methods 14: 679-685
    von Luxburg U (2007) A tutorial on spectral clustering. Statistics and Computing 17(4): 395-416
    Soler-Vila P, Cuscó P, Farabella I, Di Stefano M, Marti-Renom MA (2020) Hierarchical chromatin organization detected by TADpole. Nucleic Acids Research 48(7): e39
    Zufferey M, Tavernari D, Oricchio E, Ciriello G (2018) Comparison of computational methods for the identification of topologically associated domains. Genome Biology 19: 217-234