ACCOST is a method for differential analysis of Hi-C data between two conditions with replicates. It models Hi-C interaction counts with a negative binomial distribution that accounts for distance effects between loci through an offset term. ACCOST normalizes counts with ICE and estimates model parameters to obtain a p-value for each bin pair comparing the two conditions. It was validated on several datasets and shown to identify more differential contacts than other methods like diffHic and FIND, particularly at short genomic distances.
1. ‘ACCOST’ for differential HiC analysis
Nathalie Vialaneix, INRAE/MIAT
Chrocogen, July 10th, 2020
3. First of all... in the previous episodes...
4. Topic
(What is this presentation about?)
When two sets of Hi-C matrices have been collected in two different
conditions, what are the available methods to compare the matrices and
identify regions that are significantly different between the conditions?
Comparison usually means: at a bin pair level.
5. Notations and formal definition of the problem
Hi-C matrices: for $t = 1, \ldots, T$, Hi-C matrices $H^t$
Conditions: 2 conditions $C_1$ and $C_2$ such that $C_1 \cup C_2 = \{1, \ldots, T\}$ and $C_1 \cap C_2 = \emptyset$
Interactions: $h^t_{ij}$ is the interaction frequency (in $\mathbb{N}_+$) for bin pair $(i, j)$, where $i$ and $j$ are two genomic loci, in the matrix $H^t$
Question: for all pairs $(i, j)$, test the assumption
$$H_0: N^{C_1}_{ij} = N^{C_2}_{ij}$$
in which $N^{C_r}_{ij}$ is the random variable that represents the number of contacts (interaction frequency) between loci $i$ and $j$ in condition $C_r$.
6. 1. Prior differential analysis
Most methods start by correcting sequencing bias (between-matrix normalization):
Standard sequencing-depth normalization [Anders & Huber, 2010] to obtain an equal total number of counts between the different samples (R package edgeR)
MA plot correction [Lun & Smyth, 2015], improved by [Stansfield et al, 2019], to correct the trend in MA (mean versus difference) plots for every pair of samples (R packages diffHic/csaw and multiHiCcompare)
MD plot correction [Stansfield et al, 2018] to correct the trend in MD (distance versus difference) plots for every pair of samples (R package HiCcompare)
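To make the first of these corrections concrete, here is a minimal numpy sketch of sequencing-depth normalization on hypothetical matrices; it only illustrates the idea, not edgeR's actual size-factor estimation.

```python
# Minimal sketch: rescale each Hi-C matrix so all samples share the same
# total count (sequencing-depth normalization). Data are hypothetical.
import numpy as np

rng = np.random.default_rng(0)
matrices = [rng.poisson(lam, size=(100, 100)).astype(float)
            for lam in (5.0, 8.0, 6.5)]         # 3 samples, different depths

totals = np.array([m.sum() for m in matrices])
target = np.exp(np.log(totals).mean())          # geometric mean as common depth
scaled = [m * (target / tot) for m, tot in zip(matrices, totals)]

assert np.allclose([m.sum() for m in scaled], target)
```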
8. 2. Compute a p-value per bin pair
Z-score computation [Stansfield et al, 2018], based on quantiles of scaled and centered M values (R package HiCcompare):
is used when there is no replicate ($T = 2$: one sample per condition)
is very fast and easy to use
but is a bit weak on the theoretical side (no strong justification)
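As a toy illustration of this Z-score idea (not HiCcompare's actual implementation), the sketch below computes M values as log-ratios of two hypothetical matrices, centers and scales them within each genomic-distance stratum, and converts them to two-sided normal p-values.

```python
# Toy sketch of a distance-stratified Z-score test on M values.
# The matrices and the +1 pseudo-count are hypothetical choices.
import numpy as np
from scipy import stats

rng = np.random.default_rng(1)
n = 200
h1 = rng.poisson(10, size=(n, n)) + 1.0         # +1 avoids log(0)
h2 = rng.poisson(10, size=(n, n)) + 1.0

i, j = np.triu_indices(n)
d = j - i                                       # genomic distance per bin pair
m = np.log2(h1[i, j] / h2[i, j])                # M values

z = np.empty_like(m)
for dist in np.unique(d):                       # center/scale per distance stratum
    sel = d == dist
    z[sel] = (m[sel] - m[sel].mean()) / (m[sel].std() + 1e-12)

pvals = 2.0 * stats.norm.sf(np.abs(z))          # two-sided p-value per bin pair
```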
9. 2. Compute a p-value per bin pair
Z-score computation [Stansfield et al, 2018] (R package HiCcompare)
NB models [Lun & Smyth, 2015], based on Negative Binomial GLMs and statistical tests (R package diffHic):
need at least 3 replicates per condition to be used
are not restricted to two conditions and can include various covariates
but are statistically better justified than the Z-score approach
[Stansfield et al., 2019] (R package multiHiCcompare) also does this, with small changes (normalization, ...)
[Zaborowski and Wilczyński, 2020] also use this distribution, but within distance pools, and counts are explained by the counts in the other condition rather than by the condition itself
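To illustrate the testing idea for a single bin pair (diffHic itself runs in R over all bin pairs at once, with empirical-Bayes dispersion estimation), here is a hedged statsmodels sketch with a hand-fixed dispersion and hypothetical counts.

```python
# Sketch: negative binomial GLM of counts on condition for ONE bin pair,
# Wald test on the condition effect. All numbers are hypothetical.
import numpy as np
import statsmodels.api as sm

counts = np.array([52, 61, 48, 95, 88, 102])    # 3 replicates per condition
condition = np.array([0.0, 0.0, 0.0, 1.0, 1.0, 1.0])
log_depth = np.log([1.0, 1.1, 0.9, 1.0, 1.2, 0.95])  # size factors as offset

X = sm.add_constant(condition)
model = sm.GLM(counts, X,
               family=sm.families.NegativeBinomial(alpha=0.05),  # fixed dispersion
               offset=log_depth)
res = model.fit()
print(res.params[1], res.pvalues[1])            # log fold-change and p-value
```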
10. 2. Compute a p-value per bin pair
Z-score computation [Stansfield et al, 2018] (R package HiCcompare)
NB models [Lun & Smyth, 2015] (R package diffHic)
In both approaches, a p-value is computed for every bin pair and p-values are corrected with multiple-testing procedures (not described here).
But spatial dependencies between pairs of bins are not taken into account by these methods!!
11. 2. Compute a p-value per bin pair taking spatial dependencies into account
Using an analogy with neuroimaging and spatial Poisson processes [Djekidel et al, 2018] (R package FIND):
needs at least 2 replicates per condition to be used
seems to be restricted to two conditions (but could maybe be easily extended to more) and can include various covariates
is statistically (more or less) justified (from previous work on image analysis)
uses tests at the bin pair level with multiple-testing correction, but those tests are based on the value of the bin pair and its neighbors
is shown to work well for high-resolution differential analysis (seems to provide better results for 5kb bins)
12. 2. Compute a p-value per bin pair taking spatial dependencies into account
Using an analogy with neuroimaging and spatial Poisson processes [Djekidel et al, 2018] (R package FIND)
Using distance-based correction and Gaussian filter comparison [Ardakany et al, 2019] (Python/Matlab scripts selfish, available on GitHub):
not sensitive to sequencing bias and does not require (between-matrix) normalization
only suited to $T = 2$ (no replicates) for 2 conditions
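A rough sketch of the Gaussian-filter comparison idea follows; the real selfish method self-normalizes the matrices and calibrates p-values properly, so the data and threshold here are purely hypothetical.

```python
# Sketch: smooth both matrices at several scales and flag bin pairs where
# the smoothed difference is extreme. Data and threshold are hypothetical.
import numpy as np
from scipy.ndimage import gaussian_filter

rng = np.random.default_rng(2)
a = rng.poisson(10, size=(200, 200)).astype(float)
b = rng.poisson(10, size=(200, 200)).astype(float)

for sigma in (1, 2, 4, 8):                      # multi-scale comparison
    diff = gaussian_filter(a, sigma) - gaussian_filter(b, sigma)
    z = (diff - diff.mean()) / diff.std()       # crude standardization
    print(sigma, int((np.abs(z) > 3).sum()))    # extreme bin pairs per scale
```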
13. Other tools (not reviewed for the moment... but mentioned in the article)
HOMER (binomial-based test between two samples)
ChromoR (transformation into Gaussian measurements and Bayesian factor analysis)
HiBrowse (based on edgeR, like diffHic and others)
14. Now... coming back to ACCOST!
15. ACCOST overview
suited only to 2 conditions with replicates (even though the computations may work even without replicates)
based on DESeq (very similar to diffHic or multiHiCcompare)
first: ICE normalization (within-matrix normalization)
accounts for the distance effect in the matrix with the addition of an offset in the model (does not require within-matrix normalization, nor distance-based correction)
Python scripts available on Bitbucket
16. Main hypotheses of ACCOST
$N^t_{ij} \sim NB(\mu^t_{ij}, \sigma^t_{ij})$, with mean $\mu^t_{ij}$ and standard deviation $\sigma^t_{ij}$ such that:
$$\mu^t_{ij} = \beta^t_i \beta^t_j s^t_{|i-j|} q^{k(t)}_{ij}$$
where $k(t)$ is the condition for sample $t$ and:
$\beta^t_i$ is an experiment-specific locus bias for locus $i$ in sample $t$ (one bias vector per sample)
$s^t_{|i-j|}$ is a distance-specific size factor that accounts for the genomic distance effect
$q^{k(t)}_{ij}$ is the true (unknown) number of interactions between $i$ and $j$ in condition $k(t)$ (on which the test is based)
A similar decomposition holds for $\sigma^t_{ij}$: it depends on the parametric estimation of a function $\nu_{k(t)}$, which models the dispersion as a smooth non-negative function of the interaction $q^{k(t)}_{ij}$.
17. How are ACCOST parameters obtained?
$$\mu^t_{ij} = \beta^t_i \beta^t_j s^t_{|i-j|} q^{k(t)}_{ij}$$
$\beta^t_i$ is set as the ICE normalization factor for locus $i$ in sample $t$
$s^t_{|i-j|}$ is obtained as the median of ICE-normalized counts for pairs of loci at distance $d = |i - j|$:
$$s^t_d = \operatorname{median}_{|i'-j'|=d} \frac{N^t_{i'j'}}{\beta^t_{i'} \beta^t_{j'}}$$
$q^{k}_{ij}$ is then obtained by averaging corrected counts across the replicates of the same condition:
$$q^{k}_{ij} = \frac{1}{|C_k|} \sum_{t \in C_k} \frac{N^t_{ij}}{\beta^t_i \beta^t_j s^k_{|i-j|}}$$
$\nu_{k(t)}$ is finally estimated by a polynomial regression (details skipped for the sake of clarity, but basically very similar to what is performed in DESeq, using the distance-corrected counts $q^{k(t)}_{ij}$)
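Under simplifying assumptions (ICE bias vectors taken as given, a single condition, and each replicate's own distance factor used in place of the condition-level one), these estimation steps can be sketched as follows; all names and data are hypothetical.

```python
# Sketch of the slide-17 estimation steps: distance-specific size factors
# s^t_d as medians of ICE-normalized counts, then q as the average of
# corrected counts across the replicates of one condition.
import numpy as np

rng = np.random.default_rng(3)
n, reps = 100, 3
counts = rng.poisson(20, size=(reps, n, n)).astype(float)  # N^t_{ij}
betas = [np.ones(n) for _ in range(reps)]       # ICE bias factors (given)

i, j = np.triu_indices(n)
d = j - i                                       # distance of each bin pair

def distance_size_factors(N, beta):
    """s^t_d: median ICE-normalized count over pairs at distance d."""
    norm = N[i, j] / (beta[i] * beta[j])
    return np.array([np.median(norm[d == dist]) for dist in range(n)])

s = [distance_size_factors(counts[t], betas[t]) for t in range(reps)]

# q_{ij}: corrected counts averaged across the replicates of the condition
q = np.mean([counts[t][i, j] / (betas[t][i] * betas[t][j] * s[t][d])
             for t in range(reps)], axis=0)
```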
18. Validation of ACCOST
datasets: two human cell lines from [Rao et al, 2014], two mouse datasets from [Dixon et al, 2012] and [Shen et al, 2012], and a Plasmodium dataset with two distinct stages of the parasite from [Ay et al, 2014]
compared methods: diffHic and FIND
22. Significant results locations
The increase of significant contacts at 50 kb corresponds to a threshold related to LOESS normalization.
23. References
Ardakany, A.R., Ay, F., and Lonardi, S. (2019). Selfish: discovery of differential chromatin interactions via a self-similarity measure. Bioinformatics, 35(14):i145--i153.
Ay, F., Bunnik, E.M., Varoquaux, N., Bol, S.M., Prudhomme, J., Vert, J.P., Noble, W.S., Le Roch, K.G. (2014). Three-dimensional modeling of the P. falciparum genome during the erythrocytic cycle reveals a strong connection between genome architecture and gene expression. Genome Research, 24:974--988.
Dixon, J.R., Selvaraj, S., Yue, F., Kim, A., Li, Y., Shen, Y., Hu, M., Liu, J.S., Ren, B. (2012). Topological domains in mammalian genomes identified by analysis of chromatin interactions. Nature, 485:376--380.
Djekidel, M.N., Chen, Y., and Zhang, M.Q. (2018). FIND: difFerential chromatin INteractions Detection using a spatial Poisson process. Genome Research, 28:412--422.
Lun, A. and Smyth, G. (2015). diffHic: a Bioconductor package to detect differential genomic interactions in Hi-C data. BMC Bioinformatics, 16:258.
Rao, S.S.P. et al. (2014). A 3D map of the human genome at kilobase resolution reveals principles of chromatin looping. Cell, 159:1665--1680.
Shen, Y., Yue, F., McCleary, D.F., Ye, Z., Edsall, L., Kuan, S., Wagner, U., Dixon, J., Lee, L., Lobanenkov, V.V. et al. (2012). A map of the cis-regulatory sequences in the mouse genome. Nature, 488:116--120.
24. References
Stansfield, J.C., Cresswell, K.G., Vladimirov, V.I., and Dozmorov, M.G. (2018). HiCcompare: an R package for joint normalization and comparison of Hi-C datasets. BMC Bioinformatics, 19:279.
Stansfield, J.C., Cresswell, K.G., and Dozmorov, M.G. (2019). multiHiCcompare: joint normalization and comparative analysis of complex Hi-C experiments. Bioinformatics, 35(17):2916--2923.
Zaborowski, R. and Wilczyński, B. (2020). DiADeM: differential analysis via dependency modelling of chromatin interactions with robust generalized linear models. bioRxiv preprint.