This document discusses how Hi-C data can be used to identify topologically associating domains (TADs) in genomes. It describes the Hi-C processing pipeline, including read mapping, filtering, and creation of a contact matrix. A hidden Markov model is used to detect TADs from directionality indexes calculated from the matrix. TADs are found to be stable across cell types, but interactions within domains can be dynamic. Factors such as insulators, genes, and heterochromatin boundaries are enriched at TAD boundaries.
3. Background
• Although the sequence of the genome is known, little is known about its 3D structure
• High-throughput chromosome conformation capture (Hi-C) is a 3C-based technology
• It can detect chromatin interactions between loci across the entire genome
Biological experiment:
Ming, H., et al. (2013). "Understanding spatial organizations of chromosomes via statistical analysis of Hi-C data." Quantitative Biology 1.
4. Background
• Hi-C in the chromatin conformation study map
Smallwood, A. and B. Ren (2013). "Genome organization and long-range regulation of gene expression by enhancers." Current Opinion in Cell Biology 25(3): 387-394.
5. Background- Processing pipeline
• 4 main steps:
  • Read mapping: each side (50 bp) is mapped independently to the reference genome
  • Read-level filtering
  • Fragment filtering: filter fragments with a low mappability score
  • Creation of the Hi-C contact matrix
Ming, H., et al. (2013). "Understanding spatial organizations of chromosomes via statistical analysis of Hi-C data." Quantitative Biology 1.
6. Background- Processing pipeline
• Read filtering step: the following types of reads should be removed:
  • Self-ligation reads
  • Dangling reads: un-ligated reads
  • PCR amplification reads: many reads that map to the same location
  • Random breaking reads: reads located far from the enzyme cutting site ($d_1 + d_2 > 500$ bp)
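The two read-level filters that the slide specifies numerically could be sketched as below. This is a minimal illustration, not the pipeline's actual implementation: the `ReadPair` record and its field names are hypothetical, and `d1`/`d2` are assumed to be the distances from each read end to its nearest enzyme cut site. Self-ligation and dangling reads are not handled because the slide does not give criteria for detecting them.

```python
from dataclasses import dataclass

# Threshold quoted on the slide: d1 + d2 > 500 bp marks a random break.
MAX_CUT_SITE_DISTANCE = 500


@dataclass(frozen=True)
class ReadPair:
    """Hypothetical record for one mapped paired-end read."""
    chrom1: str
    pos1: int
    chrom2: str
    pos2: int
    d1: int  # assumed: distance (bp) from end 1 to its nearest cut site
    d2: int  # assumed: distance (bp) from end 2 to its nearest cut site


def is_random_break(pair):
    """True when the pair lies too far from the enzyme cut sites."""
    return pair.d1 + pair.d2 > MAX_CUT_SITE_DISTANCE


def filter_read_pairs(pairs):
    """Drop PCR duplicates (identical mapping positions) and random breaks."""
    seen = set()
    kept = []
    for pair in pairs:
        key = (pair.chrom1, pair.pos1, pair.chrom2, pair.pos2)
        if key in seen:
            continue  # same location seen before: treated as a PCR duplicate
        seen.add(key)
        if is_random_break(pair):
            continue
        kept.append(pair)
    return kept
```

Keeping the first copy of each duplicated location, rather than discarding all copies, is a design choice made for this sketch.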
7. Background- Processing pipeline
• Fragment filtering step: remove fragments with a low mappability score (< 0.5)
  • Fragments near centromere or telomere regions tend to contain a large proportion of repetitive sequence, which leads to a low mappability score
• Additional suggestions:
  • Remove fragments shorter than 100 bp or longer than 100 kb
  • Remove the 0.5% of fragments with the highest number of reads (these can be a source of PCR artifacts)
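The fragment-level filters listed above might be combined as in the following sketch. The dictionary keys (`length`, `mappability`, `reads`) are hypothetical, and applying the 0.5% cut to the fragments that survive the other filters (rather than to all fragments) is an assumption, since the slide does not say which.

```python
def filter_fragments(fragments, min_mappability=0.5, min_len=100,
                     max_len=100_000, top_fraction=0.005):
    """Apply the mappability, length, and read-count filters from the slide.

    fragments: list of dicts with hypothetical keys
               'length' (bp), 'mappability' (0..1), and 'reads' (count).
    """
    # Mappability and length filters.
    kept = [
        f for f in fragments
        if f["mappability"] >= min_mappability
        and min_len <= f["length"] <= max_len
    ]
    # Remove the top fraction of fragments by read count (possible PCR artifacts).
    n_drop = int(len(kept) * top_fraction)
    if n_drop > 0:
        by_reads = sorted(range(len(kept)),
                          key=lambda i: kept[i]["reads"], reverse=True)
        drop = set(by_reads[:n_drop])
        kept = [f for i, f in enumerate(kept) if i not in drop]
    return kept
```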
8. Background
• Construction of the Hi-C interaction matrix:
  • The number of possible enzyme cut-site pairs is about $10^{12}$, whereas a typical Hi-C experiment generates about $10^{8}$ reads
  • Thus, we need to partition the genome into large-scale bins.
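Partitioning into bins and counting contacts could look like the sketch below. It assumes read pairs on a single chromosome given as `(pos1, pos2)` tuples; the function names and the bin size passed in are illustrative choices, not values fixed by the slide.

```python
from collections import Counter


def bin_contacts(pairs, bin_size):
    """Count read pairs per pair of bins.

    pairs: iterable of (pos1, pos2) positions in bp on one chromosome.
    Returns a Counter keyed by (bin_i, bin_j) with bin_i <= bin_j.
    """
    counts = Counter()
    for pos1, pos2 in pairs:
        i, j = sorted((pos1 // bin_size, pos2 // bin_size))
        counts[(i, j)] += 1
    return counts


def to_dense(counts, n_bins):
    """Expand the sparse counts into a symmetric n_bins x n_bins matrix."""
    matrix = [[0] * n_bins for _ in range(n_bins)]
    for (i, j), c in counts.items():
        matrix[i][j] += c
        if i != j:
            matrix[j][i] += c
    return matrix
```

Storing only the upper triangle in the `Counter` keeps memory proportional to the number of observed bin pairs, which matters because most bin pairs receive no reads.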
Processing pipeline:
Hi-C vs FISH
9. Discussed paper
• Aim:
  • Investigate the 3D organization of the human and mouse genomes in embryonic stem (ES) cells and differentiated cells.
• Data:
  • Mouse:
    • Mouse embryonic stem cells (mESC)
    • Cortex cells (generated by another group)
  • Human:
    • Human embryonic stem cells (hESC)
    • IMR90
11. Data control (2)
Compare with 5C data for the HoxA locus (correlation > 0.73)
Compare with 3C data for the Phc1 locus
Compare with FISH data for 6 loci
14. Identification of topological domains
Step 1: Detection of the interaction bias
Within a TAD we observe that:
• The upstream portion is highly biased to interact downstream
• The downstream portion is highly biased to interact upstream
A directionality index (DI) was defined to quantify this bias:
• $DI > 0$ → upstream bias
• $DI < 0$ → downstream bias
• $|DI|$: the extent of the interaction bias
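The formula for the directionality index did not survive in the slide text, so the sketch below assumes a chi-square-like form: with A the contacts from a bin to an upstream window, B the contacts to a downstream window, and E = (A + B) / 2, the magnitude is (A - E)^2 / E + (B - E)^2 / E. The sign is taken here as the sign of (B - A); this is an assumption and may need to be flipped to match the labeling used on the slide. The `window` parameter (in bins) is likewise a hypothetical name.

```python
def directionality_index(matrix, bin_index, window):
    """Chi-square-like directionality index for one bin (assumed formula).

    matrix: square contact matrix as a list of lists.
    window: number of bins to look at on each side of bin_index.
    """
    n = len(matrix)
    row = matrix[bin_index]
    upstream = sum(row[max(0, bin_index - window):bin_index])             # A
    downstream = sum(row[bin_index + 1:min(n, bin_index + window + 1)])   # B
    if upstream == downstream:
        return 0.0  # no bias (also covers the case of no contacts at all)
    expected = (upstream + downstream) / 2                                # E
    # Assumed sign convention: positive when downstream contacts dominate.
    sign = 1.0 if downstream > upstream else -1.0
    magnitude = ((upstream - expected) ** 2 / expected
                 + (downstream - expected) ** 2 / expected)
    return sign * magnitude
```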
16. Domain detection (1)
• Each bin can have 3 states:
  • Upstream biased (U)
  • Downstream biased (D)
  • No bias (N)
• Use an HMM based on the DI to infer the biased state
• We define:
  • $Y = [y_1, y_2, \dots, y_n]$: the observed DI
  • $Q = [Q_1, Q_2, \dots, Q_n]$: the hidden bias, $Q_t \in \{D, U, N\}$
  • $M = [M_1, M_2, \dots, M_n]$: the mixture component, $m \in [1, 20]$
• The probabilities are calculated as follows:
  • $P(y_t \mid Q_t = i, M_t = m) = \mathcal{N}(y_t; \mu_{i,m}, \Sigma_{i,m})$
  • $P(M_t = m \mid Q_t = i) = C(i, m)$
  • $C(i, m)$: the mixture weight
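[Figure: example state sequence D D D D U U U N N N D D D U U, annotated Domain | Boundary | Domain, together with the HMM graph linking the hidden states $Q_t$ and $Q_{t+1}$ (values D, U, N) to the mixture components $M_t$, $M_{t+1}$ and the observations $y_t$, $y_{t+1}$]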
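To make the model above concrete, here is a small sketch of a three-state HMM with Gaussian-mixture emissions, decoded with the Viterbi algorithm. All numeric parameters used with it (start and transition probabilities, mixture weights, means, variances) would be invented for illustration; in practice they would be estimated from the data, and the slide allows up to 20 mixture components per state. Whether the original analysis decoded states with Viterbi or with posterior probabilities is not stated on the slide, so Viterbi is an assumption here.

```python
import math

STATES = ("D", "U", "N")


def gaussian_pdf(y, mean, var):
    """Density of a univariate normal distribution."""
    return math.exp(-((y - mean) ** 2) / (2 * var)) / math.sqrt(2 * math.pi * var)


def emission(y, state, mixtures):
    """P(y | Q = state) = sum over m of C(state, m) * N(y; mu, var).

    mixtures: dict mapping state -> list of (weight, mean, variance).
    """
    return sum(w * gaussian_pdf(y, mu, var) for w, mu, var in mixtures[state])


def viterbi(observations, start, trans, mixtures):
    """Most likely hidden state path, computed in log space.

    start: dict state -> probability; trans: dict state -> dict state -> probability.
    All start and transition probabilities must be strictly positive.
    """
    tiny = 1e-300  # guards against log(0) when an emission underflows
    score = {
        s: math.log(start[s]) + math.log(emission(observations[0], s, mixtures) + tiny)
        for s in STATES
    }
    back = []
    for y in observations[1:]:
        prev = score
        score = {}
        pointers = {}
        for s in STATES:
            best = max(STATES, key=lambda r: prev[r] + math.log(trans[r][s]))
            pointers[s] = best
            score[s] = (prev[best] + math.log(trans[best][s])
                        + math.log(emission(y, s, mixtures) + tiny))
        back.append(pointers)
    # Trace back from the best final state.
    state = max(STATES, key=lambda s: score[s])
    path = [state]
    for pointers in reversed(back):
        state = pointers[state]
        path.append(state)
    return path[::-1]
```

Working in log space avoids numerical underflow on long chromosomes, where the product of many small probabilities would otherwise round to zero.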
17. Domain detection (1)
• The region between two TADs is termed:
  • Topological boundary: if size < 400 kb
  • Unrecognized chromatin: if size ≥ 400 kb
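The size rule above can be applied to a list of called domains as in this sketch. It assumes domains are given as sorted, non-overlapping `(start, end)` coordinates in bp on one chromosome; the function name and output format are hypothetical.

```python
BOUNDARY_MAX_SIZE = 400_000  # bp, threshold quoted on the slide


def classify_gaps(domains):
    """Label the region between each pair of consecutive domains.

    domains: sorted list of (start, end) tuples in bp, non-overlapping.
    Returns a list of (gap_start, gap_end, label).
    """
    gaps = []
    for (_, prev_end), (next_start, _) in zip(domains, domains[1:]):
        size = next_start - prev_end
        if size <= 0:
            continue  # adjacent domains leave no gap to classify
        if size < BOUNDARY_MAX_SIZE:
            label = "topological boundary"
        else:
            label = "unrecognized chromatin"
        gaps.append((prev_end, next_start, label))
    return gaps
```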
18. What separates two TADs
• Studied the HoxA locus, which is known to be separated into two compartments
• Found that the CS5 insulator resides in the boundary
• Are insulators enriched at boundaries in general?
19. CTCF role in the boundary
• Studied another known insulator protein, CTCF
20. Heterochromatin and boundary
• The H3K9me3 profile changed between hESC and IMR90 cells, but the boundary structure did not change
• Potential link between the topological domains and transcriptional control in the mammalian genome
23. Cell type specific interactions
• A binomial test is performed for each 20 kb bin to determine whether it is cell-type specific
• Calculate $n = I_{mESC} + I_{cortex}$, the number of possible interactions at a distance $d$
• Calculate the expected value $p = I_{mESC} / n$ or $p = I_{cortex} / n$
• Then, for each bin, perform a binomial test to see whether the number of cell-specific interactions deviates from this expectation
[Figure: worked example of interaction counts at distance d in mESC and cortex, summed to give n and the expected proportion p for each cell type]
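One way to read the procedure above is sketched below: the expected proportion p is computed from the total counts of the two cell types, and each bin's mESC count is then tested against p with an exact two-sided binomial test. This reading, the function names, and the significance level `alpha` are assumptions; the slide does not specify them, and no multiple-testing correction is applied here.

```python
from math import comb


def binomial_two_sided_p(k, n, p):
    """Exact two-sided binomial p-value.

    Sums the probabilities of all outcomes that are no more likely than k.
    Suitable for modest n; very large n may underflow in floating point.
    """
    pmf = [comb(n, i) * p ** i * (1 - p) ** (n - i) for i in range(n + 1)]
    threshold = pmf[k] * (1 + 1e-7)  # small tolerance for float comparison
    return min(1.0, sum(x for x in pmf if x <= threshold))


def cell_specific_bins(mesc_counts, cortex_counts, alpha=0.05):
    """Flag bins whose mESC/cortex split deviates from the overall proportion.

    mesc_counts, cortex_counts: per-bin interaction counts at one distance d.
    Returns a list of booleans, one per bin.
    """
    total_mesc = sum(mesc_counts)
    n_total = total_mesc + sum(cortex_counts)  # n = I_mESC + I_cortex
    p = total_mesc / n_total                   # expected mESC proportion
    flags = []
    for k_mesc, k_cortex in zip(mesc_counts, cortex_counts):
        n_bin = k_mesc + k_cortex
        p_value = binomial_two_sided_p(k_mesc, n_bin, p) if n_bin else 1.0
        flags.append(p_value < alpha)
    return flags
```

For large counts, a library routine such as SciPy's exact binomial test would be a more robust choice than this hand-written version.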
24. Cell type specific interactions
• 20% of the genes with a fold change (FC) ≥ 4 are found in dynamic interacting loci.
• > 96% of the dynamic interactions occur within the same domain.
• Model:
  • Domain organization is stable between cell types,
  • but the regions within each domain may be dynamic.
25. Factors forming the boundary (1)
• Boundaries are enriched for active promoter signals and gene bodies
27. TAD vs A/B compartments (1)
At a higher order, the chromatin is organized into A and B compartments
• Loci found clustered in A compartments are generally:
  • gene rich,
  • transcriptionally active,
  • and DNase I hypersensitive
• Loci found clustered in B compartments are generally:
  • gene poor,
  • transcriptionally silent,
  • and DNase I insensitive
[Figure: Compartment A and Compartment B]
Lieberman-Aiden, E., et al. (2009), Science (New York, N.Y.) 326(5950): 289-293.
28. TAD vs A/B compartments (2)
TADs are smaller than A/B compartments
29. TAD vs A/B compartments (3)
In summary :
Gibcus, J. and J. Dekker (2013). "The hierarchy of the 3D genome." Molecular cell 49(5): 773-782.
30. TAD vs A/B compartments (4)
In summary :
Gibcus, J. and J. Dekker (2013). "The hierarchy of the 3D genome." Molecular cell 49(5): 773-782.
32. TAD vs Lamina associated domains (LAD) (2)
Nora, E., et al. (2013). BioEssays : news and reviews in molecular, cellular and developmental biology 35(9): 818-828.
33. TAD vs LOCKs
• LOCK: Large Organized Chromatin K9-modifications
• Conserved regions exhibiting large H3K9me2 differences between cell lines
34. Summary
• The mammalian genome is segmented into megabase-scale domains
• Domain boundaries are stable between cell lines and species, suggesting that they are a basic property of chromosome architecture.
• Domain boundaries:
  • are enriched for transcriptionally active genes
  • coincide with heterochromatin boundaries
  • are enriched for insulator proteins
  • are enriched for tRNA, SINE and housekeeping genes
• Many data-analysis approaches were developed