6. Pipeline implementation challenges
• The pipeline design faces two kinds of problems:
• Analysis problems:
• How to accurately filter the linker sequences from the raw reads?
• How to accurately and efficiently map the tag sequences to reference genomes?
• How to evaluate the noise level in the data?
• How to identify bona fide (real) binding sites and chromatin interactions?
• Visualization problems:
• How to organize the datasets?
• How to effectively visualize the identified long-range chromatin interactions?
8. ChIA-PET tool-fastq file format
• Each PET entry consists of four lines.
• Only the first 20 bp are mapped to the genome.
• The other 16 are assumed to belong to the linker.
[Figure: example entries from the two read files, Seq1.fastq and Seq2.fastq]
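The tag/linker split described on this slide can be sketched in Python. This is only an illustration, assuming 36 bp reads in which the first 20 bp are the tag and the remaining 16 bp are the linker; the function names `parse_fastq` and `split_read` and the example record are hypothetical and are not part of the ChIA-PET tool.

```python
TAG_LENGTH = 20  # number of leading bases mapped to the genome (from the slide)


def parse_fastq(lines):
    """Yield (name, sequence, quality) from FASTQ lines, four lines per entry."""
    lines = [line.rstrip("\n") for line in lines]
    for i in range(0, len(lines) - 3, 4):
        header, seq, _separator, qual = lines[i:i + 4]
        # Drop the leading "@" of the header line.
        yield header[1:], seq, qual


def split_read(seq, tag_length=TAG_LENGTH):
    """Split a read into (tag, linker) at a fixed position."""
    return seq[:tag_length], seq[tag_length:]


# Hypothetical 36 bp read: 20 bp tag followed by 16 bp of linker.
record = ["@read1", "ACGTACGTACGTACGTACGTTTTTGGGGCCCCAAAA", "+", "I" * 36]
name, seq, qual = next(parse_fastq(record))
tag, linker = split_read(seq)
print(name, tag, linker)
```

Only `tag` would be passed on to the mapping step; `linker` would be used to filter the read.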
11. ChIA-PET tool-Mapping
• Mapping was done using the Batman package.
• Only one mismatch was allowed.
• Only PETs with uniquely mapped tags were considered.
• Duplicate PETs were removed, as they can be the result of PCR clonal amplification.
12. ChIA-PET tool-Mapping
• Merge similarly mapped PETs (within ± 1bp) into one unique PET
• The aim is to reduce false positive calls due to PCR amplification.
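The merging step can be sketched as follows. This is a minimal illustration, not the tool's actual code: it assumes each PET is a tuple `(chrom1, strand1, pos1, chrom2, strand2, pos2)`, and the function name `merge_similar_pets` is hypothetical. Each PET is compared only against PETs already kept, so a chain of positions 100, 101, 102 keeps both 100 and 102.

```python
def merge_similar_pets(pets, tolerance=1):
    """Keep one representative for PETs whose two tags both map within
    `tolerance` bp of an already kept PET (same chromosomes and strands)."""
    unique = []
    for pet in sorted(pets):
        chrom1, strand1, pos1, chrom2, strand2, pos2 = pet
        is_duplicate = False
        # Sorting lets us scan backwards and stop once the first tag is too far.
        for kept in reversed(unique):
            if kept[:2] != (chrom1, strand1) or pos1 - kept[2] > tolerance:
                break
            if kept[3:5] == (chrom2, strand2) and abs(pos2 - kept[5]) <= tolerance:
                is_duplicate = True
                break
        if not is_duplicate:
            unique.append(pet)
    return unique


pets = [
    ("chr1", "+", 100, "chr1", "-", 5000),
    ("chr1", "+", 101, "chr1", "-", 5001),  # within 1 bp of the first PET
    ("chr1", "+", 100, "chr1", "-", 5000),  # exact duplicate
    ("chr2", "+", 100, "chr2", "-", 900),
]
merged = merge_similar_pets(pets)
print(len(merged))
```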
15. ChIA-PET tool – PET classification (2)
• How to determine the span cutoff between self-ligation and inter-ligation PETs?
• From the histogram we can see that the two tags of most PETs lie no more than 2 kb apart.
• For the IHH015M library data, the span cutoff called by this method is 4,595 bp.
[Figure: histogram of PET spans; labels: Self-ligation, Intra-ligation]
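Once a span cutoff is known, the classification itself reduces to a comparison. The sketch below is an assumption-laden illustration: the function name `classify_pet` is hypothetical, the strict `<` at the boundary is a guess, and treating PETs whose tags map to different chromosomes as inter-ligation is an assumption the slide does not state. Strand orientation is ignored here.

```python
SPAN_CUTOFF_BP = 4595  # cutoff reported on the slide for library IHH015M


def classify_pet(chrom1, pos1, chrom2, pos2, span_cutoff=SPAN_CUTOFF_BP):
    """Label a PET as self-ligation or inter-ligation from its genomic span."""
    if chrom1 != chrom2:
        # Assumption: tags on different chromosomes cannot be self-ligation.
        return "inter-ligation"
    span = abs(pos2 - pos1)
    return "self-ligation" if span < span_cutoff else "inter-ligation"


print(classify_pet("chr1", 10_000, "chr1", 11_500))   # short span
print(classify_pet("chr1", 10_000, "chr1", 250_000))  # long span
```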
16. ChIA-PET tool – PET classification (3)
To check whether the obtained self-ligation PETs are due to random ligation, a random model was generated:
Let:
• $N_i$: the number of DNA fragments on $Chr_i$
• $Intra_i$: the number of intra-ligation related fragments on $Chr_i$
If we suppose that any DNA fragment can interact with any other fragment, we get:
$$ratio\_intra_i = \frac{Intra_i}{N_i^2}$$
So the total ratio over all chromosomes will be
$$ratio_{total} = \frac{\sum_i Intra_i}{\sum_i N_i^2}$$
• In chimeric intra-ligation (IHH015C):
• The expected ratio: 0.0552
• The observed ratio: 0.0558
• ⇒ P-value = 5.2E-4 ⇒ random
• In non-chimeric intra-ligation (IHH015M):
• The expected ratio: 0.0546
• The observed ratio: 0.0929
• ⇒ P-value < 2.97E-323 ⇒ non-random
Example: with $N_i = 5$ and $Intra_i = 3$, $ratio\_intra_i = 3 / 5^2 = 3/25$.
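The two ratios can be computed directly, as in the sketch below. The function names are hypothetical, and the total ratio is read here as the sum of the $Intra_i$ divided by the sum of the $N_i^2$; that reading of the slide's formula is an assumption. The counts for `chr2` are made up for illustration.

```python
def ratio_intra(intra, n_fragments):
    """Per-chromosome ratio: Intra_i / N_i^2."""
    return intra / n_fragments ** 2


def ratio_total(intra_by_chrom, n_by_chrom):
    """Ratio over all chromosomes: sum(Intra_i) / sum(N_i^2) (assumed reading)."""
    total_intra = sum(intra_by_chrom.values())
    total_pairs = sum(n ** 2 for n in n_by_chrom.values())
    return total_intra / total_pairs


# chr1 uses the slide's example (N_i = 5, Intra_i = 3); chr2 is hypothetical.
intra_counts = {"chr1": 3, "chr2": 1}
fragment_counts = {"chr1": 5, "chr2": 5}
print(ratio_intra(3, 5))
print(ratio_total(intra_counts, fragment_counts))
```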
18. ChIA-PET tool – PET classification (3)
We can see from the heatmap that the chimeric ligations are randomly distributed.
19. ChIA-PET tool- Self-ligation site analysis
• In this step the ChIP-enriched regions (self-ligation) are clustered.
• Clusters with a high FDR need to be filtered out.
• The FDR was calculated as follows:
• For each cluster, calculate the probability of obtaining 𝑘 ligations by chance using a Monte-Carlo simulation.
• The Monte-Carlo simulation consists of randomly picking PETs and counting how many of them fall in that cluster.
• Repeat the simulation 100 times and take the mean.
• Clusters with an FDR > 0.01 were filtered out.
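The simulation step can be sketched as below. The slide does not say how PETs are picked at random, so this sketch assumes the simplest model: PETs are placed uniformly on a single linear genome, and the count falling inside the cluster interval is recorded for each of 100 runs. The function name, the parameters, and the example numbers are all hypothetical.

```python
import random


def simulate_cluster_counts(cluster_start, cluster_end, genome_length, n_pets,
                            n_simulations=100, seed=0):
    """Return, for each simulation, how many randomly placed PETs fall in
    the half-open interval [cluster_start, cluster_end)."""
    rng = random.Random(seed)  # fixed seed so the sketch is reproducible
    counts = []
    for _ in range(n_simulations):
        hits = sum(
            1 for _ in range(n_pets)
            if cluster_start <= rng.randrange(genome_length) < cluster_end
        )
        counts.append(hits)
    return counts


counts = simulate_cluster_counts(1000, 2000, 100_000, n_pets=500)
mean_random_count = sum(counts) / len(counts)
print(mean_random_count)
```

A cluster whose observed PET count is far above `mean_random_count` is unlikely to have arisen by chance under this model.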
20. ChIA-PET tool-Chromatin interaction (1)
• The identification of chromatin interactions is done in two steps:
• Step 1:
• Identify ChIP-enriched interaction anchor regions from inter-ligation PETs, similar to ChIP-Seq.
• The tag length is extended from 5’ to 3’ by a “tag extension length” (in this data set it was 200 bp).
• Note: most of the inter-ligation anchor regions overlap with the self-ligation regions.
• Step 2:
• Determine the number of overlapping PETs.
• Filter out random interactions.
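The tag extension in Step 1 can be sketched as a strand-aware interval operation. This is an illustration under assumptions: coordinates are taken to be 0-based and half-open, the extension is clamped at position 0, and the function name `extend_tag` is hypothetical.

```python
TAG_EXTENSION_BP = 200  # "tag extension length" used for this data set


def extend_tag(start, end, strand, extension=TAG_EXTENSION_BP):
    """Extend a mapped tag at its 3' end; returns the new (start, end)."""
    if strand == "+":
        # On the plus strand the 3' end is the higher coordinate.
        return start, end + extension
    if strand == "-":
        # On the minus strand the 3' end is the lower coordinate.
        return max(0, start - extension), end
    raise ValueError(f"unknown strand: {strand!r}")


print(extend_tag(1000, 1020, "+"))
print(extend_tag(1000, 1020, "-"))
```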
21. ChIA-PET tool-Chromatin interaction (2)
[Figure: two anchor regions, Region A and Region B]
• The null hypothesis assumes:
• Each chromatin fragment has an equal chance to ligate to any other fragment randomly.
• The interactions between anchors are independent of each other.
• Let
• $R_A$ and $R_B$ be two regions
• $C_A$ = the number of PETs in $R_A$
• $C_B$ = the number of PETs in $R_B$
• $I_{A,B}$ = the number of interactions between the two regions $A$ and $B$
• $N$ = the number of inter-ligation PETs (I think it should count only those on the same chromosome)
$$P(I_{A,B} \mid N, C_A, C_B) = \frac{\binom{C_A}{I_{A,B}} \binom{2N - C_A}{C_B - I_{A,B}}}{\binom{2N}{C_B}}$$
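This probability is a hypergeometric term and can be evaluated with exact integer binomials. The sketch below is an illustration only: the function name `interaction_probability` and the small example counts are hypothetical, and it evaluates the single-term probability given above rather than a tail p-value.

```python
from math import comb


def interaction_probability(i_ab, n, c_a, c_b):
    """P(I_AB | N, C_A, C_B) for the hypergeometric null model.

    i_ab: number of PETs linking regions A and B
    n:    number of inter-ligation PETs (2 * n tags in total)
    c_a:  number of PETs in region A
    c_b:  number of PETs in region B
    """
    if i_ab < 0 or i_ab > min(c_a, c_b):
        return 0.0  # impossible count under the model
    return comb(c_a, i_ab) * comb(2 * n - c_a, c_b - i_ab) / comb(2 * n, c_b)


# Hypothetical small example: N = 5, C_A = 4, C_B = 3, I_AB = 2.
p = interaction_probability(2, 5, 4, 3)
total = sum(interaction_probability(i, 5, 4, 3) for i in range(0, 4))
print(p, total)
```

In the example, the probabilities over all possible values of `i_ab` sum to 1, which is a quick sanity check on the formula.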
26. Installing Chia-PET on our server
MySQL server:
• You need to install your own MySQL server or install MySQL Cluster
Adding a new assembly:
• Download the reference genome and create the BATMAN index using the steps in the Installation guide (appendix)
• Download the following files from the UCSC tables and convert them to the Chia-PET format:
• Chromosome sizes
• Genes
• Gaps
• Installing missing packages (Note: replace $HOME by the real path)
• fftw: create a custom directory (e.g. $HOME/bin) and install using the following commands:
./configure --prefix=$HOME/bin
make CFLAGS="-fPIC"
make install
• rimage: open R and install it as follows:
install.packages("rimage_0.5-8.2.tar",repos=NULL,configure.args="--with-fftw-include=$HOME/bin/include --with-fftw-lib=$HOME/bin/lib")
27. Running Chia-PET tool
• We assume that Chia-PET is installed in $HOME/ChiaPET
• Go to $HOME/ChiaPET/src/python/common, open the file config.py and change the linkers
• Go to $HOME/ChiaPET/src/python/main
• Then run the following commands:
• The results will be in $HOME/ChiaPET/work/<lib_name>
• Temporary files will be in $HOME/ChiaPET/prep/<lib_name>
Editor's Notes
Intra-chromosomal inter-ligation PETs are PETs with both tags mapped to the same chromosome with a long genomic span.