
MSPC: Joint analysis of ChIP-seq replicates



Using combined evidence from replicates to evaluate ChIP-seq peaks.

The analysis of ChIP-seq samples outputs a number of enriched regions, each indicating a protein-DNA interaction or a specific chromatin modification. Enriched regions (commonly known as "peaks") are called when the read distribution is significantly different from the background and its corresponding significance measure (p-value) is below a user-defined threshold.

When replicate samples are analysed, overlapping enriched regions are expected. This repeated evidence can therefore be used to locally lower the minimum significance required to accept a peak. Here, we propose a method for joint analysis of weak peaks.

Given a set of peaks from (biological or technical) replicates, the method combines the p-values of overlapping enriched regions: users can choose a threshold on the combined significance of overlapping peaks and set a minimum number of replicates in which the overlapping peaks must be present. The method allows the "rescue" of weak peaks occurring in more than one replicate and outputs a new set of enriched regions for each replicate.

Published in: Data & Analytics

MSPC: Joint analysis of ChIP-seq replicates

  1. POLITECNICO DI MILANO, Department of Electronics, Information and Bioengineering. July 20, 2015. Using combined evidence from replicates to evaluate ChIP-seq peaks. Vahid Jalili (vahid.jalili@polimi.it), Matteo Matteucci (matteo.matteucci@polimi.it), Marco Masseroli (marco.masseroli@polimi.it), Marco Morelli (marco.morelli@iit.it). Website: https://mspc.codeplex.com
  2. Politecnico di Milano, Vahid Jalili, July 20, 2015. Motivation. [Figure: tag count along genomic DNA for a ChIP-seq sample, with signal and background. Labels: True Positive, False Positive, False Negative, True Negative, under a stringent threshold and under a permissive threshold.]
  3. Motivation: Benefit from Replicates. Use replicates to discriminate sub-threshold binding from truly non-bound regions. [Figure: tag count along genomic DNA, with signal and background, for Replicate 1 and Replicate 2.]
  4. Motivation: Benefit from Replicates.
  5. Method: Notations. Strong threshold: $\mathcal{T}_s$. Weak threshold: $\mathcal{T}_w$. Strong peak: $p\text{-value} \le \mathcal{T}_s$. Weak peak: $\mathcal{T}_s < p\text{-value} \le \mathcal{T}_w$.
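The notation on this slide can be illustrated with a minimal Python sketch. The function name `classify`, the enum `PeakClass`, and the `BACKGROUND` label for p-values above the weak threshold are hypothetical; the slides only name strong and weak peaks.

```python
from enum import Enum


class PeakClass(Enum):
    STRONG = "strong"
    WEAK = "weak"
    BACKGROUND = "background"  # hypothetical label; not named on the slide


def classify(p_value: float, t_s: float, t_w: float) -> PeakClass:
    """Classify a peak by its p-value against the strong (t_s) and weak (t_w) thresholds."""
    if not 0.0 < t_s < t_w <= 1.0:
        raise ValueError("expected 0 < t_s < t_w <= 1")
    if p_value <= t_s:
        return PeakClass.STRONG      # p-value <= T_s
    if p_value <= t_w:
        return PeakClass.WEAK        # T_s < p-value <= T_w
    return PeakClass.BACKGROUND      # p-value > T_w
```

The threshold values used in any call (for example `t_s=1e-8`, `t_w=1e-4`) are illustrative choices, not values taken from the slides.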
  6. Method: Combining Evidences. How to combine evidences? Fisher's combined probability test: $X^2_{2k} = -2 \sum_{i=1}^{k} \ln p_i$, where $X^2_{2k}$ follows a $\chi^2$ distribution with $2k$ degrees of freedom. Decision: confirm if $X^2_{2k} \ge \text{threshold}$; discard if $X^2_{2k} < \text{threshold}$. Alternatives for combining test statistics: Liptak's method (Liptak, 1958); Mudholkar and George (Mudholkar & George, 1979); Wilkinson's method (Wilkinson, 1951); truncated product method (Zaykin, Zhivotovsky, Westfall, & Weir, 2002); …
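As a rough illustration of Fisher's combined probability test, the sketch below computes the statistic and its right-tail probability using only the Python standard library. The function name `fisher_combine` is hypothetical. It relies on the closed form of the chi-square tail for an even number of degrees of freedom, and it assumes the p-values are independent. Comparing the statistic with a chi-square quantile, as on the slide, is equivalent to comparing the returned tail probability with a significance level.

```python
import math
from typing import Sequence, Tuple


def fisher_combine(p_values: Sequence[float]) -> Tuple[float, float]:
    """Return (X^2, right-tail probability) for Fisher's method on k p-values."""
    if not p_values or any(not 0.0 < p <= 1.0 for p in p_values):
        raise ValueError("p-values must be non-empty and lie in (0, 1]")
    k = len(p_values)
    statistic = -2.0 * sum(math.log(p) for p in p_values)  # X^2 with 2k d.o.f.
    half = statistic / 2.0
    # For 2k degrees of freedom:
    # P(chi2 >= x) = exp(-x/2) * sum_{j=0}^{k-1} (x/2)^j / j!
    term, total = 1.0, 1.0
    for j in range(1, k):
        term *= half / j
        total += term
    return statistic, math.exp(-half) * total
```

For a single p-value the combined probability equals the input, which is a quick sanity check on the formula.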
  7. Method: Combining Evidences. Which evidences to combine? [Figure, shown on slides 7 to 10: Replicate 1, Replicate 2, Replicate 3, Replicate 4.]
  11. Method: Intersection Determination. The challenge: an optimal method for finding the intersections. Some possible methods and their complexity: sorted lists, $O(mn)$; naïve method, $O(n^m)$; hashing based, $O\left(\frac{n \log^2 w}{w} + mr\right)$; interval trees, $O(n \log_2 n)$. Here $n$ is the average peak count per sample, $m$ is the sample count, $w$ is the number of bits in a machine word, and $r$ is the intersection size.
  12. Method: Intersection Determination, Interval Trees. [Figure: an interval tree over the coordinate axis 0 to 31, each node holding an interval and its data: [16, 21], [8, 9], [25, 30], [17, 19], [26, 27], [19, 20], [15, 23], [5, 8], [6, 10], [0, 3].]
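To make the interval-tree idea concrete, here is a minimal sketch of an overlap query using an unbalanced, augmented binary search tree keyed on interval start. This is one common way to realize an interval tree and is not necessarily the structure MSPC uses. The names `Node`, `insert`, and `overlaps` are hypothetical, and intervals are treated as closed.

```python
from dataclasses import dataclass
from typing import List, Optional, Tuple


@dataclass
class Node:
    start: int
    end: int
    max_end: int                      # largest end in this subtree
    left: Optional["Node"] = None
    right: Optional["Node"] = None


def insert(root: Optional[Node], start: int, end: int) -> Node:
    """Insert [start, end], keeping max_end up to date along the path."""
    if root is None:
        return Node(start, end, end)
    if start < root.start:
        root.left = insert(root.left, start, end)
    else:
        root.right = insert(root.right, start, end)
    root.max_end = max(root.max_end, end)
    return root


def overlaps(root: Optional[Node], start: int, end: int,
             out: Optional[List[Tuple[int, int]]] = None) -> List[Tuple[int, int]]:
    """Collect all stored intervals overlapping [start, end], ordered by start."""
    if out is None:
        out = []
    if root is None or root.max_end < start:
        return out                    # nothing in this subtree reaches the query
    overlaps(root.left, start, end, out)
    if root.start <= end and start <= root.end:
        out.append((root.start, root.end))
    if root.start <= end:             # right subtree starts at or after root.start
        overlaps(root.right, start, end, out)
    return out


# Intervals taken from the slide, inserted in the order they are listed.
SLIDE_INTERVALS = [(16, 21), (8, 9), (25, 30), (17, 19), (26, 27),
                   (19, 20), (15, 23), (5, 8), (6, 10), (0, 3)]
tree: Optional[Node] = None
for s, e in SLIDE_INTERVALS:
    tree = insert(tree, s, e)
```

With this tree, `overlaps(tree, 18, 19)` collects the four stored intervals that cover positions 18 to 19. A production version would keep the tree balanced so that each query stays logarithmic in the number of peaks plus the number of hits.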
  13. Method: Algorithm. [Figure only, shown on slides 13 to 15.]
  16. Method: Algorithm … an example. [Figure: Replicate 1, Replicate 2, and Replicate 3, containing R1 (weak peak), R2 (weak peak), R3 (weak peak), and R4 (strong region).]
  17. Method: Algorithm … an example. Determine intersecting regions across all samples: R1 (weak peak), R2 (weak peak), R3 (weak peak). [Figure: Replicate 1, Replicate 2, and Replicate 3, with R2 (weak peak), R3 (weak peak), and R4 (strong region).]
  18. Method: Algorithm … an example. Determine intersecting regions across all samples. If multiple regions are found to intersect on a sample, choose the strongest one. [Figure: Replicate 1, Replicate 2, and Replicate 3, with R1 (weak peak), R2 (weak peak), R3 (weak peak), and R4 (strong region).]
  19. Method: Algorithm … an example. Determine intersecting regions across all samples. If multiple regions are found to intersect on a sample, choose the strongest one. Combine test statistics using Fisher's method. [Figure: Replicate 1, Replicate 2, and Replicate 3, with R1 (weak peak), R2 (weak peak), R3 (weak peak), and R4 (strong region).]
  20. Method: Algorithm … an example. Determine intersecting regions across all samples. If multiple regions are found to intersect on a sample, choose the strongest one. Combine test statistics using Fisher's method. $X^2 \ge \text{Threshold}$? NO! [Figure: Replicate 1, Replicate 2, and Replicate 3, with R1 (weak peak), R2 (weak peak), R3 (weak peak), and R4 (strong region).]
  21. Method: Algorithm … an example. Intermediate sets for Replicate 1, Replicate 2, and Replicate 3: R1, R2. Legend: confirmed peaks set, discarded peaks set.
  22. Method: Algorithm … an example. Determine intersecting regions across all samples: R2 (weak peak). Since R2 intersects only with R1, and the R1-R2 test has already been performed, no further processing is done. [Figure: Replicate 1, Replicate 2, and Replicate 3, with R1 (weak peak), R2 (weak peak), R3 (weak peak), and R4 (strong region).]
  23. Method: Algorithm … an example. Determine intersecting regions across all samples: R3 (weak peak), R4 (strong region), R1 (weak peak). Combine test statistics using Fisher's method. $X^2 \ge \text{Threshold}$? YES! [Figure: Replicate 1, Replicate 2, and Replicate 3, with R1 (weak peak), R2 (weak peak), R3 (weak peak), and R4 (strong region).]
  24. Method: Algorithm … an example. Intermediate sets for Replicate 1, Replicate 2, and Replicate 3: R1, R2, R3, R4. Legend: confirmed peaks set, discarded peaks set.
  25. Method: Algorithm … an example. Determine intersecting regions across all samples. Combine test statistics using Fisher's method. $X^2 \ge \text{Threshold}$? YES! [Figure: Replicate 1, Replicate 2, and Replicate 3, with R1 (weak peak), R2 (weak peak), R3 (weak peak), and R4 (strong region).]
  26. Method: Algorithm … an example. Intermediate sets for Replicate 1, Replicate 2, and Replicate 3: R2, R3, R4, R1. Legend: confirmed peaks set, discarded peaks set.
  27. Method: Algorithm … an example. Intermediate sets for Replicate 1, Replicate 2, and Replicate 3: R2, R3, R4, R1, R1. Output sets. Legend: confirmed peaks set, discarded peaks set, output set.
  28. Method: Algorithm … an example. [Figure: Replicate 1, Replicate 2, and Replicate 3, with R1 (weak peak), R2 (weak peak), R3 (weak peak), and R4 (strong region).]
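The walk-through above can be loosely mirrored in a short Python sketch. It is an interpretation under stated assumptions, not the MSPC implementation: the names `Peak`, `fisher_p`, and `joint_analysis`, the parameters `gamma` (a threshold on the combined p-value, used here instead of a threshold on $X^2$) and `min_replicates`, and all coordinates and p-values are hypothetical. Overlaps are found by a plain scan rather than an interval tree, and a peak is reported if at least one of its tests confirms it.

```python
import math
from typing import List, NamedTuple, Sequence, Set


class Peak(NamedTuple):
    name: str
    start: int
    end: int
    p_value: float


def fisher_p(p_values: Sequence[float]) -> float:
    """Right-tail chi-square probability of Fisher's statistic (2k d.o.f.)."""
    half = -sum(math.log(p) for p in p_values)   # X^2 / 2
    term, total = 1.0, 1.0
    for j in range(1, len(p_values)):
        term *= half / j
        total += term
    return math.exp(-half) * total


def joint_analysis(replicates: List[List[Peak]], gamma: float,
                   min_replicates: int) -> List[Set[str]]:
    """Return, per replicate, the names of peaks confirmed by at least one test."""
    confirmed: List[Set[str]] = [set() for _ in replicates]
    for i, peaks in enumerate(replicates):
        for peak in peaks:
            group = [(i, peak)]
            for j, others in enumerate(replicates):
                if j == i:
                    continue
                hits = [q for q in others
                        if q.start <= peak.end and peak.start <= q.end]
                if hits:  # several overlaps on one sample: keep the strongest
                    group.append((j, min(hits, key=lambda q: q.p_value)))
            combined = fisher_p([q.p_value for _, q in group])
            if len(group) >= min_replicates and combined <= gamma:
                for j, q in group:
                    confirmed[j].add(q.name)
    return confirmed


# Hypothetical data: R1 overlaps R2 and R3; R3 also overlaps R4.
example = [
    [Peak("R1", 100, 200, 1e-5)],
    [Peak("R2", 90, 130, 1e-6), Peak("R3", 180, 260, 5e-5)],
    [Peak("R4", 240, 300, 1e-12)],
]
result = joint_analysis(example, gamma=1e-12, min_replicates=2)
```

With these made-up numbers the R1-R2 test fails, while the test seeded by R3 (with R1 and R4) passes, so `result` holds R1, R3, and R4 and leaves R2 out.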
  29. Results. Samples (abbreviation, file name):
      Myc2_1   wgEncodeSydhTfbsK562CmycIggrabAlnRep1
      Myc2_2   wgEncodeSydhTfbsK562CmycIggrabAlnRep2
      Myc3_1   wgEncodeSydhTfbsK562CmycStdAlnRep1
      Myc3_2   wgEncodeSydhTfbsK562CmycStdAlnRep2
      Legend: Input (source BED file), abbreviated In: Strong, Weak. Analysis Results, abbreviated Re: Strong Confirmed, Weak Confirmed, Weak Discarded.
      [Figure: plots for Myc2_1, Myc2_2, Myc3_1, and Myc3_2 in Set 1, Set 2, and Set 3, each with In and Re columns.]
  30. Results: Presence of Ebox. Legend: motif was enriched in the sequence defined by peaks; motif was NOT enriched in the sequence defined by peaks.
  31. Implementation: Performance. [Figure: Running Time. Axes: Peaks Count × 10000 and Time (seconds). Series: 2-Replicates, 4-Replicates, 6-Replicates.] Demo.
  32. Questions. Questions are welcome at: https://mspc.codeplex.com/discussions
