Motif-based analysis of ChIP-seq data - Timothy Bailey

ChIP-seq experiments are the current method of choice for surveying the targets of DNA-binding transcription factors (TFs). Motif-based sequence analysis of ChIP-seq data can provide extremely valuable insights into the biological mechanisms underlying transcriptional regulation. For example, it can determine the DNA-binding affinity (motif) of the factor, distinguish regions bound directly or indirectly by the factor, and suggest the identities of TFs that regulate cooperatively. Motif analysis can also be employed in differential mode to identify explanatory differences in the motif content of genomic regions bound by the factor under different cellular contexts. I will describe several types of motif-based analysis and give concrete examples of biological insights they can yield.

  • 1. Motif-based analysis of ChIP-seq data Timothy L. Bailey AMATA October 16, 2013
  • 2. Overview of Talk • ChIP-seq data analysis – Why do motif-based analysis? – MEME-ChIP • Two case studies – KLF1 in mouse fetal liver cells – NFI in mouse neural stem cells
  • 3. Steps in ChIP-seq • Cross-link proteins to DNA • Fragment chromatin • Immunoprecipitate with antibody to protein • Size-select and ligate • Amplify • Sequence
  • 4. What can we learn from Transcription Factor ChIP-seq data? • Where is the TF bound? • What is its DNA-binding affinity? • What genes might it regulate? • What are its partners?
  • 5. ChIP-seq Data Analysis 1. Mapping: Align the reads with the reference genome. 2. Peak Calling: Identify regions with significant read coverage. 3. Motif-based sequence analysis: Identify DNA sequence patterns in the peaks. … “Practical guidelines for the comprehensive analysis of ChIP-seq data”, Bailey et al., PLoS Comp Bio (in press).
  • 6. Why do motif-based analysis? • Quality control • Understanding DNA-binding affinity • Understanding regulatory mechanisms
  • 7. PWM-based Word-based Known motifs
  • 8. Motif Discovery: MEME [Figure: 100-bp ChIP-seq regions; aligned sites; motif logo, where motif IC = total height of letters] • Searches for the novel PWM motif with the most significant information content (IC). • Null model: the distribution of the IC of a set of sites of a given width in random sequences of a given length.
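The "IC = total height of letters" relation on the MEME slide can be illustrated with a minimal sketch. It assumes a uniform background (0.25 per base) and sites that are already aligned; the function name `information_content` and the optional pseudocount are illustrative choices, not MEME's internals, and MEME's significance calculation against its null model is not reproduced here.

```python
import math
from collections import Counter


def information_content(sites, background=None, pseudocount=0.0):
    """Total information content (bits) of a motif built from aligned sites.

    Each column contributes sum_b f_b * log2(f_b / bg_b), the quantity that
    sets the column's height in a sequence logo when the background is uniform.
    """
    if background is None:
        # Assumption: uniform background over the four bases.
        background = {base: 0.25 for base in "ACGT"}
    width = len(sites[0])
    if any(len(site) != width for site in sites):
        raise ValueError("all sites must have the same width")

    total = 0.0
    denominator = len(sites) + 4 * pseudocount
    for column in range(width):
        counts = Counter(site[column] for site in sites)
        for base in "ACGT":
            freq = (counts[base] + pseudocount) / denominator
            if freq > 0:
                total += freq * math.log2(freq / background[base])
    return total


# Four identical 4-bp sites: every column is fully conserved (2 bits each).
print(information_content(["ACGT"] * 4))  # 8.0
```

A column in which all four bases are equally frequent contributes 0 bits, so degenerate positions add nothing to the total.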
  • 9. Discriminative Motif Discovery: DREME [Figure: 100-bp ChIP-seq regions (P = 5) and 100-bp shuffled regions (N = 3); motif regular expression: CCMRCCC] • Searches for the novel regular expression motif with the most significant enrichment of sites in positive sequences. • Null model: the probability of a site is the same in the two sets of sequences. • Test: Fisher’s Exact Test on P and N (number of sequences with ≥ 1 site)
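The test on the DREME slide can be sketched as follows: count the sequences in each set that contain at least one match to an IUPAC regular expression such as CCMRCCC, then apply a one-sided Fisher's Exact Test to the two counts. This is a sketch only. The helper names are hypothetical, scanning both strands is an assumption, and the set sizes in the usage line (6 positive, 6 shuffled) are made up because the slide gives only P = 5 and N = 3.

```python
import math
import re

# IUPAC nucleotide codes expanded to regular-expression character classes.
IUPAC = {
    "A": "A", "C": "C", "G": "G", "T": "T",
    "M": "[AC]", "R": "[AG]", "W": "[AT]", "S": "[CG]", "Y": "[CT]",
    "K": "[GT]", "B": "[CGT]", "D": "[AGT]", "H": "[ACT]", "V": "[ACG]",
    "N": "[ACGT]",
}
COMPLEMENT = str.maketrans("ACGTMRWSYKBDHVN", "TGCAKYWSRMVHDBN")


def iupac_to_regex(motif):
    """Compile an IUPAC motif such as 'CCMRCCC' into a regular expression."""
    return re.compile("".join(IUPAC[code] for code in motif))


def count_sequences_with_site(motif, sequences):
    """Number of sequences with >= 1 site on either strand (an assumption)."""
    forward = iupac_to_regex(motif)
    reverse = iupac_to_regex(motif.translate(COMPLEMENT)[::-1])
    return sum(1 for seq in sequences if forward.search(seq) or reverse.search(seq))


def fisher_exact_greater(pos_hits, pos_total, neg_hits, neg_total):
    """One-sided Fisher's Exact Test p-value for enrichment in the positive set.

    Sums the hypergeometric probabilities of seeing pos_hits or more
    site-containing sequences among the positives, with all margins fixed.
    """
    hits = pos_hits + neg_hits
    total = pos_total + neg_total
    upper = min(hits, pos_total)
    tail = sum(
        math.comb(hits, k) * math.comb(total - hits, pos_total - k)
        for k in range(pos_hits, upper + 1)
    )
    return tail / math.comb(total, pos_total)


# P = 5 and N = 3 as on the slide; the set sizes of 6 are hypothetical.
print(fisher_exact_greater(5, 6, 3, 6))
```

With so few sequences the enrichment is not significant; the slide's toy counts serve to show the mechanics, not a realistic result.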
  • 10. Central Motif Enrichment Analysis: CentriMo [Figure: 500-bp ChIP-seq regions (L = 500) with a central window of width w = 120; “site-probability” curve of probability vs. position of best site; S = number of “successes” = 4, T = number of “trials” = 5] • Searches for the known motif whose best sites are most centrally enriched in the ChIP-seq regions. • Null model: best sites are uniformly distributed within the regions. • Test: Binomial(S, T, w/L)
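The binomial test on the CentriMo slide can be worked through with the slide's own numbers: 5 regions of 500 bp, a 120-bp central window, and 4 best sites falling inside it. The sketch below takes the success probability as exactly w/L, as the slide states. The function names and the example positions are hypothetical, and CentriMo's search over window widths and its multiple-testing adjustment are not modelled.

```python
import math


def binomial_tail(successes, trials, p):
    """P(X >= successes) for X ~ Binomial(trials, p)."""
    return sum(
        math.comb(trials, k) * p**k * (1 - p) ** (trials - k)
        for k in range(successes, trials + 1)
    )


def central_enrichment(best_site_positions, region_length, window_width):
    """Count best sites inside the central window and test the count.

    Under the null model each best site is uniformly placed, so it lands in
    the window with probability window_width / region_length.
    """
    center = region_length / 2
    half_window = window_width / 2
    successes = sum(
        1 for pos in best_site_positions if abs(pos - center) <= half_window
    )
    trials = len(best_site_positions)
    p_value = binomial_tail(successes, trials, window_width / region_length)
    return successes, trials, p_value


# Hypothetical best-site positions: four within 60 bp of the centre, one not.
s, t, p = central_enrichment([250, 230, 300, 200, 450], 500, 120)
print(s, t, round(p, 4))  # 4 5 0.0134
```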
  • 11. Motif Spacing Analysis: SpaMo [Figure: 500-bp ChIP-seq regions; 300-bp centered on primary] • Searches for known motifs whose best sites have a preferred spacing with the primary motif. 1. Align regions on best primary site. 2. Predict best secondary site. 3. Compute enrichment at each possible spacing. • Null model: uniform • Test: Binomial
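Step 3 of the SpaMo slide (enrichment at each possible spacing, uniform null, binomial test) can be sketched as below. It assumes steps 1 and 2 have already produced one primary-to-secondary spacing per region. The function names and the example spacings are hypothetical, and the sketch ignores strand and orientation bins as well as any multiple-testing correction.

```python
import math
from collections import Counter


def binomial_tail(successes, trials, p):
    """P(X >= successes) for X ~ Binomial(trials, p)."""
    return sum(
        math.comb(trials, k) * p**k * (1 - p) ** (trials - k)
        for k in range(successes, trials + 1)
    )


def spacing_enrichment(spacings, max_spacing):
    """Binomial p-value for each observed spacing under a uniform null.

    spacings: one gap (in bp) per region between the best primary site and
    the best secondary site. Under the null, each of the max_spacing + 1
    possible gaps (0..max_spacing) is equally likely.
    """
    if any(gap < 0 or gap > max_spacing for gap in spacings):
        raise ValueError("spacings must lie in 0..max_spacing")
    counts = Counter(spacings)
    trials = len(spacings)
    p_uniform = 1 / (max_spacing + 1)
    return {
        gap: (count, binomial_tail(count, trials, p_uniform))
        for gap, count in sorted(counts.items())
    }


# Hypothetical data: 6 of 10 regions share a 10-bp spacing.
results = spacing_enrichment([10] * 6 + [3, 47, 81, 120], max_spacing=149)
for gap, (count, p_value) in results.items():
    print(gap, count, f"{p_value:.3g}")
```

A spacing seen in only one region gets a p-value close to the chance of seeing it at least once, so only repeated spacings stand out.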
  • 12. Case Study 1: KLF1 Did my ChIP-seq work?
  • 13. Knowing when TF ChIP-seq fails 1) KLF1 ChIP: Tallack et al, Genome Research, 2010. • The top MEME and DREME motifs confirm the in vitro KLF-family motif. [Logos: UniPROBE Klf7_primary motif; MEME KLF1 motif] 2) KLF1 ChIP: Other published data. • The best DREME motif only approximates the KLF-family motif. • MEME finds no similar motif. [Logos: UniPROBE Klf7_primary motif; DREME KLF1 motif]
  • 14. TF motif databases are an invaluable resource
  • 15. KLF-family motifs are nearly identical
  • 16. Strong Evidence of Failure: Central Motif Enrichment (KLF7 in vitro motif) 1) Tallack KLF1 data: p = 10^-66 2) Other KLF1 data: p = 0.7
  • 17. Are KLF1, GATA and GATA/SCL motifs the most centrally enriched motifs in KLF1 peak regions? 1. Tallack KLF1 data – yes. [Figure: CentriMo analysis of Tallack KLF1 data; top four centrally enriched motifs in JASPAR+UniPROBE (862 motifs): Klf4, Klf7, GATA1 and GATA1/SCL; panel labels give window widths W = 103, 111, 194 and 177 and p-values 10^-40, 10^-48, 10^-54 and 10^-66]
  • 18. Are KLF1, GATA and GATA/SCL motifs the most centrally enriched motifs in KLF1 peak regions? 1. Tallack KLF1 data – yes. 2. Other KLF1 data – no. [Figure panels: 1. Tallack KLF1 data (KLF4); 2. Other KLF1 data]
  • 19. KLF1 summary • Enrichment of the known KLF-family motif(s), as well as of known co-factor motifs, is strong evidence of a successful TF ChIP-seq experiment. • Perform motif-based analysis on TF ChIP-seq before publishing!
  • 20. Case Study 2: NFI How does my TF bind?
  • 21. Nuclear Factor I • Martynoga et al. (2013) ChIP-ed NFI in proliferating and quiescent mouse neural stem cells. • NFIA, NFIB, NFIC and NFIX. • NFI thought to bind as dimers.
  • 22. Enriched motifs in NFI peaks in proliferating neural stem cells
  • 23. Does NFI bind as a monomer in neural stem cells?
  • 24. Enrichment suggests NFIX binds often as a monomer [Figure: MEME motifs]
  • 25. Half-site spacing enriched at multiples of 10 bp
  • 26. Dimeric sites are twice as common in embryonic fibroblasts [Panels: mNSC ChIP; MEF ChIP]
  • 27. E-boxes are highly enriched near NFI peaks
  • 28. Most enriched E-box is enriched also in quiescent neural stem cells [Panels: proliferating; quiescent]
  • 29. One E-box not enriched in quiescent neural stem cells [Panels: proliferating; quiescent]
  • 30. Differentially enriched motif could bind OLIG or NEUROG/D
  • 31. NFI summary • Motif-based analysis sheds light on how TFs bind. – Unlike other NFIs, NFIX often binds as a monomer. – NFI binding is less associated with binding of OLIG or NEUROG/D factors in quiescent than in proliferating neural stem cells.
  • 32. Conclusions • Motif-based TF ChIP-seq analysis is highly useful for: – Quality control – Understanding DNA-binding affinity – Understanding regulatory mechanisms
  • 33. Acknowledgements The MEME Suite • William Noble • James Johnson • Charles Grant • Martin Frith • Philip Machanick • Tom Whitington • Shobhit Gupta • Tom Lesluyes • Benjamin Dartigues KLF Project • Michael Tallack • Tom Whitington • Andrew Perkins • Sean Grimmond • Brooke Gardiner • Ehsan Nourbakhsh • Nicole Cloonan • Elanor Wainwright • Janelle Keys • Wai Shan Yuen
  • 34.
  • 35. Transcription Factors • Mammalian transcription is controlled (in part) by about 1400 DNA-binding transcription factor (TF) proteins. • These proteins control transcription in two main ways: – Directly, by promoting (or preventing) the assembly of the pre-initiation complex. – Indirectly, by modifying chromatin.
  • 36. ChIP-seq • Chromatin ImmunoPrecipitation followed by high-throughput sequencing. • TF binding sites (“punctate peaks”) • Chromatin mods (“broad peaks”)
  • 37. KLF1 is a key transcription factor in blood cell development • We performed KLF1 ChIP-seq in mouse fetal liver cells and analyzed the resulting 945 peak regions using the MEME Suite [Tallack et al, Genome Research, 2010]. • We confirmed – the in vitro binding motif of KLF1, – several co-factor TFs, and – a co-factor complex. [Experimental design: pooled 4 livers (~80×10^6 cells); positive: ChIP (αKLF1 rabbit polyclonal Ab); control: input DNA]
  • 38. A second KLF1 ChIP-seq experiment • Pilon et al. (Blood, 2011) also performed KLF1 ChIP-seq in mouse fetal liver cells. • They predicted over 13,000 peak regions. • We reanalyzed their data using the MEME Suite. • This second ChIP-seq dataset gives very different results.
  • 39. Do the Pilon KLF1 ChIP-seq regions contain KLF1 co-factor sites?
  • 40. GATA1 and SCL are important KLF1 regulatory co-factors • GATA1 and SCL bind DNA in a protein complex [Wadman et al, EMBO Journal, 1997]. 1. Tallack KLF1 data – MEME finds the complex motif. 2. Pilon KLF1 data – MEME does not find the motif. [Logos: known GATA-SCL motif (JASPAR database); MEME GATA-SCL motif found in Tallack KLF1 data]
  • 41. Is motif discovery failing? • To check this, we use CentriMo to search for any motifs in the JASPAR+UniPROBE motif database that are centrally enriched in the two KLF1 ChIP-seq datasets.
  • 42. Caveats in ChIP-seq Motif Analysis • Peak regions may contain other TF motifs due to looping. • The binding of the ChIP-ed factor “X” may be indirect. • ChIP-ed motif might be weak due to assisted binding. Farnham, Nature Reviews Genetics, 2009
  • 43. MEME motif is E-box with adjacent NFI half-site
  • 44. Differential central enrichment