20091110 Technical Seminar ChIP-seq Data Analysis - Presentation Transcript
Tools and challenges for
ChIP-seq data analysis
Alba Jené Sanz
Biomedical Genomics Lab (UPF)
Overview
1. ChIP-seq – The basics
2. Typical pipeline
3. Challenges in ChIP-seq data analysis
4. To take into account
5. Available tools
6. Analysis example
7. Future Challenges
8. Where to look for help
1. ChIP-seq – The Basics
ChIP-on-chip
Bioinformatics
ChIP-seq
3. Challenges in ChIP-seq data analysis
Millions of short reads that must be mapped quickly to the genome (allowing
mismatches or gaps; performance is an issue)
Peak detection – find the exact binding site
Data normalization – compare results, background noise
Visualization – thousands of enriched regions. UCSC, JBrowse…
4. To take into account
Transcription Factors vs Nucleosomes / Histone modifications
Control available?
Sequencing depth bias in Control vs IP
Different alignment methods produce different peak-calling results, but the difference is
smaller than the one caused by using a different peak caller or a different replicate
Many differences between peak callers can be explained by the different thresholds used
Some peak callers may be specific to some data types
Consistency between replicates may be used to set the threshold, if replicates are available
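One simple way to handle the sequencing depth bias between Control and IP is to rescale the control counts linearly to the IP depth before comparing them. The sketch below is only an illustration under that assumption: the function name, the window counts and the library sizes are hypothetical, and individual peak callers may normalize differently.

```python
def scale_control_to_ip(control_counts, ip_total, control_total):
    """Linearly rescale per-window control counts to the IP sequencing depth.

    Assumption: a plain ratio of library sizes is an adequate correction.
    """
    if control_total <= 0:
        raise ValueError("control_total must be positive")
    factor = ip_total / control_total
    return [count * factor for count in control_counts]


# Hypothetical example: the control library is twice as deep as the IP library.
control_windows = [20, 24, 16, 20]
scaled = scale_control_to_ip(
    control_windows, ip_total=1_000_000, control_total=2_000_000
)
print(scaled)
```

After scaling, IP and control counts in the same window can be compared on a common footing.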
There are many tools for the analysis of ChIP-seq data, but no standards yet
5. Available tools
Uses regional averaging to mitigate sample fluctuations in the control library
Uses the control to model the background (BG) tag distribution across the genome with a
Poisson distribution. After identifying candidate peaks significantly enriched over the
BG, a local lambda is estimated from windows around each peak to eliminate local
biases
Open-source, open to contributions (Artistic License) and actively
improved
Easy to use, with fast-responding developers
Compares very well to other methods
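The Poisson background model with a local lambda can be sketched in a few lines. This is a minimal illustration and not the tool's actual code: the function names and the example rates are hypothetical, and taking the maximum of the genome-wide rate and the window-based rates is an assumption about how the local lambda is chosen.

```python
import math


def poisson_sf(k, lam):
    """P(X >= k) for X ~ Poisson(lam), computed as 1 minus the lower tail.

    Fine for illustration; very small p-values lose precision this way.
    """
    if k <= 0:
        return 1.0
    term = math.exp(-lam)  # P(X = 0)
    cdf = term
    for i in range(1, k):
        term *= lam / i  # P(X = i) from P(X = i - 1)
        cdf += term
    return max(0.0, 1.0 - cdf)


def local_lambda(lambda_bg, window_rates):
    """Assumed rule: use the largest of the BG rate and the local window rates."""
    return max([lambda_bg, *window_rates])


def peak_p_value(ip_count, lambda_bg, window_rates):
    """Upper-tail Poisson p-value of the IP tag count under the local lambda."""
    return poisson_sf(ip_count, local_lambda(lambda_bg, window_rates))


# Hypothetical candidate peak: 15 IP tags, BG rate 2.0, local control rates 3.5 and 6.0.
p_bg_only = peak_p_value(15, 2.0, [])
p_local = peak_p_value(15, 2.0, [3.5, 6.0])
print(p_bg_only, p_local)
```

With a locally elevated control signal, the same tag count gets a larger p-value, which is how a local bias stops producing a spurious peak.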
6. Analysis example
MACS pipeline
Output:
- Peak locations in BED and XLS format (genome browser)
- Tag count in wiggle format (genome browser)
- Bimodal model in R scripts
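As a rough illustration of working with the BED peak output, the sketch below parses tab-separated BED lines and computes peak widths. The example lines are invented, and only the standard leading BED columns (chromosome, start, end, optional name and score) are assumed; the exact columns written by a given peak caller may differ.

```python
def parse_bed_line(line):
    """Parse one tab-separated BED line into a dict (first five columns only)."""
    fields = line.rstrip("\n").split("\t")
    start, end = int(fields[1]), int(fields[2])
    return {
        "chrom": fields[0],
        "start": start,  # BED starts are 0-based
        "end": end,      # BED ends are exclusive
        "name": fields[3] if len(fields) > 3 else None,
        "score": float(fields[4]) if len(fields) > 4 else None,
        "width": end - start,
    }


def read_bed(lines):
    """Skip blank lines, comments and track/browser headers; parse the rest."""
    peaks = []
    for line in lines:
        stripped = line.strip()
        if not stripped or stripped.startswith(("#", "track", "browser")):
            continue
        peaks.append(parse_bed_line(line))
    return peaks


# Hypothetical peak lines, for illustration only.
example = [
    "track name=example_peaks\n",
    "chr1\t1000\t1400\tpeak_1\t85.2\n",
    "chr2\t5000\t5250\tpeak_2\t40.0\n",
]
peaks = read_bed(example)
print([(p["chrom"], p["width"]) for p in peaks])
```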
6. Analysis example
Input:
- BED format peak locations
- Optional signal profile in wiggle format
- BED format extra regions of interest
CEAS output
7. Future challenges
Re-analyze data with new algorithms – sequences remain the same
ChIP-seq combined with Chromatin Conformation Capture (3C) –
long-range physical interactions
Technical improvements: RNA-seq will benefit from longer reads
Integrated computational analyses – integration of TF, histone
marks, methylation, polymerase loading to predict regulatory output
8. Where to look for help...
Seqanswers.com
Google groups, mailing lists of each project
MACS, CEAS, FindPeaks
Lab mates!