Tools and challenges for ChIP-seq data analysis

Alba Jené Sanz
Biomedical Genomics Lab (UPF)
Overview


1. ChIP-seq – The basics
2. Typical pipeline
3. Challenges in ChIP-seq data analysis
4. To take into account
5. Available tools
6. Analysis example
7. Future Challenges
8. Where to look for help
1. ChIP-seq – The Basics
[Figure: ChIP-on-chip vs. ChIP-seq (label: Bioinformatics)]
[Figure: 500 bp DNA fragment; 35 bp reads]
2. Typical pipeline
[Figure: typical pipeline, Bowtie (mapping) → MACS (peak calling) → CEAS (annotation)]

Mapping…

   Reads that map to a unique location vs. to multiple locations
   Allowing mismatches (counted within a seed sequence)
   Balancing accuracy against performance

Peak calling…
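The seed idea in the mapping bullets can be illustrated with a minimal sketch. It assumes a Bowtie-1-like `-n` policy: at most `max_seed_mm` mismatches are allowed in the first `seed_len` bases of the read (the defaults 28 and 2 mirror Bowtie 1's `-l`/`-n` defaults). Unlike Bowtie, it scans the genome by brute force instead of using an index and ignores Bowtie's quality-weighted mismatch limit. All function names are hypothetical.

```python
# Simplified sketch of a seed-based mismatch policy (hypothetical helper
# names). Like Bowtie 1's "-n" mode it limits mismatches in the 5' "seed" of
# the read; unlike Bowtie it scans the genome by brute force and ignores the
# quality-weighted mismatch cap.


def count_mismatches(a: str, b: str) -> int:
    """Number of positions at which two equal-length strings differ."""
    return sum(1 for x, y in zip(a, b) if x != y)


def passes_seed_policy(read: str, ref_window: str,
                       seed_len: int = 28, max_seed_mm: int = 2) -> bool:
    """Accept if the first seed_len bases have at most max_seed_mm mismatches.

    Mismatches in the 3' end (outside the seed) are tolerated, since that is
    where sequencing errors are most frequent.
    """
    if len(ref_window) < len(read):
        return False
    return count_mismatches(read[:seed_len], ref_window[:seed_len]) <= max_seed_mm


def candidate_hits(read: str, genome: str, seed_len: int = 28,
                   max_seed_mm: int = 2) -> list:
    """Return every 0-based offset where the read passes the seed policy."""
    hits = []
    for pos in range(len(genome) - len(read) + 1):
        window = genome[pos:pos + len(read)]
        if passes_seed_policy(read, window, seed_len, max_seed_mm):
            hits.append(pos)
    return hits


genome = "AAAACGTTGCAAAA"
print(candidate_hits("CGTTGA", genome, seed_len=4, max_seed_mm=0))  # [4]
print(candidate_hits("CGATGC", genome, seed_len=4, max_seed_mm=1))  # [4]
```

The first read has a mismatch only at its 3' end, outside the 4-base seed, so it is still accepted. A read with exactly one hit maps uniquely; more than one hit means it maps to multiple locations, which pipelines often discard or report separately.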
3. Challenges in ChIP-seq data analysis


Millions of short reads that must be mapped quickly to the genome (allowing mismatches or gaps while keeping performance acceptable)


Peak detection – find the exact binding site


Data normalization – comparing results across samples, handling background noise


Visualization – thousands of enriched regions (UCSC Genome Browser, JBrowse…)
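Fast mapping of millions of reads relies on a genome index rather than a scan. Bowtie's index is based on the Burrows-Wheeler transform; the toy sketch below shows only the core idea, exact-match backward search over a tiny string with a naive suffix-array construction. It is not how Bowtie builds or stores its index, and it does not handle mismatches. Function names are hypothetical.

```python
# Toy FM-index: naive BWT construction plus backward search for exact matches.
# Bowtie builds on the same idea but uses compressed, sampled structures and
# backtracking to allow mismatches; none of that is shown here.


def build_index(text: str):
    """Return (suffix array, C table, occurrence table) for text + '$'."""
    text += "$"  # sentinel: unique and smaller than A, C, G, T
    sa = sorted(range(len(text)), key=lambda i: text[i:])  # naive, toy sizes only
    bwt = "".join(text[i - 1] for i in sa)  # character preceding each suffix
    alphabet = sorted(set(text))
    # C[c] = number of characters in text that are lexicographically smaller than c
    C, total = {}, 0
    for ch in alphabet:
        C[ch] = total
        total += text.count(ch)
    # occ[c][i] = occurrences of c in bwt[:i]
    occ = {ch: [0] * (len(bwt) + 1) for ch in alphabet}
    for i, b in enumerate(bwt):
        for ch in alphabet:
            occ[ch][i + 1] = occ[ch][i] + (b == ch)
    return sa, C, occ


def find(pattern: str, index) -> list:
    """Return sorted 0-based positions of exact occurrences of pattern."""
    sa, C, occ = index
    lo, hi = 0, len(sa)  # current suffix-array interval [lo, hi)
    for ch in reversed(pattern):  # extend the match one character to the left
        if ch not in C:
            return []
        lo = C[ch] + occ[ch][lo]
        hi = C[ch] + occ[ch][hi]
        if lo >= hi:
            return []
    return sorted(sa[lo:hi])


index = build_index("GATTACAGATTACA")
print(find("ATTA", index))  # [1, 8]
print(find("GAT", index))   # [0, 7]
print(find("CCC", index))   # []
```

Each search step costs a constant number of table lookups, so lookup time depends on the read length rather than the genome length; that is what makes indexing millions of reads tractable.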
4. To take into account
Transcription factors (sharp, point-source peaks) vs. nucleosomes / histone modifications (broad enriched regions)

Is a control sample available?


Sequencing depth differences between control and IP samples


Different alignment methods produce different peak-calling results, but these differences are smaller than those caused by using a different peak caller or a different replicate


Many differences between peak callers can be explained by the different thresholds they use


Some peak callers may be suited only to certain data types


If replicates are available, their consistency may be used to set the threshold
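The sequencing-depth issue can be made concrete with a minimal sketch that linearly rescales control counts to the IP library size before computing per-window enrichment. Linear scaling by total tag count is one common choice (MACS applies a linear scaling of this kind); the window counts, library sizes and helper names below are made up for illustration.

```python
# Minimal sketch: linear library-size scaling of control counts before
# computing per-window enrichment (hypothetical helper names).


def scale_control(control_counts, control_total, ip_total):
    """Rescale control window counts as if the control had ip_total tags."""
    if control_total <= 0:
        raise ValueError("control library has no tags")
    factor = ip_total / control_total
    return [c * factor for c in control_counts]


def fold_enrichment(ip_counts, scaled_control, pseudocount=1.0):
    """Per-window (IP + pseudocount) / (control + pseudocount)."""
    return [(ip + pseudocount) / (ctl + pseudocount)
            for ip, ctl in zip(ip_counts, scaled_control)]


# Control sequenced twice as deeply as the IP: raw control counts overstate
# the background by a factor of 2 until they are scaled.
control = scale_control([4, 10, 0], control_total=20_000_000, ip_total=10_000_000)
print(control)                               # [2.0, 5.0, 0.0]
print(fold_enrichment([3, 30, 0], control))
```

The pseudocount keeps empty control windows from producing infinite ratios; its value is an arbitrary choice here.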
There are many tools for the analysis of ChIP-seq data, but no standards yet
5. Available tools
Uses regional averaging to mitigate sample fluctuations in the control library

Uses the control to model the background (BG) tag distribution across the genome with a Poisson distribution. After identifying candidate peaks that are significantly enriched over the BG, it estimates a local lambda from windows around each peak to eliminate local biases

Open source, open to contributions (Artistic License) and actively developed

Easy to use, with responsive developers

Compares very well to other methods
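The Poisson model with a local lambda can be sketched as follows, assuming tag counts per candidate region are already available. Following the idea described above, the lambda used for each candidate is the maximum of the genome-wide background lambda and lambdas estimated from control windows around it (MACS 1.x uses windows such as 1 kb, 5 kb and 10 kb). The helper names and numbers are hypothetical, and this is not MACS's implementation.

```python
import math

# Simplified sketch of Poisson scoring with a dynamic local lambda
# (hypothetical helper names; MACS's own implementation differs in detail).


def poisson_sf(k: int, lam: float) -> float:
    """P(X >= k) for X ~ Poisson(lam), computed as 1 - P(X <= k - 1)."""
    if k <= 0:
        return 1.0
    term = math.exp(-lam)  # P(X = 0)
    cdf = term
    for i in range(1, k):
        term *= lam / i    # P(X = i) from P(X = i - 1)
        cdf += term
    return max(0.0, 1.0 - cdf)


def window_lambda(control_tags: int, window_size: int, peak_width: int,
                  scale: float = 1.0) -> float:
    """Expected tags in a peak-width region, estimated from a control window.

    scale is the IP/control library-size ratio.
    """
    return control_tags * scale * peak_width / window_size


def peak_pvalue(ip_tags: int, bg_lambda: float, local_lambdas) -> float:
    """Score a candidate against the largest of the background lambda and the
    local lambdas, so local control bias can only make the test stricter."""
    lam = max([bg_lambda, *local_lambdas])
    return poisson_sf(ip_tags, lam)


# Genome-wide background of 2.5 tags per peak-width region, but the control
# is locally enriched around this candidate (1 kb / 5 kb / 10 kb windows).
local = [window_lambda(12, 1_000, 250), window_lambda(40, 5_000, 250),
         window_lambda(60, 10_000, 250)]
print(local)                         # [3.0, 2.0, 1.5]
print(peak_pvalue(20, 2.5, local))   # p-value using lambda = 3.0
print(peak_pvalue(20, 2.5, []))      # p-value using the background only
```

Computing the tail as 1 - CDF loses precision for very small p-values; a production version would sum the upper tail directly, work in log space, or call a library routine such as `scipy.stats.poisson.sf`.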
6. Analysis example
 QSEQ files (Illumina's raw read format; qualities are ASCII-encoded Phred+64, as in the third FASTQ variant, Illumina 1.3+)
lane5_SNAIL_F9_qseq.txt

SOLEXA   90320   5        1   0   476    0   1      .ACGGGGGAGGG.C...CAAC..A...C............   BBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBB   0
SOLEXA   90320   5        1   0   1222   0   1      .AATTGAAAAAT.A..TTTAA..G...A............   DO[[XVX[BBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBB   0
SOLEXA   90320   5        1   0   133    0   1      .CCAGTCTATTAATT.TTGCC..GA..C............   DPXXXYYYYYYBBBBBBBBBBBBBBBBBBBBBBBBBBBBB   0
SOLEXA   90320   5        1   0   145    0   1      .ATTGTTTCTGACTA.TTGAT..GC..T............   BBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBB   0
SOLEXA   90320   5        1   0   153    0   1      .ACCGCTATCAGTAC.TAGCT..GT..A............   DMUYUVWBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBB   0
SOLEXA   90320   5        1   0   215    0   1      .TGTTGCCATTGCTA.AGGCA..GT..T............   DOVWBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBB   0
SOLEXA   90320   5        1   1   827    0   1      AGGAGATCGGCCGGTTGATGAGCCGAGTG...........   Z__U_]PXYXTGRZ]QXBBBBBBBBBBBBBBBBBBBBBB   0
SOLEXA   90320   5        1   1   56     0   1      AAAAATCGACGCTCAAGTCAGAGGTGGCG...........   ababba`_aabaab`UbaBBBBBBBBBBBBBBBBBBB   1
SOLEXA   90320   5        1   1   925    0   1      TGCAGCACTGGGGCCAGATGGTAAGCCCT...........   _Z_T]`]M]OLP^^[`WBBBBBBBBBBBBBBBBBBBB   1
SOLEXA   90320   5        1   2   1637   0   1      GGGCTTCTGCCCCGGTGGGTACATGAGTA...........   aaa`a`a`aa`aX_`^^^[``BBBBBBBBBBBBBBBBBB   1




FASTQ, Phred+64 qualities (read IDs encode lane_tile_x_y):
@3_1_3_89
CACAGTGTCCTCCAGGTTCATCCC................
+3_1_3_89
abbab^aaaaaaaVUVbBBBBBBBBBBBBBBBBBBBBBB
@3_1_3_762
GCAAACAAATGGCGGAAAGCGGCG................
+3_1_3_762
aabba`b`a`]]TXOY`BBBBBBBBBBBBBBBBBBBBBBB
@3_1_90_512
GCTGAGGCAGGAGAATTGCTTGAACTGGGAAGGCAGAGGT
+3_1_90_512
ab`a_a`X``WTGW]T]SZ]T[aXa_T^]XP]H_VXY
@3_1_90_1028
GTTACGGCTTATCCTGCACATTACGACCGTTTGCGTAACG
+3_1_90_1028
`bba`X_^ab_aWS`_b[`aa_]TZ^VYa`VW^`^b`a
@3_1_90_1651
TAATTTTAGATTTTATCCTTGACATTGTAAATATTACATT
+3_1_90_1651
aUVF[aa`VU_`aaU[__aaaYV^aP`aQUa`_^_a
@3_1_90_1670

FASTQ, Phred+33 (Sanger) qualities:
@FC30C11AAXX:8:1:1649:1790
GAAAAGTATTTGCAATTTGTTGCCTCTCATCCAAGAATGAAATTCCTATTG
+
<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<:::::6776666666566663
@FC30C11AAXX:8:1:1655:1811
GAAGCAGAAGCCTATATACCCTGTAGAACTGGGAGCCAATTAAACCTCTTT
+
<<<<<<<<<<<<<<<<<<5<<<<><<:>:<<;<:;3665537.6+6.33.+
@FC30C11AAXX:8:1:1609:1848
GATGTGGTTTCACATAAATTGACATATATAGTTCCAGGCTGTAAATGTTGT
+
<<;<<;;+<<7:<<<<<<7::7:<<<<<<:7,777402-4.-+*20+0-%-
@FC30C11AAXX:8:1:1667:1880
GTTTTATACAAATCAAAACCATAGTGAGATACCATCTCACACTAGTCAGAA
+
<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<:::::6767366767357566
@FC30C11AAXX:8:1:1577:1853
GTGATGGGAGGAAAGCTAGGGGGCTATAATGTCTATTACAAGGCTCAGTAG
+
<6<<<<<<<<<<<<<<<<<<<<<<<<<<<<:99:9646066,6+0604044
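The "filter qualities and parse" step can be sketched as a small converter from qseq lines to FASTQ records. It assumes the standard 11-column, tab-delimited qseq layout (machine, run, lane, tile, x, y, index, read number, sequence, quality, filter flag) and keeps only reads whose filter flag is 1. Unlike the FASTQ on the slide, this sketch rewrites uncalled bases ('.') as 'N'. The optional Phred+64 to Phred+33 conversion subtracts 31 from each ASCII code; the Bowtie output for read 5_1_2_1637 is consistent with this re-encoding ('a' → 'B', Q2 'B' → '#'). Function names and the toy row are hypothetical.

```python
# Minimal sketch of the "filter qualities and parse" step (hypothetical
# function names), assuming the 11-column qseq layout.


def phred64_to_phred33(qual: str) -> str:
    """Re-encode Phred+64 qualities as Phred+33 (subtract 31 from each code)."""
    return "".join(chr(ord(c) - 31) for c in qual)


def qseq_to_fastq(lines, to_phred33=False):
    """Yield one 4-line FASTQ record per qseq line that passed the filter."""
    for line in lines:
        fields = line.split()  # qseq is tab-delimited; no field contains spaces
        if len(fields) != 11:
            continue  # skip malformed lines
        lane, tile, x, y = fields[2:6]
        seq, qual, passed = fields[8], fields[9], fields[10]
        if passed != "1":
            continue  # 0 = read failed the filter
        read_id = f"{lane}_{tile}_{x}_{y}"  # same ID style as the Bowtie output
        seq = seq.replace(".", "N")          # '.' marks an uncalled base
        if to_phred33:
            qual = phred64_to_phred33(qual)
        yield f"@{read_id}\n{seq}\n+{read_id}\n{qual}\n"


toy_row = "SOLEXA\t90320\t5\t1\t1\t56\t0\t1\tACGT.\tababB\t1"  # made-up, shortened row
print("".join(qseq_to_fastq([toy_row], to_phred33=True)), end="")
```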
Filter qualities and parse (qseq → FASTQ), then map the reads with Bowtie.
6. Analysis example
SNAIL_F9.bwt
5_1_0_1409     +   gi|51511750|ref|NC_000021.7|NC_000021    34604194        AGTTGCACCTTTAACAATTTCCCAT       %/6::9::;;;;7279#########       0       17:G>T,24:G>T
5_1_0_811      +   gi|89161218|ref|NC_000023.9|NC_000023    77246408        TTCTGCAAGCCTCCGGAGCGCACGTG      BBB@5<?=9<9@>96/:0########      0       25:C>G
5_1_1_1665     +   gi|89161199|ref|NC_000002.10|NC_000002   201785208       GCCCAGCTGTCACTGTGGTTTTGATTTGC   BBCCCBBBCBBB@BABBBACCA#######   0
5_1_2_1637     +   gi|51511731|ref|NC_000015.8|NC_000015    92942360        GGGCTTCTGCCCCGGTGGGTACATGAGTA   BBBABABABBAB9@A??=?<AA#######   0
5_1_2_1359     +   gi|89161205|ref|NC_000003.10|NC_000003   101351498       CAATTCCCTCCTTGAAAGGCTCCTCCACC   BCCBBBBAAAABA9@B?59@ABA######   0
5_1_2_730      -   gi|51511721|ref|NC_000005.8|NC_000005    1314600 GGACTTCCATGCAAACAAGCTGCTTTCCA   ########BB>9@B@;@B<;??ABCBABB   0
5_1_2_1118     -   gi|89161213|ref|NC_000007.12|NC_000007   157199758       CATCTTTGATGAGTTACTACCTGTGGGGT   ########@B@?=B@;8@659@@BAABAB   0
5_1_3_920      +   gi|51511727|ref|NC_000011.8|NC_000011    133317176       GGTAGACTCACAAAACTACCAAAGTCCTCTAC        ABABAABCBBBBCBCBCBBBCCA>@@######        0
5_1_3_971      +   gi|89161190|ref|NC_000012.10|NC_000012   7497006 TTTTCATGCAGCCCGAGACATCAAGCTAGCAG        B@86646330/250##################        0       31:T>G




Parsing (Bowtie output → BED)

SNAIL_F9.bwt.bed
chr21   34604194    34604219    5_1_0_1409    .   +
chr23   77246408    77246434    5_1_0_811     .   +
chr02   201785208   201785237   5_1_1_1665    .   +
chr15   92942360    92942389    5_1_2_1637    .   +
chr03   101351498   101351527   5_1_2_1359    .   +
chr05   1314600     1314629     5_1_2_730     .   -
chr07   157199758   157199787   5_1_2_1118    .   -
chr11   133317176   133317208   5_1_3_920     .   +
chr12   7497006     7497038     5_1_3_971     .   +
chr01   201404048   201404081   5_1_3_1986    .   +
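The parsing step can be sketched as a line-by-line conversion of Bowtie's default tab-delimited output to BED6. It assumes the first five columns are read name, strand, reference name, 0-based offset and read sequence, and that reference names contain human RefSeq chromosome accessions. One deliberate difference: the BED on the slide writes `chr23` and zero-padded names such as `chr02`, while this sketch emits UCSC-style names (`chrX`, `chr2`), which genome browsers expect. Function names are hypothetical.

```python
import re

# Minimal sketch of the "Parsing" step: Bowtie default output -> BED6.
# Assumes human RefSeq accessions NC_000001 ... NC_000024 in the reference
# name; the function names are hypothetical.


def refseq_to_ucsc(ref_name: str) -> str:
    """'gi|51511750|ref|NC_000021.7|NC_000021' -> 'chr21' (23 -> X, 24 -> Y)."""
    m = re.search(r"NC_0000(\d\d)", ref_name)
    if m is None:
        return ref_name  # leave unrecognised references untouched
    n = int(m.group(1))
    return {23: "chrX", 24: "chrY"}.get(n, f"chr{n}")


def bowtie_to_bed(line: str) -> str:
    """Convert one tab-delimited Bowtie alignment line to a BED6 line."""
    name, strand, ref, offset, seq = line.rstrip("\n").split("\t")[:5]
    # Bowtie reports the leftmost forward-strand offset (0-based) for both
    # strands, which is exactly a BED start.
    start = int(offset)
    end = start + len(seq)  # BED ends are exclusive
    return "\t".join([refseq_to_ucsc(ref), str(start), str(end), name, ".", strand])


line = ("5_1_0_1409\t+\tgi|51511750|ref|NC_000021.7|NC_000021\t34604194\t"
        "AGTTGCACCTTTAACAATTTCCCAT\t%/6::9::;;;;7279#########\t0\t17:G>T,24:G>T")
print(bowtie_to_bed(line))  # same coordinates as the first row of SNAIL_F9.bwt.bed
```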
The BED file is then used as input to MACS.
6. Analysis example


                MACS pipeline


      Output:

      - Peak locations in BED format (for a genome browser) and XLS format (a tab-delimited table)

      - Tag counts in wiggle format (for a genome browser)

      - The bimodal model, as an R script that plots it
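MACS's bimodal model can be illustrated with a simplified sketch. In an enriched region, tag 5' ends on the + strand cluster upstream of the binding site and those on the - strand cluster downstream, so the distance between the two modes estimates the fragment size d; each tag is then shifted by d/2 towards its 3' end. MACS builds this model from many enriched regions selected by fold enrichment (the `mfold_15` and `tsize41` parts of the file names in this example presumably record the `--mfold` and `--tsize` settings), whereas this sketch uses a single region, made-up positions and hypothetical names. For a - strand read in BED, the 5' end is `end - 1`.

```python
from collections import Counter

# Simplified sketch of the bimodal-model idea (hypothetical names): the
# distance between the + and - strand tag modes estimates the fragment size d.


def mode_bin(positions, bin_size):
    """Start coordinate of the most populated bin."""
    return Counter(p // bin_size for p in positions).most_common(1)[0][0] * bin_size


def estimate_d(plus_5p, minus_5p, bin_size=10):
    """Distance between the - strand and + strand tag modes."""
    return mode_bin(minus_5p, bin_size) - mode_bin(plus_5p, bin_size)


def shift_tags(tags, d):
    """tags: (5' position, strand) pairs. + tags move right, - tags move left."""
    half = d // 2
    return [pos + half if strand == "+" else pos - half for pos, strand in tags]


plus = [100, 102, 105, 230]   # made-up + strand 5' ends
minus = [300, 305, 308, 150]  # made-up - strand 5' ends
d = estimate_d(plus, minus)
print(d)                                        # 200
print(shift_tags([(100, "+"), (300, "-")], d))  # [200, 200]
```

After shifting, the two strands' tags meet at the fragment centre, which is where the binding site is expected to be.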
6. Analysis example


[Figure: H3K27me3 and PolII tracks]
6. Analysis example

snail_mfold_15_tsize41_newbwt_peaks.bed
track name="MACS peaks for snail_mfold_15_tsize41_newbwt"
chr1    559644      559924      MACS_peak_1    79.29
chr1    2435221     2435542     MACS_peak_2    51.58
chr1    14624217    14624571    MACS_peak_3    66.12
chr1    15610639    15611000    MACS_peak_4    56.69
chr1    16822564    16822753    MACS_peak_5    52.84
chr1    18411948    18412187    MACS_peak_6    82.46
chr1    22857612    22857985    MACS_peak_7    88.74
chr1    27541904    27542134    MACS_peak_8    69.47


snail_mfold_15_MACS.wig
track type=wiggle_0 name="MACS_counts_after_shifting" description="Shifted Merged MACS tag counts for every 10 bp"
variableStep chrom=chr10 span=10
85171   1
85181   1
85191   1
85201   1
85211   1
85221   1
85231   2
85371   2
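A shifted wiggle track like the one above can be approximated as follows. This is one simple way to produce "tag counts for every 10 bp", not necessarily MACS's exact computation: each shifted tag centre is extended to a fragment of width d, and every 10 bp bin the fragment overlaps is counted once. Output positions are 1-based, consistent with bin starts such as 85171 and 85181 in the file above; the function name and inputs are hypothetical.

```python
from collections import Counter

# One simple way to build a "tag counts for every 10 bp" track (hypothetical
# name; not necessarily how MACS computes its wiggle file).


def wig_pileup(chrom, centres, d, step=10):
    """Extend each shifted tag centre to a d-wide fragment and count, per
    step-bp bin, how many fragments overlap it (variableStep output)."""
    if d < 2:
        raise ValueError("fragment size d must be at least 2")
    half = d // 2
    cover = Counter()
    for c in centres:
        start, end = max(c - half, 0), c + half  # fragment [start, end), 0-based
        for b in range(start // step, (end - 1) // step + 1):
            cover[b] += 1
    lines = [f"variableStep chrom={chrom} span={step}"]
    for b in sorted(cover):
        lines.append(f"{b * step + 1}\t{cover[b]}")  # wiggle positions are 1-based
    return "\n".join(lines)


print(wig_pileup("chr10", [100, 105], d=20))
```

Bins with no overlapping fragment are simply omitted, which is what makes the variableStep form compact for sparse ChIP-seq signal.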
The peak BED file (and, optionally, the wiggle file) is then passed to CEAS.
6. Analysis example




Input:

- BED format peak locations

- Optional signal profile in wiggle format

- BED format extra regions of interest
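CEAS (Cis-regulatory Element Annotation System) summarizes where peaks fall relative to genes. The toy sketch below shows the basic idea with only three categories (promoter, gene body, intergenic), a fixed 1 kb promoter and made-up gene coordinates; CEAS itself uses a gene annotation table and reports finer categories. The function names are hypothetical.

```python
from collections import Counter

# Toy, CEAS-inspired summary (hypothetical names and gene coordinates):
# classify each peak midpoint as promoter, gene body or intergenic.


def classify(mid, gene, promoter_len):
    _chrom, tx_start, tx_end, strand = gene  # 0-based, half-open transcript
    if strand == "+":
        in_promoter = tx_start - promoter_len <= mid < tx_start
    else:  # on the - strand the TSS is at tx_end and "upstream" is to the right
        in_promoter = tx_end <= mid < tx_end + promoter_len
    if in_promoter:
        return "promoter"
    if tx_start <= mid < tx_end:
        return "gene body"
    return "intergenic"


def summarize(peaks, genes, promoter_len=1000):
    counts = Counter()
    for chrom, start, end in peaks:
        mid = (start + end) // 2
        labels = {classify(mid, g, promoter_len) for g in genes if g[0] == chrom}
        # priority when a peak is near several genes: promoter > gene body
        if "promoter" in labels:
            counts["promoter"] += 1
        elif "gene body" in labels:
            counts["gene body"] += 1
        else:
            counts["intergenic"] += 1
    return counts


genes = [("chr1", 1000, 5000, "+"), ("chr1", 10000, 12000, "-")]
peaks = [("chr1", 400, 600), ("chr1", 2000, 2200), ("chr1", 12500, 12700),
         ("chr1", 20000, 20100), ("chr2", 500, 600)]
print(summarize(peaks, genes))
```

Comparing such fractions against the genomic background (how much of the genome each category covers) is what turns the counts into an enrichment statement.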
CEAS output
7. Future challenges



Re-analyze existing data with new algorithms – the sequences remain the same

ChIP-seq combined with Chromatin Conformation Capture (3C) –
long-range physical interactions

Technical improvements: RNA-seq will benefit from longer reads

Integrated computational analyses – integration of TF, histone
marks, methylation, polymerase loading to predict regulatory output
8. Where to look for help...
Seqanswers.com

Google groups and mailing lists of each project (MACS, CEAS, FindPeaks)


Lab mates!
20091110 Technical Seminar  ChIP-seq Data Analysis

ClimART Action | eTwinning Projectjordimapav
 
The Contemporary World: The Globalization of World Politics
The Contemporary World: The Globalization of World PoliticsThe Contemporary World: The Globalization of World Politics
The Contemporary World: The Globalization of World PoliticsRommel Regala
 

Recently uploaded (20)

USPS® Forced Meter Migration - How to Know if Your Postage Meter Will Soon be...
USPS® Forced Meter Migration - How to Know if Your Postage Meter Will Soon be...USPS® Forced Meter Migration - How to Know if Your Postage Meter Will Soon be...
USPS® Forced Meter Migration - How to Know if Your Postage Meter Will Soon be...
 
Keynote by Prof. Wurzer at Nordex about IP-design
Keynote by Prof. Wurzer at Nordex about IP-designKeynote by Prof. Wurzer at Nordex about IP-design
Keynote by Prof. Wurzer at Nordex about IP-design
 
Textual Evidence in Reading and Writing of SHS
Textual Evidence in Reading and Writing of SHSTextual Evidence in Reading and Writing of SHS
Textual Evidence in Reading and Writing of SHS
 
Active Learning Strategies (in short ALS).pdf
Active Learning Strategies (in short ALS).pdfActive Learning Strategies (in short ALS).pdf
Active Learning Strategies (in short ALS).pdf
 
Activity 2-unit 2-update 2024. English translation
Activity 2-unit 2-update 2024. English translationActivity 2-unit 2-update 2024. English translation
Activity 2-unit 2-update 2024. English translation
 
Q4-PPT-Music9_Lesson-1-Romantic-Opera.pptx
Q4-PPT-Music9_Lesson-1-Romantic-Opera.pptxQ4-PPT-Music9_Lesson-1-Romantic-Opera.pptx
Q4-PPT-Music9_Lesson-1-Romantic-Opera.pptx
 
EMBODO Lesson Plan Grade 9 Law of Sines.docx
EMBODO Lesson Plan Grade 9 Law of Sines.docxEMBODO Lesson Plan Grade 9 Law of Sines.docx
EMBODO Lesson Plan Grade 9 Law of Sines.docx
 
How to Add Barcode on PDF Report in Odoo 17
How to Add Barcode on PDF Report in Odoo 17How to Add Barcode on PDF Report in Odoo 17
How to Add Barcode on PDF Report in Odoo 17
 
Virtual-Orientation-on-the-Administration-of-NATG12-NATG6-and-ELLNA.pdf
Virtual-Orientation-on-the-Administration-of-NATG12-NATG6-and-ELLNA.pdfVirtual-Orientation-on-the-Administration-of-NATG12-NATG6-and-ELLNA.pdf
Virtual-Orientation-on-the-Administration-of-NATG12-NATG6-and-ELLNA.pdf
 
Concurrency Control in Database Management system
Concurrency Control in Database Management systemConcurrency Control in Database Management system
Concurrency Control in Database Management system
 
Transaction Management in Database Management System
Transaction Management in Database Management SystemTransaction Management in Database Management System
Transaction Management in Database Management System
 
YOUVE_GOT_EMAIL_PRELIMS_EL_DORADO_2024.pptx
YOUVE_GOT_EMAIL_PRELIMS_EL_DORADO_2024.pptxYOUVE_GOT_EMAIL_PRELIMS_EL_DORADO_2024.pptx
YOUVE_GOT_EMAIL_PRELIMS_EL_DORADO_2024.pptx
 
THEORIES OF ORGANIZATION-PUBLIC ADMINISTRATION
THEORIES OF ORGANIZATION-PUBLIC ADMINISTRATIONTHEORIES OF ORGANIZATION-PUBLIC ADMINISTRATION
THEORIES OF ORGANIZATION-PUBLIC ADMINISTRATION
 
AUDIENCE THEORY -CULTIVATION THEORY - GERBNER.pptx
AUDIENCE THEORY -CULTIVATION THEORY -  GERBNER.pptxAUDIENCE THEORY -CULTIVATION THEORY -  GERBNER.pptx
AUDIENCE THEORY -CULTIVATION THEORY - GERBNER.pptx
 
4.18.24 Movement Legacies, Reflection, and Review.pptx
4.18.24 Movement Legacies, Reflection, and Review.pptx4.18.24 Movement Legacies, Reflection, and Review.pptx
4.18.24 Movement Legacies, Reflection, and Review.pptx
 
ENG 5 Q4 WEEk 1 DAY 1 Restate sentences heard in one’s own words. Use appropr...
ENG 5 Q4 WEEk 1 DAY 1 Restate sentences heard in one’s own words. Use appropr...ENG 5 Q4 WEEk 1 DAY 1 Restate sentences heard in one’s own words. Use appropr...
ENG 5 Q4 WEEk 1 DAY 1 Restate sentences heard in one’s own words. Use appropr...
 
4.16.24 21st Century Movements for Black Lives.pptx
4.16.24 21st Century Movements for Black Lives.pptx4.16.24 21st Century Movements for Black Lives.pptx
4.16.24 21st Century Movements for Black Lives.pptx
 
How to do quick user assign in kanban in Odoo 17 ERP
How to do quick user assign in kanban in Odoo 17 ERPHow to do quick user assign in kanban in Odoo 17 ERP
How to do quick user assign in kanban in Odoo 17 ERP
 
ClimART Action | eTwinning Project
ClimART Action    |    eTwinning ProjectClimART Action    |    eTwinning Project
ClimART Action | eTwinning Project
 
The Contemporary World: The Globalization of World Politics
The Contemporary World: The Globalization of World PoliticsThe Contemporary World: The Globalization of World Politics
The Contemporary World: The Globalization of World Politics
 

20091110 Technical Seminar ChIP-seq Data Analysis

  • 1. Tools and challenges for ChIP-seq data analysis Alba Jené Sanz Biomedical Genomics Lab (UPF)
  • 2. Overview 1. ChIP-seq – The basics 2. Typical pipeline 3. Challenges in ChIP-seq data analysis 4. To take into account 5. Available tools 6. Analysis example 7. Future Challenges 8. Where to look for help
  • 3. 1. ChIP-seq – The Basics
  • 4. 1. ChIP-seq – The Basics ChIP-on-chip ChIP-seq
  • 5. 1. ChIP-seq – The Basics ChIP-on-chip Bioinformatics ChIP-seq
  • 6. 1. ChIP-seq – The Basics 35 bp 500 bp 35 bp
  • 7. 1. ChIP-seq – The Basics
  • 11. 2. Typical pipeline Bowtie MACS
  • 12. 2. Typical pipeline Bowtie MACS CEAS
  • 13. 2. Typical pipeline Mapping… Unique / multiple locations Allowing mismatches – seed sequence Balance accuracy / performance Peak calling…
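The mapping trade-off above ("allowing mismatches – seed sequence") can be illustrated with a toy check modelled on the Maq-like policy that Bowtie documents for its -n mode. A read may have at most n mismatches in its first l bases (the seed), and the Phred qualities at all mismatched positions may not sum to more than e. This is a minimal sketch that scores one already-chosen, ungapped reference window; it does no index search, and the function name is hypothetical.

```python
from typing import Sequence


def seed_policy_ok(read: str, ref: str, quals: Sequence[int],
                   seed_len: int = 28, max_seed_mm: int = 2,
                   max_qual_sum: int = 70) -> bool:
    """Toy check of one ungapped read/reference alignment under a Maq-like policy.

    read, ref: equal-length strings; ref is the candidate genomic window
    quals:     Phred quality of each read base
    """
    if not (len(read) == len(ref) == len(quals)):
        raise ValueError("read, reference window and qualities must have equal length")
    mismatches = [i for i, (a, b) in enumerate(zip(read, ref)) if a != b]
    seed_mm = sum(1 for i in mismatches if i < seed_len)  # mismatches inside the seed only
    qual_sum = sum(quals[i] for i in mismatches)          # summed over the whole read
    return seed_mm <= max_seed_mm and qual_sum <= max_qual_sum
```

The defaults 28, 2 and 70 mirror Bowtie's documented defaults for -l, -n and -e. Loosening any of them finds more alignments at the cost of speed and specificity, which is the accuracy/performance balance named on the slide.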
  • 15. 3. Challenges in ChIP-seq data analysis Millions of short reads that need fast mapping to the genome (allowing mismatches or gaps; performance issues). Peak detection – finding the exact binding site. Data normalization – comparing results, handling background noise. Visualization – thousands of enriched regions (UCSC, JBrowse…).
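For the data normalization point, the simplest correction for different sequencing depths is to rescale the control's tag counts linearly to the IP library's total before comparing the two. The sketch below assumes plain per-window tag counts as hypothetical inputs. Peak callers typically apply their own variants of this idea, so treat it as an illustration rather than any particular tool's method.

```python
from typing import List, Sequence


def scale_control(control_counts: Sequence[float], control_total: int,
                  ip_total: int) -> List[float]:
    """Rescale per-window control tag counts to the IP library's sequencing depth."""
    if control_total <= 0 or ip_total <= 0:
        raise ValueError("library sizes must be positive")
    factor = ip_total / control_total
    return [c * factor for c in control_counts]


def fold_enrichment(ip_counts: Sequence[float], control_counts: Sequence[float],
                    control_total: int, ip_total: int,
                    pseudocount: float = 1.0) -> List[float]:
    """IP / depth-scaled control per window; the pseudocount avoids division by zero."""
    scaled = scale_control(control_counts, control_total, ip_total)
    return [(ip + pseudocount) / (ctl + pseudocount)
            for ip, ctl in zip(ip_counts, scaled)]
```

Scaling the deeper library down instead of the shallower one up is an equally valid choice. What matters is that IP and control are compared at the same depth.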
  • 16. 4. To take into account Transcription factors vs nucleosomes / histone modifications. Is a control available? Sequencing depth bias in control vs IP. Different alignment methods produce different peak-calling results, but the difference is smaller than that caused by using a different peak caller or a different replicate. Many differences among peak callers can be explained by the different thresholds they use. Some peak callers may be specific to certain data types. If replicates are available, their consistency may be used to set the threshold.
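Using replicate consistency to set a threshold can be made concrete with a simple overlap measure. For each candidate score cut-off, keep the replicate-1 peaks above it and compute what fraction overlap any replicate-2 peak, then take the lowest cut-off that reaches a target fraction. This is a deliberately simple illustration with hypothetical inputs and names, not a specific published procedure.

```python
import math
from typing import Optional, Sequence, Tuple

Peak = Tuple[str, int, int, float]  # (chrom, start, end, score), BED-style half-open


def overlaps(a: Peak, b: Peak) -> bool:
    return a[0] == b[0] and a[1] < b[2] and b[1] < a[2]


def reproducibility(rep1: Sequence[Peak], rep2: Sequence[Peak], cutoff: float) -> float:
    """Fraction of rep1 peaks scoring >= cutoff that overlap at least one rep2 peak."""
    kept = [p for p in rep1 if p[3] >= cutoff]
    if not kept:
        return math.nan
    hits = sum(1 for p in kept if any(overlaps(p, q) for q in rep2))
    return hits / len(kept)


def pick_cutoff(rep1: Sequence[Peak], rep2: Sequence[Peak],
                cutoffs: Sequence[float], min_fraction: float = 0.8) -> Optional[float]:
    """Lowest cut-off at which at least min_fraction of surviving rep1 peaks reproduce."""
    for c in sorted(cutoffs):
        frac = reproducibility(rep1, rep2, c)
        if not math.isnan(frac) and frac >= min_fraction:
            return c
    return None
```

The pairwise scan is O(n·m), which is fine for a few thousand peaks; larger sets would need position-sorted peaks or an interval tree. The 0.8 target is an arbitrary example value.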
  • 17. 4. To take into account There are many tools for the analysis of ChIP-seq data, but no standards yet
  • 22. 5. Available tools Uses regional averaging to mitigate sample fluctuations in the control library. Uses the control to model the background (BG) tag distribution across the genome with a Poisson distribution. After identifying candidate peaks significantly enriched over the BG, estimates a local lambda from windows around each peak to eliminate local biases. Open-source, open to contributions (Artistic License) and actively improved. Easy to use, with fast-responding developers. Compares very well to other methods.
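The Poisson background and local lambda described on this slide match how MACS is usually described. A candidate region's tag count is tested against λ_local, the maximum of the genome-wide λ_BG and the rates estimated from control windows around the peak, so a locally enriched control raises the bar. Below is a minimal sketch under stated assumptions: control counts are already depth-scaled, and the window sizes (e.g. 1, 5 and 10 kb) and function names are illustrative, not MACS's own code.

```python
import math
from typing import Dict


def poisson_upper_tail(k: int, lam: float) -> float:
    """P(X >= k) for X ~ Poisson(lam), summed directly so small p-values stay accurate."""
    if k <= 0:
        return 1.0
    if lam <= 0:
        return 0.0
    term = math.exp(-lam + k * math.log(lam) - math.lgamma(k + 1))  # P(X = k)
    total = 0.0
    i = k
    while True:
        total += term
        i += 1
        term *= lam / i
        # Past the mode the terms only shrink; stop once they are negligible.
        if i > lam and (term == 0.0 or term < total * 1e-16):
            break
    return min(1.0, total)


def local_lambda(lambda_bg: float, control_counts: Dict[int, float],
                 peak_width: int) -> float:
    """control_counts maps window size (bp) -> depth-scaled control tags in that window.

    Each window's count is rescaled to the peak width; the largest expectation wins.
    """
    candidates = [lambda_bg]
    for window, count in control_counts.items():
        candidates.append(count * peak_width / window)
    return max(candidates)
```

For example, with λ_BG = 2 tags per 200 bp and 25 control tags in the surrounding 1 kb, the local estimate is 5, so a 200 bp region with 20 IP tags is tested as poisson_upper_tail(20, 5.0) rather than against the genome-wide rate.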
  • 23. 6. Analysis example QSEQ files (Illumina/Solexa pipeline output; qualities are ASCII-encoded Phred+64, as in the third FASTQ variant, Illumina 1.3+)
    lane5_SNAIL_F9_qseq.txt
    SOLEXA 90320 5 1 0 476 0 1 .ACGGGGGAGGG.C...CAAC..A...C............ BBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBB 0
    SOLEXA 90320 5 1 0 1222 0 1 .AATTGAAAAAT.A..TTTAA..G...A............ DO[[XVX[BBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBB 0
    SOLEXA 90320 5 1 0 133 0 1 .CCAGTCTATTAATT.TTGCC..GA..C............ DPXXXYYYYYYBBBBBBBBBBBBBBBBBBBBBBBBBBBBB 0
    SOLEXA 90320 5 1 0 145 0 1 .ATTGTTTCTGACTA.TTGAT..GC..T............ BBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBB 0
    SOLEXA 90320 5 1 0 153 0 1 .ACCGCTATCAGTAC.TAGCT..GT..A............ DMUYUVWBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBB 0
    SOLEXA 90320 5 1 0 215 0 1 .TGTTGCCATTGCTA.AGGCA..GT..T............ DOVWBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBB 0
    SOLEXA 90320 5 1 1 827 0 1 AGGAGATCGGCCGGTTGATGAGCCGAGTG........... Z__U_]PXYXTGRZ]QXBBBBBBBBBBBBBBBBBBBBBB 0
    SOLEXA 90320 5 1 1 56 0 1 AAAAATCGACGCTCAAGTCAGAGGTGGCG........... ababba`_aabaab`UbaBBBBBBBBBBBBBBBBBBB 1
    SOLEXA 90320 5 1 1 925 0 1 TGCAGCACTGGGGCCAGATGGTAAGCCCT........... _Z_T]`]M]OLP^^[`WBBBBBBBBBBBBBBBBBBBB 1
    SOLEXA 90320 5 1 2 1637 0 1 GGGCTTCTGCCCCGGTGGGTACATGAGTA........... aaa`a`a`aa`aX_`^^^[``BBBBBBBBBBBBBBBBBB 1

    @3_1_3_89
    CACAGTGTCCTCCAGGTTCATCCC................
    +3_1_3_89
    abbab^aaaaaaaVUVbBBBBBBBBBBBBBBBBBBBBBB
    @3_1_3_762
    GCAAACAAATGGCGGAAAGCGGCG................
    +3_1_3_762
    aabba`b`a`]]TXOY`BBBBBBBBBBBBBBBBBBBBBBB
    @3_1_90_512
    GCTGAGGCAGGAGAATTGCTTGAACTGGGAAGGCAGAGGT
    +3_1_90_512
    ab`a_a`X``WTGW]T]SZ]T[aXa_T^]XP]H_VXY
    @3_1_90_1028
    GTTACGGCTTATCCTGCACATTACGACCGTTTGCGTAACG
    +3_1_90_1028
    `bba`X_^ab_aWS`_b[`aa_]TZ^VYa`VW^`^b`a
    @3_1_90_1651
    TAATTTTAGATTTTATCCTTGACATTGTAAATATTACATT
    +3_1_90_1651
    aUVF[aa`VU_`aaU[__aaaYV^aP`aQUa`_^_a
    @3_1_90_1670

    @FC30C11AAXX:8:1:1649:1790
    GAAAAGTATTTGCAATTTGTTGCCTCTCATCCAAGAATGAAATTCCTATTG
    +
    <<<<<<<<<<<<<<<<<<<<<<<<<<<<<<:::::6776666666566663
    @FC30C11AAXX:8:1:1655:1811
    GAAGCAGAAGCCTATATACCCTGTAGAACTGGGAGCCAATTAAACCTCTTT
    +
    <<<<<<<<<<<<<<<<<<5<<<<><<:>:<<;<:;3665537.6+6.33.+
    @FC30C11AAXX:8:1:1609:1848
    GATGTGGTTTCACATAAATTGACATATATAGTTCCAGGCTGTAAATGTTGT
    +
    <<;<<;;+<<7:<<<<<<7::7:<<<<<<:7,777402-4.-+*20+0-%-
    @FC30C11AAXX:8:1:1667:1880
    GTTTTATACAAATCAAAACCATAGTGAGATACCATCTCACACTAGTCAGAA
    +
    <<<<<<<<<<<<<<<<<<<<<<<<<<<<<<:::::6767366767357566
    @FC30C11AAXX:8:1:1577:1853
    GTGATGGGAGGAAAGCTAGGGGGCTATAATGTCTATTACAAGGCTCAGTAG
    +
    <6<<<<<<<<<<<<<<<<<<<<<<<<<<<<:99:9646066,6+0604044
  • 24. 6. Analysis example Same QSEQ and FASTQ excerpts as slide 23, annotated with the step "Filter qualities and parse".
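The "Filter qualities and parse" step can be sketched as a QSEQ-to-FASTQ converter. Assumptions not taken from the slides: QSEQ has 11 tab-separated columns (machine, run, lane, tile, x, y, index, read number, sequence, quality, pass-filter flag), a flag of 1 means the read passed the instrument's filter, and "." marks a no-call. Qualities are Phred+64 as the slide title states; the output uses Phred+33, which turns "aaa`a`a`aa`a" in the QSEQ record 5 1 2 1637 into the "BBBABABABBAB" shown for read 5_1_2_1637 in the Bowtie output. The lane_tile_x_y read name mirrors those Bowtie IDs, but the author's actual script is not shown, and qseq_to_fastq is a hypothetical name.

```python
from typing import Optional


def qseq_to_fastq(line: str) -> Optional[str]:
    """Convert one QSEQ record to a FASTQ record, or return None if it is filtered out."""
    fields = line.split()  # QSEQ is tab-separated; split() also tolerates spaces
    if len(fields) != 11:
        raise ValueError(f"expected 11 columns, got {len(fields)}")
    _machine, _run, lane, tile, x, y, _index, _read_no, seq, qual, passed = fields
    if passed != "1":            # drop reads that did not pass the quality filter
        return None
    seq = seq.rstrip(".")        # trailing no-calls carry no information
    if not seq:
        return None
    if len(qual) < len(seq):
        raise ValueError("quality string shorter than sequence")
    qual = qual[: len(seq)]      # keep one quality per remaining base
    seq = seq.replace(".", "N")  # internal no-calls become N
    # Phred+64 -> Phred+33: (ord(c) - 64) + 33 == ord(c) - 31
    qual33 = "".join(chr(ord(c) - 31) for c in qual)
    return f"@{lane}_{tile}_{x}_{y}\n{seq}\n+\n{qual33}\n"
```

Applied line by line to lane5_SNAIL_F9_qseq.txt, this would keep only the last three records of the excerpt, the ones whose final column is 1.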
  • 25. 6. Analysis example Same QSEQ and FASTQ excerpts as slide 23, annotated with the steps "Filter qualities and parse" and "BOWTIE".
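For the BOWTIE step, a unique-mapping run can be scripted by building the command line and handing it to subprocess. The options are Bowtie 1 options: -q for FASTQ input, -n and -l for seed mismatches and seed length, -m 1 to suppress reads with more than one reportable alignment (commonly combined with --best --strata), and -p for threads. The values, the index name and the file names are illustrative assumptions, not the settings used in the talk.

```python
import shutil
import subprocess
from typing import List


def bowtie_command(index: str, fastq: str, out: str, seed_mm: int = 2,
                   seed_len: int = 28, threads: int = 4) -> List[str]:
    """Build a Bowtie 1 command that reports uniquely mapping reads only."""
    return [
        "bowtie",
        "-q",                   # input reads are FASTQ
        "-n", str(seed_mm),     # mismatches allowed in the seed
        "-l", str(seed_len),    # seed length
        "-m", "1",              # suppress reads with more than one reportable alignment
        "--best", "--strata",   # only alignments in the best stratum are considered
        "-p", str(threads),     # worker threads
        index, fastq, out,
    ]


def run_bowtie(cmd: List[str]) -> None:
    """Run the command, failing early if the bowtie binary is not installed."""
    if shutil.which(cmd[0]) is None:
        raise FileNotFoundError("bowtie is not on PATH")
    subprocess.run(cmd, check=True)
```

For instance, bowtie_command("hg18", "SNAIL_F9.fastq", "SNAIL_F9.bwt") (hypothetical index and file names) would produce a file in the same default output format as the SNAIL_F9.bwt excerpt on the next slide.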
  • 26. 6. Analysis example
    SNAIL_F9.bwt
    5_1_0_1409 + gi|51511750|ref|NC_000021.7|NC_000021 34604194 AGTTGCACCTTTAACAATTTCCCAT %/6::9::;;;;7279######### 0 17:G>T,24:G>T
    5_1_0_811 + gi|89161218|ref|NC_000023.9|NC_000023 77246408 TTCTGCAAGCCTCCGGAGCGCACGTG BBB@5<?=9<9@>96/:0######## 0 25:C>G
    5_1_1_1665 + gi|89161199|ref|NC_000002.10|NC_000002 201785208 GCCCAGCTGTCACTGTGGTTTTGATTTGC BBCCCBBBCBBB@BABBBACCA####### 0
    5_1_2_1637 + gi|51511731|ref|NC_000015.8|NC_000015 92942360 GGGCTTCTGCCCCGGTGGGTACATGAGTA BBBABABABBAB9@A??=?<AA####### 0
    5_1_2_1359 + gi|89161205|ref|NC_000003.10|NC_000003 101351498 CAATTCCCTCCTTGAAAGGCTCCTCCACC BCCBBBBAAAABA9@B?59@ABA###### 0
    5_1_2_730 - gi|51511721|ref|NC_000005.8|NC_000005 1314600 GGACTTCCATGCAAACAAGCTGCTTTCCA ########BB>9@B@;@B<;??ABCBABB 0
    5_1_2_1118 - gi|89161213|ref|NC_000007.12|NC_000007 157199758 CATCTTTGATGAGTTACTACCTGTGGGGT ########@B@?=B@;8@659@@BAABAB 0
    5_1_3_920 + gi|51511727|ref|NC_000011.8|NC_000011 133317176 GGTAGACTCACAAAACTACCAAAGTCCTCTAC ABABAABCBBBBCBCBCBBBCCA>@@###### 0
    5_1_3_971 + gi|89161190|ref|NC_000012.10|NC_000012 7497006 TTTTCATGCAGCCCGAGACATCAAGCTAGCAG B@86646330/250################## 0 31:T>G
  • 27. 6. Analysis example Same Bowtie excerpt as slide 26, followed by the "Parsing" step and the resulting BED file:
    SNAIL_F9.bwt.bed
    chr21 34604194 34604219 5_1_0_1409 . +
    chr23 77246408 77246434 5_1_0_811 . +
    chr02 201785208 201785237 5_1_1_1665 . +
    chr15 92942360 92942389 5_1_2_1637 . +
    chr03 101351498 101351527 5_1_2_1359 . +
    chr05 1314600 1314629 5_1_2_730 . -
    chr07 157199758 157199787 5_1_2_1118 . -
    chr11 133317176 133317208 5_1_3_920 . +
    chr12 7497006 7497038 5_1_3_971 . +
    chr01 201404048 201404081 5_1_3_1986 . +
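The "Parsing" step turns Bowtie's default output into BED. A minimal sketch, assuming Bowtie 1's default tab-separated columns (read name, strand, reference, 0-based leftmost offset, sequence, qualities, count, mismatches). BED is 0-based and half-open, so the end is the offset plus the read length, which reproduces the slide's coordinates (34604194 + 25 = 34604219). The chromosome renaming is an assumption of this sketch: it maps the human RefSeq accessions NC_000001 to NC_000024 to UCSC-style chr1 to chr22, chrX and chrY. The slide's file instead uses names such as chr02 and chr23, which the UCSC browser does not recognise. The sketch also writes 0 rather than "." in the score column, since BED defines the score as an integer from 0 to 1000.

```python
import re

_REFSEQ_CHROM = re.compile(r"NC_0000(\d\d)")


def refseq_to_ucsc(ref: str) -> str:
    """Map a human RefSeq chromosome accession inside ref to a UCSC-style name."""
    m = _REFSEQ_CHROM.search(ref)
    if m is None:
        raise ValueError(f"no NC_0000NN accession in {ref!r}")
    num = int(m.group(1))
    if 1 <= num <= 22:
        return f"chr{num}"
    if num == 23:
        return "chrX"
    if num == 24:
        return "chrY"
    raise ValueError(f"unexpected chromosome accession in {ref!r}")


def bowtie_to_bed(line: str) -> str:
    """Convert one line of Bowtie 1 default output to a BED6 line."""
    fields = line.rstrip("\n").split("\t")
    if len(fields) < 6:
        raise ValueError("expected Bowtie default output (at least 6 tab-separated columns)")
    name, strand, ref, offset, seq = fields[:5]
    start = int(offset)          # Bowtie reports a 0-based leftmost offset
    end = start + len(seq)       # BED end coordinate is exclusive
    return "\t".join([refseq_to_ucsc(ref), str(start), str(end), name, "0", strand])
```

Reverse-strand hits need no special handling: Bowtie reports the leftmost reference position and the reverse-complemented read, so the interval length is still the read length.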
  • 28. 6. Analysis example Same Bowtie and BED excerpts as slide 27, annotated with the next step: MACS.
  • 29. 6. Analysis example MACS pipeline output: - peak locations in BED format (for genome browsers) and XLS format - tag counts in wiggle format (for genome browsers) - the bimodal model as R scripts
  • 31. 6. Analysis example
    snail_mfold_15_tsize41_newbwt_peaks.bed
    track name="MACS peaks for snail_mfold_15_tsize41_newbwt"
    chr1 559644 559924 MACS_peak_1 79.29
    chr1 2435221 2435542 MACS_peak_2 51.58
    chr1 14624217 14624571 MACS_peak_3 66.12
    chr1 15610639 15611000 MACS_peak_4 56.69
    chr1 16822564 16822753 MACS_peak_5 52.84
    chr1 18411948 18412187 MACS_peak_6 82.46
    chr1 22857612 22857985 MACS_peak_7 88.74
    chr1 27541904 27542134 MACS_peak_8 69.47
    snail_mfold_15_MACS.wig
    track type=wiggle_0 name="MACS_counts_after_shifting" description="Shifted Merged MACS tag counts for every 10 bp"
    variableStep chrom=chr10 span=10
    85171 1
    85181 1
    85191 1
    85201 1
    85211 1
    85221 1
    85231 2
    85371 2
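The wiggle excerpt uses the variableStep layout. A declaration line gives the chromosome and span, and each following line is a 1-based start position plus a value that covers span bases. A minimal parser for this one layout could look like the sketch below; fixedStep sections are deliberately rejected rather than handled, and the function name is hypothetical.

```python
from typing import Iterable, List, Optional, Tuple

Record = Tuple[str, int, int, float]  # (chrom, 1-based start, span, value)


def parse_variable_step(lines: Iterable[str]) -> List[Record]:
    """Parse variableStep wiggle data into (chrom, start, span, value) records."""
    records: List[Record] = []
    chrom: Optional[str] = None
    span = 1
    for raw in lines:
        line = raw.strip()
        if not line or line.startswith(("track", "browser", "#")):
            continue  # header and comment lines carry no data
        if line.startswith("variableStep"):
            opts = dict(tok.split("=", 1) for tok in line.split()[1:])
            chrom = opts["chrom"]
            span = int(opts.get("span", "1"))
            continue
        if line.startswith("fixedStep"):
            raise ValueError("fixedStep sections are not handled by this sketch")
        if chrom is None:
            raise ValueError("data line before any variableStep declaration")
        pos, value = line.split()
        records.append((chrom, int(pos), span, float(value)))
    return records
```

Because wiggle positions are 1-based, the first record (chr10, 85171, span 10) covers bases 85171 to 85180, i.e. the BED interval chr10 85170 85180.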
  • 32. 6. Analysis example Same peak (BED) and tag-count (wiggle) excerpts as slide 31, annotated with the next step: CEAS.
  • 33. 6. Analysis example Input: - BED format peak locations - optional signal profile in wiggle format - BED format extra regions of interest
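CEAS, the next step in the pipeline, summarises peaks against gene annotation such as promoters, exons and introns. As a much simpler illustration of that kind of annotation, and explicitly not CEAS's own algorithm, the sketch below assigns each BED peak to its nearest transcription start site (TSS). It labels the peak "promoter" when the peak midpoint lies within an assumed 1 kb of that TSS and "distal" otherwise. The TSS table and the function name are hypothetical inputs.

```python
import bisect
from typing import Dict, List, Sequence, Tuple


def annotate_peaks(peaks: Sequence[Tuple[str, int, int]],
                   tss: Dict[str, List[int]],
                   promoter_window: int = 1000) -> List[Tuple[str, int, int, str, int]]:
    """Label each peak by its midpoint's distance to the nearest TSS.

    peaks: (chrom, start, end) in BED coordinates
    tss:   chrom -> sorted list of TSS positions
    Returns (chrom, start, end, label, distance); distance is -1 if the chrom has no TSS.
    """
    out = []
    for chrom, start, end in peaks:
        mid = (start + end) // 2
        sites = tss.get(chrom, [])
        if not sites:
            out.append((chrom, start, end, "distal", -1))
            continue
        i = bisect.bisect_left(sites, mid)          # first TSS at or after the midpoint
        nearby = sites[max(i - 1, 0): i + 1]        # its left neighbour and itself
        dist = min(abs(mid - s) for s in nearby)
        label = "promoter" if dist <= promoter_window else "distal"
        out.append((chrom, start, end, label, dist))
    return out
```

Binary search keeps each lookup at O(log n) per peak, provided the TSS lists are sorted beforehand.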
  • 40. 7. Future challenges Re-analyze data with new algorithms – the sequences remain the same. ChIP-seq combined with Chromosome Conformation Capture (3C) – long-range physical interactions. Technical improvements: RNA-seq will benefit from longer reads. Integrated computational analyses – integrating TF binding, histone marks, methylation and polymerase loading to predict regulatory output.
  • 43. 8. Where to look for help... Seqanswers.com Google groups, mailing lists of each project MACS CEAS FindPeaks Lab mates!