Macs course
Published in Education, Technology

Transcript

  • 1. ChIP-seq analysis. Luca Cozzuto, Bioinformatics Core
  • 2. ChIP-seq analysis: ChIP-seq is the combination of chromatin immunoprecipitation with ultra-high-throughput sequencing. It detects the genomic regions bound by proteins such as:
      • Transcription factors
      • Histones
      • Polymerase II
      • …
  • 3. ChIP-seq analysis: Typical workflow
  • 4. ChIP-seq analysis: Typical workflow
  • 5. ChIP-seq analysis: Starting the analysis. Typically you will receive 10 to 30 million raw reads per sample, corresponding to a zipped file of 0.5-1.5 GB.
    FASTQ format:
      @HWUSI-EAS621:69:64EKPAAXX:3:1:11477:1265 1:N:0:    @(HEADER)
      GAAACTTGAGGACTGCCCAGCTCGACAGACACTGGA               (SEQUENCE)
      +                                                  +(HEADER)
      GEGGDGG@GGDGGGGGGGBDGGDG8GG@3D6:3:67               (QUALITY)
    The quality of each base is encoded as an ASCII character and represents the Phred quality score, Q = -10 * log10(p), where p is the probability that the base call is incorrect. Q = 20 means a base call accuracy of 99%.
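The quality line of slide 5 can be decoded with a few lines of Python. This is a minimal sketch, not part of the original slides: it assumes the Phred+33 ASCII offset (Sanger / Illumina 1.8+ style), which the slide does not state, and the helper names `phred_scores` and `error_probability` are made up for illustration.

```python
# Assumption: qualities are Phred+33 encoded; the slide does not name the offset.
PHRED_OFFSET = 33


def phred_scores(quality, offset=PHRED_OFFSET):
    """Turn a FASTQ quality string into a list of integer Phred scores."""
    return [ord(ch) - offset for ch in quality]


def error_probability(q):
    """Invert Q = -10 * log10(p) to get the probability of a wrong base call."""
    return 10 ** (-q / 10)


# Quality string of the read shown on slide 5.
quality = "GEGGDGG@GGDGGGGGGGBDGGDG8GG@3D6:3:67"
scores = phred_scores(quality)

print(scores[:5])
print("lowest score:", min(scores))
print("error probability at Q=20:", error_probability(20))
```

Under this assumption the first character, `G`, corresponds to Q = 38, and the lowest-quality base in the read is the `3` near the 3' end.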
  • 6. ChIP-seq analysis: Starting the analysis. It is strongly recommended to check the quality of the sequences received before doing the analysis! FastQC analysis.
  • 7. ChIP-seq analysis: Starting the analysis. Mapping is done with ultra-fast mappers:
      • GEM
      • Bowtie
      • BWA
      • Stampy
    The reference genome must be indexed before mapping.
  • 8. ChIP-seq analysis: Peak calling - MACS. Model-based Analysis of ChIP-Seq data. (Figure label: TF)
  • 9. ChIP-seq analysis: Peak calling - MACS. (Figure labels: sequences from IP; TF)
  • 10. ChIP-seq analysis: Peak calling - MACS. (Figure labels: sequences from IP; TF; sequenced tags on + strand and - strand)
  • 11. ChIP-seq analysis: Peak calling - MACS
  • 12. ChIP-seq analysis: Peak calling - MACS. Given a sonication size (bandwidth) and a fold enrichment (mfold), MACS slides windows of 2*bandwidth across the genome to find regions whose tag enrichment, relative to a random distribution of tags along the genome, is >= mfold (default: between 10 and 30).
  • 13. ChIP-seq analysis: Peak calling - MACS. MACS selects at least 1,000 “model peaks” to calculate the distance “d” between paired peaks.
  • 14. ChIP-seq analysis: Peak calling - MACS. How do we determine whether peaks are greater than expected by chance? The tag distribution along the genome can be modeled by a Poisson distribution, with:
      • x = observed read number
      • λ = expected read number
    The quantity of interest is the probability of finding a peak higher than x.
  • 15. ChIP-seq analysis: Peak calling - MACS. Example: tag count = 2; number of reads = 30,000,000; read length = 36; mappable human genome = 2,700,000,000
  • 16. ChIP-seq analysis: Peak calling - MACS. Example: tag count = 10; number of reads = 30,000,000; read length = 36; mappable human genome = 2,700,000,000
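The two examples on slides 15 and 16 can be worked through numerically. The sketch below is an illustration only: it assumes that the expected count λ is the number of reads times the read length divided by the mappable genome size (which is how the slide's numbers appear to be meant, though the formula itself did not survive in the transcript), and `poisson_upper_tail` is a hypothetical helper, not a MACS function.

```python
import math


def poisson_upper_tail(x, lam, max_terms=1000):
    """P(X > x) for X ~ Poisson(lam), summing the tail terms directly."""
    k = x + 1
    term = math.exp(-lam) * lam ** k / math.factorial(k)
    total = 0.0
    for _ in range(max_terms):
        total += term
        k += 1
        term *= lam / k  # next Poisson term from the previous one
        if term < total * 1e-17:
            break
    return total


# Numbers from slides 15 and 16.
n_reads = 30_000_000
read_length = 36
mappable_genome = 2_700_000_000

# Assumption: expected count = reads * read length / mappable genome size.
lam = n_reads * read_length / mappable_genome

p2 = poisson_upper_tail(2, lam)
p10 = poisson_upper_tail(10, lam)
print(f"lambda = {lam}")
print(f"P(X > 2)  = {p2:.3e}")
print(f"P(X > 10) = {p10:.3e}")
```

With these inputs λ comes out at 0.4. Seeing more than 2 tags by chance has a probability of about 0.008, well above a 1e-5 p-value cutoff, whereas seeing more than 10 tags has a probability many orders of magnitude below it.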
  • 17. ChIP-seq analysis: Peak calling - MACS. MACS then proceeds by:
      • shifting each tag by d/2 toward its 3’ end
      • sliding windows of length 2*d across the genome to detect the enriched regions (Poisson distribution p-value <= 1e-5)
      • fusing overlapping enriched regions
      • taking the summit of each peak as the putative binding site
    (Figure label: TF)
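Two of the steps on slide 17, shifting tags and fusing overlapping regions, are easy to sketch in Python. This is a simplified illustration, not MACS code: the function names are hypothetical, d = 120 is an arbitrary example value, and the BED end coordinate is treated as the 5' end of a minus-strand tag, ignoring the off-by-one of half-open intervals.

```python
def shift_tag(start, end, strand, d):
    """Return the 5' end of a tag moved d/2 bases toward its 3' end."""
    half = d // 2
    if strand == "+":
        return start + half  # plus-strand tags move right
    return end - half        # minus-strand tags move left


def merge_regions(regions):
    """Fuse overlapping (start, end) regions on one chromosome."""
    merged = []
    for start, end in sorted(regions):
        if merged and start <= merged[-1][1]:
            # Overlaps the previous region: extend it.
            merged[-1] = (merged[-1][0], max(merged[-1][1], end))
        else:
            merged.append((start, end))
    return merged


d = 120  # hypothetical fragment length, for illustration only

minus_pos = shift_tag(233604, 233639, "-", d)
plus_pos = shift_tag(559767, 559802, "+", d)
fused = merge_regions([(500, 600), (100, 300), (250, 400)])
print(minus_pos, plus_pos, fused)
```

Shifting both strands toward the 3' end moves the plus-strand and minus-strand tag piles toward each other, so that they meet near the actual binding site.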
  • 18. ChIP-seq analysis: Peak calling - MACS. To address local biases in the genome, such as local chromatin structure, sequencing bias and genome copy number variation, MACS evaluates candidate peaks by comparing them against a “local” distribution. Fold enrichment = enrichment over λlocal.
  • 19. ChIP-seq analysis: Peak calling - MACS. The False Discovery Rate (FDR) is calculated as the number of control peaks called / number of sample peaks. Control peaks are obtained by swapping control and sample. FDR is calculated only when a control is provided!
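The ratio on slide 19 can be sketched as a small Python function. This is an illustration under stated assumptions, not the MACS implementation: it assumes the ratio is evaluated at a score threshold (counting only peaks that score at least as high as the threshold), and the peak scores below are made-up toy values.

```python
def empirical_fdr(threshold, sample_scores, control_scores):
    """FDR (%) = control peaks / sample peaks, counting peaks scoring >= threshold."""
    n_sample = sum(1 for s in sample_scores if s >= threshold)
    n_control = sum(1 for s in control_scores if s >= threshold)
    if n_sample == 0:
        return 0.0
    return 100.0 * n_control / n_sample


# Toy scores (hypothetical), standing in for -10*log10(pvalue) values.
sample_scores = [152, 90, 74, 61, 51, 51]
control_scores = [62, 55]  # peaks found after swapping sample and control

fdr_at_74 = empirical_fdr(74, sample_scores, control_scores)
fdr_at_61 = empirical_fdr(61, sample_scores, control_scores)

# The same ratio over all peaks of the FoxA1 run shown in this course:
# 594 negative peaks versus 13591 peaks.
overall = 100.0 * 594 / 13591
print(fdr_at_74, fdr_at_61, round(overall, 2))
```

Stronger peaks get a lower FDR because fewer control peaks reach the same score.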
  • 20. ChIP-seq analysis: Practical part
  • 21. ChIP-seq analysis: Practical part. Connect to the Etna machine using ssh.
      • Mac or Linux users can use this command:
        $ ssh -X course@xxx.crg.es
        course@xxx.crg.es's password:
        Password: xxxxxxx
      • Windows users should first download the PuTTY and PSCP programs and then use them to access the machine. http://goo.gl/4BWud
  • 22. ChIP-seq analysis: course@xxx.crg.es Password: xxxxxx
  • 23. ChIP-seq analysis: Different formats can be used as input files: BED, ELAND, SAM, BAM, BOWTIE and, for paired ends, ELAND-MULTIPET.
    $ head ../data/Input_tags.bed
    chr1  233604  233639  0  2  -
    chr1  559767  559802  0  3  +
    chr1  742600  742635  0  2  +
    chr1  742600  742635  0  0  +
    chr1  744231  744266  0  0  +
    chr1  744307  744342  0  2  -
    chr1  746885  746920  0  2  +
    chr1  746958  746993  0  1  +
    chr1  748226  748261  0  2  +
    chr1  748357  748392  0  0  -
    BED fields: chromosome name, start, end, name, score, strand
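To make the six BED fields of slide 23 concrete, here is a minimal Python sketch that parses such lines. The `BedTag` type and `parse_bed_line` function are hypothetical names chosen for this example, and the parser assumes whitespace-separated six-column lines like the ones shown above.

```python
from typing import NamedTuple


class BedTag(NamedTuple):
    chrom: str
    start: int   # 0-based start
    end: int     # end, not included in the interval
    name: str
    score: float
    strand: str


def parse_bed_line(line):
    """Parse one six-column BED line into a BedTag."""
    chrom, start, end, name, score, strand = line.split()[:6]
    return BedTag(chrom, int(start), int(end), name, float(score), strand)


# First two lines of Input_tags.bed as shown on slide 23.
lines = [
    "chr1 233604 233639 0 2 -",
    "chr1 559767 559802 0 3 +",
]
tags = [parse_bed_line(line) for line in lines]

for tag in tags:
    print(tag.chrom, tag.strand, "length:", tag.end - tag.start)
```

Each tag spans 35 bases (end minus start), which agrees with the tag size that MACS reports for the treatment file later in the practical.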
  • 24. ChIP-seq analysis: Launch MACS, passing the sample, the control, the genome size (hs = Homo sapiens) and the name.
    $ macs14 -t ../data/Treatment_tags.bed -c ../data/Input_tags.bed -g hs -n FoxA1
  • 25. ChIP-seq analysis: Check the output printed to the screen.
    $ macs14 -t ../data/Treatment_tags.bed -c ../data/Input_tags.bed -g hs -n FoxA1
    INFO @ Thu, 29 Mar 2012 14:58:35:
    # ARGUMENTS LIST:
    # name = FoxA1
    # format = AUTO
    # ChIP-seq file = ./Treatment_tags.bed
    # control file = ./Input_tags.bed
    # effective genome size = 2.70e+09
    # band width = 300
    # model fold = 10,30
    # pvalue cutoff = 1.00e-05
    # Small dataset will be scaled towards larger dataset.
    # Range for calculating regional lambda is: 1000 bps and 10000 bps
    INFO @ Thu, 29 Mar 2012 14:58:35: #1 read tag files...
    INFO @ Thu, 29 Mar 2012 14:58:35: #1 read treatment tags...
    INFO @ Thu, 29 Mar 2012 14:58:35: Detected format is: BED
    Regional lambda has two values in this version: a small one to consider bias around the summit and a large one for the surrounding area.
  • 26. ChIP-seq analysis: Check the output printed to the screen.
    INFO @ Thu, 29 Mar 2012 14:59:41: #1 tag size is determined as 35 bps
    INFO @ Thu, 29 Mar 2012 14:59:41: #1 tag size = 35
    INFO @ Thu, 29 Mar 2012 14:59:41: #1 total tags in treatment: 3909805
    ..
    INFO @ Thu, 29 Mar 2012 14:59:46: #2 Build Peak Model...
    INFO @ Thu, 29 Mar 2012 15:00:00: #2 number of paired peaks: 11861
    INFO @ Thu, 29 Mar 2012 15:00:00: #2 finished!
    INFO @ Thu, 29 Mar 2012 15:00:00: #2 predicted fragment length is 119 bps
    INFO @ Thu, 29 Mar 2012 15:00:00: #2.2 Generate R script for model : FoxA1_model.r
    INFO @ Thu, 29 Mar 2012 15:00:00: #3 Call peaks...
    INFO @ Thu, 29 Mar 2012 15:00:00: #3 shift treatment data
    INFO @ Thu, 29 Mar 2012 15:00:01: #3 merge +/- strand of treatment data
    INFO @ Thu, 29 Mar 2012 15:00:01: #3 call peak candidates
    INFO @ Thu, 29 Mar 2012 15:00:13: #3 shift control data
    INFO @ Thu, 29 Mar 2012 15:00:13: #3 merge +/- strand of control data
    INFO @ Thu, 29 Mar 2012 15:00:15: #3 call negative peak candidates
    INFO @ Thu, 29 Mar 2012 15:00:25: #3 use control data to filter peak candidates...
    INFO @ Thu, 29 Mar 2012 15:00:31: #3 Finally, 13591 peaks are called!
    INFO @ Thu, 29 Mar 2012 15:00:31: #3 find negative peaks by swapping treat and control
    INFO @ Thu, 29 Mar 2012 15:00:36: #3 Finally, 594 peaks are called!
  • 27. ChIP-seq analysis: Output files
      • FoxA1_model.r
      • FoxA1_negative_peaks.xls
      • FoxA1_peaks.bed
      • FoxA1_peaks.xls
      • FoxA1_summits.bed
  • 28. ChIP-seq analysis: MACS peak model
    $ R --vanilla < FoxA1_model.r
    ..
    $ evince FoxA1_model.pdf
  • 29. ChIP-seq analysis: FoxA1_peaks.xls
    chr   start    end      length  summit  tags  -10*LOG10(pvalue)  fold_enrichment  FDR(%)
    chr1  858357   858641   285     128     6     51                 13.93            4.09
    chr1  998955   999229   275     106     9     74.39              18.28            0.26
    chr1  1050021  1050286  266     154     13    152                52.23            0
    chr1  1684288  1684577  290     176     9     89.7               32.14            0.01
    chr1  1775031  1775371  341     270     6     51.08              16.71            4.06
    chr1  1780682  1780965  284     183     6     61.17              19.9             1.45
    FoxA1_negative_peaks.xls
    chr   start     end       length  summit  tags  -10*LOG10(pvalue)  fold_enrichment
    chr1  7155010   7155530   521     311     9     61.64              44.47
    chr1  11265816  11266025  210     106     6     59.86              38.12
    chr1  18597004  18597307  304     188     8     66.25              31.77
    chr1  33412779  33412964  186     94      6     58.68              22.92
    chr1  33759125  33759514  390     234     9     62.88              19.77
    chr1  37102727  37102952  226     114     6     55.14              31.51
  • 30. ChIP-seq analysis: FoxA1_peaks.bed
    Fields: chr, start, end, peak id and score = -10*LOG10(pvalue)
    chr1  858356   858641   MACS_peak_1  51
    chr1  998954   999229   MACS_peak_2  74.39
    chr1  1050020  1050286  MACS_peak_3  152
    chr1  1684287  1684577  MACS_peak_4  89.7
    chr1  1775030  1775371  MACS_peak_5  51.08
    chr1  1780681  1780965  MACS_peak_6  61.17
    chr1  1923146  1923449  MACS_peak_7  164.87
    FoxA1_summits.bed
    Fields: chr, start, end, peak id and score = height of the summit
    chr1  858483   858484   MACS_peak_1  4
    chr1  999059   999060   MACS_peak_2  7
    chr1  1050173  1050174  MACS_peak_3  12
    chr1  1684462  1684463  MACS_peak_4  8
    chr1  1775299  1775300  MACS_peak_5  4
    chr1  1780863  1780864  MACS_peak_6  4
    chr1  1923347  1923348  MACS_peak_7  14
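Comparing the rows on slides 29 and 30 shows how the three files relate: the .xls start is one larger than the BED start, and the summit column is an offset within the peak. The Python sketch below encodes that relationship. It is inferred only from the sample rows shown in these slides, not from the MACS documentation, and the function names are hypothetical.

```python
def xls_to_bed_peak(start, end):
    """Convert an .xls peak (1-based start) to BED coordinates (0-based start)."""
    return start - 1, end


def xls_to_bed_summit(start, summit):
    """Locate the 1-bp summit interval in BED coordinates.

    Assumption inferred from the sample rows: 'summit' is a 1-based
    offset from the peak start.
    """
    summit_start = (start - 1) + (summit - 1)
    return summit_start, summit_start + 1


# First three rows of FoxA1_peaks.xls: (start, end, summit).
rows = [
    (858357, 858641, 128),
    (998955, 999229, 106),
    (1050021, 1050286, 154),
]

for start, end, summit in rows:
    print(xls_to_bed_peak(start, end), xls_to_bed_summit(start, summit))
```

For the first row this gives the peak interval 858356-858641 and the summit interval 858483-858484, matching MACS_peak_1 in both BED files.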
  • 31. ChIP-seq analysis:
    $ macs14 -t ../data/Treatment_tags.bed -c ../data/Input_tags.bed -g hs -n FoxA1 -w
      • The -w option creates “wiggle” files for each chromosome analyzed.
      • The -B option creates “bedgraph” files.
      • The -S option, together with either -w or -B, creates a single huge file for the whole genome.
      • --space=NUM can be used to change the resolution of the wiggle file.
  • 32. ChIP-seq analysis: Upload files in the UCSC genome browser. http://genome.ucsc.edu/index.html
  • 33. ChIP-seq analysis: Upload files in the UCSC genome browser. http://genome.ucsc.edu/index.html
  • 34. ChIP-seq analysis: Upload files in the UCSC genome browser. http://genome.ucsc.edu/index.html
  • 35. ChIP-seq analysis: Upload files in the UCSC genome browser. http://genome.ucsc.edu/index.html
  • 36. ChIP-seq analysis: Upload files in the UCSC genome browser
  • 37. ChIP-seq analysis: Upload files in the UCSC genome browser. Peak example: chr22:20141500..20141987
  • 38. ChIP-seq analysis: Analyze histone modifications.
      • Broader peaks
      • No clear shape (more summits)
      • The peak model is often impossible to create.
    $ macs14 -t ../data/ES.H3K27me3.bed -g mm --nomodel --nolambda -n H3K27me3
      • It is recommended to skip the model with the --nomodel option.
      • Since no control is available, the comparison is done against the sample background. It is recommended to skip the local background (--nolambda) when you have no control and very broad peaks.
  • 39. ChIP-seq analysis: Upload files in the UCSC genome browser. Peak example: chrX:47,922,749-47,926,228
  • 40. ChIP-seq analysis: Galaxy platform. Soon a local installation at CRG!!! https://main.g2.bx.psu.edu/
  • 41. ChIP-seq analysis: Bibliography
      • http://en.wikipedia.org/wiki/File:ChIP-sequencing.svg
      • http://www.chiark.greenend.org.uk/~sgtatham/putty/download.html
      • http://liulab.dfci.harvard.edu/MACS/
      • http://sourceforge.net/apps/mediawiki/gemlibrary/index.php?title=The_GEM_library
      • http://bio-bwa.sourceforge.net/
      • http://www.well.ox.ac.uk/project-stampy
      • http://bowtie-bio.sourceforge.net/index.shtml
      • http://genome.ucsc.edu/
      • http://www.r-project.org/
      • http://www.bioinformatics.babraham.ac.uk/projects/fastqc/
      • https://main.g2.bx.psu.edu/root