
CDAC 2018 Boeva analysis chromatin

Presentation at the CDAC 2018 Workshop and School on Cancer Development and Complexity
http://cdac2018.lakecomoschool.org

Published in: Science

  1. Analysis of epigenetics and chromatin states in normal and cancer cells. Valentina Boeva, Institut Cochin, Inserm U1016
  2. Epigenetic profiles = combination of CpG methylation of DNA and histone modifications + information about the 3D structure of chromatin. (Figure: M. S. Yan et al., J. Appl. Physiol., 2010)
  3. Relation between CpG methylation and gene expression (Kapourani and Sanguinetti, Bioinformatics, 2016). Cluster 1: uniformly unmethylated; generally repressed. Cluster 2: U-shape profile, hypo-methylation around the TSS surrounded by hyper-methylation; high expression. Cluster 3: S-shape profile, hypo-methylated before TSS; intermediate expression. Cluster 4: hyper-methylated; repressed. Cluster 5: reverse S-shape profile, hyper-methylated before TSS; intermediate expression.
  4. Bisulfite sequencing is used to detect the methylation status of cytosines • Bisulfite treatment converts unmethylated cytosine to uracil; methylated cytosine is left unchanged
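
The conversion rule on this slide can be illustrated with a small in-silico sketch. It is not part of the original talk: the function names and the toy sequence are hypothetical, only the forward strand is modeled, and uracil is written as T because that is how it reads after amplification.

```python
def bisulfite_convert(seq, methylated):
    """Return the sequence read after bisulfite treatment and amplification.

    seq        -- DNA string (forward strand only, a simplification)
    methylated -- set of 0-based positions of methylated cytosines
    Unmethylated C becomes U, which is read as T; methylated C stays C.
    """
    converted = []
    for i, base in enumerate(seq.upper()):
        if base == "C" and i not in methylated:
            converted.append("T")
        else:
            converted.append(base)
    return "".join(converted)


def call_methylation(reference, converted):
    """Positions where a reference cytosine still reads as C (methylated)."""
    return {
        i
        for i, (ref_base, read_base) in enumerate(zip(reference.upper(), converted))
        if ref_base == "C" and read_base == "C"
    }


reference = "ACGTCCGA"                                  # toy sequence (assumption)
read = bisulfite_convert(reference, methylated={1})     # only the first C is methylated
methylated_positions = call_methylation(reference, read)
```

Comparing the converted read with the reference recovers the methylated positions, which is the principle behind methylation calling from bisulfite reads.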
  5. RRBS (reduced representation bisulfite sequencing): a cheap way to profile CpG methylation • Uses a restriction enzyme targeting 5'-CCGG-3' sequences
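
A minimal sketch of the digestion step, assuming the enzyme cuts between the first C and the CGG of each CCGG site (as MspI does; the slide itself does not name the enzyme). The function names, the toy sequence and the size-selection bounds are illustrative assumptions.

```python
import re


def ccgg_cut_positions(seq):
    """0-based cut positions, assuming a C^CGG cut within each CCGG site."""
    # The lookahead lets overlapping sites be found as well.
    return [m.start() + 1 for m in re.finditer(r"(?=CCGG)", seq.upper())]


def digest(seq, min_len=0, max_len=None):
    """Cut seq at every CCGG site and keep fragments within the size window."""
    cuts = [0] + ccgg_cut_positions(seq) + [len(seq)]
    fragments = [seq[a:b] for a, b in zip(cuts, cuts[1:])]
    return [
        f for f in fragments
        if len(f) >= min_len and (max_len is None or len(f) <= max_len)
    ]


toy = "AACCGGTTTTCCGGAA"                          # toy sequence (assumption)
all_fragments = digest(toy)
selected = digest(toy, min_len=4, max_len=10)     # hypothetical size selection
```

Because every retained fragment starts or ends at a CCGG site, the sequenced fraction of the genome is enriched in CpG-rich regions, which is what makes the approach cheap.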
  6. DNA methylation arrays • Illumina Infinium MethylationEPIC array (850K) or 450K BeadChip • Agilent 244K array
  7. Visualization of the array data in the UCSC Genome Browser: orange = methylated (>= 60%); purple = partially methylated (> 20% and < 60%); bright blue = unmethylated (<= 20%)
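
The color thresholds above amount to a three-way classification of the methylation percentage at each probe. A small sketch, with a hypothetical function name:

```python
def classify_methylation(percent):
    """Map a methylation percentage (0-100) to the browser color classes."""
    if not 0 <= percent <= 100:
        raise ValueError("methylation percentage must be between 0 and 100")
    if percent >= 60:
        return "methylated"            # orange
    if percent > 20:
        return "partially methylated"  # purple
    return "unmethylated"              # bright blue


classes = [classify_methylation(p) for p in (5, 20, 45, 60, 97)]
```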
  8. Epigenetic profiles = combination of CpG methylation of DNA and histone modifications + information about the 3D structure of chromatin. (Figure: M. S. Yan et al., J. Appl. Physiol., 2010)
  9. Histone modifications correlate with gene transcription levels • Histone modifications (Bhaumik et al., Nat. Struct. Mol. Biol., 2007; Li et al., Cell, 2007)
  10. Histone modifications correlate with gene transcription levels and have specific distributions around gene transcription start sites (TSSs). [Figure: average density of histone modification signal from -30 kb to +30 kb around the TSS in the HeLa-S3 cell line for H4K20me1, H3K9ac, H3K9me3, H3K27me3, H3K36me3 and H3K79me2; correlation of different histone marks with gene expression. Credit: Haitham Ashoor]
  11. With histone marks, one can predict gene expression (R = 0.9; ENCODE Project Consortium, Nature, 2012)
  12. The ChIP-seq technique can provide information about modifications of histone tails. ChIP-seq = chromatin immunoprecipitation + sequencing. [Figure: main steps of the ChIP-seq technique]
  13. The ChIP-seq technique can provide information about modifications of histone tails. [Figure: main steps of the ChIP-seq technique; 35-100 bp reads; cluster of reads (peak) in the UCSC Genome Browser] Q?
  14. Analysis of ChIP-seq data: density profile calculation. Reads mapped to the chromosome are extended to putative fragments; the fragment density is computed, binned, and written to a .wig file. We calculate the density for both the ChIP and the control sample.
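
The density calculation can be sketched as follows. This is an illustrative simplification rather than the code of any particular tool: reads are given as hypothetical (5' position, strand) pairs, every read is extended to one fixed putative fragment length, and bins hold the mean per-base density.

```python
def fragment_density(reads, chrom_length, fragment_length=200):
    """Per-base count of putative fragments covering each position.

    reads -- iterable of (pos5, strand): 0-based 5' end of the read and "+" or "-".
    Each read is extended from its 5' end to fragment_length in the 3' direction.
    """
    diff = [0] * (chrom_length + 1)          # difference array for fast coverage
    for pos5, strand in reads:
        if strand == "+":
            start, end = pos5, pos5 + fragment_length
        else:
            start, end = pos5 - fragment_length + 1, pos5 + 1
        start, end = max(start, 0), min(end, chrom_length)
        if start < end:
            diff[start] += 1
            diff[end] -= 1
    density, running = [], 0
    for i in range(chrom_length):
        running += diff[i]
        density.append(running)
    return density


def bin_density(density, bin_size=50):
    """Mean density in consecutive bins of bin_size bases."""
    bins = []
    for i in range(0, len(density), bin_size):
        window = density[i:i + bin_size]
        bins.append(sum(window) / len(window))
    return bins


# Toy example (assumed values): three reads on a 20-bp "chromosome".
toy_reads = [(0, "+"), (5, "+"), (14, "-")]
toy_density = fragment_density(toy_reads, chrom_length=20, fragment_length=10)
toy_bins = bin_density(toy_density, bin_size=5)
```

The same two functions would be applied to the ChIP and to the control sample, so that the two binned profiles can be compared bin by bin.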
  15. Visualization of ChIP-seq signal in the UCSC Genome Browser or IGV. [Figure: IGV tracks, normal vs. cancer]
  16. Peak calling: detection of coordinates of regions enriched in a given histone mark. [Figure: CLB-GA neuroblastoma cell line, ~70 kb region around ZMYZ1; H3K27ac and H3K4me3 signal with the corresponding peaks; an active promoter and an active enhancer are marked]
  17. Histone modifications form groups and indicate distinct chromatin states • Histone modifications, histone variants, binding sites (Pol II, CTCF, p300,…) => chromatin states (ENCODE Project Consortium, Nature, 2012)
  18. Histone modifications form groups and indicate distinct chromatin states • Histone modifications, histone variants, binding sites (Pol II, CTCF, p300,…) => chromatin states (ENCODE Project Consortium, Nature, 2012)
  19. Histone modifications form groups and indicate distinct chromatin states • Histone modifications, histone variants, binding sites (Pol II, CTCF, p300,…) => chromatin states (ENCODE Project Consortium, Nature, 2012)
  20. Histone modifications form groups and indicate distinct chromatin states • Histone modifications, histone variants, binding sites (Pol II, CTCF, p300,…) => chromatin states (ENCODE Project Consortium, Nature, 2012). The 7 states: R = predicted repressed or low-activity region; T = predicted transcribed region; WE = predicted weak enhancer or open chromatin cis-regulatory element; E = predicted enhancer; CTCF = CTCF-enriched element; PF = predicted promoter flanking region; TSS = predicted promoter region including TSS
  21. Histone modifications form groups and indicate distinct chromatin states (Ernst & Kellis, Nature Biotechnology, 2010). [Figure: input chromatin mark information and resulting chromatin state annotation for a 120-kb region of human chromosome 7 surrounding the CAPZA2 gene; 51 states]
  22. How many states to select? • Use your biological intuition • Score each model by its log likelihood minus a penalty on model complexity given by the Bayesian Information Criterion (BIC): one-half the number of parameters times the natural log of the number of intervals. "There are three kinds of lies: lies, damned lies, and statistics." (Benjamin Disraeli)
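
The scoring rule on this slide can be written as score = log L - (k / 2) * ln(n), with k the number of parameters and n the number of intervals. A short sketch follows; the function names and all numbers in the example are made up for illustration and do not come from any real model fit.

```python
import math


def bic_score(log_likelihood, n_parameters, n_intervals):
    """Log likelihood penalized by one-half * parameters * ln(intervals)."""
    return log_likelihood - 0.5 * n_parameters * math.log(n_intervals)


def select_n_states(candidates, n_intervals):
    """Pick the number of states with the highest penalized score.

    candidates -- dict: n_states -> (log_likelihood, n_parameters)
    """
    return max(
        candidates,
        key=lambda s: bic_score(candidates[s][0], candidates[s][1], n_intervals),
    )


# Hypothetical fits: more states raise the likelihood but cost more parameters.
toy_candidates = {5: (-1200.0, 50), 10: (-1100.0, 150), 20: (-1080.0, 500)}
best = select_n_states(toy_candidates, n_intervals=1000)
```

In this toy case the likelihood gain from 10 or 20 states does not offset the penalty, so the 5-state model scores highest.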
  23. What is going on with chromatin states and overall epigenetic profiles in cancer? • Do epigenetic states change compared to normal ancestral cells? • Is there any global phenomenon related to epigenetics present in cancer cells? [Image: lung cancer close-up. Moredun Animal Health Ltd/SPL/Getty Images]
  24. Histone and CpG-methyl modifying proteins are often mutated or deleted in cancer. More than 50% of human cancers harbor mutations in enzymes that are involved in chromatin organization. [Table: epigenome-modifying gene mutations in human cancer; Timp & Feinberg, Nature Rev. Cancer, 2013] Q?
  25. Changes in CpG methylation are common in cancer • Loss of imprinting (e.g. of IGF2) • Hypermethylation of CpG islands of tumor suppressor genes • Genome-wide DNA hypomethylation
  26. DNA methylation status can be associated with tumor aggressiveness. Skin cancer, ER-α methylation: Kaplan–Meier curves showing the correlation of pre-biochemotherapy serum ER-α methylation status with OS (p = 0.003). Skin cancer, RASSF1A methylation: Kaplan–Meier survival curves of biochemotherapy patients showing the correlation of pre-BC serum RASSF1A methylation BM with overall survival (p = 0.013). (Mori et al., 2006; Mori et al., 2005)
  27. DNA methylation status can define cancer subtypes (and be associated with tumor aggressiveness). CpG island methylator phenotype in adrenocortical carcinomas (ACCs): the degree of DNA methylation of 553 genes directly correlates with poor prognosis. [Figure: CpG island methylator phenotype vs. non-CpG island methylator phenotype; Barreau et al., J. Clin. Endocrinol. Metab., 2013]
  28. CpG island methylator phenotype (CIMP) can be associated with good or poor prognosis in different cancers (Hughes et al., Cancer Research, 2013)
  29. Cancer treatment with inhibitors of DNMTs. A phase I/II trial of combined epigenetic therapy with azacitidine and entinostat, inhibitors of DNA methylation and histone deacetylation, respectively, in extensively pretreated patients with recurrent metastatic non–small cell lung cancer (Juergens et al., Cancer Discovery, 2011). [Figure: survival stratified by target gene methylation status (promoter methylation of APC, CDH13, RASSF1A and CDKN2A)]
  30. LRES & LOCKs: global changes in epigenetic patterns in cancer • Histone modification patterns are altered in human tumors – Gain of long-range epigenetic silencing (LRES) – Loss of large organized chromatin lysine (K9) modifications (LOCKs). [Figure: hypothetical view of LRES in cancer] (S. J. Clark, Hum. Mol. Genet., 2007; B. Wen, Nat. Genet., 2009)
  31. Example of epigenetic silencing of the HOXD gene cluster in bladder cancer: the cluster of HOXD genes is repressed by epigenetic mechanisms (PRC2) and enriched in the repressive histone mark H3K27me3
  32. Cancer treatment with EZH2 inhibitors: EPZ-6438 in rhabdoid tumors with mutated SMARCB1 (Knutson et al., PNAS, 2013; Kim & Roberts, Nature Medicine, 2016)
  33. Creation of cancer-specific super-enhancers. A super-enhancer has high H3K27ac (Whyte et al., Cell, 2013)
  34. Example: detection of super-enhancer regions using HMCan and LILY. [Figure: H3K27ac profiles in neuroblastoma (NB) cell lines and normal cells (controls); example: super-enhancer (SE) in PHOX2B in NB cell lines] (Ashoor et al., Bioinformatics, 2013; Boeva et al., Nature Genetics, 2017)
  35. Creation of cancer-specific super-enhancers. A super-enhancer has high H3K27ac
  36. Creation of cancer-specific super-enhancers. A super-enhancer has high H3K27ac. Example: colorectal cancer (Hnisz et al., Cell, 2013)
  37. Analysis of histone modification profiles can suggest "epigenetic" treatment for cancer patients (Chipumuro et al., Cell, 2014) • Neuroblastomas with MYCN amplification have a specific epigenetic profile (super-enhancers) • MYCN-amplified cells are sensitive to a specific drug (CDK7 inhibitor) • Application of this drug reduces tumor volume
  38. De novo enhancer creation or enhancer hijacking • T-cell acute lymphoblastic leukemia: somatic mutations => binding motifs for MYB => a super-enhancer upstream of the TAL1 oncogene • Neuroblastoma: TERT activation via enhancer hijacking • Medulloblastoma: GFI1 family oncogenes activation via enhancer hijacking (Mansour et al., Science, 2014; Northcott et al., Nature, 2014; Peifer et al., Nature, 2015)
  39. Rewiring of core regulatory circuitries (CRCs) in cancer. In cancer: TFs gain/lose SEs (+ the number of gene copies changes and affects expression); cell identity change; change in transcriptional networks. [Diagram: normal cell vs. cancer cell]
  40. Rewiring of core regulatory circuitries (CRCs) in cancer • CRCs = set of TFs that autoregulate themselves and define cell identity in normal cells (Saint-André et al., Genome Research, 2016)
  41. Rewiring of core regulatory circuitries (CRCs) in cancer • CRCs = set of TFs that autoregulate themselves and define cell identity in normal cells (Saint-André et al., Genome Research, 2016)
  42. Summary • DNA methylation and histone modifications/histone variants have a direct effect on gene transcription • Histone modifications form groups and indicate distinct chromatin states • Epigenetic profiles change in cancer compared to normal ancestral cells (> 30 epigenome-modifying proteins can be mutated in different cancers; half of cancers have at least one mutation in a chromatin gene) • These changes can be used to stratify patients and/or define efficient "epigenetic" drugs • Discovery of genetic events associated with oncogenic epigenetic changes may provide clinical markers for patient stratification
  43. Computational strategies for the analysis of ChIP-seq data in cancer and normal cells
  44. Main steps of the ChIP-seq technique + control (e.g., input DNA). 35-100 bp reads; > 20M reads (Valouev et al., Nat. Methods, 2008)
  45. Framework for the analysis of histone modification profiles & TFBSs • Nebula: web service for analysis of ChIP-seq data (V. Boeva, A. Lermine et al., Bioinformatics, 2012), nebula.curie.fr
  46. Nebula: web service for analysis of ChIP-seq data • Peak calling • Calculation of the density and cumulative distribution of peak locations relative to gene transcription start sites • Annotation of peaks with genomic features and genes with peak information. [Figure, panels A-E: some graphs produced by Nebula: peak count vs. peak height (ChIP vs. control); proportion of genes with a peak at a given distance from the TSS, as density and as cumulative distribution, for down-regulated, no-response and up-regulated genes; proportion of peaks and proportion of genes with a peak per genomic feature] (V. Boeva, A. Lermine et al., Bioinformatics, 2012)
  47. There is another Nebula instance (https://galaxy-public.curie.fr/)
  48. There are other ChIP-seq toolboxes (http://cistrome.org/ap/)
  49. Read alignment: .fastq format • Illumina or SOLiD raw data format: .fastq. A quality value Q is an integer mapping of p (i.e., the probability that the corresponding base call is incorrect). Phred quality score: Q = -10 log10(p). Q = 10 corresponds to probability of error = 0.1; Q = 20 to 0.01; Q = 30 to 0.001. Example records:
      Sanger (Phred+33):
      @SRR001666.1 071112_SLXA-EAS1_s_7:5:1:817:345
      GGGTGATGGCCGCTGCCGATGGCGTCAAATCCCACC
      +SRR001666.1 071112_SLXA-EAS1_s_7:5:1:817:345
      IIIIIIIIIIIIIIIIIIIIIIIIIIIIII9IG9IC
      Phred+64:
      @HWI-EAS209_0006_FC706VJ:5:58:5894:21141#ATCACG/1
      TTAATTGGTAAATAAATCTCCTAATAGCTTAGATNTTACCTTNNNNNNNNNN
      +HWI-EAS209_0006_FC706VJ:5:58:5894:21141#ATCACG/1
      efcfffffcfeefffcffffffddf`feed]`]_Ba_^__[YBBBBBBBBBB
      SOLiD (color space):
      @3_36_77_R17C1
      T23031.313.20222.2.0220222.2.2.22002.2.2222222..222
      +
      '/%&/!&'#!#%##&!%!%$&#%##!#!#!$##$&!#!%*##'%,!!(#)
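
The Phred formula and the two ASCII offsets can be checked with a few lines of code. This is a minimal sketch with hypothetical function names; the quality characters are taken from the records on the slide.

```python
import math


def phred_to_error_probability(q):
    """Invert Q = -10 * log10(p): probability that the base call is wrong."""
    return 10 ** (-q / 10)


def error_probability_to_phred(p):
    """Phred quality score for an error probability p (0 < p <= 1)."""
    return -10 * math.log10(p)


def decode_qualities(quality_string, offset=33):
    """Turn a FASTQ quality string into integer Phred scores.

    offset is 33 for Sanger-style encoding and 64 for the older Phred+64 one.
    """
    return [ord(ch) - offset for ch in quality_string]


sanger_tail = decode_qualities("I9IG9IC", offset=33)   # end of the first record
phred64_head = decode_qualities("efcf", offset=64)     # start of the second record
```

Decoding a file with the wrong offset shifts every score by 31, so it is worth checking the range of quality characters before alignment.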
  50. Read alignment to the reference genome • Any tool will be OK: – BWA – Bowtie – GEM – Novoalign. [Figure: a read (50-150 bp), ATGCTGCATTACGGA, aligned to a genome (30 Mb-3 Gb), ACTGATGCGATGCATGCGATGCTGCATTACGGCATGCTAGCTAGCTGCAGTAGATCGCA]
  51. SAM and BAM (binary SAM) • SAM = Sequence Alignment/Map
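
For orientation, a SAM alignment line has 11 mandatory tab-separated fields. The sketch below parses them with the standard library only; the example line is invented for illustration (it reuses the read sequence from the .fastq slide), and real pipelines would normally rely on a dedicated library instead.

```python
from typing import NamedTuple, Optional


class SamRecord(NamedTuple):
    qname: str   # read name
    flag: int    # bitwise flag
    rname: str   # reference (chromosome) name
    pos: int     # 1-based leftmost mapping position
    mapq: int    # mapping quality
    cigar: str
    rnext: str
    pnext: int
    tlen: int
    seq: str
    qual: str

    @property
    def is_unmapped(self) -> bool:
        return bool(self.flag & 0x4)

    @property
    def is_reverse(self) -> bool:
        return bool(self.flag & 0x10)


def parse_sam_line(line: str) -> Optional[SamRecord]:
    """Parse one SAM line; header lines (starting with '@') give None."""
    if line.startswith("@"):
        return None
    f = line.rstrip("\n").split("\t")
    if len(f) < 11:
        raise ValueError("a SAM alignment line needs 11 mandatory fields")
    return SamRecord(f[0], int(f[1]), f[2], int(f[3]), int(f[4]),
                     f[5], f[6], int(f[7]), int(f[8]), f[9], f[10])


# Invented example: a 36-bp read mapped to the reverse strand of chr1.
example_line = ("read1\t16\tchr1\t798100\t60\t36M\t*\t0\t0\t"
                "GGGTGATGGCCGCTGCCGATGGCGTCAAATCCCACC\t"
                "IIIIIIIIIIIIIIIIIIIIIIIIIIIIII9IG9IC")
record = parse_sam_line(example_line)
```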
  52. BED format • Rarely used for reads
  53. Detection of regions enriched in H3K27ac (peak calling). [Figure: sequenced reads (.BAM), H3K4me3 signal, H3K4me3 peaks]
  54. .wig file for ChIP-seq signal density
  55. .bed file for ChIP-seq peaks:
      chr1 798049 798600 peak6 32.882393 +
      chr1 798649 798900 peak7 18.051716 +
      chr1 803999 804950 peak14 34.563721 +
      chr1 806149 806500 peak18 31.643387 +
      chr1 806599 807250 peak19 16.159706 +
      chr1 807799 808100 peak22 17.287043 +
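
The peak file above can be read with a few lines of code. This sketch assumes the six columns are chromosome, start, end, name, score and strand, with 0-based half-open coordinates as in standard BED; the score cutoff of 30 is an arbitrary value chosen only for the example.

```python
BED_TEXT = """\
chr1 798049 798600 peak6 32.882393 +
chr1 798649 798900 peak7 18.051716 +
chr1 803999 804950 peak14 34.563721 +
chr1 806149 806500 peak18 31.643387 +
chr1 806599 807250 peak19 16.159706 +
chr1 807799 808100 peak22 17.287043 +
"""


def parse_bed(text):
    """Parse whitespace-separated BED lines into a list of peak dicts."""
    peaks = []
    for line in text.splitlines():
        # Skip blank lines and optional track/browser/comment lines.
        if not line.strip() or line.startswith(("track", "browser", "#")):
            continue
        chrom, start, end, name, score, strand = line.split()[:6]
        peaks.append({
            "chrom": chrom,
            "start": int(start),   # 0-based, inclusive
            "end": int(end),       # exclusive
            "name": name,
            "score": float(score),
            "strand": strand,
        })
    return peaks


peaks = parse_bed(BED_TEXT)
widths = {p["name"]: p["end"] - p["start"] for p in peaks}
strong = [p["name"] for p in peaks if p["score"] >= 30]   # arbitrary cutoff
```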
  56. BED format for peaks (or similar format)
  57. There are more than a dozen tools to detect read clusters (or peaks). TFs and narrow histone marks: HMCan, GLITR, F-Seq, SICER, FindPeaks, QuEST, PeakSeq, Spp, MACS, ERANGE, Useq, SiSSRs. Narrow and/or broad histone marks: CCAT, FindPeaks, MACS2, ZINBA, HMCan, BayesPeak, SICER, MOSAiCS, CisGenome, MUSIC, MACS, BroadPeak.
  58. There exist two main methods to construct peaks, + different statistical methods to eliminate "false" peaks (low or short peaks). [Figure labels: read clusters, peaks, tag extension, fragment count; adapted from S. Pepke et al., Nat. Methods, 2009]
  59. Quality measures: ChIP-seq signal-to-noise ratio. From the ENCODE consortium: • Fraction of reads in peaks (FRiP): FRiP = Npeak/Nnonred, where Npeak is the number of reads falling within peak regions • Cross-correlation profiles (CCPs): normalized strand coefficient NSC = Cfrag/Cmin; relative strand correlation RSC = (Cfrag − Cmin)/(Cread − Cmin), where Cmin is the minimum cross-correlation (CC) observed, Cfrag is the CC corresponding to the fragment length, and Cread is the CC corresponding to the read length. Thresholds: FRiP > 1%, NSC ≥ 1.05 and RSC ≥ 0.8
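
The three ratios are simple enough to compute directly once the read counts and cross-correlation values are known. In this sketch the function names are hypothetical, Nnonred is assumed to be the number of nonredundant mapped reads, and the input numbers are invented.

```python
def frip(n_reads_in_peaks, n_nonredundant_reads):
    """Fraction of reads in peaks: Npeak / Nnonred."""
    return n_reads_in_peaks / n_nonredundant_reads


def nsc(c_frag, c_min):
    """Normalized strand coefficient: Cfrag / Cmin."""
    return c_frag / c_min


def rsc(c_frag, c_read, c_min):
    """Relative strand correlation: (Cfrag - Cmin) / (Cread - Cmin)."""
    return (c_frag - c_min) / (c_read - c_min)


def passes_thresholds(frip_value, nsc_value, rsc_value):
    """Apply the thresholds quoted on the slide."""
    return frip_value > 0.01 and nsc_value >= 1.05 and rsc_value >= 0.8


# Invented example values for one ChIP-seq sample.
sample_frip = frip(300_000, 20_000_000)
sample_nsc = nsc(c_frag=0.30, c_min=0.20)
sample_rsc = rsc(c_frag=0.30, c_read=0.25, c_min=0.20)
sample_ok = passes_thresholds(sample_frip, sample_nsc, sample_rsc)
```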
  60. Quality measures: irreproducible discovery rate. The irreproducible discovery rate (IDR) assesses the rank consistency of common peaks between two replicates. Based on a copula mixture model, IDR estimates the reproducibility of each peak pair and reports the expected rate of irreproducible discoveries in the obtained peaks, in a similar way to the FDR. R package "idr" on CRAN
  61. The irreproducible discovery rate (IDR) framework for assessing reproducibility of ChIP-seq data sets (Stephen G. Landt et al., Genome Res. 2012;22:1813-1831. © 2012, Published by Cold Spring Harbor Laboratory Press)
  62. Primary analysis of ChIP-seq data • Peak calling • Differential peak calling (condition 1 vs. 2) • Detection of chromatin states • Super-enhancer calling
  63. One should apply specific methods to detect histone modifications in cancer • Specific feature of cancer samples: large copy number changes. [Figure: lung adenocarcinoma, 24-color karyotype]
  64. Standard methods for signal detection can miss signal in regions of loss in cancer. [Figure: copy number profile along chr8 and H3K27me3 peaks predicted by MACS and SICER] (MACS: Zhang, Y. et al. (2008) Genome Biol., 9, R137; SICER: Zang, C. et al. (2009) Bioinformatics, 25, 1952–1958)
  65. Solution: explicit normalization for copy number status • Hidden Markov model after correction of ChIP-seq signal for copy number and GC-content bias. Software: HMCan, www.cbrc.kaust.edu.sa/hmcan (H. Ashoor et al., Bioinformatics, 2013)
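
To make the idea of copy number normalization concrete, here is a deliberately simplified sketch. It is not HMCan's algorithm (which also corrects for GC content and then runs a hidden Markov model): it only rescales each bin's density to what would be expected at the baseline ploidy. The function name and the toy values are assumptions.

```python
def normalize_for_copy_number(binned_density, copy_number, ploidy=2):
    """Rescale binned ChIP-seq density to a common copy number.

    binned_density -- signal per bin
    copy_number    -- estimated DNA copy number per bin (same length)
    ploidy         -- baseline copy number to rescale to
    Bins with zero copies (homozygous deletion) are set to 0.0.
    """
    if len(binned_density) != len(copy_number):
        raise ValueError("density and copy number must cover the same bins")
    normalized = []
    for density, copies in zip(binned_density, copy_number):
        normalized.append(density * ploidy / copies if copies > 0 else 0.0)
    return normalized


# Toy example: the same underlying enrichment seen at 2, 1 and 3 copies.
toy_signal = [4, 2, 6]
toy_copies = [2, 1, 3]
toy_normalized = normalize_for_copy_number(toy_signal, toy_copies)
```

After rescaling, the single-copy bin is no longer weaker than its neighbors, which is why peaks in regions of loss stop being missed.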
  66. HMCan uses FREEC's algorithm for annotation of copy number alterations. [Figure: copy number profile for the HeLa-S3 cell line obtained using the input data (ENCODE dataset)] (V. Boeva et al., Bioinformatics, 2011)
  67. Peaks predicted by HMCan do not show copy number bias. [Figure: copy number profile and peaks predicted by HMCan, MACS and SICER] (H. Ashoor et al., Bioinformatics, 2013)
  68. Detection of changes in histone marks between two conditions
  69. HMCan-diff: a method to detect changes in histone marks in cells with different genetic backgrounds. [Figure: data simulated without copy number bias vs. data simulated with copy number bias] (H. Ashoor et al., submitted)
  70. HMCan-diff: a method to detect changes in histone marks in cells with different genetic backgrounds (H. Ashoor et al., submitted) • Library size correction • GC-content correction • Copy number correction • Variable signal-to-noise ratio correction • Iterative hidden Markov models
  71. ChIP-seq post-processing methods: calling chromatin states • ChromHMM and Segway were developed to systematically identify specific combination patterns of histone modifications as chromatin states. The 7 states: R = predicted repressed or low-activity region; T = predicted transcribed region; WE = predicted weak enhancer or open chromatin cis-regulatory element; E = predicted enhancer; CTCF = CTCF-enriched element; PF = predicted promoter flanking region; TSS = predicted promoter region including TSS
  72. Definition of super-enhancers using H3K27ac read counts (ROSE). [Plot: enhancer rank (x-axis) vs. H3K27ac read count, ChIP - Input (y-axis); the top-ranked enhancers are called super-enhancers]
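
The ranking plot can be turned into a cutoff with a short sketch. This follows the idea commonly described for ROSE (scale both axes to [0, 1] and place the cutoff where the ranked-signal curve has a tangent of slope 1); it is a simplified illustration with hypothetical names and made-up signal values, not the ROSE code itself, and it assumes non-negative, distinct signals.

```python
def call_super_enhancers(signals):
    """Return the enhancers above the slope-1 tangent point of the ranked curve.

    signals -- dict: enhancer id -> background-subtracted H3K27ac signal
    """
    ranked = sorted(signals.items(), key=lambda item: item[1])  # ascending
    n = len(ranked)
    if n < 2 or ranked[-1][1] <= 0:
        return []
    top = ranked[-1][1]
    # With x = rank / (n - 1) and y = signal / top, the point where a line of
    # slope 1 touches the curve from below is the one that minimizes y - x.
    tangent_index = min(
        range(n), key=lambda i: ranked[i][1] / top - i / (n - 1)
    )
    cutoff = ranked[tangent_index][1]
    return [name for name, signal in ranked if signal > cutoff]


# Made-up signals: eight ordinary enhancers and two very strong ones.
toy_signals = {f"e{i}": s for i, s in
               enumerate([1, 2, 3, 4, 5, 6, 7, 8, 50, 100], start=1)}
toy_super = call_super_enhancers(toy_signals)
```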
  73. For super-enhancer calling in cancer data: use LILY. [Figure: ROSE (without copy number correction) vs. LILY (with copy number correction)] (V. Boeva et al., Nature Genetics, 2017)
  74. Summary of the Methods part • Main steps of the primary ChIP-seq data analysis: – Alignment of reads – Peak calling – Quality controls – Annotation of peaks according to genes • Possible next steps: – ROSE or LILY for H3K27ac – ChromHMM for chromatin states