Computational models for the analysis of gene expression regulation and its alteration

Anthony Mathelier has recently been appointed as a new group leader in Computational Biology at the NCMM in Oslo, focusing on computational methods to study gene regulation. He will present one of his recent studies, which coupled experimental data and targeted computational analysis with TF binding profiles to interpret cis-regulatory somatic mutations in 84 matched tumour-normal whole genomes from B-cell lymphomas. Because transcription factor binding sites (TFBSs) represent the core of gene cis-regulation, he will then introduce new models to improve the prediction of TFBSs from ChIP-seq data.

Published in: Science

  1. Computational models for the analysis of gene expression regulation and its alteration. Anthony Mathelier, Centre for Molecular Medicine Norway (NCMM), Nordic EMBL Partnership for Molecular Medicine. anthony.mathelier@ncmm.uio.no, @AMathelier. June 14th, 2016.
  2. One genome, multiple cells, transcriptomes, and proteomes. Source: http://stemcells.nih.gov. Sources: D. Melton, D. Pyott, Google figs, wiki commons. In humans, more than 400 distinct cell types arise from a single totipotent cell. They all share the same genome but express different genes in a time- and space-specific manner.
  3. Multiple layers of gene expression regulation. Transcriptional regulation: transcription factors, epigenetics, open chromatin, closed chromatin, etc. Post-transcriptional regulation: miRNAs, mRNA localization, RNA-binding proteins, splicing, etc. [Figure, adapted from Kelvin Song's work on Wikimedia Commons; labels: enhancers, transcription factors, RNA polymerase, promoter, gene, TSS, messenger RNA strand, RNA silencing complex, miRNA, 3' and 5' ends.]
  5. Outline. (1) Genome-scale identification and analyses of transcription factor binding site (TFBS) alterations: identification and analysis of cis-regulatory mutations; cis-regulatory mutations and gene expression alteration. (2) The next generation of TFBS prediction: transcription factor flexible models (TFFMs); DNA shape features improve TFBS prediction in vivo.
  6. Genome-scale identification and analyses of TFBS alterations. [Figure, adapted from Kelvin Song's work on Wikimedia Commons; labels: TSS, transcription factors, gene.]
  7. Whole genome sequencing (WGS) era. Figure from Atif Rahman. Previous analyses of patients' genomes focused on the ∼2% of the genome coding for proteins. WGS is becoming affordable in the clinic. It is time to focus on regulatory mutations that alter the transcriptional regulation of gene expression.
  9. Cis-regulatory mutations may impact gene expression. Figure adapted from Arenillas et al., poster at AMIA Conference, 2012. One needs to accurately locate TFBSs to identify and characterize the regulatory sequences controlling the transcription of specific genes.
  10. Genome-scale data capturing TFBSs: ChIP-seq. Example peak sequences (FASTA-like, line wrapping as on the slide): >Seq 1 AGTTCAAAGTTCAAGTTCAAA GTTCAAGTTCAAAGTTCAAGT TCAAAGTTCA; >Seq 2 AGTCCAAAGTTCAAGTCCAAA GTTCAAGTCCAAA; ...; >Seq 33554 CTACCGGGGACCGGGGTGGA ACCGGGG; >Seq 33555 ACCGGGGACCGGGGACCGGG GACCGGGGGGCAAGGTTCATA. Adapted from A.M. Szalkowski and C.D. Schmid, Briefings in Bioinformatics, 2010.
  11. Modeling TFBSs. Position frequency matrices (PFMs) reflect the preferred binding motifs associated with TFs.
  12. Scoring potential TFBSs.
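
The two slides above name PFMs and the scoring of potential TFBSs without showing the computation. A minimal sketch of one common approach follows, assuming a uniform background, a pseudocount of 1, and log2-odds weights; these choices, the function names, and the toy counts are illustrative assumptions and are not taken from the talk or from JASPAR.

```python
import math

BASES = "ACGT"

def pfm_to_pwm(pfm, pseudocount=1.0, background=0.25):
    """Turn per-position base counts into log2-odds weights (uniform background assumed)."""
    width = len(pfm["A"])
    pwm = []
    for i in range(width):
        total = sum(pfm[b][i] for b in BASES) + pseudocount
        column = {}
        for b in BASES:
            # The pseudocount is shared between bases in proportion to the background.
            freq = (pfm[b][i] + pseudocount * background) / total
            column[b] = math.log2(freq / background)
        pwm.append(column)
    return pwm

def score_site(pwm, site):
    """Sum the weights of the observed base at each motif position."""
    return sum(column[base] for column, base in zip(pwm, site))

def scan(pwm, sequence):
    """Score every window of motif width on the forward strand."""
    width = len(pwm)
    return [(start, score_site(pwm, sequence[start:start + width]))
            for start in range(len(sequence) - width + 1)]

# Hypothetical counts for a 4-bp motif with consensus GTTC (not a real profile).
toy_pfm = {"A": [1, 0, 0, 1], "C": [1, 0, 0, 8], "G": [8, 0, 1, 0], "T": [0, 10, 9, 1]}
toy_pwm = pfm_to_pwm(toy_pfm)
best_start, best_score = max(scan(toy_pwm, "AAGTTCAA"), key=lambda hit: hit[1])
print(best_start, round(best_score, 2))
```

A real scan would usually also score the reverse strand and apply a score threshold; both are left out here to keep the sketch short.
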
  13. JASPAR: largest open-access database of manually curated TF binding profiles (Mathelier et al., NAR, 2016). Number of TF binding profiles per subset: Vertebrates 519; Plants 227; Insects 133; Nematodes 26; Fungi 176; Urochordata 1; Total 1082.
  14. TFBSs as cis-regulatory elements. 477 ChIP-seq data sets from ENCODE and the literature. 103 TFs with a JASPAR TF binding profile. 76,160,823 bp in TFBSs (∼2% of the human genome).
  15. Somatic mutations in B-cell lymphomas. WGS (normal and tumour cells): cohort 1, 40 diffuse large B-cell lymphomas; cohort 2, 44 B-cell lymphomas of mixed histology (Morin et al., Blood, 2013; Richter et al., Nat. Genetics, 2012). RNA-seq for the cancer samples. 406,611 SNVs and 15,739 indels in cohort 1 samples; 282,636 SNVs and 8,080 indels in cohort 2 samples.
  16. Promoters are frequent targets of cis-regulatory mutations. [Figure: genome-wide plots over chromosomes 1 to 22, X, and Y for cohort 1 (panel A) and cohort 2 (panel B); p-values shown: p = 1.16 × 10^-75 and p = 3.28 × 10^-156. Genes labelled in panel A: HIST1H1B, ST6GAL1, TMSB4X, ZFP36L1, NEDD9, BCL7A, RHOH, BIRC3, CIITA, IGLL5, BTG2, SGK1, CD74, BCL2, BCL6. Genes labelled in panel B: HIST1H1C, HIST1H1E, TMSB4X, ZFP36L1, BCL2L11, BZRAP1, DNMT1, ZNF860, NCOA3, FOXO1, BACH2, CXCR4, DUSP2, RFTN1, TCL1A, BCL7A, SOCS1, SEPT9, P2RX5, S1PR2, RHOH, EPS15, BIRC3, CIITA, DTX1, IGLL5, BTG2, BTG1, SGK1, CD74, CD83, BCL2, BCL6, PIM1, MYC, B2M, IRF1, IRF4, LTB, ID3.] 75% of the genes were previously described. 13 new genes of interest are frequently targeted in their promoters, 6 of them exclusively mutated in promoters.
  17. Promoters of apoptotic, B-cell, and cancer pathway genes are frequent targets of cis-regulatory mutations. Enriched terms (GO Biological Process, OMIM, KEGG Pathway; FDR < 0.05): lymphoma; small cell lung cancer; apoptosis; regulation of B cell proliferation; positive regulation of B cell proliferation; lymphocyte differentiation; negative regulation of lymphocyte apoptotic process; leukocyte activation; negative regulation of B cell apoptotic process; regulation of B cell activation; leukocyte differentiation; positive regulation of lymphocyte proliferation; regulation of cell growth; lymphocyte activation; T cell differentiation; positive regulation of B cell activation; negative regulation of intracellular signal transduction; positive regulation of leukocyte proliferation; regulation of type 2 immune response; regulation of B cell apoptotic process; negative regulation of T cell differentiation; positive regulation of mononuclear cell proliferation.
  18. Landscape of mutations and altered gene expression. [Figure: panels A to C showing mutations, TFBSs, protein-coding exons, and the TSS as xseq input.] Ding et al., Nature Communications, 2015. Mathelier et al., Genome Biology, 2015.
  19. Landscape of mutations and altered gene expression. [Figure: gene-by-sample matrix of mutations. Mutation categories: PC and disrupting TFBS; protein coding (PC); PC and TFBS; disrupting TFBS; TFBS. Genes shown: MYC, EYS, TP53, PTPRD, SMARCA4, BCL6, RYR2, ITPKB, WWC1, FCGBP, SGK1, TBL1XR1, ID3, CSMD3, SIN3A, VPS13C, UNC5D, NBAS, MTOR, PPP1R16B, USP25, ASCC3, GPHN, DHX35, PEX2, XRCC4, PXDN, JRKL, WHSC1L1, FBXW11, SRP72, CRIM1, FOXO1, DGKD, PHIP, PYGL, USP15, BRD2, C2CD3, LMO4, FMN2, SRFBP1, N4BP2, CCNG1, RHOA, DUSP2, TGFBR2, ARRDC3, CADPS2, PIM1, STIM2, GNA13. Columns are individual samples.]
  20. Landscape of mutations and altered gene expression. Pathways shown: chronic myeloid leukemia; erbb signaling pathway; acute myeloid leukemia; pancreatic cancer; prostate cancer; endometrial cancer; glioma; ecm receptor interaction; focal adhesion; epithelial cell signaling in helicobacter pylori infection; renal cell carcinoma; small cell lung cancer; oxidative phosphorylation; colorectal cancer; RB in Cancer; Integrated Breast Cancer Pathway; Androgen receptor signaling pathway; Focal Adhesion; EGF/EGFR Signaling Pathway; Integrated Pancreatic Cancer Pathway; Signaling Pathways in Glioblastoma; B Cell Receptor Signaling Pathway; Integrin-mediated Cell Adhesion; IL-4 Signaling Pathway; PDGF Pathway; Cardiac Hypertrophic Response; IL-2 Signaling Pathway; MAPK Signaling Pathway; Leptin signaling pathway; AGE/RAGE pathway; Type II interferon signaling; IL-3 Signaling Pathway; Oncostatin M Signaling Pathway; Alpha 6 Beta 4 signaling pathway.
  21. Summary. We analyzed ∼700,000 somatic mutations from 84 B-cell lymphoma samples. We characterized a set of cis-regulatory elements from ChIP-seq. Cis-regulatory mutations are enriched in promoter regions of genes involved in apoptosis or growth/proliferation. We combined gene expression and mutation data from the coding and non-coding spaces. We highlight candidate regulatory-disrupting variations dysregulating the gene expression program in cancer pathways.
  22. The next generation of TFBS prediction. [Figure, adapted from Kelvin Song's work on Wikimedia Commons; labels: TSS, transcription factors, gene.]
  23. Dinucleotide dependencies. PWMs perform well but cannot model dinucleotide dependencies, which have been shown in: crystal structures of TF-DNA complexes (Luscombe et al., 2001); biochemical studies of specific proteins (Man and Stormo, 2001; Bulyk et al., 2002; Berger et al., 2006); statistical analyses of TRANSFAC and JASPAR TFBSs (Barash, 2003; Tomovic and Oakeley, 2007; Zhou and Liu, 2004); quantitative analysis of Protein Binding Microarray data (Zhao et al., 2012). Results from Zhao, Ruan, Pandey and Stormo, 2012: interactions between neighbouring bases are stronger than others; improvements from considering dinucleotide dependencies are usually slight, with some significant exceptions; their method is not applicable to ChIP-seq data.
  24. Transcription Factor Flexible Models (TFFMs) for modeling TFBSs. 1st-order HMM: a state models the background (bg); each position in the motif is modeled by a state; the transition probabilities for staying in the background (bg/bg) and for switching from background to the motif (bg/fg) are parametrizable; states emit a nucleotide depending on the previous one. [Figure: HMM diagram with a background state (BG) and one state per motif position 1 to n, each with emissions over the 16 dinucleotides AA to TT.] Mathelier and Wasserman, PLOS Comp. Biol., 2014.
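
The statement that states emit a nucleotide depending on the previous one can be illustrated with a much simpler model than a TFFM. The sketch below is an assumption-laden simplification: it handles fixed-length, pre-aligned sites only, estimates position-specific conditional probabilities P(base at i | base at i-1) by counting with a pseudocount, and has no background state and no HMM training. It does not use the GHMM library or the TFFM framework, and all names are hypothetical.

```python
import math

BASES = "ACGT"

def train_first_order(sites, pseudocount=1.0):
    """Estimate a start distribution and per-position conditional base probabilities."""
    width = len(sites[0])
    start_counts = {b: pseudocount for b in BASES}
    for site in sites:
        start_counts[site[0]] += 1
    start_total = sum(start_counts.values())
    start = {b: count / start_total for b, count in start_counts.items()}

    transitions = []  # transitions[i - 1][previous][current] for motif position i
    for i in range(1, width):
        counts = {p: {b: pseudocount for b in BASES} for p in BASES}
        for site in sites:
            counts[site[i - 1]][site[i]] += 1
        probs = {}
        for p in BASES:
            row_total = sum(counts[p].values())
            probs[p] = {b: counts[p][b] / row_total for b in BASES}
        transitions.append(probs)
    return start, transitions

def log_likelihood(model, site):
    """Log-probability of a site, each base conditioned on the one before it."""
    start, transitions = model
    total = math.log(start[site[0]])
    for i, probs in enumerate(transitions, start=1):
        total += math.log(probs[site[i - 1]][site[i]])
    return total

# Toy training set: four 13-bp sequences in the style of the HNF4A examples in this deck.
toy_sites = ["AGTTCAAAGTTCA", "AGTCCAAAGTTCA", "CTTGGAACCGGGG", "GGCAAGGTTCATA"]
toy_model = train_first_order(toy_sites)
print(log_likelihood(toy_model, "AGTTCAAAGTTCA"))
print(log_likelihood(toy_model, "TTTTTTTTTTTTT"))
```

In the framework described on the slide, the corresponding parameters are learned within an HMM that also contains the background state, so this counting scheme should be read only as an illustration of first-order emissions.
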
  25. TFFMs workflow from ChIP-seq data. Sequences: >HNF4A 1 AGTTCAAAGTTCA; >HNF4A 2 AGTCCAAAGTTCA; ...; >HNF4A 73554 CTTGGAACCGGGG; >HNF4A 73555 GGCAAGGTTCATA. The sequences are used to build TFFMs. [Figure: the HMM diagram and the resulting logos over motif positions 1 to 14, y-axis in bits (0 to 2).]
  26. TFFMs are informative. TFFMs allow the analysis of neighbouring dinucleotide dependencies. [Figure: TFFM logos over motif positions 1 to 14, y-axis in bits (0 to 2).] Mathelier and Wasserman, PLOS Comp. Biol., 2014.
  32. TFFMs perform better than weight matrices. [Figure: AUC ratio (axis from 0.70 to 1.00) over 96 ChIP-seq data sets, comparing the 1st-order TFFM and the detailed TFFM with PWM and DWM; group labels: TFFMs, Similar, WMs.]
  33. Allowing for flexible-length motifs: JunD. [Figure: ROC curves (sensitivity vs. specificity, both 0 to 100) for the 1st-order HMM, detailed HMM, PFM, DWM, flexible 1st-order HMM, flexible detailed HMM, and GLAM2; motif logo over positions 1 to 14 with 70% and 30% labels.] AUC by method: 1st-order TFFM 65.41; PWM 64.47; DWM 64.92; flexible 1st-order HMM 71.57. Mathelier and Wasserman, PLOS Comp. Biol., 2014.
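
The comparison above is expressed as sensitivity, specificity, and AUC. As a reminder of what an area under the ROC curve measures, here is a minimal sketch that computes it as the probability that a positive sequence outscores a negative one (ties count one half). The function name and the toy scores are assumptions for illustration; how the talk built its positive and negative sets is not stated on the slide.

```python
def roc_auc(positive_scores, negative_scores):
    """Area under the ROC curve via pairwise comparison of positive and negative scores."""
    wins = 0.0
    for pos in positive_scores:
        for neg in negative_scores:
            if pos > neg:
                wins += 1.0
            elif pos == neg:
                wins += 0.5
    return wins / (len(positive_scores) * len(negative_scores))

# Hypothetical best-hit scores for bound (positive) and background (negative) sequences.
bound = [0.9, 0.4]
background = [0.5, 0.1]
print(roc_auc(bound, background))  # 0.75; the slide reports AUC values as percentages
```

The pairwise form is quadratic in the number of sequences, so a rank-based implementation is preferable for data sets the size of a ChIP-seq experiment.
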
  34. The TFFM framework. The implemented Python framework (http://cisreg.ca/TFFM/doc/) is based on the continuously maintained GHMM library (A. Schliep's group) and allows: constructing TFFMs from a set of ChIP-seq sequences; predicting TFBSs within a set of input sequences using a set of TFFMs; constructing logos associated with a TFFM.
  35. Transcription Factor Flexible Models. Pros: consider dinucleotide composition; can model flexible-length TFBSs; predict TFBSs better than PWMs overall; generate scores consistent with experimentally measured DNA-protein binding affinities; make it easy to compute probabilities of occupancy. Cons: need larger data sets than PWMs to train the underlying HMM; need a threshold on the TFFM score when making predictions; sequence-based only.
  36. DNA shape features. The DNAshape tool allows the prediction of four DNA shape features in a high-throughput manner. The DNA shape features considered are: minor groove width (MGW), roll (Roll), propeller twist (ProT), and helix twist (HelT). T. Zhou et al., Nucl. Acids Res., 2013.
  38. Combining DNA sequence and structure to predict TFBSs. Studies have shown the importance of DNA shape for modeling TFBSs, using data from SELEX-seq, protein-binding microarray, and BunDLE-seq experiments (N. Abe et al., Cell, 2015; T. Zhou et al., PNAS, 2015; M. Levo et al., Genome Res., 2015). Aims of our study: construct computational models from large-scale in vivo data (ChIP-seq) by combining DNA sequence and shape features; show TFBS prediction improvements on in vivo data; analyze whether the improvements brought by DNA shape are TF-family specific.
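  39. Combining DNA sequence and shape features at TFBSs. Example site of n nucleotides (araTha10 Chr1: 27,243,678 - 27,243,702): A G A A G C C A G A A A A G G C A C C C A. The feature vector has 8n+1 features: the PSSM / TFFM hit score, plus the MGW, 2nd order MGW, ProT, 2nd order ProT, Roll, 2nd order Roll, HelT, and 2nd order HelT values along the site. [Figure: per-feature profiles along the site; axis ranges shown: MGW 2.5 Å to 6.2 Å, ProT -16.5° to 0°, Roll -8.6° to 8.6°, HelT 31° to 38°, with second axes from 0 to 0.6, 0.8, 0.4, and 0.7 respectively.]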
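
The 8n+1 layout on the slide above can be made concrete with a short sketch: one hit score followed by n values for each of the four shape features and their four second-order counterparts. The function name, the dictionary layout, and the toy numbers are assumptions; the sketch takes the shape values as given and does not show how they or the second-order values are computed.

```python
SHAPE_FEATURES = ("MGW", "ProT", "Roll", "HelT")

def build_feature_vector(hit_score, first_order, second_order):
    """Concatenate the hit score with n first-order and n second-order values per shape feature."""
    n = len(first_order[SHAPE_FEATURES[0]])
    vector = [hit_score]
    for name in SHAPE_FEATURES:
        for table in (first_order, second_order):
            values = table[name]
            if len(values) != n:
                raise ValueError(f"{name}: expected {n} values, got {len(values)}")
            vector.extend(values)
    return vector

# Hypothetical, already normalized values for a site of n = 3 nucleotides.
n = 3
first = {name: [0.5] * n for name in SHAPE_FEATURES}
second = {name: [0.25] * n for name in SHAPE_FEATURES}
features = build_feature_vector(0.8, first, second)
print(len(features))  # 8 * 3 + 1 = 25
```

A vector of this form can then be passed to any standard classifier that separates bound from unbound sites; the slide does not name the classifier, so none is assumed here.
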
  40. Overview of the ChIP-seq data sets considered. 400 ENCODE ChIP-seq experiments are considered, with ChIP'ed TFs coming from 24 TF families. Number of experiments per family: BetaBetaAlpha-zinc Finger 146; Helix-Loop-Helix 56; Leucine Zipper 55; ETS 21; Stat 13; GATA 12; Rel 10; E2F 10; NFY CCAAT-binding 8; Hormone-nuclear Receptor 8; Forkhead 8; High Mobility Group (Box) 8; Homeodomain 7; MADS 7; RFX 5; TATA-binding 5; NRF 5; IRF 4; Other 4; STAT 4; Arid 2; HSF 1; THAP 1.
  42. Considering DNA shape features improves predictive power. [Figure: panels A to D.] DNA shape features are most beneficial for the E2F and MADS families.
  43. DNA shape features at TFBS flanking sequences further improve discriminative power. [Figure: panels A and B.]
  44. Summary. Our analyses of ChIP-seq data represent the in vivo counterpart of the published in vitro studies. We show that combining DNA sequence and shape features improves the prediction of TFBSs within ChIP-seq peaks. We highlight that TFs from the E2F and MADS families benefit most from incorporating DNA shape information to predict TFBSs. We further improve predictive power when incorporating DNA shape features at sequences flanking TFBSs.
  45. Acknowledgements. Wyeth Wasserman, Allen Zhang, David Arenillas, Rebecca Worsley-Hunt, all the Wasserman lab, Sohrab Shah, Calvin Lefebvre, Jiarui Ding, Boris Lenhard, Ge Tan, Albin Sandelin, Xiaobei Zhao, Sandelin lab, Remo Rohs, Beibei Xin, Tsu-Pei Chiu, Lin Yang, François Parcy, Centre for Molecular Medicine and Therapeutics.
  46. Thank you. [Figure, adapted from Kelvin Song's work on Wikimedia Commons; labels: TSS, transcription factors, gene.]
