J Lichtenberg - Discovery of motif-based regulatory signatures in NextGen Sequencing Experiments
Discovery of motif-based regulatory signatures in NextGen Sequencing Experiments
http://code.google.com/p/nextgen-signatures
GNU General Public License, version 3.0 (GPLv3)
Jens Lichtenberg
Hematopoiesis Section, Genetics and Molecular Biology Branch
National Human Genome Research Institute, National Institutes of Health
Motivation
● Large variety of omics approaches that produce sequencing data
● Common threads in the evaluation process
● Few approaches exist that attempt the large-scale analysis of omics data
● Direct correlation of multiple omics data into actual biological insights
[Figure labels: Methylation Seq, RNA Seq, ChIP Seq, Protein Seq, Histone Seq; Comprehensive Analysis; Systems Biology Insights]
Requirements
● General
  ○ Quantification of sequencing data requires a dynamic pipeline allowing for frequent adjustments
  ○ Close interaction between bench and analysis personnel
● Specific
  ○ Quantitative analysis
  ○ Functional analysis
  ○ Regulatory analysis
  ○ Visualizations
ChIP Seq
[Figure labels: Peak Calling, Methylation Correlation, Functional Analysis, Motif Discovery]

              ERY (Meth.)   MEP (Meth.)
Total         1187          587
Dist. Prom.   210           102
Prox. Prom.   29            21
Downstream    345           207
RefSeq        983           513

● EKLF control in MEP can be found in the first intron (Siatecka and Bieker, Blood, 2011)
● During erythropoiesis EKLF is restricted to hematopoietic organs (Siatecka and Bieker, Blood, 2011)
● Down-regulation of EKLF expression in MEP cells leads to megakaryopoiesis (Siatecka and Bieker, Blood, 2011)
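The region categories in the table above (distal promoter, proximal promoter, downstream) suggest a position-based annotation of each peak relative to a gene. The following is a minimal sketch of such an annotation, not the method used in the framework: the function name `classify_peak`, all window sizes, and the extra labels "gene body" and "intergenic" are hypothetical and do not come from the slides.

```python
def classify_peak(peak_pos, tss, gene_end, strand="+",
                  proximal_bp=1000, distal_bp=10000, downstream_bp=10000):
    """Assign a peak position to a region relative to one gene.

    All window sizes are hypothetical defaults, not values from the slides.
    """
    # Signed offset along the direction of transcription:
    # negative means upstream of the TSS, positive means past the TSS.
    sign = 1 if strand == "+" else -1
    offset = (peak_pos - tss) * sign
    gene_length = (gene_end - tss) * sign

    if -proximal_bp <= offset < 0:
        return "proximal promoter"
    if -distal_bp <= offset < -proximal_bp:
        return "distal promoter"
    if 0 <= offset <= gene_length:
        return "gene body"  # hypothetical label, not in the slide table
    if gene_length < offset <= gene_length + downstream_bp:
        return "downstream"
    return "intergenic"  # hypothetical label, not in the slide table
```

Counting the labels returned for every called peak would yield a per-category tally of the kind shown in the table, under whatever window sizes are actually chosen.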
RNA Seq
[Figure labels: Peak Calling, Functional Analysis, Motif Discovery]
[Venn diagram of MEG, ERY and MEP; counts as extracted: 241, 47, 1308, 3338, 216, 966, 2408]

Functional Analysis
Pathway Name (columns in slide order: ERY, MEP, MEG; MEG, MEP; ERY, MEG; ERY, MEP)
ERK/MAPK Sig.: 1.83E-09, 4.47E-16, 5.01E-10
IGF-1 Sig.: 1.04E-15, 1.25E-10
MolMech. Cancer: 3.72E-10, 1.59E-22, 1.13E-10, 3.72E-10
...
PI3K/AKT Sig.: 3.22E-20, 2.84E-24, 6.24E-18, 1.33E-15

mRNA Differentiation   Increase   Decrease
MEP -> MEG             1238       7323
MEP -> ERY             1198       9307
Comprehensive Approach
Current Status
● Perl Framework
  ○ Commonly used applications and repositories
● Next-Generation Sequencing
  ○ Read Mapping
    ■ UCSC Genomic Data
  ○ Peak Calling/Partitioning
    ■ UCSC Genomic Data
  ○ Transcript Quantification
    ■ UCSC/Ensembl Genomic Data
● Functional Genomics
  ○ Expression Correlation
    ■ BloodExpress Database
  ○ Pathway Analysis
    ■ KEGG/IPA
  ○ Ontology Analysis
    ■ GO/IPA sets
● Regulatory Genomics
  ○ Enumerative motif discovery
    ■ Transfac/Jaspar Database
  ○ Occupancy validation
    ■ Literature specific data
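Enumerative motif discovery, listed above under Regulatory Genomics, generally means scoring every possible word of a fixed length rather than sampling candidates. The sketch below illustrates that general idea in Python (the framework itself is described as Perl). It is not the framework's implementation: the function names, the pseudocount, and the foreground/background frequency ratio used as the score are all assumptions made for illustration.

```python
from collections import Counter
from itertools import product


def count_kmers(sequences, k):
    """Count every k-mer in a list of DNA sequences (overlapping windows)."""
    counts = Counter()
    for seq in sequences:
        seq = seq.upper()
        for i in range(len(seq) - k + 1):
            word = seq[i:i + k]
            # Skip windows containing ambiguous bases such as N.
            if set(word) <= set("ACGT"):
                counts[word] += 1
    return counts


def enumerate_words(k):
    """Yield all 4**k possible DNA words of length k."""
    for letters in product("ACGT", repeat=k):
        yield "".join(letters)


def enrichment(foreground, background, k, pseudocount=1.0):
    """Rank all words by foreground frequency divided by background frequency.

    The ratio score and the pseudocount are illustrative assumptions.
    """
    fg = count_kmers(foreground, k)
    bg = count_kmers(background, k)
    fg_total = sum(fg.values()) + pseudocount * 4 ** k
    bg_total = sum(bg.values()) + pseudocount * 4 ** k
    scores = {}
    for word in enumerate_words(k):
        fg_freq = (fg[word] + pseudocount) / fg_total
        bg_freq = (bg[word] + pseudocount) / bg_total
        scores[word] = fg_freq / bg_freq
    return sorted(scores.items(), key=lambda item: item[1], reverse=True)
```

A call such as `enrichment(peak_sequences, control_sequences, 6)` would return all 4096 hexamers ordered from most to least over-represented in the peak set; the top words could then be compared against known binding-site collections. Because the search space grows as 4**k, this exhaustive approach is practical only for short words.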
Future Issues
Data
● Complete case study for Protein Seq
Implementation
● Complete implementation of all analysis facets
● Transition Perl framework to C++ architecture
● Parallelize software architecture for higher performance/throughput
Support
● Update web-interface and documentation to allow unassisted data analysis
Conclusions and Availability
● A comprehensive approach is possible
● Meaningful results can be extracted using the approach
● Regulatory genomics can be used as a suitable post-processing analysis
● Comprehensive hematopoiesis study is feasible
● http://code.google.com/p/nextgen-signatures (GNU General Public License, version 3.0)
Acknowledgements
NHGRI - GMBB - Hematopoiesis Section
David Bodine and Amber Hogart
NHGRI Intramural Training Program