Computational approaches to the regulatory genomics of neurogenesis
 

Short talk given at Edinburgh Neuroscience Day, held at the Royal College of Physicians, Edinburgh, on 29th March 2010.

    Presentation Transcript

    • Computational approaches to the regulatory genomics of neurogenesis. Dr. Ian Simpson, Centre for Integrative Physiology, University of Edinburgh. Edinburgh Neuroscience Day, March 2010. 1 / 20
    • Introduction animal model of neurogenesis Anatomy of the Drosophila PNS - Sense organs 2 / 20
    • Introduction animal model of neurogenesis Development of the Drosophila PNS 3 / 20
    • main gene regulatory networks GRN for endomesoderm specification in the Sea Urchin from Peter and Davidson (2009) 4 / 20
    • main scale and complexity How to study gene regulatory networks? 5 / 20
      • High throughput gene expression experiments
        • analysing c.15,000 genes on c.100 chips (scale)
        • profile, temporal, spatial, cell-type (complex)
      • Predicting transcription factor binding sites (TFBSs)
        • genomic search space (scale)
        • 100s-1000s of PWMs (TFBS profiles) (scale)
        • multiple TFBSs arranged combinatorially (complex)
        • multiple evidence types to integrate: phylogenetic, protein interaction, genome localisation (complex)
        • identifying cis-regulatory modules (complex)
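As an illustration of what scanning a genomic search space with a PWM involves, here is a minimal Python sketch. It is not the speaker's pipeline: the 4-position matrix, the uniform background, the threshold and the function names `log_odds` and `scan` are all hypothetical, chosen only to show the sliding-window log-odds score that PWM-based TFBS prediction typically relies on.

```python
import math

# Hypothetical 4-position PWM (base probabilities per position); not from the talk.
PWM = [
    {"A": 0.7, "C": 0.1, "G": 0.1, "T": 0.1},
    {"A": 0.1, "C": 0.7, "G": 0.1, "T": 0.1},
    {"A": 0.1, "C": 0.1, "G": 0.7, "T": 0.1},
    {"A": 0.1, "C": 0.1, "G": 0.1, "T": 0.7},
]
BACKGROUND = 0.25  # assumed uniform base composition


def log_odds(window, pwm=PWM, background=BACKGROUND):
    """Sum of per-position log2(probability / background) for one window."""
    return sum(math.log2(col[base] / background) for col, base in zip(pwm, window))


def scan(sequence, pwm=PWM, threshold=0.0):
    """Slide the PWM along the sequence and keep windows scoring above threshold."""
    width = len(pwm)
    hits = []
    for start in range(len(sequence) - width + 1):
        score = log_odds(sequence[start:start + width], pwm)
        if score > threshold:
            hits.append((start, score))
    return hits


if __name__ == "__main__":
    for start, score in scan("TTACGTTT"):
        print(start, round(score, 2))
```

Each window is scored independently, so the cost grows with sequence length times the number of PWMs, which is where the "scale" labels on the slide come from.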
    • main example 1 : Clustering with re-sampling statistics Gene expression profiles of cells expressing atonal 7 / 20
    • main example 1 : Clustering with re-sampling statistics An example annotated cluster 8 / 20
      • cluster membership
        Cluster  Size
        C1       13
        C2       36
        C3       23
        C4       16
        C5       65
        C6       6
      • cluster 3: Sensory Organ Development GO:0007423 (p=6e-6)
      • Gene name: argos, ato, CG6330, CG31464, CG13653, nrm, unc, sca, rho, ImpL3, CG11671, CG7755, CG16815, CG15704, CG32150, knrl, CG32037, Toll-6, phyl, nvy, cato
    • main example 1 : Clustering with re-sampling statistics Consensus clustering, a method to assess the quality of clustering 9 / 20
      • The basic approach
        • iterate thousands of clustering experiments with sub-samples of the data
        • calculate the average connectivity of any two members (the consensus matrix)
        • derive the robustness of the clusters and their members from the consensus matrix
      • The problem
        • huge parameter space (cluster number, distance metric, sample proportion...)
        • huge number of different algorithms to choose from
        • large dataset, multiple conditions to test
      • The solution
        • Break each iteration (individual clustering experiment) into a single process
        • Batch the processes out to nodes on Eddie/ECDF (batch array)
        • Collate back into consensus matrices and calculate robustness measures
      • R-package for consensus clustering: clusterCons, available from CRAN and sourceforge (http://bit.ly/clusterCons)
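The basic approach on this slide can be sketched in a few lines of Python. This is not the clusterCons implementation: the toy single-linkage routine stands in for whichever clustering algorithm is being assessed, and the function names, the default of 200 iterations and the fixed seed are assumptions made for the example. The consensus value for a pair is taken here as the fraction of sub-samples containing both items in which they land in the same cluster.

```python
import itertools
import random


def single_linkage(points, k):
    """Toy agglomerative clustering: merge the two closest clusters until k remain."""
    clusters = [[i] for i in range(len(points))]

    def dist(a, b):
        # Smallest squared Euclidean distance between any two members.
        return min(
            sum((x - y) ** 2 for x, y in zip(points[i], points[j]))
            for i in a for j in b
        )

    while len(clusters) > k:
        a, b = min(
            itertools.combinations(range(len(clusters)), 2),
            key=lambda ab: dist(clusters[ab[0]], clusters[ab[1]]),
        )
        clusters[a].extend(clusters.pop(b))  # a < b, so index a stays valid
    return clusters


def consensus_matrix(points, k, iterations=200, fraction=0.8, seed=0):
    """Fraction of sub-samples in which each pair of items is co-clustered."""
    rng = random.Random(seed)
    n = len(points)
    together = [[0] * n for _ in range(n)]
    sampled = [[0] * n for _ in range(n)]
    size = max(k, int(round(fraction * n)))
    for _ in range(iterations):
        idx = sorted(rng.sample(range(n), size))
        sub = [points[i] for i in idx]
        for cluster in single_linkage(sub, k):
            members = [idx[m] for m in cluster]  # map back to original indices
            for i in members:
                for j in members:
                    together[i][j] += 1
        for i in idx:
            for j in idx:
                sampled[i][j] += 1
    return [
        [together[i][j] / sampled[i][j] if sampled[i][j] else 0.0 for j in range(n)]
        for i in range(n)
    ]
```

Because every pass through the loop is independent of the others, each one can run as its own process, which is what makes the batch-array approach on the slide possible.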
    • main example 1 : Clustering with re-sampling statistics Heatmap of the consensus matrix 13 / 20
    • main example 1 : Clustering with re-sampling statistics Gene prioritisation by consensus clustering 14 / 20
      • Re-sampling using hclust, it=1000, rf=80%
      • cluster robustness
        cluster  rob
        1        0.4731433
        2        0.7704514
        3        0.7295124
        4        0.7196309
        5        0.7033960
        6        0.6786388
      • membership robustness, cluster3
        affy_id       mem
        1639896_at    0.68
        1641578_at    0.56
        1640363_a_at  0.54
        1623314_at    0.53
        1636998_at    0.49
        1637035_at    0.36
        1631443_at    0.35
        1639062_at    0.31
        1623977_at    0.31
        1627520_at    0.3
        1637824_at    0.28
        1632882_at    0.27
        1624262_at    0.26
        1640868_at    0.26
        1631872_at    0.26
        1637057_at    0.24
        1625275_at    0.24
        1624790_at    0.22
        1635227_at    0.08
        1623462_at    0.07
        1635462_at    0.03
        1628430_at    0.03
        1626059_at    0.02
      • there are 8 out of 23 genes with <25% conservation in the cluster
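The talk does not spell out how the robustness values are computed. The Python sketch below assumes one common definition: membership robustness is the mean consensus between an item and the other members of its cluster, and cluster robustness is the mean consensus over all pairs in the cluster. The function names, the toy matrix and the 0.25 cutoff (mirroring the "<25%" remark on the slide) are illustrative only.

```python
def membership_robustness(consensus, cluster, item):
    """Mean consensus between one item and the other members of its cluster."""
    others = [j for j in cluster if j != item]
    if not others:
        return 1.0  # assumption: a singleton is treated as trivially consistent
    return sum(consensus[item][j] for j in others) / len(others)


def cluster_robustness(consensus, cluster):
    """Mean consensus over all distinct pairs of cluster members."""
    pairs = [(i, j) for i in cluster for j in cluster if i < j]
    if not pairs:
        return 1.0  # same singleton assumption as above
    return sum(consensus[i][j] for i, j in pairs) / len(pairs)


def low_confidence_members(consensus, cluster, cutoff=0.25):
    """Members whose membership robustness falls below the cutoff."""
    return [
        item for item in cluster
        if membership_robustness(consensus, cluster, item) < cutoff
    ]


if __name__ == "__main__":
    # Toy symmetric consensus matrix for four items (made-up values).
    toy = [
        [1.0, 0.8, 0.6, 0.1],
        [0.8, 1.0, 0.7, 0.2],
        [0.6, 0.7, 1.0, 0.0],
        [0.1, 0.2, 0.0, 1.0],
    ]
    print(cluster_robustness(toy, [0, 1, 2, 3]))
    print(low_confidence_members(toy, [0, 1, 2, 3]))
```

Ranking members by this score is one way to prioritise genes within a cluster: items that rarely co-cluster with the rest drop to the bottom of the list.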
    • main example 2 : TFBS and CRM detection on the genomic scale An example of intersecting a state list with developmental module (figure legend: normal, high, low, off) 15 / 20
    • main example 2 : TFBS and CRM detection on the genomic scale cis-regulatory module detection by HMM after Wu and Xie, JCB 2008 16 / 20
    • main example 2 : TFBS and CRM detection on the genomic scale TFBS binding probability calculation with a Bayesian integration framework 17 / 20
      • Multiple prior data sources are combined in a probabilistic model to predict the probability of TF binding
      • PWMs, ChIP-chip, ChIP-seq, DamID, conservation, nucleosome positioning, regulatory potential...
      • after Lahdesmaki et al. PLoS ONE, 2008
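To give a feel for how several evidence types can be folded into one binding probability, here is a deliberately simplified Python sketch. It is a naive-Bayes odds update, not the model of Lahdesmaki et al.: it assumes the data sources are conditionally independent and that each can be summarised as a likelihood ratio. The function name and all numbers are hypothetical.

```python
def posterior_binding(prior, likelihood_ratios):
    """Combine a prior binding probability with independent evidence.

    Each likelihood ratio is P(evidence | bound) / P(evidence | not bound)
    for one data source (for example a PWM score or a conservation score).
    Independence between sources is an assumption of this sketch.
    """
    if not 0.0 < prior < 1.0:
        raise ValueError("prior must be strictly between 0 and 1")
    odds = prior / (1.0 - prior)
    for ratio in likelihood_ratios:
        odds *= ratio
    return odds / (1.0 + odds)


if __name__ == "__main__":
    # Hypothetical values: a 1% prior and three supportive data sources.
    print(posterior_binding(0.01, [10.0, 5.0, 2.0]))
```

With these made-up inputs the prior odds of 1/99 are multiplied by 100, giving a posterior of 100/199, roughly 0.50. No single source would move a 1% prior that far on its own, which is the motivation for integrating them.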
    • summary Summary 18 / 20
      • Benefits of ECDF use for biological data analysis
        • Easy to use (honestly)
        • Can execute jobs in familiar languages: C, C++, Perl/BioPerl, R, Matlab...
        • Most common bioinformatic problems are similar analyses performed many times -> batch arrays
        • Often minimal re-coding needed
        • Free up workstations and local nodes, allow wider exploration of parameter space
        • Allow genome scale screening with multiple data sources
      • Current limitations of ECDF use for biological data analysis
        • Few computational biology algorithms are written for parallel processing
        • Loading large datasets can be problematic (memory limits)
        • Not generally accessible to the ’general user’ (although biological applications using GRID technologies are appearing)
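The "similar analyses performed many times -> batch arrays" point can be illustrated with a small Python sketch in which each array task looks up its own parameter combination from its task number. The parameter grid and the function name `task_parameters` are hypothetical. Reading the task number from the `SGE_TASK_ID` environment variable assumes a Grid Engine style scheduler; the talk does not name the scheduler used on Eddie/ECDF.

```python
import itertools
import os

# Hypothetical parameter grid; the axes echo those named in the talk
# (cluster number, distance metric, sample proportion).
GRID = {
    "k": [4, 6, 8],
    "metric": ["euclidean", "pearson"],
    "fraction": [0.7, 0.8],
}


def task_parameters(task_id, grid=GRID):
    """Map a 1-based array task number to one parameter combination."""
    names = sorted(grid)
    combos = list(itertools.product(*(grid[name] for name in names)))
    if not 1 <= task_id <= len(combos):
        raise IndexError(f"task_id must be between 1 and {len(combos)}")
    return dict(zip(names, combos[task_id - 1]))


if __name__ == "__main__":
    # Assumption: the scheduler exports the task number as SGE_TASK_ID.
    raw = os.environ.get("SGE_TASK_ID", "1")
    task_id = int(raw) if raw.isdigit() else 1
    print(task_parameters(task_id))
```

The same script is submitted once as an array of 12 tasks, and each task runs a single clustering experiment with its own settings, so little re-coding is needed beyond the lookup.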
    • Acknowledgements 20 / 20
      • University of Edinburgh, Centre for Integrative Physiology
      • Andrew Jarman, Douglas Armstrong, Ian Simpson, Petra zur Lage, Lynn Powell, Sebastian Cachero, Lina Ma, Fay Newton, Guiseppe Gallone, Daniel Moore, Sadie Kemp