- 1. Computational approaches to the regulatory genomics of neurogenesis. Dr. Ian Simpson, Centre for Integrative Physiology, University of Edinburgh. Edinburgh Neuroscience Day, March 2010.
- 2. Introduction: animal model of neurogenesis. Anatomy of the Drosophila PNS: sense organs.
- 3. Introduction: animal model of neurogenesis. Development of the Drosophila PNS.
- 4. Gene regulatory networks. GRN for endomesoderm specification in the sea urchin, from Peter and Davidson (2009).
- 5. Scale and complexity: how to study gene regulatory networks?
  - High-throughput gene expression experiments
    - analysing c. 15,000 genes on c. 100 chips (scale)
    - profile, temporal, spatial, cell-type (complex)
  - Predicting transcription factor binding sites (TFBSs)
    - genomic search space (scale)
    - 100s-1000s of PWMs (TFBS profiles) (scale)
    - multiple TFBSs arranged combinatorially (complex)
    - multiple evidence types to integrate: phylogenetic, protein interaction, genome localisation (complex)
    - identifying cis-regulatory modules (complex)
- 7. Example 1: Clustering with re-sampling statistics. Gene expression profiles of cells expressing atonal.
- 8. Example 1: Clustering with re-sampling statistics. An example annotated cluster.

  Cluster membership:

  | Cluster | Size |
  |---------|------|
  | C1      | 13   |
  | C2      | 36   |
  | C3      | 23   |
  | C4      | 16   |
  | C5      | 65   |
  | C6      | 6    |

  Cluster 3: Sensory Organ Development, GO:0007423 (p=6e-6). Gene names: argos, ato, CG6330, CG31464, CG13653, nrm, unc, sca, rho, ImpL3, CG11671, CG7755, CG16815, CG15704, CG32150, knrl, CG32037, Toll-6, phyl, nvy, cato.
- 9. Example 1: Clustering with re-sampling statistics. Consensus clustering, a method to assess the quality of clustering.
  - The basic approach
    - iterate thousands of clustering experiments with sub-samples of the data
    - calculate the average connectivity of any two members (the consensus matrix)
    - derive the robustness of the clusters and their members from the consensus matrix
  - The problem
    - huge parameter space (cluster number, distance metric, sample proportion...)
    - huge number of different algorithms to choose from
    - large dataset, multiple conditions to test
  - The solution
    - break each iteration (an individual clustering experiment) into a single process
    - batch the processes out to nodes on Eddie/ECDF (batch array)
    - collate the results back into consensus matrices and calculate robustness measures
  - R package for consensus clustering: clusterCons, available from CRAN and SourceForge (http://bit.ly/clusterCons)
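The consensus clustering steps on this slide can be sketched in a few lines of Python. This is a minimal illustration, not the clusterCons implementation: the function names, the toy data, and the use of a small k-means routine as the clustering algorithm are all assumptions made for the example (the slides use hclust in R). The robustness measures are taken here to be the mean consensus value between members of a cluster, which is one reading of "derive the robustness ... from the consensus matrix".

```python
import math
import random


def kmeans(points, k, rng, iters=25):
    """Tiny k-means stand-in for the clustering step (assumption: the slides use hclust)."""
    centres = rng.sample(points, k)
    labels = [0] * len(points)
    for _ in range(iters):
        # Assign every point to its nearest centre.
        labels = [min(range(k), key=lambda c: math.dist(p, centres[c])) for p in points]
        # Move each centre to the mean of its members.
        for c in range(k):
            members = [p for p, lab in zip(points, labels) if lab == c]
            if members:
                centres[c] = tuple(sum(col) / len(members) for col in zip(*members))
    return labels


def consensus_matrix(points, k, iterations=200, sample_fraction=0.8, seed=0):
    """Fraction of sub-sampled runs in which items i and j fall in the same cluster."""
    rng = random.Random(seed)
    n = len(points)
    together = [[0] * n for _ in range(n)]  # times i and j shared a cluster
    sampled = [[0] * n for _ in range(n)]   # times i and j were both sub-sampled
    size = min(n, max(k, round(sample_fraction * n)))
    for _ in range(iterations):
        idx = rng.sample(range(n), size)
        labels = kmeans([points[i] for i in idx], k, rng)
        for a, i in enumerate(idx):
            for b, j in enumerate(idx):
                sampled[i][j] += 1
                if labels[a] == labels[b]:
                    together[i][j] += 1
    return [[together[i][j] / sampled[i][j] if sampled[i][j] else 0.0
             for j in range(n)] for i in range(n)]


def cluster_robustness(matrix, members):
    """Mean consensus over all distinct pairs of cluster members."""
    pairs = [(i, j) for i in members for j in members if i < j]
    return sum(matrix[i][j] for i, j in pairs) / len(pairs)


def membership_robustness(matrix, members, item):
    """Mean consensus between one item and the other members of its cluster."""
    others = [j for j in members if j != item]
    return sum(matrix[item][j] for j in others) / len(others)


# Hypothetical toy data: two well-separated groups of three "expression profiles".
points = [(0.0, 0.1), (0.2, 0.0), (0.1, 0.2), (5.0, 5.1), (5.2, 4.9), (4.9, 5.0)]
matrix = consensus_matrix(points, k=2)
print("cluster robustness:", cluster_robustness(matrix, [0, 1, 2]))
print("membership robustness of item 0:", membership_robustness(matrix, [0, 1, 2], 0))
```

Each pass through the loop in `consensus_matrix` is independent of the others, which is what makes it possible to run every iteration as a separate process and collate the counts afterwards.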
- 13. Example 1: Clustering with re-sampling statistics. Heatmap of the consensus matrix.
- 14. Example 1: Clustering with re-sampling statistics. Gene prioritisation by consensus clustering. Re-sampling using hclust, it=1000, rf=80%.

  Cluster robustness:

  | cluster | rob       |
  |---------|-----------|
  | 1       | 0.4731433 |
  | 2       | 0.7704514 |
  | 3       | 0.7295124 |
  | 4       | 0.7196309 |
  | 5       | 0.7033960 |
  | 6       | 0.6786388 |

  Membership robustness, cluster 3:

  | affy_id      | mem  |
  |--------------|------|
  | 1639896_at   | 0.68 |
  | 1641578_at   | 0.56 |
  | 1640363_a_at | 0.54 |
  | 1623314_at   | 0.53 |
  | 1636998_at   | 0.49 |
  | 1637035_at   | 0.36 |
  | 1631443_at   | 0.35 |
  | 1639062_at   | 0.31 |
  | 1623977_at   | 0.31 |
  | 1627520_at   | 0.3  |
  | 1637824_at   | 0.28 |
  | 1632882_at   | 0.27 |
  | 1624262_at   | 0.26 |
  | 1640868_at   | 0.26 |
  | 1631872_at   | 0.26 |
  | 1637057_at   | 0.24 |
  | 1625275_at   | 0.24 |
  | 1624790_at   | 0.22 |
  | 1635227_at   | 0.08 |
  | 1623462_at   | 0.07 |
  | 1635462_at   | 0.03 |
  | 1628430_at   | 0.03 |
  | 1626059_at   | 0.02 |

  There are 8 out of 23 genes with <25% conservation in the cluster.
- 15. Example 2: TFBS and CRM detection on the genomic scale. An example of intersecting a state list with a developmental module (figure legend: normal, high, low, off).
- 16. Example 2: TFBS and CRM detection on the genomic scale. cis-regulatory module detection by HMM, after Wu and Xie, JCB 2008.
- 17. Example 2: TFBS and CRM detection on the genomic scale. TFBS binding probability calculation with a Bayesian integration framework. Multiple prior data sources are combined in a probabilistic model to predict the probability of TF binding: PWMs, ChIP-chip, ChIP-seq, DamID, conservation, nucleosome positioning, regulatory potential... After Lahdesmaki et al., PLoS ONE, 2008.
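To make the idea of combining several evidence sources into one binding probability concrete, here is a deliberately simple sketch. It is not the framework of Lahdesmaki et al.: it assumes a naive Bayes model in which each source contributes an independent likelihood ratio, and the function name, the prior, and the example ratios are all hypothetical values chosen for illustration.

```python
import math


def posterior_binding_probability(prior, likelihood_ratios):
    """Combine a prior probability of binding with per-source likelihood ratios.

    Each ratio is P(evidence | bound) / P(evidence | not bound). Assumption: the
    sources are conditionally independent, so their log ratios simply add.
    """
    if not 0.0 < prior < 1.0:
        raise ValueError("prior must lie strictly between 0 and 1")
    log_odds = math.log(prior / (1.0 - prior))
    for ratio in likelihood_ratios.values():
        log_odds += math.log(ratio)
    # Convert the accumulated log odds back to a probability.
    return 1.0 / (1.0 + math.exp(-log_odds))


# Hypothetical evidence for one candidate site (all numbers invented for the example).
evidence = {"pwm_match": 20.0, "conservation": 3.0, "chip_signal": 10.0}
print(posterior_binding_probability(prior=0.01, likelihood_ratios=evidence))
```

Working in log odds keeps the combination additive, so a new data source can be included by adding one more entry to the dictionary.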
- 18. Summary
  - Benefits of ECDF use for biological data analysis
    - Easy to use (honestly)
    - Can execute jobs in familiar languages: C, C++, Perl/BioPerl, R, Matlab...
    - Most common bioinformatic problems are similar analyses performed many times -> batch arrays
    - Often minimal re-coding needed
    - Frees up workstations and local nodes, allows wider exploration of parameter space
    - Allows genome-scale screening with multiple data sources
  - Current limitations of ECDF use for biological data analysis
    - Few computational biology algorithms are written for parallel processing
    - Loading large datasets can be problematic (memory limits)
    - Not generally accessible to the 'general user' (although biological applications using GRID technologies are appearing)
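The "similar analyses performed many times -> batch arrays" pattern can be sketched as a mapping from an array task index to one parameter combination. The grid values, the names, and the 1-based indexing below are assumptions for illustration; the slides do not say how the index reaches a job on Eddie/ECDF, so it is passed in as a plain argument here.

```python
import itertools

# Hypothetical parameter grid; the values are invented for the example.
CLUSTER_NUMBERS = [4, 6, 8]
DISTANCE_METRICS = ["euclidean", "manhattan"]
SAMPLE_FRACTIONS = [0.7, 0.8, 0.9]

# Every combination of the three parameters, in a fixed order.
GRID = list(itertools.product(CLUSTER_NUMBERS, DISTANCE_METRICS, SAMPLE_FRACTIONS))


def parameters_for_task(task_id):
    """Map a 1-based array task index to one parameter combination."""
    if not 1 <= task_id <= len(GRID):
        raise ValueError(f"task_id must be between 1 and {len(GRID)}")
    k, metric, fraction = GRID[task_id - 1]
    return {"k": k, "metric": metric, "sample_fraction": fraction}


# Each array task would run the same script with a different index.
for task_id in (1, len(GRID)):
    print(task_id, parameters_for_task(task_id))
```

Because every task derives its parameters from the index alone, the same unmodified script can be submitted once as an array, which matches the slide's point that little re-coding is usually needed.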
- 20. Acknowledgements. University of Edinburgh, Centre for Integrative Physiology: Andrew Jarman, Douglas Armstrong, Ian Simpson, Petra zur Lage, Lynn Powell, Sebastian Cachero, Lina Ma, Fay Newton, Guiseppe Gallone, Daniel Moore, Sadie Kemp.
