
STRING - Prediction of a functional association network for the yeast mitochondrial system


Technical University of Denmark, Lyngby, Denmark, December 14, 2004

Transcript of "STRING - Prediction of a functional association network for the yeast mitochondrial system"

Slide 1: STRING: Prediction of a functional association network for the yeast mitochondrial system
    Lars Juhl Jensen, EMBL Heidelberg
Slide 2: Overview
  • Prediction of functional associations between proteins
    - What is STRING?
    - Genomic context methods
    - Integration of large-scale experimental data
    - Combination and cross-species transfer of evidence
  • (Coffee break)
  • The yeast mitochondrial system
    - Prediction of mitochondrial proteins
    - A functional association network for mitochondria
    - Mapping and correlating features of mitochondrial proteins
Slide 3: Part 1: Prediction of functional associations between proteins
    Lars Juhl Jensen, EMBL Heidelberg
Slide 4: What is STRING?
  • Genomic neighborhood
  • Species co-occurrence
  • Gene fusions
  • Database imports
  • Experimental interaction data
  • Microarray expression data
  • Literature co-mentioning
Slide 5: Let the data speak for themselves...
  • Classification schemes are obviously difficult to predict if they are not supported by the data; there are no obvious features separating:
    - Presidents vs. non-presidents
    - Actors vs. non-actors
  • Unsupervised methods may discover a more meaningful classification:
    - Holding your pinky to your mouth is a clear sign of evil
    - Wearing a bowtie is a sign of good
    - So is consumption of alcoholic drinks
Slide 6: Inferring functional modules from gene presence/absence patterns
    [Figure: The "Cellulosome" (labels: cell, cell wall, anchoring proteins, cellulosomes, cellulose, resting protuberances, protracted protuberance). © Trends Microbiol, 1999]
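The presence/absence idea can be illustrated with phylogenetic profiles. Each gene is encoded as a bit string over a set of genomes, and genes whose profiles are identical or nearly identical become candidate members of the same module. This is a minimal sketch: the gene names, the profiles, and the Hamming-distance cutoff are hypothetical, and it does not reproduce STRING's actual species co-occurrence score.

```python
from itertools import combinations

def hamming(a: str, b: str) -> int:
    """Number of genomes in which two presence/absence profiles differ."""
    if len(a) != len(b):
        raise ValueError("profiles must cover the same genomes")
    return sum(x != y for x, y in zip(a, b))

def co_occurring_pairs(profiles: dict[str, str], max_distance: int = 0) -> list[tuple[str, str]]:
    """Return gene pairs whose profiles differ in at most max_distance genomes."""
    pairs = []
    for g1, g2 in combinations(sorted(profiles), 2):
        if hamming(profiles[g1], profiles[g2]) <= max_distance:
            pairs.append((g1, g2))
    return pairs

# Hypothetical profiles: '1' = gene present, '0' = absent, one character per genome.
profiles = {
    "geneA": "110101",
    "geneB": "110101",
    "geneC": "110100",
    "geneD": "001011",
}
print(co_occurring_pairs(profiles))                  # [('geneA', 'geneB')]
print(co_occurring_pairs(profiles, max_distance=1))  # also pairs geneC with geneA and geneB
```

Exact profile identity is the strictest criterion. Allowing a small distance tolerates gene loss or annotation errors in a few genomes, but it also admits more false pairs.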
Slide 7: Genomic context methods
    [Figure © Nature Biotechnology, 2004]
Slide 8: Score calibration against a common reference
  • Many diverse types of evidence
    - The quality of each is judged by very different raw scores
    - Quality differences exist among data sets of the same type
  • Solved by calibrating all scores against a common reference
    - Scores are directly comparable
    - Probabilistic scores allow evidence to be combined
  • Requirements for the reference
    - Must represent a compromise of all types of evidence
    - Broad species coverage
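Calibration can be sketched in three steps. First, bin the raw scores of one evidence type. Second, for each bin, take the fraction of gene pairs that the reference marks as positive; here a boolean label stands in for "both genes lie on the same KEGG map". Third, combine the calibrated probabilities of several evidence types. The combination below uses 1 - ∏(1 - p_i), which assumes the evidence types are independent. The bin edges, labels, and function names are hypothetical, and this is not STRING's code.

```python
from bisect import bisect_right

def calibrate(raw_scores, same_pathway, edges):
    """Map raw-score bins to the fraction of reference-positive pairs in each bin.

    raw_scores   -- raw scores for gene pairs (any scale)
    same_pathway -- True if the pair shares a reference pathway map
    edges        -- sorted bin boundaries, giving len(edges) + 1 bins
    """
    hits = [0] * (len(edges) + 1)
    totals = [0] * (len(edges) + 1)
    for score, positive in zip(raw_scores, same_pathway):
        b = bisect_right(edges, score)
        totals[b] += 1
        hits[b] += positive
    # Bins without any reference pairs get probability 0.0 (a simplifying assumption).
    return [h / t if t else 0.0 for h, t in zip(hits, totals)]

def to_probability(score, edges, bin_probs):
    """Look up the calibrated probability for a new raw score."""
    return bin_probs[bisect_right(edges, score)]

def combine(probabilities):
    """Combine calibrated scores assuming independent evidence: 1 - prod(1 - p)."""
    remaining = 1.0
    for p in probabilities:
        remaining *= 1.0 - p
    return 1.0 - remaining

# Hypothetical benchmark: raw scores with known reference labels.
raw = [0.1, 0.2, 0.4, 0.5, 0.8, 0.9]
labels = [False, False, True, False, True, True]
edges = [0.3, 0.7]
probs = calibrate(raw, labels, edges)
print(probs)                               # [0.0, 0.5, 1.0]
print(to_probability(0.45, edges, probs))  # 0.5
print(combine([0.5, 0.6]))                 # 0.8, up to floating-point rounding
```

After calibration, a microarray score and a two-hybrid score carry the same meaning: an estimated probability of sharing a reference pathway. That shared meaning is what makes the combination step legitimate.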
Slide 9: Integrating physical interaction screens
  • Make binary representation of complexes (yeast two-hybrid data sets are inherently binary)
  • Calculate score from number of (co-)occurrences
  • Calculate score from non-shared partners
  • Calibrate against KEGG maps
  • Infer associations in other species
  • Combine evidence from experiments
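The first three steps can be sketched as follows:

1. Expand each purified complex into all pairwise associations. This is the so-called matrix model; the common alternative is a spoke model, which links only the bait to each prey.
2. Count how often each pair co-occurs across complexes.
3. Compare the two proteins' partner sets, so that pairs whose partners are mostly not shared score lower.

The Jaccard-style partner overlap used here is an illustrative assumption, not the score from the slide. The complexes are hypothetical.

```python
from collections import Counter, defaultdict
from itertools import combinations

def matrix_pairs(complexes):
    """Expand each complex (a set of proteins) into all unordered member pairs, counting co-occurrences."""
    counts = Counter()
    for members in complexes:
        for a, b in combinations(sorted(set(members)), 2):
            counts[(a, b)] += 1
    return counts

def partner_sets(pair_counts):
    """Collect each protein's interaction partners from the binary pairs."""
    partners = defaultdict(set)
    for a, b in pair_counts:
        partners[a].add(b)
        partners[b].add(a)
    return partners

def partner_overlap(a, b, partners):
    """Jaccard overlap of the partner sets of a and b, excluding a and b themselves.

    Returns 1.0 when neither protein has any other partner (no non-shared partners).
    """
    pa = partners[a] - {b}
    pb = partners[b] - {a}
    union = pa | pb
    return len(pa & pb) / len(union) if union else 1.0

# Hypothetical purifications.
complexes = [{"A", "B", "C"}, {"A", "B"}, {"C", "D"}]
pairs = matrix_pairs(complexes)
partners = partner_sets(pairs)
print(pairs[("A", "B")])                    # 2
print(partner_overlap("A", "B", partners))  # 1.0
print(partner_overlap("A", "C", partners))  # 0.5
```

Both raw quantities would still need the calibration step before they are comparable with other evidence.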
Slide 10: Mining microarray expression databases
  • Re-normalize arrays by a modern method to remove biases
  • Build expression matrix
  • Combine similar arrays by PCA
  • Construct predictor by Gaussian kernel density estimation
  • Calibrate against KEGG maps
  • Infer associations in other species
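The predictor step can be sketched in three parts:

1. Compute the Pearson correlation between two genes' expression profiles.
2. Estimate the density of that correlation with a Gaussian kernel, separately for reference-positive and reference-negative pairs.
3. Convert the two densities into a probability with Bayes' rule.

The bandwidth, the prior, the toy correlations, and the helper names are assumptions, and the re-normalization and PCA steps are omitted. SciPy's `scipy.stats.gaussian_kde` could replace the hand-written estimator.

```python
import numpy as np

def pearson(x, y):
    """Pearson correlation between two expression profiles."""
    return float(np.corrcoef(x, y)[0, 1])

def make_gaussian_kde(samples, bandwidth):
    """Return a function evaluating a 1-D Gaussian kernel density estimate."""
    samples = np.asarray(samples, dtype=float)
    norm = 1.0 / (len(samples) * bandwidth * np.sqrt(2 * np.pi))

    def density(x):
        z = (x - samples) / bandwidth
        return float(norm * np.exp(-0.5 * z**2).sum())

    return density

def association_probability(r, pos_density, neg_density, prior):
    """Posterior probability of a reference association given correlation r (Bayes' rule)."""
    p = prior * pos_density(r)
    n = (1 - prior) * neg_density(r)
    return p / (p + n) if p + n > 0 else prior

# Hypothetical correlations of reference-positive and reference-negative pairs.
pos_d = make_gaussian_kde([0.7, 0.8, 0.9, 0.85], bandwidth=0.1)
neg_d = make_gaussian_kde([-0.2, 0.0, 0.1, 0.3, 0.2], bandwidth=0.1)
r = pearson([1, 2, 3, 4], [1.1, 2.1, 2.9, 4.2])  # strongly correlated toy profiles
print(association_probability(r, pos_d, neg_d, prior=0.1))
```

Kernel density estimation avoids assuming a parametric shape for the two correlation distributions. The price is a bandwidth choice, which controls how smooth the resulting predictor is.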
Slide 11: Evidence transfer based on "fuzzy orthology"
  • Orthology transfer is tricky
    - Correct assignment of orthology is difficult for distant species
    - Functional equivalence cannot be guaranteed for in-paralogs
  • These problems are addressed by our "fuzzy orthology" scheme
    - Confidence scores for functional equivalence are calculated from all-against-all alignment
    - In the case of many-to-many relationships, evidence is distributed across possible pairs according to confidence scores
    [Figure: transfer of evidence from a source species to a target species]
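The many-to-many case can be sketched as follows:

1. Map each source protein to several target candidates, each with a confidence weight. On the slide these weights come from all-against-all alignment; here they are hypothetical numbers.
2. Normalize the weights per source protein.
3. Split the source pair's score across the target pairs in proportion to the product of the two weights.

The normalization rule and the function name are illustrative assumptions, not the published scheme.

```python
from itertools import product

def transfer_evidence(score, orthologs_a, orthologs_b):
    """Split the score of a source pair (a, b) across candidate target pairs.

    orthologs_a / orthologs_b: dict mapping target protein -> confidence that it
    is functionally equivalent to source protein a (or b). Weights are normalized
    to sum to 1 per source protein (an illustrative assumption).
    """
    def normalize(weights):
        total = sum(weights.values())
        return {k: v / total for k, v in weights.items()} if total else {}

    wa, wb = normalize(orthologs_a), normalize(orthologs_b)
    transferred = {}
    for (ta, pa), (tb, pb) in product(wa.items(), wb.items()):
        if ta != tb:  # skip self-pairs when both source proteins map to the same target
            transferred[(ta, tb)] = score * pa * pb
    return transferred

# Hypothetical case: source protein a has two target candidates (e.g. in-paralogs), b has one.
result = transfer_evidence(0.8, {"X1": 3, "X2": 1}, {"Y": 1})
print({k: round(v, 3) for k, v in result.items()})  # {('X1', 'Y'): 0.6, ('X2', 'Y'): 0.2}
```

Distributing the score, rather than copying it to every candidate pair, keeps a single source observation from being counted several times in the target species.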
Slide 12: Multiple evidence types from several species
Slide 13: Predicting and defining metabolic pathways and other functional modules
    [Figure: metabolism overview (labels: purine biosynthesis, histidine biosynthesis). Image: Molecular Biology of the Cell, 3rd edition]
  • Defined manually: cutting metabolic maps into pathways
  • Defined objectively: standard clustering of genome-scale data
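The slide does not say which "standard clustering" is used. One simple stand-in is single-linkage clustering: keep every link that scores at or above a cutoff and report the connected components, computed here with union-find. The cutoff, gene names, and scores are hypothetical.

```python
def single_linkage_clusters(links, cutoff):
    """Group genes into connected components of links with score >= cutoff.

    links: iterable of (gene1, gene2, score).
    """
    parent = {}

    def find(x):
        parent.setdefault(x, x)
        while parent[x] != x:
            parent[x] = parent[parent[x]]  # path halving keeps the trees shallow
            x = parent[x]
        return x

    def union(a, b):
        ra, rb = find(a), find(b)
        if ra != rb:
            parent[rb] = ra

    for a, b, score in links:
        # Register both genes even if this link falls below the cutoff.
        find(a)
        find(b)
        if score >= cutoff:
            union(a, b)

    clusters = {}
    for gene in parent:
        clusters.setdefault(find(gene), set()).add(gene)
    return sorted((sorted(c) for c in clusters.values()), key=lambda c: c[0])

# Hypothetical scored links.
links = [("g1", "g2", 0.95), ("g2", "g3", 0.9), ("g4", "g5", 0.85), ("g3", "g4", 0.3)]
print(single_linkage_clusters(links, cutoff=0.7))  # [['g1', 'g2', 'g3'], ['g4', 'g5']]
```

Single linkage is easy to reason about but tends to chain modules together as the cutoff drops. Methods such as hierarchical clustering with average linkage, or MCL, are common alternatives.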
Slide 14: Part 2: The yeast mitochondrial system
    Lars Juhl Jensen, EMBL Heidelberg
Slide 15: Yeast mitochondria: why it should work well
  • Because it is metabolism
    - STRING was developed using KEGG pathways as a reference
    - This may have caused STRING to function best on metabolism
  • Because it is yeast
    - By far the best-covered organism in terms of physical interactions
    - Many microarray gene expression studies
    - Literature mining works well due to standardization of gene names
  • Because it is prokaryotic
    - Evolutionarily, mitochondria are of bacterial origin
    - The genomic context methods in STRING are very powerful, but can only provide evidence for proteins with prokaryotic orthologs
Slide 16: Strategy for extracting a functional association network of the mitochondrial system
  • Starting point:
    - Reference set of proteins known to be mitochondrial
    - A large, diverse set of experiments relevant for predicting mitochondrial proteins
    - The global STRING network for yeast
  • Predict mitochondrial candidate genes
    - Use the reference set to train neural networks that predict candidate genes from experimental data
    - Use very high-confidence STRING links to suggest additional candidates based on interactions with reference and candidate genes
  • Extract a network that includes lower-confidence interactions and identify functional modules by clustering
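The step that adds candidates through very high-confidence STRING links can be sketched as a neighbor expansion. Any gene with a link at or above a cutoff to a current member (a reference or candidate gene) is added. The slide does not state whether this was iterated or which cutoff was used, so the `cutoff=0.9` default, the `rounds` parameter, and the gene names are hypothetical.

```python
def expand_candidates(seeds, links, cutoff=0.9, rounds=1):
    """Add genes that have a link scoring >= cutoff to any current member.

    seeds: set of reference and predicted candidate genes.
    links: iterable of (gene1, gene2, score).
    rounds: how many times to repeat the expansion (hypothetical parameter).
    """
    links = list(links)
    members = set(seeds)
    for _ in range(rounds):
        added = set()
        for a, b, score in links:
            if score < cutoff:
                continue
            if a in members and b not in members:
                added.add(b)
            elif b in members and a not in members:
                added.add(a)
        if not added:
            break
        members |= added
    return members

# Hypothetical seeds and links.
links = [("m1", "x1", 0.95), ("x1", "x2", 0.97), ("m2", "x3", 0.6)]
print(sorted(expand_candidates({"m1", "m2"}, links)))            # ['m1', 'm2', 'x1']
print(sorted(expand_candidates({"m1", "m2"}, links, rounds=2)))  # ['m1', 'm2', 'x1', 'x2']
```

Limiting the number of rounds matters. Each round can pull in genes that are only indirectly connected to the reference set, so repeated expansion risks drifting out of the mitochondrial system.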
Slide 17: Predicting mitochondrial proteins
  • Training was done with 5-fold cross-validation
    - Reference set used as positive examples
    - All other genes used as negative examples
  • The top 800 predictions contain more than 90% of known mitochondrial genes
  • Surprisingly good performance of the linear model
    - As good as a neural network (NN) with 250 hidden neurons
    - Better than MitoP2
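The cross-validation setup can be sketched with a least-squares linear model standing in for the slide's predictors:

1. Split the genes into five folds.
2. Fit the model on four folds, with reference genes as positives (1) and all other genes as negatives (0), and score the held-out fold.
3. Pool the held-out scores and check what fraction of known positives rank in the top N.

The toy features, the fold assignment, N, and the function names are placeholders. The neural networks and the real feature set are not reproduced.

```python
import numpy as np

def cross_validated_scores(X, y, k=5, seed=0):
    """Held-out scores from a least-squares linear model under k-fold cross-validation."""
    rng = np.random.default_rng(seed)
    n = len(y)
    folds = np.array_split(rng.permutation(n), k)
    Xb = np.hstack([X, np.ones((n, 1))])  # add an intercept column
    scores = np.empty(n)
    for test_idx in folds:
        train = np.ones(n, dtype=bool)
        train[test_idx] = False
        w, *_ = np.linalg.lstsq(Xb[train], y[train], rcond=None)
        scores[test_idx] = Xb[test_idx] @ w  # score genes the model did not see
    return scores

def recall_at_top(scores, y, top_n):
    """Fraction of positives (y == 1) ranked within the top_n scores."""
    top = np.argsort(-scores)[:top_n]
    return float(y[top].sum() / y.sum())

# Hypothetical data: 200 genes, 40 positives shifted in two toy features.
rng = np.random.default_rng(1)
y = np.zeros(200)
y[:40] = 1
X = rng.normal(size=(200, 2)) + 3 * y[:, None]
scores = cross_validated_scores(X, y)
print(recall_at_top(scores, y, 50))
```

Scoring each gene only from the fold in which it was held out keeps the top-N recall an honest estimate. Known mitochondrial genes never help score themselves.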
Slide 18: [Figure: functional association network of the yeast mitochondrial system, clustered into functional modules. Cluster labels: TOM; TIM; MRPL; MRPS; Ribosome Related; Cytosolic Ribosome; Vacuolar Acidification; Fatty Acid Biosynth.; Secondary; RCCI; RCCII; RCCIII; RCCIV; RCCV; RCC_Asy; HAP Complex; Arg Biosynth.; Leu/Val/Ile Biosynth.; PDH/KGD/GCV; TCA Cycle; Cell Wall & pH Reg.; DNA Repair; Replication/DNA Repair; Glucose sensing and CH remodelling; APC; Fission/Fusion; rRNA Processing; mRNA Processing; tRNA Splicing; TFIIIC Complex; m-AAA Complex; Iron Homeostasis/Chaperone Activity; GARP Complex; Actin; NUP]
Slide 19: [Figure: the same network, highlighting proteobacterial orthologs]
Slide 20: [Figure: the same network, highlighting human disease orthologs]
Slide 22: Composition and interconnectivity of clusters
  • A network of clusters
    - The most probable path between clusters is used as the score
  • Interacting clusters are preferentially within the same compartment
  • Proteobacterial clusters typically localize to the mitochondria
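A "most probable path" score can be computed by treating each link score as an independent probability, so that a path's probability is the product of its link scores. Maximizing that product is the same as finding the shortest path with edge weights -log(p), which Dijkstra's algorithm solves. The cluster names and scores below are hypothetical, and the slide does not specify how the cluster-to-cluster link scores were derived.

```python
import heapq
import math

def most_probable_path(graph, source, target):
    """Return (probability, path) maximizing the product of edge probabilities.

    graph: dict node -> dict neighbor -> probability in (0, 1].
    For an undirected network, list every edge in both directions.
    """
    best = {source: 0.0}  # smallest accumulated -log(probability) found so far
    previous = {}
    heap = [(0.0, source)]
    while heap:
        cost, node = heapq.heappop(heap)
        if node == target:
            path = [node]
            while path[-1] != source:
                path.append(previous[path[-1]])
            return math.exp(-cost), path[::-1]
        if cost > best.get(node, math.inf):
            continue  # stale heap entry
        for neighbor, p in graph.get(node, {}).items():
            new_cost = cost - math.log(p)
            if new_cost < best.get(neighbor, math.inf):
                best[neighbor] = new_cost
                previous[neighbor] = node
                heapq.heappush(heap, (new_cost, neighbor))
    return 0.0, []

# Hypothetical cluster network.
graph = {"A": {"B": 0.9, "C": 0.5}, "B": {"C": 0.8}, "C": {}}
print(most_probable_path(graph, "A", "C"))  # path ['A', 'B', 'C'], probability about 0.72
```

Working in -log space turns products into sums. That keeps all edge weights non-negative, which is the condition Dijkstra's algorithm needs.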
Slide 23: Correlations among gene features
  • Expression data agree with data on NF-specific growth defects
  • Genes with detectable human orthologs are more conserved among yeasts
  • Disease orthologs are often proteobacterial
  • Knockout of disease orthologs causes less severe growth defects
Slide 24: Can human disease genes be predicted?
  • Mitochondrial genes are already enriched in disease genes
  • The previous slide showed that mitochondrial genes of proteobacterial origin are further enriched in disease gene orthologs
  • Disease gene orthologs show milder growth defects than other mitochondrial genes with human orthologs
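Enrichment statements like these are commonly checked with a one-sided hypergeometric test, which is the one-sided form of Fisher's exact test. Below is a minimal pure-Python sketch. The counts in the example are hypothetical placeholders, not the study's numbers.

```python
from math import comb

def enrichment_p_value(k, n, K, N):
    """One-sided P(X >= k) for X ~ Hypergeometric(N, K, n).

    N: total genes; K: genes with the property (e.g. disease orthologs);
    n: genes in the subset (e.g. mitochondrial); k: subset genes with the property.
    """
    upper = min(n, K)
    total = comb(N, n)
    return sum(comb(K, i) * comb(N - K, n - i) for i in range(k, upper + 1)) / total

def fold_enrichment(k, n, K, N):
    """Observed fraction in the subset divided by the background fraction."""
    return (k / n) / (K / N)

# Hypothetical counts: 100 genes, 10 with the property, 20 in the subset, 8 overlapping.
print(fold_enrichment(8, 20, 10, 100))
print(enrichment_p_value(8, 20, 10, 100))
```

Exact integer arithmetic with `math.comb` avoids the underflow problems of multiplying many small probabilities. For genome-scale counts, a library routine such as `scipy.stats.hypergeom.sf` would be the usual choice.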
Slide 25: Getting more specific, generally speaking
  • Benchmarking against one common reference allows integration of heterogeneous data
  • The different types of data do not all tell us about the same kind of functional association
  • It should be possible to assign likely interaction types from the supporting evidence types
  • An accurate model of the yeast mitotic cell cycle
  • Approach
    - High-confidence set of physical interactions
    - Custom analysis of cell cycle expression data
  • Observations
    - Dynamic assembly of cell cycle complexes
    - Temporal regulation of Cdk specificity
    Ulrik de Lichtenberg, Lars Juhl Jensen, Søren Brunak and Peer Bork. Dynamic complex formation during the yeast cell cycle. To appear in Science.
Slide 26: Conclusions
  • Genomic context methods are able to infer the function of many prokaryotic proteins from genome sequences alone
  • New genomic context methods are still being developed
  • Integration of large-scale experimental data allows similar predictions to be made for eukaryotic proteins
  • Successful data integration requires benchmarking and cross-species transfer of information
  • Protein networks are useful for the analysis of large, complex biological systems
Slide 27: Acknowledgments
  • The STRING team
    - Christian von Mering
    - Berend Snel
    - Martijn Huynen
    - Daniel Jaeggi
    - Steffen Schmidt
    - Mathilde Foglierini
    - Peer Bork
  • New genomic context methods
    - Jan Korbel
    - Christian von Mering
    - Peer Bork
  • ArrayProspector web service
    - Julien Lagarde
    - Chris Workman
  • NetView visualization tool
    - Sean Hooper
  • Study of yeast mitochondria
    - Fabiana Perocchi
    - Lars Steinmetz
  • Analysis of yeast cell cycle
    - Ulrik de Lichtenberg
    - Thomas Skøt
    - Anders Fausbøll
    - Søren Brunak
  • Web resources
    - string.embl.de
    - www.bork.embl.de/ArrayProspector
    - www.bork.embl.de/synonyms
Slide 28: Thank you!