STRING - Prediction of functional relations, modules, and networks from heterogeneous genome-scale data

Netherlands Conference on Bioinformatics 2004, Hanzehogeschool, Groningen, Netherlands, October 7-8, 2004

    1. STRING: Prediction of functional relations, modules, and networks from heterogeneous genome-scale data. Lars Juhl Jensen, EMBL Heidelberg
    2. Cross-species integration of diverse data
        • Challenges and promises of large-scale data integration
            • Explosive increase in both the amounts and different types of high-throughput data sets that are being produced
            • These data are highly heterogeneous and lack standardization
            • Most data sets are error-prone and suffer from systematic biases
            • Experiments should be integrated across model organisms
        • STRING is a web resource that integrates and transfers diverse large-scale data across 100+ species, but it is not
            • a primary repository for experimental data
            • a curated database of complexes or pathways
            • a substitute for expert annotation
    3. STRING provides a modular protein network by integrating diverse types of evidence
        • Genomic neighborhood
        • Species co-occurrence
        • Gene fusions
        • Database imports
        • Experimental interaction data
        • Microarray expression data
        • Literature co-mentioning
    4. Inferring functional modules from gene presence/absence patterns
        • [Figure: the “cellulosome”. Labels: cell, cell wall, anchoring proteins, cellulosomes, cellulose, resting protuberances, protracted protuberance. © Trends Microbiol, 1999]
    5. Genomic context methods
        • [Figure © Nature Biotechnology, 2004]
    6. Score calibration against a common reference
        • Many diverse types of evidence
            • The quality of each is judged by very different raw scores
            • Quality differences exist among data sets of the same type
        • Solved by calibrating all scores against a common reference
            • Scores are directly comparable
            • Probabilistic scores allow evidence to be combined
        • Requirements for the reference
            • Must represent a compromise between all the types of evidence
            • Broad species coverage
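The calibration and combination ideas on this slide can be sketched in a few lines. This is only an illustration under stated assumptions, not STRING's actual procedure: it assumes raw scores are binned, that each bin is mapped to the observed fraction of benchmark pairs sharing a reference pathway, and that calibrated scores are combined as if the evidence types were independent. The function names, bin edges, and toy data are all hypothetical.

```python
from bisect import bisect_right


def calibrate(raw_scores, same_pathway, bin_edges):
    """Return, per raw-score bin, the fraction of benchmark pairs that
    share a reference pathway (hypothetical binning scheme)."""
    hits = [0] * (len(bin_edges) + 1)
    totals = [0] * (len(bin_edges) + 1)
    for score, positive in zip(raw_scores, same_pathway):
        b = bisect_right(bin_edges, score)  # index of the bin holding score
        totals[b] += 1
        hits[b] += int(positive)
    # Empty bins are reported as 0.0 rather than guessed.
    return [h / t if t else 0.0 for h, t in zip(hits, totals)]


def calibrated_score(raw, bin_edges, bin_probs):
    """Look up the calibrated probability for one raw score."""
    return bin_probs[bisect_right(bin_edges, raw)]


def combine(probabilities):
    """Combine calibrated scores assuming independent evidence:
    1 - product(1 - p). This independence is an assumption of the sketch."""
    remaining = 1.0
    for p in probabilities:
        remaining *= 1.0 - p
    return 1.0 - remaining


# Toy benchmark: six protein pairs with raw scores and pathway labels.
raw = [0.1, 0.2, 0.6, 0.7, 0.9, 0.95]
same = [False, False, True, False, True, True]
edges = [0.5, 0.8]
probs = calibrate(raw, same, edges)
```

Because every evidence type is mapped onto the same probability scale, two moderate scores from different sources, say 0.5 each, combine into a stronger one via `combine([0.5, 0.5])`.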
    7. Integrating physical interaction screens
        • Make binary representation of complexes
        • Yeast two-hybrid data sets are inherently binary
        • Calculate score from number of (co-)occurrences
        • Calculate score from non-shared partners
        • Calibrate against KEGG maps
        • Infer associations in other species
        • Combine evidence from experiments
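The first steps for complex purification data can be sketched as follows. The slide does not give the scoring formula, so the raw score used here (co-occurrences divided by the number of purifications containing either protein) is a hypothetical stand-in, as are the function names and toy data. Each complex is expanded into all pairwise associations, which is one common way to obtain a binary representation.

```python
from collections import Counter
from itertools import combinations


def complexes_to_pairs(purifications):
    """Expand each purified complex into all protein pairs and count how
    often each pair and each protein occurs across purifications."""
    pair_counts = Counter()
    protein_counts = Counter()
    for members in purifications:
        unique = sorted(set(members))  # sorted so pair keys are canonical
        protein_counts.update(unique)
        pair_counts.update(combinations(unique, 2))
    return pair_counts, protein_counts


def cooccurrence_score(pair, pair_counts, protein_counts):
    """Hypothetical raw score: purifications containing both proteins
    divided by purifications containing either one."""
    a, b = sorted(pair)
    together = pair_counts[(a, b)]
    either = protein_counts[a] + protein_counts[b] - together
    return together / either if either else 0.0


# Toy data: three purifications of hypothetical proteins A to D.
purifications = [["A", "B", "C"], ["A", "B"], ["B", "D"]]
pair_counts, protein_counts = complexes_to_pairs(purifications)
```

A raw score of this kind would then be calibrated against the reference set before being combined with other evidence.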
    8. Mining microarray expression databases
        • Re-normalize arrays by modern method to remove biases
        • Build expression matrix
        • Combine similar arrays by PCA
        • Construct predictor by Gaussian kernel density estimation
        • Calibrate against KEGG maps
        • Infer associations in other species
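One way to read the predictor step is sketched below, under several assumptions that are not stated on the slide: co-expression is measured by Pearson correlation, Gaussian kernel density estimates are fitted separately to the correlations of known related and unrelated pairs, and the two densities are turned into a probability with Bayes' rule. The bandwidth, the prior, and all names are hypothetical, and the normalization and PCA steps are omitted.

```python
import math


def pearson(x, y):
    """Pearson correlation of two equally long expression profiles."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    cov = sum((a - mx) * (b - my) for a, b in zip(x, y))
    vx = sum((a - mx) ** 2 for a in x)
    vy = sum((b - my) ** 2 for b in y)
    if vx == 0 or vy == 0:
        return 0.0  # a constant profile carries no co-expression signal
    return cov / math.sqrt(vx * vy)


def gaussian_kde(samples, bandwidth):
    """Return a density function estimated with Gaussian kernels."""
    norm = 1.0 / (len(samples) * bandwidth * math.sqrt(2.0 * math.pi))

    def density(x):
        return norm * sum(
            math.exp(-0.5 * ((x - s) / bandwidth) ** 2) for s in samples
        )

    return density


def make_predictor(pos_corr, neg_corr, prior, bandwidth=0.1):
    """Build P(related | correlation) from two class-conditional densities.
    The prior and bandwidth are hypothetical tuning choices."""
    f_pos = gaussian_kde(pos_corr, bandwidth)
    f_neg = gaussian_kde(neg_corr, bandwidth)

    def predict(r):
        p = prior * f_pos(r)
        q = (1.0 - prior) * f_neg(r)
        return p / (p + q) if p + q > 0 else prior

    return predict


# Toy training correlations for related and unrelated gene pairs.
predict = make_predictor([0.8, 0.9], [0.0, 0.1], prior=0.5)
```

In this toy setup a correlation near the related pairs gives a probability close to 1, and a correlation halfway between the two groups gives about 0.5.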
    9. Evidence transfer based on “fuzzy orthology”
        • Orthology transfer is tricky
            • Correct assignment of orthology is difficult for distant species
            • Functional equivalence cannot be guaranteed for in-paralogs
        • These problems are addressed by our “fuzzy orthology” scheme
            • Confidence scores for functional equivalence are calculated from all-against-all alignment
            • Evidence is distributed across possible pairs according to confidence scores in the case of many-to-many relationships
        • [Figure labels: source species, target species]
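The idea of distributing evidence across candidate pairs can be illustrated with a small sketch. The slide does not specify how confidence scores are computed or applied, so this version simply assumes that each source protein comes with hypothetical confidence weights for its candidate equivalents in the target species, normalizes those weights to shares, and splits the source score in proportion to the product of the shares.

```python
def normalize(weights):
    """Scale confidence weights so they sum to 1 (empty if all are zero)."""
    total = sum(weights.values())
    return {k: v / total for k, v in weights.items()} if total else {}


def transfer_evidence(score, orthologs_a, orthologs_b):
    """Distribute one source-species association score over all candidate
    target-species pairs.

    orthologs_a / orthologs_b map target proteins to hypothetical
    confidence weights for functional equivalence with the two source
    proteins.
    """
    shares_a = normalize(orthologs_a)
    shares_b = normalize(orthologs_b)
    return {
        (ta, tb): score * wa * wb
        for ta, wa in shares_a.items()
        for tb, wb in shares_b.items()
        if ta != tb  # a protein is not paired with itself
    }


# Toy case: one source protein has two candidate equivalents, the other one.
transferred = transfer_evidence(0.8, {"x1": 3.0, "x2": 1.0}, {"y1": 2.0})
```

With one-to-one orthologs the full score is transferred to a single pair; with many-to-many relationships each candidate pair receives only part of it.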
    10. Multiple evidence types from several species
    11. Predicting and defining metabolic pathways and other functional modules
        • Metabolism overview (image: Molecular Biology of the Cell, 3rd edition)
        • Defined manually: cutting metabolic maps into pathways
        • Pathways labeled in the figure: purine biosynthesis, histidine biosynthesis
        • Defined objectively: standard clustering of genome-scale data
    13. Getting more specific – generally speaking
        • Benchmarking against one common reference allows integration of heterogeneous data
        • The different types of data do not all tell us about the same kind of functional associations
        • It should be possible to assign likely interaction types from supporting evidence types
        • An accurate model of the yeast mitotic cell cycle
        • Approach
            • High confidence set of physical interactions
            • Custom analysis of cell cycle expression data
        • Observations
            • Dynamic assembly of cell cycle complexes
            • Temporal regulation of Cdk specificity
    14. Conclusions
        • Genomic context methods are able to infer the function of many prokaryotic proteins from genome sequences alone
        • New genomic context methods are still being developed
        • Integration of large-scale experimental data allows similar predictions to be made for eukaryotic proteins
        • Successful data integration requires benchmarking and cross-species transfer of information
        • STRING does all this. Try it at http://string.embl.de
    15. Acknowledgments
        • The STRING team
            • Christian von Mering
            • Berend Snel
            • Martijn Huynen
            • Daniel Jaeggi
            • Steffen Schmidt
            • Mathilde Foglierini
            • Peer Bork
        • New genomic context methods
            • Jan Korbel
            • Christian von Mering
            • Peer Bork
        • ArrayProspector web service
            • Julien Lagarde
            • Chris Workman
        • NetView visualization tool
            • Sean Hooper
        • Study of yeast mitochondria
            • Fabiana Perocchi
            • Lars Steinmetz
        • Analysis of yeast cell cycle
            • Ulrik de Lichtenberg
            • Thomas Skøt
            • Anders Fausbøll
            • Søren Brunak
        • Web resources
            • string.embl.de
            • www.bork.embl.de/ArrayProspector
            • www.bork.embl.de/synonyms
    16. Thank you!
