STRING Prediction of protein networks through integration of diverse large-scale data sets Lars Juhl Jensen EMBL Heidelberg
Genomes to Systems, Manchester International Convention Center, Manchester, England, September 1-3, 2004

  • Transcript of "STRING - Prediction of protein networks through integration of diverse large-scale data sets"

    1. STRING - Prediction of protein networks through integration of diverse large-scale data sets
       Lars Juhl Jensen, EMBL Heidelberg
    2. Genomes to systems – how do we get there?
         • Challenges and promises of large-scale data integration
             • Explosive increase in both the amounts and types of genomics scale data sets being produced
             • These data are highly heterogeneous and lack standardization
             • Most data sets are error-prone and suffer from systematic biases
         • STRING is a web resource that integrates and transfers diverse types of large-scale data across 100+ species
         • We do not intend STRING to be
             • a primary repository for experimental data
             • a curated database of complexes or pathways
             • a substitute for expert annotation
    3. A modular network of functional associations
         • Genomic neighborhood
         • Species co-occurrence
         • Gene fusions
         • Database imports
         • Exp. interaction data
         • Microarray expression data
         • Literature co-mentioning
    4. Inferring functional modules from gene presence/absence patterns
       [Figure: The “Cellulosome”. Labels: cell, cell wall, anchoring proteins, cellulosomes, cellulose, resting protuberances, protracted protuberance. © Trends Microbiol, 1999]
    5. Formalizing the phylogenetic profile method
         • Align all proteins against all
         • Calculate best-hit profile
         • Join similar species by PCA
         • Calculate PC profile distances
         • Calibrate against KEGG maps
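The profile comparison at the heart of this slide can be illustrated with a minimal sketch. It assumes binary presence/absence profiles and a plain Hamming distance, and it skips the PCA step that the slide uses to join similar species. The gene names and profiles are hypothetical.

```python
from itertools import combinations

# Hypothetical presence (1) / absence (0) of each gene in five species.
profiles = {
    "geneA": (1, 1, 0, 1, 0),
    "geneB": (1, 1, 0, 1, 0),
    "geneC": (0, 1, 1, 0, 1),
}


def hamming(p, q):
    """Number of species in which exactly one of the two genes is present."""
    return sum(a != b for a, b in zip(p, q))


def profile_distances(profiles):
    """Distance between the profiles of every pair of genes."""
    return {
        (g1, g2): hamming(profiles[g1], profiles[g2])
        for g1, g2 in combinations(sorted(profiles), 2)
    }


distances = profile_distances(profiles)
for pair, d in sorted(distances.items(), key=lambda item: item[1]):
    print(pair, d)
```

Pairs with a small distance (here geneA and geneB) share a presence/absence pattern and are candidates for a functional association; the raw distance would still need calibration before it can be compared with other evidence.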
    6. Score calibration against a common reference
         • Different pieces of evidence are not directly comparable
             • A different raw quality score is used for each evidence type
             • Quality differences exist among data sets of the same type
         • Solved by calibrating all scores against a common reference
         • Requirements for the reference
             • Must represent a compromise of all the types of evidence
             • Broad species coverage
             • Our chosen reference is KEGG metabolic maps
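One way to picture the calibration idea is a lookup table: bin the raw scores and record, for each bin, the fraction of benchmark pairs that fall on the same reference map. The sketch below assumes this simple binning; the slides do not say how the calibration curve is actually fitted, and the benchmark data and bin edge are hypothetical.

```python
from bisect import bisect_right


def calibration_table(benchmark, bin_edges):
    """Fraction of benchmark pairs on the same reference map, per raw-score bin.

    benchmark: iterable of (raw_score, same_map) with same_map a bool.
    bin_edges: ascending edges; a score equal to an edge goes to the upper bin.
    """
    n_bins = len(bin_edges) + 1
    hits = [0] * n_bins
    totals = [0] * n_bins
    for raw, same_map in benchmark:
        b = bisect_right(bin_edges, raw)
        totals[b] += 1
        hits[b] += int(same_map)
    # Empty bins have no estimate.
    return [h / t if t else None for h, t in zip(hits, totals)]


def calibrated_score(raw, bin_edges, table):
    """Look up the calibrated score for a raw score."""
    return table[bisect_right(bin_edges, raw)]


# Hypothetical benchmark: raw scores and whether the pair shares a map.
benchmark = [
    (0.1, False), (0.2, False), (0.3, True), (0.4, False),
    (0.6, True), (0.7, False), (0.8, True), (0.9, True),
]
edges = [0.5]
table = calibration_table(benchmark, edges)
print(table, calibrated_score(0.65, edges, table))
```

Because every evidence type is mapped onto the same scale (probability of sharing a reference map), scores from different sources become directly comparable.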
    7. Inferring functional associations from evolutionarily conserved operons
         • Identify runs of adjacent genes with the same direction
         • Score each gene pair based on intergenic distances
         • Calibrate against KEGG maps
         • Infer associations in other species
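The first two steps on this slide can be sketched as follows. The gene coordinates are hypothetical, and using the summed intergenic gap as the raw pair score is an assumption made for illustration, not the scoring function from the talk.

```python
from itertools import combinations, groupby

# Hypothetical genes: (name, start, end, strand).
genes = [
    ("a", 100, 400, "+"),
    ("b", 420, 900, "+"),
    ("c", 950, 1500, "+"),
    ("d", 1600, 2000, "-"),
    ("e", 2010, 2500, "-"),
]


def same_strand_runs(genes):
    """Split genes, ordered by start, into runs of adjacent same-strand genes."""
    ordered = sorted(genes, key=lambda g: g[1])
    return [list(run) for _, run in groupby(ordered, key=lambda g: g[3])]


def pair_distances(run):
    """Sum of intergenic gaps separating each pair of genes within one run."""
    out = {}
    for i, j in combinations(range(len(run)), 2):
        # Overlapping neighbours contribute a gap of zero.
        gaps = [max(0, run[k + 1][1] - run[k][2]) for k in range(i, j)]
        out[(run[i][0], run[j][0])] = sum(gaps)
    return out


runs = same_strand_runs(genes)
for run in runs:
    print([g[0] for g in run], pair_distances(run))
```

Smaller distances suggest that two genes are more likely to be co-transcribed; as with the other evidence types, the raw value would then be calibrated against the reference.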
    8. Evidence transfer based on “fuzzy orthology”
         • Orthology transfer is tricky
             • Correct assignment of orthology is difficult for distant species
             • Functional equivalence cannot be guaranteed for in-paralogs
         • These problems are addressed by our “fuzzy orthology” scheme
             • Confidence scores for functional equivalence are calculated from all-against-all alignment
             • Evidence is distributed across possible pairs according to confidence scores in the case of many-to-many relationships
       [Figure labels: ?, Source species, Target species]
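The idea of distributing evidence across many-to-many orthology relationships can be sketched as below. Multiplying the source score by the two confidence scores is an assumption chosen for simplicity; the slides do not give the actual weighting, and all names and numbers are hypothetical.

```python
def transfer_evidence(score, orthologs_a, orthologs_b):
    """Distribute one source-species score over candidate target-species pairs.

    orthologs_a / orthologs_b map each candidate ortholog of the two source
    proteins to a confidence (0..1) that it is functionally equivalent.
    """
    return {
        (a, b): score * conf_a * conf_b
        for a, conf_a in orthologs_a.items()
        for b, conf_b in orthologs_b.items()
    }


# Hypothetical case: protein A has one clear ortholog, protein B has two
# in-paralogs in the target species with unequal confidence.
transferred = transfer_evidence(
    0.8,
    {"a1": 1.0},
    {"b1": 0.75, "b2": 0.25},
)
for pair, s in transferred.items():
    print(pair, round(s, 3))
```

The more confident assignment (a1, b1) receives most of the evidence, while the less likely pair still receives a small share instead of being discarded.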
    9. Integrating physical interaction screens
         • Make binary representation of complexes
         • Yeast two-hybrid data sets are inherently binary
         • Calculate score from number of (co-)occurrences
         • Calculate score from non-shared partners
         • Calibrate against KEGG maps
         • Infer associations in other species
         • Combine evidence from experiments
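As a rough illustration of the complex-based branch, the sketch below expands each purified complex into all pairs of its members and scores a pair by how often the two proteins occur together relative to how often either occurs at all. This Jaccard-style ratio is an assumption, not the score used in the talk, and the purification data are hypothetical.

```python
from collections import Counter
from itertools import combinations

# Hypothetical purifications: each set is one purified complex.
purifications = [
    {"P1", "P2", "P3"},
    {"P1", "P2"},
    {"P2", "P4"},
]


def cooccurrence_scores(purifications):
    """Score each protein pair by co-occurrences / occurrences of either."""
    single = Counter()
    pair = Counter()
    for members in purifications:
        single.update(members)
        # Binary representation: every pair within a complex is counted once.
        pair.update(combinations(sorted(members), 2))
    return {
        (a, b): n / (single[a] + single[b] - n)
        for (a, b), n in pair.items()
    }


scores = cooccurrence_scores(purifications)
for p, s in sorted(scores.items(), key=lambda item: -item[1]):
    print(p, round(s, 3))
```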
    10. Mining microarray expression databases
         • Re-normalize arrays by modern method to remove biases
         • Build expression matrix
         • Combine similar arrays by PCA
         • Construct predictor by Gaussian kernel density estimation
         • Calibrate against KEGG maps
         • Infer associations in other species
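The kernel density step can be illustrated with a one-dimensional sketch. It assumes the predictor works on a single co-expression value (for example a correlation coefficient) and compares Gaussian kernel density estimates for known associated and non-associated pairs; the bandwidth and training values are hypothetical, and the actual predictor from the talk may use different inputs.

```python
import math


def gaussian_kde(samples, bandwidth):
    """Return a density function estimated from samples with Gaussian kernels."""
    norm = 1.0 / (len(samples) * bandwidth * math.sqrt(2.0 * math.pi))

    def density(x):
        return norm * sum(
            math.exp(-0.5 * ((x - s) / bandwidth) ** 2) for s in samples
        )

    return density


def make_predictor(positives, negatives, bandwidth=0.1):
    """Estimate P(associated | value) from the two class densities."""
    f_pos = gaussian_kde(positives, bandwidth)
    f_neg = gaussian_kde(negatives, bandwidth)

    def probability(x):
        # Weight each density by its class size (the empirical prior).
        p = len(positives) * f_pos(x)
        n = len(negatives) * f_neg(x)
        # Far from all training values both terms can underflow to zero.
        return p / (p + n)

    return probability


# Hypothetical correlations for known associated / non-associated pairs.
predictor = make_predictor([0.7, 0.8, 0.9], [-0.1, 0.0, 0.1, 0.2])
print(predictor(0.8), predictor(0.0), predictor(0.45))
```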
    11. Co-mentioning in the scientific literature
         • Associate abstracts with species
         • Identify gene names in title/abstract
         • Count (co-)occurrences of genes
         • Test significance of associations
         • Calibrate against KEGG maps
         • Infer associations in other species
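Counting and testing co-mentions can be sketched as follows. The slide does not name the significance test; a one-sided hypergeometric test is used here as an assumption, and the abstracts (reduced to sets of recognized gene names) are hypothetical.

```python
from math import comb

# Hypothetical abstracts, each reduced to the set of gene names it mentions.
abstracts = [
    {"cdc28", "cln2"}, {"cdc28", "cln2"}, {"cdc28"}, {"act1"},
    {"act1"}, {"cln2", "cdc28"}, {"tub1"}, {"act1", "tub1"},
]


def count_mentions(abstracts, gene_a, gene_b):
    """Return (n_a, n_b, n_ab): abstracts mentioning a, b, and both."""
    n_a = sum(gene_a in doc for doc in abstracts)
    n_b = sum(gene_b in doc for doc in abstracts)
    n_ab = sum(gene_a in doc and gene_b in doc for doc in abstracts)
    return n_a, n_b, n_ab


def hypergeom_tail(n_total, n_a, n_b, n_ab):
    """P(at least n_ab co-mentions) if the two genes were mentioned independently."""
    favourable = sum(
        comb(n_a, k) * comb(n_total - n_a, n_b - k)
        for k in range(n_ab, min(n_a, n_b) + 1)
    )
    return favourable / comb(n_total, n_b)


n_a, n_b, n_ab = count_mentions(abstracts, "cdc28", "cln2")
p_value = hypergeom_tail(len(abstracts), n_a, n_b, n_ab)
print(n_a, n_b, n_ab, p_value)
```

A small p-value indicates that two genes are mentioned together more often than their individual frequencies would suggest.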
    12. The power of cross-species transfer and evidence integration
    13. Getting more specific – generally speaking
         • Benchmarking against one common reference allows integration of heterogeneous data
         • The different types of data do not all tell us about the same kind of functional associations
         • It should be possible to assign likely interaction types from supporting evidence types
         • An accurate model of the yeast mitotic cell cycle
         • Approach
             • High confidence set of physical interactions
             • Custom analysis of cell cycle expression data
         • Observations
             • Dynamic assembly of cell cycle complexes
             • Temporal regulation of Cdk specificity
    14. Summary
         • Quality assessment of each individual large-scale data set is a prerequisite for successful data integration
         • High confidence prediction of functional associations and modules is possible when combining lines of evidence
         • Transfer of evidence between species is an increasingly important aspect of large-scale data integration
         • Take a look at STRING – an update is in the pipeline
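Once all evidence types are calibrated to the same probabilistic scale, combining them can be as simple as the sketch below. It assumes the lines of evidence are independent, so the combined score is the probability that at least one of them is correct; the slides do not state the combination formula, and the example scores are hypothetical.

```python
from math import prod


def combine_scores(scores):
    """Combine calibrated scores (0..1), assuming independent evidence."""
    return 1.0 - prod(1.0 - s for s in scores)


# Hypothetical calibrated scores for one protein pair from three channels.
evidence = {"neighborhood": 0.5, "coexpression": 0.5, "textmining": 0.2}
combined = combine_scores(evidence.values())
print(combined)
```

Several moderate scores reinforce each other: the combined value is higher than any single input, which is why integration can yield high-confidence predictions from individually weak evidence.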
    15. Acknowledgments
         • The STRING team
             • Christian von Mering
             • Berend Snel
             • Martijn Huynen
             • Daniel Jaeggi
             • Steffen Schmidt
             • Mathilde Foglierini
             • Peer Bork
         • ArrayProspector web service
             • Julien Lagarde
             • Chris Workman
         • NetView visualization tool
             • Sean Hooper
         • Analysis of yeast cell cycle
             • Ulrik de Lichtenberg
             • Thomas Skøt
             • Anders Fausbøll
             • Søren Brunak
         • Web resources
             • string.embl.de
             • www.bork.embl.de/ArrayProspector
             • www.bork.embl.de/synonyms
    16. Thank you!