STRING - Cross-species integration of known and predicted protein-protein interactions

    Presentation Transcript

    1. STRING: Cross-species integration of known and predicted protein-protein interactions. Lars Juhl Jensen, EMBL Heidelberg
    2. STRING provides a protein network based on integration of diverse types of evidence
      • Genomic neighborhood
      • Species co-occurrence
      • Gene fusions
      • Database imports
      • Exp. interaction data
      • Microarray expression data
      • Literature co-mentioning
    3. Inferring functional modules from gene presence/absence patterns
      [Figure: the “Cellulosome”, with labels for the cell, cell wall, anchoring proteins, cellulosomes, cellulose, and resting and protracted protuberances. © Trends Microbiol, 1999]
    4. Genomic context methods © Nature Biotechnology, 2004
    5. Formalizing the phylogenetic profile method
      • Align all proteins against all
      • Calculate best-hit profile
      • Join similar species by PCA
      • Calculate PC profile distances
      • Calibrate against KEGG maps
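The profile-distance step on this slide can be illustrated with a minimal Python sketch. It assumes the best-hit profiles are already computed, uses made-up protein names and scores, and compares raw profiles with a plain Euclidean distance. The PCA step that joins similar species and the calibration against KEGG maps are left out.

```python
from itertools import combinations

def profile_distance(p, q):
    """Euclidean distance between two best-hit profiles of equal length."""
    return sum((a - b) ** 2 for a, b in zip(p, q)) ** 0.5

def rank_pairs(profiles):
    """Return (distance, protein, protein) tuples, most similar pair first."""
    pairs = [
        (profile_distance(profiles[a], profiles[b]), a, b)
        for a, b in combinations(sorted(profiles), 2)
    ]
    return sorted(pairs)

# Hypothetical best-hit scores (0.0 = no hit) across four species.
profiles = {
    "protA": [0.9, 0.0, 0.8, 0.0],
    "protB": [0.8, 0.0, 0.9, 0.0],
    "protC": [0.0, 0.7, 0.0, 0.9],
}

ranked = rank_pairs(profiles)
```

Proteins that are present and absent in the same species end up at the top of `ranked`, which is the signal the method uses to propose a functional link.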
    6. Predicting functional and physical interactions from gene fusion/fission events
      • Calculate all-against-all pairwise alignments
      • Find in A genes that match the same gene in B
      • Exclude overlapping alignments
      • Calibrate against KEGG maps
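A rough Python sketch of the fusion search follows. The hit format (gene in A, gene in B, start and end of the alignment on the B gene), the gene names, and the `max_overlap` parameter are all assumptions made for illustration, not the STRING implementation.

```python
from collections import defaultdict
from itertools import combinations

def fusion_candidates(hits, max_overlap=0):
    """Find pairs of genes in A that align to separate parts of one gene in B.

    hits: iterable of (gene_in_A, gene_in_B, start_on_B, end_on_B).
    """
    by_target = defaultdict(list)
    for query, target, start, end in hits:
        by_target[target].append((query, start, end))

    candidates = []
    for target, aligned in by_target.items():
        for (q1, s1, e1), (q2, s2, e2) in combinations(aligned, 2):
            if q1 == q2:
                continue
            # Number of positions on the B gene covered by both alignments.
            overlap = min(e1, e2) - max(s1, s2) + 1
            if overlap <= max_overlap:
                candidates.append((q1, q2, target))
    return candidates

# Hypothetical alignment hits.
hits = [
    ("a1", "b1", 1, 200),
    ("a2", "b1", 220, 450),
    ("a3", "b1", 150, 300),  # overlaps both a1 and a2 on b1
    ("a4", "b2", 1, 100),
]

candidates = fusion_candidates(hits)
```

Only `a1` and `a2` cover separate regions of `b1`, so they form the single candidate pair; the pairs involving `a3` are excluded because their alignments overlap.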
    7. Inferring functional associations from evolutionarily conserved operons
      • Identify runs of adjacent genes with the same direction
      • Score each gene pair based on intergenic distances
      • Calibrate against KEGG maps
      • Infer associations in other species
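The first two steps can be sketched in Python as below. The gene tuples and the raw score (the total intergenic distance separating two genes in a run, where smaller means closer) are illustrative assumptions; the slides do not give the actual scoring function.

```python
def same_strand_runs(genes):
    """Split genes, ordered by start position, into runs sharing one strand.

    genes: iterable of (name, start, end, strand).
    """
    runs, current = [], []
    for gene in sorted(genes, key=lambda g: g[1]):
        if current and gene[3] != current[-1][3]:
            runs.append(current)
            current = []
        current.append(gene)
    if current:
        runs.append(current)
    return runs

def pair_gap_scores(run):
    """Map each gene pair in a run to the summed intergenic distance between them."""
    scores = {}
    for i in range(len(run)):
        total_gap = 0
        for j in range(i + 1, len(run)):
            # Gap between gene j and its upstream neighbor; overlaps count as 0.
            total_gap += max(0, run[j][1] - run[j - 1][2] - 1)
            scores[(run[i][0], run[j][0])] = total_gap
    return scores

# Hypothetical gene coordinates on one chromosome.
genes = [
    ("g1", 1, 900, "+"),
    ("g2", 920, 1800, "+"),
    ("g3", 1850, 2500, "+"),
    ("g4", 2600, 3400, "-"),
]

runs = same_strand_runs(genes)
scores = pair_gap_scores(runs[0])
```

Raw scores like these would still need to be calibrated against a reference such as KEGG maps before they can be compared with other evidence types.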
    8. Score calibration against a common reference
      • Many diverse types of evidence
        • The quality of each is judged by very different raw scores
        • Quality differences exist among data sets of the same type
      • Solved by calibrating all scores against a common reference
        • Scores are directly comparable
        • Probabilistic scores allow evidence to be combined
      • Requirements for the reference
        • Must represent a compromise of all the types of evidence
        • Broad species coverage
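As a hedged illustration of calibration and combination, the Python sketch below bins raw scores and takes the fraction of pairs in each bin that the reference confirms as the calibrated probability. It then combines probabilities from several evidence types under an independence assumption (a noisy-OR). The binning scheme, function names, and toy data are assumptions, not the exact STRING procedure.

```python
from bisect import bisect_right

def calibrate(scored_pairs, reference_pairs, bin_edges):
    """Return, per raw-score bin, the fraction of pairs found in the reference.

    scored_pairs: iterable of ((protein, protein), raw_score).
    reference_pairs: set of frozensets, e.g. pairs sharing a KEGG map.
    """
    hits = [0] * (len(bin_edges) + 1)
    totals = [0] * (len(bin_edges) + 1)
    for pair, raw in scored_pairs:
        b = bisect_right(bin_edges, raw)
        totals[b] += 1
        hits[b] += frozenset(pair) in reference_pairs
    return [h / t if t else None for h, t in zip(hits, totals)]

def combine(probabilities):
    """Combine independent probabilistic scores: 1 - prod(1 - p_i)."""
    remaining = 1.0
    for p in probabilities:
        remaining *= 1.0 - p
    return 1.0 - remaining

# Hypothetical raw scores and reference set.
scored = [(("a", "b"), 0.9), (("a", "c"), 0.8), (("b", "c"), 0.2), (("c", "d"), 0.1)]
reference = {frozenset(("a", "b")), frozenset(("a", "c")), frozenset(("b", "c"))}

calibrated = calibrate(scored, reference, bin_edges=[0.5])
combined = combine([0.5, 0.5])
```

Because every evidence type is mapped onto the same probability scale, two moderately reliable sources that agree yield a higher combined score than either alone.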
    9. Integrating physical interaction screens
      • Complex pull-down experiments
      • Yeast two-hybrid data sets are inherently binary
      • Calculate score from number of (co-)occurrences
      • Calculate score from non-shared partners
      • Calibrate against KEGG maps
      • Infer associations in other species
      • Combine evidence from experiments
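The quantities these raw scores are built from can be counted as in the following Python sketch. It only computes the counts (non-shared partners in binary data, occurrences and co-occurrences in purifications); how the counts are turned into a score is not stated on the slide, so no formula is assumed. Protein names and data are invented.

```python
from collections import defaultdict

def non_shared_partners(interactions, a, b):
    """For binary data, count partners of a not shared with b, and vice versa."""
    partners = defaultdict(set)
    for x, y in interactions:
        partners[x].add(y)
        partners[y].add(x)
    only_a = partners[a] - partners[b] - {b}
    only_b = partners[b] - partners[a] - {a}
    return len(only_a), len(only_b)

def co_occurrences(purifications, a, b):
    """For pull-down data, count purifications containing both, a, and b."""
    n_a = sum(1 for p in purifications if a in p)
    n_b = sum(1 for p in purifications if b in p)
    n_ab = sum(1 for p in purifications if a in p and b in p)
    return n_ab, n_a, n_b

# Hypothetical yeast two-hybrid pairs and pull-down purifications.
y2h = [("p1", "p2"), ("p1", "p3"), ("p2", "p3"), ("p2", "p4"), ("p2", "p5")]
pulldowns = [{"p1", "p2", "p3"}, {"p1", "p2"}, {"p2", "p4"}]

binary_counts = non_shared_partners(y2h, "p1", "p2")
complex_counts = co_occurrences(pulldowns, "p1", "p2")
```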
    10. Mining microarray expression databases
      • Re-normalize arrays by modern method to remove biases
      • Build expression matrix
      • Combine similar arrays by PCA
      • Construct predictor by Gaussian kernel density estimation
      • Calibrate against KEGG maps
      • Infer associations in other species
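One way a predictor could be built by Gaussian kernel density estimation is sketched below in Python. It assumes each gene pair is summarized by a single expression-similarity value, estimates one density for pairs known to be functionally linked and one for unlinked pairs, and converts the two densities into a probability with Bayes' rule. The bandwidth, prior, and sample values are invented for illustration.

```python
import math

def gaussian_kde(samples, bandwidth):
    """Return a function estimating the density of samples with Gaussian kernels."""
    norm = 1.0 / (len(samples) * bandwidth * math.sqrt(2.0 * math.pi))

    def density(x):
        return norm * sum(
            math.exp(-0.5 * ((x - s) / bandwidth) ** 2) for s in samples
        )

    return density

def make_predictor(positives, negatives, bandwidth=0.1, prior=0.5):
    """Return a function giving P(linked | similarity) from the two densities."""
    f_pos = gaussian_kde(positives, bandwidth)
    f_neg = gaussian_kde(negatives, bandwidth)

    def predict(x):
        p = prior * f_pos(x)
        n = (1.0 - prior) * f_neg(x)
        # Far from all samples both terms can underflow to 0; not handled here.
        return p / (p + n)

    return predict

# Hypothetical expression similarities for linked and unlinked gene pairs.
linked = [0.8, 0.9, 0.85, 0.7]
unlinked = [0.0, 0.1, -0.2, 0.05, 0.3]

predict = make_predictor(linked, unlinked)
```

Calling `predict` on a similarity near the linked samples returns a value close to 1, and on a similarity near the unlinked samples a value close to 0.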
    11. Evidence transfer based on “fuzzy orthology”
      • Orthology transfer is tricky
        • Correct assignment of orthology is difficult for distant species
        • Functional equivalence cannot be guaranteed for in-paralogs
      • These problems are addressed by our “fuzzy orthology” scheme
        • Confidence scores for functional equivalence are calculated from all-against-all alignment
        • Evidence is distributed across possible pairs according to confidence scores in the case of many-to-many relationships
      [Figure: evidence transfer from a source species to a target species]
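The idea of distributing evidence across candidate pairs can be sketched in Python as follows. Weighting each target pair by the product of the two confidence scores is an assumption made for this example, and the protein names and numbers are invented.

```python
def transfer_evidence(source_score, orthologs_a, orthologs_b):
    """Spread the score of a source-species pair (A, B) over target-species pairs.

    orthologs_a / orthologs_b map each candidate target protein to the
    confidence that it is functionally equivalent to A or B, respectively.
    """
    transferred = {}
    for a, conf_a in orthologs_a.items():
        for b, conf_b in orthologs_b.items():
            transferred[(a, b)] = source_score * conf_a * conf_b
    return transferred

# Hypothetical case: A has two candidate equivalents, B has one.
orthologs_a = {"a1": 0.7, "a2": 0.3}
orthologs_b = {"b1": 1.0}

transferred = transfer_evidence(0.8, orthologs_a, orthologs_b)
```

In this toy case the pair `("a1", "b1")` receives most of the evidence and `("a2", "b1")` the rest, so an uncertain many-to-many assignment weakens each transferred link rather than forcing a single choice.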
    12. Multiple evidence types from several species
    13. Getting more specific – generally speaking
      • Benchmarking against one common reference allows integration of heterogeneous data
      • The different types of data do not all tell us about the same kind of functional associations
      • It should be possible to assign likely interaction types from supporting evidence types
      • The aim: to construct accurate, qualitative models of biological systems or processes
      • The models should be accurate even at the level of individual interactions
      • This allows specific, testable hypotheses to be made based on high-throughput experimental data
    14. Conclusions
      • Genomic context methods are able to infer the function of many prokaryotic proteins from genome sequences alone
      • Integration of large-scale experimental data allows similar predictions to be made for eukaryotic proteins
      • Benchmarking is a prerequisite for data integration
      • Cross-species transfer is essential for making the most of the available data
      • Try STRING at http://string.embl.de
    15. Acknowledgments
      • The STRING team
        • Christian von Mering
        • Berend Snel
        • Martijn Huynen
        • Daniel Jaeggi
        • Steffen Schmidt
        • Mathilde Foglierini
        • Peer Bork
      • ArrayProspector
        • Julien Lagarde
        • Chris Workman
      • New context methods
        • Jan Korbel
        • Christian von Mering
        • Peer Bork
    16. Thank you!

    Lars Juhl Jensen, 2 years ago

    Samuel Lunenfeld Research Institute, Mt. Sinai Hosp
