Exploring proteins, chemicals and their interactions
with STRING and STITCH
Michael Kuhn
(talk and practical session)
interactions of proteins and chemicals
example
Tryptophan synthase beta chain
E. coli K12
example
aspirin
Homo sapiens
STRING: version 8.3
soon: version 9
interactions of proteins
STITCH: version 2
interactions of proteins and chemicals
content
STRING 8
630 genomes
only completely sequenced genomes
STRING 9: >1100 genomes
2.5 million genes (not proteins)
74,000 chemicals (including 2200 drugs)
many sources of
 interactions
genomic context methods
gene neighborhood
gene fusion
phylogenetic profiles
curated knowledge
experimental evidence
co-expression
GEO: Gene Expression Omnibus
experimental databases
literature
variable quality
different “raw scores”
benchmarking
calibrate against “gold standard” (KEGG)
probabilistic scores
e.g. “70% chance for an association”
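
The calibration step can be pictured with a small sketch. The slides do not describe the actual benchmarking procedure, so this is only an assumed binning approach: raw scores are grouped into bins, and each bin's probability is the fraction of its pairs that the gold standard confirms. The function name `calibrate`, the bin edges, and the toy data are all hypothetical.

```python
from bisect import bisect_right


def calibrate(raw_scores, in_gold_standard, bin_edges):
    """Estimate, per raw-score bin, the fraction of pairs confirmed by the gold standard.

    Assumption: simple binning; this is not STRING's documented procedure.
    """
    hits = [0] * (len(bin_edges) + 1)
    totals = [0] * (len(bin_edges) + 1)
    for score, confirmed in zip(raw_scores, in_gold_standard):
        b = bisect_right(bin_edges, score)  # index of the bin this score falls into
        totals[b] += 1
        if confirmed:
            hits[b] += 1
    # Empty bins get None because no probability can be estimated for them.
    return [h / t if t else None for h, t in zip(hits, totals)]


# Toy data (made up): raw scores and whether the pair shares a gold-standard pathway.
raw = [0.1, 0.2, 0.6, 0.7, 0.9, 0.95]
gold = [False, False, True, False, True, True]
probabilities = calibrate(raw, gold, bin_edges=[0.5, 0.8])
print(probabilities)
```

With this toy data, a raw score in the middle bin would be reported as a 50% chance of association.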
combine all evidence
Bayesian scoring scheme
e.g.: two scores of 0.7
combined probability: 0.91
1 - (1 - 0.7)^2 = 0.91
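
The arithmetic on this slide generalizes to any number of evidence scores. The sketch below assumes the scores are independent probabilities and combines them as one minus the product of their complements, exactly as in the slide's example; the scoring used in production may include corrections not shown on the slide. The function name `combine_scores` is hypothetical.

```python
import math


def combine_scores(scores):
    """Combine independent probabilistic scores: 1 - prod(1 - s_i)."""
    return 1.0 - math.prod(1.0 - s for s in scores)


combined = combine_scores([0.7, 0.7])
print(round(combined, 2))  # two scores of 0.7 give approximately 0.91
```

Adding further evidence can only raise the combined score, never lower it.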
evidence spread
over many species
evidence transfer
transfer by orthology (or “fuzzy orthology”)
von Mering et al., Nucleic Acids Research, 2005
two modes
proteins mode
von Mering et al., Nucleic Acids Research, 2005
maximum specificity
lower coverage
information will be relevant for selected species
COG mode
“clusters of orthologous groups”
von Mering et al., Nucleic Acids Research, 2005
higher coverage
lower specificity
includes all available evidence
some orthologous groups are too large to be meaningful
STRING plans

• next big release (9.0):
 • coming end of 2010 / early 2011
 • more genomes
 • allow users to add more data to the network
STITCH plans

• next minor release (2.1):
 • add ChEMBLdb
• next big release (3.0):
 • “zoom” into stereo-isomers, salt forms
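Acknowledgements
STRING: Christian von Mering, Lars Juhl Jensen, Manuel Stark, Samuel Chaffron, Chris Creevey, Jean Muller, Tobias Doerks, Philippe Julien, Alexander Roth, Milan Simonovic, Peer Bork
STITCH: Damian Szklarczyk, Andrea Franceschini, Monica Campillos, Christian von Mering, Lars Juhl Jensen, Andreas Beyer, Peer Bork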
string-db.org
Jensen et al., NAR Database Issue 2009
stitch-db.org
Kuhn et al., NAR Database Issue 2010
STRING/STITCH tutorial
Talk about STRING and STITCH, given as part of the EMBO Practical Course 'Computational aspects of protein structure determination and analysis: from data to structure to function' at the EBI in Hinxton (Sept. 10, 2010)

  • next: neighborhood view
  • next: text mining


  • next: actions aspirin -- PTGS1






  • four different areas of sources
  • especially for prokaryotes

  • In proteins mode, STRING predicts interaction partners for one protein in a specific species.

    This allows for maximum specificity, but has slightly lower coverage. The reason is that, in proteins mode, STRING does not have precise ortholog assignments in other species; instead, it estimates orthology through sequence similarity searches. Interaction information is then transferred between species based on the 'degree of orthology', a measure of how confident STRING is that two proteins are orthologs. The measure is derived from all-against-all similarity searches and takes into account putative paralogs in both species: the fewer paralogs there are, the more confident STRING is about orthology.
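
    The idea of weighting transferred evidence by a 'degree of orthology' can be illustrated with a toy sketch. The formula below is invented purely for illustration and is not STRING's actual measure: it only reproduces the qualitative behaviour described above, namely that confidence drops as the number of putative paralogs in either species grows. Both function names are hypothetical.

```python
def orthology_confidence(paralogs_in_source, paralogs_in_target):
    """Toy 'degree of orthology' in (0, 1]; fewer paralogs means higher confidence.

    Assumption: this formula is made up for illustration only.
    """
    return 1.0 / ((1 + paralogs_in_source) * (1 + paralogs_in_target))


def transfer_score(source_score, confidence):
    """Down-weight an interaction score when transferring it to another species."""
    return source_score * confidence


# A one-to-one ortholog pair keeps the full score.
print(transfer_score(0.8, orthology_confidence(0, 0)))
# One extra paralog in the source species halves the transferred score in this toy model.
print(transfer_score(0.8, orthology_confidence(1, 0)))
```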



  • In COG mode, STRING predicts interaction partners for a group of orthologous proteins.

    This generally has higher coverage, but may result in slightly lower specificity. Again, the reason lies in how STRING derives orthologs. In COG mode, information about orthology is taken from the database 'Clusters of Orthologous Groups' (Tatusov & Koonin, NCBI). There, orthology is an 'all-or-nothing' decision, and all proteins considered orthologous are grouped into a single entity. A prediction made for one protein therefore applies to all proteins in the group, which is why STRING shows its predictions at the level of the groups. Coverage is higher because the groups are partly based on manual curation and contain orthology assignments that are difficult to derive through an automated procedure. Specificity is lower, however, because some groups are (for technical reasons) relatively 'inclusive', i.e. they contain a large number of proteins which cannot be resolved further. For example, almost all serine/threonine kinases are grouped into one COG, which makes predictions for a specific subset impossible. Nevertheless, COGs are very powerful and are the first choice for proteins which do not show many lineage-specific expansions, especially in prokaryotes.
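
    The group-level view can be sketched as follows. This is a minimal illustration, assuming a plain dictionary that maps proteins to orthologous groups; the identifiers are made up, and keeping the maximum score per group pair is an assumption chosen for simplicity, not the documented behaviour.

```python
def lift_to_groups(protein_pairs, protein_to_group):
    """Report protein-level interaction scores at the level of orthologous groups.

    Assumptions: the mapping is all-or-nothing, and the best score per group pair is kept.
    """
    group_pairs = {}
    for (protein_a, protein_b), score in protein_pairs.items():
        group_a = protein_to_group.get(protein_a)
        group_b = protein_to_group.get(protein_b)
        # Skip proteins without a group and pairs that fall inside a single group.
        if group_a is None or group_b is None or group_a == group_b:
            continue
        key = tuple(sorted((group_a, group_b)))
        group_pairs[key] = max(score, group_pairs.get(key, 0.0))
    return group_pairs


# Hypothetical identifiers: two species, each with one member of groups X and Y.
mapping = {"sp1_a": "GROUP_X", "sp1_b": "GROUP_Y", "sp2_a": "GROUP_X", "sp2_b": "GROUP_Y"}
evidence = {("sp1_a", "sp1_b"): 0.6, ("sp2_b", "sp2_a"): 0.8}
groups = lift_to_groups(evidence, mapping)
print(groups)
```

    Evidence from both species ends up on the same group pair, which is the source of the higher coverage; the loss of specificity appears as soon as one group contains many proteins.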







  • STRING/STITCH tutorial

    1. Exploring proteins, chemicals and their interactions with STRING and STITCH Michael Kuhn
    2. (talk and practical session)
    3. interactions of proteins and chemicals
    4. example Tryptophan synthase beta chain E. coli K12
    5. example aspirin Homo sapiens
    6. STRING: version 8.3 soon: version 9 interactions of proteins STITCH: version 2 interactions of proteins and chemicals
    7. content STRING 8
    8. 630 genomes only completely sequenced genomes STRING 9: >1100 genomes
    9. 2.5 million genes (not proteins)
    10. 74,000 chemicals (including 2200 drugs)
    11. many sources of interactions
    12. genomic context methods
    13. gene neighborhood
    14. gene fusion
    15. phylogenetic profiles
    16. curated knowledge
    17. experimental evidence
    18. co-expression GEO: Gene Expression Omnibus
    19. experimental databases
    20. literature
    21. variable quality different “raw scores”
    22. benchmarking calibrate against “gold standard” (KEGG)
    23. probabilistic scores e.g. “70% chance for an association”
    24. combine all evidence
    25. Bayesian scoring scheme
    26. e.g.: two scores of 0.7 combined probability: ?
    27. e.g.: two scores of 0.7 combined probability: 0.91 1 - (1 - 0.7)^2 = 0.91
    28. evidence spread over many species
    29. evidence transfer
    30. transfer by orthology (or “fuzzy orthology”)
    31. von Mering et al., Nucleic Acids Research, 2005
    32. von Mering et al., Nucleic Acids Research, 2005
    33. two modes
    34. proteins mode
    35. von Mering et al., Nucleic Acids Research, 2005
    36. maximum specificity lower coverage information will be relevant for selected species
    37. COG mode “clusters of orthologous groups”
    38. von Mering et al., Nucleic Acids Research, 2005
    39. higher coverage lower specificity includes all available evidence some orthologous groups are too large to be meaningful
    40. STRING plans • next big release (9.0): • coming end of 2010 / early 2011 • more genomes • allow users to add more data to the network
    41. STITCH plans • next minor release (2.1): • add ChEMBLdb • next big release (3.0): • “zoom” into stereo-isomers, salt forms
    42. Acknowledgements STRING: Christian von Mering, Lars Juhl Jensen, Manuel Stark, Samuel Chaffron, Chris Creevey, Jean Muller, Tobias Doerks, Philippe Julien, Alexander Roth, Milan Simonovic, Peer Bork. STITCH: Damian Szklarczyk, Andrea Franceschini, Monica Campillos, Christian von Mering, Lars Juhl Jensen, Andreas Beyer, Peer Bork
    43. string-db.org Jensen et al., NAR Database Issue 2009 stitch-db.org Kuhn et al., NAR Database Issue 2010
