The STRING database - Quality scores for heterogeneous interaction data

Lyon, France, April 23-25, 2007

  1. 1. The STRING database Quality scores for heterogeneous interaction data Lars Juhl Jensen EMBL Heidelberg
  2. 2. data integration
  3. 3. Jensen et al., Drug Discovery Today: Targets, 2004
  4. 4. functional interactions
  5. 5. Genomic neighborhood Species co-occurrence Gene fusions Database imports Experimental interaction data Microarray expression data Literature mining
  6. 6. 373 proteomes
  7. 7. model organism databases
  8. 8. Ensembl
  9. 9. Genome Reviews
  10. 10. RefSeq
  11. 11. genomic context methods
  12. 12. gene fusion
  13. 14. gene neighborhood
  14. 16. phylogenetic profiles
  15. 18. scoring schemes
  16. 19. benchmarking
  17. 20. cross-species transfer
  18. 21. primary experimental data
  19. 22. many sources
  20. 23. different formats
  21. 24. different gene identifiers
  22. 25. redundancy
  23. 26. physical protein interactions
  24. 27. IntAct
  25. 28. BIND Biomolecular Interaction Network Database
  26. 29. MINT Molecular Interactions Database
  27. 30. DIP Database of Interacting Proteins
  28. 31. GRID General Repository for Interaction Datasets
  29. 32. HPRD Human Protein Reference Database
  30. 33. PSI-MI
  31. 34. reference proteomes
  32. 35. merge data by publication
  33. 36. thousands of interactions
  34. 37. correct interactions
  35. 38. wrong interactions
  36. 39. scoring scheme
  37. 40. complex pull-down
  38. 41. von Mering et al., Nucleic Acids Research, 2005
  39. 42. $\log\left[\dfrac{N_{12} \cdot N}{(N_1 + 1) \cdot (N_2 + 1)}\right]$
  40. 43. yeast two-hybrid
  41. 44. non-shared interactors
  42. 45. $-\log\left[(N_1 + 1) \cdot (N_2 + 1)\right]$
  43. 46. not directly comparable
  44. 47. calibrate vs. gold standard
  45. 49. other types of evidence
  46. 50. co-expression
  47. 51. GEO Gene Expression Omnibus
  48. 52. species-specific datasets
  49. 53. correlation coefficient
  50. 54. calibrate vs. gold standard
  51. 55. directly comparable
  52. 56. curated knowledge
  53. 57. many sources
  54. 58. different formats
  55. 59. different gene identifiers
  56. 60. redundancy
  57. 61. protein complexes
  58. 62. MIPS Munich Information center for Protein Sequences
  59. 63. Gene Ontology
  60. 64. pathway databases
  61. 65. KEGG Kyoto Encyclopedia of Genes and Genomes
  62. 66. Reactome
  63. 67. PID NCI-Nature Pathway Interaction Database
  64. 68. STKE Signal Transduction Knowledge Environment
  65. 69. BioPAX
  66. 70. reference proteomes
  67. 71. literature mining
  68. 72. MEDLINE
  69. 73. SGD Saccharomyces Genome Database
  70. 74. The Interactive Fly
  71. 75. OMIM Online Mendelian Inheritance in Man
  72. 76. different gene identifiers
  73. 77. synonyms lists
  74. 78. black list
  75. 79. flexible matching
  76. 80. co-occurrence
  77. 81. $\log\left[\dfrac{N_{12} \cdot N}{(N_1 + 1) \cdot (N_2 + 1)}\right]$
  78. 82. NLP Natural Language Processing
  79. 83. Gene and protein names; cue words for entity recognition; verbs for relation extraction. Examples: [nxgene The GAL4 gene]; [nxexpr The expression of [nxgene the cytochrome genes [nxpg CYC1 and CYC7]]] is controlled by [nxpg HAP1]
  80. 84. calibrate vs. gold standard
  81. 85. directly comparable
  82. 86. combine all evidence
  83. 87. spread over many species
  84. 88. transfer by orthology
  85. 89. von Mering et al., Nucleic Acids Research, 2005
  86. 90. two modes
  87. 92. orthologous groups
  88. 93. von Mering et al., Nucleic Acids Research, 2005
  89. 95. fuzzy orthology
  90. 96. von Mering et al., Nucleic Acids Research, 2005
  91. 97. add probabilistic scores
  92. 98. $P = 1 - (1 - P_1) \cdot (1 - P_2) \cdot (1 - P_3) \cdots$
  93. 99. Genomic neighborhood Species co-occurrence Gene fusions Database imports Experimental interaction data Microarray expression data Literature mining
  94. 100. Acknowledgments. The STRING team: Christian von Mering, Michael Kuhn, Berend Snel, Martijn Huynen, Sean Hooper, Samuel Chaffron, Julien Lagarde, Mathilde Foglierini, Peer Bork. Literature mining project: Jasmin Saric, Rossitza Ouzounova, Isabel Rojas.
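
The two raw scores on slides 42 and 45 can be sketched in a few lines of Python. The slides do not define the symbols, so the reading used here is an assumption: for complex pull-downs, `n12` is taken as the number of purifications containing both proteins, `n1` and `n2` as the number containing each protein, and `n_total` as the total number of purifications; for yeast two-hybrid, `n1` and `n2` are taken as the numbers of non-shared interactors (slide 44). The natural logarithm is also an assumption, and the function names are hypothetical.

```python
import math


def pulldown_score(n12: int, n1: int, n2: int, n_total: int) -> float:
    """Raw pull-down score: log[(N12 * N) / ((N1 + 1) * (N2 + 1))].

    Assumed meaning of the counts (not stated on the slide):
    n12 = purifications containing both proteins,
    n1, n2 = purifications containing protein 1 / protein 2,
    n_total = total number of purifications.
    """
    if n12 <= 0 or n_total <= 0:
        raise ValueError("n12 and n_total must be positive for the log to exist")
    return math.log((n12 * n_total) / ((n1 + 1) * (n2 + 1)))


def two_hybrid_score(n1: int, n2: int) -> float:
    """Raw yeast two-hybrid score: -log((N1 + 1) * (N2 + 1)).

    Assumed meaning: n1, n2 = non-shared interactors of each protein,
    so promiscuous proteins receive lower (more negative) scores.
    """
    if n1 < 0 or n2 < 0:
        raise ValueError("interactor counts cannot be negative")
    return -math.log((n1 + 1) * (n2 + 1))
```

Under this reading, the two functions return values on different scales, which is consistent with slide 46 ("not directly comparable") and motivates the calibration step on slide 47.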

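
Slides 47 and 97-98 describe calibrating each evidence type against a gold standard and then combining the resulting probabilities as $P = 1 - (1 - P_1) \cdot (1 - P_2) \cdots$. The sketch below illustrates that idea. The calibration method (equal-size bins of raw scores, each mapped to the fraction of gold-standard pairs it contains) is an assumption chosen for brevity, since the slides do not say how the calibration curve is fitted. All function names are hypothetical.

```python
import math
from bisect import bisect_right


def calibrate(raw_scores, in_gold_standard, n_bins=4):
    """Bin benchmarked pairs by raw score; return (bin_edges, bin_probabilities).

    Each bin's probability is the fraction of its pairs found in the gold
    standard. This simple binning is an assumed stand-in for whatever
    calibration curve is used in practice.
    """
    pairs = sorted(zip(raw_scores, in_gold_standard))
    if not pairs:
        raise ValueError("need at least one benchmarked pair")
    size = math.ceil(len(pairs) / n_bins)  # pairs per bin, at most n_bins bins
    edges, probs = [], []
    for start in range(0, len(pairs), size):
        chunk = pairs[start:start + size]
        edges.append(chunk[0][0])  # lowest raw score in this bin
        probs.append(sum(1 for _, hit in chunk if hit) / len(chunk))
    return edges, probs


def to_probability(raw, edges, probs):
    """Look up the calibrated probability for one raw score."""
    idx = bisect_right(edges, raw) - 1
    return probs[max(idx, 0)]  # scores below the first edge use the first bin


def combine(probabilities):
    """Combine independent evidence: P = 1 - (1 - P1)(1 - P2)(1 - P3)..."""
    return 1.0 - math.prod(1.0 - p for p in probabilities)
```

For example, two independent evidence types that each give a calibrated probability of 0.5 combine to `combine([0.5, 0.5])`, which is 0.75. The formula treats the evidence types as independent, so correlated sources would be overcounted.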