Prediction of protein networks through data integration

MIPS Retreat, Kloster Frauenchiemsee, Chiemsee, Germany, July 9-10, 2007

Published in: Technology

  1. 1. Prediction of protein networks through data integration Lars Juhl Jensen EMBL Heidelberg
  2. 2. prediction of interactions
  3. 3. STRING
  4. 5. functional interactions
  5. 6. 373 genomes
  6. 7. model organism databases
  7. 8. Ensembl
  8. 9. Genome Reviews
  9. 10. RefSeq
  10. 11. genomic context methods
  11. 12. gene neighborhood
  12. 14. gene fusion
  13. 16. phylogenetic profiles
  14. 21. Cell Cellulosomes Cellulose
  15. 22. correct interactions
  16. 23. wrong associations
  17. 24. phylogenetic profiles
  18. 26. SVD Singular Value Decomposition
  19. 27. Euclidean distance
  20. 28. gene neighborhood
  21. 30. sum of intergenic distances
  22. 31. raw quality scores
  23. 32. rank by reliability
  24. 33. not comparable
  25. 34. Euclidean distance
  26. 35. sum of intergenic distances
  27. 36. benchmarking
  28. 37. calibrate vs. gold standard
  29. 39. raw quality scores
  30. 40. probabilistic scores
  31. 41. curated knowledge
  32. 42. many sources
  33. 43. KEGG Kyoto Encyclopedia of Genes and Genomes
  34. 44. Reactome
  35. 45. PID NCI-Nature Pathway Interaction Database
  36. 46. STKE Signal Transduction Knowledge Environment
  37. 47. MIPS Munich Information center for Protein Sequences
  38. 48. Gene Ontology
  39. 49. different gene identifiers
  40. 50. synonyms list
  41. 51. literature mining
  42. 52. MEDLINE
  43. 53. SGD Saccharomyces Genome Database
  44. 54. The Interactive Fly
  45. 55. OMIM Online Mendelian Inheritance in Man
  46. 56. co-mentioning
  47. 57. NLP Natural Language Processing
  48. 58.
      • Gene and protein names
      • Cue words for entity recognition
      • Verbs for relation extraction
      • [nxgene The GAL4 gene]
      • [nxexpr The expression of [nxgene the cytochrome genes [nxpg CYC1 and CYC7]]] is controlled by [nxpg HAP1]
  49. 59. calibrate vs. gold standard
  50. 61. primary experimental data
  51. 62. gene expression
  52. 63. GEO Gene Expression Omnibus
  53. 64. expression compendia
  54. 65. protein interactions
  55. 66. BIND Biomolecular Interaction Network Database
  56. 67. BioGRID General Repository for Interaction Datasets
  57. 68. DIP Database of Interacting Proteins
  58. 69. IntAct
  59. 70. MINT Molecular Interactions Database
  60. 71. HPRD Human Protein Reference Database
  61. 72. many sources
  62. 73. different gene identifiers
  63. 74. redundancy
  64. 75. not comparable
  65. 76. merge data by publication
  66. 77. raw quality scores
  67. 78. calibrate vs. gold standard
  68. 80. combine all evidence
  69. 81. spread over many species
  70. 82. transfer by orthology
  71. 83. naïve Bayesian scoring
  72. 85. prediction of interactions
  73. 86. NetworKIN
  74. 88. the idea
  75. 89. phosphoproteomics
  76. 90. mass spectrometry
  77. 92. phosphorylation sites
  78. 93. Phospho.ELM
  79. 94. in vivo
  80. 95. kinases are unknown
  81. 96. computational methods
  82. 97. NetPhosK
  83. 98. Scansite
  84. 99. sequence motifs
  85. 101. kinase families
  86. 102. overprediction
  87. 103. no context
  88. 104. what a kinase could do
  89. 105. not what it actually does
  90. 106. context
  91. 107. co-activators
  92. 108. scaffolders
  93. 109. protein networks
  94. 111. the algorithm
  95. 112. NetworKIN
  96. 114. benchmarking
  97. 115. Phospho.ELM
  98. 117. 2.5-fold better accuracy
  99. 118. context is crucial
  100. 119. global statistics
  101. 121. visualization
  102. 123. ATM signaling
  103. 125. experimental validation
  104. 126. summary
  105. 127. reanalysis
  106. 128. benchmarking
  107. 129. integration
  108. 130. complementary data types
  109. 131. computational methods
  110. 132. reproduce what is known
  111. 133. biological discoveries
  112. 134. testable hypotheses
  113. 135. Acknowledgments
      • The STRING database
          • Christian von Mering
          • Michael Kuhn
          • Berend Snel
          • Martijn Huynen
          • Sean Hooper
          • Samuel Chaffron
          • Julien Lagarde
          • Mathilde Foglierini
          • Peer Bork
      • Literature mining
          • Jasmin Saric
          • Rossitza Ouzounova
          • Isabel Rojas
      • The NetworKIN method
          • Rune Linding
          • Gerard Ostheimer
          • Francesca Diella
          • Karen Colwill
          • Jing Jin
          • Pavel Metalnikov
          • Vivian Nguyen
          • Adrian Pasculescu
          • Jin Gyoon Park
          • Leona D. Samson
          • Rob Russell
          • Peer Bork
          • Michael Yaffe
          • Tony Pawson
