Integration of diverse large-scale datasets

IPAM Proteomics Reunion Conference, UCLA, Lake Arrowhead, California, December 11-16, 2005

  1. 1. Integration of diverse large-scale datasets
  2. 2. Lars Juhl Jensen
  3. 6. promoter analysis
  4. 7. Jensen et al., Bioinformatics, 2000
  5. 8. DNA structure
  6. 9. genome visualization
  7. 10. Pedersen et al., Journal of Molecular Biology, 2000
  8. 11. microarray normalization
  9. 12. Workman et al., Genome Biology, 2002
  10. 13. protein function prediction
  11. 18. STRING
  12. 20. integrate diverse evidence
  13. 21. functional interactions
  14. 22. Bork et al., Current Opinion in Structural Biology, 2005
  15. 23. 179 proteomes
  16. 24. evolution
  17. 27. statistics
  18. 28. (the original sin)
  19. 29. prokaryotes
  20. 30. genomic context methods
  21. 31. gene fusion
  22. 33. gene neighborhood
  23. 35. phylogenetic profiles
  24. 40. Cell Cellulosomes Cellulose
  25. 41. eukaryotes
  26. 42. integrate diverse datasets
  27. 43. Jensen et al., Drug Discovery Today: Targets, 2004
  28. 44. curated knowledge
  29. 45. MIPS Munich Information center for Protein Sequences
  30. 46. KEGG Kyoto Encyclopedia of Genes and Genomes
  31. 47. STKE Signal Transduction Knowledge Environment
  32. 48. Reactome
  33. 49. literature mining
  34. 50. MEDLINE
  35. 51. SGD Saccharomyces Genome Database
  36. 52. The Interactive Fly
  37. 53. OMIM Online Mendelian Inheritance in Man
  38. 54. co-mentioning
  39. 55. NLP Natural Language Processing
  40. 56. NLP building blocks and example parses
        • Gene and protein names
        • Cue words for entity recognition
        • Verbs for relation extraction
        • [nxgene The GAL4 gene]
        • [nxexpr The expression of [nxgene the cytochrome genes [nxpg CYC1 and CYC7]]] is controlled by [nxpg HAP1]
  41. 58. primary experimental data
  42. 59. microarray expression data
  43. 60. GEO Gene Expression Omnibus
  44. 61. physical protein interactions
  45. 62. BIND Biomolecular Interaction Network Database
  46. 63. MINT Molecular Interactions Database
  47. 64. GRID General Repository for Interaction Datasets
  48. 65. DIP Database of Interacting Proteins
  49. 66. HPRD Human Protein Reference Database
  50. 67. problems
  51. 68. many sources
  52. 69. (different gene identifiers)
  53. 70. many types of evidence
  54. 71. questionable quality
  55. 72. not directly comparable
  56. 73. spread over many species
  57. 74. huge synonyms lists
  58. 75. calculate raw quality scores
  59. 76. calibrate vs. gold standard
  60. 77. KEGG Kyoto Encyclopedia of Genes and Genomes
  61. 78. von Mering et al., Nucleic Acids Research, 2005
  62. 79. transfer based on orthology
  63. 80. combine all evidence
  64. 81. Bork et al., Current Opinion in Structural Biology, 2005
  65. 82. cell cycle
  66. 83. qualitative modeling
  67. 85. Chen et al., Molecular Biology of the Cell, 2004
  68. 86. Chen et al., Molecular Biology of the Cell, 2004
  69. 87. synchronized cell culture
  70. 89. microarray time series
  71. 91. periodically expressed genes
  72. 93. S. cerevisiae
  73. 94. Cho et al.
  74. 95. Spellman et al.
  75. 96. numerous analysis methods
  76. 97. Cho et al.
  77. 98. Spellman et al.
  78. 99. Zhao et al.
  79. 100. Johansson et al.
  80. 101. Luan and Li
  81. 102. Lu et al.
  82. 103. Ahdesmäki et al.
  83. 104. Willbrand et al.
  84. 105. no benchmarking
  85. 106. de Lichtenberg et al., Bioinformatics, 2005
  86. 107. reproducibility
  87. 108. de Lichtenberg et al., Bioinformatics, 2005
  88. 109. regulation vs. periodicity
  89. 110. de Lichtenberg et al., Bioinformatics, 2005
  90. 111. list of 600 periodic genes
  91. 112. S. pombe
  92. 113. several expression studies
  93. 114. reproducibility
  94. 115. Marguerat et al., Yeast, 2006
  95. 116. name inconsistencies
  96. 117. Marguerat et al., Yeast, 2006
  97. 118. different analysis methods
  98. 119. no benchmarking
  99. 120. Marguerat et al., Yeast, 2006
  100. 121. Marguerat et al., Yeast, 2006
  101. 122. too many genes suggested
  102. 123. Marguerat et al., Yeast, 2006
  103. 124. Marguerat et al., Yeast, 2006
  104. 125. averaging better than voting
  105. 126. Marguerat et al., Yeast, 2006
  106. 127. S. cerevisiae
  107. 128. list of 600 periodic genes
  108. 129. protein interaction data
  109. 131. von Mering et al., Nucleic Acids Research, 2005
  110. 132. de Lichtenberg et al., Science, 2005
  111. 133. dynamic proteins
  112. 134. static proteins
  113. 135. de Lichtenberg et al., Science, 2005
  114. 136. reproduces what is known
  115. 137. de Lichtenberg et al., Science, 2005
  116. 138. many detailed predictions
  117. 139. de Lichtenberg et al., Science, 2005
  118. 140. global trends
  119. 141. dynamic proteins
  120. 142. de Lichtenberg et al., Science, 2005
  121. 143. static proteins
  122. 144. de Lichtenberg et al., Science, 2005
  123. 145. just-in-time assembly
  124. 146. de Lichtenberg et al., Science, 2005
  125. 147. de Lichtenberg et al., Science, 2005
  126. 148. coordinated regulation
  127. 149. periodically expressed genes
  128. 150. Cdc28p substrates
  129. 151. PEST degradation signals
  130. 152. the human interactome
  131. 153. yeast two-hybrid
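  132. 154. [Venn diagram; sets: Stelzl et al., Rual et al., small-scale studies; region counts as extracted: 1936, 13, 4, 4, 1385, 65, 18465]
  133. 155. [Venn diagram; sets: Stelzl et al., Rual et al., small-scale studies; region counts as extracted: 32, 0, 3, 4, 18, 4, 23]
  134. 156. [Figure; sets: small-scale studies, Stelzl et al., Rual et al.; counts as extracted: 62, 8, 39 and 852, 17, 473, 432, 69, 260]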
  135. 157. 3.5% and 21% sensitivity
  136. 158. in a couple of years
  137. 159. the human interactome
  138. 160. 100% = 1/5?
  139. 161. the yeast interactome
  140. 162. five years ago
  141. 163. yeast two-hybrid
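  142. 164. [Venn diagram; sets: Uetz et al., Ito et al., small-scale studies; region counts as extracted: 1150, 117, 117, 72, 4053, 118, 4469]
  143. 165. [Venn diagram; sets: Uetz et al., Ito et al., small-scale studies; region counts as extracted: 162, 53, 34, 72, 180, 29, 338]
  144. 166. [Figure; sets: small-scale studies, Uetz et al., Ito et al.; counts as extracted: 511, 189, 616 and 439, 178, 759, 897, 190, 1347]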
  145. 167. 19% and 12% sensitivity
  146. 168. the challenge
  147. 169. how to get from here …
  148. 170. [Venn diagram; sets: Stelzl et al., Rual et al., small-scale studies; region counts as extracted: 1936, 13, 4, 4, 1385, 65, 18465]
  149. 171. … to there …
  150. 172. de Lichtenberg et al., Science, 2005
  151. 173. Acknowledgments
        • The STRING team (EMBL)
            • Christian von Mering
            • Berend Snel
            • Martijn Huynen
            • Sean Hooper
            • Mathilde Foglierini
            • Julien Lagarde
            • Peer Bork
        • Literature mining project (EML Research)
            • Jasmin Saric
            • Rossitza Ouzounova
            • Isabel Rojas
        • Cell cycle studies (CBS)
            • Ulrik de Lichtenberg
            • Thomas Skøt Jensen
            • Søren Brunak
        • S. pombe cell cycle (Sanger)
            • Samuel Marguerat
            • Jürg Bähler
        • Inspiration for presentation
            • Lawrence Lessig
            • Dick Clarence Hardt
            • Anders Gorm Pedersen
  152. 174. Thank you!
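
Editorial sketch for the "combine all evidence" slide (slide 80): the slides do not give the combination rule, so the snippet below only illustrates one common choice, a noisy-OR that treats each evidence channel as independent. The function name `combine_scores`, the channel labels, and the example score values are all hypothetical and are not taken from the talk.

```python
from math import prod


def combine_scores(scores):
    """Combine per-channel confidence scores, each in [0, 1].

    Assumption (not stated on the slides): channels are independent,
    so the combined score is 1 minus the probability that every
    channel is wrong, i.e. 1 - prod(1 - s_i).
    """
    scores = list(scores)
    for s in scores:
        if not 0.0 <= s <= 1.0:
            raise ValueError(f"score out of range [0, 1]: {s}")
    return 1.0 - prod(1.0 - s for s in scores)


# Hypothetical calibrated scores for one protein pair, one per evidence type.
evidence = {
    "gene neighborhood": 0.4,
    "co-mentioning": 0.5,
    "microarray expression": 0.2,
}

combined = combine_scores(evidence.values())
print(f"combined score: {combined:.2f}")
```

Under this rule several weak but independent lines of evidence can add up to a fairly confident link, which is why the raw scores need to be calibrated against a gold standard first (slides 75 and 76); correlated channels would be over-counted by this simple form.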
