Integration of heterogeneous data

9th Course in Bioinformatics for Molecular Biologist, Bertinoro, Italy, March 22-26, 2009


  1. 1. Lars Juhl Jensen Integration of heterogeneous data
  4. 6. what went wrong?
  5. 7. a good question
  6. 8. signaling networks
  7. 9. Oda & Kitano, Molecular Systems Biology , 2006
  8. 10. long way to go
  9. 11. mass spectrometry
  10. 12. Linding, Jensen, Ostheimer et al., Cell , 2007
  11. 13. phosphorylation sites
  12. 14. in vivo
  13. 15. kinases are unknown
  14. 16. peptide assays
  15. 17. Miller, Jensen et al., Science Signaling , 2008
  16. 18. sequence specificity
  17. 19. kinase-specific
  18. 20. in vitro
  19. 21. no context
  20. 22. what a kinase could do
  21. 23. not what it actually does
  22. 24. computational methods
  23. 25. sequence specificity
  24. 26. Miller, Jensen et al., Science Signaling , 2008
  25. 27. kinase-specific
  26. 28. no context
  27. 29. what a kinase could do
  28. 30. not what it actually does
  29. 31. in vitro
  30. 32. in vivo
  31. 33. context
  32. 34. co-activators
  33. 35. scaffolders
  34. 36. expression
  35. 37. association networks
  36. 38. Linding, Jensen, Ostheimer et al., Cell , 2007
  37. 39. a good idea
  38. 40. Linding, Jensen, Ostheimer et al., Cell , 2007
  39. 41. Part I sequence motifs
  40. 42. curated motifs
  41. 43. PROSITE
  42. 44. ELM
  43. 45. HPRD
  44. 46. regular expressions
  45. 47. [ST]P.[KR]
  46. 48. no score
  47. 49. Miller, Jensen et al., Science Signaling , 2008
  48. 50. insufficient
  49. 51. machine learning
  50. 52. NetPhosK
  51. 53. PredPhospho
  52. 54. PHOSITE
  53. 55. GPS
  54. 56. KinasePhos
  55. 57. PPSP
  56. 58. GANNPhos
  57. 59. PhoScan
  58. 60. no regular updates
  59. 61. NetPhorest
  60. 62. Miller, Jensen et al., Science Signaling , 2008
  61. 63. data sources
  62. 64. Phospho.ELM
  63. 65. Diella et al., Nucleic Acids Res. , 2008
  64. 66. Diella et al., Nucleic Acids Res. , 2008
  65. 67. Scansite
  66. 68. Obenauer et al., Nucleic Acids Res. , 2003
  67. 69. Miller, Jensen et al., Science Signaling , 2008
  68. 70. common basis
  69. 71. Miller, Jensen et al., Science Signaling , 2008
  70. 72. automated pipeline
  71. 73. compilation of datasets
  72. 74. classification vs. prediction
  73. 75. Miller, Jensen et al., Science Signaling , 2008
  74. 76. homology reduction
  75. 77. Miller, Jensen et al., Science Signaling , 2008
  76. 78. training and evaluation
  77. 79. cross-validation
  78. 80. Miller, Jensen et al., Science Signaling , 2008
  79. 81. classifier selection
  80. 82. Miller, Jensen et al., Science Signaling , 2008
  81. 83. motif atlas
  82. 85. 179 kinases
  83. 86. 93 SH2 domains
  84. 87. 8 PTB domains
  85. 88. BRCT domains
  86. 89. WW domains
  87. 90. 14-3-3 proteins
  88. 91. phosphatases
  89. 92. model organisms
  90. 93. S. cerevisiae
  91. 94. D. melanogaster
  92. 95. C. elegans
  93. 96. biological insights
  94. 97. docking domains
  95. 98. Miller, Jensen et al., Science Signaling , 2008
  96. 99. disease-related kinases
  97. 100. Miller, Jensen et al., Science Signaling , 2008
  98. 101. predictive power
  99. 102. ROC curves
  100. 103. Miller, Jensen et al., Science Signaling , 2008
  101. 104. comparison
  102. 105. Miller, Jensen et al., Science Signaling , 2008
  103. 106. conclusions
  104. 107. data collection
  105. 108. automation
  106. 109. benchmarking
  107. 110. homology reduction!
  108. 111. Part II association networks
  109. 112. STRING
  110. 113. Jensen, Kuhn et al., Nucleic Acids Research , 2009
  111. 114. functional associations
  112. 115. data integration
  113. 116. common basis
  114. 117. 630 genomes
  115. 118. model organism databases
  116. 119. Ensembl
  117. 120. RefSeq
  118. 121. genomic context methods
  119. 122. gene fusion
  120. 123. Korbel et al., Nature Biotechnology , 2004
  121. 124. conserved neighborhood
  122. 125. operons
  123. 126. Korbel et al., Nature Biotechnology , 2004
  124. 127. bidirectional promoters
  125. 128. Korbel et al., Nature Biotechnology , 2004
  126. 129. phylogenetic profiles
  127. 130. Korbel et al., Nature Biotechnology , 2004
  128. 131. primary experimental data
  129. 132. protein interactions
  130. 133. yeast two-hybrid
  131. 134. affinity purification
  132. 135. fragment complementation
  133. 136. Jensen & Bork, Science , 2008
  134. 137. genetic interactions
  135. 138. Beyer et al., Nature Reviews Genetics , 2007
  136. 139. BIND Biomolecular Interaction Network Database
  137. 140. BioGRID General Repository for Interaction Datasets
  138. 141. DIP Database of Interacting Proteins
  139. 142. IntAct
  140. 143. MINT Molecular Interactions Database
  141. 144. HPRD Human Protein Reference Database
  142. 145. PDB Protein Data Bank
  143. 146. inferred associations
  144. 147. gene coexpression
  145. 149. GEO Gene Expression Omnibus
  146. 150. expression compendia
  147. 151. curated knowledge
  148. 152. complexes
  149. 153. MIPS Munich Information center for Protein Sequences
  150. 154. Gene Ontology
  151. 155. pathways
  152. 156. Letunic & Bork, Trends in Biochemical Sciences , 2008
  153. 157. KEGG Kyoto Encyclopedia of Genes and Genomes
  154. 158. MetaCyc
  155. 159. Reactome
  156. 160. PID NCI-Nature Pathway Interaction Database
  157. 161. literature mining
  158. 162. MEDLINE
  159. 163. SGD Saccharomyces Genome Database
  160. 164. The Interactive Fly
  161. 165. OMIM Online Mendelian Inheritance in Man
  162. 166. co-mentioning
  163. 167. statistical methods
  164. 168. NLP Natural Language Processing
  165. 169. NLP example
       • Gene and protein names
       • Cue words for entity recognition
       • Verbs for relation extraction
       • [nxgene The GAL4 gene]
       • [nxexpr The expression of [nxgene the cytochrome genes [nxpg CYC1 and CYC7]]] is controlled by [nxpg HAP1]
  166. 171. easy in theory …
  167. 172. … but not in practice
  168. 173. different formats
  169. 174. parsers
  170. 175. different identifiers
  171. 176. thesaurus
  172. 177. redundant sources
  173. 178. book keeping
  174. 179. variable quality
  175. 180. raw quality scores
  176. 181. reproducibility
  177. 182. von Mering et al., Nucleic Acids Research , 2005
  178. 183. benchmarking
  179. 184. von Mering et al., Nucleic Acids Research , 2005
  180. 185. spread over 630 genomes
  181. 186. transfer by orthology
  182. 187. von Mering et al., Nucleic Acids Research , 2005
  183. 188. two modes
  184. 189. COG mode
  185. 190. von Mering et al., Nucleic Acids Research , 2005
  186. 191. protein mode
  187. 192. von Mering et al., Nucleic Acids Research , 2005
  188. 193. combine all evidence
  189. 194. visualize
  190. 195. Frishman et al., Modern Genome Annotation , 2009
  191. 196. STITCH
  192. 198. metabolite–enzyme links
  193. 199. pathway databases
  194. 200. Letunic & Bork, Trends in Biochemical Sciences , 2008
  195. 201. drug–target links
  196. 202. DrugBank
  197. 203. PDSP Ki
  198. 204. MATADOR
  199. 205. Campillos & Kuhn et al., Science , 2008
  200. 206. chemical–chemical links
  201. 207. shared targets
  202. 208. fingerprint similarity
  203. 209. chemical–protein network
  204. 211. conclusions
  205. 212. more data is better
  206. 213. quality scores
  207. 214. benchmarking
  208. 215. cross-species integration
  209. 216. Part III putting it all together
  210. 217. Linding, Jensen, Ostheimer et al., Cell , 2007
  211. 218. NetworKIN
  212. 220. benchmarking
  213. 221. Linding, Jensen, Ostheimer et al., Cell , 2007
  214. 222. 2.5-fold better accuracy
  215. 223. context is crucial
  216. 224. localization
  217. 225. Linding, Jensen, Ostheimer et al., Cell , 2007
  218. 226. DNA damage response
  219. 227. Linding, Jensen, Ostheimer et al., Cell , 2007
  220. 228. Linding, Jensen, Ostheimer et al., Cell , 2007
  221. 229. small-scale validation
  222. 230. ATM phosphorylates Rad50
  223. 231. Linding, Jensen, Ostheimer et al., Cell , 2007
  224. 232. Cdk1 phosphorylates 53BP1
  225. 233. Linding, Jensen, Ostheimer et al., Cell , 2007
  226. 234. high-throughput validation
  227. 235. multiple reaction monitoring
  228. 236. Linding, Jensen, Ostheimer et al., Cell , 2007
  229. 237. systematic validation
  230. 238. kinase inhibitor matrix
  231. 239. Fedorov et al., PNAS , 2007
  232. 240. design optimal experiments
  233. 241. integration with literature
  234. 242. Reflect
  235. 246. conclusions
  236. 247. complementary data
  237. 248. visualization
  238. 249. a good question
  239. 251. Acknowledgments
       • NetworKIN.info: Rune Linding, Gerard Ostheimer, Francesca Diella, Karen Colwill, Jing Jin, Pavel Metalnikov, Vivian Nguyen, Adrian Pasculescu, Jin Gyoon Park, Leona D. Samson, Rob Russell, Peer Bork, Michael Yaffe, Tony Pawson
       • STITCH.embl.de: Michael Kuhn, Christian von Mering, Monica Campillos, Peer Bork
       • NetPhorest.info: Martin Lee Miller, Francesca Diella, Claus Jørgensen, Michele Tinti, Lei Li, Marilyn Hsiung, Sirlester A. Parker, Jennifer Bordeaux, Thomas Sicheritz-Pontén, Marina Olhovsky, Adrian Pasculescu, Jes Alexander, Stefan Knapp, Nikolaj Blom, Peer Bork, Shawn Li, Gianni Cesareni, Tony Pawson, Benjamin E. Turk, Michael B. Yaffe, Søren Brunak
       • STRING.embl.de: Christian von Mering, Michael Kuhn, Manuel Stark, Samuel Chaffron, Philippe Julien, Tobias Doerks, Jan Korbel, Berend Snel, Martijn Huynen, Peer Bork
       • Reflect.ws: Sean O’Donoghue, Evangelos Pafilis, Heiko Horn, Michael Kuhn, Nigel Brown, Reinhardt Schneider
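
Part I of the deck lists curated motifs written as regular expressions, gives `[ST]P.[KR]` as an example, and then notes "no score". The sketch below is one way to try that pattern out in Python; the function name `find_motif_sites` and the toy sequence are hypothetical and do not come from the slides. Every hit is reported as a plain match with no strength attached, which is the limitation the slides point to before moving on to machine-learning predictors.

```python
import re

# Motif quoted on the slide: Ser/Thr, then Pro, then any residue, then Lys/Arg.
# Wrapping it in a lookahead lets overlapping occurrences be reported too.
MOTIF = re.compile(r"(?=([ST]P.[KR]))")


def find_motif_sites(sequence):
    """Return (1-based position, matched 4-mer) for each motif occurrence."""
    seq = sequence.upper()
    return [(m.start() + 1, m.group(1)) for m in MOTIF.finditer(seq)]


# Hypothetical toy sequence, not a real protein.
toy_sequence = "MASPRKLLTPEKGSPAR"
for position, hit in find_motif_sites(toy_sequence):
    # A regular expression either matches or it does not: there is no score.
    print(position, hit)
```

Positions are reported 1-based, following the usual convention for residue numbering.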

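Part II ends with "combine all evidence" but the transcript does not say how the combination is done. A minimal sketch, assuming each evidence channel has already been benchmarked into a probability-like score between 0 and 1 and that the channels are treated as independent: the combined score is then the probability that at least one channel is right. Both the independence assumption and the function name `combine_scores` are illustrative choices made here, not something the slides confirm.

```python
from math import prod


def combine_scores(scores):
    """Combine per-channel scores as 1 - product(1 - s), assuming independence."""
    for s in scores:
        if not 0.0 <= s <= 1.0:
            raise ValueError(f"score out of range [0, 1]: {s}")
    # Probability that every channel is wrong, subtracted from 1.
    return 1.0 - prod(1.0 - s for s in scores)


# Hypothetical scores for one protein pair from three evidence channels.
channel_scores = {"coexpression": 0.5, "experiments": 0.5, "textmining": 0.2}
print(round(combine_scores(channel_scores.values()), 3))
```

With this rule, adding a channel can only raise the combined score, which fits the deck's conclusion that more data is better provided each source carries a calibrated quality score.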