Integration of heterogeneous data

9th Course in Bioinformatics for Molecular Biologists, Bertinoro, Italy, March 22-26, 2009

Transcript

  • 1. Lars Juhl Jensen Integration of heterogeneous data
  • 2. Lars Juhl Jensen Integration of heterogeneous data
  • 3. Lars Juhl Jensen Integration of heterogeneous data
  • 4.  
  • 5.  
  • 6. what went wrong?
  • 7. a good question
  • 8. signaling networks
  • 9. Oda & Kitano, Molecular Systems Biology, 2006
  • 10. long way to go
  • 11. mass spectrometry
  • 12. Linding, Jensen, Ostheimer et al., Cell, 2007
  • 13. phosphorylation sites
  • 14. in vivo
  • 15. kinases are unknown
  • 16. peptide assays
  • 17. Miller, Jensen et al., Science Signaling, 2008
  • 18. sequence specificity
  • 19. kinase-specific
  • 20. in vitro
  • 21. no context
  • 22. what a kinase could do
  • 23. not what it actually does
  • 24. computational methods
  • 25. sequence specificity
  • 26. Miller, Jensen et al., Science Signaling, 2008
  • 27. kinase-specific
  • 28. no context
  • 29. what a kinase could do
  • 30. not what it actually does
  • 31. in vitro
  • 32. in vivo
  • 33. context
  • 34. co-activators
  • 35. scaffolders
  • 36. expression
  • 37. association networks
  • 38. Linding, Jensen, Ostheimer et al., Cell, 2007
  • 39. a good idea
  • 40. Linding, Jensen, Ostheimer et al., Cell, 2007
  • 41. Part I sequence motifs
  • 42. curated motifs
  • 43. PROSITE
  • 44. ELM
  • 45. HPRD
  • 46. regular expressions
  • 47. [ST]P.[KR]
  • 48. no score
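
To illustrate why a regular-expression motif gives "no score", here is a minimal Python sketch that scans a sequence with the pattern from the slide, [ST]P.[KR]. The function name and the example peptide are made up for illustration. Every hit is simply reported as a match, with nothing to say which hits are more likely to be real sites.

```python
import re

# Pattern taken from the slide. The lookahead wrapper lets overlapping
# hits be reported as well.
MOTIF = re.compile(r"(?=([ST]P.[KR]))")

def find_motif_sites(sequence):
    """Return (1-based position, matched peptide) for every motif hit."""
    return [(m.start() + 1, m.group(1)) for m in MOTIF.finditer(sequence)]

# Hypothetical peptide, invented for this example only.
example = "MASPEKLLTPARGSPQQ"
hits = find_motif_sites(example)
print(hits)  # [(3, 'SPEK'), (9, 'TPAR')]
```

The third S-P pair in the example (SPQQ) is rejected outright because the last residue is neither K nor R. A regular expression can only answer yes or no, which is the limitation the following slides address.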
  • 49. Miller, Jensen et al., Science Signaling, 2008
  • 50. insufficient
  • 51. machine learning
  • 52. NetPhosK
  • 53. PredPhospho
  • 54. PHOSITE
  • 55. GPS
  • 56. KinasePhos
  • 57. PPSP
  • 58. GANNPhos
  • 59. PhoScan
  • 60. no regular updates
  • 61. NetPhorest
  • 62. Miller, Jensen et al., Science Signaling, 2008
  • 63. data sources
  • 64. Phospho.ELM
  • 65. Diella et al., Nucleic Acids Res., 2008
  • 66. Diella et al., Nucleic Acids Res., 2008
  • 67. Scansite
  • 68. Obenauer et al., Nucleic Acids Res., 2003
  • 69. Miller, Jensen et al., Science Signaling, 2008
  • 70. common basis
  • 71. Miller, Jensen et al., Science Signaling, 2008
  • 72. automated pipeline
  • 73. compilation of datasets
  • 74. classification vs. prediction
  • 75. Miller, Jensen et al., Science Signaling, 2008
  • 76. homology reduction
  • 77. Miller, Jensen et al., Science Signaling, 2008
  • 78. training and evaluation
  • 79. cross-validation
  • 80. Miller, Jensen et al., Science Signaling, 2008
  • 81. classifier selection
  • 82. Miller, Jensen et al., Science Signaling, 2008
  • 83. motif atlas
  • 84.  
  • 85. 179 kinases
  • 86. 93 SH2 domains
  • 87. 8 PTB domains
  • 88. BRCT domains
  • 89. WW domains
  • 90. 14-3-3 proteins
  • 91. phosphatases
  • 92. model organisms
  • 93. S. cerevisiae
  • 94. D. melanogaster
  • 95. C. elegans
  • 96. biological insights
  • 97. docking domains
  • 98. Miller, Jensen et al., Science Signaling, 2008
  • 99. disease-related kinases
  • 100. Miller, Jensen et al., Science Signaling, 2008
  • 101. predictive power
  • 102. ROC curves
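
As a reminder of what an ROC curve measures, the sketch below builds one from predictor scores and true labels, then integrates it with the trapezoid rule. It is a generic illustration, not the evaluation code behind the slides. The function names and the toy scores are assumptions, and tied scores are not treated specially.

```python
def roc_points(scores, labels):
    """Return (false positive rate, true positive rate) points, sweeping
    the score threshold from high to low. Labels are 1 (positive) or 0."""
    positives = sum(labels)
    negatives = len(labels) - positives
    if positives == 0 or negatives == 0:
        raise ValueError("need at least one positive and one negative example")
    ranked = sorted(zip(scores, labels), key=lambda pair: pair[0], reverse=True)
    tp = fp = 0
    points = [(0.0, 0.0)]
    for _score, label in ranked:
        if label:
            tp += 1
        else:
            fp += 1
        points.append((fp / negatives, tp / positives))
    return points

def area_under_curve(points):
    """Trapezoid-rule area under a list of (x, y) points."""
    area = 0.0
    for (x0, y0), (x1, y1) in zip(points, points[1:]):
        area += (x1 - x0) * (y0 + y1) / 2.0
    return area

# Toy data, invented for illustration: three true sites, three non-sites.
toy_scores = [0.9, 0.8, 0.7, 0.6, 0.4, 0.2]
toy_labels = [1, 1, 0, 1, 0, 0]
toy_auc = area_under_curve(roc_points(toy_scores, toy_labels))
```

For this toy data, 8 of the 9 positive/negative pairs are ranked correctly, so the area is 8/9. A random predictor would give 0.5.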
  • 103. Miller, Jensen et al., Science Signaling, 2008
  • 104. comparison
  • 105. Miller, Jensen et al., Science Signaling, 2008
  • 106. conclusions
  • 107. data collection
  • 108. automation
  • 109. benchmarking
  • 110. homology reduction!
  • 111. Part II association networks
  • 112. STRING
  • 113. Jensen, Kuhn et al., Nucleic Acids Research, 2009
  • 114. functional associations
  • 115. data integration
  • 116. common basis
  • 117. 630 genomes
  • 118. model organism databases
  • 119. Ensembl
  • 120. RefSeq
  • 121. genomic context methods
  • 122. gene fusion
  • 123. Korbel et al., Nature Biotechnology, 2004
  • 124. conserved neighborhood
  • 125. operons
  • 126. Korbel et al., Nature Biotechnology, 2004
  • 127. bidirectional promoters
  • 128. Korbel et al., Nature Biotechnology, 2004
  • 129. phylogenetic profiles
  • 130. Korbel et al., Nature Biotechnology, 2004
  • 131. primary experimental data
  • 132. protein interactions
  • 133. yeast two-hybrid
  • 134. affinity purification
  • 135. fragment complementation
  • 136. Jensen & Bork, Science, 2008
  • 137. genetic interactions
  • 138. Beyer et al., Nature Reviews Genetics, 2007
  • 139. BIND Biomolecular Interaction Network Database
  • 140. BioGRID General Repository for Interaction Datasets
  • 141. DIP Database of Interacting Proteins
  • 142. IntAct
  • 143. MINT Molecular Interactions Database
  • 144. HPRD Human Protein Reference Database
  • 145. PDB Protein Data Bank
  • 146. inferred associations
  • 147. gene coexpression
  • 148.  
  • 149. GEO Gene Expression Omnibus
  • 150. expression compendia
  • 151. curated knowledge
  • 152. complexes
  • 153. MIPS Munich Information center for Protein Sequences
  • 154. Gene Ontology
  • 155. pathways
  • 156. Letunic & Bork, Trends in Biochemical Sciences, 2008
  • 157. KEGG Kyoto Encyclopedia of Genes and Genomes
  • 158. MetaCyc
  • 159. Reactome
  • 160. PID NCI-Nature Pathway Interaction Database
  • 161. literature mining
  • 162. MEDLINE
  • 163. SGD Saccharomyces Genome Database
  • 164. The Interactive Fly
  • 165. OMIM Online Mendelian Inheritance in Man
  • 166. co-mentioning
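
Co-mentioning can be sketched in a few lines: count how many documents mention two names together. The function name and the toy sentences below are invented for illustration. Names are matched as exact whitespace-separated tokens, which is far cruder than a real text-mining pipeline.

```python
from collections import Counter
from itertools import combinations

def comention_counts(documents, names):
    """Count, for each pair of names, the documents that mention both."""
    counts = Counter()
    for text in documents:
        tokens = set(text.split())
        present = sorted(name for name in names if name in tokens)
        # Each unordered pair is counted at most once per document.
        counts.update(combinations(present, 2))
    return counts

# Toy "abstracts", invented for illustration.
toy_documents = [
    "HAP1 controls expression of CYC1 and CYC7",
    "CYC1 and CYC7 are cytochrome genes",
    "GAL4 activates transcription",
]
toy_names = {"HAP1", "CYC1", "CYC7", "GAL4"}
toy_counts = comention_counts(toy_documents, toy_names)
```

Raw counts like these favor frequently mentioned names, which is one reason statistical methods are needed to turn them into usable scores.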
  • 167. statistical methods
  • 168. NLP Natural Language Processing
  • 169.
    • Gene and protein names
    • Cue words for entity recognition
    • Verbs for relation extraction
    • [ nxgene The GAL4 gene ]
    • [ nxexpr The expression of [ nxgene the cytochrome genes [ nxpg CYC1 and CYC7 ]]] is controlled by [ nxpg HAP1 ]
  • 170.  
  • 171. easy in theory …
  • 172. … but not in practice
  • 173. different formats
  • 174. parsers
  • 175. different identifiers
  • 176. thesaurus
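
A thesaurus for reconciling different identifiers can be as simple as a lookup table from every known synonym to one canonical identifier. The sketch below is only an illustration of that idea. The identifiers `prot_A` and `prot_B`, the synonym lists, and the function names are hypothetical, and dropping ambiguous synonyms is one possible policy among several.

```python
def build_thesaurus(entries):
    """Map each lowercased name to its canonical identifier.

    entries: dict of canonical identifier -> list of synonyms.
    Names claimed by more than one identifier are dropped as ambiguous.
    """
    lookup = {}
    ambiguous = set()
    for canonical, synonyms in entries.items():
        for name in [canonical, *synonyms]:
            key = name.lower()
            if key in lookup and lookup[key] != canonical:
                ambiguous.add(key)
            lookup[key] = canonical
    for key in ambiguous:
        del lookup[key]
    return lookup

def normalize(name, lookup):
    """Return the canonical identifier for a name, or None if unknown."""
    return lookup.get(name.lower())

# Hypothetical entries, invented for illustration.
toy_entries = {
    "prot_A": ["CDK1", "CDC2", "p34"],
    "prot_B": ["ATM", "p34"],  # "p34" deliberately clashes with prot_A
}
toy_lookup = build_thesaurus(toy_entries)
```

With these entries, `normalize("cdc2", toy_lookup)` resolves to `prot_A`, while the clashing synonym `p34` resolves to nothing rather than to a guess.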
  • 177. redundant sources
  • 178. book keeping
  • 179. variable quality
  • 180. raw quality scores
  • 181. reproducibility
  • 182. von Mering et al., Nucleic Acids Research, 2005
  • 183. benchmarking
  • 184. von Mering et al., Nucleic Acids Research, 2005
  • 185. spread over 630 genomes
  • 186. transfer by orthology
  • 187. von Mering et al., Nucleic Acids Research, 2005
  • 188. two modes
  • 189. COG mode
  • 190. von Mering et al., Nucleic Acids Research, 2005
  • 191. protein mode
  • 192. von Mering et al., Nucleic Acids Research, 2005
  • 193. combine all evidence
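
One simple way to combine evidence, once every source has been benchmarked onto a common probability-like scale, is to assume the sources are independent and ask how likely it is that at least one of them is right. The sketch below shows that calculation. The independence assumption and the function name are assumptions made here for illustration, and the slides do not spell out the exact formula used.

```python
def combine_scores(scores):
    """Combine per-source confidence scores, assuming independence.

    Each score is read as the probability that the association is real
    according to one evidence type. The combined score is the probability
    that at least one source is correct: 1 - product(1 - s_i).
    """
    remaining = 1.0
    for s in scores:
        if not 0.0 <= s <= 1.0:
            raise ValueError("scores must lie between 0 and 1")
        remaining *= 1.0 - s
    return 1.0 - remaining

# Two moderately confident sources reinforce each other.
combined = combine_scores([0.5, 0.5])
print(combined)  # 0.75
```

Under this scheme, adding evidence can only raise the combined score, which only makes sense if the individual scores are calibrated first.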
  • 194. visualize
  • 195. Frishman et al., Modern Genome Annotation, 2009
  • 196. STITCH
  • 197.  
  • 198. metabolite–enzyme links
  • 199. pathway databases
  • 200. Letunic & Bork, Trends in Biochemical Sciences, 2008
  • 201. drug–target links
  • 202. DrugBank
  • 203. PDSP Ki
  • 204. MATADOR
  • 205. Campillos & Kuhn et al., Science, 2008
  • 206. chemical–chemical links
  • 207. shared targets
  • 208. fingerprint similarity
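
Fingerprint similarity between two chemicals is commonly measured with the Tanimoto coefficient: the number of fingerprint bits set in both compounds divided by the number set in either. The slide does not say which measure is used, so treat this as one plausible choice. The function name and the toy fingerprints are invented for illustration.

```python
def tanimoto(bits_a, bits_b):
    """Tanimoto coefficient between two fingerprints.

    Each fingerprint is given as the set of positions of its "on" bits.
    Two empty fingerprints are defined here to have similarity 0.0.
    """
    union = bits_a | bits_b
    if not union:
        return 0.0
    return len(bits_a & bits_b) / len(union)

# Toy fingerprints, invented for illustration.
compound_x = {1, 2, 3, 4}
compound_y = {3, 4, 5, 6}
similarity = tanimoto(compound_x, compound_y)
```

Here two of the six distinct bits are shared, so the similarity is 1/3. Identical fingerprints score 1.0 and fingerprints with no bits in common score 0.0.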
  • 209. chemical–protein network
  • 210.  
  • 211. conclusions
  • 212. more data is better
  • 213. quality scores
  • 214. benchmarking
  • 215. cross-species integration
  • 216. Part III putting it all together
  • 217. Linding, Jensen, Ostheimer et al., Cell, 2007
  • 218. NetworKIN
  • 219.  
  • 220. benchmarking
  • 221. Linding, Jensen, Ostheimer et al., Cell, 2007
  • 222. 2.5-fold better accuracy
  • 223. context is crucial
  • 224. localization
  • 225. Linding, Jensen, Ostheimer et al., Cell, 2007
  • 226. DNA damage response
  • 227. Linding, Jensen, Ostheimer et al., Cell, 2007
  • 228. Linding, Jensen, Ostheimer et al., Cell, 2007
  • 229. small-scale validation
  • 230. ATM phosphorylates Rad50
  • 231. Linding, Jensen, Ostheimer et al., Cell, 2007
  • 232. Cdk1 phosphorylates 53BP1
  • 233. Linding, Jensen, Ostheimer et al., Cell, 2007
  • 234. high-throughput validation
  • 235. multiple reaction monitoring
  • 236. Linding, Jensen, Ostheimer et al., Cell, 2007
  • 237. systematic validation
  • 238. kinase inhibitor matrix
  • 239. Fedorov et al., PNAS, 2007
  • 240. design optimal experiments
  • 241. integration with literature
  • 242. Reflect
  • 243.  
  • 244.  
  • 245.  
  • 246. conclusions
  • 247. complementary data
  • 248. visualization
  • 249. a good question
  • 250.  
  • 251. Acknowledgments
    • NetworKIN.info
      • Rune Linding
      • Gerard Ostheimer
      • Francesca Diella
      • Karen Colwill
      • Jing Jin
      • Pavel Metalnikov
      • Vivian Nguyen
      • Adrian Pasculescu
      • Jin Gyoon Park
      • Leona D. Samson
      • Rob Russell
      • Peer Bork
      • Michael Yaffe
      • Tony Pawson
    • STITCH.embl.de
      • Michael Kuhn
      • Christian von Mering
      • Monica Campillos
      • Peer Bork
    • NetPhorest.info
      • Martin Lee Miller
      • Francesca Diella
      • Claus Jørgensen
      • Michele Tinti
      • Lei Li
      • Marilyn Hsiung
      • Sirlester A. Parker
      • Jennifer Bordeaux
      • Thomas Sicheritz-Pontén
      • Marina Olhovsky
      • Adrian Pasculescu
      • Jes Alexander
      • Stefan Knapp
      • Nikolaj Blom
      • Peer Bork
      • Shawn Li
      • Gianni Cesareni
      • Tony Pawson
      • Benjamin E. Turk
      • Michael B. Yaffe
      • Søren Brunak
    • STRING.embl.de
      • Christian von Mering
      • Michael Kuhn
      • Manuel Stark
      • Samuel Chaffron
      • Philippe Julien
      • Tobias Doerks
      • Jan Korbel
      • Berend Snel
      • Martijn Huynen
      • Peer Bork
    • Reflect.ws
      • Sean O’Donoghue
      • Evangelos Pafilis
      • Heiko Horn
      • Michael Kuhn
      • Nigel Brown
      • Reinhardt Schneider
  • 252.