Data integration: The STITCH database of protein-small molecule interactions
  • This is a conservative estimate based only on what is in PubMed
    Too much to read!
    Text mining used to extract relations
    Similar methods used to mine medical records and link diseases
  • Transcript

    • 1. Data integration: The STITCH database of protein–small molecule interactions (Lars Juhl Jensen)
    • 2. guilt by association
    • 3. functional associations
    • 4. Kuhn et al., Nucleic Acids Research, 2010
    • 5. parts lists
    • 6. >2.5 million proteins
    • 7. 630 genomes
    • 8. many databases
    • 9. different formats
    • 10. model organism databases
    • 11. Ensembl
    • 12. RefSeq
    • 13. PubChem compounds
    • 14. >74,000 small molecules
    • 15. genomic context
    • 16. gene fusion
    • 17. Korbel et al., Nature Biotechnology, 2004
    • 18. conserved neighborhood
    • 19. operons
    • 20. Korbel et al., Nature Biotechnology, 2004
    • 21. bidirectional promoters
    • 22. Korbel et al., Nature Biotechnology, 2004
    • 23. phylogenetic profiles
    • 24. Korbel et al., Nature Biotechnology, 2004
    • 25. interaction data
    • 26. protein–small molecule
    • 27. in vitro binding assays
    • 28. protein–protein
    • 29. yeast two-hybrid
    • 30. affinity purification
    • 31. fragment complementation
    • 32. Jensen & Bork, Science, 2008
    • 33. genetic interactions
    • 34. Beyer et al., Nature Reviews Genetics, 2007
    • 35. gene coexpression
    • 36. many databases
    • 37. BindingDB
    • 38. CTD Comparative Toxicogenomics Database
    • 39. DrugBank
    • 40. GLIDA GPCR-Ligand Database
    • 41. PDSP Ki Psychoactive Drug Screening Program
    • 42. PharmGKB Pharmacogenomics Knowledge Base
    • 43. BIND Biomolecular Interaction Network Database
    • 44. BioGRID General Repository for Interaction Datasets
    • 45. DIP Database of Interacting Proteins
    • 46. IntAct
    • 47. MINT Molecular Interactions Database
    • 48. HPRD Human Protein Reference Database
    • 49. PDB Protein Data Bank
    • 50. GEO Gene Expression Omnibus
    • 51. different formats
    • 52. different identifiers
    • 53. partially redundant
    • 54. curated knowledge
    • 55. complexes
    • 56. pathways
    • 57. Letunic & Bork, Trends in Biochemical Sciences, 2008
    • 58. high confidence
    • 59. many databases
    • 60. MIPS Munich Information center for Protein Sequences
    • 61. Gene Ontology
    • 62. KEGG Kyoto Encyclopedia of Genes and Genomes
    • 63. MetaCyc
    • 64. PID NCI-Nature Pathway Interaction Database
    • 65. Reactome
    • 66. different formats
    • 67. different identifiers
    • 68. partially redundant
    • 69. text mining
    • 70. >10 km
    • 71. human readable
    • 72. not computer readable
    • 73. different names
    • 74. Reflect
    • 75. dictionary
    • 76. Pafilis, O’Donoghue, Jensen et al., Nature Biotechnology, 2009
    • 77. text corpus
    • 78. MEDLINE
    • 79. SGD Saccharomyces Genome Database
    • 80. The Interactive Fly
    • 81. OMIM Online Mendelian Inheritance in Man
    • 82. co-mentioning
    • 83. NLP Natural Language Processing
    • 84. integration
    • 85. many data types
    • 86. not comparable
    • 87. variable quality
    • 88. spread over 630 genomes
    • 89. quality scores
    • 90. reproducibility
    • 91. von Mering et al., Nucleic Acids Research, 2005
    • 92. intergenic distances
    • 93. Korbel et al., Nature Biotechnology, 2004
    • 94. benchmarking
    • 95. calibrate vs. gold standard
    • 96. von Mering et al., Nucleic Acids Research, 2005
    • 97. raw quality scores
    • 98. probabilistic scores
    • 99. orthology transfer
    • 100. von Mering et al., Nucleic Acids Research, 2005
    • 101. combine all evidence
    • 102. Acknowledgments: Damian Szklarczyk, Andrea Franceschini, Michael Kuhn, Sune Frankild, Heiko Horn, Evangelos Pafilis, Milan Simonovic, Alexander Roth, Pablo Minguez, Tobias Doerks, Jean Muller, Manuel Stark, Samuel Chaffron, Chris Creevey, Philippe Julien, Jan Korbel, Berend Snel, Martijn Huynen, Reinhardt Schneider, Sean O’Donoghue, Christian von Mering, Peer Bork
    • 103. Predicting novel targets for existing drugs using side effect information (Lars Juhl Jensen)
    • 104. the problem
    • 105. new uses for old drugs
    • 106. drug–drug network
    • 107. shared target(s)
    • 108. chemical similarity
    • 109. Campillos & Kuhn et al., Science, 2008
    • 110. Campillos & Kuhn et al., Science, 2008
    • 111. similar drugs share targets
    • 112. only trivial predictions
    • 113. the idea
    • 114. chemical perturbations
    • 115. phenotypic readouts
    • 116. drug treatment
    • 117. side effects
    • 118. the hard work
    • 119. information on side effects
    • 120. no database
    • 121. package inserts
    • 122. Campillos & Kuhn et al., Science, 2008
    • 123. text mining
    • 124. side-effect ontology
    • 125. backtracking
    • 126. Campillos & Kuhn et al., Science, 2008
    • 127. manual validation
    • 128. SIDER: Kuhn et al., Molecular Systems Biology, 2010
    • 129. side-effect correlations
    • 130. Campillos & Kuhn et al., Science, 2008
    • 131. GSC weighting
    • 132. side-effect frequencies
    • 133. Campillos & Kuhn et al., Science, 2008
    • 134. raw similarity score
    • 135. Campillos & Kuhn et al., Science, 2008
    • 136. p-values
    • 137. Campillos & Kuhn et al., Science, 2008
    • 138. side-effect similarity
    • 139. chemical similarity
    • 140. Campillos & Kuhn et al., Science, 2008
    • 141. confidence scores
    • 142. reference set
    • 143. incomplete databases
    • 144. text mining
    • 145. manual validation
    • 146. MATADOR: Günther et al., Nucleic Acids Research, 2008
    • 147. Campillos & Kuhn et al., Science, 2008
    • 148. the results
    • 149. drug–drug network
    • 150. Campillos & Kuhn et al., Science, 2008
    • 151. categorization
    • 152. Campillos & Kuhn et al., Science, 2008
    • 153. 20 drug–drug pairs
    • 154. in vitro binding assays
    • 155. Ki<10 µM for 11 of 20
    • 156. cell assays
    • 157. 9 of 9 showed activity
    • 158. the future
    • 159. link side-effects to targets
    • 160. direct target prediction
    • 161. Acknowledgments: Monica Campillos, Michael Kuhn, Anne-Claude Gavin, Peer Bork
    • 162. larsjuhljensen
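
Slides 98 and 101 of the first talk ("probabilistic scores", "combine all evidence") describe merging calibrated scores from several evidence types into one confidence value. The slides do not give the formula, so the following is only a minimal sketch of one common approach. It assumes that each evidence channel yields an independent probability that the interaction is real, and it leaves out any correction for prior probability. The function name `combine_scores` and the example values are hypothetical.

```python
def combine_scores(scores):
    """Combine per-channel probabilities, assuming the channels are independent.

    The combined score is the probability that at least one channel is
    correct: 1 - prod(1 - s_i). This is an illustrative assumption, not
    necessarily the exact scheme used by the database described in the talk.
    """
    remaining = 1.0  # probability that every channel is wrong
    for s in scores:
        if not 0.0 <= s <= 1.0:
            raise ValueError(f"score out of range [0, 1]: {s}")
        remaining *= 1.0 - s
    return 1.0 - remaining


if __name__ == "__main__":
    # Hypothetical scores for one protein-chemical pair from three channels,
    # e.g. experiments, curated databases, and text mining.
    example = [0.6, 0.4, 0.2]
    print(round(combine_scores(example), 3))
```

Under this assumption, several weak but independent lines of evidence can add up to a higher combined confidence than any single channel provides, and adding a channel never lowers the combined score.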