Data integration: The STITCH database of protein–small molecule interactions
Lars Juhl Jensen
Kuhn et al., Nucleic Acids Research, 2010
functional associations
protein–small molecule
protein–protein
parts lists
>2.5 million proteins
630 genomes
many databases
different formats
model organism databases
Ensembl
RefSeq
PubChem compounds
>74,000 small molecules
curated knowledge
complexes
pathways
Letunic & Bork, Trends in Biochemical Sciences, 2008
high confidence
many databases
MIPS Munich Information center for Protein Sequences
Gene Ontology
KEGG Kyoto Encyclopedia of Genes and Genomes
MetaCyc
PID NCI-Nature Pathway Interaction Database
Reactome
different formats
different identifiers
partially redundant
interaction data
protein–small molecule
in vitro binding assays
protein–protein
yeast two-hybrid
affinity purification
fragment complementation
Jensen & Bork, Science, 2008
genetic interactions
Beyer et al., Nature Reviews Genetics, 2007
gene coexpression
 
many databases
BindingDB
CTD Comparative Toxicogenomics Database
DrugBank
GLIDA GPCR-Ligand Database
PDSP Ki Psychoactive Drug Screening Program
PharmGKB Pharmacogenomics Knowledge Base
BIND Biomolecular Interaction Network Database
BioGRID General Repository for Interaction Datasets
DIP Database of Interacting Proteins
IntAct
MINT Molecular Interactions Database
HPRD Human Protein Reference Database
PDB Protein Data Bank
GEO Gene Expression Omnibus
different formats
different identifiers
partially redundant
literature mining
>10 km
human readable
not computer readable
different names
text corpus
MEDLINE
SGD Saccharomyces Genome Database
The Interactive Fly
OMIM Online Mendelian Inheritance in Man
dictionary
co-mentioning
NLP Natural Language Processing
 
restricted access
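
The dictionary and co-mentioning steps listed above can be sketched roughly as follows. This is a minimal illustration, not the STITCH pipeline: the toy dictionary, the example abstracts, and the function names `tag_entities` and `count_comentions` are all made up for this example, and a real system handles synonyms, ambiguous names, and tokenization far more carefully.

```python
from collections import Counter
from itertools import combinations
import re

# Hypothetical dictionary mapping names (including synonyms) to identifiers.
DICTIONARY = {
    "aspirin": "CHEM:aspirin",
    "acetylsalicylic acid": "CHEM:aspirin",
    "cox-1": "PROT:PTGS1",
    "ptgs1": "PROT:PTGS1",
    "cox-2": "PROT:PTGS2",
}

def tag_entities(text, dictionary=DICTIONARY):
    """Return the set of entity identifiers whose names occur in the text."""
    lowered = text.lower()
    found = set()
    for name, entity in dictionary.items():
        # Require non-alphanumeric boundaries so "cox-1" does not match "cox-10".
        pattern = r"(?<![a-z0-9])" + re.escape(name) + r"(?![a-z0-9])"
        if re.search(pattern, lowered):
            found.add(entity)
    return found

def count_comentions(documents):
    """Count, for each entity pair, the number of documents mentioning both."""
    counts = Counter()
    for doc in documents:
        entities = sorted(tag_entities(doc))
        for pair in combinations(entities, 2):
            counts[pair] += 1
    return counts

# Toy corpus standing in for MEDLINE abstracts.
abstracts = [
    "Aspirin inhibits COX-1 and COX-2.",
    "Acetylsalicylic acid acetylates PTGS1.",
    "COX-2 expression is induced by inflammation.",
]
comentions = count_comentions(abstracts)
```

Because both synonyms resolve to the same identifier, the aspirin and PTGS1 pair is counted in the first two abstracts even though the names used differ. Raw counts like these would still need to be turned into a score, which this sketch does not attempt.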
genomic context
gene fusion
Korbel et al., Nature Biotechnology, 2004
conserved neighborhood
operons
Korbel et al., Nature Biotechnology, 2004
bidirectional promoters
Korbel et al., Nature Biotechnology, 2004
phylogenetic profiles
Korbel et al., Nature Biotechnology, 2004
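
As an illustration of the phylogenetic-profile idea, the sketch below compares the sets of genomes in which two gene families occur. It is only a toy under stated assumptions: the Jaccard index is swapped in as a simple similarity measure, and the profiles, genome labels, and the function name `profile_similarity` are hypothetical; the scoring used in the cited work is not reproduced here.

```python
def profile_similarity(profile_a, profile_b):
    """Jaccard similarity of two presence/absence profiles.

    Each profile is the set of genomes in which the gene family is present.
    """
    a, b = set(profile_a), set(profile_b)
    union = a | b
    if not union:
        return 0.0  # two empty profiles carry no evidence
    return len(a & b) / len(union)

# Hypothetical profiles: genomes in which each gene family has a member.
profiles = {
    "geneA": {"g1", "g2", "g5", "g7"},
    "geneB": {"g1", "g2", "g5", "g7"},
    "geneC": {"g3", "g4"},
}

sim_ab = profile_similarity(profiles["geneA"], profiles["geneB"])
sim_ac = profile_similarity(profiles["geneA"], profiles["geneC"])
```

Genes that are gained and lost together across genomes (geneA and geneB here) get a high similarity, which is the signal taken as a hint of functional association.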
integration
many data types
not comparable
variable quality
spread over 630 genomes
quality scores
reproducibility
von Mering et al., Nucleic Acids Research, 2005
intergenic distances
Korbel et al., Nature Biotechnology, 2004
benchmarking
calibrate vs. gold standard
von Mering et al., Nucleic Acids Research, 2005
raw quality scores
probabilistic scores
orthology transfer
von Mering et al., Nucleic Acids Research, 2005
combine all evidence
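
Once every evidence type has been calibrated to a probabilistic score, one common way to combine them is to treat the channels as independent and compute the probability that at least one of them is right. The sketch below shows only that step; the function name `combine_scores` and the example values are assumptions for illustration, and any correction for a prior probability is deliberately left out.

```python
import math

def combine_scores(scores):
    """Combine independent probabilistic scores as 1 - prod(1 - s_i)."""
    for s in scores:
        if not 0.0 <= s <= 1.0:
            raise ValueError(f"score out of range [0, 1]: {s}")
    return 1.0 - math.prod(1.0 - s for s in scores)

# Hypothetical per-channel scores for one protein-chemical pair.
combined = combine_scores([0.5, 0.5])
```

Two moderately confident channels reinforce each other, while an empty list of evidence yields a combined score of zero.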
 
Acknowledgments: Michael Kuhn, Monica Campillos, Christian von Mering, Manuel Stark, Samuel Chaffron, Philippe Julien, Tobias Doerks, Jan Korbel, Berend Snel, Martijn Huynen, Peer Bork
Predicting novel targets for existing drugs using side effect information
Lars Juhl Jensen
the problem
new uses for old drugs
drug–drug network
shared target(s)
chemical similarity
Campillos & Kuhn et al., Science, 2008
Campillos & Kuhn et al., Science, 2008
similar drugs share targets
only trivial predictions
the idea
chemical perturbations
phenotypic readouts
drug treatment
side effects
the implementation
information on side effects
package inserts
Campillos & Kuhn et al., Science, 2008
text mining
side-effect ontology
backtracking
Campillos & Kuhn et al., Science, 2008
side-effect correlations
Campillos & Kuhn et al., Science, 2008
GSC weighting
side-effect frequencies
Campillos & Kuhn et al., Science, 2008
raw similarity score
Campillos & Kuhn et al., Science, 2008
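
To illustrate why side-effect frequencies matter for a raw similarity score, the sketch below weights each side effect by how rare it is, so that sharing an unusual side effect counts for more than sharing a ubiquitous one. This is a simplified stand-in, not the published scoring scheme: the weighted Jaccard form, the negative-log weights, the drug names, and the function names are all assumptions for this example, and the GSC weighting for correlated side effects is omitted entirely.

```python
import math

def rarity_weights(drug_side_effects):
    """Weight each side effect by -log of the fraction of drugs that show it."""
    n_drugs = len(drug_side_effects)
    counts = {}
    for effects in drug_side_effects.values():
        for effect in set(effects):
            counts[effect] = counts.get(effect, 0) + 1
    return {effect: -math.log(c / n_drugs) for effect, c in counts.items()}

def weighted_similarity(effects_a, effects_b, weights):
    """Weighted Jaccard similarity between two side-effect sets."""
    a, b = set(effects_a), set(effects_b)
    shared = sum(weights[e] for e in a & b)
    total = sum(weights[e] for e in a | b)
    return shared / total if total > 0 else 0.0

# Hypothetical side-effect annotations, as if mined from package inserts.
drugs = {
    "drugA": {"nausea", "headache", "tinnitus"},
    "drugB": {"nausea", "tinnitus"},
    "drugC": {"nausea", "headache"},
    "drugD": {"nausea"},
}
weights = rarity_weights(drugs)
sim_ab = weighted_similarity(drugs["drugA"], drugs["drugB"], weights)
sim_bc = weighted_similarity(drugs["drugB"], drugs["drugC"], weights)
```

In this toy data every drug causes nausea, so it gets zero weight: drugB and drugC share only nausea and end up with no similarity, whereas drugA and drugB share the rarer tinnitus.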
p-values
Campillos & Kuhn et al., Science, 2008
side-effect similarity
chemical similarity
Campillos & Kuhn et al., Science, 2008
reference set
drug–target pairs
Campillos & Kuhn et al., Science, 2008
drug–drug pairs
score bins
benchmark
Campillos & Kuhn et al., Science, 2008
fit calibration function
Campillos & Kuhn et al., Science, 2008
probabilistic scores
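
The benchmarking steps above (bin drug–drug pairs by score, check each bin against the reference set, then map raw scores to probabilities) can be sketched as follows. This is a rough illustration under stated assumptions: the toy pairs and bin edges are invented, the function names are hypothetical, and piecewise-linear interpolation between bin centers is swapped in for the fitted calibration function mentioned on the slide.

```python
from bisect import bisect_right

def bin_precision(pairs, edges):
    """Fraction of reference-set positives per score bin.

    pairs: iterable of (raw_score, shares_target) tuples.
    edges: sorted bin boundaries; scores above the top edge are clamped
           into the last bin, scores below the first edge are skipped.
    Returns [(bin_center, fraction_positive), ...] for non-empty bins.
    """
    n_bins = len(edges) - 1
    totals = [0] * n_bins
    hits = [0] * n_bins
    for score, positive in pairs:
        i = min(bisect_right(edges, score) - 1, n_bins - 1)
        if i < 0:
            continue
        totals[i] += 1
        hits[i] += int(positive)
    return [((edges[i] + edges[i + 1]) / 2, hits[i] / totals[i])
            for i in range(n_bins) if totals[i]]

def calibrate(raw_score, curve):
    """Map a raw score to a probability by interpolating along the curve."""
    xs = [x for x, _ in curve]
    if raw_score <= xs[0]:
        return curve[0][1]
    if raw_score >= xs[-1]:
        return curve[-1][1]
    j = bisect_right(xs, raw_score)
    (x0, p0), (x1, p1) = curve[j - 1], curve[j]
    return p0 + (p1 - p0) * (raw_score - x0) / (x1 - x0)

# Hypothetical benchmark: (raw similarity, pair shares a known target?).
pairs = [(0.1, False), (0.2, False), (0.3, True), (0.4, False),
         (0.6, True), (0.7, True), (0.8, False), (0.9, True)]
curve = bin_precision(pairs, edges=[0.0, 0.5, 1.0])
prob = calibrate(0.5, curve)
```

With real data one would use many more bins and a smooth fitted function, but the principle is the same: the probabilistic score of a pair is the fraction of reference-set positives observed among pairs with similar raw scores.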
the results
drug–drug network
ATC codes
Campillos & Kuhn et al., Science, 2008
categorization
Campillos & Kuhn et al., Science, 2008
Campillos & Kuhn et al., Science, 2008
Campillos & Kuhn et al., Science, 2008
map onto score space
Campillos & Kuhn et al., Science, 2008
the experiments
20 drug–drug relations
in vitro binding assays
Campillos & Kuhn et al., Science, 2008
Campillos & Kuhn et al., Science, 2008
Campillos & Kuhn et al., Science, 2008
Ki < 10 µM for 11 of 20
cell assays
Campillos & Kuhn et al., Science, 2008
9 of 9 showed activity
the future
SIDER
integration with STITCH
Acknowledgments: Monica Campillos, Michael Kuhn, Anne-Claude Gavin, Peer Bork
larsjuhljensen
Chemoinformatics Course, Technical University of Denmark, Lyngby, Denmark, November 19, 2009.

  • This is a conservative estimate based only on what is in PubMed. Too much to read! Text mining is used to extract relations. Similar methods are used to mine medical records and link diseases.