Data integration: The STITCH database of protein–small molecule interactions
Lars Juhl Jensen
Kuhn et al., Nucleic Acids Research, 2010
functional associations
protein–small molecule
protein–protein
parts lists
>2.5 million proteins
630 genomes
many databases
different formats
model organism databases
Ensembl
RefSeq
PubChem compounds
>74,000 small molecules
curated knowledge
complexes
pathways
Letunic & Bork, Trends in Biochemical Sciences, 2008
high confidence
many databases
MIPS Munich Information center for Protein Sequences
Gene Ontology
KEGG Kyoto Encyclopedia of Genes and Genomes
MetaCyc
PID NCI-Nature Pathway Interaction Database
Reactome
different formats
different identifiers
partially redundant
interaction data
protein–small molecule
in vitro  binding assays
protein–protein
yeast two-hybrid
affinity purification
fragment complementation
Jensen & Bork, Science, 2008
genetic interactions
Beyer et al., Nature Reviews Genetics, 2007
gene coexpression
 
many databases
BindingDB
CTD Comparative Toxicogenomics Database
DrugBank
GLIDA GPCR-Ligand Database
PDSP Ki Psychoactive Drug Screening Program
PharmGKB Pharmacogenomics Knowledge Base
BIND Biomolecular Interaction Network Database
BioGRID General Repository for Interaction Datasets
DIP Database of Interacting Proteins
IntAct
MINT Molecular Interactions Database
HPRD Human Protein Reference Database
PDB Protein Data Bank
GEO Gene Expression Omnibus
different formats
different identifiers
partially redundant
literature mining
>10 km
human readable
not computer readable
different names
text corpus
MEDLINE
SGD Saccharomyces Genome Database
The Interactive Fly
OMIM Online Mendelian Inheritance in Man
dictionary
co-mentioning
NLP Natural Language Processing
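The dictionary and co-mentioning steps can be illustrated with a minimal sketch. It assumes a corpus of plain-text abstracts and a dictionary that maps names and synonyms to identifiers; the function name, the whitespace tokenizer, and the toy data are illustrative assumptions, not the pipeline actually used for STITCH.

```python
from collections import Counter
from itertools import combinations


def count_comentions(documents, dictionary):
    """Count how often pairs of dictionary entities appear in the same document.

    documents: iterable of strings (for example abstracts).
    dictionary: maps a lowercase name or synonym to a canonical identifier.
    """
    pair_counts = Counter()
    for text in documents:
        # Deliberately naive tokenization; real text needs far more care.
        tokens = set(text.lower().replace(",", " ").replace(".", " ").split())
        # Resolve synonyms so that different names count as one entity.
        entities = sorted({dictionary[t] for t in tokens if t in dictionary})
        for a, b in combinations(entities, 2):
            pair_counts[(a, b)] += 1
    return pair_counts


# Toy dictionary and corpus, for illustration only.
toy_dictionary = {
    "cdc28": "YBR160W",
    "cdk1": "YBR160W",
    "cln2": "YPL256C",
    "aspirin": "CID2244",
}
toy_corpus = [
    "Cdc28 binds Cln2 in G1.",
    "CDK1 and CLN2 are coexpressed.",
    "Aspirin was not tested.",
]
comentions = count_comentions(toy_corpus, toy_dictionary)
```

Because both "Cdc28" and "CDK1" resolve to the same identifier, the two sentences count as two co-mentions of one pair, which is the point of handling different names through a dictionary.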
 
restricted access
genomic context
gene fusion
Korbel et al., Nature Biotechnology, 2004
conserved neighborhood
operons
Korbel et al., Nature Biotechnology, 2004
bidirectional promoters
Korbel et al., Nature Biotechnology, 2004
phylogenetic profiles
Korbel et al., Nature Biotechnology, 2004
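As a rough illustration of phylogenetic profiles: each gene is represented by the set of genomes in which it occurs, and genes with similar presence/absence patterns are candidates for functional association. The Jaccard index used here is an assumption chosen for simplicity; the slides do not state which similarity measure is actually applied, and all names and data are hypothetical.

```python
def profile_similarity(profile_a, profile_b):
    """Jaccard similarity between two presence/absence profiles.

    Each profile is the set of genome identifiers in which the gene is present.
    """
    union = profile_a | profile_b
    if not union:
        return 0.0
    return len(profile_a & profile_b) / len(union)


# Hypothetical profiles over five genomes g1..g5.
profiles = {
    "geneA": {"g1", "g2", "g5"},
    "geneB": {"g1", "g2", "g5"},
    "geneC": {"g3", "g4"},
}

same_pattern = profile_similarity(profiles["geneA"], profiles["geneB"])
disjoint_pattern = profile_similarity(profiles["geneA"], profiles["geneC"])
```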
integration
many data types
not comparable
variable quality
spread over 630 genomes
quality scores
reproducibility
von Mering et al., Nucleic Acids Research, 2005
intergenic distances
Korbel et al., Nature Biotechnology, 2004
benchmarking
calibrate vs. gold standard
von Mering et al., Nucleic Acids Research, 2005
raw quality scores
probabilistic scores
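One way to picture the step from raw quality scores to probabilistic scores is to bin pairs by raw score and measure, per bin, the fraction that also appears in a gold standard. This is a minimal sketch under that assumption; the binning scheme, the function name, and the toy data are hypothetical, and the slides do not specify how calibration is actually done.

```python
def calibrate_by_bins(scored_pairs, gold_standard, bin_edges):
    """Estimate, per raw-score bin, the fraction of pairs found in the gold standard.

    scored_pairs: list of (pair, raw_score) tuples.
    gold_standard: set of pairs treated as true associations.
    bin_edges: ascending boundaries; bin i covers [edges[i], edges[i + 1]).
    Returns (lower_edge, upper_edge, fraction_true), with None for empty bins.
    """
    n_bins = len(bin_edges) - 1
    totals = [0] * n_bins
    hits = [0] * n_bins
    for pair, score in scored_pairs:
        for i in range(n_bins):
            if bin_edges[i] <= score < bin_edges[i + 1]:
                totals[i] += 1
                hits[i] += pair in gold_standard
                break
    return [
        (bin_edges[i], bin_edges[i + 1], hits[i] / totals[i] if totals[i] else None)
        for i in range(n_bins)
    ]


# Toy data, for illustration only.
scored = [
    (("a", "b"), 0.10), (("a", "c"), 0.20),
    (("b", "c"), 0.60), (("c", "d"), 0.70),
    (("d", "e"), 0.90), (("a", "e"), 0.95),
]
gold = {("b", "c"), ("d", "e"), ("a", "e")}
calibration = calibrate_by_bins(scored, gold, [0.0, 0.5, 0.8, 1.0])
```

Note that the bins are half-open, so a raw score exactly equal to the last edge is not counted; a smooth function fitted through the per-bin fractions would avoid such edge effects.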
orthology transfer
von Mering et al., Nucleic Acids Research, 2005
combine all evidence
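Once every evidence type carries a probabilistic score, the scores can be combined. The sketch below assumes the evidence channels are independent, so the combined score is the probability that at least one channel is correct. That independence assumption and the function name are illustrative; the slides do not give the exact combination formula, and the published scheme may include further corrections.

```python
def combine_scores(probabilities):
    """Combine per-channel probabilities, assuming the channels are independent.

    Returns 1 - prod(1 - p_i): the probability that at least one channel
    correctly supports the association.
    """
    not_supported = 1.0
    for p in probabilities:
        if not 0.0 <= p <= 1.0:
            raise ValueError(f"probability out of range: {p}")
        not_supported *= 1.0 - p
    return 1.0 - not_supported


# Hypothetical scores from three evidence channels.
combined = combine_scores([0.9, 0.6, 0.3])
```

Under this scheme, several weak lines of evidence can add up to a confident association, while a single strong channel is never weakened by the others.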
 
Acknowledgments: Michael Kuhn, Monica Campillos, Christian von Mering, Manuel Stark, Samuel Chaffron, Philippe Julien, Tobias Doerks, Jan Korbel, Berend Snel, Martijn Huynen, Peer Bork
Predicting novel targets for existing drugs using side effect information
Lars Juhl Jensen
the problem
new uses for old drugs
drug–drug network
shared target(s)
chemical similarity
Campillos & Kuhn et al., Science, 2008
similar drugs share targets
only trivial predictions
the idea
chemical perturbations
phenotypic readouts
drug treatment
side effects
the implementation
information on side effects
package inserts
Campillos & Kuhn et al., Science, 2008
text mining
side-effect ontology
backtracking
Campillos & Kuhn et al., Science, 2008
side-effect correlations
Campillos & Kuhn et al., Science, 2008
GSC weighting
side-effect frequencies
Campillos & Kuhn et al., Science, 2008
raw similarity score
Campillos & Kuhn et al., Science, 2008
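The idea of weighting side effects by frequency before computing a raw similarity score can be sketched as follows. This is a simplified illustration, not the published formula: it assumes a weight of -log(frequency) per side effect and a weighted Jaccard score, and it leaves out the GSC correlation weighting entirely. All names and data are hypothetical.

```python
import math


def side_effect_weights(drug_side_effects):
    """Weight each side effect by -log of the fraction of drugs showing it.

    Rare side effects get large weights; one shared by every drug gets zero.
    """
    n_drugs = len(drug_side_effects)
    counts = {}
    for effects in drug_side_effects.values():
        for effect in effects:
            counts[effect] = counts.get(effect, 0) + 1
    return {effect: -math.log(c / n_drugs) for effect, c in counts.items()}


def raw_similarity(effects_a, effects_b, weights):
    """Weighted Jaccard similarity between two sets of side effects."""
    shared = sum(weights[e] for e in effects_a & effects_b)
    total = sum(weights[e] for e in effects_a | effects_b)
    return shared / total if total > 0 else 0.0


# Toy data, for illustration only.
drugs = {
    "drugA": {"nausea", "tremor"},
    "drugB": {"nausea", "tremor"},
    "drugC": {"nausea", "rash"},
    "drugD": {"nausea"},
}
weights = side_effect_weights(drugs)
similar = raw_similarity(drugs["drugA"], drugs["drugB"], weights)
only_common = raw_similarity(drugs["drugA"], drugs["drugC"], weights)
```

In the toy data, nausea occurs for every drug and therefore contributes nothing, so two drugs that share only nausea score zero while two drugs that share the rarer tremor score high.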
p-values
Campillos & Kuhn et al., Science, 2008
side-effect similarity
chemical similarity
Campillos & Kuhn et al., Science, 2008
reference set
drug–target pairs
Campillos & Kuhn et al., Science, 2008
drug–drug pairs
score bins
benchmark
Campillos & Kuhn et al., Science, 2008
fit calibration function
Campillos & Kuhn et al., Science, 2008
probabilistic scores
the results
drug–drug network
ATC codes
Campillos & Kuhn et al., Science, 2008
categorization
Campillos & Kuhn et al., Science, 2008
map onto score space
Campillos & Kuhn et al., Science, 2008
the experiments
20 drug–drug relations
in vitro  binding assays
Campillos & Kuhn et al., Science, 2008
Ki <10 µM for 11 of 20
cell assays
Campillos & Kuhn et al., Science, 2008
9 of 9 showed activity
the future
SIDER
integration with STITCH
Acknowledgments: Monica Campillos, Michael Kuhn, Anne-Claude Gavin, Peer Bork

Editor's Notes

  • #63 This is a conservative estimate based only on what is in PubMed. Too much to read! Text mining is used to extract relations. Similar methods are used to mine medical records and link diseases.