The STRING database
Lars Juhl Jensen
EMBL Heidelberg
data integration
 
functional interactions
 
179 proteomes
Ensembl
SWISS-PROT
genomic context methods
phylogenetic profiles
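A phylogenetic profile records in which genomes a gene is present; genes with similar profiles are candidates for a functional link. The sketch below illustrates the idea with Jaccard similarity, which is an assumption chosen for simplicity and not necessarily the measure STRING uses. The gene names and profiles are made up.

```python
def profile_similarity(a, b):
    """Jaccard similarity between two presence/absence profiles.

    a, b: equal-length sequences of 0/1, one entry per genome.
    """
    both = sum(1 for x, y in zip(a, b) if x and y)
    either = sum(1 for x, y in zip(a, b) if x or y)
    return both / either if either else 0.0


# Hypothetical profiles over five genomes (1 = gene present).
profiles = {
    "geneA": [1, 1, 0, 1, 0],
    "geneB": [1, 1, 0, 1, 0],
    "geneC": [0, 1, 1, 0, 1],
}

print(profile_similarity(profiles["geneA"], profiles["geneB"]))
print(profile_similarity(profiles["geneA"], profiles["geneC"]))
```

Identical profiles score highest; profiles that overlap in only one genome score low.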
 
 
 
 
Figure labels: cell, cellulosomes, cellulose
gene fusion
 
gene neighborhood
 
questionable reliability
raw quality scores
gene neighborhood
sum of intergenic distances
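One possible reading of this raw score, sketched under assumptions: for two genes in the same run of genes, add up the gaps between consecutive genes that separate them, so a smaller sum means a tighter neighborhood. The coordinate layout and the function name `sum_intergenic_distances` are hypothetical; the slide does not specify how STRING computes the sum.

```python
def sum_intergenic_distances(genes, i, j):
    """Sum the gaps between consecutive genes lying between genes i and j.

    genes: list of (start, end) tuples sorted by start coordinate.
    i, j: indexes into that list.
    """
    lo, hi = sorted((i, j))
    total = 0
    for k in range(lo, hi):
        gap = genes[k + 1][0] - genes[k][1]
        total += max(gap, 0)  # overlapping genes contribute zero
    return total


# Hypothetical gene coordinates on one chromosome.
genes = [(100, 400), (420, 900), (950, 1500), (3000, 3600)]

print(sum_intergenic_distances(genes, 0, 2))
print(sum_intergenic_distances(genes, 2, 3))
```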
 
many types of evidence
raw quality scores
not directly comparable
benchmarking
calibrate against KEGG
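Because raw scores from different evidence types are not directly comparable, they can be mapped onto a common scale by benchmarking against a reference set. A minimal sketch, assuming the benchmark is "fraction of pairs in each raw-score bin that share a reference pathway"; the binning approach, the function name `calibrate`, and the data are illustrative assumptions, not STRING's actual procedure.

```python
def calibrate(scored_pairs, same_pathway, bin_edges):
    """Return, per raw-score bin, the fraction of pairs sharing a pathway.

    scored_pairs: list of ((gene1, gene2), raw_score).
    same_pathway: set of pairs annotated to a common reference pathway.
    bin_edges: ascending bin boundaries; bin b covers [edge[b], edge[b+1]).
    """
    counts = [[0, 0] for _ in range(len(bin_edges) - 1)]  # [pairs, hits]
    for pair, raw in scored_pairs:
        for b in range(len(bin_edges) - 1):
            if bin_edges[b] <= raw < bin_edges[b + 1]:
                counts[b][0] += 1
                counts[b][1] += pair in same_pathway
                break
    return [hits / n if n else None for n, hits in counts]


# Hypothetical raw scores and reference annotations.
scored_pairs = [
    (("a", "b"), 10),
    (("a", "c"), 20),
    (("b", "c"), 150),
    (("c", "d"), 180),
]
same_pathway = {("a", "b"), ("a", "c"), ("b", "c")}

print(calibrate(scored_pairs, same_pathway, [0, 100, 200]))
```

The per-bin fractions can then stand in for the raw scores, which puts every evidence type on the same scale.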
 
curated knowledge
KEGG (Kyoto Encyclopedia of Genes and Genomes)
Reactome
MIPS (Munich Information center for Protein Sequences)
STKE (Signal Transduction Knowledge Environment)
primary experimental data
many sources
parsers
co-expression
GEO (Gene Expression Omnibus)
SMD (Stanford Microarray Database)
physical protein interactions
BIND (Biomolecular Interaction Network Database)
MINT (Molecular Interactions Database)
GRID (General Repository for Interaction Datasets)
DIP (Database of Interacting Proteins)
HPRD (Human Protein Reference Database)
literature mining
different gene identifiers
synonyms lists
MEDLINE
SGD (Saccharomyces Genome Database)
The Interactive Fly
OMIM (Online Mendelian Inheritance in Man)
co-mentioning
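Co-mentioning can be illustrated as counting how often two gene names appear in the same abstract. This is a minimal sketch with whitespace tokenization and exact name matching, both simplifying assumptions; a real pipeline would need the synonym lists mentioned above. The abstracts are invented for the example.

```python
from collections import Counter
from itertools import combinations


def co_mention_counts(abstracts, names):
    """Count, for each pair of names, the abstracts mentioning both."""
    counts = Counter()
    for text in abstracts:
        tokens = set(text.split())
        found = sorted(n for n in names if n in tokens)
        counts.update(combinations(found, 2))
    return counts


# Invented abstracts; gene names taken from the yeast example in this deck.
abstracts = [
    "HAP1 controls CYC1 and CYC7",
    "CYC1 expression requires HAP1",
    "GAL4 activates transcription",
]
names = {"HAP1", "CYC1", "CYC7", "GAL4"}

counts = co_mention_counts(abstracts, names)
print(counts[("CYC1", "HAP1")])
```

Pairs are stored with the names in alphabetical order, so each pair has a single key.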
NLP (Natural Language Processing)
Gene and protein names
Cue words for entity recognition
Verbs for relation extraction
[nxgene The GAL4 gene]
[nxexpr The expression of [nxgene the cytochrome genes [nxpg CYC1 and CYC7]]] is controlled by [nxpg HAP1]
 
combine all evidence
spread over many species
transfer by orthology
 
orthologous groups
 
fuzzy orthology
Figure labels: source species, target species, ?
Bayesian scoring scheme
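A common way to combine calibrated scores from several evidence types is to treat each as an independent probability that the link is real, so the combined score is one minus the probability that every source is wrong. The sketch below shows that formula; the independence assumption and the function name `combine_scores` are assumptions for illustration, and the slide does not spell out the exact scheme.

```python
def combine_scores(scores):
    """Combine per-evidence probabilities assuming independence.

    Returns 1 - prod(1 - s) over all scores s in [0, 1].
    """
    remaining = 1.0  # probability that no evidence type is correct
    for s in scores:
        remaining *= 1.0 - s
    return 1.0 - remaining


print(combine_scores([0.5, 0.5]))
```

Under this scheme, adding evidence can only raise the combined score, and a single strong source already yields a high score.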
 
Acknowledgments
The STRING team (EMBL): Christian von Mering, Berend Snel, Martijn Huynen, Sean Hooper, Samuel Chaffron, Julien Lagarde, Mathilde Foglierini, Peer Bork
Literature mining project (EML Research): Jasmin Saric, Rossitza Ouzounova, Isabel Rojas
