Introduction to STRING
Lars Juhl Jensen
EMBL Heidelberg
STRING
integrate diverse evidence
functional interactions
 
hundreds of proteomes
Ensembl
SWISS-PROT
prokaryotes
genomic context methods
gene fusion
 
gene neighborhood
 
phylogenetic profiles
 
 
 
 
Figure labels: Cell, Cellulosomes, Cellulose
eukaryotes
data integration
 
curated knowledge
MIPS Munich Information center for Protein Sequences
Reactome
KEGG Kyoto Encyclopedia of Genes and Genomes
STKE Signal Transduction Knowledge Environment
literature mining
co-mentioning
NLP Natural Language Processing
MEDLINE
SGD Saccharomyces Genome Database
The Interactive Fly
OMIM Online Mendelian Inheritance in Man
primary experimental data
microarray expression data
GEO Gene Expression Omnibus
SMD Stanford Microarray Database
physical protein interactions
BIND Biomolecular Interaction Network Database
MINT Molecular Interactions Database
GRID General Repository for Interaction Datasets
DIP Database of Interacting Proteins
HPRD Human Protein Reference Database
problems
many sources
different gene identifiers
many types of evidence
questionable quality
not directly comparable
spread over many species
parsers
synonym lists
quality scores
benchmarking
orthology
how is it actually done?
gene fusion
Calculate all-against-all pairwise alignments
Find in A genes that match the same gene in B
Exclude overlapping alignments
Calibrate against KEGG maps
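The fusion search can be sketched roughly as follows. This is a minimal illustration, not the STRING implementation: it assumes the alignment hits are already available as tuples, and the function name `fusion_candidates` and the tuple layout are hypothetical.

```python
from collections import defaultdict
from itertools import combinations

def fusion_candidates(hits):
    """Find pairs of genes in species A that align to the same gene in B
    at non-overlapping regions.

    hits: iterable of (gene_a, gene_b, start, end), where start/end give
    the aligned region on gene_b (inclusive). This layout is an assumption.
    """
    by_target = defaultdict(list)
    for gene_a, gene_b, start, end in hits:
        by_target[gene_b].append((gene_a, start, end))

    candidates = set()
    for gene_b, regions in by_target.items():
        for (a1, s1, e1), (a2, s2, e2) in combinations(regions, 2):
            if a1 == a2:
                continue
            # Exclude overlapping alignments: keep the pair only if the two
            # regions on gene_b are disjoint.
            if e1 < s2 or e2 < s1:
                candidates.add((min(a1, a2), max(a1, a2), gene_b))
    return candidates

# Made-up example hits: a1 and a2 cover separate halves of b1, a3 overlaps both.
example_hits = [
    ("a1", "b1", 1, 200),
    ("a2", "b1", 220, 450),
    ("a3", "b1", 150, 300),
]
print(fusion_candidates(example_hits))
```

Only the pair (a1, a2) survives here, because a3 overlaps both other alignments on b1. Calibration against KEGG maps would be a separate step applied to the resulting candidates.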
gene neighborhood
Identify runs of adjacent genes with the same direction
Score each gene pair based on intergenic distances
Calibrate against KEGG maps
Infer associations in other species
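A rough sketch of the first two steps, under stated assumptions: genes are given as `(name, start, end, strand)` tuples on one replicon, a run is broken when the strand changes or the gap exceeds a hypothetical `max_gap` threshold, and the pair score is simply the summed intergenic distance. The slides do not specify the actual scoring function.

```python
def same_strand_runs(genes, max_gap=300):
    """Group genes into runs of adjacent genes on the same strand.

    genes: list of (name, start, end, strand). max_gap is an assumed cutoff.
    """
    runs, current = [], []
    for gene in sorted(genes, key=lambda g: g[1]):
        if current:
            prev = current[-1]
            gap = gene[1] - prev[2]
            if gene[3] != prev[3] or gap > max_gap:
                runs.append(current)
                current = []
        current.append(gene)
    if current:
        runs.append(current)
    return runs

def pair_distances(run):
    """Summed intergenic distance between every pair of genes in one run."""
    distances = {}
    for i in range(len(run)):
        total = 0
        for j in range(i + 1, len(run)):
            total += max(0, run[j][1] - run[j - 1][2])
            distances[(run[i][0], run[j][0])] = total
    return distances

# Made-up coordinates for illustration.
example_genes = [
    ("g1", 0, 900, "+"),
    ("g2", 950, 1800, "+"),
    ("g3", 1820, 2500, "+"),
    ("g4", 2600, 3300, "-"),
    ("g5", 5000, 5600, "-"),
]
runs = same_strand_runs(example_genes)
print([[g[0] for g in run] for run in runs])
print(pair_distances(runs[0]))
```

Smaller distances would then map to higher raw scores before calibration.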
phylogenetic profiles
Align all proteins against all
Calculate best-hit profile
Join similar species by PCA
Calculate PC profile distances
Calibrate against KEGG maps
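The PCA and distance steps might look something like this sketch. It assumes the best-hit profiles are already in a genes-by-species NumPy matrix, uses a plain SVD for the PCA, and uses Euclidean distance between the projected profiles. The number of components, the distance measure, and the function names are assumptions, not details from the slides.

```python
import numpy as np

def pc_profiles(profiles, n_components):
    """Project genes-by-species best-hit profiles onto principal components.

    Correlated (similar) species load on the same components, which is one
    way to read "join similar species by PCA".
    """
    centered = profiles - profiles.mean(axis=0)
    _, _, vt = np.linalg.svd(centered, full_matrices=False)
    return centered @ vt[:n_components].T

def profile_distance(pcs, i, j):
    """Euclidean distance between the PC profiles of genes i and j."""
    return float(np.linalg.norm(pcs[i] - pcs[j]))

# Made-up best-hit scores: rows are genes, columns are species.
example_profiles = np.array([
    [1.0, 0.9, 0.0, 0.1],
    [0.9, 1.0, 0.1, 0.0],
    [0.0, 0.1, 1.0, 0.9],
])
pcs = pc_profiles(example_profiles, n_components=2)
print(profile_distance(pcs, 0, 1), profile_distance(pcs, 0, 2))
```

Genes 0 and 1 occur in the same species and end up close together, while gene 2 has the opposite pattern and lies far from both.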
literature co-occurrence
Associate abstracts with species
Identify gene names in title/abstract
Count (co-)occurrences of genes
Test significance of associations
Calibrate against KEGG maps
Infer associations in other species
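Counting and significance testing can be sketched as below. The slides do not say which test is used; the hypergeometric tail probability here is one common choice and should be read as an assumption, as should the function names and the toy abstracts.

```python
from collections import Counter
from itertools import combinations
from math import comb

def cooccurrence_counts(abstracts):
    """Count how many abstracts mention each gene and each gene pair.

    abstracts: iterable of collections of gene names found in one abstract.
    """
    single, pair = Counter(), Counter()
    for genes in abstracts:
        genes = set(genes)
        single.update(genes)
        pair.update(combinations(sorted(genes), 2))
    return single, pair

def hypergeom_tail(n_total, n_a, n_b, n_ab):
    """P(X >= n_ab): chance of seeing at least n_ab co-mentions if the n_b
    abstracts mentioning B were drawn at random from n_total abstracts,
    n_a of which mention A."""
    denom = comb(n_total, n_b)
    upper = min(n_a, n_b)
    return sum(
        comb(n_a, k) * comb(n_total - n_a, n_b - k)
        for k in range(n_ab, upper + 1)
    ) / denom

# Toy abstracts with already-identified gene names.
example_abstracts = [
    {"CDC28", "CLN2"},
    {"CDC28", "CLN2"},
    {"CDC28", "CLN2", "ACT1"},
    {"ACT1"},
    {"ACT1", "TUB1"},
    {"TUB1"},
]
single, pair = cooccurrence_counts(example_abstracts)
n_ab = pair[("CDC28", "CLN2")]
p = hypergeom_tail(len(example_abstracts), single["CDC28"], single["CLN2"], n_ab)
print(n_ab, p)
```

With only six abstracts the p-value is not meaningful; the point is the shape of the calculation.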
physical interaction data
Make binary representation of complexes
Yeast two-hybrid data sets are inherently binary
Calculate score from number of (co-)occurrences
Calculate score from non-shared partners
Calibrate against KEGG maps
Infer associations in other species
Combine evidence from experiments
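One simple way to make a binary representation of complexes is to expand each purification into all pairs of its members and count how often each pair recurs. The sketch below assumes that all-pairs expansion; the slides do not say which representation or scoring formula is actually used, and `binary_pairs` is a hypothetical name.

```python
from collections import Counter
from itertools import combinations

def binary_pairs(complexes):
    """Expand each complex into all member pairs and count co-occurrences.

    complexes: iterable of lists of protein names, one list per purification.
    """
    counts = Counter()
    for members in complexes:
        counts.update(combinations(sorted(set(members)), 2))
    return counts

# Made-up purifications.
example_complexes = [["A", "B", "C"], ["A", "B"], ["B", "D"]]
print(binary_pairs(example_complexes))
```

Pair (A, B) is seen twice and every other pair once; a raw score could be derived from such counts before calibration.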
calibrate against KEGG
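Calibration can be pictured as benchmarking raw scores against a reference set: sort the pairs by raw score, bin them, and record the fraction of pairs in each bin whose genes share a KEGG map. The sketch below assumes the benchmark is given as `(raw_score, same_map)` tuples restricted to genes that appear in KEGG; the equal-size binning and the function name are assumptions.

```python
def calibration_curve(scored_pairs, bin_size):
    """Return (mean raw score, fraction on the same KEGG map) per bin.

    scored_pairs: list of (raw_score, same_map) with same_map a bool.
    """
    ordered = sorted(scored_pairs, key=lambda p: p[0], reverse=True)
    curve = []
    for i in range(0, len(ordered), bin_size):
        chunk = ordered[i:i + bin_size]
        mean_score = sum(score for score, _ in chunk) / len(chunk)
        fraction = sum(1 for _, same in chunk if same) / len(chunk)
        curve.append((mean_score, fraction))
    return curve

# Made-up benchmark pairs.
example_pairs = [
    (0.9, True), (0.8, True), (0.7, True), (0.6, False),
    (0.3, True), (0.2, False), (0.1, False), (0.05, False),
]
print(calibration_curve(example_pairs, bin_size=4))
```

A smooth function fitted through such points would turn any raw score into an estimated probability, which is what makes different evidence types comparable.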
 
transfer by orthology
 
orthologous groups
 
fuzzy orthology
Figure labels: source species, target species
combine all evidence
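Once every evidence type is expressed as a calibrated probability, a simple way to combine them is to assume the sources are independent and compute the probability that at least one of them is right. The slides do not give the formula, so the sketch below should be read as an assumption; `combine_scores` is a hypothetical name.

```python
from math import prod

def combine_scores(scores):
    """Combine per-evidence probabilities under an independence assumption:
    1 minus the probability that every evidence type is wrong."""
    return 1.0 - prod(1.0 - s for s in scores)

# Two evidence types, each 50% reliable on its own.
print(combine_scores([0.5, 0.5]))  # 0.75
```

The combined score is never lower than the strongest single score, and several weak but independent lines of evidence can add up to a confident association.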
 
Acknowledgments
The STRING team (EMBL)
Christian von Mering
Berend Snel
Martijn Huynen
Sean Hooper
Samuel Chaffron
Julien Lagarde
Mathilde Foglierini
Peer Bork
Literature mining project (EML Research)
Jasmin Saric
Rossitza Ouzounova
Isabel Rojas
Thank you!
