Network integration of heterogeneous data
Lars Juhl Jensen
EMBL Heidelberg
association networks
 
STRING
 
STITCH
 
373 genomes
 
model organism databases
Ensembl
Genome Reviews
RefSeq
genomic context methods
phylogenetic profiles
 
 
 
 
Figure labels: Cell, Cellulosomes, Cellulose
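As a rough illustration of the phylogenetic profile idea, the sketch below represents each gene by the set of genomes in which it occurs and compares two such profiles. The slides do not say which similarity measure is used, so the Jaccard index here is a stand-in, and the gene and genome names are hypothetical.

```python
def profile_similarity(profile_a, profile_b):
    """Jaccard similarity between two presence/absence profiles.

    Each profile is the set of genomes in which a gene is present.
    Jaccard is an assumption made for illustration only.
    """
    union = profile_a | profile_b
    if not union:
        return 0.0
    return len(profile_a & profile_b) / len(union)


# Hypothetical profiles: genomes in which each gene has a homolog.
profiles = {
    "geneA": {"genome1", "genome2", "genome3"},
    "geneB": {"genome2", "genome3", "genome4"},
    "geneC": {"genome5"},
}

similar = profile_similarity(profiles["geneA"], profiles["geneB"])
unrelated = profile_similarity(profiles["geneA"], profiles["geneC"])
```

Genes that are gained and lost together across genomes get a high score, which is taken as a hint of functional association.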
conserved neighborhood
operons
 
bidirectional promoters
 
gene fusion
 
primary experimental data
expression profiles
 
GEO (Gene Expression Omnibus)
expression compendia
protein interactions
yeast two-hybrid
 
affinity purification
 
genetic interactions
synthetic lethality
 
BioGRID (General Repository for Interaction Datasets)
IntAct
MINT (Molecular Interactions Database)
DIP (Database of Interacting Proteins)
BIND (Biomolecular Interaction Network Database)
HPRD (Human Protein Reference Database)
literature mining
 
co-mentioning
statistical methods
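A minimal sketch of co-mentioning, assuming entity names have already been recognized in each abstract: count how often two names appear in the same document. The input format and the function name `comention_counts` are hypothetical, and a real system would turn the raw counts into a statistical score rather than use them directly.

```python
from collections import Counter
from itertools import combinations


def comention_counts(documents):
    """Count how many documents mention each pair of names together.

    `documents` is an iterable of sets of names, one set per document.
    Pairs are stored with their names in sorted order.
    """
    counts = Counter()
    for names in documents:
        for pair in combinations(sorted(names), 2):
            counts[pair] += 1
    return counts


# Hypothetical tagged abstracts.
abstracts = [
    {"CYC1", "HAP1"},
    {"CYC1", "CYC7", "HAP1"},
    {"CYC7"},
]
counts = comention_counts(abstracts)
```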
NLP (Natural Language Processing)
Gene and protein names
Cue words for entity recognition
Verbs for relation extraction
[nxexpr The expression of [nxgene the cytochrome genes [nxpg CYC1 and CYC7]]] is controlled by [nxpg HAP1]
MEDLINE
SGD (Saccharomyces Genome Database)
The Interactive Fly
OMIM (Online Mendelian Inheritance in Man)
good synonyms list
manual curation
orthographic variation
disambiguation
curated knowledge
complexes
MIPS (Munich Information Center for Protein Sequences)
Gene Ontology
pathways
 
KEGG (Kyoto Encyclopedia of Genes and Genomes)
Reactome
PID (NCI-Nature Pathway Interaction Database)
STKE (Signal Transduction Knowledge Environment)
variable reliability
raw quality scores
conservation
 
 
reproducibility
 
 
not comparable
benchmarking
calibrate vs. gold standard
 
probabilistic scores
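One simple way to picture the calibration step: rank the predicted pairs by raw score, cut the ranking into bins, and measure in each bin the fraction of pairs that the gold standard confirms. The sketch below does only that; the binning scheme, the function name and the data are assumptions, and the slides do not describe how the calibration is actually fitted.

```python
def calibration_table(scored_pairs, bin_size):
    """Map raw scores to the observed fraction of confirmed pairs.

    `scored_pairs` is a list of (raw_score, in_gold_standard) tuples.
    Returns one (mean_raw_score, fraction_confirmed) row per bin,
    ordered from the highest raw scores to the lowest.
    """
    ranked = sorted(scored_pairs, key=lambda pair: pair[0], reverse=True)
    table = []
    for start in range(0, len(ranked), bin_size):
        chunk = ranked[start:start + bin_size]
        mean_score = sum(score for score, _ in chunk) / len(chunk)
        confirmed = sum(1 for _, ok in chunk if ok) / len(chunk)
        table.append((mean_score, confirmed))
    return table


# Hypothetical raw scores with gold-standard labels.
pairs = [(0.9, True), (0.8, True), (0.4, True),
         (0.3, False), (0.2, False), (0.1, False)]
table = calibration_table(pairs, bin_size=2)
```

Because every evidence type is mapped onto the same scale this way, scores from different sources become comparable.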
combine all evidence
P = 1 - (1 - P_1)(1 - P_2)(1 - P_3) …
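The combination rule above can be sketched in a few lines. It treats the evidence types as independent, which is what the formula implies; the function name and the example values are hypothetical.

```python
def combine_scores(probabilities):
    """Combine independent evidence: P = 1 - product of (1 - P_i)."""
    remaining = 1.0  # probability that no evidence type is correct
    for p in probabilities:
        if not 0.0 <= p <= 1.0:
            raise ValueError(f"not a probability: {p}")
        remaining *= 1.0 - p
    return 1.0 - remaining


# Hypothetical scores from three evidence types for one protein pair.
combined = combine_scores([0.5, 0.5, 0.2])
```

The combined score is never lower than the strongest single piece of evidence, and several weak signals can add up to a confident association.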
spread over many species
transfer by orthology
 
two modes
COG mode
 
 
protein mode
 
 
signaling network
NetworKIN
 
NetPhorest
 
phosphoproteomics
mass spectrometry
 
in vivo phosphosites
kinases are unknown
computational methods
sequence motifs
 
kinase families
overprediction
context
localization
expression
co-activators
scaffolders
association networks
 
the idea
 
NetworKIN
coverage
69 kinases
 
benchmarking
 
small-scale validation
ATM phosphorylates Rad50
 
Cdk1 phosphorylates 53BP1
 
high-throughput validation
multiple reaction monitoring
 
the future
more sequence motifs
NetPhorest
data organization
 
selection
 
benchmarking
 
179 kinases
89 SH2 domains
8 PTB domains
upstream signaling
downstream signaling
signaling pathways
Acknowledgments
STRING & STITCH: Christian von Mering, Michael Kuhn, Manuel Stark, Samuel Chaffron, Philippe Julien, Tobias Doerks, Jan Korbel, Berend Snel, Martijn Huynen, Peer Bork
Literature mining: Evangelos Pafilis, Jasmin Saric, Rossitza Ouzounova, Sean O’Donoghue, Isabel Rojas
NetworKIN & NetPhorest: Rune Linding, Martin Lee Miller, Gerard Ostheimer, Francesca Diella, Karen Colwill, Jing Jin, Pavel Metalnikov, Vivian Nguyen, Adrian Pasculescu, Jin Gyoon Park, Leona D. Samson, Nikolaj Blom, Rob Russell, Peer Bork, Søren Brunak, Michael Yaffe, Tony Pawson
http://larsjuhljensen.wordpress.com
