Information integration




Lars Juhl Jensen
Part 1
the eukaryotic cell cycle
essential process
grow and divide
one cell
two cells
four phases
G1 phase
growth
S phase
DNA replication
G2 phase
growth
M phase
cell division
regulation
gene expression
phosphorylation
targeted degradation
protein interactions
Example 1
my protein and friends
http://string-db.org
Szklarczyk, Franceschini et al., Nucleic Acids Research, 2011
Part 2
association networks
guilt by association
STRING
>1100 genomes
genomic context
gene fusion
Korbel et al., Nature Biotechnology, 2004
conserved neighborhood
Korbel et al., Nature Biotechnology, 2004
phylogenetic profiles
Korbel et al., Nature Biotechnology, 2004
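The idea behind phylogenetic profiles can be sketched in a few lines: genes that are present and absent in the same set of genomes are candidates for a functional association. The sketch below is only an illustration. The gene names and profiles are invented, and Jaccard similarity is an assumed stand-in measure, not necessarily what STRING uses.

```python
def profile_similarity(a, b):
    """Jaccard similarity of two presence/absence profiles (1 = present, 0 = absent)."""
    if len(a) != len(b):
        raise ValueError("profiles must cover the same genomes")
    both = sum(1 for x, y in zip(a, b) if x and y)
    either = sum(1 for x, y in zip(a, b) if x or y)
    return both / either if either else 0.0

# Hypothetical profiles: one column per genome (toy data, not from STRING).
profiles = {
    "geneA": [1, 1, 0, 1, 0, 1],
    "geneB": [1, 1, 0, 1, 0, 1],
    "geneC": [0, 1, 1, 0, 1, 0],
}

same = profile_similarity(profiles["geneA"], profiles["geneB"])
different = profile_similarity(profiles["geneA"], profiles["geneC"])
```

Here geneA and geneB share an identical profile and score much higher than geneA and geneC, which co-occur in only one genome.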
protein interactions
Jensen & Bork, Science, 2008
genetic interactions
Beyer et al., Nature Reviews Genetics, 2007
gene coexpression
curated knowledge
Letunic & Bork, Trends in Biochemical Sciences, 2008
>10 km
text mining
co-mentioning
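Co-mentioning can be illustrated as plain counting: how many documents mention both names of a pair. This is a minimal sketch under strong assumptions. The sentences are toy text, names are matched as exact whitespace-separated tokens, and no synonym lexicon or scoring is applied, so it is far simpler than a real text-mining pipeline.

```python
from collections import Counter
from itertools import combinations

def comention_counts(documents, names):
    """Count, for each pair of names, the documents that mention both."""
    counts = Counter()
    for text in documents:
        tokens = set(text.split())
        found = sorted(name for name in names if name in tokens)
        for pair in combinations(found, 2):
            counts[pair] += 1
    return counts

# Toy sentences written for this example, standing in for abstracts.
abstracts = [
    "CDC28 binds CLN2 in late G1",
    "CLN2 and CDC28 form a complex",
    "SWI4 activates CLN2 transcription",
]
counts = comention_counts(abstracts, {"CDC28", "CLN2", "SWI4"})
```

Pairs are stored with the names in alphabetical order, so `counts[("CDC28", "CLN2")]` is the lookup key for that pair.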
NLP
Natural Language Processing
Gene and protein names
Cue words for entity recognition
Verbs for relation extraction


[nxgene The GAL4 gene]


[nxexpr The expression of
        [nxgene the cytochrome genes
            [nxpg CYC1 and CYC7]]]
    is controlled by
    [nxpg HAP1]
different sources
different formats
different names
not comparable
variable quality
many parsers
comprehensive lexicon
quality scores
look at the data
von Mering et al., Nucleic Acids Research, 2005
scoring scheme
benchmark
von Mering et al., Nucleic Acids Research, 2005
probabilistic scores
combine scores
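Once every evidence type carries a probabilistic score, the scores can be combined. A common way to do this, assuming the evidence types are independent, is to compute the probability that at least one of them is right. The sketch below shows only that step and leaves out any correction for a prior, so it should be read as a simplification and not as the exact STRING formula.

```python
def combine_scores(scores):
    """Combine independent probabilistic scores as 1 - prod(1 - s)."""
    p_all_wrong = 1.0
    for s in scores:
        if not 0.0 <= s <= 1.0:
            raise ValueError("scores must be probabilities in [0, 1]")
        p_all_wrong *= 1.0 - s
    return 1.0 - p_all_wrong

# Two moderately reliable evidence types support the same association.
combined = combine_scores([0.6, 0.7])
```

The combined score is never lower than the best individual score, which matches the intuition that independent lines of evidence reinforce each other.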
Example 2
evidence filters and viewers
highest confidence only
experiments only
evidence viewers
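The two filters in this example amount to a threshold on a chosen score. The sketch below assumes a hypothetical list of edge records with one score per evidence type. The field names, proteins, and the 0.9 cutoff are illustrative choices, not the STRING data format.

```python
# Hypothetical edge records; field names and scores are made up for illustration.
edges = [
    {"a": "P1", "b": "P2", "experiments": 0.95, "textmining": 0.40, "combined": 0.97},
    {"a": "P1", "b": "P3", "experiments": 0.00, "textmining": 0.92, "combined": 0.92},
    {"a": "P2", "b": "P4", "experiments": 0.30, "textmining": 0.50, "combined": 0.65},
]

def filter_edges(edges, channel="combined", min_score=0.9):
    """Keep edges whose score in the given evidence channel reaches the cutoff."""
    return [e for e in edges if e[channel] >= min_score]

highest_confidence = filter_edges(edges)                       # combined score only
experiments_only = filter_edges(edges, channel="experiments")  # experimental evidence only
```

The second edge survives the confidence filter but not the experiments filter, because its support comes entirely from text mining.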
Part 3
analysis of cell-cycle data
gene expression
cell cultures
synchronization
microarrays
time courses
look at the data
Gauthier et al., Nucleic Acids Research, 2007
scoring scheme
benchmark
time of peak expression
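A crude way to assign a time of peak expression is to take the sampled time point with the highest value and express it as a percentage of the cell cycle, so that experiments with different cycle lengths become comparable. This is a deliberately naive sketch. The time course and the 80-minute period are invented, and a real analysis would fit or smooth the profile instead of taking the raw maximum.

```python
def peak_time_percent(times, values, period):
    """Return the time of the highest value as a percentage of one cell cycle."""
    if not times or len(times) != len(values):
        raise ValueError("times and values must be non-empty and of equal length")
    i = max(range(len(values)), key=values.__getitem__)
    return (times[i] % period) / period * 100

# Hypothetical time course: minutes after release and expression values.
times = [0, 10, 20, 30, 40, 50, 60, 70]
values = [0.1, 0.8, 1.5, 0.7, 0.0, -0.6, -1.2, -0.3]
peak = peak_time_percent(times, values, period=80)
```

Taking the time modulo the period maps peaks observed in a second cycle onto the same scale as peaks in the first.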
protein interactions
temporal network
de Lichtenberg, Jensen et al., Science, 2005
Example 3
a network for my proteins
http://string-db.org
high confidence only
experiments only
network expansion
Part 4
external data
save network
open in Cytoscape
layout
clustering
project data onto network
de Lichtenberg, Jensen et al., Science, 2005
very flexible
lose the STRING interface
payload mechanism
show external data
nodes
edges
hosted on your server
Example 4
my data in STRING
http://cyclebase-string.jensenlab.org
Conclusions
know your question
collect data
look at the data
benchmark
Thank you!
larsjuhljensen