Network biology
Large-scale data and text mining
Lars Juhl Jensen
guilt by association
protein networks
STRING
computational predictions
gene fusion
Korbel et al., Nature Biotechnology, 2004
gene neighborhood
Korbel et al., Nature Biotechnology, 2004
phylogenetic profiles
Korbel et al., Nature Biotechnology, 2004
experimental data
gene coexpression
protein interactions
Jensen & Bork, Science, 2008
curated knowledge
complexes
pathways
Letunic & Bork, Trends in Biochemical Sciences, 2008
many databases
different formats
different identifiers
variable quality
not comparable
not same species
hard work
quality scores
von Mering et al., Nucleic Acids Research, 2005
calibrate vs. gold standard
von Mering et al., Nucleic Acids Research, 2005
homology-based transfer
Franceschini et al., Nucleic Acids Research, 2013
missing most of the data
text mining
>10 km
too much to read
computer
as smart as a dog
teach it specific tricks
named entity recognition
comprehensive lexicon
CDC2
cyclin dependent kinase 1
expansion rules
hCdc2
CDC2
flexible matching
cyclin-dependent kinase 1
cyclin dependent kinase 1
“black list”
SDS
augmented browsing
Reflect
browser add-on
real-time text mining
Pafilis, O’Donoghue, Jensen et al., Nature Biotechnology, 2009
O’Donoghue et al., Journal of Web Semantics, 2010
information extraction
co-mentioning
within documents
within paragraphs
within sentences
text corpus
~22 million abstracts
no access
millions of full-text articles
localization and disease
general approach
COMPARTMENTS
TISSUES
DISEASES
curated knowledge
experimental data
text mining
computational predictions
common identifiers
quality scores
visualization
compartments.jensenlab.org
tissues.jensenlab.org
dissemination
web interfaces
web services
diseases.jensenlab.org
bulk download
Acknowledgments
STRING
Christian von
Mering
Damian
Szklarczyk
Michael Kuhn
Manuel Stark
Samuel Chaffron
Chris Creevey
Jean Muller
Tobias Doerks
Philippe Julien
Alexander Roth
Milan Simonovic
Jan Korbel
Berend Snel
Martijn Huynen
Peer Bork
Text
mining
Sune Frankild
Evangelos Pafilis
Kalliopi Tsafou
Alberto Santos
Janos Binder
Heiko Horn
Michael Kuhn
Nigel Brown
Reinhardt Schneider
Sean O’ Donoghue

Network biology: Large-scale data and text mining