1. Issues for metabolomics and
systems biology
Douglas Kell
School of Chemistry, University of Manchester,
MANCHESTER M60 1QD, U.K.
dbk@manchester.ac.uk
http://dbkgroup.org/
http://www.mib.ac.uk www.mcisb.org
3. Some facts I ‘know’ (i.e. think I can
remember…)
• Epidemiologically, statins enhance
longevity
• Cholesterol is barely a risk factor when
within the normal range of 120-240 mg%
• Statins supposedly act (only) via HMG-
CoA reductase to lower cholesterol
• Actually many have (and from the above
logically must have) off-target effects
4. More ‘facts’
• Although originating as natural products,
many/most statins bear comparatively little
structural relationship to them or to each other
• Are there QSAR-type relations between the
various off-target effects and the drugs that cause
them?
[Structures shown: lovastatin, atorvastatin]
5. The software tool I want would
integrate all of those questions by:
• Finding the facts from the literature (and the
Web) by reading the articles ‘intelligently’
• Displaying and setting out the facts sensibly
• Allowing QSARs to be derived directly from the
papers, as the structures and substructures would
be ‘known’ (or knowable via PubChem,
DrugBank etc.)
• Classifying/clustering the off-target effects and the
papers that described them (via text mining, TM,
and machine learning, ML)
• Without me having to write any actual code
6. Westerhoff & Palsson NBT 22, 1249-52 (2004)
But despite everything, science is in some ways
becoming LESS effective in an applied context
8. Drug Discovery/Development Pipeline
• Multifaceted, complicated, lengthy process
[Figure: pipeline timeline from Idea to Drug, 12-15 years
(axis 0, 5, 10, 15 years). Stages: Discovery, Exploratory
Development, Full Development, with clinical Phases II and
III marked; chemical structures of example Products shown.]
Peter S. Dragovich, Pfizer
10. Issues of attrition
• PK/PD less of an issue in last decade
• Now mostly due to (i) lack of efficacy, (ii) toxicity
• Both problems are underpinned by the fact that
drugs are typically first developed on the basis of
molecular assays before being tested in the intact
system
• These failures turn drug discovery – if it was not
already – into a problem of systems biology
12. Poor correlation between different artificial
membrane (Corti & PAMPA) assays
Corti et al., Eur J Pharm Sci 27, 354-362 (2006)
13. Poor correlation between Caco-2 cells and
artificial membrane (PAMPA) assays
Note axis scales
Balimane et al., AAPS J 8, e1-e13 (2006)
14. Poor relationship between PAMPA
permeability and log Ko/w
Corti et al., Eur J Pharm Sci 27, 354-362 (2006)
15. Poor relationship between Caco-2
permeability and log Ko/w
r² = 0.097
THESE THEORIES OF DRUG UPTAKE
WERE BIOPHYSICAL, ‘LIPID-ONLY’
THEORIES
Corti et al., Eur J Pharm Sci 27, 354-362 (2006)
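An r² like the one quoted on this slide can be computed as the squared Pearson correlation between log Ko/w and the measured permeability. The sketch below is a minimal illustration of that calculation; the function name `r_squared` and the data points are assumptions made up for the example and are not values from Corti et al.

```python
def r_squared(xs, ys):
    """Squared Pearson correlation between two equal-length sequences."""
    n = len(xs)
    if n != len(ys) or n < 2:
        raise ValueError("need two equal-length sequences with at least 2 points")
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    # Sums of squares and cross-products about the means
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    sxx = sum((x - mean_x) ** 2 for x in xs)
    syy = sum((y - mean_y) ** 2 for y in ys)
    if sxx == 0 or syy == 0:
        raise ValueError("one of the sequences has zero variance")
    return (sxy * sxy) / (sxx * syy)


# Illustrative numbers only (hypothetical, NOT taken from the cited paper)
log_kow = [-1.0, 0.0, 1.0, 2.0, 3.0, 4.0]
log_permeability = [-5.2, -6.0, -4.9, -5.8, -5.1, -5.6]
print(round(r_squared(log_kow, log_permeability), 3))
```

An r² near 0.1 means that roughly 10% of the variance in permeability is accounted for by a linear dependence on log Ko/w.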
16. Narcotics (‘general anaesthetics’)
• Potency also correlates with log P (up to a cut-
off) (Meyer & Overton)
• Negligible structure-activity relationships
• It was assumed that they also act by a
‘biophysical’ mechanism by partitioning
‘nonspecifically’ into membrane and e.g.
‘squeezing’ nerve channels
• This too was a ‘lipid-only’ theory
• None of this now stands up
17. Anaesthetic potency does largely correlate with
partitioning into membrane, suggesting (to
many) a ‘lipid-only’ mechanism
P. Seeman, Pharmacol Rev 24, 583-655 (1972)
18. But…narcotics inhibit luciferase, a soluble
protein, with the same potency with which
they anaesthetise animals, over 5 logs!
No lipid involved!
Franks & Lieb, Nature 310, 599-601 (1984)
19. The structural basis is known
Binding of
bromoform to
luciferase
Franks et al, Biophys J 75, 2205-11 (1998)
24. There is a convergence between systems
biology models from whole-genome
reconstruction and the number of
experimental metabolome peaks (ca 3000
for human serum)
25. The human metabolic network (1)
• 8 cellular compartments
• 2,712 compartment-specific metabolites
• ~ 1,500 different chemical entities
• 1,496 genes
• 2,233 metabolic reactions (1,795 unique)
• 1,078 transport reactions (32.6%)
PNAS 104, 1777-1782 (2007)
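The 32.6% figure appears to be the share of transport reactions in the combined total of metabolic and transport reactions; that reading is an assumption, but the arithmetic below is consistent with it. The function name is hypothetical.

```python
def transport_share(metabolic_reactions, transport_reactions):
    """Percentage of all reactions (metabolic + transport) that are transport."""
    total = metabolic_reactions + transport_reactions
    return 100.0 * transport_reactions / total


# Counts as quoted on the slide
metabolic = 2233
transport = 1078
print(metabolic + transport)                          # 3311
print(f"{transport_share(metabolic, transport):.1f}%")  # 32.6%

# Average number of compartments in which each chemical entity appears,
# using the approximate figure of ~1,500 entities from the slide
print(round(2712 / 1500, 2))                          # 1.81
```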
26. The human metabolic network (2)
• Not yet compartmentalised
• 2,823 reactions (incl. 300 ‘orphans’), of which 2,215
have disease associations, plus 1,189 transport
reactions and 457 exchange reactions
• 2,322 genes (1,069 common with Palsson model)
Molecular Systems Biology 3, 135 (2007)
29. BIOCHEMICAL MODEL (assumed to be in SBML)
THERE ARE MANY POSSIBLE THINGS THAT ONE MIGHT DO
WITH THIS REPRESENTATION, AND THESE ACTIONS CAN BE
SEEN AS MODULES
• Literature mining
• Visualise, edit, create
• Layouts and views; SBGN
• Store in dB
• Overlays, dynamics
• Model merging: (not) LEGO blocks
• Compare with other models
• Cheminformatic analyses
• Run, analyse (sensitivities, etc)
• Compare with and fit to real data (parameters and
variables) with constraints
• Integrate various levels
• Store results of manipulations
• How to deal with fitting, including as f(global
parameters like pH)
• Network Motif discovery
• LINK WORKFLOWS: Soaplab, Taverna, Web services, etc.
• Automatic characterisation of parameter space and
constraint checking
• Optimal DoE for Sys Identification, incl identifiability
30. BIOCHEMICAL MODEL (assumed to be in SBML)
• Compare with and fit to real data (parameters and
variables) with constraints
FEBS J 274, 5576-5585
4, 74-97
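Fitting a model to real data "with constraints" can be illustrated, in a deliberately simple form, by a bounded grid search over the parameters of a single rate law. This is only a sketch under stated assumptions: the Michaelis-Menten rate law, the function names, the bounds and the synthetic data are all chosen for illustration, and a real SBML model would normally be fitted with a proper optimiser.

```python
def mm_rate(s, vmax, km):
    """Michaelis-Menten rate for substrate concentration s."""
    return vmax * s / (km + s)


def sum_squared_error(vmax, km, data):
    """Sum of squared residuals between observed and predicted rates."""
    return sum((v - mm_rate(s, vmax, km)) ** 2 for s, v in data)


def fit_bounded(data, vmax_bounds, km_bounds, steps=200):
    """Exhaustive grid search that never leaves the given parameter bounds."""
    best = None
    for i in range(steps + 1):
        vmax = vmax_bounds[0] + (vmax_bounds[1] - vmax_bounds[0]) * i / steps
        for j in range(steps + 1):
            km = km_bounds[0] + (km_bounds[1] - km_bounds[0]) * j / steps
            err = sum_squared_error(vmax, km, data)
            if best is None or err < best[0]:
                best = (err, vmax, km)
    err, vmax, km = best
    return vmax, km, err


# Synthetic, noise-free "measurements" generated from vmax=2.0, km=0.5
# (hypothetical values, used only to show that the search recovers them)
substrate = [0.1, 0.25, 0.5, 1.0, 2.0, 5.0]
data = [(s, mm_rate(s, 2.0, 0.5)) for s in substrate]

vmax_fit, km_fit, err = fit_bounded(data, vmax_bounds=(0.0, 4.0), km_bounds=(0.1, 2.1))
print(vmax_fit, km_fit, err)
```

The bounds play the role of the constraints: any parameter combination outside them is never evaluated, so the fit cannot return a physically implausible value such as a negative Km.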
31. The Data Management Infrastructure of
the Manchester Centre for Integrated
Systems Biology
Norman Paton
University of Manchester
32. Capabilities
• We require software to support:
– Data capture: Pedro.
– Data access: Pierre.
– Integration of data and analyses: Taverna.
34. SYSTEMS BIOLOGY WORKFLOWS
• LITERATURE MINING, CREATE MODEL, ANNOTATE,
VISUALISE, STORE NEW MODEL IN DB
• METABOLIC MODEL IN SBML: COMPARE WITH ‘REAL’ DATA
• METABOLIC MODEL IN SBML: RUN BASE MODEL, SCAN
PARAMETER SPACE, SENSITIVITY ANALYSES, STORE MODEL
IN DB
• COMPARE DIFFERENT MODELS: STORE DIFFERENCES AS
NEW MODEL IN DB
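The "run base model", "scan parameter space" and "sensitivity analyses" steps can be sketched on a toy model. The two-step pathway, its rate laws, the parameter values and the function names below are assumptions made for illustration; they do not come from any model discussed in the talk.

```python
def steady_state_flux(k1, k2, x0=1.0):
    """Steady-state flux of X0 -> S -> sink with v1 = k1*(x0 - s), v2 = k2*s."""
    s = k1 * x0 / (k1 + k2)  # intermediate concentration where v1 == v2
    return k2 * s


def scaled_sensitivity(flux, params, name, h=1e-4):
    """d ln(flux) / d ln(parameter), estimated by a central finite difference."""
    up = dict(params)
    up[name] *= 1 + h
    down = dict(params)
    down[name] *= 1 - h
    return (flux(**up) - flux(**down)) / (2 * h * flux(**params))


# Run the base model (hypothetical parameter values)
base = {"k1": 1.0, "k2": 3.0}
base_flux = steady_state_flux(**base)

# Scan one parameter over a range while holding the other at its base value
scan = [(k1, steady_state_flux(k1, base["k2"])) for k1 in (0.25, 0.5, 1.0, 2.0, 4.0)]

# Sensitivity of the flux to each parameter
sensitivities = {name: scaled_sensitivity(steady_state_flux, base, name) for name in base}

print(base_flux)
print(scan)
print(sensitivities)
```

For this toy pathway the two scaled sensitivities sum to 1, because doubling both rate constants doubles the flux; that gives a quick sanity check on the numerical differentiation.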
35. Scientists
[Figure labels: Decoupled suppliers & consumers;
Collaboration; Knowledge Management; Science]
36. ‘Warehouse’ vs distributed workflows
• Different ‘modules’ developed in different labs can reside
on different computers anywhere, and expose themselves as
Web Services
• Labs can then specialise in what they are best at
• All that is then needed is an environment for enacting
bioinformatic workflows by coupling together these service-
oriented architectures
• One such is Taverna
• This is arguably the best way to combine metabolomic
SBML models with metabolomic data, and is what we are
using at MCISB
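The idea of coupling independently developed modules into a workflow can be sketched without any real workflow engine. In the sketch below each "service" is just a local Python function standing in for a remote Web Service, and `enact` is a hypothetical, minimal enactor; none of this is Taverna's API, and the metabolite values are invented.

```python
from typing import Any, Callable, Dict, List

# A service takes the data produced so far and returns new named outputs
Service = Callable[[Dict[str, Any]], Dict[str, Any]]


def model_service(data: Dict[str, Any]) -> Dict[str, Any]:
    # Stand-in for a lab that serves model predictions (hypothetical values)
    return {"predicted": {"glucose": 5.0, "lactate": 1.2}}


def metabolomics_service(data: Dict[str, Any]) -> Dict[str, Any]:
    # Stand-in for a lab that serves measured concentrations (hypothetical values)
    return {"measured": {"glucose": 4.6, "lactate": 1.5}}


def comparison_service(data: Dict[str, Any]) -> Dict[str, Any]:
    # Compares the outputs of the two upstream services
    predicted, measured = data["predicted"], data["measured"]
    shared = sorted(set(predicted) & set(measured))
    return {"residuals": {m: measured[m] - predicted[m] for m in shared}}


def enact(workflow: List[Service], inputs: Dict[str, Any] = None) -> Dict[str, Any]:
    """Run the services in order, passing accumulated outputs downstream."""
    data = dict(inputs or {})
    for service in workflow:
        data.update(service(data))
    return data


result = enact([model_service, metabolomics_service, comparison_service])
print(result["residuals"])
```

Because each step only depends on named inputs and outputs, any one of the stand-in functions could be replaced by a call to a service hosted in another lab without changing the rest of the workflow.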
37. Overall Architecture
• Workflow Repository
• Model Repository
• Data Integration Using Workflows
• Analysis1 … Analysisn
• Consistent Web Service Interfaces
• Repository1 … Repositoryn
• Experiment1 … Experimentn
• Consistent Web Interfaces
38. The Taverna API consumer along with
libSBML allows many of these
transformations to be performed
Details: http://www.mcisb.org/software/taverna/libsbml/index.html
39. Relating Models to Expression
• Read gene names of enzymes from SBML model
• Query maxd transcriptome database using gene names
• Compute colour for expression readings
• Create new SBML model
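The four steps on this slide can be sketched with the Python standard library alone. Everything specific here is an assumption for illustration: the SBML-like fragment is a toy (not a validated model), a plain dictionary stands in for the maxd transcriptome database, the expression values are invented, and the `colour` annotation element and its namespace are hypothetical rather than any SBML or MCISB convention.

```python
import xml.etree.ElementTree as ET

SBML_NS = "http://www.sbml.org/sbml/level2"
EXPR_NS = "http://example.org/expression"  # hypothetical annotation namespace
ET.register_namespace("", SBML_NS)

# Toy SBML-like fragment; enzymes are species whose name is the gene name
TOY_SBML = """<sbml xmlns="http://www.sbml.org/sbml/level2" level="2" version="1">
  <model id="toy">
    <listOfSpecies>
      <species id="e1" name="HXK1" compartment="c"/>
      <species id="e2" name="PFK1" compartment="c"/>
    </listOfSpecies>
  </model>
</sbml>"""


def read_gene_names(root):
    """Step 1: read gene names of enzymes from the model."""
    return [sp.get("name") for sp in root.iter(f"{{{SBML_NS}}}species")]


def query_expression(gene_names, database):
    """Step 2: look up expression readings (a dict stands in for maxd)."""
    return {g: database[g] for g in gene_names if g in database}


def expression_to_colour(value, low, high):
    """Step 3: map a reading linearly from green (low) to red (high)."""
    t = min(1.0, max(0.0, (value - low) / (high - low)))
    return f"#{round(255 * t):02x}{round(255 * (1 - t)):02x}00"


def colour_model(sbml_text, database, low=0.0, high=10.0):
    """Step 4: create a new model text with a colour annotation per enzyme."""
    root = ET.fromstring(sbml_text)
    readings = query_expression(read_gene_names(root), database)
    for sp in root.iter(f"{{{SBML_NS}}}species"):
        gene = sp.get("name")
        if gene in readings:
            annotation = ET.SubElement(sp, f"{{{SBML_NS}}}annotation")
            colour = ET.SubElement(annotation, f"{{{EXPR_NS}}}colour")
            colour.text = expression_to_colour(readings[gene], low, high)
    return ET.tostring(root, encoding="unicode")


# Invented expression readings, for illustration only
fake_database = {"HXK1": 2.0, "PFK1": 10.0}
new_model = colour_model(TOY_SBML, fake_database)
print(new_model)
```

Writing the colours into a separate annotation element keeps the original model content untouched, so the coloured copy can still be read by tools that ignore annotations they do not recognise.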
41. Potential Solutions
• Semantic annotation
• Chemical and bio-text mining
• RDF annotations – that can also be included
within the SBML
• Integrated reasoning engine
• Allowing literature-based discovery
• But we still lack a proper and useful
(bio)chemical ontology integrating roles,
pathways, diseases, chemical (sub)structures,
targets, etc.
• This last is probably the most damaging lack
and thus most important need