Systems biology in polypharmacology:
predicting and explaining off-target
effects
Bourne lab at UCSD
Under the supervision of Pr. Bourne
Under the direction of Pr. Bart Deplancke
Andrei Kucharavy, EPFL SV 2013, Computational Biology minor
Problem
Image courtesy of Scannell et al. 2012 : Diagnosing the decline in pharmaceutical R&D efficiency. Nature Reviews. Drug Discovery 11, 191–200
Pharma Big 5 drug design expenditures as of 2012 (Matthew Herper @ Forbes):
AstraZeneca         11.8 b$/drug
GlaxoSmithKline      8.2 b$/drug
Sanofi               7.9 b$/drug
Roche Holding AG     7.8 b$/drug
Pfizer Inc           7.7 b$/drug
Problem
95% of the remaining leads fail here
>95% of leads fail here
Image Courtesy of Alzheimer’s Drug Discovery Foundation
One disease – one gene – one drug
● Step 1: find a gene relevant to a disease
● Step 2: design small molecule inhibitor for it
● Step 3: test it on cellular and animal models
● Step 4: discover secondary effects or absence of
therapeutic effect
● Step 5: modify lead to control toxicity
● Repeat steps 3-5 until no more funds available
● If you are lucky, secondary effects are minor to absent
● get more funds and move to human trials.
● Pay attention to unexpected sec. effects
● Pay attention to absence of therapeutic effect in humans
Unexpected pharmacological effects
Absence of therapeutic effect:
– Main cause of rational drug design failure in the 1990s
– Has been overcome with a better understanding of biology
Secondary effects:
– Cyt-c: well understood and controlled
– Unspecific binding:
Very frequent
Hard to predict
Hard to interpret
Polypharmacology
● Specific agonist / antagonist designs are rare:
● protein site similarity
● catalytic sites within complexes
● Some drugs owe their pharmacological action to their lack of specificity:
● Entacapone
● Ibogaine
● Chlorpromazine
● Kanamycin
Polypharmacology:
– Use computational methods to predict all the targets a small molecule is likely to perturb
– Use systems biology to predict the consequences of such perturbation
● Secondary effects
● Unexpected therapeutic effect (repositioning)
● Unexpected absence of therapeutic effect (animal model – human difference)
Scope of master project
● Prediction of perturbed
targets set:
– Drugdesigntech since 2009
– Bourne lab since 2007-2008
● Analysis and interpretation of the perturbed targets set is still largely manual. The goal of this master project is to reduce that manual work.
Image courtesy of Xie et al. (2011). Drug discovery using chemical systems
biology: weak inhibition of multiple kinases may contribute to the anti-cancer
effect of nelfinavir. PLoS Computational Biology
Master project environment
– EPFL engineering internship
– Tool for integrative
bioinformatics platform
– Used for biotech consulting
(polypharmacological effect
prediction)
– > 1000 drugs and ~1300 human
proteins
The Bourne Lab
– RCSB PDB, Supertarget, IEDB, BioLit
– Reliable pipeline for drug off-target
effect prediction (4530 protein
models, 140 approved drugs)
– 7 publications in polypharmacology
Polypharmacological action
mechanisms recovery
● Source:
– List of proteins perturbed by a drug
● Wanted:
– Mechanisms of unexpected pharmacological action,
understandable for a biologist
● Pathway, biological entities, mechanism names
● Ordered by relevance
– Unexpected pharmacological action mechanism
model, usable for prediction on new drugs
Rigid structure of interactions = Interactome
Knowledge access structure = GO + pathway names
Global idea
Global idea
Platelet activation
Immune response onset
Th17 activation
Polypharmacological
effect model suited for
prediction on new
drugs
Polypharmacological
effect mechanism
understandable for
biology expert
Devil is in the details
● How to retrieve relevant annotation and sort it
by relevance?
● How to determine which targets are to be
included in the model?
Missiuro's information flow and
protein informativity
Image courtesy of Missiuro et al. (2009). Information flow analysis of
interactome networks. PLoS Computational Biology 5, e1000350.
● Each protein transmits some information to all the other proteins within the interactome / set of interest (otherwise evolution would have eliminated it)
● Information can only be transmitted through direct interaction (contact, co-complex, participation in the same biochemical reaction)
● The information conductance of an edge is proportional to the interaction importance or confidence
● The information flow is computed between all the pairs of proteins within the set (Kirchhoff's laws + matrix operations).
● A set-specific informativity score is defined for each element of the interactome as the sum of all pairwise information flows
If Time 1:
Math Behind Information Flow
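● Kirchhoff's laws:
– For each node except the sink and the source, the sum of entering currents equals the sum of exiting currents
– For each edge, V = I*R = I/G
● Example network: nodes 1, 2, 3, 4 with edge conductances G1 (1-2), G2 (1-3), G3 (2-4), G4 (3-4); node 1 is the source, node 4 the sink
● Conductance matrix M:
\[
M = \begin{pmatrix}
G_1 + G_2 & -G_1 & -G_2 & 0 \\
-G_1 & G_1 + G_3 & 0 & -G_3 \\
-G_2 & 0 & G_2 + G_4 & -G_4 \\
0 & -G_3 & -G_4 & G_3 + G_4
\end{pmatrix}
\]
● Current vector J and voltage vector V:
\[
J = \begin{pmatrix} I_1 \\ I_2 \\ I_3 \\ I_4 \end{pmatrix} = \begin{pmatrix} 1 \\ 0 \\ 0 \\ -1 \end{pmatrix},
\qquad
V = \begin{pmatrix} V_1 \\ V_2 \\ V_3 \\ V_4 \end{pmatrix}
\]
● Solve M*V=J; use V to determine information flow through each node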
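The solve on this slide can be sketched in plain Python. This is a minimal illustration under stated assumptions, not the project's code: all function names (`solve_linear`, `node_voltages`, `node_throughput`, `informativity`) are hypothetical, the sink is grounded at V = 0 because the full conductance matrix is singular, and counting the unit current injected at the source and drawn at the sink as part of their throughput is a choice made here.

```python
from fractions import Fraction
from itertools import combinations


def solve_linear(a, b):
    """Solve a*x = b by Gauss-Jordan elimination with exact fractions."""
    n = len(b)
    m = [[Fraction(v) for v in row] + [Fraction(rhs)] for row, rhs in zip(a, b)]
    for col in range(n):
        # Pick a row with a non-zero pivot (fails if the network is disconnected).
        pivot = next(r for r in range(col, n) if m[r][col] != 0)
        m[col], m[pivot] = m[pivot], m[col]
        pv = m[col][col]
        m[col] = [v / pv for v in m[col]]
        for r in range(n):
            if r != col and m[r][col] != 0:
                f = m[r][col]
                m[r] = [x - f * y for x, y in zip(m[r], m[col])]
    return [row[n] for row in m]


def node_voltages(n_nodes, edges, source, sink):
    """Voltages for a unit current entering at source and leaving at sink.

    edges is a list of (i, j, conductance); the sink is held at V = 0.
    """
    m = [[Fraction(0)] * n_nodes for _ in range(n_nodes)]
    for i, j, g in edges:
        g = Fraction(g)
        m[i][i] += g
        m[j][j] += g
        m[i][j] -= g
        m[j][i] -= g
    keep = [k for k in range(n_nodes) if k != sink]
    reduced = [[m[r][c] for c in keep] for r in keep]
    rhs = [Fraction(1) if r == source else Fraction(0) for r in keep]
    volts = [Fraction(0)] * n_nodes
    for k, v in zip(keep, solve_linear(reduced, rhs)):
        volts[k] = v
    return volts


def node_throughput(n_nodes, edges, volts, source, sink):
    """Current passing through each node: half of all currents touching it."""
    total = [Fraction(0)] * n_nodes
    for i, j, g in edges:
        current = abs(Fraction(g) * (volts[i] - volts[j]))
        total[i] += current
        total[j] += current
    total[source] += 1  # injected unit current
    total[sink] += 1    # extracted unit current
    return [t / 2 for t in total]


def informativity(n_nodes, edges, protein_set):
    """Sum of node throughputs over all source/sink pairs in protein_set."""
    score = [Fraction(0)] * n_nodes
    for s, t in combinations(sorted(protein_set), 2):
        volts = node_voltages(n_nodes, edges, s, t)
        for k, flow in enumerate(node_throughput(n_nodes, edges, volts, s, t)):
            score[k] += flow
    return score


# The four-node network of the slide (0-based), all conductances set to 1.
diamond = [(0, 1, 1), (0, 2, 1), (1, 3, 1), (2, 3, 1)]
volts = node_voltages(4, diamond, source=0, sink=3)
print([str(v) for v in volts])  # ['1', '1/2', '1/2', '0']
print([str(f) for f in node_throughput(4, diamond, volts, 0, 3)])  # ['1', '1/2', '1/2', '1']
```

On a star-shaped network (one hub linked to three leaves) the same `informativity` function scores the hub twice as high as any leaf, which is the kind of ranking the informativity score is meant to provide.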
Missiuro's information flow and
protein informativity
● Advantage over betweenness and degree centrality:
– Recovers weak multi-hub regulators
– Better at predicting essential genes
– Better at predicting genes essential for a specific function (organ development)
● Advantage over stoichiometric methods:
– No need to solve 64k differential equations (unstable!)
– Reflects not only metabolism, but also regulation
Image courtesy of Missiuro et al. (2009). Information flow analysis of
interactome networks. PLoS Computational Biology 5, e1000350.
Model creation
● Recover targets affected by drugs with a given polypharmacological effect
● Compute the information circulation within the interactome for these drugs
● Include all the targets with a significant informativity => “hidden” targets
Retrieve relevant annotation
Not all the GO terms
are equivalent
GO term informativity (~protein info for Missiuro et al.)
– Expand annotations:
T-cell apoptosis regulation → T cell + apoptosis + immune system + ...
– Define term informativity:
\[
\mathrm{Inf}_{Term} = \frac{S_{Total}}{S_{Term}} = \frac{k_b \cdot \log(N_{Tot})}{k_b \cdot \log(N_{Term})}
\]
where N_Tot is the total number of targets and N_Term is the number of targets annotated with a given GO term
– Use it to compute the flow through each term in a pair of proteins: informativity = conductance
– Compute total informativity within a group as the sum of flows through each term in each pair, divided by the number of targets squared
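The term informativity and the group-level sum can be sketched as follows. This is a hedged illustration, not the original implementation: the function names and the toy annotations are hypothetical, each pair of proteins is counted once (unordered), and a unit voltage per pair is assumed so that the flow through a shared term equals its conductance.

```python
import math
from itertools import combinations


def term_informativity(n_total, n_term):
    """Inf_Term = log(N_Tot) / log(N_Term); the constant k_b cancels out."""
    if n_term < 2:
        raise ValueError("a term annotating fewer than 2 targets has log(N_Term) <= 0")
    return math.log(n_total) / math.log(n_term)


def group_term_flows(annotations):
    """Per-term flow summed over all protein pairs, divided by N_Tot squared.

    annotations maps a protein name to its set of (expanded) GO terms.
    """
    n_total = len(annotations)
    counts = {}
    for terms in annotations.values():
        for term in terms:
            counts[term] = counts.get(term, 0) + 1
    flows = {}
    for a, b in combinations(sorted(annotations), 2):
        # Only terms shared by both proteins carry a flow between them,
        # so every term reaching this point annotates at least 2 targets.
        for term in annotations[a] & annotations[b]:
            flows[term] = flows.get(term, 0.0) + term_informativity(n_total, counts[term])
    return {term: flow / n_total ** 2 for term, flow in flows.items()}


# Toy annotation set (hypothetical proteins P1..P4).
toy = {
    "P1": {"apoptosis", "immune system"},
    "P2": {"apoptosis", "immune system"},
    "P3": {"immune system"},
    "P4": {"immune system", "T cell"},
}
ranked = sorted(group_term_flows(toy).items(), key=lambda kv: -kv[1])
print(ranked)  # "immune system" ranks first, "apoptosis" second; "T cell" is shared by no pair
```

In this toy set the rarer term has the higher per-term informativity, but the term shared by more pairs accumulates the larger total flow, which is why the sum is normalised by the squared number of targets before comparing groups.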
Same secondary effect might have
distinct mechanisms
● Cluster affected targets by
their annotation similarity
● Compute GO-based
information circulation
within each cluster and
sort GO terms by
informativity
● Use clusters as additional
polypharmacological
action models
Clustering
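Clustering targets by annotation similarity can be illustrated with a small sketch. The slides do not state which clustering algorithm was used, so this example swaps in a simple stand-in: Tanimoto similarity between annotation sets followed by single-linkage grouping at a fixed threshold. The function names, the threshold and the toy data are all assumptions.

```python
def tanimoto(a, b):
    """Tanimoto (Jaccard) similarity between two annotation sets."""
    union = a | b
    return len(a & b) / len(union) if union else 0.0


def cluster_by_annotation(annotations, threshold=0.5):
    """Group proteins whose annotation similarity reaches the threshold.

    Single linkage: two proteins end up in the same cluster if a chain of
    sufficiently similar proteins connects them (union-find on the pairs).
    """
    proteins = sorted(annotations)
    parent = {p: p for p in proteins}

    def find(p):
        while parent[p] != p:
            parent[p] = parent[parent[p]]  # path halving
            p = parent[p]
        return p

    for i, a in enumerate(proteins):
        for b in proteins[i + 1:]:
            if tanimoto(annotations[a], annotations[b]) >= threshold:
                parent[find(a)] = find(b)

    clusters = {}
    for p in proteins:
        clusters.setdefault(find(p), []).append(p)
    # Largest clusters first, ties broken alphabetically.
    return sorted(clusters.values(), key=lambda c: (-len(c), c))


# Toy annotations (hypothetical proteins and terms).
toy_annotations = {
    "A": {"x", "y", "z"},
    "B": {"x", "y"},
    "C": {"q", "r"},
    "D": {"q", "r", "s"},
    "E": {"w"},
}
print(cluster_by_annotation(toy_annotations))  # [['A', 'B'], ['C', 'D'], ['E']]
```

Each resulting cluster can then be treated as its own target set when computing GO-based information circulation.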
GO term informativity advantages
Map to the biological concepts
Interpretation by expert biologists => biological sense ?
(cf. Potti 2010 scandal at Duke over “metagene” signature)
Molecular relation databases typically do badly in
some cases:
Systemic effects (T-cell maturation, circadian rhythm, … )
Endocrine regulation
Central Nervous System (GO however isn't the best ontology
for this)
Ability to plug in additional data from literature
analysis (just account for confidence)
Implementation: case of pancreatitis
and cirrhosis
● Sec. Effects from SIDER (EMBL)
● Drug-target interaction from Bourne lab and
Drugdesigntech simulation results
● Group drugs by secondary effect
● Filter out targets that are frequently affected in random drug collections (Student's t-test)
Name          Non-random proba   Count in case   Expected count in random pool   StDev of random pool
PYGL_HUMAN    96.58%             4               0.7968                          0.35122176
RHOC_HUMAN    95.77%             4               0.768                           0.442368
GLTP_HUMAN    95.77%             4               0.768                           0.442368
C43BP_HUMAN   95.77%             4               0.768                           0.442368
FUT8_HUMAN    95.77%             4               0.768                           0.442368
RET7_HUMAN    95.77%             4               0.768                           0.442368
CP2E1_HUMAN   94.43%             5               1.4304                          0.70953984
2ABA_HUMAN    93.88%             4               0.9984                          0.55148544
AUHM_HUMAN    93.88%             4               0.9984                          0.55148544
DX39B_HUMAN   93.66%             4               1.3536                          0.44411904
NGF_HUMAN     93.49%             11              5.0112                          2.33312256
NTRK1_HUMAN   93.49%             11              5.0112                          2.33312256
KIF11_HUMAN   93.03%             5               1.5648                          0.82308096
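The idea of comparing the count of a target among the drugs of interest with its count in random drug pools can be sketched as below. This is not the Student's t-test of the slide and does not reproduce the probabilities in the table: it is a simple resampling stand-in that reports a z-score per target. The function names, the number of draws, the threshold and the toy data are all assumptions.

```python
import math
import random
import statistics


def target_counts(drug_targets, drugs):
    """Number of drugs in the given collection that hit each target."""
    counts = {}
    for drug in drugs:
        for target in drug_targets[drug]:
            counts[target] = counts.get(target, 0) + 1
    return counts


def enriched_targets(drug_targets, case_drugs, n_draws=1000, z_min=3.0, seed=0):
    """Targets hit more often by case_drugs than by random pools of equal size.

    Returns a dict mapping each retained target to its z-score.
    """
    rng = random.Random(seed)
    all_drugs = sorted(drug_targets)
    observed = target_counts(drug_targets, case_drugs)
    pools = [
        target_counts(drug_targets, rng.sample(all_drugs, len(case_drugs)))
        for _ in range(n_draws)
    ]
    selected = {}
    for target, count in observed.items():
        draws = [pool.get(target, 0) for pool in pools]
        mean = statistics.fmean(draws)
        sd = statistics.pstdev(draws)
        if sd > 0:
            z = (count - mean) / sd
        else:
            # No variation in the random pools: enriched only if strictly above.
            z = math.inf if count > mean else 0.0
        if z >= z_min:
            selected[target] = z
    return selected


# Toy data: 20 hypothetical drugs all hit "T_common"; the 4 case drugs also hit "T_case".
drug_targets = {f"drug{i}": {"T_common"} for i in range(20)}
case_drugs = [f"drug{i}" for i in range(4)]
for drug in case_drugs:
    drug_targets[drug] = drug_targets[drug] | {"T_case"}

selected = enriched_targets(drug_targets, case_drugs)
print(sorted(selected))  # ['T_case']
```

A target hit by every drug is dropped because its count in the case group never exceeds what random pools produce, which is the behaviour the filter is meant to have.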
Pancreatitis: clustering results
Clustering: BEA at UCSD
1 major cluster: RHOC, NGF, NTRK1
Cirrhosis: clustering results
Clustering: HSC at Drugdesigntech
4 major clusters (all 4 were informative and relevant)
Most informative of them: KSYK_HUMAN, CSK_HUMAN
Quantitative polypharmacological
effect prediction
Outline:
Compute the information
circulation for
pharmacological effect
specific targets
Measure decrease of
information circulation
within the “all targets”
model and the “cluster”
models
Backbone for the interactome
information flow computation
● NCI-Nature Pathway Interaction Database
No, coverage too small
● KEGG Pathway database
No, pathway-oriented and not connected for atomic interactions
● UniPathway
No, coverage too small
● Reactome.org
Yay
Reactome.org : idea
● Reactome.org structure:
– BioPax : xml / RDF / OWL
– Physical entities:
● Proteins, small molecules, Complexes, RNA, DNA
● Fragments of physical entities
– Interaction:
● Degradation / polymerisation / Biochemical reactions
● Molecular interaction
● Genetic interaction
– Pathways, Genes, Post-translational modifications...
Reactome.org : reality
● Reality of Reactome.org:
– Main connected component: ~22 000 entities, but 3 others with >50 elements
– Presence of generic classes: groups of objects
– Proteins = mix of proteins, domains, groups, groups of domains…
– 15 000 proteins, 5000 UniProt references
– 156 genes, 56 RNA molecules
Translation / transcription regulation is not well described
Reactome.org: incompleteness
● Still incomplete and reliant on comments:
Case of SRC => HiNT database added
Verification of pipeline:
Information routing decay
Image courtesy Wintermute et al. (2010). Emergent cooperation in microbial
metabolism. Molecular Systems Biology 6, 407.
Verification of pipeline
Predicting target druggability
● 186 oral small-molecule drug targets from Overington's 2006 “How many drug targets are there?”
● 77 plasma membrane targets
● 1289 total plasma membrane proteins with UniProt references in Reactome.org
● Use the following to predict druggability:
Overall informativity
GO-term specific informativity
Target abundance (higher abundance, more off-target action in case of total inhibition)
Valid targets
Non-targets
Druggability prediction with some complexity
● Raw prediction is little better than random:
– 65% specificity, 60% selectivity
● However, if we account for:
– Non-oral, Non small molecule drugs
– Drugs developed or in development since 2006
– GO-specific informativity
– The fact Reactome.org / HiNT are bad in
representing CNS functions
● The prediction results are rather encouraging:
– 75% specificity, 90% selectivity
Before we can conclude
● The methods required for the information
circulation have been coded
– Information circulation for the target set
– Calculation of the information variation when the interactome is perturbed
● However, before this project can be deemed
concluded
– model creation and model utilization parts have to be
assembled into a single pipeline (right now they are
separate)
– Run model creation and prediction on several secondary effects with random training / testing set validation
Conclusions
● GO-based information circulation method
seems to work well for secondary effect
mechanism retrieval
● Reactome.org / HiNT dataset-based information circulation method seems to be potentially useful for computationally assisted drug design
● Information circulation methods for secondary
effects quantitative prediction must be tested
before this project can be concluded
Moving further
Finding datasets and people interested in further
development of the method:
– SNP cumulative effect
Requires ability to project on the protein 3D structure and estimate
protein activity inhibition in different contexts
– Drug Design : secondary effect prediction
Typical pharmaceutical firms' data stores contain far more information about the toxicity of different compounds and allow much more finely tuned modeling of pharmacological effects
– Difference between animal and human interactomes:
Predict unexpected polypharmacological effects upon transition
from animal to human trials
Acknowledgements
Pr. Philip Bourne
Pr. Bart Deplancke
Cedric Merlot
Li Xie
Spencer Blieven
Roland Diggelmann
Andreas Prlic
Julia Ponomarenko
Lilia Iakoucheva
Jiang Wang
Cole Christie
Audrey Schenker
THE END
QUESTIONS?
Graph databases
Random matrix theory
Method improvement
If time: Improvements
● For retrieving statistically significant targets,
– abandon naïve statistical drug target filtering
– build drug-specific information flows
– recover all sufficiently informative proteins for each drug
– use those proteins to get statistically significant targets
=> avoids close miss errors
● When sorting targets:
– Sort the most significant GO terms not by their informativity,
– but by how much information flow associated to them is
perturbed by the given target set
=> avoid need to tune GO term informativity
=> better interpretability
If time: Improvements
● When computing the information flow
– Do not consider the information flow between any pair of proteins as constant
– Consider associated tension (voltage) as constant
– Unrelated proteins are likely to exchange less
information
● To avoid information circulation distortion due to GO
terms correlation:
– Don't use Tanimoto distance / conductance model for
GO-based term circulation
– Use the real point-to-point routing within the GO terms
graph
If time 1:
Random matrix theory
Molecular evolution:
Adaptive mutations = survival of the fittest
Random mutations = Kimura's drift
Tools to separate the two
Protein interaction network evolution:
Adaptive topology modifications
Random topology artefacts
Phosphorylation pattern modification due to random mutations
Separating the two = ????
Nothing in biology makes sense
except in the light of evolution.
Theodosius Dobzhansky
If time 1:
Random matrix theory
In sparse matrices (~=Graphs):
Random matrices have specific eigenvalues
All eigenvalues exceeding these values are non-random
Clustering can later be performed in the space generated by
the associated eigenvectors of non-random eigenvalues
If time 2:
Graph Databases
neo4j
Titan DB
If time 2:
Graph Databases
Tinkerpop stack: ~ SQL for Graph databases
If time 3:
Conclusions – general
● Graph databases are worth a try for systems
biology applications
● We need to assemble one comprehensive,
complete and WELL DOCUMENTED resource
for computational systems biology


Editor's Notes

  • #6 Binding: absolutely no idea whatsoever about what is going on. The drug was designed to bind one single target, but often binds many others. Due to protein conformation variation, the existence of complex catalytic sites and post-translational modifications of different proteins, predicting off-target binding is a nightmarish job.
  • #20 Fixed tension between sink and source. Each GO term shared by the sink and the source passes information current.
  • #24 Render bioinformatics “disease signatures” (vectors of 100 protein names) readable and understandable for biologists: cf. the Nature Medicine 2010 retraction scandal. Complementarity with pure information circulation methods for the endocrine system: concepts such as an increase of blood pressure might be pretty good signals interpreted by cell membranes, but are impossible to encode in conventional interactomes.