
Systems biology in polypharmacology: explaining and predicting drug secondary effects. - master project

Slide notes:
  • Binding: what actually happens is largely unknown. The drug was designed to bind a single target, but often binds many others. Because of protein conformation variation, complex catalytic sites and post-translational modifications of different proteins, predicting off-target binding is extremely hard.
  • Fixed tension (voltage) between sink and source. Each GO term shared by the sink and the source passes an information current.
  • Make bioinformatics "disease signatures" (vectors of ~100 protein names) readable and understandable for biologists; cf. the 2010 Nature Medicine retraction scandal. Complementary to pure information circulation methods for the endocrine system: concepts such as an increase in blood pressure may be good signals interpreted by cell membranes, but are impossible to encode in conventional interactomes.

    1. Systems biology in polypharmacology: predicting and explaining off-target effects
       Bourne lab at UCSD
       Under the supervision of Prof. Bourne
       Under the direction of Prof. Bart Deplancke
       Andrei Kucharavy, EPFL SV 2013, Computational Biology minor
    2. Problem
       Image courtesy of Scannell et al. 2012: Diagnosing the decline in pharmaceutical R&D efficiency. Nature Reviews Drug Discovery 11, 191–200
       Pharma Big 5 drug design expenditures as of 2012 (Matthew Herper @ Forbes):
       AstraZeneca        11.8 b$/drug
       GlaxoSmithKline     8.2 b$/drug
       Sanofi              7.9 b$/drug
       Roche Holding AG    7.8 b$/drug
       Pfizer Inc          7.7 b$/drug
    3. Problem
       Figure annotations: "95% of remaining leads fail here"; ">95% of leads fail here"
       Image courtesy of Alzheimer's Drug Discovery Foundation
    4. One disease – one gene – one drug
       ● Step 1: find a gene relevant to a disease
       ● Step 2: design a small-molecule inhibitor for it
       ● Step 3: test it on cellular and animal models
       ● Step 4: discover secondary effects or absence of therapeutic effect
       ● Step 5: modify the lead to control toxicity
       ● Repeat steps 3-5 until no more funds are available
       ● If you are lucky, secondary effects are minor to absent:
         ● get more funds and move to human trials
         ● Pay attention to unexpected secondary effects
         ● Pay attention to absence of therapeutic effect in humans
    5. Unexpected pharmacological effects
       Absence of therapeutic effect:
       – Main cause of rational drug design failure in the 1990s
       – Has been overcome with a better understanding of biology
       Secondary effects:
       – Cyt-c: well understood and controlled
       – Unspecific binding: very frequent, hard to predict, hard to interpret
    6. Polypharmacology
       ● Specific agonist / antagonist designs are rare:
         ● protein site similarity
         ● catalytic sites within complexes
       ● Some drugs owe their pharmacological action to their unspecificity:
         ● Encaptone
         ● Ibogaine
         ● Chlorpromazine
         ● Kanamycin
       Polypharmacology:
       – Use computational methods to predict all the targets a small molecule is likely to perturb
       – Use systems biology to predict the consequences of such perturbation:
         ● Secondary effects
         ● Unexpected therapeutic effect (repositioning)
         ● Unexpected absence of therapeutic effect (animal model – human difference)
    7. Scope of master project
       ● Prediction of the perturbed target set:
         – Drugdesigntech since 2009
         – Bourne lab since 2007-2008
       ● Analysis and interpretation of the perturbed target set is still largely manual. The goal of this master project is to reduce that manual effort.
       Image courtesy of Xie et al. (2011). Drug discovery using chemical systems biology: weak inhibition of multiple kinases may contribute to the anti-cancer effect of nelfinavir. PLoS Computational Biology
    8. Master project environment
       – EPFL engineering internship
       – Tool for integrative bioinformatics platform
       – Used for biotech consulting (polypharmacological effect prediction)
       – > 1000 drugs and ~1300 human proteins
       The Bourne Lab
       – RCSB PDB, Supertarget, IEDB, BioLit
       – Reliable pipeline for drug off-target effect prediction (4530 protein models, 140 approved drugs)
       – 7 publications in polypharmacology
    9. Polypharmacological action mechanisms recovery
       ● Source:
         – List of proteins perturbed by a drug
       ● Wanted:
         – Mechanisms of unexpected pharmacological action, understandable for a biologist
           ● Pathway, biological entities, mechanism names
           ● Ordered by relevance
         – Unexpected pharmacological action mechanism model, usable for prediction on new drugs
    10. Global idea
        Rigid structure of interactions = interactome
        Knowledge access structure = GO + pathway names
    11. Global idea
        Platelet activation
        Immune response onset
        Th17 activation
        Polypharmacological effect model suited for prediction on new drugs
        Polypharmacological effect mechanism understandable for a biology expert
    12. The devil is in the details
        ● How to retrieve relevant annotation and sort it by relevance?
        ● How to determine which targets are to be included in the model?
    13. Missiuro's information flow and protein informativity
        Image courtesy of Missiuro et al. (2009). Information flow analysis of interactome networks. PLoS Computational Biology 5, e1000350.
        ● Each protein transmits some information to all the other proteins within the interactome / set of interest (otherwise evolution would have eliminated it)
        ● Information can only be transmitted through direct interaction (contact, co-complex, participation in the same biochemical reaction)
        ● The information conductance of an edge is proportional to the interaction importance or confidence
        ● The information flow is computed between all pairs of proteins within the set (Kirchhoff laws + matrix operations)
        ● A set-specific informativity score is defined for each element of the interactome as the sum of all pairwise information flows
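    14. If time 1: Math behind information flow
        ● Kirchhoff law:
          – For each node except the sink and the source, the sum of entering currents equals the sum of exiting currents
          – For each edge, V = I*R = I/G
        ● Conductance matrix M, current vector J and voltage vector V for the 4-node example (edge G1 joins nodes 1 and 2, G2 joins 1 and 3, G3 joins 2 and 4, G4 joins 3 and 4):
          $$M = \begin{pmatrix} G_1+G_2 & -G_1 & -G_2 & 0 \\ -G_1 & G_1+G_3 & 0 & -G_3 \\ -G_2 & 0 & G_2+G_4 & -G_4 \\ 0 & -G_3 & -G_4 & G_3+G_4 \end{pmatrix}, \quad J = \begin{pmatrix} I_1 = 1 \\ I_2 = 0 \\ I_3 = 0 \\ I_4 = -1 \end{pmatrix}, \quad V = \begin{pmatrix} V_1 \\ V_2 \\ V_3 \\ V_4 \end{pmatrix}$$
        ● Solve M*V = J; use V to determine the information flow through each node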
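The solve step on this slide can be sketched numerically. The following is a minimal illustration, not the project's code: the function name `node_flows` and the choice to ground the sink (V = 0 there) are assumptions, made because M as written is singular, and the per-node throughput definition (half the absolute current touching a node) is one common convention rather than something stated on the slide.

```python
import numpy as np

def node_flows(conductance, source, sink):
    """Solve M*V = J for a unit current injected at `source` and drawn at `sink`.

    `conductance` is a symmetric (n, n) array of edge conductances with a zero
    diagonal; the graph is assumed to be connected.
    """
    g = np.asarray(conductance, dtype=float)
    n = g.shape[0]
    # Conductance matrix M: row sums on the diagonal, -G_ij off the diagonal.
    m = np.diag(g.sum(axis=1)) - g
    # Current vector J: +1 at the source, -1 at the sink, 0 elsewhere.
    j = np.zeros(n)
    j[source] = 1.0
    j[sink] = -1.0
    # M is singular (voltages are defined up to a constant), so the sink is
    # grounded (assumption) and the reduced system is solved.
    keep = [i for i in range(n) if i != sink]
    v = np.zeros(n)
    v[keep] = np.linalg.solve(m[np.ix_(keep, keep)], j[keep])
    # Ohm's law on every edge: I_ij = G_ij * (V_i - V_j).
    edge_currents = g * (v[:, None] - v[None, :])
    # Throughput of a node: half of all current entering or leaving it,
    # counting the externally injected current at source and sink.
    flows = 0.5 * (np.abs(edge_currents).sum(axis=1) + np.abs(j))
    return v, flows
```

With all four conductances of the slide's example set to 1, source node 1 and sink node 4, this gives V = (1, 0.5, 0.5, 0) and throughputs (1, 0.5, 0.5, 1): the current splits evenly over the two parallel paths.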
    15. Missiuro's information flow and protein informativity
        ● Advantage over betweenness degree and edge degree:
          – Recovers weak multi-hub regulators
          – Better at predicting essential genes
          – Better at predicting genes essential for a specific function (organ development)
        ● Advantage over stoichiometric methods:
          – No need to solve 64k differential equations (unstable!)
          – Reflects not only metabolism, but also regulation
        Image courtesy of Missiuro et al. (2009). Information flow analysis of interactome networks. PLoS Computational Biology 5, e1000350.
    16. Model creation
        ● Recover targets affected by drugs with a given polypharmacological effect
        ● Compute the information circulation within the interactome for these drugs
        ● Include all the targets with a significant informativity => "hidden" targets
    17. Retrieve relevant annotation
    18. Not all the GO terms are equivalent
        GO term informativity (~protein informativity in Missiuro et al.)
        – Expand annotations: T-cell apoptosis regulation → T cell + apoptosis + immune system + ...
        – Define term informativity:
          $$\mathrm{Inf}_{Term} = \frac{S_{Total}}{S_{Term}} = \frac{k_b \cdot \log(N_{Tot})}{k_b \cdot \log(N_{Term})}$$
          where N_{Tot} is the total number of targets and N_{Term} is the number of targets annotated with a given GO term
        – Use it to compute the flow through each term in a pair of proteins: informativity = conductance
        – Compute the total informativity within a group as the sum of flows through each term in each pair, divided by the number of targets squared
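A small worked sketch of the term informativity and the group-level sum may help. It rests on several assumptions that are not on the slide: a unit tension per protein pair (so the flow through a shared term equals its conductance), unordered pairs, a background made of every protein in `annotations`, and the hypothetical names `term_informativity` and `group_term_flows`.

```python
import math
from collections import Counter
from itertools import combinations

def term_informativity(n_total, n_term):
    """Inf_Term = log(N_Tot) / log(N_Term); the k_b factors cancel."""
    if n_term < 1 or n_term > n_total:
        raise ValueError("need 1 <= n_term <= n_total")
    if n_term == 1:
        # log(1) = 0: a term carried by a single target is treated as
        # infinitely informative here (assumption, not stated on the slide).
        return math.inf
    return math.log(n_total) / math.log(n_term)

def group_term_flows(group, annotations):
    """Sum, per GO term, the flow over every protein pair in `group` sharing it.

    `annotations` maps each background protein to its set of (expanded) GO terms.
    """
    n_total = len(annotations)
    counts = Counter(term for terms in annotations.values() for term in terms)
    flows = Counter()
    for a, b in combinations(group, 2):
        for term in annotations[a] & annotations[b]:
            # Unit tension assumed, so flow = conductance = informativity.
            flows[term] += term_informativity(n_total, counts[term])
    norm = len(group) ** 2  # "divided by the number of targets squared"
    return {term: flow / norm for term, flow in flows.items()}
```

In this sketch a term carried by every background protein has informativity 1, the minimum, while rarer terms score higher, so sorting the returned dictionary by value ranks rare shared annotations first.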
    19. Same secondary effect might have distinct mechanisms
        ● Cluster affected targets by their annotation similarity
        ● Compute GO-based information circulation within each cluster and sort GO terms by informativity
        ● Use clusters as additional polypharmacological action models
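One way to picture the first step is the sketch below. It uses Tanimoto similarity between annotation sets, which the deck mentions later, together with a plain threshold-and-connected-components grouping. That grouping is a stand-in chosen for brevity, not the clustering tool used in the project, and the 0.5 threshold and function names are hypothetical.

```python
def tanimoto(a, b):
    """Tanimoto (Jaccard) similarity between two sets of GO terms."""
    union = len(a | b)
    return len(a & b) / union if union else 0.0

def cluster_by_annotation(annotations, threshold=0.5):
    """Group targets whose annotation sets are linked by similarity >= threshold."""
    names = sorted(annotations)
    parent = {name: name for name in names}

    def find(x):
        # Union-find root lookup with path halving.
        while parent[x] != x:
            parent[x] = parent[parent[x]]
            x = parent[x]
        return x

    for i, a in enumerate(names):
        for b in names[i + 1:]:
            if tanimoto(annotations[a], annotations[b]) >= threshold:
                parent[find(a)] = find(b)

    clusters = {}
    for name in names:
        clusters.setdefault(find(name), []).append(name)
    return sorted(clusters.values())
```

Each resulting cluster could then be passed separately to a GO-based information circulation computation, as the second bullet describes.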
    20. Clustering
    21. GO term informativity advantages
        Map to the biological concepts
        Interpretation by expert biologists => biological sense? (cf. Potti 2010 scandal at Duke over "metagene" signature)
        Molecular relation databases typically do badly in some cases:
        – Systemic effects (T-cell maturation, circadian rhythm, ...)
        – Endocrine regulation
        – Central Nervous System (GO however isn't the best ontology for this)
        Ability to plug in additional data from literature analysis (just account for confidence)
    22. Implementation: case of pancreatitis and cirrhosis
        ● Secondary effects from SIDER (EMBL)
        ● Drug-target interactions from Bourne lab and Drugdesigntech simulation results
        ● Group drugs by secondary effect
        ● Filter out targets that are frequently affected in random drug collections (Student T-test)

        Name          Proba non-random  Count  Expected count in random poll  StDev in case of random poll
        PYGL_HUMAN    96.58%            4      0.7968                         0.35122176
        RHOC_HUMAN    95.77%            4      0.768                          0.442368
        GLTP_HUMAN    95.77%            4      0.768                          0.442368
        C43BP_HUMAN   95.77%            4      0.768                          0.442368
        FUT8_HUMAN    95.77%            4      0.768                          0.442368
        RET7_HUMAN    95.77%            4      0.768                          0.442368
        CP2E1_HUMAN   94.43%            5      1.4304                         0.70953984
        2ABA_HUMAN    93.88%            4      0.9984                         0.55148544
        AUHM_HUMAN    93.88%            4      0.9984                         0.55148544
        DX39B_HUMAN   93.66%            4      1.3536                         0.44411904
        NGF_HUMAN     93.49%            11     5.0112                         2.33312256
        NTRK1_HUMAN   93.49%            11     5.0112                         2.33312256
        KIF11_HUMAN   93.03%            5      1.5648                         0.82308096
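The idea of comparing a target's hit count against random drug collections can be illustrated with a resampling sketch. This is explicitly not the Student T-test named on the slide and does not reproduce the table values: it swaps in an empirical Monte Carlo estimate, and the function names, the number of polls and the data layout (`drug_targets` mapping each drug to its set of targets) are all assumptions.

```python
import random
import statistics

def random_poll_counts(drug_targets, group_size, target, n_polls=1000, seed=0):
    """Count how often `target` is hit in random drug collections of `group_size`."""
    rng = random.Random(seed)
    drugs = sorted(drug_targets)
    counts = []
    for _ in range(n_polls):
        poll = rng.sample(drugs, group_size)  # drugs drawn without replacement
        counts.append(sum(target in drug_targets[drug] for drug in poll))
    return counts

def non_random_fraction(observed, counts):
    """Fraction of random polls in which the target is hit fewer times than observed."""
    return sum(count < observed for count in counts) / len(counts)

def poll_summary(counts):
    """Expected count and standard deviation over the random polls."""
    return statistics.mean(counts), statistics.pstdev(counts)
```

A target would be kept when its observed count among the drugs sharing a secondary effect is rarely reached in random polls, that is, when `non_random_fraction` is close to 1.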
    23. Pancreatitis: clustering results
        Clustering: BEA at UCSD
        1 major cluster: RHOC, NGF, NTRK1
    24. Cirrhosis: clustering results
        Clustering: HSC at Drugdesigntech
        4 major clusters (all 4 were informative and relevant); the most informative of them: KSYK_HUMAN, CSK_HUMAN
    25. Quantitative polypharmacological effect prediction
        Outline:
        – Compute the information circulation for pharmacological effect specific targets
        – Measure the decrease of information circulation within the "all targets" model and the "cluster" models
    26. Quantitative polypharmacological effect prediction
        Outline:
        – Compute the information circulation for pharmacological effect specific targets
        – Measure the decrease of information circulation within the "all targets" model and the "cluster" models
    27. Backbone for the interactome information flow computation
        ● NCI-Nature Pathway Interaction Database: no, coverage too small
        ● KEGG Pathway database: no, pathway-oriented and not connected for atomic interactions
        ● UniPathway: no, coverage too small
        ● Reactome.org: yes
    28. Reactome.org: idea
        ● Reactome.org structure:
          – BioPAX: XML / RDF / OWL
          – Physical entities:
            ● Proteins, small molecules, complexes, RNA, DNA
            ● Fragments of physical entities
          – Interaction:
            ● Degradation / polymerisation / biochemical reactions
            ● Molecular interaction
            ● Genetic interaction
          – Pathways, genes, post-translational modifications...
    29. Reactome.org: reality
        ● Reality of Reactome.org:
          – Main connected component: ~22 000 entities, but 3 others with >50 elements
          – Presence of generic classes: groups of objects
          – Proteins = mix of proteins, domains, groups, groups of domains...
          – 15 000 proteins, 5000 UniProt references
          – 156 genes, 56 RNA molecules: translation / transcription regulation is not well described
    30. Reactome.org: incompleteness
        ● Still incomplete and reliant on comments: case of SRC => HiNT database added
    31. Verification of pipeline: information routing decay
        Image courtesy of Wintermute et al. (2010). Emergent cooperation in microbial metabolism. Molecular Systems Biology 6, 407.
    32. Verification of pipeline: predicting target druggability
        ● 186 oral small-molecule drug targets from Overington's 2006 "How many drug targets are there?"
        ● 77 plasma membrane targets
        ● 1289 total plasma membrane proteins with UniProt references in Reactome.org
        ● Use the following to predict druggability:
          – Overall informativity
          – GO-term specific informativity
          – Target abundance (higher abundance, more off-target action in case of total inhibition)
    33. Valid targets
    34. Non-targets
    35. Druggability prediction with some complexity
        ● Raw prediction is little better than random:
          – 65% specificity, 60% selectivity
        ● However, if we account for:
          – Non-oral, non-small-molecule drugs
          – Drugs developed or in development since 2006
          – GO-specific informativity
          – The fact that Reactome.org / HiNT are bad at representing CNS functions
        ● The prediction results are rather encouraging:
          – 75% specificity, 90% selectivity
    36. Before we can conclude
        ● The methods required for the information circulation have been coded:
          – Information circulation for the target set
          – Calculation of information variation in case of perturbed interactome alteration
        ● However, before this project can be deemed concluded:
          – The model creation and model utilization parts have to be assembled into a single pipeline (right now they are separate)
          – Model creation and prediction have to be run on several secondary effects with random training / testing set validation
    37. Conclusions
        ● The GO-based information circulation method seems to work well for secondary effect mechanism retrieval
        ● The Reactome.org / HiNT dataset-based information circulation method seems potentially useful for computationally assisted drug design
        ● Information circulation methods for quantitative prediction of secondary effects must be tested before this project can be concluded
    38. Moving further
        Finding datasets and people interested in further development of the method:
        – SNP cumulative effect: requires the ability to project on the protein 3D structure and estimate protein activity inhibition in different contexts
        – Drug design, secondary effect prediction: typical pharmaceutical firms' datastores contain far more information about the toxicity of different compounds and allow much more finely tuned modeling of pharmacological effects
        – Difference between animal and human interactomes: predict unexpected polypharmacological effects upon transition from animal to human trials
    39. Acknowledgements
        Prof. Philip Bourne, Prof. Bart Deplancke, Cedric Merlot, Li Xie, Spencer Blieven, Roland Diggelmann, Andreas Prlic, Julia Ponomarenko, Lilia Iakoucheva, Jiang Wang, Cole Christie, Audrey Schenker
    40. THE END
        QUESTIONS?
    41. THE END
        QUESTIONS?
        Graph databases
        Random matrix theory
        Method improvement
    42. If time: Improvements
        ● For retrieving statistically significant targets:
          – Abandon naïve statistical drug target filtering
          – Build drug-specific information flows
          – Recover all sufficiently informative proteins for each drug
          – Use those proteins to get statistically significant targets
          => avoids close-miss errors
        ● When sorting targets:
          – Sort the most significant GO terms not by their informativity,
          – but by how much of the information flow associated with them is perturbed by the given target set
          => avoids the need to tune GO term informativity
          => better interpretability
    43. If time: Improvements
        ● When computing the information flow:
          – Do not consider the information flow between any pair of proteins as constant
          – Consider the associated tension (voltage) as constant
          – Unrelated proteins are likely to exchange less information
        ● To avoid information circulation distortion due to GO term correlation:
          – Don't use the Tanimoto distance / conductance model for GO-based term circulation
          – Use the real point-to-point routing within the GO terms graph
    44. If time 1: Random matrix theory
        Molecular evolution:
        – Adaptive mutations = survival of the fittest
        – Random mutations = Kimura's drift
        – Tools exist to separate the two
        Protein interaction network evolution:
        – Adaptive topology modifications
        – Random topology artefacts: phosphorylation pattern modification due to random mutations
        – Separating the two = ????
        "Nothing in biology makes sense except in the light of evolution." Theodosius Dobzhansky
    45. If time 1: Random matrix theory
        In sparse matrices (~= graphs):
        – Random matrices have specific eigenvalues
        – All eigenvalues exceeding these values are non-random
        – Clustering can later be performed in the space generated by the eigenvectors associated with the non-random eigenvalues
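The eigenvalue criterion above can be sketched as follows, under assumptions the slide does not specify. The null model shuffles the edges of the observed graph (same edge count, random placement). The threshold is the largest second eigenvalue seen in that null ensemble, since the leading eigenvalue of a random graph mostly reflects edge density. The function name and the number of null samples are hypothetical.

```python
import numpy as np

def nonrandom_eigenpairs(adjacency, n_null=50, seed=0):
    """Return eigenvalues/eigenvectors of a symmetric adjacency matrix that
    exceed the bulk edge estimated from edge-shuffled random graphs."""
    a = np.asarray(adjacency, dtype=float)
    n = a.shape[0]
    upper = np.triu_indices(n, k=1)
    rng = np.random.default_rng(seed)
    bulk_edge = -np.inf
    for _ in range(n_null):
        # Null model (assumption): same edges, placed at random positions.
        shuffled = np.zeros_like(a)
        shuffled[upper] = rng.permutation(a[upper])
        shuffled = shuffled + shuffled.T
        # eigvalsh returns eigenvalues in ascending order; [-2] is the second
        # largest, used here as the edge of the random bulk.
        bulk_edge = max(bulk_edge, np.linalg.eigvalsh(shuffled)[-2])
    values, vectors = np.linalg.eigh(a)
    keep = values > bulk_edge
    return values[keep], vectors[:, keep], bulk_edge
```

The rows of the returned eigenvector matrix give each node's coordinates in the retained eigenspace, which is the space in which the last bullet proposes to cluster.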
    46. If time 2: Graph databases
        Neo4j
        Titan DB
    47. If time 2: Graph databases
        TinkerPop stack: ~ SQL for graph databases
    48. If time 3: Conclusions – general
        ● Graph databases are worth a try for systems biology applications
        ● We need to assemble one comprehensive, complete and WELL DOCUMENTED resource for computational systems biology
