Systems biology in polypharmacology: explaining and predicting drug secondary effects. - master project


  1. 1. Systems biology in polypharmacology: predicting and explaining off-target effects. Bourne lab at UCSD. Under the supervision of Prof. Bourne; under the direction of Prof. Bart Deplancke. Andrei Kucharavy, EPFL SV 2013, Computational Biology minor.
  2. 2. Problem. Pharma Big 5 drug design expenditures as of 2012 (Matthew Herper @ Forbes): AstraZeneca 11.8 b$/drug; GlaxoSmithKline 8.2 b$/drug; Sanofi 7.9 b$/drug; Roche Holding AG 7.8 b$/drug; Pfizer Inc 7.7 b$/drug. Image courtesy of Scannell et al. (2012). Diagnosing the decline in pharmaceutical R&D efficiency. Nature Reviews Drug Discovery 11, 191–200.
  3. 3. Problem. Figure labels: “95% remaining leads fail here”; “>95% leads fail here”. Image courtesy of Alzheimer’s Drug Discovery Foundation.
  4. 4. One disease – one gene – one drug ● Step 1: find a gene relevant to a disease ● Step 2: design a small-molecule inhibitor for it ● Step 3: test it on cellular / animal models ● Step 4: discover secondary effects or absence of therapeutic effect ● Step 5: modify the lead to control toxicity ● Repeat steps 3-5 until no more funds are available ● If you are lucky and secondary effects are minor to absent, get more funds and move to human trials ● Pay attention to unexpected secondary effects ● Pay attention to absence of therapeutic effect in humans
  5. 5. Unexpected pharmacological effects ● Absence of therapeutic effect: – Main cause of rational drug design failure in the 1990s – Has been overcome with better understanding of biology ● Secondary effects: – Cyt-c: well understood and controlled – Unspecific binding: very frequent, hard to predict, hard to interpret
  6. 6. Polypharmacology ● Specific agonists / antagonists are rare: ● protein sites similarity ● catalytic sites within complexes ● Some drugs owe their pharmacological action to their unspecificity: ● Entacapone ● Ibogaine ● Chlorpromazine ● Kanamycin ● Polypharmacology: – Use computational methods to predict all the targets a small molecule is likely to perturb – Use systems biology to predict consequences of such perturbation ● Secondary effects ● Unexpected therapeutic effect (repositioning) ● Unexpected absence of therapeutic effect (animal model – human difference)
  7. 7. Scope of master project ● Prediction of perturbed targets set: – Drugdesigntech since 2009 – Bourne lab since 2007-2008 ● Analysis and interpretation of the perturbed targets set is still largely manual. The goal of this master project is to reduce this manual effort. Image courtesy of Xie et al. (2011). Drug discovery using chemical systems biology: weak inhibition of multiple kinases may contribute to the anti-cancer effect of nelfinavir. PLoS Computational Biology.
  8. 8. Master project environment – EPFL engineering internship – Tool for integrative bioinformatics platform – Used for biotech consulting (polypharmacological effect prediction) – > 1000 drugs and ~1300 human proteins. The Bourne Lab: – RCSB PDB, Supertarget, IEDB, BioLit – Reliable pipeline for drug off-target effect prediction (4530 protein models, 140 approved drugs) – 7 publications in polypharmacology
  9. 9. Polypharmacological action mechanisms recovery ● Source: – List of proteins perturbed by a drug ● Wanted: – Mechanisms of unexpected pharmacological action, understandable for a biologist ● Pathway, biological entities, mechanism names ● Ordered by relevance – Unexpected pharmacological action mechanism model, usable for prediction on new drugs
  10. 10. Global idea. Rigid structure of interactions = interactome. Knowledge access structure = GO + pathway names.
  11. 11. Global idea. Figure labels: platelet activation; immune response onset; Th17 activation. Polypharmacological effect model suited for prediction on new drugs. Polypharmacological effect mechanism understandable for a biology expert.
  12. 12. Devil is in the details ● How to retrieve relevant annotation and sort it by relevance? ● How to determine which targets are to be included in the model?
  13. 13. Missiuro's information flow and protein informativity. Image courtesy of Missiuro et al. (2009). Information flow analysis of interactome networks. PLoS Computational Biology 5, e1000350. ● Each protein transmits some information to all the other proteins within the interactome / set of interest (otherwise evolution would have eliminated it) ● Information can only be transmitted through direct interaction (contact, co-complex, participation in the same biochemical reaction) ● The information conductance of an edge is proportional to the interaction importance or confidence ● The information flow is computed between all the pairs of proteins within the set (Kirchhoff laws + matrix operations) ● A set-specific informativity score is defined for each element of the interactome as the sum of all pairwise information flows
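  14. 14. If Time 1: Math Behind Information Flow ● Kirchhoff laws: – For each node, except for the sink and the source, the sum of entering currents equals the sum of exiting currents – For each edge, V = I*R = I/G ● Example network: nodes 1 to 4, with edges 1–2 (G1), 1–3 (G2), 2–4 (G3), 3–4 (G4) ● Conductance matrix M, voltage vector V and current vector J:
      \[
      M = \begin{pmatrix}
      G_1+G_2 & -G_1 & -G_2 & 0 \\
      -G_1 & G_1+G_3 & 0 & -G_3 \\
      -G_2 & 0 & G_2+G_4 & -G_4 \\
      0 & -G_3 & -G_4 & G_3+G_4
      \end{pmatrix},
      \quad
      V = \begin{pmatrix} V_1 \\ V_2 \\ V_3 \\ V_4 \end{pmatrix},
      \quad
      J = \begin{pmatrix} I_1 \\ I_2 \\ I_3 \\ I_4 \end{pmatrix}
        = \begin{pmatrix} 1 \\ 0 \\ 0 \\ -1 \end{pmatrix}
      \]
      ● Solve M*V = J; use V to determine information flow through each node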
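The solve step on this slide can be sketched in a few lines of plain Python. This is a minimal illustration under stated assumptions, not the project's code: the function names (`solve_linear`, `node_voltages`, `node_flows`, `informativity`) are hypothetical, all four conductances are set to 1, nodes are 0-indexed, and the sink is grounded (V = 0) because M on its own is singular. The flow through a node is taken as half of the absolute currents touching it, which is one common reading of the Missiuro et al. definition.

```python
from itertools import combinations


def solve_linear(a, b):
    """Solve a*x = b by Gaussian elimination with partial pivoting."""
    n = len(b)
    aug = [row[:] + [rhs] for row, rhs in zip(a, b)]
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(aug[r][col]))
        aug[col], aug[pivot] = aug[pivot], aug[col]
        for r in range(col + 1, n):
            factor = aug[r][col] / aug[col][col]
            for c in range(col, n + 1):
                aug[r][c] -= factor * aug[col][c]
    x = [0.0] * n
    for r in reversed(range(n)):
        tail = sum(aug[r][c] * x[c] for c in range(r + 1, n))
        x[r] = (aug[r][n] - tail) / aug[r][r]
    return x


def node_voltages(n_nodes, edges, source, sink):
    """Voltages for a unit current entering at `source` and leaving at `sink`."""
    m = [[0.0] * n_nodes for _ in range(n_nodes)]
    for (i, j), g in edges.items():
        m[i][i] += g
        m[j][j] += g
        m[i][j] -= g
        m[j][i] -= g
    j_vec = [0.0] * n_nodes
    j_vec[source] = 1.0
    j_vec[sink] = -1.0
    # M is singular (its rows sum to zero), so ground the sink: V[sink] = 0.
    keep = [k for k in range(n_nodes) if k != sink]
    reduced = [[m[r][c] for c in keep] for r in keep]
    solved = solve_linear(reduced, [j_vec[r] for r in keep])
    v = [0.0] * n_nodes
    for k, value in zip(keep, solved):
        v[k] = value
    return v


def node_flows(n_nodes, edges, source, sink):
    """Current passing through every node for one source/sink pair."""
    v = node_voltages(n_nodes, edges, source, sink)
    total = [0.0] * n_nodes
    for (i, j), g in edges.items():
        current = abs(g * (v[i] - v[j]))
        total[i] += current
        total[j] += current
    # The injected and extracted unit currents also pass through the end nodes.
    total[source] += 1.0
    total[sink] += 1.0
    # Every unit of current is counted once entering and once leaving a node.
    return [t / 2.0 for t in total]


def informativity(n_nodes, edges):
    """Sum of the flows through each node over all source/sink pairs."""
    score = [0.0] * n_nodes
    for source, sink in combinations(range(n_nodes), 2):
        for k, flow in enumerate(node_flows(n_nodes, edges, source, sink)):
            score[k] += flow
    return score


# The four-node network of the slide, 0-indexed, with G1 = G2 = G3 = G4 = 1.
edges = {(0, 1): 1.0, (0, 2): 1.0, (1, 3): 1.0, (2, 3): 1.0}
voltages = node_voltages(4, edges, source=0, sink=3)
flows = node_flows(4, edges, source=0, sink=3)
scores = informativity(4, edges)
```

With equal conductances the unit current splits evenly over the two paths, so the two intermediate nodes each carry half of it. Because the network is symmetric, all four nodes end up with the same informativity score.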
  15. 15. Missiuro's information flow and protein informativity ● Advantage over betweenness degree and edge degree: – Recovers weak multi-hub regulators – Better at predicting essential genes – Better at predicting genes essential for a specific function (organ development) ● Advantage over stoichiometric methods: – No need to solve 64k differential equations (unstable!) – Reflects not only metabolism, but also regulation. Image courtesy of Missiuro et al. (2009). Information flow analysis of interactome networks. PLoS Computational Biology 5, e1000350.
  16. 16. Model creation ● Recover targets affected by drugs with a given polypharmacological effect ● Compute the information circulation within interactome for these drugs ● Include all the targets with a significant informativity => “hidden” targets
  17. 17. Retrieve relevant annotation
  18. 18. Not all the GO terms are equivalent. GO term informativity (~protein info for Missiuro et al.) – Expand annotations: T-cell apoptosis regulation → T cell + apoptosis + immune system + ... – Define term informativity:
      \[
      \mathrm{Inf}_{Term} = \frac{S_{Total}}{S_{Term}} = \frac{k_b \cdot \log(N_{Tot})}{k_b \cdot \log(N_{Term})}
      \]
      where N_Tot is the total number of targets and N_Term is the number of targets annotated with a given GO term – Use it to compute the flow through each term in a pair of proteins: informativity = conductance – Compute total informativity within a group as a sum of flows through each term in each pair, divided by the number of targets squared
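The term informativity and the group score can be made concrete with a small sketch. It assumes a unit voltage between the two proteins of every pair, so that the current carried by a shared term equals its conductance, i.e. its informativity. The annotation table, the protein names and the function names are hypothetical, and the annotations are taken as already expanded.

```python
import math
from collections import Counter
from itertools import combinations


def term_informativity(n_total, n_term):
    """Inf_Term = log(N_Tot) / log(N_Term); the k_b factors cancel."""
    if n_term < 2:
        # log(1) = 0 would give an infinite score; this guard is an assumption.
        raise ValueError("a term must annotate at least two targets")
    return math.log(n_total) / math.log(n_term)


def go_flows(annotations):
    """Flow carried by each GO term, summed over all pairs of targets."""
    n_total = len(annotations)
    counts = Counter(t for terms in annotations.values() for t in terms)
    flow = Counter()
    for a, b in combinations(sorted(annotations), 2):
        # Only terms shared by both proteins conduct between them.
        for term in annotations[a] & annotations[b]:
            flow[term] += term_informativity(n_total, counts[term])
    return flow


def group_informativity(annotations):
    """Sum of all per-term flows, divided by the number of targets squared."""
    n_total = len(annotations)
    return sum(go_flows(annotations).values()) / n_total ** 2


# Hypothetical, already expanded annotations for four targets.
annotations = {
    "A": {"apoptosis", "T cell", "immune system"},
    "B": {"apoptosis", "immune system"},
    "C": {"immune system", "platelet activation"},
    "D": {"platelet activation"},
}
flow = go_flows(annotations)
ranking = sorted(flow, key=flow.get, reverse=True)
group_score = group_informativity(annotations)
```

In this toy set "immune system" is the least informative term per pair, yet it ranks first in `ranking` because three pairs share it. A term held by a single target, such as "T cell", never conducts and therefore never reaches the guard.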
  19. 19. Same secondary effect might have distinct mechanisms ● Cluster affected targets by their annotation similarity ● Compute GO-based information circulation within each cluster and sort GO terms by informativity ● Use clusters as additional polypharmacological action models
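A minimal sketch of the first step, clustering targets by annotation similarity, is given here. The slides do not say which similarity measure or clustering algorithm was used at this stage; the sketch assumes Tanimoto similarity between GO term sets (the measure named on a later slide) and a simple threshold-based single-linkage grouping. The names `tanimoto` and `cluster_targets`, the threshold of 0.5 and the annotation table are hypothetical.

```python
def tanimoto(a, b):
    """Tanimoto (Jaccard) similarity between two sets of GO terms."""
    union = a | b
    return len(a & b) / len(union) if union else 0.0


def cluster_targets(annotations, threshold=0.5):
    """Single-linkage clusters: targets are linked when similarity >= threshold."""
    names = sorted(annotations)
    parent = {n: n for n in names}

    def find(n):
        # Union-find lookup with path halving.
        while parent[n] != n:
            parent[n] = parent[parent[n]]
            n = parent[n]
        return n

    for i, a in enumerate(names):
        for b in names[i + 1:]:
            if tanimoto(annotations[a], annotations[b]) >= threshold:
                parent[find(a)] = find(b)

    clusters = {}
    for n in names:
        clusters.setdefault(find(n), []).append(n)
    return sorted(clusters.values())


# Hypothetical annotation sets for five targets.
annotations = {
    "P1": {"apoptosis", "T cell", "immune system"},
    "P2": {"apoptosis", "immune system"},
    "P3": {"platelet activation", "coagulation"},
    "P4": {"platelet activation", "coagulation", "wound healing"},
    "P5": {"circadian rhythm"},
}
clusters = cluster_targets(annotations)
```

Each resulting cluster can then be scored separately with the GO-based information circulation, which is how distinct mechanisms behind the same secondary effect would surface as separate models.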
  20. 20. Clustering
  21. 21. GO term informativity advantages ● Maps to biological concepts: interpretation by expert biologists => biological sense? (cf. Potti 2010 scandal at Duke over “metagene” signature) ● Molecular relation databases typically do badly in some cases: systemic effects (T-cell maturation, circadian rhythm, …), endocrine regulation, central nervous system (GO, however, isn't the best ontology for this) ● Ability to plug in additional data from literature analysis (just account for confidence)
  22. 22. Implementation: case of pancreatitis and cirrhosis ● Secondary effects from SIDER (EMBL) ● Drug-target interactions from Bourne lab and Drugdesigntech simulation results ● Group drugs by secondary effect ● Filter out targets that are frequently affected in random drug collections (Student's t-test)

      Name          Non-random proba   Count in case   Expected count in random poll   StDev of random poll
      PYGL_HUMAN    96.58%             4               0.7968                          0.35122176
      RHOC_HUMAN    95.77%             4               0.768                           0.442368
      GLTP_HUMAN    95.77%             4               0.768                           0.442368
      C43BP_HUMAN   95.77%             4               0.768                           0.442368
      FUT8_HUMAN    95.77%             4               0.768                           0.442368
      RET7_HUMAN    95.77%             4               0.768                           0.442368
      CP2E1_HUMAN   94.43%             5               1.4304                          0.70953984
      2ABA_HUMAN    93.88%             4               0.9984                          0.55148544
      AUHM_HUMAN    93.88%             4               0.9984                          0.55148544
      DX39B_HUMAN   93.66%             4               1.3536                          0.44411904
      NGF_HUMAN     93.49%             11              5.0112                          2.33312256
      NTRK1_HUMAN   93.49%             11              5.0112                          2.33312256
      KIF11_HUMAN   93.03%             5               1.5648                          0.82308096
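The idea behind the random-poll filter can be illustrated with a resampling sketch. It does not reproduce the Student's t-test of the slide, whose exact setup cannot be recovered from the transcript. Instead it draws random drug collections of the same size as the case group and reports the empirical fraction of polls in which the target is hit less often than in the case group. The drug names, the two target labels and the function `random_poll_stats` are hypothetical.

```python
import random
import statistics


def random_poll_stats(drug_targets, target, case_drugs, n_polls=1000, seed=0):
    """Compare a target's hit count in the case group with random drug polls."""
    rng = random.Random(seed)  # fixed seed keeps the sketch reproducible
    all_drugs = sorted(drug_targets)
    case_count = sum(target in drug_targets[d] for d in case_drugs)
    counts = []
    for _ in range(n_polls):
        poll = rng.sample(all_drugs, len(case_drugs))
        counts.append(sum(target in drug_targets[d] for d in poll))
    expected = statistics.mean(counts)
    stdev = statistics.pstdev(counts)
    # Fraction of random polls in which the target is hit less often.
    frac_below = sum(c < case_count for c in counts) / n_polls
    return case_count, expected, stdev, frac_below


# Hypothetical data: 20 drugs, the first four share the secondary effect.
drug_targets = {f"drug{i:02d}": {"PROMISCUOUS"} for i in range(20)}
for i in range(5):
    drug_targets[f"drug{i:02d}"].add("SPECIFIC")
case_drugs = [f"drug{i:02d}" for i in range(4)]

specific = random_poll_stats(drug_targets, "SPECIFIC", case_drugs)
promiscuous = random_poll_stats(drug_targets, "PROMISCUOUS", case_drugs)
```

A target hit by every drug is hit four times in every random poll, so it is never rarer in a poll than in the case group and would be filtered out. The target concentrated in the case group is almost always rarer in random polls and would be kept.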
  23. 23. Pancreatitis: clustering results. Clustering: BEA at UCSD. 1 major cluster: RHOC, NGF, NTRK1
  24. 24. Cirrhosis: clustering results. Clustering: HSC at Drugdesigntech. 4 major clusters (all 4 were informative and relevant); the most informative of them: KSYK_HUMAN, CSK_HUMAN
  25. 25. Quantitative polypharmacological effect prediction. Outline: ● Compute the information circulation for pharmacological-effect-specific targets ● Measure the decrease of information circulation within the “all targets” model and the “cluster” models
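One possible reading of "decrease of information circulation" is sketched here, under loud assumptions: circulation between two model targets is taken as the effective conductance between them (the current a unit voltage would drive), a drug scales every edge touching an inhibited protein by (1 - inhibition), and the network stays connected. The functions `effective_conductance`, `circulation` and `perturb` are hypothetical names, and a simple Gauss-Seidel relaxation stands in for a proper linear solver to keep the sketch short.

```python
from itertools import combinations


def effective_conductance(n_nodes, edges, source, sink, sweeps=500):
    """Current that a unit voltage between source and sink would drive."""
    neighbours = {k: [] for k in range(n_nodes)}
    for (i, j), g in edges.items():
        if g > 0:
            neighbours[i].append((j, g))
            neighbours[j].append((i, g))
    v = [0.0] * n_nodes  # the sink stays grounded at 0
    for _ in range(sweeps):  # Gauss-Seidel relaxation of Kirchhoff's node law
        for k in range(n_nodes):
            if k == sink or not neighbours[k]:
                continue
            injected = 1.0 if k == source else 0.0
            weighted = sum(g * v[j] for j, g in neighbours[k])
            v[k] = (injected + weighted) / sum(g for _, g in neighbours[k])
    # A unit current produced the drop v[source] - v[sink], so G = 1 / drop.
    return 1.0 / (v[source] - v[sink])


def circulation(n_nodes, edges, targets):
    """Total effective conductance over all pairs of model targets."""
    return sum(effective_conductance(n_nodes, edges, s, t)
               for s, t in combinations(sorted(targets), 2))


def perturb(edges, inhibited):
    """Scale every edge touching an inhibited protein by (1 - inhibition)."""
    scaled = {}
    for (i, j), g in edges.items():
        factor = (1.0 - inhibited.get(i, 0.0)) * (1.0 - inhibited.get(j, 0.0))
        scaled[(i, j)] = g * factor
    return scaled


# Four-node example network with unit conductances; the model targets are 0 and 3.
edges = {(0, 1): 1.0, (0, 2): 1.0, (1, 3): 1.0, (2, 3): 1.0}
baseline = circulation(4, edges, targets={0, 3})
perturbed = circulation(4, perturb(edges, {1: 0.5}), targets={0, 3})
decrease = 1.0 - perturbed / baseline
```

Here a hypothetical drug half-inhibits protein 1, which sits on one of the two paths between the model targets. The path through it drops from a conductance of 0.5 to 0.25, so the circulation between the targets falls from 1.0 to 0.75.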
  27. 27. Backbone for the interactome information flow computation ● NCI-Nature Pathway Interaction Database: no, coverage too small ● KEGG Pathway database: no, pathway-oriented and not connected for atomic interactions ● UniPathway: no, coverage too small ● Yay
  28. 28. : idea ● Structure: – BioPAX: XML / RDF / OWL – Physical entities: ● Proteins, small molecules, complexes, RNA, DNA ● Fragments of physical entities – Interaction: ● Degradation / polymerisation / biochemical reactions ● Molecular interaction ● Genetic interaction – Pathways, genes, post-translational modifications...
  29. 29. : reality ● Reality of – Main connected component: ~22 000 entities, but 3 others with >50 elements – Presence of generic classes: groups of objects – Proteins = mix between proteins, domains, groups, groups of domains… – 15 000 proteins, 5000 UniProt references – 156 genes, 56 RNA molecules: translation / transcription regulation is not well described
  30. 30. incompleteness ● Still incomplete and reliant on comments: Case of SRC => HiNT database added
  31. 31. Verification of pipeline: information routing decay. Image courtesy of Wintermute et al. (2010). Emergent cooperation in microbial metabolism. Molecular Systems Biology 6, 407.
  32. 32. Verification of pipeline: predicting target druggability ● 186 oral small-molecule drug targets from Overington's 2006 “How many drug targets are there?” ● 77 plasma membrane targets ● 1289 total plasma membrane proteins with UniProt references in ● Use the following to predict druggability: overall informativity; GO-term specific informativity; target abundance (higher abundance, more off-target action in case of total inhibition)
  33. 33. Valid targets
  34. 34. Non-targets
  35. 35. Druggability prediction with some complexity ● Raw prediction is little better than random: – 65% specificity, 60% selectivity ● However, if we account for: – Non-oral, non-small-molecule drugs – Drugs developed or in development since 2006 – GO-specific informativity – The fact that / HiNT are bad at representing CNS functions ● The prediction results are rather encouraging: – 75% specificity, 90% selectivity
  36. 36. Before we can conclude ● The methods required for the information circulation have been coded – Information circulation for the target set – Calculation of information variation in case of perturbed interactome alteration ● However, before this project can be deemed concluded – model creation and model utilization parts have to be assembled into a single pipeline (right now they are separate) – Run model creation prediction on several secondary effects with random training / testing set validation
  37. 37. Conclusions ● GO-based information circulation method seems to work well for secondary effect mechanism retrieval ● / HiNT dataset – based information circulation method seems to be potentially useful for computationally assisted drug design ● Information circulation methods for secondary effects quantitative prediction must be tested before this project can be concluded
  38. 38. Moving further. Finding datasets and people interested in further development of the method: – SNP cumulative effect: requires the ability to project on the protein 3D structure and estimate protein activity inhibition in different contexts – Drug design, secondary effect prediction: typical pharmaceutical firms' data stores contain far more information about the toxicity of different compounds and allow much more finely tuned modeling of pharmacological effects – Difference between animal and human interactomes: predict unexpected polypharmacological effects upon transition from animal to human trials
  39. 39. Acknowledgements: Prof. Philip Bourne, Prof. Bart Deplancke, Cedric Merlot, Li Xie, Spencer Blieven, Roland Diggelmann, Andreas Prlic, Julia Ponomarenko, Lilia Iakoucheva, Jiang Wang, Cole Christie, Audrey Schenker
  41. 41. THE END. QUESTIONS? Graph databases; Random matrix theory; Method improvement
  42. 42. If time: Improvements ● For retrieving statistically significant targets: – Abandon naïve statistical drug target filtering – Build drug-specific information flows – Recover all sufficiently informative proteins for each drug – Use those proteins to get statistically significant targets => avoids close-miss errors ● When sorting targets: – Sort the most significant GO terms not by their informativity, but by how much of the information flow associated with them is perturbed by the given target set => avoids the need to tune GO term informativity => better interpretability
  43. 43. If time: Improvements ● When computing the information flow: – Do not consider the information flow between any pair of proteins as constant – Consider the associated tension (voltage) as constant – Unrelated proteins are likely to exchange less information ● To avoid information circulation distortion due to GO terms correlation: – Don't use the Tanimoto distance / conductance model for GO-based term circulation – Use the real point-to-point routing within the GO terms graph
  44. 44. If time 1: Random matrix theory ● Molecular evolution: adaptive mutations = survival of the fittest; random mutations = Kimura's drift; tools to separate the two ● Protein interaction network evolution: adaptive topology modifications; random topology artefacts; phosphorylation pattern modification due to random mutations; separating the two = ???? ● “Nothing in biology makes sense except in the light of evolution.” Theodosius Dobzhansky
  45. 45. If time 1: Random matrix theory. In sparse matrices (~= graphs): ● Random matrices have specific eigenvalues ● All eigenvalues exceeding these values are non-random ● Clustering can later be performed in the space generated by the eigenvectors associated with the non-random eigenvalues
  46. 46. If time 2: Graph Databases: Neo4j, Titan DB
  47. 47. If time 2: Graph Databases. TinkerPop stack: ~ SQL for graph databases
  48. 48. If time 3: Conclusions – general ● Graph databases are worth a try for systems biology applications ● We need to assemble one comprehensive, complete and WELL DOCUMENTED resource for computational systems biology

Editor's Notes

  • Binding: absolutely no idea whatsoever about what is going on. The drug was designed to bind one single target, but often binds many others. Due to protein conformation variation, the existence of complex catalytic sites and post-translational modifications of different proteins, predicting off-target binding is a nightmarish job.
  • Fixed tension (voltage) between sink and source. Each GO term shared by the sink and the source passes information current.
  • Render bioinformatics “disease signatures” (vectors of ~100 protein names) readable and understandable for biologists: cf. Nature Medicine 2010 retraction scandal. Complementarity with pure information circulation methods for the endocrine system: concepts such as an increase of blood pressure might be pretty good signals interpreted by cell membranes, but are impossible to encode in conventional interactomes.