Med264 Structural Bioinformatics

What constitutes structural bioinformatics and 2 example areas from our own work - studying evolution using structure and what really happens when we take a drug. Presented to UCSD medical students in years 1-3

Published in: Technology

  • P: distance to the environmental boundary. P_i, D_i and α_i are the values for neighboring atom i, where D is the distance to the central atom and α is the direction to the central atom.
  • Tuberculosis, which is caused by the bacterial pathogen Mycobacterium tuberculosis, is a leading cause of mortality among the infectious diseases. The World Health Organization (WHO) has estimated that almost one-third of the world's population, around 2 billion people, is infected.
    Every year, more than 8 million people develop an active form of the disease, which claims the lives of nearly 2 million. This translates to over 4,900 deaths per day, and more than 95% of these are in developing countries.
    Despite the current global situation, antitubercular drugs have remained largely unchanged over the last four decades. The widespread use of these agents has exerted a strong selective pressure on M. tuberculosis, encouraging the emergence of resistant strains.
    Multidrug resistant (MDR) tuberculosis is defined as resistance to the first-line drugs isoniazid and rifampin. The effective treatment of MDR tuberculosis necessitates long-term use of second-line drug combinations, an unfortunate consequence of which is the emergence of further drug resistance.
    Enter extensively drug resistant (XDR) tuberculosis: M. tuberculosis strains that are resistant to both isoniazid and rifampin, as well as to key second-line drugs. Since the only remaining drug classes exhibit low potency and high toxicity, XDR tuberculosis is extremely difficult to treat.
    The rise of XDR tuberculosis around the world poses a great threat to human health, which makes the development of new antitubercular agents an urgent priority.
    Very few Mtb proteins have been explored as drug targets.
  • 3,996 proteins in the TB proteome
    749 solved structures in the PDB, representing a total of 284 proteins (284/3,996 = 7.1% coverage)
    ModBase contains homology models for the entire TB proteome
    1,446 ‘high quality’ homology models were added to the data set
    Structural coverage increased to 43.3% ((284 + 1,446)/3,996)
    Retained only those models with a model score of > 0.7 and a ModPipe quality score of > 1.1 (2,818 models).
    There were multiple models per protein. For each TB protein, chose the model with the best model score and, if the model scores were equal, the model with the best ModPipe quality score (1,703 models).
    However, 251 (+6) models were removed since they correspond to TB proteins that already have solved structures (1,446 models remained).
    Model score: a score for the reliability of a model, derived from statistical potentials (F. Melo, R. Sanchez, A. Sali, 2001). A model is predicted to be good when the model score is higher than a pre-specified cutoff (0.7). A reliable model has a probability of the correct fold that is larger than 95%. A fold is correct when at least 30% of its Cα atoms superpose within 3.5 Å of their correct positions.
    The ModPipe Protein Quality Score (MPQS) is a composite score comprising sequence identity to the template, coverage, and the three individual scores E-value, z-DOPE and GA341. We consider an MPQS of > 1.1 as reliable.
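    The model-selection rules in this note can be sketched in a few lines of Python. This is an illustrative sketch only, not the authors' pipeline: the `Model` record, the function names and the protein identifiers are hypothetical, and the three steps (apply both cutoffs, keep the best model per protein, drop proteins with solved structures) are folded into a single pass, which gives the same final set.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Model:
    protein: str        # TB protein identifier (hypothetical IDs in the demo)
    model_score: float  # ModBase model score
    mpqs: float         # ModPipe Protein Quality Score


def select_models(models, solved, min_score=0.7, min_mpqs=1.1):
    """Keep the best reliable model per protein, skipping solved proteins."""
    best = {}
    for m in models:
        # Both cutoffs are strict ("> 0.7" and "> 1.1" in the notes).
        if m.model_score <= min_score or m.mpqs <= min_mpqs:
            continue
        # Proteins that already have a solved structure need no model.
        if m.protein in solved:
            continue
        current = best.get(m.protein)
        # Tuple comparison: model score first, MPQS breaks ties.
        if current is None or (m.model_score, m.mpqs) > (current.model_score, current.mpqs):
            best[m.protein] = m
    return best


def coverage_percent(n_proteins, proteome_size=3996):
    """Structural coverage as a percentage of the TB proteome."""
    return 100.0 * n_proteins / proteome_size


if __name__ == "__main__":
    demo = [
        Model("P1", 0.90, 1.2),
        Model("P1", 0.90, 1.5),  # same model score, better MPQS: wins the tie
        Model("P2", 0.60, 2.0),  # fails the model score cutoff
        Model("P3", 0.95, 1.3),  # protein already has a solved structure
    ]
    print(select_models(demo, solved={"P3"}))
    print(round(coverage_percent(284), 1), round(coverage_percent(284 + 1446), 1))
```

    With the counts given above, `coverage_percent` reproduces the step from 284 solved proteins to 284 + 1,446 proteins with structural coverage.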
  • (nutraceuticals excluded)
  • Multi-target therapy may be more effective than single-target therapy to treat infectious diseases
    Most of the proteins listed are potential novel drug targets for the development of efficient anti-tuberculosis chemotherapeutics.
    GSMN-TB: Genome Scale Metabolic Reaction Network of M.tb (http://sysbio/
    849 reactions, 739 metabolites, 726 genes
    Can optimize the model for in vivo growth
    Carry out multiple gene inhibition and compute the maximal theoretical growth rate (if close to zero, that combination of genes is essential for growth)
  • Transcript

    • 1. Structural Bioinformatics with Examples Drawn from Our Own Work. Philip E. Bourne, Professor of Pharmacology, UCSD; Associate Vice Chancellor for Innovation & Industry Alliances. 10/24/13, MED264
    • 2. How I Got Excited
    • 3. Some Things Stay with You Your Whole Life
    • 4. Drivers: Numbers & Complexity [Figure: number of released entries. Courtesy of the RCSB Protein Data Bank]
    • 5. Putting Structural Bioinformatics in Perspective [Diagram of the fields Pharmacy Informatics, Biomedical Informatics and Bioinformatics. Example topics shown: drug dosing, pharmacokinetics, pharmacy information systems, EHR, decision support systems, hospital information systems, algorithms, genomics, proteomics, biological networks, systems biology] Note: These are only representative examples
    • 6. Putting Structural Bioinformatics in Perspective [Diagram of the fields Pharmacy Informatics, Biomedical Informatics, Bioinformatics and Structural Bioinformatics. Example topics shown: controlled vocabularies, ontologies, literature searching, data management, pharmacogenomics, personalized medicine] Note: These are only representative examples
    • 8. Structural Bioinformatics – Example Topics
        • Structure prediction
        • Evolution
        • Drug discovery
        • Sequence-structure-function relationships…
        Video:
    • 9. Determining 3D Structures – X-ray Crystallography. Structural biology moves from being functionally driven to genomically driven
        • Basic steps: Target Selection → Crystallomics (isolation, expression, purification, crystallization) → Data Collection → Structure Solution → Structure Refinement → Functional Annotation → Publish
        • Annotations on the pipeline figure: fill in protein fold space; robotics; -ve data; software engineering; functional prediction; not necessarily
    • 10. Enough background. Let's look at two fundamental questions where structural bioinformatics is critical:
        1. Is structure useful in studying evolution and what can we learn?
        2. What really happens when we take a drug?
    • 11. Nature’s Reductionism
        • There are ~20^300 possible proteins >>>> all the atoms in the Universe
        • ~45M protein sequences from UniProt
        • ~90,000 protein structures
        • Yield ~1500 folds, ~2000 superfamilies, ~4000 families (SCOP 1.75)
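        As a rough check of the sequence-space claim on this slide, the sketch below reads the slide's figure as 20^300 (20 amino acids, a chain of 300 residues) and compares its order of magnitude with the number of atoms in the observable universe. The value of about 10^80 atoms is a commonly quoted order-of-magnitude estimate and is an assumption here, not a number from the slides.

```python
import math

AMINO_ACIDS = 20     # standard amino acid alphabet
CHAIN_LENGTH = 300   # chain length implied by the slide's 20^300
ATOMS_LOG10 = 80     # assumption: ~10^80 atoms in the observable universe


def log10_sequence_space(alphabet=AMINO_ACIDS, length=CHAIN_LENGTH):
    """log10 of the number of possible sequences of the given length."""
    return length * math.log10(alphabet)


if __name__ == "__main__":
    exponent = log10_sequence_space()
    print(f"20^300 is about 10^{exponent:.0f}")
    print(f"larger than the assumed atom count by about 10^{exponent - ATOMS_LOG10:.0f}")
```

        Against this, the ~45M known sequences and ~1500 folds quoted on the slide are a vanishingly small sample, which is the point of the slide title.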
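    • 12. Structure Provides an Evolutionary Fingerprint
        • Superfamily distributions would seem to be related to the complexity of life
        • [Figure: Venn diagram of the distribution of SCOP folds (765 total) among the three kingdoms, as taken from SUPERFAMILY. Kingdom totals: Eukaryota 650, Archaea 416, Bacteria 564. Each region also carries a pair of values labeled "Any genome / All genomes", given in parentheses.]
            Eukaryota only: 135 (153/14)
            Eukaryota and Archaea: 10 (21/2)
            Archaea only: 2 (9/1)
            Eukaryota and Bacteria: 118 (310/0)
            All three kingdoms: 387 (645/49)
            Archaea and Bacteria: 17 (29/0)
            Bacteria only: 42 (68/0)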
    • 13. Method – Distance Determination
        Presence/Absence Data Matrix (FSF): SCOP SUPERFAMILY entries by organism
        Superfamily  C. intestinalis  C. briggsae  F. rubripes
        a.1.1        1                1            1
        a.1.2        1                1            1
        a.10.1       0                0            1
        a.100.1      1                1            1
        a.101.1      0                0            0
        a.102.1      0                1            1
        a.102.2      1                1            1
        Distance Matrix
                         C. intestinalis  C. briggsae  F. rubripes
        C. intestinalis  0                101          109
        C. briggsae                       0            144
        F. rubripes                                    0
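        A minimal sketch of how a distance matrix can be derived from presence/absence data like the excerpt on this slide. The slide does not name the distance measure, so the Hamming distance used here (the number of superfamilies present in one organism but not the other) is an assumption. Only the seven superfamilies shown on the slide are used, so the output will not match the slide's values of 101, 109 and 144, which come from the full matrix.

```python
from itertools import combinations

# Presence (1) / absence (0) of each superfamily, in the order
# a.1.1, a.1.2, a.10.1, a.100.1, a.101.1, a.102.1, a.102.2 (slide excerpt).
PRESENCE = {
    "C. intestinalis": [1, 1, 0, 1, 0, 0, 1],
    "C. briggsae":     [1, 1, 0, 1, 0, 1, 1],
    "F. rubripes":     [1, 1, 1, 1, 0, 1, 1],
}


def hamming(u, v):
    """Number of positions at which two presence/absence vectors differ."""
    return sum(a != b for a, b in zip(u, v))


def distance_matrix(table):
    """Symmetric pairwise distance matrix keyed by (organism, organism)."""
    dist = {(name, name): 0 for name in table}
    for a, b in combinations(table, 2):
        d = hamming(table[a], table[b])
        dist[(a, b)] = dist[(b, a)] = d
    return dist


if __name__ == "__main__":
    for pair, d in distance_matrix(PRESENCE).items():
        print(pair, d)
```

        A matrix of this kind can then be passed to any standard distance-based tree-building method.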
    • 14. If Structure is so Conserved is it a Useful Tool in the Study of Evolution? The Answer Would Appear to be Yes
        • It is possible to generate a reasonable tree of life from merely the presence or absence of superfamilies (FSFs) within a given proteome
        Yang, Doolittle & Bourne (2005) PNAS 102(2) 373-8
    • 15. The Influence of Environment on Life. Chris Dupont, Scripps Institution of Oceanography, UCSD
        Dupont, Yang, Palenik, Bourne. 2006 PNAS 103(47) 17822-17827
    • 16. Consider the Distribution of Disulfide Bonds among Folds
        • Disulfides are only stable under oxidizing conditions
        • Oxygen content gradually accumulated during the earth’s evolution
        • The divergence of the three kingdoms occurred 1.8-2.2 billion years ago
        • Oxygen began to accumulate ~2.0 billion years ago
        • Logical deduction: disulfides are more prevalent in folds (organisms) that evolved later
        • This would seem to hold true
        • Can we take this further?
        • [Figure: Venn diagram of SCOP folds (708 total) containing disulfide bonds, by kingdom]
            Eukaryota only: 31.9% (43/135)
            Eukaryota and Archaea: 0% (0/10)
            Archaea only: 0% (0/2)
            Eukaryota and Bacteria: 14.4% (17/118)
            All three kingdoms: 4.7% (18/387)
            Archaea and Bacteria: 5.9% (1/17)
            Bacteria only: 16.7% (7/42)
    • 17. Evolution of the Earth
        • 4.5 billion years of change
        • 300 ± 50 K
        • 1-5 atmospheres
        • Constant photoenergy
        • Chemical and geological changes
        • Life has evolved in this time
        • The ocean was the “cradle” for 90% of evolution
    • 18. Theoretical Levels of Trace Metals and Oxygen in the Deep Ocean Through Earth’s History
        • Whether the deep ocean became oxic or euxinic following the rise in atmospheric oxygen (~2.3 Gya) is debated, therefore both are shown (oxic ocean: solid lines; euxinic ocean: dashed lines).
        • The phylogenetic tree symbols at the top of the figure show one idea as to the theoretical periods of diversification for each Superkingdom (Bacteria, Archaea, Eukarya).
        • [Figure: concentration of oxygen, zinc, iron, cobalt and manganese (O2 in arbitrary units, Zn and Fe in moles L⁻¹) against billions of years before present. Replotted from Saito et al. 2003 Inorganica Chimica Acta 356: 308-318]
    • 19. The Gaia Hypothesis
        • Gaia (pronounced /'geɪ.ə/ or /'gaɪ.ə/), "land" or "earth", from the Greek Γαῖα, is a Greek goddess personifying the Earth
        • Gaia: a complex entity involving the Earth's biosphere, atmosphere, oceans, and soil; the totality constituting a feedback system which seeks an optimal physical and chemical environment for life on this planet. (James Lovelock)
    • 20. The Question
        • Have the emergent properties of an organism, as judged by its protein content, been influenced by the environment?
        • Will do this by consideration of the metallomes of a broad range of species
        • The metallomes can only be deduced by consideration of the protein structures to which the metal is covalently bound
        • Will hypothesize that these emergent properties in turn influenced the environment
    • 21. Superfamily Distribution As Well As Overall Content Has Changed
        • [Figure: two panels, "Bacteria Fe superfamilies" and "Eukaryotic Fe superfamilies", each plotted over the same superfamilies: a.1.1, a.1.2, a.104.1, a.110.1, a.119.1, a.138.1, a.2.11, a.24.3, a.24.4, a.25.1, a.3.1, a.39.3, a.56.1, a.93.1, b.1.13, b.2.6, b.3.6, b.33.1, b.70.2, b.82.2, c.56.6, c.83.1, c.96.1, d.134.1, d.15.4, d.174.1, d.178.1, d.35.1, d.44.1, d.58.1, e.18.1, e.19.1, e.26.1, e.5.1, f.21.1, f.21.2, f.24.1, f.26.1, g.35.1, g.36.1, g.41.5]
    • 22. Metal Binding Proteins are Not Consistent Across Superkingdoms
        • Since these data are derived from current species they are independent of evolutionary events such as duplication, gene loss, horizontal transfer and endosymbiosis
    • 23. Power Laws: Fundamental Constants in the Evolution of Proteomes
        • A slope of 1 indicates that a group of structural domains is in equilibrium with genome growth, while a slope > 1 indicates that the group of domains is being preferentially duplicated (or retained in the case of genome reductions).
        van Nimwegen E (2006) in: Koonin EV, Wolf YI, Karev GP (Eds.). Power laws, scale-free networks, and genome biology
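        The slope discussed on this slide is the exponent of a power law, which can be estimated as the least-squares slope of log(domain count) against log(genome size). The sketch below is only an illustration of that fit: the function name is hypothetical and the two data series are synthetic, constructed to have slopes of exactly 1 and 2. They are not data from the cited work.

```python
import math


def loglog_slope(genome_sizes, domain_counts):
    """Least-squares slope of log(domain count) versus log(genome size)."""
    xs = [math.log(x) for x in genome_sizes]
    ys = [math.log(y) for y in domain_counts]
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    sxx = sum((x - mean_x) ** 2 for x in xs)
    return sxy / sxx


if __name__ == "__main__":
    sizes = [1000, 2000, 4000, 8000]            # synthetic genome sizes
    in_equilibrium = [0.01 * s for s in sizes]  # grows linearly: slope 1
    duplicated = [1e-5 * s ** 2 for s in sizes] # grows faster: slope 2
    print(round(loglog_slope(sizes, in_equilibrium), 3))
    print(round(loglog_slope(sizes, duplicated), 3))
```

        In practice one slope would be fitted per domain group and per superkingdom, and the slopes compared.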
    • 24. Why are the Power Laws Different for Each Superkingdom?
        • Power laws are likely influenced by selective pressure. Qualitatively, the differences in the power law slopes describing Eukarya and Prokarya are correlated to the shifts in trace metal geochemistry that occur with the rise in oceanic oxygen
        • We hypothesize that proteomes contain an imprint of the environment at the time of the last common ancestor in each Superkingdom
        • This suggests that Eukarya evolved in an oxic environment, whereas the Prokarya evolved in anoxic environments
    • 25. Do the Metallomes Contain Further Support for this Hypothesis?
        Superkingdom  Fold family                   %            Fe-binding  O2
        Eukarya       Cytochrome P450               0.44 ± 0.48  heme        yes
        Eukarya       Cytochrome c3-like            0.13 ± 0.3   heme        no
        Eukarya       Cytochrome b5                 0.12 ± 0.09  heme        no
        Eukarya       Purple acid phosphatase       0.11 ± 0.08  amino       no
        Eukarya       Penicillin synthase-like      0.07 ± 0.1   amino       yes
        Eukarya       Hypoxia-inducible factor      0.07 ± 0.04  amino       yes
        Eukarya       Di-heme elbow motif           0.06 ± 0.01  heme        no
        Archaea       4Fe-4S ferredoxins            1.80 ± 0.7   Fe-S        no
        Archaea       MoCo biosynthesis proteins    1.60 ± 0.3   Fe-S        no
        Archaea       Heme-binding PAS domain       1.10 ± 1.0   heme        note 1
        Archaea       HemN                          0.80 ± 0.20  Fe-S        no
        Archaea       α helical ferredoxin          0.60 ± 0.16  Fe-S        no
        Archaea       Biotin synthase               0.55 ± 0.1   Fe-S        no
        Archaea       ROO N-terminal domain-like    0.5 ± 0.1    amino       note 2
        Bacteria      High potential iron protein   0.38 ± 0.25  Fe-S        no
        Bacteria      Heme-binding PAS domain       0.3 ± 0.4    heme        note 1
        Bacteria      MoCo biosynthesis proteins    0.21 ± 0.15  Fe-S        no
        Bacteria      HemN                          0.2 ± 0.15   Fe-S        no
        Bacteria      4Fe-4S ferredoxins            0.2 ± 0.2    Fe-S        no
        Bacteria      Cytochrome c                  0.14 ± 0.2   heme        no
        Bacteria      α helical ferredoxin          0.12 ± 0.09  Fe-S        no
        Overall percent of Fe bound by:
        Superkingdom  Fe-S     heme     amino
        Eukarya       21 ± 9   47 ± 19  32 ± 12
        Archaea       68 ± 12  13 ± 14  19 ± 6
        Bacteria      47 ± 11  22 ± 12  31 ± 16
        Note 1. Some, but not all, PAS domains actually sense oxygen
        Note 2. The Rubredoxin oxygen:oxidoreductase (ROO) protein does not contact oxygen, but catalyzes an oxygen reduction pathway
    • 26. e⁻ Transfer Proteins - Same Broad Function, Same Metal, Different Chemistry Induced by the Environment?
        • Fe-S clusters: Fe bound by S; cluster held in place by Cys; generally negative reduction potentials; very susceptible to oxidation
        • Cytochromes: Fe bound by heme (and amino acids); generally positive reduction potentials; less susceptible to oxidation
    • 27. Hypothesis
        • Emergence of cyanobacteria changed oxygen concentrations
        • Impacted relative metal ion concentrations in the ocean
        • Organisms evolved to use these metals in new ways to evolve new biological processes, e.g. complex signaling
        • This in turn further impacted the environment
        • Only protein structures could reveal such dependencies
    • 28. What really happens when we take a drug?
    • 29. Our Motivation
        • Tykerb – Breast cancer
        • Gleevec – Leukemia, GI cancers
        • Nexavar – Kidney and liver cancer
        • Staurosporine – natural product – alkaloid – many uses, e.g., antifungal, antihypertensive
        Collins and Workman 2006 Nature Chemical Biology 2 689-700
    • 30. A Reverse Engineering Approach to Drug Discovery Across Gene Families
        • Characterize the ligand binding site of the primary target (Geometric Potential)
        • Identify off-targets by ligand binding site similarity (sequence order independent profile-profile alignment)
        • Extract known drugs or inhibitors of the primary and/or off-targets
        • Search for similar small molecules
        • …
        • Dock molecules to both primary and off-targets
        • Statistical analysis of docking score correlations
        Xie and Bourne 2009 Bioinformatics 25(12) 305-312
    • 31. Characterization of the Ligand Binding Site - The Geometric Potential
        • Conceptually similar to hydrophobicity or electrostatic potential that is dependent on both global and local environments
        • Initially assign each Cα atom a value that is the distance to the environmental boundary
        • Update the value with those of surrounding Cα atoms, dependent on distances and orientation; atoms within a 10 Å radius define the neighbors
        • $GP = P + \sum_{i \in \text{neighbors}} \frac{P_i}{D_i + 1.0} \times \frac{\cos(\alpha_i) + 1.0}{2.0}$
        Xie and Bourne 2007 BMC Bioinformatics, 8(Suppl 4):S9
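        A minimal sketch of the update step on this slide, reading the formula as GP = P + Σ over neighbors of P_i/(D_i + 1.0) × (cos(α_i) + 1.0)/2.0. The function name and the input layout (one tuple per neighbor, angle in radians) are assumptions made for illustration. Computing P, the distance to the environmental boundary, and finding the neighbors within 10 Å are not shown.

```python
import math


def geometric_potential(p, neighbors):
    """Geometric potential of one C-alpha atom.

    p         -- the atom's own distance to the environmental boundary
    neighbors -- iterable of (p_i, d_i, alpha_i) for C-alpha atoms within
                 the neighbor radius: their boundary distance, their distance
                 to the central atom, and their direction angle in radians
    """
    total = p
    for p_i, d_i, alpha_i in neighbors:
        distance_term = p_i / (d_i + 1.0)
        orientation_term = (math.cos(alpha_i) + 1.0) / 2.0
        total += distance_term * orientation_term
    return total


if __name__ == "__main__":
    # One neighbor at alpha = 0 contributes fully; one at alpha = pi adds nothing.
    print(geometric_potential(2.0, [(3.0, 2.0, 0.0), (3.0, 2.0, math.pi)]))
```

        The orientation term scales each neighbor's contribution between 0 and 1, and the distance term reduces the contribution of neighbors that are farther from the central atom.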
    • 32. Discrimination Power of the Geometric Potential
        • Geometric potential can distinguish binding and non-binding sites
        • [Figure: distributions for binding site and non-binding site residue clusters plotted against the geometric potential scale for residue clusters]
        Xie and Bourne 2007 BMC Bioinformatics, 8(Suppl 4):S9
    • 33. Local Sequence-order Independent Alignment with Maximum-Weight Sub-Graph Algorithm
        • [Figure: Structure A and Structure B, each containing the residue segments LER and VKDL]
        • Build an associated graph from the graph representations of the two structures being compared. Each of the nodes is assigned a weight from the similarity matrix
        • The maximum-weight clique corresponds to the optimum alignment of the two structures
        Xie and Bourne 2008 PNAS, 105(14) 5441
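        The idea on this slide can be illustrated with a toy example. The sketch below is not the published algorithm: the edge rule (two residue pairings are compatible when their inter-residue distances agree within a tolerance), the identity-based node weights and the brute-force clique search are all assumptions chosen to keep the example short, and the exhaustive search is only feasible for a handful of nodes.

```python
import math
from itertools import combinations


def association_graph(a, b, similarity, tol=1.0):
    """Nodes pair residue i of A with residue j of B; edges join compatible pairs.

    a, b -- lists of (residue_type, (x, y, z)) tuples
    """
    nodes = {}
    for i, (res_a, _) in enumerate(a):
        for j, (res_b, _) in enumerate(b):
            weight = similarity(res_a, res_b)
            if weight > 0:
                nodes[(i, j)] = weight
    edges = set()
    for (i, j), (k, l) in combinations(nodes, 2):
        if i == k or j == l:
            continue  # a residue may be matched only once
        gap = abs(math.dist(a[i][1], a[k][1]) - math.dist(b[j][1], b[l][1]))
        if gap <= tol:
            edges.add(frozenset([(i, j), (k, l)]))
    return nodes, edges


def max_weight_clique(nodes, edges):
    """Exhaustive search for the clique with the largest total node weight."""
    best, best_weight = (), 0.0
    names = list(nodes)
    for size in range(1, len(names) + 1):
        for subset in combinations(names, size):
            if all(frozenset(pair) in edges for pair in combinations(subset, 2)):
                weight = sum(nodes[n] for n in subset)
                if weight > best_weight:
                    best, best_weight = subset, weight
    return best, best_weight


if __name__ == "__main__":
    identity = lambda x, y: 1.0 if x == y else 0.0
    # Same three residues in the same geometry, but in a different sequence order.
    struct_a = [("L", (0, 0, 0)), ("E", (3, 0, 0)), ("R", (0, 4, 0))]
    struct_b = [("R", (0, 4, 0)), ("L", (0, 0, 0)), ("E", (3, 0, 0))]
    print(max_weight_clique(*association_graph(struct_a, struct_b, identity)))
```

        Because nodes pair residues regardless of their position in the chain, the recovered alignment does not depend on sequence order, which is the property the slide title refers to.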
    • 34. Similarity Matrix of Alignment
        • Chemical similarity
            Amino acid grouping: (LVIMC), (AGSTP), (FYW), and (EDNQKRH)
            Amino acid chemical similarity matrix
        • Evolutionary correlation
            Amino acid substitution matrix such as BLOSUM45
            Similarity score between two sequence profiles: $d = \sum_i f_a^i S_b^i + \sum_i f_b^i S_a^i$
            f_a, f_b are the 20 amino acid target frequencies of profiles a and b, respectively
            S_a, S_b are the PSSMs of profiles a and b, respectively
        Xie and Bourne 2008 PNAS, 105(14) 5441
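        A minimal sketch of the profile-profile score on this slide, reading it as d = Σ_i f_a^i S_b^i + Σ_i f_b^i S_a^i for one pair of profile columns. The function name is hypothetical, and the demo uses a made-up three-letter alphabet instead of the 20 amino acids so that the arithmetic can be checked by hand.

```python
def profile_similarity(f_a, s_a, f_b, s_b):
    """Score two profile columns: frequencies of each against the PSSM of the other.

    f_a, f_b -- target frequencies of columns a and b (one value per residue type)
    s_a, s_b -- PSSM scores of columns a and b (same residue order)
    """
    if not (len(f_a) == len(s_a) == len(f_b) == len(s_b)):
        raise ValueError("all four vectors must cover the same residue types")
    a_against_b = sum(fa * sb for fa, sb in zip(f_a, s_b))
    b_against_a = sum(fb * sa for fb, sa in zip(f_b, s_a))
    return a_against_b + b_against_a


if __name__ == "__main__":
    # Toy three-letter alphabet; real profiles have 20 entries per column.
    f_a, s_a = [0.5, 0.5, 0.0], [2.0, 1.0, -1.0]
    f_b, s_b = [1.0, 0.0, 0.0], [3.0, -2.0, 0.0]
    print(profile_similarity(f_a, s_a, f_b, s_b))
```

        The score is symmetric in a and b, so the order in which the two profiles are compared does not matter.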
    • 35. We are particularly interested in applying these techniques to neglected diseases
    • 36. The Problem with Tuberculosis
        • One third of global population infected
        • 1.7 million deaths per year
        • 95% of deaths in developing countries
        • Anti-TB drugs hardly changed in 40 years
        • MDR-TB and XDR-TB pose a threat to human health worldwide
        • Development of novel, effective and inexpensive drugs is an urgent priority
    • 37. The TB-Drugome
        1. Determine the TB structural proteome
        2. Determine all known drug binding sites from the PDB
        3. Determine which of the sites found in 2 exist in 1
        4. Call the result the TB-drugome
        Kinnings et al 2010 PLoS Comp Biol 6(11): e1000976
    • 38. 1. Determine the TB Structural Proteome
        • [Figure: TB proteome, 3,996 proteins; solved structures, 284; homology models, 1,446; remaining proteins, 2,266]
        • High quality homology models from ModBase increase structural coverage from 7.1% to 43.3%
        Kinnings et al 2010 PLoS Comp Biol 6(11): e1000976
    • 39. 2. Determine all Known Drug Binding Sites in the PDB
        • Searched the PDB for protein crystal structures bound with FDA-approved drugs
        • 268 drugs bound in a total of 931 binding sites
        • [Figure: no. of drug binding sites per drug; labeled drugs: acarbose, darunavir, alitretinoin, conjugated estrogens, chenodiol, methotrexate]
        Kinnings et al 2010 PLoS Comp Biol 6(11): e1000976
    • 40. Map 2 onto 1 – The TB-Drugome
        • [Figure: similarities between the binding sites of M.tb proteins (blue) and binding sites containing approved drugs (red)]
    • 41. From a Drug Repositioning Perspective
        • Similarities between drug binding sites and TB proteins are found for 61/268 drugs
        • 41 of these drugs could potentially inhibit more than one TB protein
        • [Figure: no. of potential TB targets per drug; labeled drugs: chenodiol, testosterone, ritonavir, conjugated estrogens & methotrexate, raloxifene, levothyroxine, alitretinoin]
        Kinnings et al 2010 PLoS Comp Biol 6(11): e1000976
    • 42. Top 5 Most Highly Connected Drugs
        • levothyroxine
            Intended targets: transthyretin, thyroid hormone receptor α & β-1, thyroxine-binding globulin, mu-crystallin homolog, serum albumin
            Indications: hypothyroidism, goiter, chronic lymphocytic thyroiditis, myxedema coma, stupor
            No. of connections: 14
            TB proteins: adenylyl cyclase, argR, bioD, CRP/FNR trans. reg., ethR, glbN, glbO, kasB, lrpA, nusA, prrA, secA1, thyX, trans. reg. protein
        • alitretinoin
            Intended targets: retinoic acid receptor RXR-α, β & γ, retinoic acid receptor α, β & γ-1&2, cellular retinoic acid-binding protein 1&2
            Indications: cutaneous lesions in patients with Kaposi's sarcoma
            No. of connections: 13
            TB proteins: adenylyl cyclase, aroG, bioD, bpoC, CRP/FNR trans. reg., cyp125, embR, glbN, inhA, lppX, nusA, pknE, purN
        • conjugated estrogens
            Intended targets: estrogen receptor
            Indications: menopausal vasomotor symptoms, osteoporosis, hypoestrogenism, primary ovarian failure
            No. of connections: 10
            TB proteins: acetylglutamate kinase, adenylyl cyclase, bphD, CRP/FNR trans. reg., cyp121, cysM, inhA, mscL, pknB, sigC
        • methotrexate
            Intended targets: dihydrofolate reductase, serum albumin
            Indications: gestational choriocarcinoma, chorioadenoma destruens, hydatidiform mole, severe psoriasis, rheumatoid arthritis
            No. of connections: 10
            TB proteins: acetylglutamate kinase, aroF, cmaA2, CRP/FNR trans. reg., cyp121, cyp51, lpd, mmaA4, panC, usp
        • raloxifene
            Intended targets: estrogen receptor, estrogen receptor β
            Indications: osteoporosis in postmenopausal women
            No. of connections: 9
            TB proteins: adenylyl cyclase, CRP/FNR trans. reg., deoD, inhA, pknB, pknE, Rv1347c, secA1, sigC
    • 43. Systems Pharmacology
        Chang et al. 2010 PLoS Comp. Biol. 6(9): e1000938 & Chang et al. 2013 BMC Systems Biology 7:102
    • 44. A closing note…
    • 45. Your Social Responsibility: Josh Sommer and Chordoma Disease
    • 46. Questions?