Bioinformatics in the Bourne Lab.

A lecture in BILD 94 at UCSD introducing undergraduates to various aspects of bioinformatics.

Notes
  • Tuberculosis, which is caused by the bacterial pathogen Mycobacterium tuberculosis, is a leading cause of mortality among the infectious diseases. The World Health Organization (WHO) has estimated that almost one-third of the world's population, around 2 billion people, is infected. Every year more than 8 million people develop an active form of the disease, which claims the lives of nearly 2 million; this translates to over 4,900 deaths per day, more than 95% of them in developing countries. Despite this global situation, antitubercular drugs have remained largely unchanged over the last four decades. The widespread use of these agents has exerted strong selective pressure on M. tuberculosis, encouraging the emergence of resistant strains. Multidrug-resistant (MDR) tuberculosis is defined as resistance to the first-line drugs isoniazid and rifampin. Effective treatment of MDR tuberculosis requires long-term use of second-line drug combinations, an unfortunate consequence of which is the emergence of further drug resistance. Enter extensively drug-resistant (XDR) tuberculosis: M. tuberculosis strains that are resistant to isoniazid and rifampin as well as to key second-line drugs. Because the only remaining drug classes have low potency and high toxicity, XDR tuberculosis is extremely difficult to treat. Its rise around the world poses a great threat to human health, making the development of new antitubercular agents an urgent priority. Very few Mtb proteins have been explored as drug targets.
  • 3,996 proteins in the TB proteome. 749 solved structures in the PDB represent a total of 284 proteins (7.1% coverage). ModBase contains homology models for the entire TB proteome; 1,446 "high quality" homology models were added to the data set, increasing structural coverage to 43.3%.
    Model selection: only models with a model score > 0.7 and a ModPipe quality score > 1.1 were retained (2,818 models). There were multiple models per protein, so for each TB protein the model with the best model score was chosen, and if scores were equal, the one with the best ModPipe quality score (1,703 models). However, 251 (+6) models were removed because they correspond to TB proteins that already have solved structures, leaving 1,446 models.
    Model score: a score for the reliability of a model, derived from statistical potentials (F. Melo, R. Sanchez, A. Sali, 2001). A model is predicted to be good when the model score is higher than a pre-specified cutoff (0.7). A reliable model has a probability of the correct fold that is larger than 95%; a fold is correct when at least 30% of its Cα atoms superpose within 3.5 Å of their correct positions.
    ModPipe Protein Quality Score (MPQS): a composite score comprising sequence identity to the template, coverage, and the three individual scores E-value, z-DOPE and GA341. An MPQS > 1.1 is considered reliable.
  • (nutraceuticals excluded)
  • Transcript of "Bioinformatics in the Bourne Lab."

    1. Bioinformatics in the Bourne Lab. Philip E. Bourne (pbourne@ucsd.edu), UCSD BILD 94, May 3, 2012
    2. Some Personal Background…
    4. The Life of One Scientist – The Early Years: So That You Might Not Make the Same Mistakes • My high school teacher Mr. Wilson said I would be a failure at chemistry • My PhD is in chemistry • The opportunity to live in different places shaped my life • Good friends are forever
    5. 40+ Years Later • Ten Simple Rules for Starting a Company, PLoS Comp Biol 2012 8(3) e1002439
    7. PhD in Physical Chemistry
    8. Always Loved Computing (circa 1974)
    9. Postdoctoral Work – The Molecular Basis of How the Body Works • Regrets: never learnt another language
    10. Postdoc
    11. Some Things Stay with You Your Whole Life
    12. Senior Scientist, HHMI, Columbia University, New York • Driven not by career but by wanting to live in New York City
    13. ~1990: Got Involved with the Human Genome • Was only possible by applying computers to problems in biology • Developed algorithms to support physical and genetic mapping of Chr 13
    14. Came to UCSD to Apply Computers to Big Biological Problems • Possibly the best place in the world to do computational biology
    16. The Protein Kinase Family • A large family important to signal transduction in eukaryotes and many bacteria • Phosphotransferases: transfer a phosphate group from ATP to a Ser/Thr or Tyr residue on the target protein, producing a range of downstream signaling effects • PKA: an example of a typical protein kinase (TPK) fold, shown in "open book" format
    17. Sometimes Ya Got to Just Do It Yourself
    18. The Growth of Data Is a Major Driver in Biology [Chart: number of released entries per year]
    19. Demo
    20. Big Research Questions in the Lab 1. Can we improve how science is disseminated and comprehended? 2. What is the ancestry of the protein structure universe and what can we learn from it? 3. Are there alternative ways to represent proteins from which we can learn something new? 4. What really happens when we take a drug? 5. Can we contribute to the treatment of neglected (tropical) diseases?
    21. Studying Evolution Through Structure
    22. Nature's Reductionism • There are ~$20^{300}$ possible proteins (20 amino acids at each of 300 positions), far more than all the atoms in the Universe • 11.2M protein sequences from 10,854 species (source: RefSeq) • 38,221 protein structures yield 1,195 domain folds (SCOP 1.75)
    23. Initial Question: With the current coverage of proteomes by structure, and assuming we know a high percentage of all folds, is structure a useful discriminator of species?
    24. Chapter 2: Initial Findings • Song Yang, Post Doc, UC Berkeley • Russ Doolittle, Professor, Department of Chemistry and Biochemistry and Center for Molecular Genetics, UCSD • Yang, Doolittle & Bourne (2005) PNAS 102(2) 373-8
    25. To Answer this Question We Only Need to Make Use of Existing Resources • SCOP – further catalogs Nature's reductionism into structural domains, folds, families and superfamilies • SUPERFAMILY assigns the above to fully sequenced proteomes
    26. The SCOP Hierarchy v1.75, Based on 38,221 Structures: 7 classes, 1,195 folds, 1,962 superfamilies, 3,902 families, 110,800 domains
    27. Is Structure a Useful Discriminator of Species? – Maybe… [Venn diagram: distribution of SCOP folds (765 total) among the three kingdoms, as taken from SUPERFAMILY: Eukaryota (650), Archaea (416), Bacteria (564); region counts given as any genome / all genomes] • Superfamily distributions would seem to be related to the complexity of life • Update of the work of Caetano-Anollés & Caetano-Anollés (2003) Genome Research 13:1563
    28. Method – Distance Determination
        Presence/absence data matrix (rows: SCOP superfamilies (FSF) from SUPERFAMILY; columns: organisms; 1 = present, 0 = absent)
          FSF       C. intestinalis   C. briggsae   F. rubripes
          a.1.1     1                 1             1
          a.1.2     1                 1             1
          a.10.1    0                 0             1
          a.100.1   1                 1             1
          a.101.1   0                 0             0
          a.102.1   0                 1             1
          a.102.2   1                 1             1
        Distance matrix
                            C. intestinalis   C. briggsae   F. rubripes
          C. intestinalis   0                 101           109
          C. briggsae                         0             144
          F. rubripes                                       0
    29. Is Structure a Useful Discriminator of Species? – Yes [Tree figure with clades labeled Archaea, Bacteria and Eukaryota] • The method cleanly placed all species in their correct superkingdoms
    30. The Answer Would Appear to Be Yes • It is possible to generate a reasonable tree of life merely from the presence or absence of superfamilies (FSFs) within a given proteome
    31. Environmental Influence • Chris Dupont, Scripps Institution of Oceanography, UCSD • Dupont, Yang, Palenik & Bourne (2006) PNAS 103(47) 17822-17827
    32. Consider the Distribution of Disulfide Bonds among Folds • Disulfides are only stable under oxidizing conditions • Oxygen content gradually accumulated during the Earth's evolution • The divergence of the three kingdoms occurred 1.8-2.2 billion years ago • Oxygen began to accumulate ~2.0 billion years ago • Logical deduction: disulfides should be more prevalent in folds (organisms) that evolved later • This would seem to hold true • Can we take this further? [Venn diagram of SCOP folds (708 total) across Eukaryota, Archaea and Bacteria, showing the fraction of folds with disulfides in each region: 31.9% (43/135), 14.4% (17/118), 4.7% (18/387), 0% (0/10), 0% (0/2), 5.9% (1/17), 16.7% (7/42)]
    33. Evolution of the Earth • 4.5 billion years of change • 300 ± 50 K • 1-5 atmospheres • Constant photoenergy • Chemical and geological changes • Life has evolved in this time • The ocean was the "cradle" for 90% of evolution
    34. Theoretical Levels of Trace Metals and Oxygen in the Deep Ocean Through Earth's History [Figure: oxygen (arbitrary units) and zinc, iron, cobalt and manganese concentrations (mol L⁻¹, log scale) from 4.5 billion years before present to today; replotted from Saito et al., 2003, Inorganica Chimica Acta 356: 308-318] • Whether the deep ocean became oxic or euxinic following the rise in atmospheric oxygen (~2.3 Gya) is debated, therefore both are shown (oxic ocean: solid lines; euxinic ocean: dashed lines) • The phylogenetic tree symbols at the top of the figure show one idea as to the theoretical periods of diversification for each superkingdom (Bacteria, Archaea, Eukarya)
    35. Superfamily Distribution As Well As Overall Content Has Changed [Figure: two panels, bacterial Fe superfamilies and eukaryotic Fe superfamilies, each covering SCOP superfamilies a.1.1, a.1.2, a.104.1, a.110.1, a.119.1, a.2.11, a.138.1, a.24.3, a.24.4, a.3.1, a.25.1, a.39.3, a.56.1, a.93.1, b.1.13, b.2.6, b.3.6, b.33.1, b.70.2, b.82.2, c.56.6, c.83.1, c.96.1, d.134.1, d.15.4, d.174.1, d.178.1, d.35.1, d.44.1, d.58.1, e.18.1, e.19.1, e.26.1, e.5.1, f.21.1, f.21.2, f.24.1, f.26.1, g.35.1, g.36.1, g.41.5]
    36. Hypothesis • Emergence of cyanobacteria changed oxygen concentrations • This impacted metal concentrations in the ocean • Organisms used new metals in new ways to evolve new biological processes, e.g., complex signaling • This in turn further impacted the environment
    37. Big Research Questions in the Lab 1. Can we improve how science is disseminated and comprehended? 2. What is the ancestry of the protein structure universe and what can we learn from it? 3. Are there alternative ways to represent proteins from which we can learn something new? 4. What really happens when we take a drug? 5. Can we contribute to the treatment of neglected (tropical) diseases?
    38. Our Motivation • Tykerb – Breast cancer • Gleevec – Leukemia, GI cancers • Nexavar – Kidney and liver cancer • Staurosporine – natural product – alkaloid – many uses, e.g., antifungal, antihypertensive (Collins and Workman 2006 Nature Chemical Biology 2 689-700)
    39. Our Broad Approach • Involves the fields of: structural bioinformatics, cheminformatics, biophysics, systems biology and pharmaceutical chemistry • L. Xie, L. Xie, S.L. Kinnings and P.E. Bourne 2012 Novel Computational Approaches to Polypharmacology as a Means to Define Responses to Individual Drugs, Annual Review of Pharmacology and Toxicology 52: 361-379 • L. Xie, S.L. Kinnings, L. Xie and P.E. Bourne 2012 Predicting the Polypharmacology of Drugs: Identifying New Uses Through Bioinformatics and Cheminformatics Approaches, in Drug Repurposing, M. Barrett and D. Frail (Eds.), Wiley and Sons (available upon request)
    40. Approach – Need to Start with a 3D Drug-Receptor Complex – Either Experimental or Modeled
          Drug           Other name           Treatment                              PDB IDs
          Lipitor        Atorvastatin         High cholesterol                       1HWK, 1HW8…
          Testosterone   Testosterone         Osteoporosis                           1AFS, 1I9J…
          Taxol          Paclitaxel           Cancer                                 1JFF, 2HXF, 2HXH
          Viagra         Sildenafil citrate   ED, pulmonary arterial hypertension    1TBF, 1UDT, 1XOS…
          Digoxin        Lanoxin              Congestive heart failure               1IGJ
    41. A Reverse Engineering Approach to Drug Discovery Across Gene Families • Characterize the ligand binding site of the primary target (Geometric Potential) → Identify off-targets by ligand binding site similarity (sequence order independent profile-profile alignment) → Extract known drugs or inhibitors of the primary and/or off-targets → Search for similar small molecules … → Dock molecules to both primary and off-targets → Statistical analysis of docking score correlations (Xie and Bourne 2009 Bioinformatics 25(12) 305-312)
    42. Characterization of the Ligand Binding Site – The Geometric Potential • Conceptually similar to hydrophobicity or electrostatic potential; depends on both global and local environments • Initially assign each Cα atom a value P: its distance to the environmental boundary • Update the value with those of the surrounding Cα atoms, dependent on distances and orientation; atoms within a 10 Å radius define the neighbors:
        $$GP = P + \sum_{i \in \text{neighbors}} \frac{P_i \left(\cos\alpha_i + 1.0\right)/2.0}{D_i + 1.0}$$
        where P_i is the initial value of neighbor i, D_i is its distance and α_i describes the relative orientation (Xie and Bourne 2007 BMC Bioinformatics 8(Suppl 4):S9)
    43. Discrimination Power of the Geometric Potential • Geometric potential can distinguish binding and non-binding sites [Chart: distributions of geometric potential (scale 0-100) for residue clusters at binding sites vs. non-binding sites]
    44. Local Sequence-order Independent Alignment with Maximum-Weight Sub-Graph Algorithm (Xie and Bourne 2008 PNAS 105(14) 5441) [Figure: Structure A and Structure B with matching residue fragments LER and VKDL] • Build an association graph from the graph representations of the two structures being compared; each node is assigned a weight from the similarity matrix • The maximum-weight clique corresponds to the optimum alignment of the two structures
    45. Similarity Matrix of Alignment • Chemical similarity: amino acid grouping (LVIMC), (AGSTP), (FYW) and (EDNQKRH); amino acid chemical similarity matrix • Evolutionary correlation: amino acid substitution matrix such as BLOSUM45 • Similarity score between two sequence profiles:
        $$d = \sum_i f_a^i S_b^i + \sum_i f_b^i S_a^i$$
        where f_a, f_b are the 20 amino acid target frequencies of profiles a and b, respectively, and S_a, S_b are the PSSMs of profiles a and b, respectively
    46. The Problem with Tuberculosis • One third of the global population infected • 1.7 million deaths per year • 95% of deaths in developing countries • Anti-TB drugs hardly changed in 40 years • MDR-TB and XDR-TB pose a threat to human health worldwide • Development of novel, effective and inexpensive drugs is an urgent priority
    47. The TB-Drugome 1. Determine the TB structural proteome 2. Determine all known drug binding sites from the PDB 3. Determine which of the sites found in 2 exist in 1 4. Call the result the TB-drugome (Kinnings et al. 2010 PLoS Comp Biol 6(11): e1000976)
    48. 1. Determine the TB Structural Proteome [Figure: 3,996 TB proteins: 284 with solved structures, 1,446 with high-quality homology models, 2,266 with neither] • High quality homology models from ModBase (http://modbase.compbio.ucsf.edu) increase structural coverage from 7.1% to 43.3%
    49. 2. Determine all Known Drug Binding Sites in the PDB • Searched the PDB for protein crystal structures bound with FDA-approved drugs • 268 drugs bound in a total of 931 binding sites [Histogram: number of drugs vs. number of drug binding sites (1 to 37); labeled drugs include Acarbose, Darunavir, Alitretinoin, Conjugated estrogens, Chenodiol and Methotrexate]
    50. Map 2 onto 1 – The TB-Drugome (http://funsite.sdsc.edu/drugome/TB/) [Figure: similarities between the binding sites of M.tb proteins (blue) and binding sites containing approved drugs (red)]
    51. Research is a Good Life
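Slide 22's estimate of possible proteins comes from 20 amino acids at each of 300 positions. A minimal Python check, assuming the commonly quoted figure of about 10^80 atoms in the observable universe, compares the two in log space:

```python
import math

N_AMINO_ACIDS = 20      # standard amino acids available at each position
CHAIN_LENGTH = 300      # chain length implied by the slide's 20^300
ATOMS_LOG10 = 80        # assumption: ~10^80 atoms in the observable universe

# log10(20^300) = 300 * log10(20); logs keep the numbers readable.
sequences_log10 = CHAIN_LENGTH * math.log10(N_AMINO_ACIDS)

print(f"possible sequences ~ 10^{sequences_log10:.0f}")
print(f"excess over atoms  ~ 10^{sequences_log10 - ATOMS_LOG10:.0f}")
# Python integers are exact, so the comparison can also be made directly.
print(N_AMINO_ACIDS ** CHAIN_LENGTH > 10 ** ATOMS_LOG10)
```

Set against the 1,195 folds in SCOP 1.75, this is the sense in which nature is "reductionist".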
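The SCOP hierarchy on slides 25-26 is encoded in identifiers such as a.1.1 (class a, fold 1, superfamily 1), the same labels used in the data matrix on slide 28. A small Python sketch that splits these identifiers into their levels and groups superfamilies by fold; the function names are illustrative and not part of any SCOP tool:

```python
from collections import defaultdict

LEVELS = ("class", "fold", "superfamily", "family")

def parse_sccs(sccs):
    """Map a SCOP sccs string such as 'a.1.1' to the identifier at each level."""
    parts = sccs.split(".")
    if not 1 <= len(parts) <= len(LEVELS):
        raise ValueError(f"not a SCOP sccs identifier: {sccs!r}")
    # Each level's identifier is the prefix up to and including that level.
    return {LEVELS[n]: ".".join(parts[: n + 1]) for n in range(len(parts))}

def group_by(identifiers, level):
    """Group identifiers by their ancestor at the given level (e.g. 'fold')."""
    groups = defaultdict(list)
    for sccs in identifiers:
        groups[parse_sccs(sccs)[level]].append(sccs)
    return dict(groups)

# Superfamilies listed in the slide-28 data matrix
SUPERFAMILIES = ["a.1.1", "a.1.2", "a.10.1", "a.100.1", "a.101.1", "a.102.1", "a.102.2"]
print(group_by(SUPERFAMILIES, "fold"))
```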
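Slide 28 turns superfamily presence/absence into pairwise distances between organisms. The slide does not state which metric was used, so the sketch below assumes a simple Hamming distance (the number of superfamilies found in one proteome but not the other). It uses only the seven rows shown, so its values are far smaller than the 101, 109 and 144 computed over the full superfamily set:

```python
from itertools import combinations

SPECIES = ("C. intestinalis", "C. briggsae", "F. rubripes")
# The seven superfamily rows shown on slide 28: 1 = present, 0 = absent
MATRIX = {
    "a.1.1":   (1, 1, 1),
    "a.1.2":   (1, 1, 1),
    "a.10.1":  (0, 0, 1),
    "a.100.1": (1, 1, 1),
    "a.101.1": (0, 0, 0),
    "a.102.1": (0, 1, 1),
    "a.102.2": (1, 1, 1),
}

def hamming(col_a, col_b, matrix=MATRIX):
    """Number of superfamilies present in one proteome but absent from the other."""
    return sum(row[col_a] != row[col_b] for row in matrix.values())

def pairwise_distances(species=SPECIES, matrix=MATRIX):
    """Distance for every unordered pair of organisms (upper triangle only)."""
    return {(species[a], species[b]): hamming(a, b, matrix)
            for a, b in combinations(range(len(species)), 2)}

for pair, dist in pairwise_distances().items():
    print(pair, dist)
```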
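Slides 29-30 build a tree of life from such a distance matrix. The slides do not name the tree-building method, so this Python sketch swaps in UPGMA (average-linkage clustering), one of the simplest options, and returns only the topology as a Newick-style string without branch lengths. It is applied here to the three distances from slide 28:

```python
def upgma(names, distances):
    """Cluster by UPGMA; distances maps frozenset({a, b}) -> distance for all pairs."""
    size = {name: 1 for name in names}   # current cluster label -> number of leaves
    d = dict(distances)                  # distances between current clusters
    while len(size) > 1:
        pair = min(d, key=d.get)         # closest pair of clusters
        a, b = sorted(pair)
        merged = f"({a},{b})"
        for c in size:
            if c not in pair:
                # Average linkage: weight each side by its number of leaves.
                d[frozenset((merged, c))] = (
                    size[a] * d[frozenset((a, c))] + size[b] * d[frozenset((b, c))]
                ) / (size[a] + size[b])
        size[merged] = size.pop(a) + size.pop(b)
        # Drop every distance that still refers to the two merged clusters.
        d = {p: v for p, v in d.items() if a not in p and b not in p}
    return next(iter(size)) + ";"

NAMES = ["C. intestinalis", "C. briggsae", "F. rubripes"]
DISTANCES = {
    frozenset(("C. intestinalis", "C. briggsae")): 101,
    frozenset(("C. intestinalis", "F. rubripes")): 109,
    frozenset(("C. briggsae", "F. rubripes")): 144,
}
print(upgma(NAMES, DISTANCES))
```

UPGMA assumes a roughly constant rate of change across lineages; methods such as neighbor-joining relax that assumption, which matters when real proteomes gain and lose superfamilies at different rates.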
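The geometric potential on slide 42 adds to each atom's initial value P the contributions P_i (cos α_i + 1)/2 / (D_i + 1) of neighbors within 10 Å. This Python sketch evaluates that sum for a list of atoms. How α_i is obtained is an assumption here: each atom carries a hypothetical unit direction vector, and cos α_i is the dot product of the two vectors. Computing P itself (the distance to the environmental boundary) is outside the sketch:

```python
import math

def geometric_potential(coords, p, directions, radius=10.0):
    """GP_j = P_j + sum over neighbors i of P_i * (cos(alpha_i) + 1) / 2 / (D_i + 1).

    coords: (x, y, z) per atom, e.g. C-alpha positions
    p: initial value per atom (distance to the environmental boundary)
    directions: hypothetical unit vector per atom, used only to get cos(alpha_i)
    """
    gp = []
    for j, (xyz_j, p_j, u_j) in enumerate(zip(coords, p, directions)):
        total = p_j
        for i, (xyz_i, p_i, u_i) in enumerate(zip(coords, p, directions)):
            if i == j:
                continue
            dist = math.dist(xyz_j, xyz_i)
            if dist > radius:
                continue  # only atoms inside the cutoff sphere are neighbors
            cos_alpha = sum(a * b for a, b in zip(u_j, u_i))
            total += p_i * (cos_alpha + 1.0) / 2.0 / (dist + 1.0)
        gp.append(total)
    return gp

COORDS = [(0.0, 0.0, 0.0), (3.0, 0.0, 0.0), (20.0, 0.0, 0.0)]
P = [1.0, 2.0, 4.0]
DIRS = [(1.0, 0.0, 0.0)] * 3
print(geometric_potential(COORDS, P, DIRS))
```

The (cos α + 1)/2 factor maps orientation onto the range 0 to 1, so a neighbor facing the opposite way contributes nothing, and the 1/(D + 1) factor down-weights distant neighbors without dividing by zero.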
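Slide 44 aligns two structures by finding the maximum-weight clique in an association graph. The Python sketch below is a toy version under stated assumptions: residues are plain points, nodes are residue pairs weighted by a hypothetical score built from the chemical groups on slide 45, two nodes are joined when they preserve the inter-residue distance within a tolerance, and the best clique is found by brute-force Bron-Kerbosch enumeration. That search is exponential and only suits tiny inputs; it is not the published algorithm:

```python
import math
from itertools import combinations

GROUPS = ("LVIMC", "AGSTP", "FYW", "EDNQKRH")  # chemical groups from slide 45

def node_weight(res_a, res_b):
    """Hypothetical weights: 2 for identical residues, 1 for the same group, else 0."""
    if res_a == res_b:
        return 2
    return 1 if any(res_a in g and res_b in g for g in GROUPS) else 0

def maximal_cliques(r, p, x, adj, out):
    """Basic Bron-Kerbosch enumeration of maximal cliques (no pivoting)."""
    if not p and not x:
        out.append(r)
        return
    for v in list(p):
        maximal_cliques(r | {v}, p & adj[v], x & adj[v], adj, out)
        p = p - {v}
        x = x | {v}

def align(struct_a, struct_b, tol=0.5):
    """struct_a/b: lists of (residue letter, (x, y, z)). Returns aligned index pairs."""
    # Nodes: residue pairs with a positive similarity weight.
    nodes = {}
    for i, (res_a, _) in enumerate(struct_a):
        for j, (res_b, _) in enumerate(struct_b):
            w = node_weight(res_a, res_b)
            if w > 0:
                nodes[(i, j)] = w
    # Edges: two pairs are compatible if they preserve the inter-residue distance.
    adj = {v: set() for v in nodes}
    for (i, j), (k, l) in combinations(nodes, 2):
        if i == k or j == l:
            continue  # a residue can be matched at most once
        d_a = math.dist(struct_a[i][1], struct_a[k][1])
        d_b = math.dist(struct_b[j][1], struct_b[l][1])
        if abs(d_a - d_b) <= tol:
            adj[(i, j)].add((k, l))
            adj[(k, l)].add((i, j))
    cliques = []
    maximal_cliques(set(), set(nodes), set(), adj, cliques)
    # With positive weights, the heaviest clique is always a maximal one.
    best = max(cliques, key=lambda c: sum(nodes[v] for v in c))
    return sorted(best)

TRIANGLE = [("L", (0.0, 0.0, 0.0)), ("E", (3.0, 0.0, 0.0)), ("R", (0.0, 4.0, 0.0))]
print(align(TRIANGLE, TRIANGLE))
```

Because only pairwise distances are compared, the residues never need to appear in the same sequence order in both structures, which is what makes the alignment sequence-order independent.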
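The profile-profile score on slide 45 cross-multiplies each profile's target frequencies with the other profile's PSSM scores. A minimal Python sketch for one pair of profile columns, assuming both are given as 20-element vectors in one fixed amino-acid order; the toy numbers are hypothetical:

```python
AMINO_ACIDS = "ARNDCQEGHILKMFPSTWYV"  # assumed fixed order for all vectors

def profile_score(f_a, s_a, f_b, s_b):
    """d = sum_i f_a[i] * S_b[i] + sum_i f_b[i] * S_a[i] for one column pair."""
    for vec in (f_a, s_a, f_b, s_b):
        if len(vec) != len(AMINO_ACIDS):
            raise ValueError("each vector needs one entry per amino acid")
    return (sum(fa * sb for fa, sb in zip(f_a, s_b))
            + sum(fb * sa for fb, sa in zip(f_b, s_a)))

idx = AMINO_ACIDS.index
f_a = [0.0] * 20; f_a[idx("L")] = 1.0   # column of profile a is all leucine
f_b = [0.0] * 20; f_b[idx("I")] = 1.0   # column of profile b is all isoleucine
s_a = [0] * 20;   s_a[idx("I")] = 2     # PSSM of a tolerates I
s_b = [0] * 20;   s_b[idx("L")] = 3     # PSSM of b tolerates L
print(profile_score(f_a, s_a, f_b, s_b))
```

Scoring in both directions makes the measure symmetric: swapping profiles a and b gives the same d.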
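Steps 3-4 of the TB-Drugome outline on slide 47 amount to comparing every TB binding site against every drug binding site and keeping the similar pairs. In this Python sketch the sites are represented as sets of residue letters and compared with a Jaccard overlap purely for illustration; that similarity function, the 0.5 threshold and all names are hypothetical stand-ins for the site-comparison method of slides 42-45:

```python
def jaccard(site_a, site_b):
    """Hypothetical stand-in for binding-site similarity: overlap of residue sets."""
    union = site_a | site_b
    return len(site_a & site_b) / len(union) if union else 0.0

def tb_drugome(tb_sites, drug_sites, similarity=jaccard, threshold=0.5):
    """Return (TB protein, drug, best score) for pairs whose sites are similar enough."""
    hits = []
    for protein, sites in tb_sites.items():
        for drug, dsites in drug_sites.items():
            scores = [similarity(s, d) for s in sites for d in dsites]
            best = max(scores, default=0.0)
            if best >= threshold:
                hits.append((protein, drug, best))
    # Strongest predicted drug-target links first.
    return sorted(hits, key=lambda hit: hit[2], reverse=True)

TB_SITES = {"TB-protein-1": [frozenset("LVKDE")], "TB-protein-2": [frozenset("GSTW")]}
DRUG_SITES = {"drug-X": [frozenset("LVKDR")], "drug-Y": [frozenset("FYH")]}
print(tb_drugome(TB_SITES, DRUG_SITES))
```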
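The speaker notes on the TB structural proteome describe a model-selection filter (model score > 0.7 and MPQS > 1.1, best model per protein with MPQS as tie-breaker, then dropping proteins that already have solved structures) and the resulting coverage on slide 48. A Python sketch of those rules; the Model record and the toy protein IDs are hypothetical:

```python
from dataclasses import dataclass

@dataclass
class Model:
    protein_id: str     # hypothetical TB protein identifier
    model_score: float  # ModPipe model score; > 0.7 treated as reliable
    mpqs: float         # ModPipe Protein Quality Score; > 1.1 treated as reliable

def select_models(models, solved_ids, score_cutoff=0.7, mpqs_cutoff=1.1):
    """Return one reliable model per protein, skipping proteins with solved structures."""
    best = {}
    for m in models:
        if m.model_score <= score_cutoff or m.mpqs <= mpqs_cutoff:
            continue  # fails one of the two reliability cutoffs
        current = best.get(m.protein_id)
        # Best model score wins; equal scores fall back to the higher MPQS.
        if current is None or (m.model_score, m.mpqs) > (current.model_score, current.mpqs):
            best[m.protein_id] = m
    return {pid: m for pid, m in best.items() if pid not in solved_ids}

def coverage_percent(n_solved, n_modelled, n_total):
    """Share of the proteome with either a solved structure or a kept model."""
    return 100.0 * (n_solved + n_modelled) / n_total

toy = [Model("P1", 0.9, 1.2), Model("P1", 0.9, 1.5),
       Model("P2", 0.6, 2.0), Model("P3", 0.8, 1.3)]
print(select_models(toy, solved_ids={"P3"}))
print(f"{coverage_percent(284, 0, 3996):.1f}% -> {coverage_percent(284, 1446, 3996):.1f}%")
```

With the counts from the slide (284 solved, 1,446 modelled, 3,996 total) the coverage rises from 7.1% to 43.3%.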