Scholarly Communication for Bioinformatics Students


Presentation made to the incoming bioinformatics and systems biology students at UCSD on how they could get involved in changing scholarly communication. Given February 28, 2011

  • 3,996 proteins in the TB proteome. 749 solved structures in the PDB, representing a total of 284 proteins (7.1% coverage). ModBase contains homology models for the entire TB proteome. 1,446 ‘high quality’ homology models were added to the data set, increasing structural coverage to 43.3%. Model selection: retained only those models with a model score of > 0.7 and a ModPipe quality score of > 1.1 (2,818 models). There were multiple models per protein, so for each TB protein the model with the best model score was chosen and, if the model scores were equal, the model with the best ModPipe quality score (1,703 models). Of these, 251 (+6) models were removed since they correspond to TB proteins that already have solved structures (1,446 models remained). The model score measures the reliability of a model and is derived from statistical potentials (F. Melo, R. Sanchez, A. Sali, 2001). A model is predicted to be good when the model score is higher than a pre-specified cutoff (0.7). A reliable model has a probability of the correct fold that is larger than 95%. A fold is correct when at least 30% of its Cα atoms superpose within 3.5 Å of their correct positions. The ModPipe Protein Quality Score (MPQS) is a composite score comprising sequence identity to the template, coverage, and the three individual scores e-value, z-DOPE and GA341. An MPQS of > 1.1 is considered reliable.
  • (nutraceuticals excluded)
  • Multi-target therapy may be more effective than single-target therapy to treat infectious diseases. Most of the proteins listed are potential novel drug targets for the development of efficient anti-tuberculosis chemotherapeutics. GSMN-TB: Genome Scale Metabolic Reaction Network of M.tb (http://sysbio/ reactions, 739 metabolites, 726 genes. Can optimize the model for in vivo growth. Carry out multiple gene inhibition and compute the maximal theoretical growth rate (if close to zero, that combination of genes is essential for growth).
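The model-selection procedure in the first slide note (score cutoffs, one best model per protein, removal of proteins that already have solved structures) can be sketched roughly as follows. This is a minimal illustration, not the authors' pipeline: the `HomologyModel` record, the function names and the toy protein IDs are all hypothetical, and only the two cutoffs (0.7 and 1.1) and the proteome counts come from the notes and slides.

```python
from dataclasses import dataclass

MODEL_SCORE_CUTOFF = 0.7  # cutoff quoted in the slide notes
MPQS_CUTOFF = 1.1         # ModPipe quality score cutoff quoted in the notes


@dataclass(frozen=True)
class HomologyModel:
    """Hypothetical record for one homology model of one protein."""
    protein_id: str
    model_score: float
    mpqs: float


def select_models(models, solved_proteins):
    """Return {protein_id: best reliable model} for proteins without a solved structure."""
    # Step 1: keep only models that pass both reliability cutoffs.
    reliable = [
        m for m in models
        if m.model_score > MODEL_SCORE_CUTOFF and m.mpqs > MPQS_CUTOFF
    ]
    # Step 2: one model per protein. Tuple comparison ranks by model score
    # first and falls back to MPQS only when the model scores are equal.
    best = {}
    for m in reliable:
        current = best.get(m.protein_id)
        if current is None or (m.model_score, m.mpqs) > (current.model_score, current.mpqs):
            best[m.protein_id] = m
    # Step 3: drop proteins that already have an experimentally solved structure.
    return {pid: m for pid, m in best.items() if pid not in solved_proteins}


def structural_coverage(n_covered, proteome_size):
    """Percentage of the proteome that has at least one structure or model."""
    return 100.0 * n_covered / proteome_size


# Toy input (made-up IDs and scores, for illustration only).
toy_models = [
    HomologyModel("P1", 0.9, 1.2),
    HomologyModel("P1", 0.9, 1.5),  # ties P1 on model score, wins on MPQS
    HomologyModel("P2", 0.6, 2.0),  # fails the model score cutoff
    HomologyModel("P3", 0.8, 1.3),  # reliable, but P3 already has a solved structure
    HomologyModel("P4", 1.0, 1.0),  # fails the MPQS cutoff
]
selected = select_models(toy_models, solved_proteins={"P3"})

# Coverage figures recomputed from the counts given on the slides.
coverage_solved = structural_coverage(284, 3996)
coverage_with_models = structural_coverage(284 + 1446, 3996)
```

With the counts from the deck, the two coverage values round to 7.1% and 43.3%, which is how the percentages on slide 35 can be checked.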
  • Scholarly Communication for Bioinformatics Students

    1. The Changing Face of Scholarly Communication and the Opportunities it Affords the Bioinformatics/Systems Biology Student
       Philip E. Bourne
       University of California San Diego
       Third UCSD Bioinformatics and Systems Biology Expo – 2/28/2011
    2. Observation 1: Everyone in this Room is Driven by One Thing Above All Else
    3. Observation 2: We Are a Field That Uses/Produces Public On-Line Data Like No Other
    4. Observation 3: We Have Shaped the Way Data Are Shared – We Have Had Very Little Impact on Publications
    5. Perhaps it is Time We Thought Less About a Publication as a Reward and More About How it Can be Presented to Maximize its Use
    6. So What Needs to Happen
       We need data and knowledge about that data to interoperate, i.e. we need new kinds of fast, versatile publications and data archives
       We need to be more open with both
       We need to think more about the tools that analyze, visualize and annotate data to maximize knowledge discovery
       Reward systems need to change
       We need scientist management tools
       We need to be less fixated on the big data problems
       We need to unleash the full power of the Internet
       Hard
       Easy
    7. One Personal Example of Why This Needs to Happen Now
    8. Josh Sommer – A Remarkable Young Man
       Co-founder & Executive Director of the Chordoma Foundation
    9. Chordoma
       A rare form of brain cancer
       No known drugs
       Treatment – surgical resection followed by intense radiation therapy
    13. If I have seen further it is only by standing on the shoulders of giants
        Isaac Newton
        From Josh’s point of view the climb up just takes too long
        > 15 years and > $850M to be more precise
        Adapted:
    17. So We Have Seen What Needs to Change and Why. What About the How?
    18. We Need Data and Knowledge About That Data to Interoperate
        The Knowledge and Data Cycle
        0. Full text of PLoS papers stored in a database
        1. A link brings up figures from the paper
        2. Clicking the paper figure retrieves data from the PDB which is analyzed
        3. A composite view of journal and database content results
        4. The composite view has links to pertinent blocks of literature text and back to the PDB
        Figure annotations: User clicks on content; Metadata and webservices to data provide an interactive view that can be annotated; Selecting features provides a data/knowledge mashup; Analysis leads to new content I can share
        PLoS Comp. Biol. 2005 1(3) e34
    19. We Need Data and Knowledge About That Data to Interoperate – What is Stopping Us?
        Open Access
        Governance – publishers vs. database providers
        Reward
        Metadata standards for provenance, privacy etc.
        Exemplars
        ….
    20. A Small Example - The World Wide Protein Data Bank
        The single worldwide repository for data on the structure of biological macromolecules
        Vital for drug discovery and the life sciences
        39 years old
        Free to all
        PLoS Comp. Biol. 2005 1(3) e34
    21. The World Wide Protein Data Bank – The Best Case Scenario
        Paper not published unless data are deposited – strong data to literature correspondence
        Highly structured data conforming to an extensive ontology
        DOIs assigned to every structure
        PLoS Comp. Biol. 2005 1(3) e34
    22. Example Interoperability: The Database View
        BMC Bioinformatics 2010 11:220
    23. Example Interoperability: The Literature View
        Nucleic Acids Research 2008 36(S2) W385-389
    24. ICTP Trieste, December 10, 2007
    25. Semantic Tagging & Widgets are a Powerful Tool to Integrate Data and Knowledge of that Data, But as Yet Not Used Much
        Will Widgets and Semantic Tagging Change Computational Biology?
        PLoS Comp. Biol. 6(2) e1000673
    26. Semantic Tagging of Database Content in The Literature or Elsewhere
        PLoS Comp. Biol. 6(2) e1000673
    27. We need data and knowledge about that data to interoperate
    28. The Publishers are Starting to Do It
        From Anita de Waard, Elsevier
    29. This is Literature Post-processing; Better to Get the Authors Involved
        Authors are the absolute experts on the content
        More effective distribution of labor
        Add metadata before the article enters the publishing process
    30. Word 2007 Add-in for authors
        Allows authors to add metadata as they write, before they submit the manuscript
        Authors are assisted by automated term recognition: OBO ontologies, database IDs
        Metadata are embedded directly into the manuscript document via XML tags, OOXML format: open, machine-readable
        Open source, Microsoft Public License
    31. Challenges
        Authors. Carrot: if one or more publishers fast-tracked a paper that had semantic markup, it might catch on
        Publishers. Carrot: competitive advantage
    32. The Promise – A Hypothetical Example
        Figure labels: Cardiac Disease Literature; Immunology Literature; Shared Function
    33. High-throughput Biology Requires High-throughput Knowledge Discovery
        Consider an Example from Our Own Work…
        Roger Chang Will Give You Another Example
    34. The TB-Drugome
        1. Determine the TB structural proteome
        2. Determine all known drug binding sites from the PDB
        3. Determine which of the sites found in 2 exist in 1
        4. Call the result the TB-drugome
        Kinnings et al 2010 PLoS Comp Biol 6(11): e1000976
        High-throughput Data Requires High-throughput Knowledge
    35. 1. Determine the TB Structural Proteome
        Figure: TB proteome, 3,996 proteins; solved structures, 284; homology models, 1,446; remaining proteins, 2,266
        High quality homology models from ModBase increase structural coverage from 7.1% to 43.3%
        Kinnings et al 2010 PLoS Comp Biol 6(11): e1000976
    36. 2. Determine all Known Drug Binding Sites in the PDB
        Searched the PDB for protein crystal structures bound with FDA-approved drugs
        268 drugs bound in a total of 931 binding sites
        Chart: No. of drugs vs. No. of drug binding sites; labeled drugs: Acarbose, Darunavir, Alitretinoin, Conjugated estrogens, Chenodiol, Methotrexate
        Kinnings et al 2010 PLoS Comp Biol 6(11): e1000976
    37. Map 2 onto 1 – The TB-Drugome
        Similarities between the binding sites of M.tb proteins (blue), and binding sites containing approved drugs (red).
    38. From a Drug Repositioning Perspective
        Similarities between drug binding sites and TB proteins are found for 61/268 drugs
        41 of these drugs could potentially inhibit more than one TB protein
        Chart: No. of drugs vs. No. of potential TB targets; labeled drugs: conjugated estrogens & methotrexate, chenodiol, levothyroxine, testosterone, raloxifene, alitretinoin, ritonavir
        Kinnings et al 2010 PLoS Comp Biol 6(11): e1000976
    39. Top 5 Most Highly Connected Drugs
    40. We Need Better Ways to Associate Data and Knowledge, and it's More Than Just Text Mining of PubMed Abstracts – It's About Changing the System
        Our Future is in Your Hands!
    41. Acknowledgements
        BioLit Team: Lynn Fink, Parker Williams, Marco Martinez, Rahul Chandran, Greg Quinn
        Microsoft Scholarly Communications: Pablo Fernicola, Lee Dirks, Savas Parastatidis, Alex Wade, Tony Hey
        RCSB PDB team: Andreas Prlić, Dimitris Dimitropoulos
        TB Drugome Team: Lei Xie, Sarah Kinnings, Li Xie
        http//
    42. Questions?
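
The four TB-drugome steps on slide 34, and the repositioning counts on slide 38, amount to building a drug-to-protein map and summarizing it. The sketch below assumes a binding-site comparison is supplied by the caller as `sites_similar`; that function, the set-of-residues representation and all toy names are hypothetical stand-ins, and the slides do not say how the similarity was actually computed.

```python
from collections import defaultdict


def build_drugome(drug_sites, tb_structures, sites_similar):
    """Map each drug to the TB proteins with a similar binding site.

    drug_sites:    iterable of (drug_name, site) pairs (step 2 on slide 34)
    tb_structures: iterable of (protein_id, structure) pairs (step 1)
    sites_similar: callable(site, structure) -> bool, supplied by the caller
    """
    drugome = defaultdict(set)
    for drug, site in drug_sites:
        for protein_id, structure in tb_structures:
            # Step 3: does this known drug binding site also occur in the TB protein?
            if sites_similar(site, structure):
                drugome[drug].add(protein_id)
    # Step 4: the resulting drug-to-protein map is the "drugome".
    return dict(drugome)


def repositioning_summary(drugome):
    """Return (drugs with at least one TB hit, drugs with more than one TB hit)."""
    with_hit = sum(1 for targets in drugome.values() if targets)
    multi_target = sum(1 for targets in drugome.values() if len(targets) > 1)
    return with_hit, multi_target


# Toy data: sites and structures are just sets of residue labels (illustration only).
toy_drug_sites = [
    ("drugA", {"his", "asp", "ser"}),
    ("drugA", {"lys", "glu"}),
    ("drugB", {"cys", "zn"}),
    ("drugC", {"trp"}),
]
toy_tb_structures = [
    ("tbP1", {"his", "asp", "ser", "gly"}),
    ("tbP2", {"lys", "glu", "arg"}),
    ("tbP3", {"cys", "zn"}),
]


def toy_similar(site, structure):
    """Stand-in similarity test: at least two shared residue labels."""
    return len(site & structure) >= 2


toy_drugome = build_drugome(toy_drug_sites, toy_tb_structures, toy_similar)
toy_summary = repositioning_summary(toy_drugome)
```

On the real data the same summary would correspond to the 61 of 268 drugs with at least one similar TB binding site, and the 41 drugs linked to more than one TB protein, as reported on slide 38.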