Scholarly Communication for Bioinformatics Students
 


Presentation made to the incoming bioinformatics and systems biology students at UCSD on how they could get involved in changing scholarly communication. Given February 28, 2011


Usage Rights

© All Rights Reserved

  • 3,996 proteins in the TB proteome. 749 solved structures in the PDB, representing a total of 284 proteins (7.1% coverage). ModBase contains homology models for the entire TB proteome. 1,446 'high quality' homology models were added to the data set. Structural coverage increased to 43.3%.
    Retained only those models with a model score of > 0.7 and a ModPipe quality score of > 1.1 (2,818 models). There were multiple models per protein. For each TB protein, chose the model with the best model score and, if they were equal, chose the model with the best ModPipe quality score (1,703 models). However, 251 (+6) models were removed since they correspond to TB proteins that already have solved structures (1,446 models remained).
    The model score is a score for the reliability of a model, derived from statistical potentials (F. Melo, R. Sanchez, A. Sali, 2001). A model is predicted to be good when the model score is higher than a pre-specified cutoff (0.7). A reliable model has a probability of the correct fold that is larger than 95%. A fold is correct when at least 30% of its Calpha atoms superpose within 3.5 Å of their correct positions.
    The ModPipe Protein Quality Score (MPQS) is a composite score comprising sequence identity to the template, coverage, and the three individual scores e-value, z-DOPE and GA341. We consider an MPQS of > 1.1 as reliable.
  • (nutraceuticals excluded)
  • Multi-target therapy may be more effective than single-target therapy to treat infectious diseases.
    Most of the proteins listed are potential novel drug targets for the development of efficient anti-tuberculosis chemotherapeutics.
    GSMN-TB: Genome Scale Metabolic Reaction Network of M.tb (http://sysbio/sbs.surrey.ac.uk/tb)
    849 reactions, 739 metabolites, 726 genes
    Can optimize the model for in vivo growth
    Carry out multiple gene inhibition and compute the maximal theoretical growth rate (if close to zero, that combination of genes is essential for growth)
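The model-selection steps in the first note above (apply both score cutoffs, keep one best model per protein, then drop proteins that already have a solved structure) can be sketched roughly as follows. This is a minimal illustration, not the authors' pipeline: the `Model` record, its field names, the `select_models` helper, and the toy protein identifiers are all assumptions made for the example. Only the two cutoffs (0.7 and 1.1) and the tie-break rule come from the note.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Model:
    """Hypothetical record for one homology model (field names are assumptions)."""
    protein_id: str
    model_score: float  # reliability score; the note uses a cutoff of 0.7
    mpqs: float         # ModPipe Protein Quality Score; the note uses a cutoff of 1.1


def select_models(models, solved_proteins, min_score=0.7, min_mpqs=1.1):
    """Return {protein_id: best model} following the three steps in the note."""
    # Step 1: keep only models that pass both cutoffs.
    reliable = [m for m in models
                if m.model_score > min_score and m.mpqs > min_mpqs]

    # Step 2: one model per protein, best model score first, MPQS as tie-break.
    best = {}
    for m in reliable:
        current = best.get(m.protein_id)
        if current is None or (m.model_score, m.mpqs) > (current.model_score, current.mpqs):
            best[m.protein_id] = m

    # Step 3: drop proteins that already have a solved structure.
    return {pid: m for pid, m in best.items() if pid not in solved_proteins}


# Toy data with made-up identifiers and scores.
models = [
    Model("P1", 0.90, 1.5),
    Model("P1", 0.90, 1.8),  # same model score, better MPQS: wins the tie
    Model("P2", 0.60, 2.0),  # fails the model-score cutoff
    Model("P3", 1.00, 1.0),  # fails the MPQS cutoff
    Model("P4", 0.95, 1.2),  # passes, but P4 already has a solved structure
    Model("P5", 0.80, 1.3),
]
selected = select_models(models, solved_proteins={"P4"})
print(sorted(selected))
```

Comparing `(model_score, mpqs)` tuples keeps the tie-break in one expression: Python compares the first element and only looks at the second when the first is equal.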

Presentation Transcript

  • The Changing Face of Scholarly Communication and the Opportunities it Affords the Bioinformatics/Systems Biology Student
    Philip E. Bourne
    University of California San Diego
    pbourne@ucsd.edu
    http://www.sdsc.edu/pb
    Third UCSD Bioinformatics and Systems Biology Expo – 2/28/2011
  • Observation 1: Everyone in this Room is Driven by One Thing Above All Else
  • Observation 2: We Are a Field That Uses/Produces Public On-Line Data Like No Other
  • Observation 3: We Have Shaped the Way Data Are Shared – We Have Had Very Little Impact on Publications
  • Perhaps it is Time We Thought Less About a Publication as a Reward and More About How it Can be Presented to Maximize its Use
  • So What Needs to Happen
    We need data and knowledge about that data to interoperate i.e. we need new kinds of fast, versatile publications and data archives
    We need to be more open with both
    We need to think more about the tools that analyze, visualize and annotate data to maximize knowledge discovery
    Reward systems need to change
    We need scientist management tools
    We need to be less fixated on the big data problems
    We need to unleash the full power of the Internet
    Hard
    Easy
  • One Personal Example of Why This Needs to Happen Now
  • Josh Sommer – A Remarkable Young Man
    Co-founder & Executive Director of the Chordoma Foundation
    http://sagecongress.org/Presentations/Sommer.pdf
  • Chordoma
    A rare form of brain cancer
    No known drugs
    Treatment – surgical resection followed by intense radiation therapy
    http://upload.wikimedia.org/wikipedia/commons/2/2b/Chordoma.JPG
  • http://sagecongress.org/Presentations/Sommer.pdf
  • http://sagecongress.org/Presentations/Sommer.pdf
  • http://sagecongress.org/Presentations/Sommer.pdf
  • If I have seen further it is only by standing on the shoulders of giants
    Isaac Newton
    From Josh’s point of view the climb up just takes too long
    > 15 years and > $850M to be more precise
    Adapted: http://sagecongress.org/Presentations/Sommer.pdf
  • http://sagecongress.org/Presentations/Sommer.pdf
  • http://sagecongress.org/Presentations/Sommer.pdf
  • http://fora.tv/2010/04/23/Sage_Commons_Josh_Sommer_Chordoma_Foundation
  • So We Have Seen What Needs to Change and Why. What About the How?
  • We Need Data and Knowledge About That Data to Interoperate
    The Knowledge and Data Cycle
    0. Full text of PLoS papers stored in a database
    1. A link brings up figures from the paper
    2. Clicking the paper figure retrieves data from the PDB which is analyzed
    3. A composite view of journal and database content results
    4. The composite view has links to pertinent blocks of literature text and back to the PDB
    User clicks on content
    Metadata and web services to data provide an interactive view that can be annotated
    Selecting features provides a data/knowledge mashup
    Analysis leads to new content I can share
    PLoS Comp. Biol. 2005 1(3) e34
  • We Need Data and Knowledge About That Data to Interoperate – What is Stopping Us?
    Open Access
    Governance – publishers vs. database providers
    Reward
    Metadata standards for provenance, privacy etc.
    Exemplars
    ….
  • A Small Example - The World Wide Protein Data Bank
    The single worldwide repository for data on the structure of biological macromolecules
    Vital for drug discovery and the life sciences
    39 years old
    Free to all
    http://www.wwpdb.org
    PLoS Comp. Biol. 2005 1(3) e34
  • The World Wide Protein Data Bank – The Best Case Scenario
    Paper not published unless data are deposited – strong data to literature correspondence
    Highly structured data conforming to an extensive ontology
    DOIs assigned to every structure
    http://www.wwpdb.org
    PLoS Comp. Biol. 2005 1(3) e34
  • Example Interoperability: The Database View
    www.rcsb.org/pdb/explore/literature.do?structureId=1TIM
    BMC Bioinformatics 2010 11:220
  • Example Interoperability: The Literature View
    http://biolit.ucsd.edu
    Nucleic Acids Research 2008 36(S2) W385-389
  • ICTP Trieste, December 10, 2007
  • Semantic Tagging & Widgets are a Powerful Tool to Integrate Data and Knowledge of that Data, But as Yet Not Used Much
    Will Widgets and Semantic Tagging Change Computational Biology?
    PLoS Comp. Biol. 6(2) e1000673
  • Semantic Tagging of Database Content in The Literature or Elsewhere
    http://www.rcsb.org/pdb/static.do?p=widgets/widgetShowcase.jsp
    PLoS Comp. Biol. 6(2) e1000673
    Semantic Tagging
  • The Publishers are Starting to Do It
    From Anita de Waard, Elsevier
  • This is Literature Post-processing: Better to Get the Authors Involved
    Authors are the absolute experts on the content
    More effective distribution of labor
    Add metadata before the article enters the publishing process
  • Word 2007 Add-in for authors
    Allows authors to add metadata as they write, before they submit the manuscript
    Authors are assisted by automated term recognition
    OBO ontologies
    Database IDs
    Metadata are embedded directly into the manuscript document via XML tags, OOXML format
    Open
    Machine-readable
    Open source, Microsoft Public License
    http://www.codeplex.com/ucsdbiolit
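As a rough illustration of the idea on this slide (recognize known terms automatically, then embed the metadata as XML tags in the manuscript text), here is a minimal sketch. It is not the add-in's code and does not produce OOXML: the `<term>` element, its `source` and `id` attributes, the `tag_terms` helper, and the ontology identifier `EX:0000001` are all placeholders invented for the example. Only the PDB ID 1TIM is taken from elsewhere in this talk.

```python
import re
from xml.sax.saxutils import escape, quoteattr

# Hypothetical lexicon: surface term -> (source, identifier).
# "EX:0000001" is a made-up placeholder, not a real ontology ID.
LEXICON = {
    "1TIM": ("PDB", "1TIM"),
    "triosephosphate isomerase": ("EXAMPLE", "EX:0000001"),
}


def tag_terms(text, lexicon=LEXICON):
    """Wrap every recognized term in a <term> element carrying its metadata."""
    by_lower = {term.lower(): ref for term, ref in lexicon.items()}
    # Longest terms first so a long phrase wins over a shorter term inside it.
    alternatives = sorted(by_lower, key=len, reverse=True)
    pattern = re.compile(
        r"\b(?:" + "|".join(re.escape(t) for t in alternatives) + r")\b",
        re.IGNORECASE,
    )

    pieces, pos = [], 0
    for match in pattern.finditer(text):
        source, identifier = by_lower[match.group(0).lower()]
        pieces.append(escape(text[pos:match.start()]))  # untagged text, XML-escaped
        pieces.append(
            f"<term source={quoteattr(source)} id={quoteattr(identifier)}>"
            f"{escape(match.group(0))}</term>"
        )
        pos = match.end()
    pieces.append(escape(text[pos:]))
    return "".join(pieces)


tagged = tag_terms("1TIM is a triosephosphate isomerase.")
print(tagged)
```

Because the tags travel with the text, a publisher or database could later resolve each identifier without re-running any text mining, which is the division of labor the slides argue for.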
  • Challenges
    Authors
    Carrot: if one or more publishers fast-tracked papers that had semantic markup, it might catch on
    Publishers
    Carrot: competitive advantage
  • The Promise – A Hypothetical Example
    Cardiac Disease Literature
    Immunology Literature
    Shared Function
  • High-throughput Biology Requires High-throughput Knowledge Discovery
    Consider an Example from Our Own Work…
    Roger Chang Will Give You Another Example
  • The TB-Drugome
    Determine the TB structural proteome
    Determine all known drug binding sites from the PDB
    Determine which of the sites found in 2 exist in 1
    Call the result the TB-drugome
    Kinnings et al 2010 PLoS Comp Biol 6(11): e1000976
    High-throughput Data Requires High-throughput Knowledge
  • 1. Determine the TB Structural Proteome
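    TB proteome: 3,996 proteins
    Solved structures: 284 proteins
    Homology models: 1,446 proteins
    Neither solved structure nor high-quality model: 2,266 proteins (3,996 − 284 − 1,446)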
    High quality homology models from ModBase (http://modbase.compbio.ucsf.edu) increase structural coverage from 7.1% to 43.3%
    Kinnings et al 2010 PLoS Comp Biol 6(11): e1000976
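The coverage figures on this slide follow directly from the counts. A small sketch of the arithmetic, assuming (as the speaker notes state) that the 1,446 models cover only proteins without a solved structure, so the two sets can simply be added; the function and variable names are illustrative only:

```python
def structural_coverage(n_covered, proteome_size):
    """Percentage of the proteome with at least one structure or model."""
    return 100.0 * n_covered / proteome_size


PROTEOME = 3996  # proteins in the TB proteome
SOLVED = 284     # proteins with a solved structure in the PDB
MODELS = 1446    # proteins covered only by a high-quality homology model

before = structural_coverage(SOLVED, PROTEOME)
after = structural_coverage(SOLVED + MODELS, PROTEOME)
uncovered = PROTEOME - SOLVED - MODELS

print(f"{before:.1f}% -> {after:.1f}%, {uncovered} proteins uncovered")
# prints "7.1% -> 43.3%, 2266 proteins uncovered"
```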
  • 2. Determine all Known Drug Binding Sites in the PDB
    Searched the PDB for protein crystal structures bound with FDA-approved drugs
    268 drugs bound in a total of 931 binding sites
    Figure: No. of drugs vs. No. of drug binding sites; labeled drugs: Acarbose, Darunavir, Alitretinoin, Conjugated estrogens, Chenodiol, Methotrexate
    Kinnings et al 2010 PLoS Comp Biol 6(11): e1000976
  • Map 2 onto 1 – The TB-Drugome
    http://funsite.sdsc.edu/drugome/TB/
    Similarities between the binding sites of M.tb proteins (blue),
    and binding sites containing approved drugs (red).
  • From a Drug Repositioning Perspective
    Similarities between drug binding sites and TB proteins are found for 61/268 drugs
    41 of these drugs could potentially inhibit more than one TB protein
    Figure: No. of drugs vs. No. of potential TB targets; labeled drugs: conjugated estrogens & methotrexate, chenodiol, levothyroxine, testosterone, raloxifene, alitretinoin, ritonavir
    Kinnings et al 2010 PLoS Comp Biol 6(11): e1000976
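Counts such as "61 of 268 drugs" and "41 with more than one potential target" are the kind of summary one gets by grouping binding-site similarity hits by drug. A minimal sketch of that bookkeeping, assuming the hits are available as `(drug, protein)` pairs; the helper names and the toy pairs below are invented for illustration and are not data from the paper:

```python
from collections import defaultdict


def targets_per_drug(similar_pairs):
    """Group (drug, protein) similarity hits into {drug: set of proteins}."""
    targets = defaultdict(set)
    for drug, protein in similar_pairs:
        targets[drug].add(protein)  # a set ignores repeated hits on one protein
    return dict(targets)


def multi_target_drugs(similar_pairs):
    """Drugs whose binding site resembles a site in more than one protein."""
    grouped = targets_per_drug(similar_pairs)
    return sorted(drug for drug, proteins in grouped.items() if len(proteins) > 1)


# Toy hits with made-up names.
pairs = [
    ("drugA", "P1"), ("drugA", "P2"), ("drugA", "P1"),  # two distinct targets
    ("drugB", "P1"),                                    # one target
    ("drugC", "P3"), ("drugC", "P4"),                   # two distinct targets
]
grouped = targets_per_drug(pairs)
multi = multi_target_drugs(pairs)
print(len(grouped), "drugs with hits;", len(multi), "with more than one target")
```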
  • Top 5 Most Highly Connected Drugs
  • We Need Better Ways to Associate Data and Knowledge and It's More than Just Text Mining of PubMed Abstracts – It's About Changing the System
    Our Future is in Your Hands!
  • Acknowledgements
    BioLit Team
    Lynn Fink
    Parker Williams
    Marco Martinez
    Rahul Chandran
    Greg Quinn
    Microsoft Scholarly Communications
    Pablo Fernicola
    Lee Dirks
    Savas Parastitidas
    Alex Wade
    Tony Hey
    RCSB PDB team
    Andreas Prilc
    Dimitris Dimitropoulos
    TB Drugome Team
    Lei Xie
    Sarah Kinnings
    Li Xie
    http://funsite.sdsc.edu/drugome/TB/
    http://biolit.ucsd.edu
    http://www.pdb.org
    http://www.codeplex.com/ucsdbiolit
  • pbourne@ucsd.edu
    Questions?