Jim Gray Award Lecture
 

The Jim Gray Award Lecture presented at the Microsoft Research Symposium on October 12, 2010.

    Jim Gray Award Lecture Presentation Transcript

    • The Reaming of Life
      Philip E. Bourne
      University of California San Diego
      pbourne@ucsd.edu
      Jim Gray eScience Award Lecture
      Oct. 12, 2010
    • Disclaimer
      I am a domain (life) scientist, not a computer or information scientist
      I am fortunate enough to have a major biological resource (the Protein Data Bank) and a major biological journal (PLoS Computational Biology) as my playground
      I am part of the long tail
      I am naïve, but I am the majority
    • The Reaming of Life – What on Earth is He Talking About?
      A reamer is a tool for turning a roughly punched hole into an accurate and smooth one
      The digital data deluge has punched that rough hole
      For the life {other?} sciences to optimally advance we need an accurate and smooth conduit through which data can be distilled, analyzed, visualized, distributed and above all else comprehended
    • … and we need to accelerate the process by which this is done; here is why …
      This is just another way of saying what Jim said and is embodied in the Fourth Paradigm
    • The Scientific Process is Too Slow to Respond to a Crisis – Either Global or Personal
      By the time the paper is published we could all be dead
      http://knol.google.com/k/plos-currents-influenza#
      Motivation
    • In a time of crisis the need for fast access to accurate data and any knowledge of that data is paramount
      [Chart: Structure Summary page activity for H1N1 Influenza related structures, Jan. 2008 to Jul. 2010]
      3B7E: Neuraminidase of A/Brevig Mission/1/1918 H1N1 strain in complex with zanamivir
      1RUZ: 1918 H1 Hemagglutinin
      * http://www.cdc.gov/h1n1flu/estimates/April_March_13.htm
      Motivation
    • If that is not enough… For some people the scientific process may be too slow to save their life
      Motivation
    • Josh Sommer – A Remarkable Young Man
      Co-founder & Executive Director of the Chordoma Foundation
      http://sagecongress.org/Presentations/Sommer.pdf
      Motivation
    • Chordoma
      A rare form of brain cancer
      No known drugs
      Treatment – surgical resection followed by intense radiation therapy
      http://upload.wikimedia.org/wikipedia/commons/2/2b/Chordoma.JPG
      Motivation
    • [Image-only slides; source below]
      http://sagecongress.org/Presentations/Sommer.pdf
      Motivation
    • If I have seen further it is only by standing on the shoulders of giants
      Isaac Newton
      From Josh’s point of view the climb up just takes too long
      > 15 years and > $850M to be more precise
      Adapted: http://sagecongress.org/Presentations/Sommer.pdf
      Motivation
    • [Image-only slides; sources below]
      http://sagecongress.org/Presentations/Sommer.pdf
      http://fora.tv/2010/04/23/Sage_Commons_Josh_Sommer_Chordoma_Foundation
      Motivation
    • Now that we are all hopefully motivated, let us break this down into what, in my opinion, actually needs to be done. Here are a few big things …
    • A Few Things to Accelerate the Rate of Scientific Discovery
      Better communication, data and knowledge access, and new modes of discovery, which means:
      We need data and knowledge about that data to interoperate i.e. we need new kinds of fast, versatile publications and data archives
      We need to be more open with both
      We need to think more about the tools that analyze, visualize and annotate data to maximize knowledge discovery
      Reward systems need to change
      We need scientist management tools
      We need to be less fixated on the big data problems
      We need to unleash the full power of the Internet
      [Scale beside the list: Hard to Easy]
    • We Need Data and Knowledge About That Data to Interoperate
      The Knowledge and Data Cycle
      0. Full text of PLoS papers stored in a database
      1. A link brings up figures from the paper
      2. Clicking the paper figure retrieves data from the PDB which is analyzed
      3. A composite view of journal and database content results
      4. The composite view has links to pertinent blocks of literature text and back to the PDB
      User clicks on content
      Metadata and webservices to data provide an interactive view that can be annotated
      Selecting features provides a data/knowledge mashup
      Analysis leads to new content I can share
      PLoS Comp. Biol. 2005 1(3) e34
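
The cycle on this slide can be sketched in a few lines of Python. This is only an illustration under stated assumptions, not the BioLit or PDB implementation: the paper record, the block IDs, the `PAPERS` and `STRUCTURES` stores and all function names are hypothetical, and a real system would call web services instead of reading dictionaries.

```python
from dataclasses import dataclass


@dataclass
class Figure:
    label: str
    structure_ids: list  # database IDs shown in the figure


@dataclass
class Paper:
    paper_id: str
    blocks: dict   # block id -> paragraph text
    figures: list  # list of Figure


# Step 0: full text stored in a "database" (a plain dict in this sketch).
PAPERS = {
    "example-paper": Paper(
        paper_id="example-paper",
        blocks={
            "p1": "We compare our model with structure 1TIM.",
            "p2": "Methods that do not mention any structure.",
        },
        figures=[Figure(label="Figure 1", structure_ids=["1TIM"])],
    )
}

# Stand-in for the structure archive (placeholder record, not real data).
STRUCTURES = {"1TIM": {"id": "1TIM", "note": "placeholder record"}}


def figures_for(paper_id):
    """Step 1: a link brings up the figures of a paper."""
    return PAPERS[paper_id].figures


def retrieve(figure):
    """Step 2: clicking a figure retrieves the data behind it."""
    return [STRUCTURES[s] for s in figure.structure_ids if s in STRUCTURES]


def composite_view(paper_id, figure_label):
    """Steps 3 and 4: merge journal and database content, linked both ways."""
    paper = PAPERS[paper_id]
    figure = next(f for f in paper.figures if f.label == figure_label)
    records = retrieve(figure)
    # For each database record, list the text blocks that mention it.
    text_links = {
        rec["id"]: [bid for bid, text in paper.blocks.items() if rec["id"] in text]
        for rec in records
    }
    return {
        "paper": paper_id,
        "figure": figure_label,
        "records": records,
        "text_links": text_links,
    }


view = composite_view("example-paper", "Figure 1")
print(view["text_links"])
```

The point of the sketch is that the composite view is cheap to build once the literature and the data share identifiers; the hard parts named on the following slides (governance, reward, metadata standards) are not technical.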
    • We Need Data and Knowledge About That Data to Interoperate – What is Stopping Us?
      Governance – publishers vs. database providers
      Reward
      Metadata standards for provenance, privacy etc.
      Exemplars
      ….
      Caveat: Each discipline is different – I speak very much from a biomedical sciences perspective
    • Certainly the Argument for Interoperability in the Biomedical Sciences is Strong
      1078 databases reported in NAR 2008
      MetaBase http://biodatabase.org reports 2,651 entries edited 12,587 times
      PubMed contains 18,792,257 entries
      ~100,000 papers indexed per month
      In Feb 2009:
      67,406,898 interactive searches were done
      92,216,786 entries were viewed
      Data as of April 14, 2009
      We need data and knowledge about that data to interoperate
      PLoS Comp. Biol. 2005 1(3) e34
    • A Small Example - The World Wide Protein Data Bank
      The single worldwide repository for data on the structure of biological macromolecules
      Vital for drug discovery and the life sciences
      39 years old
      Free to all
      http://www.wwpdb.org
      We need data and knowledge about that data to interoperate
      PLoS Comp. Biol. 2005 1(3) e34
    • The World Wide Protein Data Bank – The Best Case Scenario
      Paper not published unless data are deposited – strong data to literature correspondence
      Highly structured data conforming to an extensive ontology
      DOIs assigned to every structure
      http://www.wwpdb.org
      We need data and knowledge about that data to interoperate
      PLoS Comp. Biol. 2005 1(3) e34
    • Example Interoperability: The Database View
      www.rcsb.org/pdb/explore/literature.do?structureId=1TIM
      We need data and knowledge about that data to interoperate
      BMC Bioinformatics 2010 11:220
    • Example Interoperability: The Literature View
      http://biolit.ucsd.edu
      Nucleic Acids Research 2008 36(S2) W385-389
      We need data and knowledge about that data to interoperate
    • ICTP Trieste, December 10, 2007
      We need data and knowledge about that data to interoperate
    • Semantic Tagging & Widgets are a Powerful Tool to Integrate Data and Knowledge of that Data, But as Yet Not Used Much
      Will Widgets and Semantic Tagging Change Computational Biology?
      PLoS Comp. Biol. 6(2) e1000673
      We need data and knowledge about that data to interoperate
    • Semantic Tagging of Database Content in The Literature or Elsewhere
      http://www.rcsb.org/pdb/static.do?p=widgets/widgetShowcase.jsp
      PLoS Comp. Biol. 6(2) e1000673
      Semantic Tagging
    • We need data and knowledge about that data to interoperate
    • The Publishers are Starting to Do It
      From Anita de Waard, Elsevier
    • This is Literature Post-processing – Better to Get the Authors Involved
      Authors are the absolute experts on the content
      More effective distribution of labor
      Add metadata before the article enters the publishing process
      We need data and knowledge about that data to interoperate
    • Word 2007 Add-in for authors
      Allows authors to add metadata as they write, before they submit the manuscript
      Authors are assisted by automated term recognition
      OBO ontologies
      Database IDs
      Metadata are embedded directly into the manuscript document via XML tags, OOXML format
      Open
      Machine-readable
      Open source, Microsoft Public License
      http://www.codeplex.com/ucsdbiolit
      We need data and knowledge about that data to interoperate
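
As a rough illustration of what automated term recognition with embedded XML tags might look like, here is a minimal Python sketch. It is not the add-in's code: the `<term>` and `<dbref>` tag names, the `EX:` term IDs, the tiny term list and the assumption that database IDs are written as "PDB" followed by a four-character ID are all hypothetical. The real add-in works inside Word and writes OOXML.

```python
import re
from xml.sax.saxutils import escape, quoteattr

# Hypothetical term list; a real tool would load OBO ontologies.
ONTOLOGY_TERMS = {
    "hemagglutinin": "EX:0000001",
    "neuraminidase": "EX:0000002",
}


def tag_manuscript(text):
    """Wrap recognised ontology terms and PDB IDs in XML tags."""
    terms = "|".join(re.escape(t) for t in ONTOLOGY_TERMS)
    # Assumption: database IDs appear as "PDB " plus a four-character ID.
    pattern = re.compile(
        r"\bPDB (?P<pdb>[0-9][A-Za-z0-9]{3})\b|\b(?P<term>" + terms + r")\b",
        re.IGNORECASE,
    )
    out, pos = [], 0
    for m in pattern.finditer(text):
        # Escape the untouched text so the result stays well-formed XML.
        out.append(escape(text[pos:m.start()]))
        if m.group("pdb"):
            attr = quoteattr(m.group("pdb").upper())
            out.append(f'<dbref source="PDB" id={attr}>{escape(m.group(0))}</dbref>')
        else:
            attr = quoteattr(ONTOLOGY_TERMS[m.group("term").lower()])
            out.append(f"<term id={attr}>{escape(m.group(0))}</term>")
        pos = m.end()
    out.append(escape(text[pos:]))
    return "".join(out)


tagged = tag_manuscript("Neuraminidase (PDB 3B7E) & zanamivir")
print(tagged)
```

In the workflow described on the slide the author would confirm or reject each suggested tag while writing, which is why the metadata can be trusted more than tags added by post-processing.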
    • Challenges
      Authors
      Carrot: if one or more publishers fast-tracked a paper that had semantic markup, it might catch on
      Publishers
      Carrot: competitive advantage
      We need data and knowledge about that data to interoperate
    • The Promise – A Hypothetical Example
      Cardiac Disease Literature
      Immunology Literature
      Shared Function
      We need data and knowledge about that data to interoperate
    • A Few Things to Accelerate the Rate of Scientific Discovery
      Better communication, data and knowledge access, and new modes of discovery, which means:
      We need data and knowledge about that data to interoperate i.e. we need new kinds of fast, versatile publications and data archives
      We need to be more open with both
      We need to think more about the tools that analyze, visualize and annotate data to maximize knowledge discovery
      Reward systems need to change
      We need scientist management tools
      We need to be less fixated on the big data problems
      We need to unleash the full power of the Internet
    • One Small Example – The Molecular Biology Toolkit (MBT)
      Jmol, VMD … are de facto standard, important tools for rendering biological molecules … but
      They are not versatile, i.e. they do not, for example:
      Respond to the data they are reading
      Offer views that match the user’s interests
      Allow the user to annotate the data
      Allow those annotations to be shared (published?)
      Think More About the Tools
    • MBT Features
      http://mbt.sdsc.edu
      Offers a framework, not an end-user application
      Responds to the data type
      Supports read/write access
      Encourages others to write end-user applications
      Discourages feature creep
      Immunologists
      Immunome Research, 2007 3(1):3
      Medicinal Chemists
      BMC Bioinformatics 2005, 6:21.
      Think More About the Tools
    • A Few Things to Accelerate the Rate of Scientific Discovery
      Better communication, data and knowledge access, and new modes of discovery, which means:
      We need data and knowledge about that data to interoperate i.e. we need new kinds of fast, versatile publications and data archives
      We need to be more open with both
      We need to think more about the tools that analyze, visualize and annotate data to maximize knowledge discovery
      Reward systems need to change
      We need scientist management tools
      We need to be less fixated on the big data problems
      We need to unleash the full power of the Internet
    • Reward Systems Need to Change – What is Needed?
      Author disambiguation
      Auditing (identification and metrics) of all scholarship - means new tools
      Seniors need to promote alternative forms of scholarship
      Juniors need to respond
      Ten Simple Rules for Getting Promoted as a Computational Biologist in Academia
      PLoS Comp Biol to appear
      Reward Systems Need to Change
    • Example Tools
      http://www.researcherid.com/
      http://pubnet.gersteinlab.org/
      http://www.biomedexperts.com
    • What Are these Alternative Forms of Scholarship?
      Reviews
      Curation
      Research
      [Grants]
      Journal Article
      Poster Session
      Conference Paper
      Blogs
      Community Service/Data
      Reward Systems Need to Change
    • Reward Systems Need to Change
    • A Unique Identifier is Going to Happen
      It is DOIs for people
      Some scientists will resist
      The winner is ORCID?
      Reward Systems Need to Change
    • Ideally the ID will be Tagged to Every Piece of Scholarly Communication
      I am Not a Scientist I am a Number
      PLoS Comp. Biol. 2008 4(12) e1000247
      Reward Systems Need to Change
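
To make the case for a per-person identifier concrete, the following Python sketch contrasts grouping scholarly output by name string with grouping by ID. Everything in it is hypothetical: the records, the names and the `ID-0001` style identifiers are invented for illustration and do not follow any real identifier scheme.

```python
from collections import Counter, defaultdict

# Hypothetical records; names and IDs are made up for illustration.
RECORDS = [
    {"name": "J. Smith", "author_id": "ID-0001", "kind": "journal article"},
    {"name": "J. Smith", "author_id": "ID-0002", "kind": "journal article"},
    {"name": "Jane Smith", "author_id": "ID-0001", "kind": "blog post"},
    {"name": "Smith, J.", "author_id": "ID-0001", "kind": "data curation"},
]


def group_by(records, key):
    """Group records by the value stored under `key`."""
    groups = defaultdict(list)
    for rec in records:
        groups[rec[key]].append(rec)
    return dict(groups)


def audit(records):
    """Count each kind of scholarship per identifier."""
    return {
        author_id: Counter(rec["kind"] for rec in recs)
        for author_id, recs in group_by(records, "author_id").items()
    }


by_name = group_by(RECORDS, "name")
by_id = audit(RECORDS)
print(len(by_name), "name strings,", len(by_id), "people")
print(by_id["ID-0001"])
```

Grouping by name splits one person across three spellings and merges two people under "J. Smith"; grouping by ID does neither, and it lets non-traditional output such as blog posts and curation be counted alongside journal articles.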
    • A Few Things to Accelerate the Rate of Scientific Discovery
      Better communication, data and knowledge access, and new modes of discovery, which means:
      We need data and knowledge about that data to interoperate i.e. we need new kinds of fast, versatile publications and data archives
      We need to be more open with both
      We need to think more about the tools that analyze, visualize and annotate data to maximize knowledge discovery
      Reward systems need to change
      We need scientist management tools
      We need to be less fixated on the big data problems
      We need to unleash the full power of the Internet
    • The Truth About My Laboratory
      I have ?? mail folders!
      The intellectual memory of my laboratory is in those folders
      This is an unhealthy hub and spoke mentality
      We Need Scientist Management Tools
    • The Truth About My Laboratory
      I generate way more negative than positive data, but where is it?
      Content management is a mess
      Slides, posters…..
      Data, lab notebooks ….
      Collaborations, Journal clubs …
      Software is open but where is it?
      Farewell is for the data too
      http://artbyvida.com/portfolio.php
      Computational Biology Resources Lack Persistence and Usability. PLoS Comp. Biol. 2008 4(7): e1000136
      We Need Scientist Management Tools
    • Many Great Tools Out There
      Taverna
      We Need Scientist Management Tools
    • Where I See the Problems
      The long tail is confused
      Lack of interoperability between the options
      The reward (publishing) is still removed from the available tools
      We Need Scientist Management Tools
    • A Few Things to Accelerate the Rate of Scientific Discovery
      Better communication, data and knowledge access, and new modes of discovery, which means:
      We need data and knowledge about that data to interoperate i.e. we need new kinds of fast, versatile publications and data archives
      We need to be more open with both
      We need to think more about the tools that analyze, visualize and annotate data to maximize knowledge discovery
      Reward systems need to change
      We need scientist management tools
      We need to be less fixated on the big data problems
      We need to unleash the full power of the Internet
    • Yes YouTube Can Increase the Rate of Discovery
      Unleash the full power of the Internet
    • The Lab Experiment – Paper + Rich Media
      My students enjoyed the experience
      The shyest student was actually the most bold in front of the camera
      “We will become a generation of ‘science castors’”
      They liked the exposure for the most part – rather than the PI it puts them out in front
      Unleash the full power of the Internet
    • Organic Growth
      3 Years Later
      www.scivee.tv
      Some of their work viewed 20,000+ times
      Global audience of researchers, educators and academic/research institutions
      60,000 unique visitors & 2M pageviews/month
      16,000 registered users & 600 communities
      5,000 uploads of video content (about journal articles, conferences, research news and classes)
      Growing 4-5% monthly
      Sustainability - evolving a business model supporting journals and conferences
      Unleash the full power of the Internet
    • What Emerged: SciveeCasts
      Products
      Application   Product                        Primary Customers
      Journals      PubCast                        Journals, publishers, societies
      Meetings      PosterCast, SlideCast          Societies, conference orgs.
      Comm.         PaperCast, Podcast, SlideCast  Societies, journals
      Education     PosterCast, SlideCast          Societies, universities
      Books         BookCast                       Publishers, book sellers
      Unleash the full power of the Internet
    • Summarizing the Reaming of Life
      By “Life” I mean experiences in the Life Sciences
      By “Reaming” I mean the making of something smooth, fast and accurate
      The Monty Python parody is on conversation cards for getting a dialog going ..
      The rest is just a few examples of the small ways we are trying to address big problems in the hope they will inspire us all to think more deeply about the problem
    • Acknowledgements
      BioLit Team
      Lynn Fink
      Parker Williams
      Marco Martinez
      Rahul Chandran
      Greg Quinn
      MBT
      John Moreland
      John Beaver
      Microsoft Scholarly Communications
      Pablo Fernicola
      Lee Dirks
      Savas Parastatidis
      Alex Wade
      Tony Hey
      wwPDB team
      Andreas Prlić
      Dimitris Dimitropoulos
      SciVee Team
      Apryl Bailey
      Leo Chalupa
      Lynn Fink
      Marc Friedman (CEO)
      Ken Liu
      Alex Ramos
      Willy Suwanto
      Ben Yukich
      http://www.scivee.tv
      http://biolit.ucsd.edu
      http://www.pdb.org
      http://www.codeplex.com/ucsdbiolit
    • pbourne@ucsd.edu
      Questions?