Jim Gray Award Lecture
The Jim Gray Award Lecture presented at the Microsoft Research Symposium on October 12, 2010.

Transcript

  • 1. The Reaming of Life
    Philip E. Bourne
    University of California San Diego
    pbourne@ucsd.edu
    Jim Gray eScience Award Lecture
    Oct. 12, 2010
  • 2. Disclaimer
    I am a domain (life) scientist, not a computer or information scientist
    I am fortunate enough to have a major biological resource (the Protein Data Bank) and a major biological journal (PLoS Computational Biology) as my playground
    I am part of the long tail
    I am naïve, but I am the majority
  • 3. The Reaming of Life - What on Earth is He Talking About?
    A reamer is a tool for turning a roughly punched hole into an accurate and smooth one
    The digital data deluge has punched that rough hole
    For the life (and other?) sciences to advance optimally, we need an accurate and smooth conduit through which data can be distilled, analyzed, visualized, distributed and, above all else, comprehended
  • 4. … and we need to accelerate the process by which this is done. Here is why …
    This is just another way of saying what Jim said and is embodied in the Fourth Paradigm
  • 5. The Scientific Process is Too Slow to Respond to a Crisis – Either Global or Personal
    By the time the paper is published we could all be dead
    http://knol.google.com/k/plos-currents-influenza#
    Motivation
  • 6. In a time of crisis the need for fast access to accurate data and any knowledge of that data is paramount
    [Figure: Structure Summary page activity for H1N1 influenza-related structures, Jan. 2008 to Jul. 2010]
    3B7E: Neuraminidase of A/Brevig Mission/1/1918 H1N1 strain in complex with zanamivir
    1RUZ: 1918 H1 Hemagglutinin
    * http://www.cdc.gov/h1n1flu/estimates/April_March_13.htm
    Motivation
  • 7. If that is not enough … for some people the scientific process may be too slow to save their life
    Motivation
  • 8. Josh Sommer – A Remarkable Young Man
    Co-founder & Executive Director of the Chordoma Foundation
    http://sagecongress.org/Presentations/Sommer.pdf
    Motivation
  • 9. Chordoma
    A rare form of brain cancer
    No known drugs
    Treatment – surgical resection followed by intense radiation therapy
    http://upload.wikimedia.org/wikipedia/commons/2/2b/Chordoma.JPG
    Motivation
  • 10. http://sagecongress.org/Presentations/Sommer.pdf
    Motivation
  • 11. http://sagecongress.org/Presentations/Sommer.pdf
    Motivation
  • 12. http://sagecongress.org/Presentations/Sommer.pdf
    Motivation
  • 13. “If I have seen further it is only by standing on the shoulders of giants” (Isaac Newton)
    From Josh’s point of view the climb up just takes too long: > 15 years and > $850M, to be more precise
    Adapted: http://sagecongress.org/Presentations/Sommer.pdf
    Motivation
  • 14. http://sagecongress.org/Presentations/Sommer.pdf
    Motivation
  • 15. http://sagecongress.org/Presentations/Sommer.pdf
    Motivation
  • 16. http://fora.tv/2010/04/23/Sage_Commons_Josh_Sommer_Chordoma_Foundation
    Motivation
  • 17. Now that we are all, hopefully, motivated, let us break this down into what, in my opinion, actually needs to be done. Here are a few big things …
  • 18. A Few Things to Accelerate the Rate of Scientific Discovery
    Better communication, data and knowledge access, and new modes of discovery, which means:
    We need data and knowledge about that data to interoperate i.e. we need new kinds of fast, versatile publications and data archives
    We need to be more open with both
    We need to think more about the tools that analyze, visualize and annotate data to maximize knowledge discovery
    Reward systems need to change
    We need scientist management tools
    We need to be less fixated on the big data problems
    We need to unleash the full power of the Internet
    Hard
    Easy
  • 19. We Need Data and Knowledge About That Data to Interoperate
    The Knowledge and Data Cycle
    0. Full text of PLoS papers stored in a database
    1. A link brings up figures from the paper
    2. Clicking the paper figure retrieves data from the PDB which is analyzed
    3. A composite view of journal and database content results
    4. The composite view has links to pertinent blocks of literature text and back to the PDB
    User clicks on content
    Metadata and web services to data provide an interactive view that can be annotated
    Selecting features provides a data/knowledge mashup
    Analysis leads to new content I can share
    PLoS Comp. Biol. 2005 1(3) e34
  • 20. We Need Data and Knowledge About That Data to Interoperate – What is Stopping Us?
    Governance – publishers vs. database providers
    Reward
    Metadata standards for provenance, privacy etc.
    Exemplars
    ….
    Caveat: Each discipline is different – I speak very much from a biomedical
    sciences perspective
  • 21. Certainly the Argument for Interoperability in the Biomedical Sciences is Strong
    1078 databases reported in NAR 2008
    MetaBase http://biodatabase.org reports 2,651 entries edited 12,587 times
    PubMed contains 18,792,257 entries
    ~100,000 papers indexed per month
    In Feb 2009:
    67,406,898 interactive searches were done
    92,216,786 entries were viewed
    Data as of April 14, 2009
    We need data and knowledge about that data to interoperate
    PLoS Comp. Biol. 2005 1(3) e34
  • 22. A Small Example - The World Wide Protein Data Bank
    The single worldwide repository for data on the structure of biological macromolecules
    Vital for drug discovery and the life sciences
    39 years old
    Free to all
    http://www.wwpdb.org
    We need data and knowledge about that data to interoperate
    PLoS Comp. Biol. 2005 1(3) e34
  • 23. The World Wide Protein Data Bank – The Best Case Scenario
    Paper not published unless data are deposited – strong data to literature correspondence
    Highly structured data conforming to an extensive ontology
    DOIs assigned to every structure
    http://www.wwpdb.org
    We need data and knowledge about that data to interoperate
    PLoS Comp. Biol. 2005 1(3) e34
  • 24. Example Interoperability: The Database View
    www.rcsb.org/pdb/explore/literature.do?structureId=1TIM
    We need data and knowledge about that data to interoperate
    BMC Bioinformatics 2010 11:220
  • 25. Example Interoperability: The Literature View
    http://biolit.ucsd.edu
    Nucleic Acids Research 2008 36(S2) W385-389
    We need data and knowledge about that data to interoperate
  • 26. ICTP Trieste, December 10, 2007
    We need data and knowledge about that data to interoperate
  • 27. Semantic Tagging & Widgets are a Powerful Tool to Integrate Data and Knowledge of that Data, But as Yet Not Used Much
    Will Widgets and Semantic Tagging Change Computational Biology?
    PLoS Comp. Biol. 6(2) e1000673
    We need data and knowledge about that data to interoperate
  • 28. Semantic Tagging of Database Content in The Literature or Elsewhere
    http://www.rcsb.org/pdb/static.do?p=widgets/widgetShowcase.jsp
    PLoS Comp. Biol. 6(2) e1000673
    Semantic Tagging
  • 29. We need data and knowledge about that data to interoperate
  • 30. The Publishers are Starting to Do It
    From Anita de Waard, Elsevier
  • 31. This is Literature Post-processing
    Better to Get the Authors Involved
    Authors are the absolute experts on the content
    More effective distribution of labor
    Add metadata before the article enters the publishing process
    We need data and knowledge about that data to interoperate
  • 32. Word 2007 Add-in for authors
    Allows authors to add metadata as they write, before they submit the manuscript
    Authors are assisted by automated term recognition (OBO ontologies, database IDs)
    Metadata are embedded directly into the manuscript document via XML tags in OOXML format (open, machine-readable)
    Open source, Microsoft Public License
    http://www.codeplex.com/ucsdbiolit
    We need data and knowledge about that data to interoperate
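
The automated recognition of database IDs mentioned on slide 32 can be sketched in a few lines of Python. This is only an illustration under stated assumptions, not the BioLit add-in's actual code: the regular expression, the `<dbref>` element and the function name `tag_pdb_ids` are all hypothetical, and only identifiers introduced by the word "PDB" are tagged so that years such as 2010 are not mistaken for IDs.

```python
import re
from xml.sax.saxutils import escape, quoteattr

# Assumption: a PDB ID is a digit 1-9 followed by three alphanumerics,
# and we only trust it when it is introduced by "PDB" or "PDB ID:".
PDB_MENTION = re.compile(r"\bPDB(?: ID)?:? ?([1-9][A-Za-z0-9]{3})\b")


def tag_pdb_ids(text: str) -> str:
    """Return text as XML-escaped markup with each PDB ID wrapped in a
    hypothetical <dbref> element carrying the normalized (upper-case) ID."""
    parts = []
    last = 0
    for match in PDB_MENTION.finditer(text):
        raw_id = match.group(1)
        # Copy the untouched text that precedes the identifier.
        parts.append(escape(text[last:match.start(1)]))
        parts.append(
            "<dbref db=\"PDB\" id=%s>%s</dbref>"
            % (quoteattr(raw_id.upper()), escape(raw_id))
        )
        last = match.end(1)
    parts.append(escape(text[last:]))
    return "".join(parts)


if __name__ == "__main__":
    print(tag_pdb_ids("See PDB 1tim and PDB ID: 3B7E & 2010 data."))
```

Requiring the "PDB" prefix trades recall for precision; a real tool would presumably confirm each candidate against the database and let the author accept or reject the suggestion.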
  • 33. Challenges
    Authors
    Carrot: if one or more publishers fast-tracked a paper that had semantic markup, it might catch on
    Publishers
    Carrot: competitive advantage
    We need data and knowledge about that data to interoperate
  • 34. The Promise – A Hypothetical Example
    Cardiac Disease Literature
    Immunology Literature
    Shared Function
    We need data and knowledge about that data to interoperate
  • 35. A Few Things to Accelerate the Rate of Scientific Discovery
    Better communication, data and knowledge access, and new modes of discovery, which means:
    We need data and knowledge about that data to interoperate i.e. we need new kinds of fast, versatile publications and data archives
    We need to be more open with both
    We need to think more about the tools that analyze, visualize and annotate data to maximize knowledge discovery
    Reward systems need to change
    We need scientist management tools
    We need to be less fixated on the big data problems
    We need to unleash the full power of the Internet
    Hard
    Easy
  • 36. One Small Example – The Molecular Biology Toolkit (MBT)
    Jmol, VMD … are important de facto standard tools for rendering biological molecules, but they are not versatile, i.e. they do not, for example:
    Respond to the data they are reading
    Offer views that match the user's interests
    Allow the user to annotate the data
    Allow those annotations to be shared (published?)
    Think More About the Tools
  • 37. MBT Features
    http://mbt.sdsc.edu
    Offers a framework, not an end-user application
    Responds to the data type
    Supports read/write access
    Encourages others to write end-user applications
    Discourages feature creep
    Immunologists
    Immunome Research, 2007 3(1):3
    Medicinal Chemists
    BMC Bioinformatics 2005, 6:21.
    Think More About the Tools
  • 38. A Few Things to Accelerate the Rate of Scientific Discovery
    Better communication, data and knowledge access, and new modes of discovery, which means:
    We need data and knowledge about that data to interoperate i.e. we need new kinds of fast, versatile publications and data archives
    We need to be more open with both
    We need to think more about the tools that analyze, visualize and annotate data to maximize knowledge discovery
    Reward systems need to change
    We need scientist management tools
    We need to be less fixated on the big data problems
    We need to unleash the full power of the Internet
    Hard
    Easy
  • 39. Reward Systems Need to Change: What is Needed?
    Author disambiguation
    Auditing (identification and metrics) of all scholarship - means new tools
    Seniors need to promote alternative forms of scholarship
    Juniors need to respond
    Ten Simple Rules for Getting Promoted as a Computational Biologist in Academia
    PLoS Comp Biol to appear
    Reward Systems Need to Change
  • 40. Example Tools
    http://www.researcherid.com/
    http://pubnet.gersteinlab.org/
    http://www.biomedexperts.com
  • 41. What Are these Alternative Forms of Scholarship?
    Reviews
    Curation
    Research
    [Grants]
    Journal Article
    Poster Session
    Conference Paper
    Blogs
    Community Service/Data
    Reward Systems Need to Change
  • 42. Reward Systems Need to Change
  • 43. A Unique Identifier is Going to Happen
    It is DOIs for people
    Some scientists will resist
    The winner is ORCID?
    Reward Systems Need to Change
  • 44. Ideally the ID will be Tagged to Every Piece of Scholarly Communication
    I Am Not a Scientist, I Am a Number
    PLoS Comp. Biol. 2008 4(12) e1000247
    Reward Systems Need to Change
  • 45. A Few Things to Accelerate the Rate of Scientific Discovery
    Better communication, data and knowledge access, and new modes of discovery, which means:
    We need data and knowledge about that data to interoperate i.e. we need new kinds of fast, versatile publications and data archives
    We need to be more open with both
    We need to think more about the tools that analyze, visualize and annotate data to maximize knowledge discovery
    Reward systems need to change
    We need scientist management tools
    We need to be less fixated on the big data problems
    We need to unleash the full power of the Internet
    Hard
    Easy
  • 46. The Truth About My Laboratory
    I have ?? mail folders!
    The intellectual memory of my laboratory is in those folders
    This is an unhealthy hub and spoke mentality
    We Need Scientist Management Tools
  • 47. The Truth About My Laboratory
    I generate way more negative than positive data, but where is it?
    Content management is a mess
    Slides, posters…..
    Data, lab notebooks ….
    Collaborations, Journal clubs …
    Software is open but where is it?
    Farewell is for the data too
    http://artbyvida.com/portfolio.php
    Computational Biology Resources Lack Persistence and Usability. PLoS Comp. Biol. 2008 4(7): e1000136
    We Need Scientist Management Tools
  • 48. Many Great Tools Out There
    Taverna
    We Need Scientist Management Tools
  • 49. Where I See the Problems
    The long tail is confused
    Lack of interoperability between the options
    The reward (publishing) is still removed from the available tools
    We Need Scientist Management Tools
  • 50. A Few Things to Accelerate the Rate of Scientific Discovery
    Better communication, data and knowledge access, and new modes of discovery, which means:
    We need data and knowledge about that data to interoperate i.e. we need new kinds of fast, versatile publications and data archives
    We need to be more open with both
    We need to think more about the tools that analyze, visualize and annotate data to maximize knowledge discovery
    Reward systems need to change
    We need scientist management tools
    We need to be less fixated on the big data problems
    We need to unleash the full power of the Internet
    Hard
    Easy
  • 51. Yes YouTube Can Increase the Rate of Discovery
    Unleash the full power of the Internet
  • 52. The Lab Experiment: Paper + Rich Media
    My students enjoyed the experience
    The shyest student was actually the boldest in front of the camera
    “We will become a generation of ‘science castors’”
    They liked the exposure for the most part; it puts them, rather than the PI, out in front
    Unleash the full power of the Internet
  • 53. Organic Growth
    3 Years Later
    www.scivee.tv
    Some of their work viewed 20,000+ times
    Global audience of researchers, educators and academic/research institutions
    60,000 unique visitors & 2M pageviews/month
    16,000 registered users & 600 communities
    5,000 uploads of video content (about journal articles, conferences, research news and classes)
    Growing 4-5% monthly
    Sustainability - evolving a business model supporting journals and conferences
    Unleash the full power of the Internet
  • 54. What Emerged: SciveeCasts
    Products
    Application   Product                         Primary Customers
    Journals      PubCast                         Journals, publishers, societies
    Meetings      PosterCast, SlideCast           Societies, conference orgs.
    Comm.         PaperCast, Podcast, SlideCast   Societies, journals
    Education     PosterCast, SlideCast           Societies, universities
    Books         BookCast                        Publishers, book sellers
    Unleash the full power of the Internet
  • 55. Summarizing the Reaming of Life
    By “Life” I mean experiences in the Life Sciences
    By “Reaming” I mean the making of something smooth, fast and accurate
    The Monty Python parody is on conversation cards for getting a dialog going ..
    The rest is just a few examples of the small ways we are trying to address big problems in the hope they will inspire us all to think more deeply about the problem
  • 56. Acknowledgements
    BioLit Team
    Lynn Fink
    Parker Williams
    Marco Martinez
    Rahul Chandran
    Greg Quinn
    MBT
    John Moreland
    John Beaver
    Microsoft Scholarly Communications
    Pablo Fernicola
    Lee Dirks
    Savas Parastatidis
    Alex Wade
    Tony Hey
    wwPDB team
    Andreas Prlić
    Dimitris Dimitropoulos
    SciVee Team
    Apryl Bailey
    Leo Chalupa
    Lynn Fink
    Marc Friedman (CEO)
    Ken Liu
    Alex Ramos
    Willy Suwanto
    Ben Yukich
    http://www.scivee.tv
    http://biolit.ucsd.edu
    http://www.pdb.org
    http://www.codeplex.com/ucsdbiolit
  • 57. pbourne@ucsd.edu
    Questions?
