Jim Gray Award Lecture

The Jim Gray Award Lecture presented at the Microsoft Research Symposium on October 12, 2010.

Published in: Education

  1. The Reaming of Life
     Philip E. Bourne
     University of California San Diego
     pbourne@ucsd.edu
     Jim Gray eScience Award Lecture
     Oct. 12, 2010
  2. Disclaimer
     I am a domain (life) scientist, not a computer or information scientist
     I am fortunate enough to have a major biological resource (the Protein Data Bank) and a major biological journal (PLoS Computational Biology) as my playground
     I am part of the long tail
     I am naïve, but I am the majority
  3. The Reaming of Life - What on Earth is He Talking About?
     A reamer is a tool for turning a roughly punched hole into an accurate and smooth one
     The digital data deluge has punched that rough hole
     For the life {other?} sciences to optimally advance we need an accurate and smooth conduit through which data can be distilled, analyzed, visualized, distributed and above all else comprehended
  4. … and we need to accelerate the process by which this is done; here is why …
     This is just another way of saying what Jim said and what is embodied in the Fourth Paradigm
  5. The Scientific Process is Too Slow to Respond to a Crisis – Either Global or Personal
     By the time the paper is published we could all be dead
     http://knol.google.com/k/plos-currents-influenza#
     Motivation
  6. In a time of crisis the need for fast access to accurate data and any knowledge of that data is paramount
     Chart: Structure Summary page activity for H1N1 influenza-related structures, Jan. 2008 to Jul. 2010
     3B7E: Neuraminidase of A/Brevig Mission/1/1918 H1N1 strain in complex with zanamivir
     1RUZ: 1918 H1 Hemagglutinin
     * http://www.cdc.gov/h1n1flu/estimates/April_March_13.htm
     Motivation
  7. If that is not enough … for some people the scientific process may be too slow to save their lives
     Motivation
  8. Josh Sommer – A Remarkable Young Man
     Co-founder & Executive Director of the Chordoma Foundation
     http://sagecongress.org/Presentations/Sommer.pdf
     Motivation
  9. Chordoma
     A rare form of brain cancer
     No known drugs
     Treatment – surgical resection followed by intense radiation therapy
     http://upload.wikimedia.org/wikipedia/commons/2/2b/Chordoma.JPG
     Motivation
  10. http://sagecongress.org/Presentations/Sommer.pdf
      Motivation
  11. http://sagecongress.org/Presentations/Sommer.pdf
      Motivation
  12. http://sagecongress.org/Presentations/Sommer.pdf
      Motivation
  13. If I have seen further it is only by standing on the shoulders of giants
      Isaac Newton
      From Josh’s point of view the climb up just takes too long
      More than 15 years and more than $850M, to be more precise
      Adapted: http://sagecongress.org/Presentations/Sommer.pdf
      Motivation
  14. http://sagecongress.org/Presentations/Sommer.pdf
      Motivation
  15. http://sagecongress.org/Presentations/Sommer.pdf
      Motivation
  16. http://fora.tv/2010/04/23/Sage_Commons_Josh_Sommer_Chordoma_Foundation
      Motivation
  17. Now that we are all hopefully motivated, let us break this down into what, in my opinion, actually needs to be done. Here are a few big things …
  18. A Few Things to Accelerate the Rate of Scientific Discovery
      Better communication, data and knowledge access, and new modes of discovery, which means:
      We need data and knowledge about that data to interoperate, i.e. we need new kinds of fast, versatile publications and data archives
      We need to be more open with both
      We need to think more about the tools that analyze, visualize and annotate data to maximize knowledge discovery
      Reward systems need to change
      We need scientist management tools
      We need to be less fixated on the big data problems
      We need to unleash the full power of the Internet
      Hard / Easy
  19. We Need Data and Knowledge About That Data to Interoperate
      The Knowledge and Data Cycle
      0. Full text of PLoS papers stored in a database
      1. A link brings up figures from the paper
      2. Clicking the paper figure retrieves data from the PDB which is analyzed
      3. A composite view of journal and database content results
      4. The composite view has links to pertinent blocks of literature text and back to the PDB
      User clicks on content
      Metadata and webservices to data provide an interactive view that can be annotated
      Selecting features provides a data/knowledge mashup
      Analysis leads to new content I can share
      PLoS Comp. Biol. 2005 1(3) e34
  20. We Need Data and Knowledge About That Data to Interoperate – What is Stopping Us?
      Governance – publishers vs. database providers
      Reward
      Metadata standards for provenance, privacy etc.
      Exemplars
      …
      Caveat: Each discipline is different – I speak very much from a biomedical sciences perspective
  21. Certainly the Argument for Interoperability in the Biomedical Sciences is Strong
      1078 databases reported in NAR 2008
      MetaBase http://biodatabase.org reports 2,651 entries edited 12,587 times
      PubMed contains 18,792,257 entries
      ~100,000 papers indexed per month
      In Feb 2009:
        67,406,898 interactive searches were done
        92,216,786 entries were viewed
      Data as of April 14, 2009
      We need data and knowledge about that data to interoperate
      PLoS Comp. Biol. 2005 1(3) e34
  22. A Small Example - The World Wide Protein Data Bank
      The single worldwide repository for data on the structure of biological macromolecules
      Vital for drug discovery and the life sciences
      39 years old
      Free to all
      http://www.wwpdb.org
      We need data and knowledge about that data to interoperate
      PLoS Comp. Biol. 2005 1(3) e34
  23. The World Wide Protein Data Bank – The Best Case Scenario
      Paper not published unless data are deposited – strong data to literature correspondence
      Highly structured data conforming to an extensive ontology
      DOIs assigned to every structure
      http://www.wwpdb.org
      We need data and knowledge about that data to interoperate
      PLoS Comp. Biol. 2005 1(3) e34
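As a rough illustration of what "DOIs assigned to every structure" makes possible, the sketch below builds a DOI string from a PDB ID. It assumes the wwPDB pattern `10.2210/pdbXXXX/pdb` and the classic four-character ID format (a digit followed by three alphanumerics); the function name `pdb_doi` is hypothetical and is not part of any wwPDB software.

```python
import re

# Assumption: classic PDB IDs are four characters, a digit followed by
# three alphanumerics (for example 1TIM or 3B7E).
PDB_ID = re.compile(r"[0-9][A-Za-z0-9]{3}")


def pdb_doi(pdb_id: str) -> str:
    """Return a DOI string for a PDB entry.

    Assumes the DOI pattern 10.2210/pdbXXXX/pdb with a lower-case ID.
    """
    if not PDB_ID.fullmatch(pdb_id):
        raise ValueError(f"not a four-character PDB ID: {pdb_id!r}")
    return f"10.2210/pdb{pdb_id.lower()}/pdb"


print(pdb_doi("1TIM"))  # prints 10.2210/pdb1tim/pdb
```

A stable identifier of this kind is what lets a paper cite a structure, and a database point back to the paper, without either side depending on the other's URL layout.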
  24. Example Interoperability: The Database View
      www.rcsb.org/pdb/explore/literature.do?structureId=1TIM
      We need data and knowledge about that data to interoperate
      BMC Bioinformatics 2010 11:220
  25. Example Interoperability: The Literature View
      http://biolit.ucsd.edu
      Nucleic Acids Research 2008 36(S2) W385-389
      We need data and knowledge about that data to interoperate
  26. ICTP Trieste, December 10, 2007
      We need data and knowledge about that data to interoperate
  27. Semantic Tagging & Widgets are a Powerful Tool to Integrate Data and Knowledge of that Data, But as Yet Not Used Much
      Will Widgets and Semantic Tagging Change Computational Biology?
      PLoS Comp. Biol. 6(2) e1000673
      We need data and knowledge about that data to interoperate
  28. Semantic Tagging of Database Content in The Literature or Elsewhere
      http://www.rcsb.org/pdb/static.do?p=widgets/widgetShowcase.jsp
      PLoS Comp. Biol. 6(2) e1000673
      Semantic Tagging
  29. We need data and knowledge about that data to interoperate
  30. The Publishers are Starting to Do It
      From Anita de Waard, Elsevier
  31. This is Literature Post-processing
      Better to Get the Authors Involved
      Authors are the absolute experts on the content
      More effective distribution of labor
      Add metadata before the article enters the publishing process
      We need data and knowledge about that data to interoperate
  32. Word 2007 Add-in for authors
      Allows authors to add metadata as they write, before they submit the manuscript
      Authors are assisted by automated term recognition
        OBO ontologies
        Database IDs
      Metadata are embedded directly into the manuscript document via XML tags, OOXML format
        Open
        Machine-readable
      Open source, Microsoft Public License
      http://www.codeplex.com/ucsdbiolit
      We need data and knowledge about that data to interoperate
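The idea of automated term recognition followed by embedded XML tags can be sketched in a few lines. This is a minimal illustration only, not the add-in's implementation: the `<term>` element and its `db` and `id` attributes are hypothetical, the regular expression is a simple heuristic for four-character PDB IDs, and the set of known IDs stands in for a real database lookup.

```python
import re
from xml.sax.saxutils import escape, quoteattr

# Heuristic (an assumption): a digit followed by three alphanumerics, at
# least one of which is a letter, so that years such as 2009 are skipped.
CANDIDATE = re.compile(r"\b[0-9](?=[A-Za-z0-9]{0,2}[A-Za-z])[A-Za-z0-9]{3}\b")


def tag_pdb_ids(text, known_ids):
    """Wrap recognized PDB IDs in a hypothetical <term> element.

    known_ids stands in for a database lookup: only candidates found in
    it are tagged, and everything else is emitted as escaped plain text.
    """
    known = {pdb_id.upper() for pdb_id in known_ids}
    pieces = []
    pos = 0
    for match in CANDIDATE.finditer(text):
        pieces.append(escape(text[pos:match.start()]))
        token = match.group(0)
        if token.upper() in known:
            pieces.append(
                f'<term db="PDB" id={quoteattr(token.upper())}>{escape(token)}</term>'
            )
        else:
            pieces.append(escape(token))
        pos = match.end()
    pieces.append(escape(text[pos:]))
    return "".join(pieces)


sentence = "Zanamivir binds neuraminidase (3B7E) & was studied in 2009."
print(tag_pdb_ids(sentence, {"3B7E", "1RUZ"}))
```

Checking candidates against a list of known IDs, rather than trusting the pattern alone, keeps false positives out of the manuscript; the author remains the final judge of each tag.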
  33. Challenges
      Authors
        Carrot: if one or more publishers fast-tracked a paper that had semantic markup, it might catch on
      Publishers
        Carrot: competitive advantage
      We need data and knowledge about that data to interoperate
  34. The Promise – A Hypothetical Example
      Cardiac Disease Literature
      Immunology Literature
      Shared Function
      We need data and knowledge about that data to interoperate
  35. A Few Things to Accelerate the Rate of Scientific Discovery
  36. One Small Example – The Molecular Biology Toolkit (MBT)
      Jmol, VMD … are important de facto standard tools for rendering biological molecules, but
      They are not versatile, i.e. they do not, for example:
        Respond to the data they are reading
        Offer views that match the user's interests
        Allow the user to annotate the data
        Allow those annotations to be shared (published?)
      Think More About the Tools
  37. MBT Features
      http://mbt.sdsc.edu
      Offers a framework, not an end-user application
      Responds to the data type
      Supports read/write access
      Encourages others to write end-user applications
      Discourages feature creep
      Immunologists
      Immunome Research, 2007 3(1):3
      Medicinal Chemists
      BMC Bioinformatics 2005, 6:21
      Think More About the Tools
  38. A Few Things to Accelerate the Rate of Scientific Discovery
  39. Reward Systems Need to Change: What is Needed?
      Author disambiguation
      Auditing (identification and metrics) of all scholarship - means new tools
      Seniors need to promote alternative forms of scholarship
      Juniors need to respond
      Ten Simple Rules for Getting Promoted as a Computational Biologist in Academia
      PLoS Comp Biol to appear
      Reward Systems Need to Change
  40. Example Tools
      http://www.researcherid.com/
      http://pubnet.gersteinlab.org/
      http://www.biomedexperts.com
  41. What Are these Alternative Forms of Scholarship?
      Reviews
      Curation
      Research
      [Grants]
      Journal Article
      Poster Session
      Conference Paper
      Blogs
      Community Service/Data
      Reward Systems Need to Change
  42. Reward Systems Need to Change
  43. A Unique Identifier is Going to Happen
      It is DOIs for people
      Some scientists will resist
      The winner is ORCID?
      Reward Systems Need to Change
  44. Ideally the ID will be Tagged to Every Piece of Scholarly Communication
      I am Not a Scientist, I am a Number
      PLoS Comp. Biol. 2008 4(12) e1000247
      Reward Systems Need to Change
  45. A Few Things to Accelerate the Rate of Scientific Discovery
  46. The Truth About My Laboratory
      I have ?? mail folders!
      The intellectual memory of my laboratory is in those folders
      This is an unhealthy hub and spoke mentality
      We Need Scientist Management Tools
  47. The Truth About My Laboratory
      I generate way more negative than positive data, but where is it?
      Content management is a mess
        Slides, posters …
        Data, lab notebooks …
        Collaborations, journal clubs …
      Software is open but where is it?
      Farewell is for the data too
      http://artbyvida.com/portfolio.php
      Computational Biology Resources Lack Persistence and Usability. PLoS Comp. Biol. 2008 4(7): e1000136
      We Need Scientist Management Tools
  48. Many Great Tools Out There
      Taverna
      We Need Scientist Management Tools
  49. Where I See the Problems
      The long tail is confused
      Lack of interoperability between the options
      The reward (publishing) is still removed from the available tools
      We Need Scientist Management Tools
  50. A Few Things to Accelerate the Rate of Scientific Discovery
  51. Yes, YouTube Can Increase the Rate of Discovery
      Unleash the full power of the Internet
  52. The Lab Experiment: Paper + Rich Media
      My students enjoyed the experience
      The shyest student was actually the boldest in front of the camera
      “We will become a generation of science casters”
      They liked the exposure for the most part – rather than the PI it puts them out in front
      Unleash the full power of the Internet
  53. Organic Growth
      3 Years Later
      www.scivee.tv
      Some of their work viewed 20,000+ times
      Global audience of researchers, educators and academic/research institutions
      60,000 unique visitors & 2M pageviews/month
      16,000 registered users & 600 communities
      5,000 uploads of video content (about journal articles, conferences, research news and classes)
      Growing 4-5% monthly
      Sustainability - evolving a business model supporting journals and conferences
      Unleash the full power of the Internet
  54. What Emerged: SciveeCasts
      Products
      Application   Product                         Primary Customers
      Journals      PubCast                         Journals, publishers, societies
      Meetings      PosterCast, SlideCast           Societies, conference orgs.
      Comm.         PaperCast, Podcast, SlideCast   Societies, journals
      Education     PosterCast, SlideCast           Societies, universities
      Books         BookCast                        Publishers, book sellers
      Unleash the full power of the Internet
  55. Summarizing the Reaming of Life
      By “Life” I mean experiences in the Life Sciences
      By “Reaming” I mean the making of something smooth, fast and accurate
      The Monty Python parody is on conversation cards for getting a dialog going …
      The rest is just a few examples of the small ways we are trying to address big problems in the hope they will inspire us all to think more deeply about the problem
  56. Acknowledgements
      BioLit Team: Lynn Fink, Parker Williams, Marco Martinez, Rahul Chandran, Greg Quinn
      MBT: John Moreland, John Beaver
      Microsoft Scholarly Communications: Pablo Fernicola, Lee Dirks, Savas Parastitidas, Alex Wade, Tony Hey
      wwPDB team: Andreas Prilc, Dimitris Dimitropoulos
      SciVee Team: Apryl Bailey, Leo Chalupa, Lynn Fink, Marc Friedman (CEO), Ken Liu, Alex Ramos, Willy Suwanto, Ben Yukich
      http://www.scivee.tv
      http://biolit.ucsd.edu
      http://www.pdb.org
      http://www.codeplex.com/ucsdbiolit
  57. pbourne@ucsd.edu
      Questions?
