Jim Gray Award Lecture

The Jim Gray Award Lecture presented at the Microsoft Research Symposium on October 12, 2010.

Transcript

  • 1. The Reaming of Life
    Philip E. Bourne
    University of California San Diego
    pbourne@ucsd.edu
    Jim Gray eScience Award Lecture
    Oct. 12, 2010
  • 2. Disclaimer
    I am a domain (life) scientist, not a computer or information scientist
    I am fortunate enough to have a major biological resource (the Protein Data Bank) and a major biological journal (PLoS Computational Biology) as my playground
    I am part of the long tail
    I am naïve, but I am the majority
  • 3. The Reaming of Life: What on Earth Is He Talking About?
    A reamer is a tool for turning a roughly punched hole into an accurate and smooth one
    The digital data deluge has punched that rough hole
    For the life (and other?) sciences to advance optimally, we need an accurate and smooth conduit through which data can be distilled, analyzed, visualized, distributed and, above all else, comprehended
  • 4. … and we need to accelerate the process by which this is done. Here is why …
    This is just another way of saying what Jim said, and it is embodied in the Fourth Paradigm
  • 5. The Scientific Process Is Too Slow to Respond to a Crisis, Either Global or Personal
    By the time the paper is published, we could all be dead
    http://knol.google.com/k/plos-currents-influenza#
    Motivation
  • 6. In a time of crisis, fast access to accurate data, and to any knowledge of that data, is paramount
    [Chart: Structure Summary page activity for H1N1 influenza-related structures, Jan. 2008 to Jul. 2010]
    3B7E: Neuraminidase of A/Brevig Mission/1/1918 H1N1 strain in complex with zanamivir
    1RUZ: 1918 H1 Hemagglutinin
    * http://www.cdc.gov/h1n1flu/estimates/April_March_13.htm
    Motivation
  • 7. If that is not enough … for some people, the scientific process may be too slow to save their lives
    Motivation
  • 8. Josh Sommer: A Remarkable Young Man
    Co-founder & Executive Director, the Chordoma Foundation
    http://sagecongress.org/Presentations/Sommer.pdf
    Motivation
  • 9. Chordoma
    A rare form of brain cancer
    No known drugs
    Treatment: surgical resection followed by intense radiation therapy
    http://upload.wikimedia.org/wikipedia/commons/2/2b/Chordoma.JPG
    Motivation
  • 10. Source: http://sagecongress.org/Presentations/Sommer.pdf
    Motivation
  • 11. Source: http://sagecongress.org/Presentations/Sommer.pdf
    Motivation
  • 12. Source: http://sagecongress.org/Presentations/Sommer.pdf
    Motivation
  • 13. “If I have seen further it is only by standing on the shoulders of giants.” (Isaac Newton)
    From Josh’s point of view, the climb up just takes too long: > 15 years and > $850M, to be more precise
    Adapted: http://sagecongress.org/Presentations/Sommer.pdf
    Motivation
  • 14. Source: http://sagecongress.org/Presentations/Sommer.pdf
    Motivation
  • 15. Source: http://sagecongress.org/Presentations/Sommer.pdf
    Motivation
  • 16. Source: http://fora.tv/2010/04/23/Sage_Commons_Josh_Sommer_Chordoma_Foundation
    Motivation
  • 17. Now that we are all, hopefully, motivated, let us break this down into what, in my opinion, actually needs to be done. Here are a few big things …
  • 18. A Few Things to Accelerate the Rate of Scientific Discovery
    Better communication, data and knowledge access, and new modes of discovery, which means:
    We need data and knowledge about that data to interoperate, i.e., we need new kinds of fast, versatile publications and data archives
    We need to be more open with both
    We need to think more about the tools that analyze, visualize and annotate data to maximize knowledge discovery
    Reward systems need to change
    We need scientist management tools
    We need to be less fixated on the big data problems
    We need to unleash the full power of the Internet
    [Slide graphic: a scale labeled Hard and Easy]
  • 19. We Need Data and Knowledge About That Data to Interoperate
    The Knowledge and Data Cycle:
    0. Full text of PLoS papers stored in a database
    1. A link brings up figures from the paper
    2. Clicking the paper figure retrieves data from the PDB, which is analyzed
    3. A composite view of journal and database content results
    4. The composite view has links to pertinent blocks of literature text and back to the PDB
    Around the cycle: the user clicks on content; metadata and web services to the data provide an interactive view that can be annotated; selecting features provides a data/knowledge mashup; analysis leads to new content I can share
    PLoS Comp. Biol. 2005 1(3) e34
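Step 2 of the cycle on slide 19 depends on knowing which PDB entries a paper mentions. The following is a minimal sketch of that step. It uses a simple pattern-matching heuristic, which is an assumption and not the method BioLit or PLoS actually used. The pattern treats a PDB ID as a digit from 1 to 9 followed by three letters or digits. The sketch additionally requires at least one letter, so that years such as 1918 are skipped. The link template is the literature-view URL quoted on slide 24 and may no longer resolve. The function names are hypothetical.

```python
import re

# Heuristic (an assumption, not an official rule set): a digit 1-9 followed by
# three letters or digits. Requiring at least one letter skips years like "1918",
# at the cost of missing any all-digit IDs; other false positives remain possible.
PDB_ID_PATTERN = re.compile(r"\b[1-9][A-Za-z0-9]{3}\b")

# Literature-view URL pattern quoted on slide 24 (2010-era; may no longer resolve).
LITERATURE_VIEW = "http://www.rcsb.org/pdb/explore/literature.do?structureId={}"


def find_pdb_ids(text: str) -> list[str]:
    """Return candidate PDB IDs, uppercased, in order of first appearance."""
    found: list[str] = []
    for match in PDB_ID_PATTERN.finditer(text):
        candidate = match.group().upper()
        if any(ch.isalpha() for ch in candidate) and candidate not in found:
            found.append(candidate)
    return found


def link_ids(text: str) -> dict[str, str]:
    """Map each candidate ID to the database view that lists its literature."""
    return {pdb_id: LITERATURE_VIEW.format(pdb_id) for pdb_id in find_pdb_ids(text)}


caption = ("3B7E: Neuraminidase of A/Brevig Mission/1/1918 H1N1 strain "
           "in complex with zanamivir; 1RUZ: 1918 H1 Hemagglutinin")
print(find_pdb_ids(caption))  # ['3B7E', '1RUZ']
```

A regular expression keeps the sketch dependency-free. In practice, a lookup against the list of released entries would be needed to reject tokens that merely look like IDs.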
  • 20. We Need Data and Knowledge About That Data to Interoperate: What Is Stopping Us?
    Governance: publishers vs. database providers
    Reward
    Metadata standards for provenance, privacy, etc.
    Exemplars
    …
    Caveat: each discipline is different; I speak very much from a biomedical sciences perspective
  • 21. Certainly the Argument for Interoperability in the Biomedical Sciences Is Strong
    1,078 databases reported in NAR 2008
    MetaBase (http://biodatabase.org) reports 2,651 entries, edited 12,587 times
    PubMed contains 18,792,257 entries, with ~100,000 papers indexed per month
    In Feb. 2009, 67,406,898 interactive searches were done and 92,216,786 entries were viewed
    Data as of April 14, 2009
    We need data and knowledge about that data to interoperate
    PLoS Comp. Biol. 2005 1(3) e34
  • 22. A Small Example: The Worldwide Protein Data Bank
    The single worldwide repository for data on the structure of biological macromolecules
    Vital for drug discovery and the life sciences
    39 years old
    Free to all
    http://www.wwpdb.org
    We need data and knowledge about that data to interoperate
    PLoS Comp. Biol. 2005 1(3) e34
  • 23. The Worldwide Protein Data Bank: The Best-Case Scenario
    Paper not published unless data are deposited: strong data-to-literature correspondence
    Highly structured data conforming to an extensive ontology
    DOIs assigned to every structure
    http://www.wwpdb.org
    We need data and knowledge about that data to interoperate
    PLoS Comp. Biol. 2005 1(3) e34
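Slide 23 notes that every PDB structure carries a DOI. The following is a small sketch of how that identifier can be derived from the four-character entry ID. It assumes the wwPDB pattern `10.2210/pdbXXXX/pdb`; check a current entry page before relying on it. The helper names are hypothetical. DOIs are case-insensitive, so emitting the uppercase ID is only a presentational choice.

```python
import re

# Assumed wwPDB DOI pattern: prefix 10.2210, suffix pdb<ID>/pdb.
PDB_DOI_TEMPLATE = "10.2210/pdb{}/pdb"
PDB_ID = re.compile(r"[1-9][A-Za-z0-9]{3}")


def pdb_doi(pdb_id: str) -> str:
    """Build the DOI for a PDB entry ID such as '1TIM'."""
    pdb_id = pdb_id.strip()
    if not PDB_ID.fullmatch(pdb_id):
        raise ValueError(f"not a four-character PDB ID: {pdb_id!r}")
    return PDB_DOI_TEMPLATE.format(pdb_id.upper())


def doi_url(doi: str) -> str:
    """Turn a DOI into a resolvable link via the doi.org proxy."""
    return "https://doi.org/" + doi


print(pdb_doi("1tim"))           # 10.2210/pdb1TIM/pdb
print(doi_url(pdb_doi("1TIM")))  # https://doi.org/10.2210/pdb1TIM/pdb
```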
  • 24. Example Interoperability: The Database View
    www.rcsb.org/pdb/explore/literature.do?structureId=1TIM
    We need data and knowledge about that data to interoperate
    BMC Bioinformatics 2010 11:220
  • 25. Example Interoperability: The Literature View
    http://biolit.ucsd.edu
    Nucleic Acids Research 2008 36(S2) W385-389
    We need data and knowledge about that data to interoperate
  • 26. ICTP Trieste, December 10, 2007
    We need data and knowledge about that data to interoperate
  • 27. Semantic Tagging and Widgets Are Powerful Tools for Integrating Data and Knowledge of That Data, but as Yet They Are Not Used Much
    Will Widgets and Semantic Tagging Change Computational Biology? PLoS Comp. Biol. 6(2) e1000673
    We need data and knowledge about that data to interoperate
  • 28. Semantic Tagging of Database Content in the Literature or Elsewhere
    http://www.rcsb.org/pdb/static.do?p=widgets/widgetShowcase.jsp
    PLoS Comp. Biol. 6(2) e1000673
    Semantic Tagging
  • 29. We need data and knowledge about that data to interoperate
  • 30. The Publishers Are Starting to Do It
    From Anita de Waard, Elsevier
  • 31. This Is Literature Post-processing: Better to Get the Authors Involved
    Authors are the absolute experts on the content
    More effective distribution of labor
    Add metadata before the article enters the publishing process
    We need data and knowledge about that data to interoperate
  • 32. Word 2007 Add-in for Authors
    Allows authors to add metadata as they write, before they submit the manuscript
    Authors are assisted by automated term recognition: OBO ontologies, database IDs
    Metadata are embedded directly into the manuscript document via XML tags (OOXML format): open, machine-readable
    Open source, Microsoft Public License
    http://www.codeplex.com/ucsdbiolit
    We need data and knowledge about that data to interoperate
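Slide 32 describes authors being assisted by automated term recognition, with the results embedded as XML tags. The sketch below is not the add-in's code or its OOXML markup. It shows the general technique of dictionary-based, longest-match tagging. The lexicon and the `<term ref="…">` element are hypothetical; the lexicon simply maps the two protein names from slide 6 to their PDB entries.

```python
import re
from xml.sax.saxutils import escape, quoteattr

# Hypothetical lexicon: lowercase surface term -> database identifier.
LEXICON = {
    "neuraminidase": "PDB:3B7E",
    "hemagglutinin": "PDB:1RUZ",
}


def tag_terms(text: str, lexicon: dict[str, str]) -> str:
    """Wrap each recognized term in a hypothetical <term ref="..."> element."""
    if not lexicon:
        return escape(text)
    # Longest terms first, so a longer phrase wins over a shorter one it contains.
    terms = sorted(lexicon, key=len, reverse=True)
    pattern = re.compile(r"\b(?:" + "|".join(map(re.escape, terms)) + r")\b",
                         re.IGNORECASE)
    pieces, last = [], 0
    for match in pattern.finditer(text):
        pieces.append(escape(text[last:match.start()]))  # untagged text, XML-escaped
        ref = lexicon[match.group().lower()]             # lexicon keys are lowercase
        pieces.append(f"<term ref={quoteattr(ref)}>{escape(match.group())}</term>")
        last = match.end()
    pieces.append(escape(text[last:]))
    return "".join(pieces)


print(tag_terms("Hemagglutinin & neuraminidase", LEXICON))
# <term ref="PDB:1RUZ">Hemagglutinin</term> &amp; <term ref="PDB:3B7E">neuraminidase</term>
```

Real term recognition against OBO ontologies would also need synonyms and disambiguation. The point here is only that recognized terms become machine-readable markup at writing time rather than through post-processing.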
  • 33. Challenges
    Authors: the carrot is that IF one or more publishers fast-tracked a paper that had semantic markup, it might catch on
    Publishers: the carrot is competitive advantage
    We need data and knowledge about that data to interoperate
  • 34. The Promise: A Hypothetical Example
    [Diagram: Cardiac Disease Literature and Immunology Literature connected through a Shared Function]
    We need data and knowledge about that data to interoperate
  • 35. A Few Things to Accelerate the Rate of Scientific Discovery (repeat of slide 18)
  • 36. One Small Example: The Molecular Biology Toolkit (MBT)
    Jmol, VMD, … are important, de facto standard tools for rendering biological molecules, but they are not versatile; for example, they do not:
    Respond to the data they are reading
    Offer views that match the user’s interests
    Allow the user to annotate the data
    Allow those annotations to be shared (published?)
    Think More About the Tools
  • 37. MBT Features (http://mbt.sdsc.edu)
    Offers a framework, not an end-user application
    Responds to the data type
    Supports read-write access
    Encourages others to write end-user applications
    Discourages feature creep
    Immunologists: Immunome Research, 2007 3(1):3
    Medicinal Chemists: BMC Bioinformatics 2005, 6:21
    Think More About the Tools
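The design points on slide 37 can be illustrated with a plug-in registry. Those points are a framework rather than an end-user application, behavior that responds to the data type, read-write access, and end-user applications written by others. This is a hypothetical sketch, not MBT's actual API. The class names, data types and viewer strings are invented for illustration.

```python
from dataclasses import dataclass, field
from typing import Callable


@dataclass
class Entry:
    """A hypothetical loaded data item: its type, content, and user annotations."""
    data_type: str
    content: str
    annotations: list[str] = field(default_factory=list)


Viewer = Callable[[Entry], str]


class Toolkit:
    """Framework core: it ships no views itself; applications plug them in."""

    def __init__(self) -> None:
        self._viewers: dict[str, Viewer] = {}

    def register(self, data_type: str, viewer: Viewer) -> None:
        """Let a third-party application supply the view for one data type."""
        self._viewers[data_type] = viewer

    def view(self, entry: Entry) -> str:
        """Respond to the data: choose the view from the entry's type."""
        try:
            viewer = self._viewers[entry.data_type]
        except KeyError:
            raise LookupError(f"no viewer registered for {entry.data_type!r}") from None
        return viewer(entry)

    def annotate(self, entry: Entry, note: str) -> None:
        """Write access: users add annotations that could later be shared."""
        entry.annotations.append(note)


toolkit = Toolkit()
toolkit.register("protein", lambda e: f"cartoon view of {e.content}")
toolkit.register("ligand", lambda e: f"ball-and-stick view of {e.content}")

tim = Entry("protein", "1TIM")
toolkit.annotate(tim, "active-site residues highlighted")
print(toolkit.view(tim))  # cartoon view of 1TIM
```

Keeping views out of the core is one way to discourage feature creep. New audiences, such as the immunologists and medicinal chemists on the slide, get their own registered viewers instead of more options in a single application.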
  • 38. A Few Things to Accelerate the Rate of Scientific Discovery (repeat of slide 18)
  • 39. Reward Systems Need to Change: What Is Needed?
    Author disambiguation
    Auditing (identification and metrics) of all scholarship, which means new tools
    Seniors need to promote alternative forms of scholarship
    Juniors need to respond
    Ten Simple Rules for Getting Promoted as a Computational Biologist in Academia, PLoS Comp. Biol., to appear
    Reward Systems Need to Change
  • 40. Example Tools
    http://www.researcherid.com/
    http://pubnet.gersteinlab.org/
    http://www.biomedexperts.com
  • 41. What Are These Alternative Forms of Scholarship?
    Reviews, Curation, Research, [Grants], Journal Article, Poster Session, Conference Paper, Blogs, Community Service/Data
    Reward Systems Need to Change
  • 42. Reward Systems Need to Change
  • 43. A Unique Identifier Is Going to Happen
    It is DOIs for people
    Some scientists will resist
    The winner is ORCID?
    Reward Systems Need to Change
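ORCID is the identifier named on slide 43. Each ORCID iD has 16 characters, and the last one is a check character computed with ISO 7064 MOD 11-2. This lets software reject mistyped iDs before attaching them to a piece of scholarship. The following is a minimal sketch of that check. The function names are hypothetical, and 0000-0002-1825-0097 is used as a test value whose check digit works out to 7.

```python
def orcid_check_digit(base: str) -> str:
    """ISO 7064 MOD 11-2 check character for the first 15 digits of an ORCID iD."""
    total = 0
    for ch in base:
        total = (total + int(ch)) * 2
    result = (12 - total % 11) % 11
    return "X" if result == 10 else str(result)


def is_valid_orcid(orcid: str) -> bool:
    """Validate an iD such as 0000-0002-1825-0097 (hyphens optional)."""
    chars = orcid.replace("-", "")
    if len(chars) != 16 or not all(c in "0123456789" for c in chars[:15]):
        return False
    return orcid_check_digit(chars[:15]) == chars[15].upper()


print(is_valid_orcid("0000-0002-1825-0097"))  # True
print(is_valid_orcid("0000-0002-1825-0098"))  # False
```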
  • 44. Ideally the ID Will Be Tagged to Every Piece of Scholarly Communication
    I Am Not a Scientist, I Am a Number. PLoS Comp. Biol. 2008 4(12) e1000247
    Reward Systems Need to Change
  • 45. A Few Things to Accelerate the Rate of Scientific Discovery (repeat of slide 18)
  • 46. The Truth About My Laboratory
    I have ?? mail folders!
    The intellectual memory of my laboratory is in those folders
    This is an unhealthy hub-and-spoke mentality
    We Need Scientist Management Tools
  • 47. The Truth About My Laboratory
    I generate way more negative than positive data, but where is it?
    Content management is a mess: slides, posters …; data, lab notebooks …; collaborations, journal clubs …
    Software is open, but where is it?
    Farewell is for the data too
    http://artbyvida.com/portfolio.php
    Computational Biology Resources Lack Persistence and Usability. PLoS Comp. Biol. 2008 4(7): e1000136
    We Need Scientist Management Tools
  • 48. Many Great Tools Out There
    Taverna
    We Need Scientist Management Tools
  • 49. Where I See the Problems
    The long tail is confused
    Lack of interoperability between the options
    The reward (publishing) is still removed from the available tools
    We Need Scientist Management Tools
  • 50. A Few Things to Accelerate the Rate of Scientific Discovery (repeat of slide 18)
  • 51. Yes, YouTube Can Increase the Rate of Discovery
    Unleash the full power of the Internet
  • 52. The Lab Experiment: Paper + Rich Media
    My students enjoyed the experience
    The shyest student was actually the boldest in front of the camera
    “We will become a generation of ‘science castors’”
    For the most part, they liked the exposure: rather than the PI, it puts them out in front
    Unleash the full power of the Internet
  • 53. Organic Growth, 3 Years Later
    www.scivee.tv
    Some of their work has been viewed 20,000+ times
    Global audience of researchers, educators and academic/research institutions
    60,000 unique visitors & 2M pageviews/month
    16,000 registered users & 600 communities
    5,000 uploads of video content (about journal articles, conferences, research news and classes)
    Growing 4-5% monthly
    Sustainability: evolving a business model supporting journals and conferences
    Unleash the full power of the Internet
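The figure “growing 4-5% monthly” on slide 53 compounds quickly. The slide does not say which metric it refers to. Assuming the rate holds steady for twelve months, a quick calculation converts it to annual growth of roughly 60% to 80%:

```python
def compound_growth(monthly_rate: float, months: int = 12) -> float:
    """Total fractional growth after `months` of steady compounded monthly growth."""
    return (1 + monthly_rate) ** months - 1


for rate in (0.04, 0.05):
    print(f"{rate:.0%} per month -> {compound_growth(rate):.0%} per year")
# 4% per month -> 60% per year
# 5% per month -> 80% per year
```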
  • 54. What Emerged: SciVeeCasts
    Products:
    Application  | Product                       | Primary customers
    Journals     | PubCast                       | Journals, publishers, societies
    Meetings     | PosterCast, SlideCast         | Societies, conference orgs.
    Comm.        | PaperCast, Podcast, SlideCast | Societies, journals
    Education    | PosterCast, SlideCast         | Societies, universities
    Books        | BookCast                      | Publishers, book sellers
    Unleash the full power of the Internet
  • 55. Summarizing the Reaming of Life
    By “Life” I mean experiences in the life sciences
    By “Reaming” I mean the making of something smooth, fast and accurate
    The Monty Python parody is on conversation cards for getting a dialog going …
    The rest is just a few examples of the small ways we are trying to address big problems, in the hope that they will inspire us all to think more deeply about the problem
  • 56. Acknowledgements
    BioLit Team: Lynn Fink, Parker Williams, Marco Martinez, Rahul Chandran, Greg Quinn
    MBT: John Moreland, John Beaver
    Microsoft Scholarly Communications: Pablo Fernicola, Lee Dirks, Savas Parastatidis, Alex Wade, Tony Hey
    wwPDB team: Andreas Prlić, Dimitris Dimitropoulos
    SciVee Team: Apryl Bailey, Leo Chalupa, Lynn Fink, Marc Friedman (CEO), Ken Liu, Alex Ramos, Willy Suwanto, Ben Yukich
    http://www.scivee.tv
    http://biolit.ucsd.edu
    http://www.pdb.org
    http://www.codeplex.com/ucsdbiolit
  • 57. pbourne@ucsd.edu
    Questions?
