
Second Open Economics Workshop - Thoughts from the Biosciences



A brief description of best practices and what needs to be considered regarding open data. A view from the biomedical sciences.



  1. Thoughts from the Biomedical Sciences. Philip E. Bourne, UCSD, pbourne@ucsd.edu. Second Open Economics Workshop, June 11, 2013.
  2. My Perspective is Drawn from Being: a data producer and a data user*; an overseer of data curation efforts; a database provider (PDB & IEDB); suspicious of workshop reports, data standards bodies …; a supporter of data publication; an open access journal founder; opinionated.
  3. The Big Picture. The Good News: NLM, Entrez: a great job; open data/software/papers have spawned science and jobs; success stories: ENCODE, PDB; D2K? The Bad News: we have resources but now they are perceived as silos; lack of reproducibility revealed; sustainability is unsolved; failures: caBIG, DataNet; D2K?
  4. The Big Picture: What is the Way Forward? Driven by scientific outcomes, not "build it and they will come". Community, community, which means: a simple vision that many stakeholders can buy into; transparency; shared ownership; a code of conduct; a reward system for individuals and teams; strategic policies, e.g. open access, data sharing plans; use resources as drivers (funding bodies, societies, institutions have a role here); building trust through quality data/software.
  5. Personal Experiences to Support My Big Picture View. Worldwide Protein Data Bank, www.wwpdb.org.
  6. It's All About Trust. PDB: trust in the data is perhaps our biggest achievement.
  7. It's All About Trust. Trust is like compound interest; it comes from listening; from engaging the community in every aspect of the process; from data consistency and level of annotation; from responsiveness; from the quality of the delivery service.
  8. Data Quality Begets Trust. About 25% of our budget has been spent on data remediation; support for versioning, hence the copy of record; our ontology/data model has been a critical component of our workflow and data accuracy; until recently the same data model was too complex to facilitate wide adoption by others that use our data.
  9. All About People: Curators are the Unsung Heroes. They really should do more to promote themselves; institutions must do more to respect their efforts.
  10. It's All About People: The Users. Constantly striving to have the user distinguish raw from derived data; all data are not created equal, but the user thinks so.
  11. It's All About People: The Global Personalities.
  12. It's NOT All About Institutions. As far as I am aware, no data standards body has directly influenced anything we have done in 15 years of running the PDB; the structural biology community created a very successful data sharing plan long before funding bodies did.
  13. It is About Openness. There are no restrictions on the usage of the data beyond attribution; the PDB runs exclusively on open source software; we maintain and contribute to the BioJava repository; we need to be transparent about data usage.
  14. So What Needs to Change re Data? Worldwide Protein Data Bank, www.wwpdb.org.
  15. That All Data Are Created Equal Must End. We need to understand how data are used; sustainability is not more money from the funding agencies, it's about business models; reductionism is not a dirty word: Reference Data!; we need to do more with the long tail. Reference: "On the Future of Genomic Data", Science, 11 February 2011: vol. 331 no. 6018, 728-729.
  16. Institutions That Generate Data Must Play a Greater Role. We need institutional data sharing plans; we need data scientists to be better recognized by institutions: it's not all about papers, which implies new metrics.
  17. Acknowledgements. Tim Clark; Ivan Herman; Paul Groth; Ed Hovy; Maryann Martone; Cameron Neylon; David Shotton; Anita de Waard; Beyond the PDF; many others. Funding Agencies: NSF, NIGMS, DOE, NLM, NCI, NCRR, NIBIB, NINDS, NIDDK.
  18. The {Lack of} Distinction Between Data and Knowledge Needs to be Better Appreciated. The PDB paper has been cited 14,000 times; no one has ever read it; some PDB datasets have 1,000s of downloads; these data are not associated with publications.