
If Data Are The New Oil, How Do We Prevent Global Warming?

Keynote at the University of Cincinnati Data Science Day, March 23, 2017.

Published in: Education

  1. 1. If Data Are The New Oil, How Do We Prevent Global Warming? Philip E. Bourne, PhD, FACMI The National Institutes of Health http://www.slideshare.net/pebourne philip.bourne@nih.gov University of Cincinnati Data Day 2017 March 23, 2017
  2. 2. Who am I representing and what is my bias? • I am presenting my views, not necessarily those of NIH • Total data parasite • Unnatural interest in scholarly communication • Co-founder and founding EIC of PLOS Computational Biology – OA advocate • Prior co-Director Protein Data Bank • Amateur student researcher in scholarly communication
  3. 3. I appreciate this is a day to focus on data, but .. I don’t think you can consider data in isolation from the analytics associated with that data and indeed the knowledge derived from both.
  4. 4. The Knowledge versus Data Landscape • Knowledge • Largely a for-profit business with limited input into that business from the producers of scholarship • Some open access (OA), costs shifted from consumer to producer • Full accessibility for non-OA is constrained/controlled • Funders able to influence the landscape, e.g. PubMed Central • Sustainable! • An analog system functioning in a digital world – aka not born digital • Data • Largely left to governments to support • Mostly OA • Funders control the landscape • Not sustainable • Mostly born digital
  5. 5. Some Shared Issues … • Reproducibility • Comprehension / communication • Quality
  6. 6. Reproducibility Examples From My Own Work It took several months to replicate this work … And just last week… Phew… http://www.sdsc.edu/pb/kinases/
  7. 7. Tools Fix This Problem Right? • Extracted all PMC papers with associated Jupyter notebooks available • Approx 100 • Took a random sample of 25 • Only 1 ran out of the box • Several ran with minor modification • Others lacked libraries, sufficient details to run etc. It takes more than tools.. It takes incentives … Daniel Mietchen 2017 Personal Communication
  8. 8. One Hypothetical End Point • Paper is one attributable view of the knowledge • User clicks on a static image • Metadata and data provide direct further analysis - an executable paper • Private and public annotations revealed • Selecting a feature forms a query for yet further knowledge • That knowledge rendered as a knowledge graph rather than a paper Figure steps: 0. Full text of paper stored in a database – one view 1. A link brings up figures from the paper 2. Clicking the paper figure retrieves data from the PDB which is analyzed 3. A composite view of journal and database content results 4. The composite view has links to pertinent blocks of literature text and back to the PDB PLoS Comp. Biol. 2005 1(3) e34
  9. 9. So how do we get there? Well first…..
  10. 10. Source Washington Post On November 6, 2012, Donald Trump tweeted: "The concept of global warming was created by and for the Chinese in order to make U.S. manufacturing non-competitive." We Need Relationships Built on Trust
  11. 11. Trust Becomes Even More Important as We Move to Platforms Sangeet Paul Choudary https://www.slideshare.net/sanguit
  12. 12. The Research Pipeline IDEAS – HYPOTHESES – EXPERIMENTS – DATA - ANALYSIS - COMPREHENSION - DISSEMINATION
  13. 13. Tools and Resources Will Continue To Be Developed IDEAS – HYPOTHESES – EXPERIMENTS – DATA - ANALYSIS - COMPREHENSION - DISSEMINATION Authoring Tools Lab Notebooks Data Capture Software Analysis Tools Visualization Scholarly Communication
  14. 14. And Become More Interconnected IDEAS – HYPOTHESES – EXPERIMENTS – DATA - ANALYSIS - COMPREHENSION - DISSEMINATION Authoring Tools Lab Notebooks Data Capture Software Analysis Tools Visualization Scholarly Communication
  15. 15. Until We Become a Platform IDEAS – HYPOTHESES – EXPERIMENTS – DATA - ANALYSIS - COMPREHENSION - DISSEMINATION Authoring Tools Lab Notebooks Data Capture Software Analysis Tools Visualization Scholarly Communication Commercial & Public Tools Git-like Resources By Discipline Data Journals Discipline- Based Metadata Standards Community Portals Institutional Repositories New Reward Systems Commercial Repositories Training
  16. 16. Consider an example of an existing platform….
  17. 17. • Airbnb is a platform that supports a trusted relationship between consumer (renter) and supplier (host) • The platform focuses on maximizing the exchange of services between supplier and consumer and maximizing the amount of trust associated with a given stakeholder • It seems to be working: • 60 million users searching 2 million listings in 192 countries • Average of 500,000 stays per night • Valuation of US $25bn Bonazzi & Bourne 2017, PLOS Biology, In Press
  18. 18. Platforms are Certainly Not Without Issues
  19. 19. Nevertheless It would seem we need to move in this direction if we are to solve the many issues swirling around scholarly communication …
  20. 20. Is not biomedical research the same?
  21. 21. Why a comparison to Airbnb is not fair • Airbnb was born digital • The exchange of services on Airbnb is simple compared to what is required of a platform to support biomedical research Nevertheless there is much to be learnt
  22. 22. Paper Author Paper Reader Data Provider Data Consumer Employer Employee Reagent Provider Reagent Consumer Software Provider Software Consumer Grant Writer Grant Reviewer Supplier Consumer Platform MS Project Google Drive Coursera Researchgate Academia.edu Open Science Framework Synapse F1000 Rio Educator Student Platforms - The Situation Today
  23. 23. In summary there is not currently a widely adopted single platform for the exchange of services in biomedical research. Either there is a platform per service or no platform at all. Why have we not done better and what are the impediments today?
  24. 24. Impediments to a biomedical platform • Current work practices by all stakeholders • Entrenched business models • Size of the undertaking aka resources needed • Trust • Incentives to use the platform http://www.forbes.com/sites/johnhall/2013/04/29/10-barriers-to-employee-innovation/#8bdbaa811133
  25. 25. Enter The Commons The NIH, through the Big Data to Knowledge (BD2K) initiative and others, is experimenting with a platform, keeping in mind the need to overcome these impediments https://en.wikipedia.org/wiki/Ealing_Common#/media/File:Ealing_Common_-_geograph.org.uk_-_17075.jpg
  26. 26. Paper Author Paper Reader Data Provider Data Consumer Employer Employee Reagent Provider Reagent Consumer Software Provider Software Consumer Grant Writer Grant Reviewer Supplier Consumer Platform MS Project Google Drive Coursera Researchgate Academia.edu Open Science Framework Synapse F1000 Rio Educator Student Commons – Initial focus is on integrating two layers of the scholarly workflow
  27. 27. Commons Topology • Compute Platform: Cloud or HPC • Services: APIs, Containers, Indexing • Software: Services & Tools, scientific analysis tools/workflows • Data: “Reference” Data Sets, User defined data • Digital Object Compliance • App store/User Interface • PaaS, SaaS, IaaS https://datascience.nih.gov/commons
  28. 28. “I really admire Airbnb as a pioneer of the sharing economy and for building community. They've found an elegant way to help hosts make more money and for guests to have authentic experiences. It brings those people together in a unique way.” Logan Green
  29. 29. “The Commons is one effort at creating a sharing economy and for building community. We hope for a more cost effective and productive research environment while bringing people together in a unique way.” Phil Bourne
  30. 30. Acknowledgements • Vivien Bonazzi, Jennie Larkin, Michelle Dunn, Mark Guyer, Allen Dearry, Sonynka Ngosso, Tonya Scott, Lisa Dunneback, Vivek Navale (CIT/ADDS) • NLM/NCBI: Mike Huerta, George Komatsoulis • NHGRI: Valentina di Francesco • NIGMS: Susan Gregurick • CIT: Debbie Sinmao, Andrea Norris • NIH Common Fund: Jim Anderson, Betsy Wilder, Leslie Derr • NCI Cloud Pilots/GDC: Warren Kibbe, Tony Kerlavage, Tanja Davidsen • Commons Reference Data Set Working Group: Weiniu Gan (HL), Ajay Pillai (HG), Elaine Ayres (BITRIS), Sean Davis (NCI), Vinay Pai (NIBIB), Maria Giovanni (AI), Leslie Derr (CF), Claire Schulkey (AI) • RIWG Core Team: Ron Margolis (DK), Ian Fore (NCI), Alison Yao (AI), Claire Schulkey (AI), Eric Choi (AI) • OSP: Dina Paltoo, Kris Langlais, Erin Luetkemeier, Agnes Rooke. Bonazzi & Bourne 2017, PLOS Biology, In Press
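
Slide 7 describes an informal audit: take a random sample of notebooks linked from papers and see how many run out of the box. A minimal sketch of that kind of check might look like the following. It is not the procedure used in the study cited on the slide; the function names `run_notebook` and `audit` are hypothetical, and the default runner assumes the `jupyter nbconvert` command-line tool is installed.

```python
import random
import subprocess
from collections import Counter
from typing import Callable, Iterable


def run_notebook(path: str, timeout_s: int = 600) -> bool:
    """Return True if the notebook executes top to bottom without error.

    Assumption: the `jupyter nbconvert` CLI is available on PATH.
    """
    try:
        result = subprocess.run(
            ["jupyter", "nbconvert", "--to", "notebook", "--execute",
             "--stdout", path],
            capture_output=True,
            timeout=timeout_s,
        )
    except (OSError, subprocess.TimeoutExpired):
        # Missing tool or a notebook that hangs both count as "did not run".
        return False
    return result.returncode == 0


def audit(paths: Iterable[str],
          sample_size: int = 25,
          runner: Callable[[str], bool] = run_notebook,
          seed: int = 0) -> dict:
    """Draw a random sample of notebooks and tally how many run unmodified."""
    rng = random.Random(seed)  # fixed seed so the sample itself is reproducible
    candidates = list(paths)
    sample = rng.sample(candidates, min(sample_size, len(candidates)))
    outcomes = Counter("ran" if runner(p) else "failed" for p in sample)
    return {
        "sampled": len(sample),
        "ran": outcomes["ran"],
        "failed": outcomes["failed"],
    }
```

A call such as `audit(list_of_notebook_paths)` returns only a pass/fail tally. Separating "ran with minor modification" from "lacked libraries or details", as the slide does, still requires a person to inspect each failure.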
