Looking for Data: Finding New Science

Keynote for the STM Innovations Seminar (U.S.) 2014: http://www.stm-assoc.org/events/stm-innovations-seminar-u-s-2014/

  1. Looking for Data: Finding New Science. Anita de Waard, VP Research Data Collaborations, a.dewaard@elsevier.com, http://researchdata.elsevier.com/
  2. Why should science publishers care about research data? Funding bodies: • Demonstrate impact • Guarantee permanence and discoverability • Avoid fraud • Avoid double funding • Serve the general public. Research management/library: • Generate and track outputs • Comply with mandates • Ensure availability. Phil Bourne, (then) Associate Vice Chancellor, UCSD, 4/13: “We need to think about the university as a digital enterprise.” Mike Huerta, Associate Director, NLM: “Today, the major public product of science are concepts, written down in papers. But tomorrow, data will be the main product of science…. We will require scientists to track and share their data at least as well, if not better, than they are sharing their ideas today.” Researchers: • Derive credit • Comply with mandates • Discover and use • Cite/acknowledge. Nathan Urban, PI, Urban Lab, CMU, 3/13: “If we can share our data, we can write a paper that will knock everybody’s socks off!” Barbara Ransom, NSF Program Director, Earth Sciences: “We’re not going to spend any more money for you to go out and get more data! We want you first to show us how you’re going to use all the data we paid y’all to collect in the past!”
  3. Research data management today: using antibodies and squishy bits. Grad students run experiments and enter the details into their lab notebooks. The PI then tries to make sense of their slides and writes a paper. End of story.
  4. Most of biology is quite insular. [Figure: two separate research cycles, each running Prepare → Observe → Analyze → Ponder → Communicate on its own.]
  5. But it is also VERY complicated [Image: Duck of Vaucanson, http://en.wikipedia.org/wiki/File:Duck_of_Vaucanson.jpg]: • Interspecies variability: a specimen is not a species • Gene expression variability: knowing genes is not knowing how they are expressed • Microbiome: an animal is an ecosystem • Systems biology: a whole is more than the sum of its parts • Male researchers stress out rodents! Reductionist science does not work for living systems! Statistics to the rescue!
  6. What if the research data were connected? Across labs and experiments: track reagents and how they are used. [Figure: two Prepare → Analyze → Communicate cycles linked through shared observations.]
  7. What if the research data were connected? Compare the outcomes of interactions with these entities. [Figure: two Prepare → Analyze → Communicate cycles linked through shared observations.]
  8. What if the research data were connected? Build a ‘virtual reagent spectrogram’ by comparing how different entities interacted in different experiments. [Figure: the linked Prepare → Analyze → Communicate cycles sharing observations, with an added “Think” label.]
  9. Maslow hierarchy of research data needs (from top to base): Useful, Trusted, Reproducible, Discoverable, Comprehensible, Archived, Accessible, Preserved in digital format.
  10. 1: Urban Legend. How can we get a standard neuroscience wet lab to store and share its data? • Incorporate structured workflows into the daily practice of a typical electrophysiology lab (the Urban Lab at CMU) – What does it take? – Where are the points of conflict? • 1-year pilot, funded by Elsevier RDS: – CMU: Shreejoy Tripathy, management and user testing – Elsevier: development, UI, project management • Next step: NIH grant to scale up to 4 labs. [Figure: Maslow hierarchy of research data needs.]
  11. Urban Legend components (de Waard, A., Burton, S. et al., 2013)
  12. Data entry app
  13. Data dashboard (e.g., SDB140225c4_onbeam_CC)
  14. 2: Moonrocks. How can we scale up data curation? Pilot project with IEDA: • Build a database for lunar geochemistry • Leapfrog and improve curation time • Write a joint report on processes, costs and challenges • 1-year pilot, funded by Elsevier • Next step: NSF grant on schemas > spreadsheets. [Figure: Maslow hierarchy of research data needs.]
  15. Moonrocks data import. Moonrocks: pushing data curation to the researcher.
  16. 3: How can we improve the way data (and software) are published? • E.g., with the Virtual Microscope • Or interactive plots • Or executable papers. [Figure: Maslow hierarchy of research data needs.]
  17. Let’s support the needs of research data! • Experimental metadata: workflows, samples, settings, reagents, organisms, etc. • Record metadata: DOI, date, author, institute, etc. • Processed data: mathematically/computationally processed data, such as correlations, plots, etc. • Raw data: direct outputs from equipment, such as images, traces, spectra, etc. • Methods and equipment: reagents, settings, manufacturer’s details, etc. • Validation: approval, reproduction, selection, quality stamp. [Figure: Maslow hierarchy of research data needs, annotated with arrows labeled “More curation” and “More usable”.]
  18. Anita de Waard, a.dewaard@elsevier.com. Collaborations and discussions gratefully acknowledged: • CMU: Nathan Urban, Shreejoy Tripathy, Shawn Burton, Ed Hovy • UCSD: Brian Shoettlander, David Minor, Declan Fleming, Ilya Zaslavsky • NIF: Maryann Martone, Anita Bandrowski • OHSU: Melissa Haendel, Nicole Vasilevsky • Columbia/IEDA: Kerstin Lehnert, Leslie Hsu • MIT: Micah Altman. Thank you! http://researchdata.elsevier.com/
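The six kinds of information listed on slide 17 can be pictured as layers of a single research data record. The Python snippet below is a minimal sketch under stated assumptions: the class `ResearchDataRecord`, its field names and types, the `LAYERS` tuple and the helper `filled_layers` are hypothetical illustrations, not a schema defined in the talk or by any data repository. It only shows how a record could report which layers have been filled in so far.

```python
from dataclasses import dataclass, field
from typing import Optional

# Hypothetical record model: one field per layer named on slide 17.
# Field names and types are illustrative assumptions, not a published schema.
@dataclass
class ResearchDataRecord:
    # Experimental metadata: workflows, samples, settings, reagents, organisms.
    experimental_metadata: dict = field(default_factory=dict)
    # Record metadata: DOI, date, author, institute.
    record_metadata: dict = field(default_factory=dict)
    # Processed data: correlations, plots (stored here as file names).
    processed_data: list = field(default_factory=list)
    # Raw data: direct equipment outputs such as images, traces, spectra (file names).
    raw_data: list = field(default_factory=list)
    # Methods and equipment: reagents, settings, manufacturer's details.
    methods_and_equipment: dict = field(default_factory=dict)
    # Validation: approval, reproduction, selection or quality stamp, if any.
    validation: Optional[str] = None

# Layer names in the order they appear on slide 17.
LAYERS = (
    "experimental_metadata",
    "record_metadata",
    "processed_data",
    "raw_data",
    "methods_and_equipment",
    "validation",
)

def filled_layers(record: ResearchDataRecord) -> list:
    """Return the names of the layers that hold any content, in LAYERS order."""
    return [name for name in LAYERS if getattr(record, name)]

if __name__ == "__main__":
    # Hypothetical example: a record with an author and one raw trace file.
    rec = ResearchDataRecord(
        record_metadata={"author": "A. Researcher"},
        raw_data=["trace_001.dat"],
    )
    print(filled_layers(rec))
```

Running the example prints `['record_metadata', 'raw_data']`: the remaining four layers are still empty, which is one way to make visible how much curation a record would still need before it becomes more usable.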
