Research Life Cycle for GeoData 2014


Presentation on challenges for research data management and the data life cycle, for the GeoData meeting in Boulder, 18 June 2014.

Published in: Science, Technology, Education

  1. 1. The Research Data Life Cycle. Carly Strasser, California Digital Library. GeoData, 18 June 2014. From Flickr by Velo Steve
  2. 2. Why don’t people share data? Is data management being taught? Do attitudes about sharing differ among disciplines? What role can libraries play in data education? How can we promote storing data in repositories? What barriers to sharing can we eliminate? NSF funded DataNet Project Office of Cyberinfrastructure
  3. 3. Enable data sharing Encourage new incentives Think about code sharing Work with libraries, publishers and researchers Explore new tools to help change system Build tools
  4. 4. From Flickr by gsagostinho Outreach Education Assistance You’re doing it wrong!
  5. 5. Back in the day… Da Vinci Curie Newton Darwin
  6. 6. Research has changed Better
  7. 7. From wikimedia Such Internet! So many tools! From Flickr by John Jobby So much data!
  8. 8. Research has changed Worse
  9. 9. Digital data From Flickr by Flickmor From Flickr by US Army Environmental Command From Flickr by DW0825 C. Strasser Courtesy of WHOI From Flickr by deltaMike
  10. 10. Digital data + Complex workflows
  11. 11. From Flickr by ~Minnea~ Reproducibility Data management Documentation
  12. 12. “Reproducibility Crisis” “Digital Dark Age” “Erosion of Trust”
  13. 13. “I own my data and you can’t have it.” “Let me do my work.” “I’m already too busy.” “This takes away from research time.”
  14. 14. h/t Ted Hart, NEON
  15. 15. Data can’t be owned. You can be the Guardian | Steward | Caretaker
  16. 16. Plan Collect Assure Describe Preserve Discover Integrate Analyze The Data Life Cycle
  17. 17. Discussion topics End game Stakeholders & responsibilities Compliance Costs Follow-up Peer review Concrete steps
  18. 18. Liz Lyon: Dealing with Data 2008 UK funder expectations 2009 2009-10 DMPs: A Short History
  19. 19. Federal Funding Accountability and Transparency Act 2006 Across the Pond… 2010 2010 – present DMPs: A Short History
  20. 20. … “Federal agencies investing in research and development (more than $100 million in annual expenditures) must have clear and coordinated policies for increasing public access to research products.” Feb 2013
  21. 21. From Calisphere, Courtesy of UC Riverside, California Museum of Photography What do researchers think?
  22. 22. They don’t know about policies. John Kratz, CLIR/DLF Postdoc at CDL
  23. 23. They aren’t taught data management. Quality control and quality assurance; The proper way to name computer files; Types of files and software to use; Metadata generation; Workflows; Protecting data; Databases and data archiving; Data re-use; Meta-analysis; Data sharing; Reproducibility; Notebook protocols (lab or field). Strasser & Hampton 2013. “Undergraduates & Ecological Data Management Training in the US”. DOI:10.1890/ES12-00139.1
  24. 24. They aren’t taught data management. [Bar chart, “In Curriculum?”: axis from 0 to 70, categories BAS and RU]
  25. 25. “No one reads it anyway.” “It’s an unfunded mandate.” “I wrote it the night before.” They aren’t concerned.
  26. 26. What does success look like? DMPs… • are flexible • are useful and used • result in easily discoverable data • are linked to open data • are created in partnership with institutional service providers • are used as a/n (automated) compliance tool • are part of the workflow of research • include digital and non-digital materials (where relevant)
  27. 27. “Community-driven” But what if community doesn’t care (yet)? “Generic, work for everyone” But community-specific standards
  28. 28. Current DMP tools From Flickr by mhlradio
  29. 29. DMPonline: Step-by-step wizard for generating DMP. Create | edit | re-use | share | save | generate. Open to community.
  30. 30. DMPTool: Step-by-step wizard for generating DMP. Create | edit | re-use | share | save | generate. Open to community.
  31. 31. IEDA Data Management Plan Tool
  32. 32.
  33. 33. We want templates!
  34. 34. Plan Collect Assure Describe Preserve Discover Integrate Analyze The Data Life Cycle
  35. 35. Scientists are still bad at data management.
  36. 36. From Flickr by iowa_spirit_walker • Cost • Confusion about standards • Lack of training • Fear of lost rights or benefits • No incentives
  37. 37. Data are being recognized as first class products of research From Flickr by Richard Moross NSF bio-sketches can include data Data Publication Data Citation
  38. 38. Journals Funders Peers From Flickr by Eva Rinaldi Celebrity and Live Music Photographer
  39. 39. science source notebook content access data government knowledge From Flickr by cdsessums
  40. 40. Plan Collect Assure Describe Preserve Discover Integrate Analyze The Data Life Cycle
  41. 41. “Data Publication”
  42. 42. John Kratz, CLIR Postdoc
  43. 43. What does “data publication” mean? Data are 1. Available 2. Citable 3. Trustworthy* (*peer reviewed? certified?) Props to Sarah Callaghan & colleagues
  44. 44. Available | Citable | Trustworthy Publish means to “make public”. You should not have to email the author. The data doesn’t have to be open access. “Email me!” CC-0 on web
  45. 45. Simple case… Data citations should be in reference list. Five-element citation: author, year, title, publisher, identifier Available | Citable | Trustworthy Boettiger C, Dushoff J, Weitz JS (2009). Data from: Fluctuation domains in adaptive evolution. Theoretical Population Biology. Published in Dryad. doi:10.5061/dryad.j8n0p7vc
  46. 46. More complicated… Deep data citation: what if you want to cite a subset? Dynamic data: how to create a reliable citation when a dataset is changing? Available | Citable | Trustworthy
  47. 47. Technical VS. Scientific Sometimes consider impact and/or novelty Guidelines provided Available | Citable | Trustworthy From Flickr by Percival Lowell
  48. 48. What does a data publication look like? 1. Data as supplemental material: Data published alongside a traditional journal article. Available + citable. Review varies. Potential issues with long-term availability. From Flickr by subsetsum
  49. 49. What does a data publication look like? 2. Data paper: Data + descriptive “data paper”. Most require data be in a trusted repository. All have a component of peer review. Examples: • Standalone journals: Nature Scientific Data, Geoscience Data Journal, Ecological Archives • Journals that publish data papers: GigaScience, F1000 Research, Internet Archaeology From Flickr by subsetsum
  50. 50. What does a data publication look like? 3. Standalone data: Data published without a related journal article. Rich metadata (structured or unstructured). Examples: • Open Context • NASA PDS Peer Review Data • figshare (but no validation) From Flickr by subsetsum
  51. 51. “Publish” “Paper” “Peer review” “Sharing” “Available” “Article” “Publication”
  52. 52. From Flickr by Sandia Labs C. Strasser C. Strasser World Bank Photo Collection From Flickr What do researchers think of data publication?
  53. 53. We have our work cut out for us.
  54. 54. Okay, I’ll share it. Where do I put it?
  55. 55. Repositories for data General content Non-institutional Publishers/for-profits Other Institutional Discipline-specific Repository choices…
  56. 56. Repository choices… Institutional: • All data associated with a paper • Tells a story • Clearinghouse for researcher’s works. Discipline-specific: • Some of data for a given paper • Discoverable • Integrated systems • Collection policies. Which should a researcher use? Both. Which is more important? Depends.
  57. 57. Simplify data deposit for UC researchers. Branded for campus. Merritt under the hood.
  58. 58.
  59. 59.
  60. 60. From Flickr by dotpolka Hard work Shifting norms Exciting times
  61. 61. Website Email Twitter Slides @carlystrasser
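
Slide 45 describes a five-element data citation: author, year, title, publisher, identifier. As a rough illustration of how those elements fit together, here is a minimal Python sketch. The `DataCitation` class, its field names, and the punctuation between elements are assumptions made for this example, not a standard or anything defined in the slides; the sample values are taken from the Dryad example on that slide, with the journal name left out.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class DataCitation:
    """Hypothetical container for the five citation elements named on slide 45."""

    authors: str
    year: int
    title: str
    publisher: str
    identifier: str

    def __post_init__(self) -> None:
        # A citation missing any of the five elements is treated as incomplete.
        for name, value in vars(self).items():
            if not str(value).strip():
                raise ValueError(f"missing citation element: {name}")

    def format(self) -> str:
        # Assumed layout: Authors (Year). Title. Publisher. Identifier
        return (
            f"{self.authors} ({self.year}). {self.title}. "
            f"{self.publisher}. {self.identifier}"
        )


if __name__ == "__main__":
    citation = DataCitation(
        authors="Boettiger C, Dushoff J, Weitz JS",
        year=2009,
        title="Data from: Fluctuation domains in adaptive evolution",
        publisher="Dryad",
        identifier="doi:10.5061/dryad.j8n0p7vc",
    )
    print(citation.format())
```

The sketch covers only the simple case. Deep citation of subsets and citation of changing datasets, raised on slide 46, would need extra elements that are not modeled here.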