Data Management Plans (DMP) for NSF


Published in: Education, Technology

  1. 1. Brad Houston, University Records Officer, December 2, 2011
  2. 2. <ul><li>Document describing data (and/or digital materials) that have been or will be gathered in a study or project. </li></ul><ul><li>Often includes details on how data will be organized, preserved, and accessed </li></ul><ul><li>Facilitates re-use of data sets by either the PI or other researchers </li></ul><ul><li>Required component of grant proposals for MANY agencies (including NSF and NIH) </li></ul>
  3. 3. <ul><li>Starting January 2011 for NEW, non-collaborative proposals </li></ul><ul><li>Not voluntary – “integral part” of proposal </li></ul><ul><li>Data Management Plans for all data resulting from any level of NSF funding </li></ul><ul><li>Supplementary 2-page document (max) </li></ul><ul><li>Optional: Also part of 15-page (max) Project Description </li></ul>
  4. 4. <ul><li>Must address both physical and digital data </li></ul><ul><li>“Efficiency and effectiveness” of the DMP will be considered by NSF and disciplinary division or directorate </li></ul><ul><li>Must include sufficient information that peer reviewers and project monitors can assess present proposal and past performance </li></ul><ul><li>As of January 2011, proposals will not be accepted without an accompanying data plan! </li></ul>
  5. 5. <ul><ul><li>“Such dissemination of data is necessary for the community to stimulate new advances as quickly as possible and to allow prompt evaluation of the results by the scientific community.” – NSF (italics mine) </li></ul></ul><ul><ul><li>Part of the openness trend in the federal government (Open Government Initiative) </li></ul></ul><ul><ul><li>NIH Public Access Policy (2008) </li></ul></ul><ul><ul><li>Public access to federally funded research hearings - Information Policy, Census and National Archives Subcommittee of U.S. Congress (July, 2010) </li></ul></ul>
  6. 6. <ul><li>It makes your research easier! </li></ul><ul><li>Data available in case you need it later </li></ul><ul><li>Helps avoid accusations of fraud or bad science </li></ul><ul><li>To share it for others to use and learn from </li></ul><ul><li>To get credit for producing it </li></ul><ul><li>To keep from drowning in irrelevant stuff </li></ul><ul><ul><li>... especially at grant/project end </li></ul></ul>
  7. 7. <ul><li>Gene expression microarray data: “Publicly available data was significantly (p=0.006) associated with a 69% increase in citations, independently of journal impact factor, date of publication, and author country of origin.” </li></ul><ul><ul><li>Piwowar, Heather et al. “Sharing detailed research data is associated with increased citation rate.” PLoS ONE 2007. DOI: 10.1371/journal.pone.0000308 </li></ul></ul><ul><li>Maybe there’s an advantage here! </li></ul>
  8. 8. <ul><li>Discuss specific requirements for NSF Data Management Plans </li></ul><ul><li>Suggest ways to manage, share, and archive data more effectively </li></ul><ul><li>Provide resources for more information </li></ul>
  9. 9. <ul><li>Planning, retention, data defined </li></ul>
  10. 10. <ul><li>What data are you collecting or making? </li></ul><ul><li>Can it be recreated? How much would that cost? </li></ul><ul><li>How much of it? How fast is it growing? Does it change? </li></ul><ul><li>What file format(s)? </li></ul><ul><li>What’s your infrastructure for data collection and storage like? </li></ul><ul><li>How do you find it, or find what you’re looking for in it? </li></ul><ul><li>How easy is it to get new people up to speed? Or share data with others? </li></ul>
  11. 11. <ul><li>Who are the audiences for your data? </li></ul><ul><ul><li>You (including Future You), your lab colleagues (including future ones), your PIs </li></ul></ul><ul><ul><li>Disciplinary colleagues, at your institution or at others </li></ul></ul><ul><ul><li>Colleagues in allied disciplines </li></ul></ul><ul><ul><li>The world! </li></ul></ul><ul><li>What are your obligations to others? </li></ul><ul><ul><li>Funder requirements </li></ul></ul><ul><ul><li>Confidentiality issues </li></ul></ul><ul><ul><li>IP questions </li></ul></ul><ul><ul><li>Security </li></ul></ul>
  12. 12. <ul><li>How do you and your lab get from where you are to where you need to be? </li></ul><ul><ul><li>Document, document, document all decisions and all processes! </li></ul></ul><ul><li>Secret sauce: the more you strategize upfront, the less angst and panic later. </li></ul><ul><ul><li>“Make it up as you go along” is very bad practice! </li></ul></ul><ul><ul><li>But the best-laid plans go agley... so be flexible. </li></ul></ul><ul><ul><li>And watch your field! Best practices are still in flux. </li></ul></ul>
  13. 13. <ul><li>Four kinds of data defined by OMB: </li></ul><ul><ul><li>Observational </li></ul></ul><ul><ul><ul><li>Examples: Sensor data, telemetry, survey data, sample data, neuroimages. </li></ul></ul></ul><ul><ul><li>Experimental </li></ul></ul><ul><ul><ul><li>Examples: gene sequences, chromatograms, toroid magnetic field data. </li></ul></ul></ul><ul><ul><li>Simulation </li></ul></ul><ul><ul><ul><li>Examples: climate models, economic models. </li></ul></ul></ul><ul><ul><li>Derived or compiled </li></ul></ul><ul><ul><ul><li>Examples: text and data mining, compiled database, 3D models, data gathered from public documents. </li></ul></ul></ul>
  14. 14. <ul><li>Preliminary analyses </li></ul><ul><ul><li>Raw data is included in this definition </li></ul></ul><ul><li>Drafts of scientific papers </li></ul><ul><li>Plans for future research </li></ul><ul><li>Peer reviews or communications with colleagues </li></ul><ul><li>Physical objects, such as gel samples </li></ul>
  15. 15. <ul><li>As early as possible, but no later than guidelines laid down by relevant Directorate </li></ul><ul><ul><li>Engineering Section: “no later than the acceptance for publication of the main findings of the final data” </li></ul></ul><ul><ul><li>Earth Sciences: “No later than two (2) years after the data were collected.” </li></ul></ul><ul><ul><li>Social and Economic Sciences: “within one year after the expiration of an award” </li></ul></ul><ul><li>Be aware of concerns that may require earlier or later disclosure </li></ul><ul><ul><li>FERPA? Human Subjects data? HIPAA? </li></ul></ul>
  16. 16. <ul><li>Again, specific retention periods will depend on the type of data and the Directorate </li></ul><ul><ul><li>Example: Engineering Section suggests a retention period of “three years after either completion of the grant project or public release of research data, whichever is later” </li></ul></ul><ul><li>Certain types of data will need to be retained longer </li></ul><ul><ul><li>Patent data, longitudinal data sets, etc. </li></ul></ul><ul><li>Ask: is your data of permanent value? </li></ul>
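The "whichever is later" rule quoted above is easy to get wrong by a year, so here is a minimal Python sketch of the arithmetic. The function name `retention_end` and the example dates are illustrative assumptions, not part of any NSF guidance; check the actual period with your Directorate.

```python
from datetime import date


def retention_end(project_end: date, public_release: date, years: int = 3) -> date:
    """Earliest date the data could be discarded under a
    'N years after project completion or public release, whichever is later' rule.

    The default of 3 years mirrors the Engineering example quoted above;
    other Directorates may differ.
    """
    later = max(project_end, public_release)
    try:
        return later.replace(year=later.year + years)
    except ValueError:
        # Feb 29 shifted into a non-leap year: fall back to Feb 28.
        return later.replace(year=later.year + years, day=28)


# Hypothetical dates: the grant ends in 2014, the data are released in 2015.
print(retention_end(date(2014, 8, 31), date(2015, 3, 15)))
```

Because the later of the two dates drives the result, releasing data after the grant ends pushes the retention deadline out accordingly.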
  17. 17. <ul><li>Analyzed data (incl. images, tables and tables of numbers used for making graphs) </li></ul><ul><li>Metadata that defines how data was generated, such as experiment descriptions, computer code, and computer-calculation input </li></ul>
  18. 18. <ul><li>Investigators are expected to preserve/share primary data, samples, physical collections, and supporting materials </li></ul><ul><li>Provide easily accessible information about data holdings, including quality assessments and guidance/finding aids </li></ul><ul><li>Data may be made available through submission to a national data center, publication in a journal, book, or accessible website of institutional archives </li></ul>
  19. 19. <ul><li>Requirements and their implications </li></ul>
  20. 20. <ul><li>All submitted plans must include, at minimum: </li></ul><ul><ul><li>Expected Data: types, physical/electronic collections, materials to be produced </li></ul></ul><ul><ul><li>Standards for data and metadata format and content </li></ul></ul><ul><ul><li>Policies for access and sharing, including provisions for appropriate protection of privacy, confidentiality, security, intellectual property, etc. </li></ul></ul><ul><ul><li>Policies and provisions for re-use, re-distribution, and the production of derivatives </li></ul></ul><ul><ul><li>Plans for archiving data, samples, and other research products, and for preservation of access to them </li></ul></ul>
  21. 21. <ul><li>In short: What kind of data will be produced by your research processes? </li></ul><ul><li>Keep in mind: </li></ul><ul><ul><li>File formats of complete data sets </li></ul></ul><ul><ul><li>Any software or code that will be needed/produced </li></ul></ul><ul><ul><li>Physical samples or other individual data points </li></ul></ul><ul><ul><ul><li>Some divisions require retention of physical samples; consult your Program Officer </li></ul></ul></ul>
  22. 22. <ul><li>In short: How will you organize your data within datasets to make it widely accessible, and how will you make data sets identifiable? </li></ul><ul><li>Keep in mind: </li></ul><ul><ul><li>Any data formatting standards for your particular discipline </li></ul></ul><ul><ul><li>Any metadata (author, date, subject, etc.) that your program attaches automatically, and what you will need to attach manually </li></ul></ul><ul><ul><li>How will you find your data for later consultation? How will others find it? </li></ul></ul>
  23. 23. <ul><li>In short: How will you allow other researchers to find and use your data? </li></ul><ul><li>Keep in mind: </li></ul><ul><ul><li>How will other researchers find your data? (i.e., how will you publicize its existence?) </li></ul></ul><ul><ul><li>How will you provide access to your data? (CD-RW? Data repository? Download via pantherFILE?) </li></ul></ul><ul><ul><li>How will you prepare your data for sharing? </li></ul></ul><ul><ul><ul><li>Do you need to depersonalize or declassify anything? </li></ul></ul></ul>
  24. 24. <ul><li>Data Management Plans are required even if a project is not expected to generate data that requires sharing </li></ul><ul><li>DMP should clearly explain non-sharing in light of COI standards (peer review) </li></ul><ul><li>Between the lines: Not sharing will require justification and close scrutiny by NSF </li></ul><ul><li>Sharing is preferred </li></ul>
  25. 25. <ul><li>In short: How will researchers obtain the appropriate permissions to use your data? </li></ul><ul><li>Keep in mind: </li></ul><ul><ul><li>Is a blanket permissions statement or a case-by-case policy more efficient/practical? </li></ul></ul><ul><ul><li>What responsibilities will users of your data have re: privacy, intellectual property, etc.? </li></ul></ul><ul><ul><li>How will you deal with users who violate these provisions? </li></ul></ul>
  26. 26. <ul><li>In short: How will you make sure your data stays intact and available once you are done using it? </li></ul><ul><li>Keep in mind: </li></ul><ul><ul><li>What are your retention requirements? Is this a permanent data set? </li></ul></ul><ul><ul><li>What storage media will you use? Are you prepared to migrate/emulate as needed? </li></ul></ul><ul><ul><li>Do you have a data backup plan? </li></ul></ul>
  27. 27. <ul><li>Preparing, sharing, and archiving your data sets </li></ul>
  28. 28. <ul><li>Think about where you will put your data </li></ul><ul><ul><li>Local? Network drive? Online data management system? </li></ul></ul><ul><li>Think about how you (or others) will find your data </li></ul><ul><ul><li>Think about how others may use your data, when found </li></ul></ul><ul><li>Think about how to store your data in the long term (or if to store it long-term at all) </li></ul>
  29. 29. <ul><li>Will anybody be able to read these files at the end of your time horizon? </li></ul><ul><li>Where possible, prefer file formats that are: </li></ul><ul><ul><li>Open, standardized </li></ul></ul><ul><ul><li>Documented </li></ul></ul><ul><ul><li>In wide use </li></ul></ul><ul><ul><li>Easy to data-mine, transform, recast </li></ul></ul><ul><li>If you need to transform data for durability, do it now, not later. </li></ul>
  30. 30. <ul><li>Fundamental question: What would someone unfamiliar with your data need in order to find, evaluate, understand, and reuse them? </li></ul><ul><li>Consider the differences between someone inside your lab, someone outside your lab but in your field, and someone outside your field. </li></ul><ul><li>Two parts: metadata and methods </li></ul>
  31. 31. <ul><li>About the project </li></ul><ul><ul><li>Title, people, key dates, funders and grants </li></ul></ul><ul><li>About the data </li></ul><ul><ul><li>Title, key dates, creator(s), subjects, rights, included files, format(s), versions, checksums </li></ul></ul><ul><ul><li>Interpretive aids: codebooks, data dictionaries, algorithms, code </li></ul></ul><ul><li>Keep this with the data; think of it as a README file </li></ul>
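As one possible way to keep this metadata with the data, the sketch below writes a machine-readable README listing every file in a dataset folder with its format, size, and SHA-256 checksum. The file name `README.json`, the field names, and the function names are all assumptions made for illustration; your discipline may prescribe a specific metadata standard instead.

```python
import hashlib
import json
from pathlib import Path

README_NAME = "README.json"  # hypothetical name; any agreed-upon name works


def sha256_of(path: Path) -> str:
    """Checksum a file in chunks so large data files do not exhaust memory."""
    digest = hashlib.sha256()
    with path.open("rb") as handle:
        for chunk in iter(lambda: handle.read(65536), b""):
            digest.update(chunk)
    return digest.hexdigest()


def build_manifest(data_dir: Path, project: dict, dataset: dict) -> dict:
    """Combine hand-written project/dataset metadata with a generated file list."""
    files = []
    for path in sorted(data_dir.rglob("*")):
        if path.is_file() and path.name != README_NAME:
            files.append({
                "path": path.relative_to(data_dir).as_posix(),
                "format": path.suffix.lstrip(".") or "unknown",
                "bytes": path.stat().st_size,
                "sha256": sha256_of(path),
            })
    return {"project": project, "dataset": dataset, "files": files}


def write_readme(data_dir: Path, project: dict, dataset: dict) -> Path:
    """Write the manifest next to the data so the two travel together."""
    out = data_dir / README_NAME
    out.write_text(json.dumps(build_manifest(data_dir, project, dataset), indent=2),
                   encoding="utf-8")
    return out
```

A call such as `write_readme(Path("my_dataset"), {"title": "...", "funder": "NSF"}, {"title": "...", "rights": "..."})` would be rerun whenever files change, so the recorded checksums stay current.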
  32. 32. <ul><li>Reason #1 for not reusing someone else’s data: “I don’t know enough about how it was gathered to trust it.” </li></ul><ul><li>Document what you did. (A published article may or may not be enough.) </li></ul><ul><li>Document any limitations of what you did. </li></ul><ul><li>If you ran code on the data, document the code and keep it with the data. </li></ul><ul><li>Need a codebook? Or a data dictionary? </li></ul><ul><ul><li>If I can’t identify at sight what each bit of your dataset means, yes, you do need a codebook or data dictionary. </li></ul></ul><ul><ul><li>DO NOT FORGET UNITS! </li></ul></ul>
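The codebook advice above can be checked mechanically. The following is a minimal sketch, assuming a codebook stored as a Python dictionary with hypothetical keys `description`, `type`, and `units`; it flags columns that have no entry and numeric columns that forgot their units.

```python
def check_codebook(header: list[str], codebook: dict[str, dict]) -> list[str]:
    """Return a list of human-readable problems; an empty list means the
    codebook covers every column and every numeric column states its units."""
    problems = []
    for column in header:
        entry = codebook.get(column)
        if entry is None:
            problems.append(f"{column}: no codebook entry")
        elif entry.get("type") in ("int", "float") and not entry.get("units"):
            # Use an explicit value such as "dimensionless" for unitless numbers.
            problems.append(f"{column}: numeric column without units")
    return problems


# Hypothetical codebook for a small sensor dataset.
codebook = {
    "site_id": {"description": "Sampling site code", "type": "str"},
    "temp": {"description": "Water temperature", "type": "float", "units": "degC"},
    "depth": {"description": "Sensor depth", "type": "float"},
}

for problem in check_codebook(["site_id", "temp", "depth", "flag"], codebook):
    print(problem)
```

Running a check like this before depositing or sharing a dataset catches the gaps that otherwise surface only when someone else tries to reuse the data.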
  33. 33. <ul><li>Your own drive (PC, server, flash drive, etc.) </li></ul><ul><ul><li>And if you lose it? Or it breaks? </li></ul></ul><ul><li>Somebody else’s drive </li></ul><ul><ul><li>Departmental or campus drive </li></ul></ul><ul><ul><li>“Cloud” drive </li></ul></ul><ul><ul><li>Do they care as much about your data as you do? </li></ul></ul><ul><li>What about versioning? </li></ul><ul><li>Library motto: Lots Of Copies Keep Stuff Safe. </li></ul><ul><ul><li>Two onsite copies, one offsite copy. </li></ul></ul><ul><ul><li>Keep confidentiality and security requirements in mind, of course </li></ul></ul>
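Multiple copies only help if they still match. As a rough sketch (not a replacement for real backup software), the Python below compares a primary folder against one copy by checksum and reports files that are missing, extra, or changed. The function names are illustrative assumptions, and the sketch reads each file fully into memory, which is fine for modest files but not for very large ones.

```python
import hashlib
from pathlib import Path


def digest_tree(root: Path) -> dict[str, str]:
    """Map each file's path (relative to root) to its SHA-256 checksum."""
    return {
        path.relative_to(root).as_posix(): hashlib.sha256(path.read_bytes()).hexdigest()
        for path in sorted(root.rglob("*"))
        if path.is_file()
    }


def compare_copies(primary: Path, copy: Path) -> dict[str, list[str]]:
    """Report how a copy differs from the primary folder."""
    original, duplicate = digest_tree(primary), digest_tree(copy)
    return {
        "missing": sorted(set(original) - set(duplicate)),   # in primary only
        "extra": sorted(set(duplicate) - set(original)),     # in copy only
        "changed": sorted(
            name for name in original.keys() & duplicate.keys()
            if original[name] != duplicate[name]
        ),
    }
```

Running `compare_copies` against each of the onsite and offsite copies on a schedule gives early warning of silent corruption or a backup job that stopped working.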
  34. 34. <ul><li>If data need to persist beyond project end, you have to deal with a new kind of risk: organizational risk. </li></ul><ul><ul><li>Servers come and go. So do labs. So do entire departments. </li></ul></ul><ul><ul><li>This is especially important if you share data! Don’t let it 404! </li></ul></ul><ul><li>You need to find a trustworthy partner. </li></ul><ul><ul><li>On campus: try the library or Research and Sponsored Programs. (UITS has a role but can’t do it alone!) </li></ul></ul><ul><ul><li>Off campus: look for a disciplinary data repository, or a journal that accepts data. (It’s a good idea to do this as part of your planning process.) </li></ul></ul><ul><li>Let somebody else worry! You have new projects to get on with. </li></ul>
  35. 35. <ul><li>Where to go for help and more information </li></ul>
  36. 36. <ul><li>Informational websites </li></ul><ul><ul><li>UW-Madison: </li></ul></ul><ul><ul><li>UW-Milwaukee: </li></ul></ul><ul><ul><li>Don’t just use the site for your own campus! </li></ul></ul><ul><li>Data experts </li></ul><ul><ul><li>IT cyberinfrastructure experts </li></ul></ul><ul><ul><li>Archivists/records managers </li></ul></ul><ul><li>MINDS@UW: </li></ul><ul><ul><li>Data in final form that make sense as discrete files </li></ul></ul>
  37. 37. <ul><li>For Information: </li></ul><ul><ul><li>NSF Grant Proposal Guide </li></ul></ul><ul><ul><ul><li> </li></ul></ul></ul><ul><ul><li>MIT Data Management and Publishing </li></ul></ul><ul><ul><ul><li> </li></ul></ul></ul><ul><li>For storage/management (non-exhaustive): </li></ul><ul><ul><li>A partial list of potential repositories: </li></ul></ul><ul><ul><li>Ask: can my home institution provide better service? </li></ul></ul>
  38. 38. <ul><li>For assistance with writing your plan: </li></ul><ul><ul><li>California Digital Library DMP Creation Tool </li></ul></ul><ul><ul><ul><li> (select “None of the Above”) </li></ul></ul></ul><ul><ul><li>Data Conservancy DMP Template/Questionnaire </li></ul></ul><ul><ul><ul><li> </li></ul></ul></ul><ul><ul><li>DataONE Best Practices Examples </li></ul></ul><ul><ul><ul><li> </li></ul></ul></ul>
  39. 39. <ul><li>Make sure your data plan covers at least the minimum requirements set out by NSF </li></ul><ul><li>Create appropriate metadata to help you manage and find data </li></ul><ul><li>Use open, universal standards and file formats </li></ul><ul><li>Be prepared to preserve access tools along with the data itself </li></ul><ul><li>Be aware of time periods for data sharing and retention </li></ul>
  40. 40. <ul><li>Contact the presenter </li></ul><ul><ul><li>Brad Houston, UW-Milwaukee </li></ul></ul><ul><ul><li>[email_address] (414) 229-6979 </li></ul></ul><ul><li>This presentation is available online at: </li></ul><ul><ul><li> </li></ul></ul>