Brad Houston, University Records Officer, November 30, 2012
Data management plans

  1. Brad Houston, University Records Officer, November 30, 2012
  2. • Document describing data (and/or digital materials) that have been or will be gathered in a study or project.
     • Often includes details on how data will be organized, preserved, and accessed
     • Facilitates re-use of data sets by either PI or other researchers
     • Required component of grants for MANY agencies (NSF and NIH)
  3. • Starting January 2011 for NEW, non-collaborative proposals
     • Not voluntary – “integral part” of proposal
     • Data Management Plans for all data resulting from any level of NSF funding
     • Supplementary 2-page document (max)
     • Optional: Also part of 15-page (max) Project Description
  4. • Must address both physical and digital data
     • “Efficiency and effectiveness” of the DMP will be considered by NSF and disciplinary division or directorate
     • Must include sufficient information that peer reviewers and project monitors can assess present proposal and past performance
     • As of January 2011, proposals will not be accepted without an accompanying data plan!
  5. “Such dissemination of data is necessary for the community to stimulate new advances as quickly as possible and to allow prompt evaluation of the results by the scientific community.” – NSF (italics mine)
     • Part of Openness trend in federal government (data.gov - Open Government Initiative)
     • NIH Public Access Policy (2008)
     • Public access to federally funded research hearings - Information Policy, Census and National Archives Subcommittee of U.S. Congress (July, 2010)
  6. • It makes your research easier!
     • Data available in case you need it later
     • Helps avoid accusations of fraud or bad science
     • To share it for others to use and learn from
     • To get credit for producing it
     • To keep from drowning in irrelevant stuff
       • ... especially at grant/project end
  7. • Gene expression microarray data: “Publicly available data was significantly (p=0.006) associated with a 69% increase in citations, independently of journal impact factor, date of publication, and author country of origin.”
       • Piwowar, Heather et al. “Sharing detailed research data is associated with increased citation rate.” PLoS One 2007. DOI: 10.1371/journal.pone.0000308
     • Maybe there’s an advantage here!
  8. • Discuss specific requirements for NSF Data Management plans
     • Suggest ways to manage, share, and archive data more effectively
     • Provide resources for more information
  9. Planning, retention, data defined
  10. • What data are you collecting or making?
      • Can it be recreated? How much would that cost?
      • How much of it? How fast is it growing? Does it change?
      • What file format(s)?
      • What’s your infrastructure for data collection and storage like?
      • How do you find it, or find what you’re looking for in it?
      • How easy is it to get new people up to speed? Or share data with others?
  11. • Who are the audiences for your data?
        • You (including Future You), your lab colleagues (including future ones), your PIs
        • Disciplinary colleagues, at your institution or at others
        • Colleagues in allied disciplines
        • The world!
      • What are your obligations to others?
        • Funder requirements
        • Confidentiality issues
        • IP questions
        • Security
  12. • How do you and your lab get from where you are to where you need to be?
        • Document, document, document all decisions and all processes!
      • Secret sauce: the more you strategize upfront, the less angst and panic later.
        • “Make it up as you go along” is very bad practice!
        • But the best-laid plans go agley... so be flexible.
        • And watch your field! Best practices are still in flux.
  13. • Four kinds of data defined by OMB:
        • Observational
          • Examples: Sensor data, telemetry, survey data, sample data, neuroimages.
        • Experimental
          • Examples: gene sequences, chromatograms, toroid magnetic field data.
        • Simulation
          • Examples: climate models, economic models.
        • Derived or compiled
          • Examples: text and data mining, compiled database, 3D models, data gathered from public documents.
  14. • Preliminary analyses
        • Raw data is included in this definition
      • Drafts of scientific papers
      • Plans for future research
      • Peer reviews or communications with colleagues
      • Physical objects, such as gel samples
  15. • As early as possible, but no later than guidelines laid down by relevant Directorate
        • Engineering Section: “no later than the acceptance for publication of the main findings of the final data”
        • Earth Sciences: “No later than two (2) years after the data were collected.”
        • Social and Economic Sciences: “within one year after the expiration of an award”
      • Be aware of concerns that may require earlier or later disclosure
        • FERPA? Human Subjects data? HIPAA?
  16. • Again, specific retention periods will depend on the type of data and the grant program
        • Example: NSF Engineering Section suggests retention period of “three years after either completion of the grant project or public release of research data, whichever is later”
      • Certain types of data will need to be retained longer
        • Patent data, longitudinal data sets, etc.
      • Ask: is your data of permanent value?
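The "whichever is later" rule quoted on slide 16 is easy to get wrong by hand, so here is a minimal Python sketch of the arithmetic. The function name `retention_end` and the example dates are hypothetical, and the three-year default simply mirrors the Engineering example; check the guidance for your own directorate before relying on any of it.

```python
from datetime import date


def retention_end(project_completed: date, data_released: date, years: int = 3) -> date:
    """Earliest disposal date under an 'N years after the later event' rule.

    Hypothetical helper: the default of 3 years mirrors the NSF Engineering
    example quoted on the slide and is an assumption for any other program.
    """
    start = max(project_completed, data_released)  # "whichever is later"
    try:
        return start.replace(year=start.year + years)
    except ValueError:
        # start was 29 February and the target year is not a leap year
        return start.replace(year=start.year + years, day=28)


# Project ended 31 Aug 2014, data publicly released 15 Mar 2015 (made-up dates)
print(retention_end(date(2014, 8, 31), date(2015, 3, 15)))  # prints 2018-03-15
```

Data with longer-term value (patent data, longitudinal sets) would need a different rule or no end date at all.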
  17. • Analyzed data (incl. images, tables and tables of numbers used for making graphs)
      • Metadata that defines how data was generated, such as experiment descriptions, computer code, and computer-calculation input
  18. • Investigators are expected to preserve/share primary data, samples, physical collections, & supporting materials
      • Provide easily accessible information about data holdings, including quality assessments and guidance/finding aids
      • Data may be made available through submission to national data center, publication in journal, book, or accessible website of institutional archives
  19. Requirements and their implications
  20. All submitted plans must include, at minimum:
      1. Expected Data: types, physical/electronic collections, materials to be produced
      2. Standards for data and metadata format and content
      3. Policies for access and sharing, including provisions for appropriate protection of privacy, confidentiality, security, intellectual property, etc.
      4. Policies and provisions for re-use, re-distribution, and the production of derivatives
      5. Plans for archiving data, samples, and other research products, and for preservation of access to them
  21. • In short: What kind of data will be produced by your research processes?
      • Keep in mind:
        • File formats of complete data sets
        • Any software or code that will be needed/produced
        • Physical samples or other individual data points
          • Some divisions require retention of physical samples; consult your Program Officer
  22. • In short: how will you organize your data within datasets to make it widely accessible, and how will you make data sets identifiable?
      • Keep in mind:
        • Any data formatting standards for your particular discipline
        • Any metadata (author, date, subject, etc.) that your program attaches automatically, and what you will need to attach manually
        • How will you find your data for later consultation? How will others find it?
  23. • In short: How will you allow other researchers to find and use your data?
      • Keep in mind:
        • How will other researchers find your data? (i.e. How will you publicize its existence?)
        • How will you provide access to your data? (CD-RW? Data Repository? Download via pantherFILE?)
        • How will you prepare your data for sharing?
          • Do you need to depersonalize or declassify anything?
  24. • Data Management Plans are required even if a project is not expected to generate data that requires sharing
      • DMP should clearly explain non-sharing in light of COI standards (peer review)
      • Between the lines: Not sharing will require justification and close scrutiny by NSF
      • Sharing is preferred
  25. • In short: How will researchers obtain the appropriate permissions to use your data?
      • Keep in mind:
        • Is a blanket permissions statement or a case-by-case policy more efficient/practical?
        • What responsibilities will users of your data have re: privacy, intellectual property, etc.?
        • How will you deal with users who violate these provisions?
  26. • In short: How will you make sure your data stays intact and available once you are done using it?
      • Keep in mind:
        • What are your retention requirements? Is this a permanent data set?
        • What storage media will you use? Are you prepared to migrate/emulate as needed?
        • Do you have a data backup plan?
  27. Preparing, sharing, and archiving your data sets
  28. • Think about where you will put your data
        • Local? Network drive? Online data management system?
      • Think about how you (or others) will find your data
        • Think about how others may use your data, when found
      • Think about how to store your data in the long term (or if to store it long-term at all)
  29. • Will anybody be able to read these files at the end of your time horizon?
      • Where possible, prefer file formats that are:
        • Open, standardized
        • Documented
        • In wide use
        • Easy to data-mine, transform, recast
      • If you need to transform data for durability, do it now, not later.
  30. • Fundamental question: What would someone unfamiliar with your data need in order to find, evaluate, understand, and reuse them?
      • Consider the differences between someone inside your lab, someone outside your lab but in your field, and someone outside your field.
      • Two parts: metadata and methods
  31. • About the project
        • Title, people, key dates, funders and grants
      • About the data
        • Title, key dates, creator(s), subjects, rights, included files, format(s), versions, checksums
        • Interpretive aids: codebooks, data dictionaries, algorithms, code
      • Keep this with the data: think of it as a Readme file
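One way to produce the Readme described on slide 31 is to generate it from the data folder itself, so the file list and checksums cannot drift from what is actually there. The sketch below is only an illustration in Python: the function names, the `README.txt` file name, and the layout of the output are assumptions, not a standard.

```python
import hashlib
from pathlib import Path


def sha256_of(path: Path) -> str:
    """SHA-256 checksum of a file, read in 1 MiB chunks."""
    digest = hashlib.sha256()
    with path.open("rb") as handle:
        for chunk in iter(lambda: handle.read(1 << 20), b""):
            digest.update(chunk)
    return digest.hexdigest()


def write_readme(data_dir: Path, project: dict, dataset: dict) -> Path:
    """Write README.txt into data_dir: project and data facts plus a file manifest.

    Hypothetical layout; `project` and `dataset` are free-form key/value pairs
    (title, people, key dates, funders, rights, formats, versions, ...).
    """
    readme = data_dir / "README.txt"
    lines = ["ABOUT THE PROJECT"]
    lines += [f"{key}: {value}" for key, value in project.items()]
    lines += ["", "ABOUT THE DATA"]
    lines += [f"{key}: {value}" for key, value in dataset.items()]
    lines += ["", "INCLUDED FILES (sha256  size_bytes  name)"]
    for path in sorted(p for p in data_dir.rglob("*") if p.is_file()):
        if path == readme:
            continue  # do not checksum the Readme itself
        name = path.relative_to(data_dir).as_posix()
        lines.append(f"{sha256_of(path)}  {path.stat().st_size}  {name}")
    readme.write_text("\n".join(lines) + "\n", encoding="utf-8")
    return readme
```

A call might look like `write_readme(Path("survey_2012"), {"Title": "...", "Funder": "..."}, {"Format": "CSV", "Rights": "..."})`, where the folder name and values are placeholders. Interpretive aids such as codebooks still have to be written by a person; the script only records that they are among the included files.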
  32. • Reason #1 for not reusing someone else’s data: “I don’t know enough about how it was gathered to trust it.”
      • Document what you did. (A published article may or may not be enough.)
      • Document any limitations of what you did.
      • If you ran code on the data, document the code and keep it with the data.
      • Need a codebook? Or a data dictionary?
        • If I can’t identify at sight what each bit of your dataset means, yes, you do need a codebook or data dictionary.
        • DO NOT FORGET UNITS!
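The codebook advice on slide 32 can be checked mechanically. As a rough Python sketch, assuming the data dictionary is itself a small CSV with `column`, `description`, and `units` headers (a made-up convention, as are the sample columns below), the following lists every data column that is missing from the dictionary or has no description or units.

```python
import csv
import io


def undocumented_columns(data_csv: str, dictionary_csv: str) -> list:
    """Return data columns lacking a dictionary entry, a description, or units."""
    header = next(csv.reader(io.StringIO(data_csv)))
    entries = {row["column"]: row for row in csv.DictReader(io.StringIO(dictionary_csv))}
    problems = []
    for name in header:
        entry = entries.get(name)
        if entry is None:
            problems.append(name)  # column not described at all
        elif not (entry.get("description") or "").strip():
            problems.append(name)  # entry exists but says nothing
        elif not (entry.get("units") or "").strip():
            problems.append(name)  # DO NOT FORGET UNITS
    return problems


# Hypothetical dataset and dictionary, inlined so the sketch is self-contained
DATA = "sample_id,temp,mass\nA1,21.5,0.30\n"
DICTIONARY = (
    "column,description,units\n"
    "sample_id,Lab sample identifier,none\n"
    "temp,Air temperature at collection,\n"
)

print(undocumented_columns(DATA, DICTIONARY))  # prints ['temp', 'mass']
```

Writing "none" for unitless columns, as in the sample, makes it clear the units were considered rather than forgotten.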
  33. • Your own drive (PC, server, flash drive, etc.)
        • And if you lose it? Or it breaks?
      • Somebody else’s drive
        • Departmental or campus drive
        • “Cloud” drive
        • Do they care as much about your data as you do? What about versioning?
      • Library motto: Lots Of Copies Keeps Stuff Safe.
        • Two onsite copies, one offsite copy.
        • Keep confidentiality and security requirements in mind, of course
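A copy only keeps stuff safe if it matches the original. The Python sketch below copies one file into several folders and confirms each copy by checksum. The function name `replicate` and the folder names are hypothetical, and real "offsite" storage would normally be a mounted network or cloud location rather than a local folder.

```python
import hashlib
import shutil
from pathlib import Path


def sha256_of(path: Path) -> str:
    """SHA-256 checksum of a file, read in 1 MiB chunks."""
    digest = hashlib.sha256()
    with path.open("rb") as handle:
        for chunk in iter(lambda: handle.read(1 << 20), b""):
            digest.update(chunk)
    return digest.hexdigest()


def replicate(source: Path, destinations: list) -> dict:
    """Copy `source` into each destination folder and verify every copy.

    Returns a mapping of copy path -> True if its checksum matches the source.
    """
    expected = sha256_of(source)
    results = {}
    for folder in destinations:
        folder.mkdir(parents=True, exist_ok=True)
        copy = folder / source.name
        shutil.copy2(source, copy)  # copy2 also preserves timestamps
        results[copy] = sha256_of(copy) == expected
    return results
```

For the "two onsite, one offsite" pattern, the original plus `replicate(Path("results.csv"), [Path("/mnt/lab_server/project"), Path("/mnt/offsite/project")])` would give three copies; both paths are placeholders. Re-running the checksum comparison periodically also catches copies that have silently degraded.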
  34. • If data need to persist beyond project end, you have to deal with a new kind of risk: organizational risk.
        • Servers come and go. So do labs. So do entire departments.
        • This is especially important if you share data! Don’t let it 404!
      • You need to find a trustworthy partner.
        • On campus: try the library or Research and Sponsored Programs. (UITS has a role but can’t do it alone!)
        • Off campus: look for a disciplinary data repository, or a journal that accepts data. (It’s a good idea to do this as part of your planning process.)
      • Let somebody else worry! You have new projects to get on with.
  35. Where to go for help and more information
  36. • Informational websites
        • UW-Madison: http://researchdata.wisc.edu/
        • UW-Milwaukee: http://dataplan.uwm.edu
        • Don’t just use the site for your own campus!
      • Data experts
        • IT cyberinfrastructure experts
        • Archivists/records managers
      • MINDS@UW: minds.wisconsin.edu
        • Data in final form that make sense as discrete files
  37. • For Information:
        • NSF Grant Proposal Guide
          • http://www.nsf.gov/pubs/policydocs/pappguide/nsf11001/gpg_index.jsp
        • MIT Data Management and Publishing
          • http://libraries.mit.edu/guides/subjects/data-management/index.html
      • For storage/management (non-inclusive):
        • A partial list of potential repositories: http://databib.org
        • Ask: can my home institution provide better service?
  38. • For assistance with writing your plan:
        • California Digital Library DMP Creation Tool
          • https://dmp.cdlib.org/ (Select “UWM” as institution)
        • Data Conservancy DMP Template/Questionnaire
          • http://dataconservancy.org/dataManagementPlans
        • DataONE Best Practices Examples
          • http://www.dataone.org/plans
        • Data Curation Profiles (Purdue University)
          • http://datacurationprofiles.org/
        • Digital Curation Center Tools Catalog
          • http://www.dcc.ac.uk/resources/external/tools-services
  39. • Make sure your data plan covers at least the minimum requirements set out by NSF
      • Create appropriate metadata to help you manage and find data
      • Use open, universal standards and file formats
      • Be prepared to preserve access tools along with data itself
      • Be aware of time periods for data sharing and retention
  40. • Contact the presenter
        • Brad Houston, UW-Milwaukee
        • houstobn@uwm.edu (414) 229-6979
      • This presentation available online at:
        • http://www.slideshare.net/herodotusjr/data-management-plans-dmp-for-nsf