Smith RDAP11 NSF Data Management Plan Case Studies


MacKenzie Smith, MIT; NSF Data Management Plan Case Studies; RDAP11 Summit

The 2nd Research Data Access and Preservation (RDAP) Summit
An ASIS&T Summit
March 31-April 1, 2011 Denver, CO
In cooperation with the Coalition for Networked Information

  2. NSF DMP GUIDELINES WANT
     • Policies for access and sharing, including provisions for appropriate protection of privacy, confidentiality, security, intellectual property, or other rights or requirements
     • Policies and provisions for re-use, re-distribution, and the production of derivatives
     RDAP Summit ©2011, MacKenzie Smith
  3. WHAT IS DRIVING THIS?
     • Scientific progress requires international, interdisciplinary interoperability, including frictionless data integration at large scale (e.g. the Web)
     • Data interoperability includes
        • technical issues (data integration, protocols)
        • social issues (scientific norms, credit mechanisms or lack thereof)
        • legal issues (incompatible laws and policies for data and databases)
  4. DATA USE/REUSE/REDISTRIBUTION
     • Data use: Using research data for the current research purpose/activity to infer new knowledge about the research subject.
     • Data re-use: Using research data for a research purpose/activity other than that for which it was intended.
     • Howard, T., Darlington, M., Ball, A., Culley, S., McMahon, C., 2010. Understanding and Characterizing Engineering Research Data for its Better Management. Project Report. Bath, UK: University of Bath, ERIM Project Document. erim2rep100420mjd10
  5. DATA USE/REUSE/REDISTRIBUTION
     • Data purposing: Making research data available and fit for the current research activity.
     • Data re-purposing: Making research data available and fit for a future known research activity.
     • Data re-use: Managing research data such that it will be available for a future unknown research activity.
  6. SUPPORTING DATA REUSE
     • Future users are unknown and potentially interdisciplinary
     • You don’t know them and they don’t know you (or what you/your discipline expects)
     • Data documentation and policies need to be clear and must not require contact or ad hoc negotiations (what if you’ve moved or you’re dead?)
  7. INTERNATIONAL COLLABORATIONS
     • If I participate in a collaborative international research project, do I need to be concerned with data management policies established by institutions outside the United States?
     • Yes. There may be cases where data management plans are affected by formal data protocols established by large international research consortia or set forth in formal science and technology agreements signed by the United States Government and foreign counterparts. Be sure to discuss this issue with your sponsored projects office (or equivalent) and your international research partner when first planning your collaboration.
  8. DATA LICENSING IN US
     • US Gov data in the Public Domain
        • explicit rights statement rare
     • Factual data not copyrightable in the US
        • creativity matters, ‘sweat of the brow’ does not
        • not much legal precedent in science
        • generally not known by users
     • EULAs in place for many data archives
        • all different, varying practicality, hard to enforce
  9. CREATIVE COMMONS
     • Tools for data sharing towards Web-scale interoperability (e.g. Linked Open Data)
     • CC0 or CC-BY
     • Public Domain mark
     • Best practice for URI-based attribution (e.g. to avoid attribution stacking)
  10. CREATIVE COMMONS
     • CC0 waives copyright and associated rights (e.g. data rights) where applicable
        • Important for interoperability with legal jurisdictions that have sui generis data rights (e.g. Europe)
     • CC-BY-SA
        • bad for interoperability
     • CC-BY with attribution
        • via URI (Aus and NZ examples)
        • Attribution stacking
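
To make the attribution stacking concern concrete, here is a minimal Python sketch (an illustration, not part of the talk). It assumes each dataset carries a license label and at most one attribution URI, and that anyone using a derivative must credit every CC-BY source upstream of it. The `Dataset` class, the `required_attributions` helper, and the example.org URIs are all hypothetical.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Dataset:
    """Hypothetical record: a dataset, its license, and what it was derived from."""
    name: str
    license: str               # e.g. "CC0" or "CC-BY"
    attribution_uri: str = ""  # where credit should point, if the license requires it
    sources: tuple = ()        # upstream Dataset objects this one was built from


def required_attributions(ds):
    """Collect, without duplicates, every attribution URI a user of `ds` must carry forward."""
    uris = []
    for src in ds.sources:
        for uri in required_attributions(src):
            if uri not in uris:
                uris.append(uri)
    # Assumption: only CC-BY data adds an obligation; CC0 data adds none.
    if ds.license == "CC-BY" and ds.attribution_uri and ds.attribution_uri not in uris:
        uris.append(ds.attribution_uri)
    return uris


a = Dataset("a", "CC-BY", "https://example.org/a")
b = Dataset("b", "CC0")
c = Dataset("c", "CC-BY", "https://example.org/c", sources=(a, b))
d = Dataset("d", "CC-BY", "https://example.org/d", sources=(c,))

stacked = required_attributions(d)
```

In this toy chain the list for `d` grows with every CC-BY ancestor, while the CC0 source `b` contributes nothing. Pointing each credit at a single URI keeps the stacked list short to carry along, even though it does not remove the obligation itself.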
  11. ISSUES
     • Licenses
     • Attribution
     • Persistent IDs
     • Provenance
     • Metadata
     • Registries
  12. WHAT DO RESEARCHERS WANT? SUPPLY SIDE
     • CREDIT
     • CONTROL
     • CONFIDENCE (in appropriate use of their data)
     • and sometimes… IP
     • but always… FUNDING
  13. WHAT DO RESEARCHERS WANT? DEMAND SIDE
     • Easy reuse of their own data
     • Easy discovery of and access to outside data
     • Easier integration/interoperability of their own and others’ data (i.e. “re-purposing”)
  14. HOW CAN RESEARCHERS ACHIEVE THAT?
     • Standard copyright licenses or waivers
     • Standard terms & conditions (EULA)
     • … via their institutional repository!
     • Researchers want good advice and have zero interest in complex legal issues
     • IRs can establish practices that help researchers achieve their goals with low effort
  15. DMP BOILERPLATE
     • Sharing.
     • Project data will be made publicly accessible/downloadable from the university’s data archive website (via a standard Web UI) as … Once located on the archive website, image sets will be downloadable via standard Web protocols (i.e. HTTP). Included in the associated metadata for each image set will be rights information such as copyright and licensing terms for use and reuse of the data. Each image set will be assigned a unique, persistent URI (web identifier, resolvable as a URL) for use in citations. The university’s data archive uses Handles for persistent URIs.
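
As a rough illustration of the sharing boilerplate above, the Python sketch below models one image set's metadata with its rights information and a Handle-based persistent URI. The field names, the handle value `12345/678`, and the citation layout are hypothetical choices made for this example; only the public Handle proxy address, https://hdl.handle.net/, is real.

```python
from dataclasses import dataclass

# Public proxy that resolves Handle identifiers to their current URL.
HANDLE_PROXY = "https://hdl.handle.net/"


@dataclass
class ImageSetRecord:
    """Hypothetical catalog record for one image set in the data archive."""
    title: str
    handle: str         # "prefix/suffix"; the value used below is made up
    license: str        # licensing terms for use and reuse
    rights_holder: str  # who must be credited

    @property
    def uri(self):
        """Persistent, resolvable URI to use in citations."""
        return HANDLE_PROXY + self.handle

    def citation(self, year):
        """One possible citation layout (an assumption, not a standard)."""
        return f"{self.rights_holder} ({year}). {self.title} [data set]. {self.uri}"


record = ImageSetRecord(
    title="Example image set",
    handle="12345/678",
    license="CC-BY",
    rights_holder="Principal Investigator",
)
print(record.citation(2011))
```

Because the citation carries the Handle rather than the archive's current web address, it should stay resolvable if the archive later reorganizes its site.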
  16. DMP BOILERPLATE
     • Licensing.
     • Images, even scientific research images generated by scanners, may be subject to copyright in the U.S., so images produced by the project will be collected and shared using a Creative Commons license, specifically CC-BY (i.e. with attribution to the copyright owner, who is the Principal Investigator for this project, with the approval of the university’s IP counsel). By using the CC-BY license, we are authorizing all interested researchers to use the image data produced by this project in whatever manner they choose, as long as they cite the Principal Investigator as the source of the data.
  17. DMP BOILERPLATE
     • Licensing, cont.
     • Metadata associated with the image sets will be released under a CC0 license (public domain dedication) since it is normally not copyrightable and we want it to be reusable in new contexts (e.g. Google indexes). With these licensing terms, future researchers will be able to combine the image data and associated metadata produced by this project with data produced from their own or other projects, to create super- or sub-sets of images needed for their own research (i.e. “derivative” datasets).
  18. DMP BOILERPLATE
     • In the university’s central data archive, researchers will be able to determine the rights assigned to the project’s data via the metadata displayed in the UI for the dataset (i.e. in the rights fields of the relevant catalog record for the dataset). The archive’s search interface supports filtering searches by rights category (e.g. Public Domain, CC-BY, embargoed) so that researchers can search for only data that they may reuse in their own research.
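
The rights-based filtering described above could look something like the following Python sketch. It is only an illustration: the record layout, the `filter_by_rights` helper, and the sample titles are hypothetical, and a real archive would run this filter inside its own search index. The category labels are the ones named in the boilerplate.

```python
# Hypothetical catalog records; "rights" mirrors the rights field of a catalog record.
records = [
    {"title": "Image set A", "rights": "CC-BY"},
    {"title": "Image set A metadata", "rights": "Public Domain"},
    {"title": "Image set B", "rights": "embargoed"},
]

# Assumption: these are the categories a researcher may reuse right away.
REUSABLE = {"Public Domain", "CC-BY"}


def filter_by_rights(records, allowed):
    """Keep only the records whose rights category is in `allowed`."""
    return [r for r in records if r["rights"] in allowed]


reusable_titles = [r["title"] for r in filter_by_rights(records, REUSABLE)]
print(reusable_titles)
```

The filter only works if every record uses a rights category from a small controlled list, which is one reason to standardize on a few licenses rather than per-dataset terms.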
  19. CONCLUSION
     • IRs serving as data archives can
        • Standardize institutional data policies
        • Encourage OA
        • Lower barriers for researchers to comply with NSF intent
     • DMPs
        • encourage use of IR
        • over time, reassure NSF of consistent practice