Meeting the NSF DMP Requirement June 13, 2012


June 13 version of the IUPUI workshop Meeting the NSF Data Management Plan Requirement: What you need to know. This workshop is co-sponsored by the Office of the Vice Chancellor for Research and the University Library.

Published in: Education, Technology

  2. 2. WHO ARE WE? Heather Coates: Digital Scholarship & Data Management Librarian; Liaison to the School of Public Health; University Library. Kristi Palmer: Digital Scholarship Team Leader; Liaison to the Department of History and Programs of Women's and American Studies; University Library.
  3. 3. LEARNING OBJECTIVES After attending this workshop: You will understand the NSF data policies. You will be aware of the relevant data-related services at IUPUI. You will have resources to develop a data management plan (DMP) for your NSF proposal(s). You will be able to write a comprehensive DMP for your NSF proposal(s). You will send your DMP draft to the Data Services Program for review and assistance as needed.
  4. 4. OVERVIEW Context for the NSF data policies Meeting the NSF DMP requirement  The requirement: 5 elements  Developing a Data Management Plan  Implementing your plan Workshop Evaluation (5 minutes)
  5. 5. CONTEXT: SCHOLARLY COMMUNICATIONS Funding, funding, funding Scholarly Impact  Exposure  increased citation  More equal access (especially for students)  Facilitates reproducibility  Facilitate new discoveries via secondary analysis/data re-use  Foster productive collaborations  Lead to new computational techniques Planning for the future  If we can’t find it, it doesn’t exist  Persistent access  Long-term preservation of scholarly records
  6. 6. CONTEXT: WHY THE LIBRARY? preservation, curation, access Trusted member of the institution Organizational structure lends itself to collaboration with researchers Interdisciplinary by nature Existing infrastructure for digital information Existing expertise in preserving and providing access to information  Program of Digital Scholarship  Archives
  7. 7. CONTEXT: DATA SERVICES PROGRAM Part of the Program of Digital Scholarship Mission  Identifying data issues and connecting you to the solutions Services  Workshops  Individual consultations  Data repository Resources  Guide to NSF Data Management Plan Requirement  Website
  8. 8. CONTEXT: TERMINOLOGY Cyberinfrastructure: computing resources & networks, services, & people (see Empowering People, 2009 for more) Data management: technical processing and preparation of data for analysis Data curation: selection of data for preservation and adding value for current and future use Data citation: mechanisms to enable easy reuse and verification, track impact of data, and create structures to recognize and reward researchers (DataCite) Data sharing: must take into account ethical and legal issues; a spectrum with many options Data stewardship:
  9. 9. CONTEXT: FEDERAL POLICIES Issues in scholarly communication  Open access  Open data & data citation  Data management & curation Federal policies (incremental steps towards openness)  National Research Council, 1985  Office of Management & Budget, 1999: Circular A-110  NIH Data Sharing Policy, 2003  NIH Public Access Policy, 2008  NSF DMP Requirement, 2011  Other policies: NEH, NOAA, NASA, Howard Hughes Medical Institute, Wellcome Trust
  10. 10. CONTEXT: IU STRATEGIC PLAN IU Empowering People Strategic Plan for IT (2009), Action 33: “IU should provision a data utility service for research data that affords abundant near- and long-term storage, ease of use, and preservation capabilities. This data utility will need to offer a range of services for securing data, providing authorized access within and beyond IU; ensuring metadata description, annotation, and provenance; and providing backup/recovery services.”
  11. 11. CONTEXT: OPEN ACCESS What is Open Access?  Freely available, online, and free of most copyright restrictions Why should you care?  Right thing to do?  Increase your citations  “We analysed 119,924 conference articles in computer science and related disciplines. The mean number of citations to offline articles is 2.74, and the mean number of citations to online articles is 7.03, an increase of 157%.” (Lawrence, 2008)  Publisher functions need not reside in for profit hands  "Between 1975 and 2005 the average cost of journals in chemistry and physics rose from $76.84 to $1,879.56. In the same period, the cost of a gallon of unleaded regular gasoline rose from 55 cents to $1.82. If the gallon of gas had increased in price at the same rate as chemistry and physics journals over this period it would have reached $12.43 in 2005, and would be over $14.50 today.” (Lewis, 2008)
  12. 12. CONTEXT: OPEN ACCESS @ IUPUI AND IU IUPUI University Library Program of Digital Scholarship  Open Journals  IUPUIScholarWorks-Faculty Scholarship  Electronic Theses and Dissertations  Cultural Heritage Collections  Data  eArchives
  13. 13. CONTEXT: RESEARCH LIFE CYCLE Source: DDI Structural Reform Group. “DDI Version 3.0 Conceptual Model.” DDI Alliance. 2004. Accessed on 11 August 2008.
  14. 14. CONTEXT: BENEFITS OF PLANNING Saves time  Less reorganization down the road Increases efficiency  Gathers necessary information for analysis and writing  Prevents problems in understanding data and metadata Prevents data loss  If you have a plan, you are more likely to back up your data Makes it easier to preserve your data  Documentation is more easily created throughout a project  Metadata generation can be automated or incorporated into procedures Requirements of some funding agencies and institutions
  15. 15. DMP: INTERPRETING THE POLICY Why?  Increased impact of research money  Reduce redundant data collection  Enhance use and value of existing data  Further scientific research  Data gathering tool  What kinds of data are we collecting?  How are researchers collecting, managing, and preserving data?  What are community norms? Language is broad to allow input from research communities Implementation costs of the DMP CAN be included in direct costs
  16. 16. DMP: KEEP IN MIND The gist of it…  Describe what you will do with your data during and after the proposed project  Ensures data is safe now and in the future DMP should reflect…  Awareness of data management and curation in your discipline  Feasible plan to utilize available cyberinfrastructure Try to…  Explain the rationale for your choices  Identify roles for data management and curation activities
  17. 17. DMP: ELEMENTS Types of data Standards and metadata Access and sharing Re-use, re-distribution, and the production of derivatives Long-term preservation [Budget]
  18. 18. DMP: TYPES OF DATA [1] Use standards common in your research community Characterize the data  Types of data  experimental, observational, raw or derived, models, simulations, curriculum materials, software, images, audio, video, etc.  File formats (e.g., text, spreadsheet, database)  How much data? (# of files, total size)  Will the data be reproducible? Relationship to existing data? (i.e., interoperability)  Syntactic  Semantic
  19. 19. DMP: TYPES OF DATA [2] How will data be collected?  How? (tools, instruments, measurements, etc.)  When? (timeframe, series)  Where? (sites, settings) How will data be processed?  Workflows (brief overview using flow chart)  Software packages How will the data be stored and managed?  File naming conventions  Version control
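The file naming and version control points above can be made concrete with a small script. The following is only a minimal sketch: the pattern `<project>_<datatype>_<YYYYMMDD>_v<NN>.<ext>` and the function names `build_name` and `parse_name` are hypothetical illustrations, not a convention prescribed by NSF or IUPUI.

```python
import re
from datetime import date

# Hypothetical convention (an assumption for illustration):
#   <project>_<datatype>_<YYYYMMDD>_v<NN>.<ext>
NAME_PATTERN = re.compile(
    r"^(?P<project>[a-z0-9]+)_(?P<datatype>[a-z0-9]+)_"
    r"(?P<date>\d{8})_v(?P<version>\d{2})\.(?P<ext>[a-z0-9]+)$"
)


def build_name(project, datatype, collected, version, ext):
    """Build a lowercase, space-free file name with a date and a two-digit version."""
    name = (
        f"{project.lower()}_{datatype.lower()}_"
        f"{collected:%Y%m%d}_v{version:02d}.{ext.lower()}"
    )
    # Reject inputs (spaces, punctuation, versions above 99) that break the pattern.
    if NAME_PATTERN.match(name) is None:
        raise ValueError(f"name breaks the convention: {name!r}")
    return name


def parse_name(filename):
    """Return the parts of a conforming name as a dict, or None if it does not conform."""
    match = NAME_PATTERN.match(filename)
    return match.groupdict() if match else None


if __name__ == "__main__":
    name = build_name("soilstudy", "ph", date(2012, 6, 13), 1, "csv")
    print(name)
    print(parse_name(name))
    print(parse_name("Final data (2).csv"))
```

Whatever convention a project chooses, writing it down as a checkable pattern makes it easy to describe in the DMP and to enforce during the project.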
  20. 20. DMP: TYPES OF DATA [3] What QA & QC measures will be used?  Identify steps during processing and analysis to eliminate missing data points, identify outliers, and provide statistical summaries (e.g., double data entry, histograms, scatterplots)  Before data are collected, define and enforce standards and assign responsibility  During project, document processes and any changes or deviations What is the backup and security plan?  Identify particular security or confidentiality issues  Describe location & frequency Roles & responsibilities  Who will carry out data collection, processing, and backup activities?
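As an illustration of the QA & QC steps listed above (counting missing data points, flagging outliers, and producing statistical summaries), here is a minimal sketch using only the Python standard library. The function name `qc_summary` and the 1.5 × IQR outlier rule are assumptions chosen for the example; a project should use whatever checks are standard in its discipline.

```python
import statistics


def qc_summary(values):
    """Summarize one numeric column; None marks a missing data point.

    Needs at least two non-missing values. Outliers are flagged with the
    common 1.5 * IQR rule, which is an assumption made for this sketch.
    """
    present = [v for v in values if v is not None]
    q1, _median, q3 = statistics.quantiles(present, n=4)
    iqr = q3 - q1
    low, high = q1 - 1.5 * iqr, q3 + 1.5 * iqr
    return {
        "n_total": len(values),
        "n_missing": len(values) - len(present),
        "mean": statistics.mean(present),
        "stdev": statistics.stdev(present),
        "outliers": [v for v in present if v < low or v > high],
    }


if __name__ == "__main__":
    # Made-up readings with one missing value and one suspicious value.
    readings = [10.1, 9.8, None, 10.3, 10.0, 55.0, 9.9]
    print(qc_summary(readings))
```

The sketch only flags values for review. Whether a flagged point is corrected, excluded, or kept is a decision to document as part of the project's processes.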
  21. 21. EXAMPLE: TYPES OF DATA Atmospheric Concentrations of CO2, Mauna Loa Observatory, Hawaii, 2011-2013 /sites/all/documents/DMP_MaunaLoa_Formatted.pdf Arthropod responses to grassland nutrient limitation /sites/all/documents/DMP_NutNet_Formatted.pdf
  22. 22. DMP: STANDARDS & METADATA [1] Metadata describes the who, what, when, where, how, why of the data  Include workflow: how you get from raw data to final products Purpose: enable finding, organization, interoperability, identification, archiving & preservation Standards are commonly agreed upon terms and definitions in a structured format  Dublin Core (commonly used by libraries)  Darwin Core (geographic occurrence of species)  EML (ecology)  Data Documentation Initiative (DDI; social sciences)  IEEE LOM (learning objects metadata)
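To show what a structured metadata record can look like, the sketch below writes a few Dublin Core elements as XML with Python's standard library. The wrapper element `metadata`, the function name `dublin_core_record`, and all field values are hypothetical; only the 15 element names and the Dublin Core namespace come from the standard itself.

```python
import xml.etree.ElementTree as ET

DC_NS = "http://purl.org/dc/elements/1.1/"
ET.register_namespace("dc", DC_NS)

# The 15 elements of the Dublin Core Metadata Element Set.
DC_ELEMENTS = {
    "title", "creator", "subject", "description", "publisher",
    "contributor", "date", "type", "format", "identifier",
    "source", "language", "relation", "coverage", "rights",
}


def dublin_core_record(fields):
    """Serialize a dict of Dublin Core element names and values to an XML string."""
    root = ET.Element("metadata")  # wrapper element name is an assumption
    for name, value in fields.items():
        if name not in DC_ELEMENTS:
            raise ValueError(f"not a Dublin Core element: {name!r}")
        child = ET.SubElement(root, f"{{{DC_NS}}}{name}")
        child.text = value
    return ET.tostring(root, encoding="unicode")


if __name__ == "__main__":
    # All values below are made up for illustration.
    print(dublin_core_record({
        "title": "Example dataset",
        "creator": "Example Researcher",
        "date": "2012-06-13",
        "format": "text/csv",
    }))
```

A repository will usually define the exact record structure it accepts, so treat this as an illustration of the idea and not as a deposit-ready format.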
  23. 23. DMP: STANDARDS & METADATA [2] Ask yourself: will your datasets be self-explanatory or understandable in isolation? Decisions to make about metadata  Relevant standard(s)  Format  Content  What information is needed to use and interpret in 5 years, 25 years? How are metadata created?  Automatically generated  Manually created
  24. 24. EXAMPLE: STANDARDS & METADATA [1] Atmospheric Concentrations of CO2, Mauna Loa Observatory, Hawaii, 2011-2013 /sites/all/documents/DMP_MaunaLoa_Formatted.pdf Metadata will be comprised of two formats: contextual information about the data in a text based document and ISO 19115 standard metadata in an xml file. These two formats for metadata were chosen to provide a full explanation of the data (text format) and to ensure compatibility with international standards (xml format). The standard XML file will be more complete; the document file will be a human-readable summary of the XML file.
  25. 25. EXAMPLE: STANDARDS & METADATA [2] Rio Grande Hydrologic Geodatabase Compendium https://www.dataone.org/sites/all/documents/DMP_Hydrologic_Formatted.pdf Microsoft Access Database format will be used since it is readily accessible and it is compatible with ESRI ArcGIS (http://www.esri.com/software/arcgis/index.html), a Geographic Information System software package used by the stakeholders. Naming conventions will be consistent – no spaces will be used in table names or field names. The file naming convention will consist of the datasource_datatype format for raw data files. Data reporting functionality will be built into the VBA processing programs to provide output in .txt file format for number of records per source when updatable data sources are refreshed. Every effort will be made to go back to the authoritative source for an identified dataset. Quality control of the database will be performed using SQL statements that capitalize on the database structure to ensure relational database integrity. Appropriate primary keys will be assigned to manage possible data duplicates. Potential duplicate site IDs will be handled through automated procedures and the creation of alternate ID tables. A data dictionary will be created that defines the table definition, table fields, and table field data types. An entity-relationship diagram will be created that defines the relational structure of the database. A metadata record will be produced using the FGDC standard that describes the entire geodatabase. The FGDC standard was chosen due to required Federal government standards.
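The quality-control ideas in this example (no spaces in table or field names, and SQL statements that find duplicate site IDs) can be sketched as follows. This is an illustration only: it uses Python's built-in SQLite module in place of the Microsoft Access and VBA setup the plan describes, and the table name `staging_sites`, the column names, and the function names are hypothetical.

```python
import sqlite3


def names_with_spaces(names):
    """Return the table or field names that break the no-spaces rule."""
    return [name for name in names if any(ch.isspace() for ch in name)]


def find_duplicate_site_ids(rows):
    """Return (site_id, count) for every site ID that occurs more than once.

    rows is an iterable of (site_id, source) tuples, as they might arrive
    from several data sources before primary keys are assigned.
    """
    conn = sqlite3.connect(":memory:")
    try:
        conn.execute("CREATE TABLE staging_sites (site_id TEXT, source TEXT)")
        conn.executemany("INSERT INTO staging_sites VALUES (?, ?)", rows)
        return conn.execute(
            "SELECT site_id, COUNT(*) FROM staging_sites "
            "GROUP BY site_id HAVING COUNT(*) > 1 ORDER BY site_id"
        ).fetchall()
    finally:
        conn.close()


if __name__ == "__main__":
    # Made-up names and rows for illustration.
    print(names_with_spaces(["site_id", "flow rate", "sample_date"]))
    print(find_duplicate_site_ids([("A1", "agency_x"), ("B2", "agency_x"), ("A1", "agency_y")]))
```

Checks like these can be rerun each time an updatable data source is refreshed, which fits the plan's intent of automated handling for duplicates.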
  26. 26. DMP: ACCESS & SHARING What are your obligations for sharing?  Funding agency, institution, other organization, legal, etc. What are the ethical or legal issues? (e.g., privacy, confidentiality, security, intellectual property, or other rights) How will the data be made available? What is the process for gaining access? When will the data be made available?  When will the data become available?  For how long will the data be available? Who will have access to the data?
  27. 27. DMP: RE-USE, RE-DISTRIBUTION, ETC. What rights will you retain before data is made available? Will permission restrictions be necessary?  Limits or conditions for political, commercial, or patent reasons? Is there an embargo period? Why? Future users and uses  Who might be interested in the data?  How might you anticipate this data being used?  What value might the data have for these people?
  28. 28. EXAMPLE: ACCESS, SHARING, RE-USE Development of a NanoKlein Calorimeter: We expect to apply for a patent for this instrument. All of the materials submitted as part of the patent process will be a matter of public record. We will also make technical drawings, test data and calibration data available through our institutional repository. Cave Microbiology
  29. 29. DMP: LONG-TERM PRESERVATION Project-based funding does not lend itself to long-term preservation. What data will be preserved?  What transformations are necessary to prepare the data? How long do you think the data will be useful? How long will the data be preserved? Contextual information needed to make the data reusable  metadata, references, reports, manuscripts, grant proposal, etc. Where will it be preserved?  Links to published materials and other outcomes? Use of persistent citation?  Procedures for preservation and back-up? Who will be the contact for the dataset?
  30. 30. EXAMPLE: LONG-TERM PRESERVATION [1] Arthropod responses to grassland nutrient limitation /sites/all/documents/DMP_NutNet_Formatted.pdf We will preserve both arthropod datasets generated during this project (abundance and stoichiometry) for the long term in the Digital Conservancy at the U of M. We will include the .csv files, along with the associated metadata files. We will also submit an abstract with the datasets that describe their original context and any potentially relevant project information. Borer will be responsible for preparing data for long-term preservation and for updating contact information for investigators.
  31. 31. EXAMPLE: LONG-TERM PRESERVATION [2] Improving the long-term preservability of HDF-formatted data by creating maps to file contents /sites/all/documents/DMP_HDFMap_Formatted.pdf The writer software will be preserved by the HDF Group for the life of the HDF libraries. The HDF Group uses industry-standard best practices to ensure the integrity of their software and systems. Once the map writer has been used to generate maps for every HDF file in existence, the continued existence of the writer software is not required. The reader software will be preserved for as long as there is community interest. The collection of HDF files will be preserved at NSIDC as long as utility is deemed high.
  32. 32. IUPUIDATAWORKS Institutional repository that can facilitate subject repositories Policies are being developed, informed by faculty needs Pilot projects  More support at little/no cost  Flexibility in what we are willing to do  New tools to demonstrate impact of research The future  Standardized levels of service  Standardized policies, responsive to faculty needs  Cost recovery for significant intellectual/time investment
  33. 33. IMPLEMENTING YOUR PLAN [1] The DMP is a working document NSF expects progress to be reported (progress reports, final reports, new grant proposals) Incorporate implementation into the project startup process  C&G, IRB, IACUC all have to be in place before data collection can begin  Review, revise, and set up your system during startup Good documentation ensures…  A shared understanding of the data throughout a project  That future researchers will be able to understand data within the relevant context  That re-users of data are able to interpret the data appropriately
  34. 34. IMPLEMENTING YOUR PLAN [2] Research File System: Data Archive: Technologies, UITS: Services, UITS: Cyberinfrastructure, UITS: TSI Tools: /rct (Alfresco Share, REDCap) Program of Digital Scholarship: for Research & Learning: of Academic Affairs: http://www.academicaffairs.iupui.edu Intellectual Property Policy: academicguide/index.php/Policy_I-11 IUWare: https://iuware.iu.edu IUanyWare: Consulting Center:
  35. 35. PRACTICAL tutorials: Cleaning Up Your Excel Data (2010) Managing & Analyzing Data in Excel (2010) Data Validation in Depth (2010) DMPTool: /DMPOnline: Data Archive Costing Tool: Commons Licenses & Data: /DataLicensing Research Data, Digital Curation Centre -guides/license-research-data CIC Author Addendum
  36. 36. RECOMMENDED READING UK Data Archive: Managing & Sharing Data Brochure:
  37. 37. MORE RESOURCES National Science Board, Digital Research Data Sharing & Management, 2012 (pre-publication): Committee on Science, Engineering, and Public Policy (U.S.). (2009). Ensuring the integrity, accessibility, and stewardship of research data in the digital age. Washington, D.C.: National Academies Press. National Science Board Committee on Strategy and Budget Task Force on Data Policies. (2011). Digital Research Data Sharing & Management. Washington, D.C.: National Science Board. America Creating Opportunities to Meaningfully Promote Excellence in Technology, Education, and Science Reauthorization Act of 2010, Pub. L. No. 111-358, 124 Stat. 3982 (2010). Retrieved from the Library of Congress Thomas database.
  38. 38. REFERENCES 1. Higgins, S. (n.d.). What are metadata standards. http://ww resources/briefing-papers/standards-watch-papers/what-are-metadata-standards 2. Digital Curation Centre. (n.d.). DCC Charter and Statement of Principles. Retrieved from http://ww -us/dcc-charter. 3. Indiana University. (2011). Indiana University’s Advanced Cyberinfrastructure. Retrieved from rastructure.pdf. 4. Indiana University. (2009). Empowering People: Indiana University’s Strategic Plan for Information Technology. Retrieved from http://ovpit.iu.edu/strategic2/. 5. National Science Foundation. (2011). Award and Administration Guide: Chapter IV C.4., Dissemination and Sharing of Research Results. Retrieved from http://www.nsf.gov/pubs/policydocs/pappguide/nsf11001/aag_6.jsp#VID4. 6. Lawrence, S., Free online availability substantially increases a paper’s impact, Nature, 31 May 2001. http://www.nature.com/nature/debates/e-access/Articles/lawrence.html (accessed November 5, 2008). 7. Lewis, David W. "Library budgets, open access, and the future of scholarly communication: Transformations in academic publishing." C&RL News, May 2008, Vol. 69, No. 5. [Available at: http://ww /ala/mgrps/divs/acrl/publications/crlnews/2008/may/ALA_print_layout_1_471139_471139.cfm]
  39. 39. COMPELLING CASES FOR OPEN DATA SPARC, Research is more valuable when it’s shared: /sparc/greaterreach/index.shtml Tim Berners-Lee: cancer research: problem blogs: /about/ -data-success-story.html -economics-of-open-data-mini-case-transit-data-translink/
  40. 40. THANK YOU Tell us what you think, take a brief survey. Find us @ Coates, 317-278-7125; Kristi Palmer, 317-274-8230; IUB: Stacy Konkiel, 812-856-5295
  41. 41. EXTRA: NIH DATA SHARING POLICY $500,000 or more in direct costs in any year of the proposed research Final research data, not summary statistics or tables, not underlying pathology reports and other clinical source documents, might include both raw data and derived variables If an application describes a data-sharing plan, NIH expects that plan to be enacted. NIH expects the timely release and sharing of data to be no later than the acceptance for publication of the main findings from the final dataset. It is the responsibility of the investigators, their Institutional Review Board (IRB), and their institution to protect the rights of subjects and the confidentiality of the data. Prior to sharing, data should be redacted to strip all identifiers, and effective strategies should be adopted to minimize risks of unauthorized disclosure of personal identifiers.
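The redaction step mentioned above can be illustrated with a minimal sketch that drops direct-identifier fields from each record before sharing. The field list and the function name `strip_identifiers` are hypothetical. Removing named columns addresses only direct identifiers; it does not cover indirect identifiers or combinations of fields, which is why the policy also calls for further strategies to minimize disclosure risk.

```python
# Hypothetical list of direct-identifier field names, for illustration only.
DIRECT_IDENTIFIERS = frozenset({"name", "email", "phone", "ssn", "address"})


def strip_identifiers(records, identifiers=DIRECT_IDENTIFIERS):
    """Return copies of the records without the listed identifier fields.

    Field names are compared case-insensitively; the input is not modified.
    """
    return [
        {key: value for key, value in record.items() if key.lower() not in identifiers}
        for record in records
    ]


if __name__ == "__main__":
    # Made-up record for illustration.
    raw = [{"Name": "Example Person", "email": "person@example.org", "age": 42, "score": 3.5}]
    print(strip_identifiers(raw))
```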
  42. 42. EXTRA: NIH DATA SHARING PLAN describe briefly the expected schedule for data sharing the format of the final dataset the documentation to be provided whether or not any analytic tools also will be provided whether or not a data-sharing agreement will be required  if so, a brief description of such an agreement (including the criteria for deciding who can receive the data and whether or not any conditions will be placed on their use) mode of data sharing (e.g., under their own auspices by mailing a disk or posting data on their institutional or personal website, through a data archive or enclave) Applicants may request funds in their application for data sharing.
  43. 43. RESOURCES National Institutes of Health, Data Sharing Policy /data_sharing_guidance.htm NIH Public Access Policy Implications