
Meeting the NSF DMP Requirement: March 7, 2012


March 7 version of the IUPUI workshop Meeting the NSF Data Management Plan Requirement: What you need to know. This workshop is co-sponsored by the Office of the Vice Chancellor for Research and the University Library.

Published in: Education, Technology

  2. 2. WHO ARE WE? Heather Coates, Digital Scholarship & Data Management Librarian, University Library; Kristi Palmer, Digital Libraries Team Leader, University Library
  3. 3. LEARNING OBJECTIVES After attending this workshop: You will understand the NSF data policies. You will be aware of the relevant data-related services at IUPUI. You will have resources to develop a data management plan (DMP) for your NSF proposal(s). You will be able to write a comprehensive DMP for your NSF proposal(s). You will send your DMP draft to the Data Services Program for review and assistance as needed.
  4. 4. OVERVIEW Context for the NSF data policies Meeting the NSF DMP requirement  The requirement: 5 elements  Developing a Data Management Plan  Implementing your plan Workshop Evaluation
  5. 5. CONTEXT: SCHOLARLY COMMUNICATIONS Funding agency requirements Scholarly Impact  Exposure  Increased citation  More equal access (especially for students)  Facilitates reproducibility  Facilitates new discoveries via secondary analysis/data re-use  Fosters productive collaborations  Leads to new computational techniques Planning for the future  If we can’t find it, it doesn’t exist  Persistent access  Long-term preservation
  6. 6. CONTEXT: WHY THE LIBRARY? preservation, curation, access Trusted member of the institution Organizational structure lends itself to collaboration with researchers Interdisciplinary by nature Existing infrastructure for digital information Existing expertise in preserving and providing access to information  Program of Digital Scholarship  Archives
  7. 7. CONTEXT: DATA SERVICES PROGRAM Part of the Program of Digital Scholarship Mission  Identifying data issues and connecting you to the solutions Services  Workshops  Individual consultations  Data repository Resources  Guide to NSF Data Management Plan Requirement  Website
  8. 8. CONTEXT: TERMINOLOGY Cyberinfrastructure: computing resources & networks, services, & people (see Empowering People, 2009 for more) Data management: technical processing and preparation of data for analysis Data curation: selection of data for preservation and adding value for current and future use Data citation: mechanisms to enable easy reuse and verification, track impact of data, and create structures to recognize and reward researchers (DataCite) Data sharing: must take into account ethical and legal issues; a spectrum with many options
  9. 9. CONTEXT: FEDERAL POLICIES Issues in scholarly communication  Open access  Open data & data citation  Data management & curation Federal policies (incremental steps towards openness)  National Research Council, 1985  Office of Management & Budget, 1999: Circular A-110  NIH Data Sharing Policy, 2003  NIH Public Access Policy, 2008  NSF DMP Requirement, 2011  Other policies: Wellcome Trust, Howard Hughes Medical Institute, NOAA, NEH
  10. 10. CONTEXT: IU STRATEGIC PLAN IU Empowering People Strategic Plan for IT (2009) Action 33: “IU should provision a data utility service for research data that affords abundant near- and long-term storage, ease of use, and preservation capabilities. This data utility will need to offer a range of services for securing data, providing authorized access within and beyond IU; ensuring metadata description, annotation, and provenance; and providing backup/recovery services.”
  11. 11. CONTEXT: OPEN ACCESS What is Open Access?  Freely available, online, and free of most copyright restrictions Why should you care?  Right thing to do?  Increase your citations  “We analysed 119,924 conference articles in computer science and related disciplines. The mean number of citations to offline articles is 2.74, and the mean number of citations to online articles is 7.03, an increase of 157%.” (Lawrence, 2001)  Publisher functions need not reside in for-profit hands  “Between 1975 and 2005 the average cost of journals in chemistry and physics rose from $76.84 to $1,879.56. In the same period, the cost of a gallon of unleaded regular gasoline rose from 55 cents to $1.82. If the gallon of gas had increased in price at the same rate as chemistry and physics journals over this period it would have reached $12.43 in 2005, and would be over $14.50 today.” (Lewis, 2008)
  12. 12. CONTEXT: OPEN ACCESS @ IUPUI IUPUI University Library Program of Digital Scholarship  Open Journals  IUPUIScholarWorks-Faculty Scholarship  Electronic Theses and Dissertations  Cultural Heritage Collections  Data  eArchives
  13. 13. CONTEXT: RESEARCH LIFE CYCLE Source: DDI Structural Reform Group. “DDI Version 3.0 Conceptual Model.” DDI Alliance. 2004. Accessed on 11 August 2008.
  14. 14. CONTEXT: BENEFITS OF PLANNING Saves time  Less reorganization down the road Increases efficiency  Gathers necessary information for analysis and writing  Prevents problems in understanding data and metadata Makes it easier to preserve your data Requirements from some funding agencies and institutions
  15. 15. DMP: THE REQUIREMENT Why?  Increased impact of research money  Reduce redundant data collection  Enhance use and value of existing data  Further scientific research Language is broad to allow input from research communities Implementation costs of the DMP CAN be included in direct costs
  16. 16. DMP: PRACTICAL TIPS The gist of it…  Describe what you will do with your data during and after the proposed project  Ensures data is safe now and in the future DMP should reflect…  Awareness of data management and curation in your discipline  Feasible plan to utilize available cyberinfrastructure Try to…  Explain the rationale for your choices  Identify roles for data management and curation activities
  17. 17. DMP: ELEMENTS Types of data Standards and metadata Access and sharing Re-use, re-distribution, and the production of derivatives Long-term preservation [Budget]
  18. 18. DMP: TYPES OF DATA [1] Use standards common in your research community Characterize the data to be generated or used  Types of data?  experimental, observational, raw or derived, models, simulations, curriculum materials, software, images, audio, video, etc.  What file formats will be used?  Text, spreadsheet, database, etc.  How will it be collected? (describe the process)  How much data?  Will the data be reproducible? How does the project relate to existing data?  If dataset will be combined, how to ensure interoperability?
  19. 19. DMP: TYPES OF DATA [2] How will data be collected?  How? (tools, instruments, measurements, etc.)  When? (timeframe, series)  Where? How will data be processed?  Workflows  Software packages How will the data be stored and managed?  File naming conventions  Version control
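A file naming convention is easier to follow when it is generated rather than typed by hand. The sketch below is only an illustration: the pattern `project_description_YYYYMMDD_vNN.ext`, the function name `build_filename`, and the sample values are all assumptions, not something the workshop or NSF prescribes. Adapt the pattern to whatever is common in your research community.

```python
import re
from datetime import date


def slug(text: str) -> str:
    """Lowercase the text and replace runs of non-alphanumerics with one hyphen."""
    return re.sub(r"[^a-z0-9]+", "-", text.lower()).strip("-")


def build_filename(project: str, description: str, collected: date,
                   version: int, ext: str) -> str:
    """Build a name following the (hypothetical) pattern
    project_description_YYYYMMDD_vNN.ext.

    Underscores separate the parts, hyphens separate words inside a part,
    and the zero-padded version number keeps versions sorting in order.
    """
    return (
        f"{slug(project)}_{slug(description)}_"
        f"{collected:%Y%m%d}_v{version:02d}.{ext.lstrip('.')}"
    )


if __name__ == "__main__":
    # Hypothetical project and dataset names, for illustration only.
    print(build_filename("Soil Study", "Site A moisture", date(2012, 3, 7), 2, ".csv"))
```

Putting the date in year-month-day order and padding the version number means a plain alphabetical listing also sorts the files chronologically and by version.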
  20. 20. DMP: TYPES OF DATA [3] What QA & QC measures will be used?  Identify steps during processing and analysis to eliminate bad data points  Examples: double data entry, data screening tests What is the backup and security plan?  Plan for particular security or confidentiality issues  Location & frequency Roles & responsibilities  Who will carry out data collection, processing, and backup activities?
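The two QA/QC examples named on this slide, double data entry and data screening tests, can each be sketched in a few lines. This is a minimal illustration under stated assumptions: the record layout (a list of dictionaries per entry pass), the function names, and the pH bounds are hypothetical, and a real project would set its own fields and plausible ranges.

```python
def compare_entries(first, second):
    """Double data entry check.

    `first` and `second` are lists of dicts, one dict per record, keyed in
    independently by two people. Returns a list of
    (row_index, field, first_value, second_value) for every disagreement.
    """
    if len(first) != len(second):
        raise ValueError("both passes must contain the same number of rows")
    mismatches = []
    for row, (a, b) in enumerate(zip(first, second)):
        # Check every field that appears in either pass.
        for field in sorted(set(a) | set(b)):
            if a.get(field) != b.get(field):
                mismatches.append((row, field, a.get(field), b.get(field)))
    return mismatches


def screen_range(values, low, high):
    """Data screening test: return the indices of values outside [low, high]."""
    return [i for i, v in enumerate(values) if not (low <= v <= high)]


if __name__ == "__main__":
    # Hypothetical records, for illustration only.
    pass_one = [{"id": "1", "ph": "6.8"}, {"id": "2", "ph": "7.1"}]
    pass_two = [{"id": "1", "ph": "6.8"}, {"id": "2", "ph": "7.7"}]
    print(compare_entries(pass_one, pass_two))
    print(screen_range([6.8, 7.1, 14.9], low=0, high=14))
```

Flagged rows are reported rather than corrected automatically, so a person can go back to the original record and decide which value is right.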
  21. 21. EXAMPLE: TYPES OF DATA Atmospheric Concentrations of CO2, Mauna Loa Observatory, Hawaii, 2011-2013 /sites/all/documents/DMP_MaunaLoa_Formatted.pdf Arthropod responses to grassland nutrient limitation /sites/all/documents/DMP_NutNet_Formatted.pdf
  22. 22. DMP: STANDARDS & METADATA [1] Metadata describes the who, what, when, where, how, why of the data Purpose of metadata is to enable finding, organization, interoperability, identification, archiving & preservation Standards are commonly agreed upon terms and definitions in a structured format
  23. 23. DMP: STANDARDS & METADATA [2] Will your datasets be self-explanatory or understandable in isolation? Decisions to make about metadata  Relevant standard(s)  Format  Content  What information is needed to use and interpret in 5 years, 25 years?  Ask your fellow researchers and check with data centers or repositories How are metadata created?  Automatically generated  Manually created
  24. 24. EXAMPLE: STANDARDS & METADATA [1] Atmospheric Concentrations of CO2, Mauna Loa Observatory, Hawaii, 2011-2013 /sites/all/documents/DMP_MaunaLoa_Formatted.pdf Metadata will be comprised of two formats: contextual information about the data in a text based document and ISO 19115 standard metadata in an xml file. These two formats for metadata were chosen to provide a full explanation of the data (text format) and to ensure compatibility with international standards (xml format). The standard XML file will be more complete; the document file will be a human-readable summary of the XML file.
  25. 25. EXAMPLE: STANDARDS & METADATA [2] Rio Grande Hydrologic Geodatabase Compendium https://www.dataone.org/sites/all/documents/DMP_Hydrologic_Formatted.pdf Microsoft Access Database format will be used since it is readily accessible and it is compatible with ESRI ArcGIS (http://www.esri.com/software/arcgis/index.html), a Geographic Information System software package used by the stakeholders. Naming conventions will be consistent: no spaces will be used in table names or field names. The file naming convention will consist of the datasource_datatype format for raw data files. Data reporting functionality will be built into the VBA processing programs to provide output in .txt file format for number of records per source when updatable data sources are refreshed. Every effort will be made to go back to the authoritative source for an identified dataset. Quality control of the database will be performed using SQL statements that capitalize on the database structure to ensure relational database integrity. Appropriate primary keys will be assigned to manage possible data duplicates. Potential duplicate site IDs will be handled through automated procedures and the creation of alternate ID tables. A data dictionary will be created that defines the table definition, table fields, and table field data types. An entity-relationship diagram will be created that defines the relational structure of the database. A metadata record will be produced using the FGDC standard that describes the entire geodatabase. The FGDC standard was chosen due to required Federal government standards.
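The naming rules quoted in this example plan (no spaces in table or field names, and a datasource_datatype pattern for raw data files) are the kind of convention that a short script can check. The sketch below is an assumption-laden illustration, not the plan authors' method: it assumes each of the two name parts is purely alphanumeric and that exactly one underscore joins them, and the sample names are hypothetical.

```python
import re

# Assumed reading of "datasource_datatype": two alphanumeric parts
# joined by a single underscore.
RAW_FILE_PATTERN = re.compile(r"^[A-Za-z0-9]+_[A-Za-z0-9]+$")


def has_no_spaces(name: str) -> bool:
    """True if a table or field name contains no whitespace at all."""
    return not any(ch.isspace() for ch in name)


def follows_raw_file_convention(filename: str) -> bool:
    """True if the file name, minus its extension, matches the assumed
    datasource_datatype pattern."""
    stem = filename.rsplit(".", 1)[0]
    return bool(RAW_FILE_PATTERN.match(stem))


if __name__ == "__main__":
    # Hypothetical names, for illustration only.
    for table in ["SiteLocations", "Site Locations"]:
        print(table, has_no_spaces(table))
    for raw in ["usgs_streamflow.csv", "usgs streamflow.csv", "streamflow.csv"]:
        print(raw, follows_raw_file_convention(raw))
```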
  26. 26. DMP: ACCESS & SHARING What are your obligations for sharing?  Funding agency, institution, other organization, legal, etc. What are the ethical or legal issues? (e.g., privacy, confidentiality, security, intellectual property, or other rights) How will the data be made available?  What is the process for gaining access? When will the data be made available?  When will the data become available?  For how long will the data be available? Who will have access to the data?
  27. 27. DMP: RE-USE, RE-DISTRIBUTION, ETC. What rights will you retain before data is made available? Will permission restrictions be necessary?  Limits or conditions for political, commercial, or patent reasons? Is there an embargo period? Why? Future users and uses  Who might be interested in the data?  How might you anticipate this data being used?  What value might the data have for these people?
  28. 28. EXAMPLE: ACCESS, SHARING, RE-USE Development of a NanoKlein Calorimeter: We expect to apply for a patent for this instrument. All of the materials submitted as part of the patent process will be a matter of public record. We will also make technical drawings, test data and calibration data available through our institutional repository. Cave Microbiology
  29. 29. DMP: LONG-TERM PRESERVATION What data will be preserved?  What transformations are necessary to prepare the data? How long do you think the data will be useful? How long will the data be preserved? Contextual information needed to make the data reusable  metadata, references, reports, manuscripts, grant proposal, etc. Where will it be preserved?  Links to published materials and other outcomes? Use of persistent citation?  Procedures for preservation and back-up? Who will be the contact for the dataset?
  30. 30. EXAMPLE: LONG-TERM PRESERVATION [1] Arthropod responses to grassland nutrient limitation /sites/all/documents/DMP_NutNet_Formatted.pdf We will preserve both arthropod datasets generated during this project (abundance and stoichiometry) for the long term in the Digital Conservancy at the U of M. We will include the .csv files, along with the associated metadata files. We will also submit an abstract with the datasets that describe their original context and any potentially relevant project information. Borer will be responsible for preparing data for long-term preservation and for updating contact information for investigators.
  31. 31. EXAMPLE: LONG-TERM PRESERVATION [2] Improving the long-term preservability of HDF-formatted data by creating maps to file contents /sites/all/documents/DMP_HDFMap_Formatted.pdf The writer software will be preserved by the HDF Group for the life of the HDF libraries. The HDF Group uses industry-standard best practices to ensure the integrity of their software and systems. Once the map writer has been used to generate maps for every HDF file in existence, the continued existence of the writer software is not required. The reader software will be preserved for as long as there is community interest. The collection of HDF files will be preserved at NSIDC as long as utility is deemed high.
  32. 32. IMPLEMENTING YOUR PLAN [1] The DMP is a working document NSF expects progress to be reported Incorporate implementation into the project startup process  C&G, IRB, IACUC all have to be in place before data collection can begin  Review, revise, and set up your system during startup Good documentation ensures…  A shared understanding of the data throughout a project  That future researchers will be able to understand data within the relevant context  That re-users of data are able to interpret the data appropriately Resources for backing up data during a project  Research File System:  Scholarly Data Archive:
  33. 33. IMPLEMENTING YOUR PLAN [2] Program of Digital Scholarship: for Research & Learning: of Academic Affairs: http://www.academicaffairs.iupui.edu Intellectual Property Policy: File System: Data Archive: Technologies, UITS: Services, UITS: Cyberinfrastructure, UITS: TSI Tools: /rct (Alfresco Share, REDCap) IUWare: https://iuware.iu.edu IUanyWare: Consulting Center:
  34. 34. RESOURCES [1] Data Services Program site: Science Board, Digital Research Data Sharing & Management, 2012 (pre-publication): Institutes of Health, Data Sharing Policy /data_sharing_guidance.htm NIH Public Access Policy Implications New Employee Compliance Orientation (NECO)
  35. 35. RESOURCES [2] UK Data Archive: Managing & Sharing Data Brochure: Data Archive Costing Tool: Commons Licenses & Data: /DataLicensing Research Data, Digital Curation Centre -guides/license-research-data CIC Author Addendum /DMPOnline:
  36. 36. COMPELLING CASES FOR OPEN DATA Tim Berners-Lee: cancer research: problem blogs: /about/ -data-success-story.html -economics-of-open-data-mini-case-transit-data-translink/
  37. 37. REFERENCES 1. Higgins, S. (n.d.). What are metadata standards. http://ww resources/briefing-papers/standards-watch-papers/what-are-metadata-standards 2. Digital Curation Centre. (n.d.). DCC Charter and Statement of Principles. Retrieved from http://ww -us/dcc-charter. 3. Indiana University. (2011). Indiana University’s Advanced Cyberinfrastructure. Retrieved from rastructure.pdf. 4. Indiana University. (2009). Empowering People: Indiana University’s Strategic Plan for Information Technology. Retrieved from http://ovpit.iu.edu/strategic2/. 5. National Science Foundation. (2011). Award and Administration Guide: Chapter IV C.4., Dissemination and Sharing of Research Results. Retrieved from http://www.nsf.gov/pubs/policydocs/pappguide/nsf11001/aag_6.jsp#VID4. 6. Lawrence, S., Free online availability substantially increases a paper’s impact, Nature, 31 May 2001. http://www.nature.com/nature/debates/e-access/Articles/lawrence.html (accessed November 5, 2008) 7. Lewis, David W. "Library budgets, open access, and the future of scholarly communication: Transformations in academic publishing." C&RL News, May 2008, Vol. 69, No. 5. [Available at: http://ww /ala/mgrps/divs/acrl/publications/crlnews/2008/may/ALA_print_layout_1_471139_471139.cfm]
  38. 38. THANK YOU Tell us what you think: take a brief survey. Find us @ Heather Coates, 317-278-7125; Kristi Palmer, 317-274-8230
  39. 39. EXTRA: NIH DATA SHARING POLICY $500,000 or more in direct costs in any year of the proposed research Final research data, not summary statistics or tables, not underlying pathology reports and other clinical source documents, might include both raw data and derived variables If an application describes a data-sharing plan, NIH expects that plan to be enacted. NIH expects the timely release and sharing of data to be no later than the acceptance for publication of the main findings from the final dataset. It is the responsibility of the investigators, their Institutional Review Board (IRB), and their institution to protect the rights of subjects and the confidentiality of the data. Prior to sharing, data should be redacted to strip all identifiers, and effective strategies should be adopted to minimize risks of unauthorized disclosure of personal identifiers.
  40. 40. EXTRA: NIH DATA SHARING PLAN describe briefly the expected schedule for data sharing the format of the final dataset the documentation to be provided whether or not any analytic tools also will be provided whether or not a data-sharing agreement will be required  if so, a brief description of such an agreement (including the criteria for deciding who can receive the data and whether or not any conditions will be placed on their use) mode of data sharing (e.g., under their own auspices by mailing a disk or posting data on their institutional or personal website, through a data archive or enclave) Applicants may request funds in their application for data sharing.