June 13 version of the IUPUI workshop Meeting the NSF Data Management Plan Requirement: What you need to know. This workshop is co-sponsored by the Office of the Vice Chancellor for Research and the University Library.
Housekeeping: hold questions until the end; make sure everyone has handouts.
Resources:
- Slides
- DSP Guide to NSF DMP
- NSF Policy language handout
- CIC Author Addendum
We’re going to spend the majority of our time today walking through practical tips and examples for each section of the DMP, but there is important background information you need to know first.
The NSF data sharing policy and data management plan requirement came about within the context of broader discussions about how information is disseminated in the sciences, so we’ll quickly review that discussion before getting into the practical steps of developing a DMP. We want to accomplish two things:
- We want to prepare you to engage in discussions about the scholarly record and how research products are disseminated, specifically your rights and options.
- We want to give you the information you need to make informed decisions regarding copyright, IP, patent, and other issues when it comes to choosing where to publish and preserve all things related to your research. Data is just one piece of this picture.
In addition to funding, there are many compelling reasons to plan for preserving and sharing your data. The good news is that data sharing can boost the scholarly impact of your data and research in general, which is always good for promotion and tenure.
- Collaborations: funders are increasingly looking for interdisciplinary and multi-institution collaborations.
The benefits of digital data come with costs. Unlike with paper-based data and records, we can’t assume that we’ll be able to access and use digital information in 5, 10, or 50 years. We need to plan for managing and preserving valuable digital data so that the scholarly record isn’t lost. If we can’t find something, it doesn’t exist. These issues of persistent access and long-term preservation are challenges that libraries have been solving for thousands of years.
Some people wonder why the library is taking on this challenge of helping researchers to manage and preserve their data. There are several good reasons.
- Every college or university has a library.
- Our place within IUPUI facilitates collaboration; we have existing relationships with each department. These collaborations are another way to build capacity for data management, sharing, preservation, and curation by making use of resources that are already available.
- Libraries and librarians have been caring for information in many formats for thousands of years; while the formats change more rapidly these days, our core principles remain the same.
Other campus units can help you with your research, but have a different focus, such as compliance with human subjects or animal use guidelines, contracts and grants, bioethics, etc.
The Data Services Program is part of the University Library’s Program of Digital Scholarship. The Data Services Program offers workshops and consultations for developing an NSF data management plan as well as data management and curation in general. In addition, we have established a data repository for IUPUI research. The repository is one of many tools available to you for preserving and, if appropriate, sharing your research data.
On our website, we’ve provided links to:
- Sample NSF DMPs from other institutions
- Various tools
- Guidance from institutions like the ICPSR and the Digital Curation Centre (UK)
- Significant publications discussing data management and curation
I want to clarify some terms so we’re all on the same page.
Data management is largely seen as the purview of scientists and biostatisticians since it varies by research community and discipline.
Data sharing is not an all-or-none proposition. It encompasses a wide spectrum of activities ranging from open data publishing on the internet without restriction to controlled access by pre-defined partners or collaborators.
Data citation is a concept similar to citation of scholarly publications and refers to mechanisms that (per DataCite):
- allow easy reuse and verification of data;
- allow the impact of data to be tracked;
- create a scholarly structure that recognizes and rewards data producers.
These policies came about as a result of broader conversations about scholarly communication. In case you aren’t familiar with the term, it refers to the processes by which we produce and disseminate information relating to teaching, research, and other scholarly activities. Our goal is to provide you with the information necessary to engage in these discussions within IU and your research communities so that you are making informed decisions about how your research gets out there, who retains rights, and who can access it.
The NSF data policies are not radical deviations; they are logical steps forward towards more formal guidelines for providing public access, data management best practices, minimal requirements for sharing data, and data stewardship.
http://www.nsf.gov/bfa/dias/policy/dmpfaqs.jsp
IU as an institution has been engaged in discussions about scholarly communications for several years and has voiced its commitment to these issues by including data management and curation in the IT Strategic Plan.
The purpose of diagrams like this is to provide a common ground for discussion of a complex and diverse process. This represents a rough map of the research process that we all engage in. The open access conversation is focused on the dissemination of research products like peer-reviewed articles and books at the end of the research life cycle, whereas data management planning is most effective when it’s initiated before data collection begins and implemented throughout the research life cycle.
It is important to know that the language was crafted to:
- Allow the research community to shape the implementation
- Give communities of practice a role in developing relevant best practices
The budget allocations and narrative should tell a cohesive story; if you identify big challenges in data storage and preservation, but do not allocate funds to address these challenges, it will likely raise a red flag for the review committee.
Ultimately, this document should demonstrate that you are aware of data management and preservation issues in general, more specifically the relevant practices within your research community or discipline, and that you have thought through how these affect your proposed project. The plans proposed should be feasible – for you and for us.
If you look at the guide I’ve provided, you’ll see these topics are broken down into a variety of specific questions to address. We’ll go through each section in more detail.
It may be helpful to begin your DMP with a few sentences describing the research project in general, to provide context for the detailed information in each section.
As you develop each section of your DMP, it’s important to do two things:
- Explain your reasoning: it could just be that it’s a standard practice in your field or community.
- Identify roles for data management and curation activities: think about who on your team or in another campus unit will carry out the activities described; this section should identify who will be carrying out the major elements of your plan. This may include the PI, staff, students, external contractors, institutional IT, the library, and external data repositories.
In this first section, you want to describe two things: the data you will generate or use and the documentation you will create to facilitate data management and curation.
- Syntactic interoperability: ensures that the technical infrastructure (hardware, software, data formats) used to create and discover data can work together and communicate with each other
- Semantic interoperability: ensures that the data can be interpreted once exchanged, through use of common data and metadata structures and content
In addition to describing your plan in the DMP, these activities should be described in working documents throughout the life of the project. Creating data documentation is easiest and most efficient at the beginning of a project. Good documentation ensures three things: a shared understanding of the data throughout a project; that future researchers will be able to understand data within the context they were created; and that re-users of data are able to interpret the data appropriately. You don’t need to spend a lot of time or space describing the planned documentation, but it is worthwhile to mention what format it will take and who will be responsible for creating and maintaining it. This can relate to the second section describing metadata and standards.
Data screening tests: histograms, boxplots, Z-scores, etc.
Research methods, even within a single lab, change over time. Good documentation can facilitate efficient data collection and processing and preserve data integrity.
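As a rough illustration of one of the screening tests mentioned above, the following Python sketch flags values whose z-score exceeds a threshold. The function name, the sample readings, and the threshold of 2 are hypothetical choices for illustration only; an appropriate threshold depends on the sample size and on the practices of your research community.

```python
import statistics

def flag_outliers(values, threshold=2.0):
    """Return (index, value, z) tuples for values whose |z-score| exceeds threshold."""
    mean = statistics.mean(values)
    sd = statistics.stdev(values)  # sample standard deviation
    if sd == 0:
        return []  # all values identical, nothing to flag
    flagged = []
    for i, v in enumerate(values):
        z = (v - mean) / sd
        if abs(z) > threshold:
            flagged.append((i, v, round(z, 2)))
    return flagged

# Hypothetical readings with one suspicious value at index 6
readings = [390.1, 390.4, 389.9, 390.2, 390.0, 390.3, 412.7, 390.1]
for index, value, z in flag_outliers(readings):
    print(f"row {index}: value {value} has z-score {z}")
```

Flagged values are candidates for review against the original records, not automatic deletions; documenting how each one was handled is part of the QA/QC record.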
CO2:
- Site, frequency, raw data file description
- Final data product
- File format and size
Arthropod:
- Process of data collection is well described, with referral to proposal for further detail
- Pre-defined sample/note code – naming convention & unique ID
- Measurements of interest are described, but not defined; common definition or practice?
- File naming convention and formats
- Process for transfer of files is not described
- Relationship to existing data described & process for integration is included
- Who created the data?
- What is the content of the data?
- When were the data created?
- Where is it geographically?
- How were the data developed?
- Why were the data developed?
- Can use flow charts for simple workflows
Ask yourself: are your data self-explanatory? Consider it from the perspective of a typical reader of a journal you publish in or a colleague who might be interested in collaborating. The solution is good documentation and metadata. Increasingly, the people analyzing the data are not those who collected it. Metadata and good data documentation facilitate a stronger understanding of the data and support high-quality, appropriate re-use.
There are a lot of standards out there; you can ask what others in your discipline or research community are doing or contact us to see if we are aware of emerging standards. If you know you will be depositing your data in a particular repository, you can ask them what their requirements or recommendations are.
- Metadata formats included; formal ISO standard
- Content of metadata not fully described
- No rationale
- Format described
- Software tools
- File naming convention
- Quality control procedures described
- Data dictionary
- Metadata standard
- No rationale
Let’s take a look at the handout with the NSF policy language. Again, the language is broad and allows for practices to vary by research community. As you can see from the policy, data dissemination and sharing does not refer to publishing in scholarly journals. In this section, you should define what you will share, how, and the procedures for access. If you plan to use a specific data repository, they can help you develop this section; likely, they will have standard processes in place.
Acceptable practices for data sharing vary by discipline; some have very mature data repositories while others rely on informal channels. Best practices for persistent access call for more permanent and secure mechanisms than a faculty or department website. The solution at IUPUI is our data repository (IUPUIDataWorks).
In terms of the access procedures, you want to think about what mechanism will be used for requests, whether registration and authentication are necessary, and what information you want to keep for your own records about those who request and receive your data. This can be useful information to demonstrate the value and impact of your research.
Data sharing encompasses a wide spectrum of activities; you can decide what, when, how, and with whom you will share your data. Even if you are part of a community in which data sharing is not common practice, I urge you to think about what data might be shared or re-used without compromising your intellectual property or competitiveness. Sharing your data broadly can increase the impact of your research and benefit the institution and your research community. Often, the value of data is unknown or unrecognized until the data have been examined by a wide audience.
This section will relate to the access and sharing section, but should focus on policies and permissions for re-use, re-distribution, and production of derivative works as opposed to the mechanisms described in the previous section. You can protect your ability to use the data for ongoing analysis while sharing as much of it as possible with your research community and the general public. While you can’t plan for every case, it is useful to imagine who might be interested in the data and how it might be used, and to set up a process for handling those cases. Depending on where you decide to deposit your data, this could be very formalized or relatively informal.
If you decide to share your data through a repository, often there are mechanisms built in for applying Creative Commons licenses. This is true for our data repository as well.
Calorimeter:
- There are many avenues for sharing data and the results of a study; the patent process is one of them.
Cave Microbiology:
- This is representative of a reference dataset
- Demonstrates history of sharing their data
- Includes selected license for their data
- Physical samples – not required to share, but they include this info
- Field notes – requirement to share as public documents
- SEM images – subject repository
- Website for educational resources
Ultimately, the impetus for data management, preservation, sharing, and curation will need to come from someone other than funding agencies. Institutions and libraries are looking at sustainable ways to fulfill our commitments and make sure that we can be good data stewards. These systems are still being developed. Realistically, we can promise that your data will be available in 10 years; in 10 years, we hope to have solutions for many of these problems.
Here, you should relate the strategies you’ve outlined in previous sections to your long-term preservation strategy. This is an opportunity for you to discuss the long-term plan for keeping your data safe with us or with an external data repository in your discipline. If you are completely unsure how to approach this, feel free to contact the DSP for support. We can help you develop a feasible and appropriate preservation strategy that relies on existing services and infrastructure, whether at IU or elsewhere.
A key component of your plan is the description of the cyberinfrastructure available to you and how you will use it to carry out your plan as a responsible data steward. Although your lab may be equipped to store and maintain the data for a project while it’s active, you may not have the capacity to make sure the data is preserved once the project is complete and your lab resources are dedicated to new endeavors. Neither IU nor NSF wants to see scientific data lost, and both are investing significant effort and resources in maintaining the scientific record.
These are activities that the Library specifically is invested in and equipped to do; our focus is on long-term preservation, curation, and access. What this means will likely vary by dataset, project, and lab; we’re happy to think this through with you to develop a plan that will meet your needs.
- Identified an institutional repository
- Selected files to be archived; could be more specific
- Type and format of project information is vague; abstract may not be sufficient
- Example of software archival/preservation
- What are “industry-standard best practices”?
- Continued existence of writer software is not required, but reader software will be preserved
- Is SourceForge an institution that is likely to persist for 10, 25, 50 years?
There is a wealth of resources at IU to help you with your research. These are just a few of those relevant to data management and curation.
Meeting the NSF DMP Requirement June 13, 2012
DATA MANAGEMENT PLANS & PLANNING: MEETING THE NSF REQUIREMENT
June 13, 2012
WHO ARE WE?
Heather Coates: Digital Scholarship & Data Management Librarian; Liaison to the School of Public Health; University Library
Kristi Palmer: Digital Scholarship Team Leader; Liaison to the Department of History and Programs of Women’s and American Studies; University Library
LEARNING OBJECTIVES
After attending this workshop:
- You will understand the NSF data policies.
- You will be aware of the relevant data-related services at IUPUI.
- You will have resources to develop a data management plan (DMP) for your NSF proposal(s).
- You will be able to write a comprehensive DMP for your NSF proposal(s).
- You will send your DMP draft to the Data Services Program for review and assistance as needed.
OVERVIEW Context for the NSF data policies Meeting the NSF DMP requirement The requirement: 5 elements Developing a Data Management Plan Implementing your plan Workshop Evaluation (5 minutes)
CONTEXT: SCHOLARLY COMMUNICATIONS Funding, funding, funding Scholarly Impact Exposure, increased citation More equal access (especially for students) Facilitates reproducibility Facilitates new discoveries via secondary analysis/data re-use Fosters productive collaborations Leads to new computational techniques Planning for the future If we can’t find it, it doesn’t exist Persistent access Long-term preservation of scholarly records
CONTEXT: WHY THE LIBRARY? preservation, curation, access Trusted member of the institution Organizational structure lends itself to collaboration with researchers Interdisciplinary by nature Existing infrastructure for digital information Existing expertise in preserving and providing access to information Program of Digital Scholarship Archives
CONTEXT: DATA SERVICES PROGRAM Part of the Program of Digital Scholarship Mission Identifying data issues and connecting you to the solutions Services Workshops Individual consultations Data repository Resources Guide to NSF Data Management Plan Requirement Website
CONTEXT: TERMINOLOGY Cyberinfrastructure: computing resources & networks, services, & people (see Empowering People, 2009 for more) Data management: technical processing and preparation of data for analysis Data curation: selection of data for preservation and adding value for current and future use Data citation: mechanisms to enable easy reuse and verification, track impact of data, and create structures to recognize and reward researchers (DataCite) Data sharing: must take into account ethical and legal issues; a spectrum with many options Data stewardship:
CONTEXT: FEDERAL POLICIES Issues in scholarly communication Open access Open data & data citation Data management & curation Federal policies (incremental steps towards openness) National Research Council, 1985 Office of Management & Budget, 1999: Circular A-110 NIH Data Sharing Policy, 2003 NIH Public Access Policy, 2008 NSF DMP Requirement, 2011 Other policies: NEH, NOAA, NASA, Howard Hughes Medical Institute Wellcome Trust
CONTEXT: IU STRATEGIC PLAN
IU Empowering People Strategic Plan for IT (2009), Action 33: “IU should provision a data utility service for research data that affords abundant near- and long-term storage, ease of use, and preservation capabilities. This data utility will need to offer a range of services for securing data, providing authorized access within and beyond IU; ensuring metadata description, annotation, and provenance; and providing backup/recovery services.”
CONTEXT: OPEN ACCESS
What is Open Access? Freely available, online, and free of most copyright restrictions
Why should you care?
- Right thing to do?
- Increase your citations: “We analysed 119,924 conference articles in computer science and related disciplines. The mean number of citations to offline articles is 2.74, and the mean number of citations to online articles is 7.03, an increase of 157%.” (Lawrence, 2008)
- Publisher functions need not reside in for-profit hands: “Between 1975 and 2005 the average cost of journals in chemistry and physics rose from $76.84 to $1,879.56. In the same period, the cost of a gallon of unleaded regular gasoline rose from 55 cents to $1.82. If the gallon of gas had increased in price at the same rate as chemistry and physics journals over this period it would have reached $12.43 in 2005, and would be over $14.50 today.” (Lewis, 2008)
CONTEXT: OPEN ACCESS @ IUPUI AND IU IUPUI University Library Program of Digital Scholarship http://www.ulib.iupui.edu/digitalscholarship Open Journals IUPUIScholarWorks-Faculty Scholarship Electronic Theses and Dissertations Cultural Heritage Collections Data eArchives
CONTEXT: RESEARCH LIFE CYCLE
Source: DDI Structural Reform Group. “DDI Version 3.0 Conceptual Model.” DDI Alliance. 2004. Accessed on 11 August 2008. <http://www.icpsr.umich.edu/DDI/committee-info/Concept-Model-WD.pdf>.
CONTEXT: BENEFITS OF PLANNING Saves time Less reorganization down the road Increases efficiency Gathers necessary information for analysis and writing Prevents problems in understanding data and metadata Prevents data loss If you have a plan, you are more likely to back up your data Makes it easier to preserve your data Documentation is more easily created throughout a project Metadata generation can be automated or incorporated into procedures Requirements of some funding agencies and institutions
DMP: INTERPRETING THE POLICY Why? Increased impact of research money Reduce redundant data collection Enhance use and value of existing data Further scientific research Data gathering tool What kinds of data are we collecting? How are researchers collecting, managing, and preserving data? What are community norms? Language is broad to allow input from research communities Implementation costs of the DMP CAN be included in direct costs
DMP: KEEP IN MIND The gist of it… Describe what you will do with your data during and after the proposed project Ensures data is safe now and in the future DMP should reflect… Awareness of data management and curation in your discipline Feasible plan to utilize available cyberinfrastructure Try to… Explain the rationale for your choices Identify roles for data management and curation activities
DMP: ELEMENTS Types of data Standards and metadata Access and sharing Re-use, re-distribution, and the production of derivatives Long-term preservation [Budget]
DMP: TYPES OF DATA
Use standards common in your research community
Characterize the data
- Types of data: experimental, observational, raw or derived, models, simulations, curriculum materials, software, images, audio, video, etc.
- File formats (e.g., text, spreadsheet, database)
- How much data? (# of files, total size)
- Will the data be reproducible?
- Relationship to existing data? (i.e., interoperability: syntactic and semantic)
DMP: TYPES OF DATA  How will data be collected? How? (tools, instruments, measurements, etc.) When? (timeframe, series) Where? (sites, settings) How will data be processed? Workflows (brief overview using flow chart) Software packages How will the data be stored and managed? File naming conventions Version control
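One way to make a file naming convention enforceable is to generate names with a small helper rather than typing them by hand. The Python sketch below is illustrative only: the pattern site_datatype_date_version, the function name, and the sample values are hypothetical and should be replaced with whatever convention your research community uses.

```python
import re
from datetime import date

def build_filename(site, data_type, collected, version, extension="csv"):
    """Compose a name like 'siteA_co2_20120613_v02.csv' (hypothetical convention)."""
    for part in (site, data_type):
        # Reject spaces and punctuation so names stay portable across systems
        if not re.fullmatch(r"[A-Za-z0-9]+", part):
            raise ValueError(f"use letters and digits only, got {part!r}")
    return f"{site}_{data_type}_{collected:%Y%m%d}_v{version:02d}.{extension}"

print(build_filename("siteA", "co2", date(2012, 6, 13), 2))
# prints: siteA_co2_20120613_v02.csv
```

Putting the date in year-month-day order and zero-padding the version number keeps files in chronological order when sorted by name.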
DMP: TYPES OF DATA  What QA & QC measures will be used? Identify steps during processing and analysis to eliminate missing data points, identify outliers, and provide statistical summaries (e.g., double data entry, histograms, scatterplots) Before data are collected, define and enforce standards and assign responsibility During project, document processes and any changes or deviations What is the backup and security plan? Identify particular security or confidentiality issues Describe location & frequency Roles & responsibilities Who will carry out data collection, processing, and backup activities?
EXAMPLE: TYPES OF DATA
Atmospheric Concentrations of CO2, Mauna Loa Observatory, Hawaii, 2011-2013
https://www.dataone.org/sites/all/documents/DMP_MaunaLoa_Formatted.pdf
Arthropod responses to grassland nutrient limitation
https://www.dataone.org/sites/all/documents/DMP_NutNet_Formatted.pdf
DMP: STANDARDS & METADATA  Metadata describes the who, what, when, where, how, why of the data Include workflow: how you get from raw data to final products Purpose: enable finding, organization, interoperability, identification, archiving & preservation Standards are commonly agreed upon terms and definitions in a structured format Dublin Core (commonly used by libraries) Darwin Core (geographic occurrence of species) EML (ecology) Data Documentation Initiative (DDI; social sciences) IEEE LOM (learning objects metadata)
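To make the idea of a structured metadata standard concrete, here is a minimal Python sketch that writes a few Dublin Core elements as XML using only the standard library. The dataset described, the wrapping `metadata` element, and the function name are hypothetical; a real repository will have its own required elements and schema, so treat this as an illustration rather than a deposit-ready record.

```python
import xml.etree.ElementTree as ET

# Namespace of the Dublin Core Metadata Element Set, version 1.1
DC_NS = "http://purl.org/dc/elements/1.1/"
ET.register_namespace("dc", DC_NS)

def dublin_core_record(fields):
    """Build a simple XML record from a dict of Dublin Core element names to values."""
    root = ET.Element("metadata")  # hypothetical wrapper element
    for name, value in fields.items():
        element = ET.SubElement(root, f"{{{DC_NS}}}{name}")
        element.text = value
    return ET.tostring(root, encoding="unicode")

# Hypothetical dataset description: who, what, when, and in which format
record = dublin_core_record({
    "title": "Example soil moisture measurements",
    "creator": "Example Researcher",
    "date": "2012-06-13",
    "format": "text/csv",
    "description": "Hourly soil moisture readings from a hypothetical field site.",
})
print(record)
```

Generating the record from a script is one example of the automated metadata creation mentioned in this section; the same fields could equally be entered by hand in a repository's deposit form.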
DMP: STANDARDS & METADATA Ask yourself: will your datasets be self-explanatory or understandable in isolation? Decisions to make about metadata Relevant standard(s) Format Content What information is needed to use and interpret in 5 years, 25 years? How are metadata created? Automatically generated Manually created
EXAMPLE: STANDARDS & METADATA
Atmospheric Concentrations of CO2, Mauna Loa Observatory, Hawaii, 2011-2013
https://www.dataone.org/sites/all/documents/DMP_MaunaLoa_Formatted.pdf
Metadata will be comprised of two formats: contextual information about the data in a text based document and ISO 19115 standard metadata in an xml file. These two formats for metadata were chosen to provide a full explanation of the data (text format) and to ensure compatibility with international standards (xml format). The standard XML file will be more complete; the document file will be a human-readable summary of the XML file.
EXAMPLE: STANDARDS & METADATA
Rio Grande Hydrologic Geodatabase Compendium
https://www.dataone.org/sites/all/documents/DMP_Hydrologic_Formatted.pdf
Microsoft Access Database format will be used since it is readily accessible and it is compatible with ESRI ArcGIS (http://www.esri.com/software/arcgis/index.html), a Geographic Information System software package used by the stakeholders. Naming conventions will be consistent – no spaces will be used in table names or field names. The file naming convention will consist of the datasource_datatype format for raw data files. Data reporting functionality will be built into the VBA processing programs to provide output in .txt file format for number of records per source when updatable data sources are refreshed. Every effort will be made to go back to the authoritative source for an identified data set. Quality control of the database will be performed using SQL statements that capitalize on the database structure to ensure relational database integrity. Appropriate primary keys will be assigned to manage possible data duplicates. Potential duplicate site IDs will be handled through automated procedures and the creation of alternate ID tables.
A data dictionary will be created that defines the table definition, table fields, and table field data types. An entity-relationship diagram will be created that defines the relational structure of the database. A metadata record will be produced using the FGDC standard that describes the entire geodatabase. The FGDC standard was chosen due to required Federal government standards.
DMP: ACCESS & SHARING
- What are your obligations for sharing? Funding agency, institution, other organization, legal, etc.
- What are the ethical or legal issues? (e.g., privacy, confidentiality, security, intellectual property, or other rights)
- How will the data be made available?
- What is the process for gaining access?
- When will the data become available?
- For how long will the data be available?
- Who will have access to the data?
DMP: RE-USE, RE-DISTRIBUTION, ETC. What rights will you retain before data is made available? Will permission restrictions be necessary? Limits or conditions for political, commercial, or patent reasons? Is there an embargo period? Why? Future users and uses Who might be interested in the data? How might you anticipate this data being used? What value might the data have for these people?
EXAMPLE: ACCESS, SHARING, RE-USE
Development of a NanoKlein Calorimeter
http://libguides.unm.edu/content.php?pid=137795&sid=1422879
We expect to apply for a patent for this instrument. All of the materials submitted as part of the patent process will be a matter of public record. We will also make technical drawings, test data and calibration data available through our institutional repository.
Cave Microbiology
http://libguides.unm.edu/content.php?pid=137795&sid=1422879
DMP: LONG-TERM PRESERVATION
Project-based funding does not lend itself to long-term preservation.
- What data will be preserved?
- What transformations are necessary to prepare the data?
- How long do you think the data will be useful?
- How long will the data be preserved?
- Contextual information needed to make the data reusable: metadata, references, reports, manuscripts, grant proposal, etc.
- Where will it be preserved?
- Links to published materials and other outcomes?
- Use of persistent citation?
- Procedures for preservation and back-up?
- Who will be the contact for the dataset?
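One common ingredient of preservation and back-up procedures is a fixity check: record a checksum for each file when it is archived, then recompute it later to confirm that nothing has changed or been corrupted. The Python sketch below shows the idea with SHA-256 checksums from the standard library. The function names and the sample file are hypothetical, and repositories typically run checks like this on your behalf.

```python
import hashlib
import tempfile
from pathlib import Path

def sha256_of(path, chunk_size=65536):
    """Compute a file's SHA-256 checksum, reading in chunks to handle large files."""
    digest = hashlib.sha256()
    with open(path, "rb") as handle:
        for chunk in iter(lambda: handle.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()

def build_manifest(folder):
    """Map each file's path (relative to folder) to its checksum."""
    folder = Path(folder)
    return {
        str(p.relative_to(folder)): sha256_of(p)
        for p in sorted(folder.rglob("*")) if p.is_file()
    }

def changed_files(folder, manifest):
    """List files from the manifest that are now missing or have different contents."""
    current = build_manifest(folder)
    return sorted(name for name in manifest if current.get(name) != manifest[name])

# Demonstration with a temporary folder and a hypothetical data file
with tempfile.TemporaryDirectory() as tmp:
    data_file = Path(tmp) / "abundance.csv"
    data_file.write_text("site,count\nA,12\n")
    manifest = build_manifest(tmp)       # recorded at archiving time
    print(changed_files(tmp, manifest))  # prints: []
    data_file.write_text("site,count\nA,13\n")
    print(changed_files(tmp, manifest))  # prints: ['abundance.csv']
```

Storing the manifest alongside the data, and noting in the DMP who reruns the check and how often, ties this step to the roles and responsibilities described earlier.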
EXAMPLE: LONG-TERM PRESERVATION
Arthropod responses to grassland nutrient limitation
https://www.dataone.org/sites/all/documents/DMP_NutNet_Formatted.pdf
We will preserve both arthropod datasets generated during this project (abundance and stoichiometry) for the long term in the Digital Conservancy at the U of M. We will include the .csv files, along with the associated metadata files. We will also submit an abstract with the datasets that describe their original context and any potentially relevant project information. Borer will be responsible for preparing data for long-term preservation and for updating contact information for investigators.
EXAMPLE: LONG-TERM PRESERVATION
Improving the long-term preservability of HDF-formatted data by creating maps to file contents
https://www.dataone.org/sites/all/documents/DMP_HDFMap_Formatted.pdf
The writer software will be preserved by the HDF Group for the life of the HDF libraries. The HDF Group uses industry-standard best practices to ensure the integrity of their software and systems. Once the map writer has been used to generate maps for every HDF file in existence, the continued existence of the writer software is not required. The reader software will be preserved at SourceForge.org for as long as there is community interest. The collection of HDF files will be preserved at NSIDC as long as utility is deemed high.
IUPUIDATAWORKS Institutional repository that can facilitate subject repositories Policies are being developed, informed by faculty needs Pilot projects More support at little/no cost Flexibility in what we are willing to do New tools to demonstrate impact of research The future Standardized levels of service Standardized policies, responsive to faculty needs Cost recovery for significant intellectual/time investment
IMPLEMENTING YOUR PLAN  The DMP is a working document NSF expects progress to be reported (progress reports, final reports, new grant proposals) Incorporate implementation into the project startup process C&G, IRB, IACUC all have to be in place before data collection can begin Review, revise, and set up your system during startup Good documentation ensures… A shared understanding of the data throughout a project That future researchers will be able to understand data within the relevant context That re-users of data are able to interpret the data appropriately
IMPLEMENTING YOUR PLAN
Research File System: http://pti.iu.edu/storage/rfs
Scholarly Data Archive: http://pti.iu.edu/storage/sda
Research Technologies, UITS: http://uits.iu.edu/page/avel
Core Services, UITS: http://pti.iu.edu/cs
Scholarly Cyberinfrastructure, UITS: http://uits.iu.edu/page/amee
CTSI Tools: http://www.indianactsi.org/rct (Alfresco Share, REDCap)
Program of Digital Scholarship: http://ulib.iupui.edu/digitalscholarship
Center for Research & Learning: http://crl.iupui.edu/
OVCR: http://research.iupui.edu/development/
Office of Academic Affairs: http://www.academicaffairs.iupui.edu
Intellectual Property Policy: https://www.indiana.edu/~vpfaa/academicguide/index.php/Policy_I-11
IUWare: https://iuware.iu.edu
IUanyWare: https://iuanyware.iu.edu/vpn/index.html
StatMath: http://www.indiana.edu/~statmath/
Statistics Consulting Center: http://www.math.iupui.edu/asci/
PRACTICAL TOOLS
Lynda.com tutorials: http://ittraining.iu.edu/lynda/default.aspx
  Cleaning Up Your Excel Data (2010)
  Managing & Analyzing Data in Excel (2010)
  Data Validation in Depth (2010)
DMPTool: https://dmp.cdlib.org/
DMPOnline: https://dmponline.dcc.ac.uk/
UK Data Archive Costing Tool: http://www.data-archive.ac.uk/media/257647/ukda_jiscdmcosting.pdf
Creative Commons Licenses & Data: http://wiki.creativecommons.org/Data
Licensing Research Data, Digital Curation Centre: http://www.dcc.ac.uk/resources/how-guides/license-research-data
CIC Author Addendum: http://www.cic.net/authors
RECOMMENDED READING
UK Data Archive: Managing & Sharing Data Brochure: http://www.data-archive.ac.uk/media/2894/managingsharing.pdf
MORE RESOURCES
National Science Board, Digital Research Data Sharing & Management, 2012 (pre-publication): http://www.nsf.gov/nsb/publications/2011/nsb1124.pdf
Committee on Science, Engineering, and Public Policy (U.S.). (2009). Ensuring the integrity, accessibility, and stewardship of research data in the digital age. Washington, D.C.: National Academies Press.
National Science Board Committee on Strategy and Budget Task Force on Data Policies. (2011). Digital Research Data Sharing & Management. Washington, D.C.: National Science Board.
America Creating Opportunities to Meaningfully Promote Excellence in Technology, Education, and Science Reauthorization Act of 2010, Pub. L. No. 111-358, 124 Stat. 3982 (2010). Retrieved from the Library of Congress Thomas database.
REFERENCES
1. Higgins, S. (nd). What are metadata standards. http://www.dcc.ac.uk/resources/briefing-papers/standards-watch-papers/what-are-metadata-standards
2. Digital Curation Centre. (nd). DCC Charter and Statement of Principles. Retrieved from http://www.dcc.ac.uk/about-us/dcc-charter.
3. Indiana University. (2011). Indiana University’s Advanced Cyberinfrastructure. Retrieved from http://pti.iu.edu/cyberinfrastructure.pdf.
4. Indiana University. (2009). Empowering People: Indiana University’s Strategic Plan for Information Technology. Retrieved from http://ovpit.iu.edu/strategic2/.
5. National Science Foundation. (2011). Award and Administration Guide: Chapter IV C.4., Dissemination and Sharing of Research Results. Retrieved from http://www.nsf.gov/pubs/policydocs/pappguide/nsf11001/aag_6.jsp#VID4.
6. Lawrence, S., Free online availability substantially increases a paper’s impact, Nature, 31 May 2001. http://www.nature.com/nature/debates/e-access/Articles/lawrence.html (accessed November 5, 2008)
7. Lewis, David W. "Library budgets, open access, and the future of scholarly communication: Transformations in academic publishing." C&RL News, May 2008, Vol. 69, No. 5. [Available at: http://www.ala.org/ala/mgrps/divs/acrl/publications/crlnews/2008/may/ALA_print_layout_1_471139_471139.cfm]
COMPELLING CASES FOR OPEN DATA
SPARC, Research is more valuable when it’s shared: http://www.arl.org/sparc/greaterreach/index.shtml
Tim Berners-Lee: http://www.ted.com/talks/tim_berners_lee_on_the_next_web.html
Open-source cancer research: http://www.ted.com/talks/jay_bradner_open_source_cancer_research.html
Polymath problem blogs: http://polymathprojects.org/about/
http://stevekochscience.blogspot.com/2011/02/open-data-success-story.html
http://eaves.ca/2011/09/07/the-economics-of-open-data-mini-case-transit-data-translink/
THANK YOU
Tell us what you think, take a brief survey.
Find us @ http://ulib.iupui.edu/digitalscholarship/dataservices
Heather Coates, firstname.lastname@example.org, 317-278-7125
Kristi Palmer, email@example.com, 317-274-8230
IUB
Stacy Konkiel, firstname.lastname@example.org, 812-856-5295
EXTRA: NIH DATA SHARING POLICY
Applies to applications seeking $500,000 or more in direct costs in any year of the proposed research
Covers final research data: not summary statistics or tables, and not underlying pathology reports or other clinical source documents; might include both raw data and derived variables
If an application describes a data-sharing plan, NIH expects that plan to be enacted.
NIH expects the timely release and sharing of data to be no later than the acceptance for publication of the main findings from the final dataset.
It is the responsibility of the investigators, their Institutional Review Board (IRB), and their institution to protect the rights of subjects and the confidentiality of the data.
Prior to sharing, data should be redacted to strip all identifiers, and effective strategies should be adopted to minimize risks of unauthorized disclosure of personal identifiers.
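As a rough illustration of the "strip all identifiers" step, here is a minimal Python sketch that drops direct-identifier columns from CSV data before sharing. The column names and the function names (strip_identifiers, redact_csv) are hypothetical, made up for this example; they do not come from NIH guidance. Removing obvious columns is only a first pass and may not be enough to de-identify a dataset, so the real list of fields should come from your IRB-approved protocol.

```python
import csv
import io

# Hypothetical column names, for illustration only. A real project would
# take this list from its IRB-approved data-sharing protocol.
DIRECT_IDENTIFIERS = {"name", "email", "phone", "street_address", "mrn"}


def strip_identifiers(rows, identifiers=DIRECT_IDENTIFIERS):
    """Return copies of the rows without the listed identifier columns."""
    blocked = {col.lower() for col in identifiers}
    return [
        {key: value for key, value in row.items() if key.lower() not in blocked}
        for row in rows
    ]


def redact_csv(text, identifiers=DIRECT_IDENTIFIERS):
    """Read CSV text, drop identifier columns, and return the new CSV text."""
    blocked = {col.lower() for col in identifiers}
    reader = csv.DictReader(io.StringIO(text))
    cleaned = strip_identifiers(list(reader), identifiers)
    # Keep the remaining columns in their original order.
    kept = [col for col in (reader.fieldnames or []) if col.lower() not in blocked]
    out = io.StringIO()
    writer = csv.DictWriter(out, fieldnames=kept, lineterminator="\n")
    writer.writeheader()
    writer.writerows(cleaned)
    return out.getvalue()


if __name__ == "__main__":
    raw = "subject_id,name,email,age\n1,Ann Lee,ann@example.org,34\n"
    print(redact_csv(raw))
```

Keeping a study-specific subject_id while dropping names and contact details is one common arrangement, but whether that ID itself counts as an identifier depends on how it was assigned.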
EXTRA: NIH DATA SHARING PLAN
Describe briefly:
  the expected schedule for data sharing
  the format of the final dataset
  the documentation to be provided
  whether or not any analytic tools also will be provided
  whether or not a data-sharing agreement will be required
  if so, a brief description of such an agreement (including the criteria for deciding who can receive the data and whether or not any conditions will be placed on their use)
  mode of data sharing (e.g., under their own auspices by mailing a disk or posting data on their institutional or personal website, through a data archive or enclave)
Applicants may request funds in their application for data sharing.
RESOURCES
National Institutes of Health, Data Sharing Policy: http://grants.nih.gov/grants/policy/data_sharing/data_sharing_guidance.htm
NIH Public Access Policy Implications: http://publicaccess.nih.gov/public_access_policy_implications_2012.pdf