RDAP14: An analysis and characterization of DMPs in NSF proposals from the University of Illinois

Research Data Access and Preservation Summit, 2014
San Diego, CA
March 26-28, 2014
Lightning Talks

William Mischo, University of Illinois at Urbana-Champaign

Published in: Education

Transcript

  • 1. An Analysis and Characterization of DMPs in NSF Proposals from the University of Illinois • RDAP14 Research Data Access & Preservation Summit, March 26, 2014 • William H. Mischo, Mary C. Schlembach, Megan A. O’Donnell • University of Illinois at Urbana-Champaign; Iowa State University
  • 2. NSF Data Management Plans • Data Management Plans (DMPs): a required element in NSF proposals since January 2011 • July 2011: the Library, working with the campus Office of Sponsored Programs and Research Administration (OSPRA), began an analysis of DMPs in submitted NSF grant proposals • To date, we have looked at 1,600 grants, with 1,260 included in the analysis.
  • 3. Reasons for DMPs • Make key research data available and sharable • Allow the use of data for verification of results and reproducibility of research work • Agency can show significant return on investment to justify funding • We want to know storage venues and mechanisms for sharing and reuse • Also use of local templates and local campus resources such as IDEALS
  • 4. Follow-on • Develop campus-wide infrastructure (Research Data Service - RDS) to support UIUC researchers in managing their data • Assist in compliance with federal agencies • Develop important partnerships with campus units (CITES, NCSA, Colleges) and national entities • Develop best practices and standard approaches
  • 5. Analysis • Analysis attempts to characterize and classify DMPs into categories • DMPs assigned multiple categories • 1,260 DMPs from July 2011 to November 2013
  • 6. Categories • PI Server – Servers and workstations that the PIs (and their students/staff) use to store project data. Examples: laboratory server, external hard drive, and group computer. • PI Website – Websites edited or administered by the PI or a group they belong to. If a departmental URL was given, it was also given the term “department.” Examples: lab website, project website, wiki, PI’s website
  • 7. Categories • Campus – Services located at, operated by, or endorsed by UIUC. This includes IDEALS, netfiles and Box.net, NCSA, and Beckman. • Department – Used when a department was specifically mentioned as providing a storage or hosting resource. Examples: departmental website, departmental server, departmental backup service, or a web address traced back to an academic department. Also given the “campus” label.
  • 8. Categories • Remote – Services and sites not located on the UIUC campus. Examples: NASA, other campuses, collaborative projects, non-UIUC institutes • Disciplinary – Disciplinary repositories. Many are open access but not all. Examples: GenBank, arXiv, ICPSR, SEAD, Nanohub, and Dryad • Cloud – Storage services using cloud technology. Examples: Google Documents, Google Code, Box.net, Amazon, Microsoft, Dropbox
  • 9. Categories • Publication – Scholarly outputs including journal articles, workshops, and conference presentations or posters. Very few DMPs were explicit as to how their “publications” and data were related or separated. • Analog – Physical records including lab notebooks, photographs, and files. Does not include specimens or artifacts. • Specimens – Physical specimens; usually biological or artifacts.
  • 10. Categories • Optical Disc – DVD, CD, and Blu-ray discs. Often used as a backup mechanism. • Not Specified – The DMP was not specific enough for us to record details. • No Data – Indicated the proposal will produce no data products. Many were theoretical studies (math), travel grants, or workshop planning sessions. • Local Template Used
  • 11. All DMPs (including “no data”), n = 1260
      Category        Number   Percent
      PI Server       503      39.9%
      PI Website      529      41.9%
      Campus          667      52.9%
      Department      142      11.2%
      Remote          353      28.0%
      Disciplinary    275      21.8%
      Publication     556      44.1%
      Cloud           63       5.0%
      Optical Disc    56       4.0%
      Analog          131      10.4%
      Specimens       111      8.8%
      Not Specified   66       5.2%
      Collaborative   164      13.0%
      No Data         103      8.2%
  • 12. Data Venue and Risk (submitted proposals since July 2011, n = 1260; funded proposals, n = 298; risk = risk of loss, corruption, or breach)
      Data Location                      Submitted   Risk             Funded   Risk
      PI Server/Website                  64%         High             61%      High
      Departmental Server/Website        11.2%       Medium to High   7%       Medium to High
      Campus-Wide Resource               52.9%       Low              45%      Low
        IDEALS Institutional Repository  21.9%                        19.8%
        NCSA                             4.3%                         16.4%
      Disciplinary Repository/Cloud      25.8%       Medium to Low    21.4%    Medium to Low
      Remote Repository                  28%         Medium to High   22.8%    Medium to High
      Optical Disc, Specimens, Analog    19.4%       Out of Scope     11%      Out of Scope
  • 13. Notables • Funded: 298 • Used locally developed template: 254 • IDEALS: 275 • NCSA/XSEDE: 55 • Dryad: 22 • ICPSR: 17 • GenBank/Genetics Repository: 55 • ArX: 61 • Only 87 DMPs contained information about file types
  • 14. Analysis • Any differences in storage venue or technologies between the unfunded proposals and the funded proposals? • Any differences between the proposals from the first year and the more current proposals? • Can look at differences in any of the proposal categories between funded and unfunded • 734 active NSF awards, $861.8 million
  • 15. Analysis • Use of IDEALS institutional repository: 62 funded, 197 not funded: chi-square: 0.17 • Storing data on PI server or website: 183 funded, 569 not funded: chi-square: 0.7 • Disciplinary or Cloud: 67 funded, 241 not funded: chi-square: 0.85 • Remote storage: 68 funded, 267 not funded: chi-square: 3.01
  • 16. Analysis • Use of IDEALS before August 2012 = 108, after (through November 2013) = 166, chi-square: 4.59, p < .05 • Use of disciplinary or Cloud before August 2012 = 121, after = 182, chi-square: 4.33, p < .05
  • 17. Implications • Conclusions: (1) no significant differences in storage venues between funded and unfunded proposals, so no apparent advantage for IDEALS or disciplinary repositories; (2) more recent proposals include IDEALS and disciplinary repositories at a significantly higher rate • What is the role of the library? The campus? The subject discipline? • Connecting data to the literature is important
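
The funded/unfunded comparisons on slides 15 and 16 report chi-square statistics without showing how they were computed. The following is a minimal sketch of one common approach, a Pearson chi-square test on a 2x2 table (venue named or not, funded or not) with no continuity correction. It assumes the totals given elsewhere in the slides (1,260 DMPs, 298 funded) and 1 degree of freedom; the slides do not state the denominators or the exact test used, so the computed statistics may not match the reported ones. The function names `chi_square_2x2`, `p_value_df1`, and `venue_table` are hypothetical helpers introduced for illustration.

```python
import math

# ASSUMPTION: totals taken from slides 11-13; the slides do not say
# which denominators the authors actually used for their tests.
TOTAL_DMPS = 1260
FUNDED = 298


def chi_square_2x2(a, b, c, d):
    """Pearson chi-square (no continuity correction) for [[a, b], [c, d]]."""
    n = a + b + c + d
    row1, row2 = a + b, c + d
    col1, col2 = a + c, b + d
    return n * (a * d - b * c) ** 2 / (row1 * row2 * col1 * col2)


def p_value_df1(chi2):
    """Upper-tail probability of a chi-square statistic with 1 degree of freedom."""
    # With df = 1 the statistic is a squared standard normal variable,
    # so the tail probability reduces to erfc(sqrt(chi2 / 2)).
    return math.erfc(math.sqrt(chi2 / 2.0))


def venue_table(funded_using, unfunded_using):
    """Build the 2x2 cells from the counts of proposals naming a venue."""
    funded_not = FUNDED - funded_using
    unfunded_not = (TOTAL_DMPS - FUNDED) - unfunded_using
    return funded_using, funded_not, unfunded_using, unfunded_not


if __name__ == "__main__":
    # Counts as printed on slide 15 (funded, not funded).
    slide_15 = [
        ("IDEALS", 62, 197),
        ("PI server or website", 183, 569),
        ("Disciplinary or Cloud", 67, 241),
        ("Remote storage", 68, 267),
    ]
    for label, funded, unfunded in slide_15:
        stat = chi_square_2x2(*venue_table(funded, unfunded))
        print(f"{label}: chi-square = {stat:.2f}, p = {p_value_df1(stat):.3f}")

    # p-values implied by the statistics as reported on slides 15 and 16,
    # again assuming 1 degree of freedom.
    for reported in (0.17, 0.7, 0.85, 3.01, 4.59, 4.33):
        print(f"reported chi-square {reported}: p = {p_value_df1(reported):.3f}")
```

Under the 1-degree-of-freedom assumption, a statistic must exceed roughly 3.84 to be significant at the .05 level, which is consistent with the slides marking only the before/after comparisons on slide 16 as p < .05.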