Chen RDAP11 NSF Data Management Plan Case Studies



Eric Chen, Cornell; NSF Data Management Plan Case Studies; RDAP11 Summit

The 2nd Research Data Access and Preservation (RDAP) Summit
An ASIS&T Summit
March 31 - April 1, 2011, Denver, CO
In cooperation with the Coalition for Networked Information

  • Source material from Gail Steinhart, who could not be here in person today. Outline: background (brief), why we’re doing this; the Cornell context and the structure we came up with to support these requirements; early results; observations: challenges and questions.
  • Before the NSF DMP requirement was announced, a group of us had already been exploring how we could cooperate to support the needs of data-driven science (our other head start). The focus of the DRSG has been twofold: assessing researchers’ needs with respect to CI and data management (through a series of interviews), and running pilot projects on support for data-driven science. Although it was not specifically focused on data management planning, it had already brought together many of the right groups on campus that would make up the cross-institution response.
  • NSF announced the DMP requirement, which led to the question of how to help researchers create a plan.
  • Like many of our peers, Cornell has many groups that provide data management services: central campus IT, research computing, and library computing, to name a few. The conversation started with the question of how a researcher would navigate the various service groups to create a data management plan. How could we give researchers a single point of contact so they don’t have to navigate something that looks like this? At Cornell, this is what it could look like for a researcher trying to piece together the services needed for a robust data management plan. (This leaves out some smaller, more specialized service providers: SRI, CBSU, CCTEC…)
  • Solution: a new virtual organization (VO), meant to address two main issues. First, the distribution of resources: making it appear seamless to researchers by providing a single point of contact for assistance with data management. Second, identifying and filling gaps in services in a coordinated way. This comes out of a proposal we submitted to the Vice Provost for Research and the University Librarian, which is available on our website.
  • Here is an overview of the virtual organization. At the top are the sponsors of the group: the Vice Provost for Research, the University Librarian, and the Faculty Advisory Board. The VO itself is run by a management group made up of the management council (representatives from the major service providers, people who can allocate staff and resources to get work done) and a staff coordinator (that’s me!) to hold it all together. Implementation teams are charged by the management group to get work done for the RDMSG. And of course there are all of the existing service providers, which may do some work related to the RDMSG but also provide services outside of it. One thing to note about this new organization is that there is no new money: participants have all accepted this structure, and the work that comes with it, as consistent with their mission and purpose.
  • Think of it as a concierge service for data management. From Wikipedia, on hotel concierges: “a concierge is often expected to ‘achieve the impossible’, dealing with any request a guest may have, no matter how strange, relying on an extensive list of contacts with local merchants and service providers.”
  • Just to give you an idea of some of the things this new group has been working on:
    a survey, to better understand the potential impact on existing services and to identify service gaps;
    a website (still early on that) and a single point-of-contact email address;
    a small and growing pool of consultants to field help requests, people with particular subject or IT expertise;
    a ticketing process: emails to the ‘help’ address generate a help ticket, and a triage process routes each request to a qualified consultant;
    three information sessions for prospective PIs and other interested staff (total attendance >100; slides and video of one session are on the web);
    a faculty advisory board, now convened;
    agreement to establish implementation teams to work in a handful of areas that support the RDMSG.
  • We haven’t had a chance to really look at all of the information yet, but I can share a few preliminary results. Left: a sense of who responded to the survey. Right: significant interest in support for data management planning (and some uncertainty).
  • “Not sure” is the most consistent winner when PIs are asked which approach(es) they would use to share data.
  • One kind of thing we can get out of this exercise: most respondents weren’t sure about using the library’s institutional repository (IR), and several commented that they don’t know what it is. Look at the “yes” responses (blue) by size bin. The IR has a limit of ~50 MB per upload, and without special intervention from a sysadmin, files are uploaded one at a time by the contributor. Several researchers are planning to use the IR for some sizable data collections, so we anticipate managing expectations with respect to services and redirecting users to more appropriate services (which could become a problem when redirecting from free to fee-based services).
  • FAQs (bold are the big winners), persistently asked questions, and comments. Lots of basic questions and a lot of uncertainty.
  • Lack of detail: will NSF provide any additional guidance, feedback, or examples? We have to advise without having seen good and bad DMPs. We would like to hear more on this from NSF (note that directorate guidelines are often nearly as vague as the umbrella policy). Longer-term needs: the challenge for us is to come up with business models and cost structures to support them. Princeton has one, but it addresses only simple storage, not preservation. Business models for preservation are complicated and an active area of research (see the Blue Ribbon panel report), so this is a tough problem. Interim solutions: it is good to have movement in this area, and if it takes a policy or requirement to make that happen, that’s good. We have to balance requirements with good decision making, which can take some time. Concern: the interim adoption of mediocre solutions, which can be difficult to migrate away from or reverse engineer.

    1. Research Data Management Service Group
       Cross-institution response to NSF Data Management Plan
       Eric Chen, Analyst Consultant for Data-Driven Science, Cornell University
       March 31st, 2011
    2. Responding to NSF DMP
       Background
       Cross-institution response
       Current activities
       Next steps
    3. NSF DMP
    4. How to respond?
       Existing data management service providers: Cornell University Library; Center for Advanced Computing; Institutional Review Board; Office of Sponsored Programs; Vice Provost for Research; Cornell Inst. for Social & Econ. Research; DISCOVER Research Service Group; Cornell IT; Weill Cornell Medical College IT
    5. RDMSG
       Research Data Management Service Group (RDMSG)
       Virtual organization
       Comprises existing campus data management service providers
    6. RDMSG (organization chart)
       Sponsors and advisors: Vice Provost for Research; University Librarian; Faculty Advisory Board
       RDMSG Virtual Organization:
         Management Group: Management Council; Staff Coordinator
         Implementation teams: Services assessment; Outreach and training; others as appropriate
       Service Providers: CISER; CIT; CAC; CUL; others as appropriate
    7. Data management planning; storage and backup; metadata; data analysis; collaboration tools; high performance computing; privacy and confidentiality; intellectual property; data publication
    8. RDMSG Activities
       Survey of active NSF PIs
       RDMSG website
       Consultant pool
       Ticketing and triage system
       Information sessions on NSF DMPs
       Faculty advisory board
       Implementation teams: outreach and training; documentation; intellectual property and copyright; survey report; sharing and access; long-term preservation
    9. Preliminary Survey Results
       Charts: directorate of most recent award; DMP support?
    10. Sharing strategies
       Chart categories: disciplinary data center; journals; institutional repository; CAC disk farm; custom
    11. Institutional Repository
       Chart: intent to use IR by amount of data
    12. Information Sessions
       3 sessions about the NSF DMP requirement
       “Common-sense” interpretation
       Over 100 participants from 60 different campus affiliations
       Over 30 questions
    13. Information Sessions
       What counts as data? Does that include raw data?
       I publish my findings in journals, so I already share my data.
       How do we budget for data storage and access beyond the end of a grant?
       For how long should data be archived?
       What about the additional burden on reviewers?
       What if your research plans change so that the original DMP is no longer relevant?
    14. Next steps
       Lack of detail makes support difficult
       Finite award period vs. longer-term needs
       Potential for interim / less-than-optimal solutions
    15. Questions?