DMPTool Webinar 8: Data Curation Profiles and the DMPTool (presented by Jake Carlson)

  • This led to a collaboration between Purdue Libraries and the Graduate School of Library and Information Science at UIUC, launched in the fall of 2007. “who will share what (kinds of data) with whom, and when?” “[Beyond] technical solutions for curating… data, there are numerous social and organizational challenges to address with supporting faculty in depositing their data into an institutional repository (IR). This project will [engage with] a cross disciplinary segment of researchers [interested in] sharing, archiving or disseminating various levels of research data.”
  • Point out the overlap between Purdue & UIUC in some subjects: the goal was to investigate commonalities between disciplines at different institutions. Also, each institution interviewed several faculty from a particular department to investigate commonalities between researchers: Purdue, 4 in Agronomy; UIUC, 3 in Geology. This was mostly a convenience sample for Purdue: people we had already contacted in the past. In some cases (most cases for UIUC), the library liaisons associated with this project helped us recruit faculty.
  • Again, the first interviews were very broad in nature (deliberately). These are the general categories of questions for the first interview.
  • These are examples of the types of needs that we were trying to get additional information about from the 2nd interviews. Few people, convenience sample, NOT STATISTICALLY ACCURATE. However, it makes us wonder and want to learn more; thus the Profile. (Note: our n=19, but our graph goes up to 20. Need to fix this?)
  • More examples of needs from the 2nd interview
  • There are 2 overarching sections of the Data Curation Profile. The 1st section is “Data”:
    Overview of the Research: designed to provide context for the data.
    Data Kinds and Stages: a description of, and information about, the data set itself.
    - Data lifecycle: the stages that the data pass through during their life (more on this later).
    - Target Data: the stage in the data’s lifecycle at which the researcher would be willing / able to share.
    Value: the real or potential value the data set would have for others (in the same field, in a different field, to the general public).
    Contextual Narrative: additional information that may be relevant for the reader of the profile to better understand the data, as well as the disciplinary norms and practices with data sets in the researcher’s field.
  • The data table (this is a simplified data table from the Traffic Flow Profile). It’s meant to be an easy-to-read summary of the data lifecycle, highlighting the key attributes of the data (more on this later).
  • The 2nd section of the DCP: Needs. These are the areas that were identified by faculty in the interviews. Information about data sharing is the heart of the profile and is captured throughout, rather than in one specific section. We do not have time today to go through every individual section. We will be spending some time later this afternoon discussing the “Intellectual Property” and the “Organization and Description of data” sections.


  • 1. Logistics for Webinar • You must call in for audio: 866-740-1260, access code 9870179# • Participants muted • Ask questions in chat any time • 20 minutes for Q&A • Recording & slides, schedule of webinars: • DMPTool Webinar Series 8: Data Curation Profiles & the DMPTool • Sponsored by IMLS • 13 August 2013
  • 2. • 28 May: Introduction to the DMPTool • 4 June: Learning about data management: Resources, tools, materials • 18 June: Customizing the DMPTool for your institution • 25 June: Environmental Scan: Who's important at your campus • 9 July: Promoting institutional services; EZID Outreach Made Simple! • 16 July: Health Sciences & DMPTool - Lisa Federer, UCLA • 30 July: Digital humanities and the DMPTool - Miriam Posner, UCLA • 13 Aug: Data curation profiles and the DMPTool – Jake Carlson, Purdue • 27 Aug: Talking points for meeting with institutional stakeholders • 10 Sep: Tools and resources that work with/complement the DMPTool • Beyond funder requirements: more extensive DMPs • Case studies 1 – How librarians have successfully used the tool • Case studies 2 – How librarians have successfully used the tool • Outreach Kit Introduction • Certification program introduction
  • 3. Data Curation Profiles & the DMPTool • Jake Carlson • Associate Professor of Library Science / Data Services Specialist • Purdue University Libraries
  • 4. Road Map • History / Background of the DCP Toolkit • Comparing the DMP and the DCP • Case Study in using the DCP
  • 5. “Investigating Data Curation Profiles across Research Domains” • Awarded in 2007 to Purdue Libraries and Graduate School of Library and Information Science at UIUC • Goals of the project: – To understand the practices, attitudes and needs of researchers in managing and sharing their data. – To identify possible roles for librarians to facilitate data sharing and curation. – To develop a tool for librarians to gather information on researcher needs for their data.
  • 6. Interview areas: 20 faculty, 12 disciplines Agronomy & Soil Science (Purdue & UIUC), Anthropology (UIUC), Biochemistry (Purdue), Biology (Purdue), Civil Engineering (Purdue), Earth & Atmospheric Sciences (Purdue & UIUC), Electrical & Computer Engineering (Purdue), Food Science (Purdue), Geology (UIUC), Horticulture & Plant Science (Purdue & UIUC), Kinesiology (UIUC), Speech and Hearing (UIUC)
  • 7. What we asked … • Research Data Lifecycle (story of the data) • Characteristics of the Data • Data Management / Storage • Data Dissemination and Sharing • Data Preservation and Repositories • Roles for Libraries and Librarians
  • 8. Prioritize your needs for the following types of services • The ability to cite this dataset in my publications • The ability for researchers within my discipline to easily find this dataset • The ability for researchers outside of my discipline to easily find this dataset • The ability for people to easily discover this dataset using Google • Source: Witt, M. (2009, May 18). Eliciting Faculty Requirements for Research Data Repositories. 4th Int’l Conference on Open Repositories. Georgia Tech, Atlanta, GA. n=19
  • 9. Prioritize your needs for the following types of services • The ability for me to submit this dataset to a repository myself • The process of submitting this dataset to a repository is automated • The ability to make these data accessible in multiple formats • The ability of the repository to provide version control for the data • Source: Witt, M. (2009, May 18). Eliciting Faculty Requirements for Research Data Repositories. 4th Int’l Conference on Open Repositories. Georgia Tech, Atlanta, GA. n=19
  • 10. An interview-based tool for gathering: • Information about a particular data set. • What a researcher is doing to manage / curate the data set. • What a researcher would like to do with the data.
  • 11. DCP Sections • Information about the Data and its Context – Overview of the Research • Focus • Intended Audience • Funding – Data Kinds and Stages • Data Narrative (data lifecycle) • Target Data for Sharing • Use/re-use Value • Contextual Narrative
  • 12. Data table (blank cells were empty on the slide)
    Data Stage         | Output                                 | Typical File Size                    | Format                    | Other / Notes
    Primary Data       |                                        |                                      |                           |
    Raw                | Sensor data                            | 100k in 1 file per day               | proprietary to the sensor | FTP downloads are mostly automated.
    Processing Stage 1 | Sensor data – open / accessible format | Roughly 6kb                          | .csv / .xls               | Data are formatted into .csv before being reformatted into a mySQL database.
    Processed          | Data vectors                           | 800 records per intersection per day | SQL / .xls                | Data are extracted from the mySQL database for analysis purposes.
    Analyzed           | Charts / graphs                        |                                      | .xls / .emf               | Charts and graphs used for interpretation.
    Published          | Charts / graphs                        |                                      | .ppt                      | Data are presented via PowerPoint.
    Ancillary Data     |                                        |                                      |                           |
    Image              | Stills taken from video                |                                      | .gif / .jpg / .ppt        | Images generated from video.
  • 13. More DCP Sections • Information about Needs – Intellectual Property – Organization and description of data – Ingest – Access – Discovery – Tools – Interoperability – Measuring Impact – Data Management – Preservation
  • 14. Context • Focused on a specific context: developing a data management plan for submission to a funding agency. • Focused on a broad context: understanding the researcher’s data and needs well enough to respond.
  • 15. Timing • For use in the “Planning Stages” of the Data Lifecycle • For use in the “Active Data Stages” of the Data Lifecycle
  • 16. “The Research Lifecycle” model developed by the University of Virginia Library’s Scientific Data Consulting Group.
  • 17. Structure • The DMPTool’s structure is based on the specific elements of the agency’s data management plan. • The DCP Toolkit is modular in nature. Questions and sections can be changed.
  • 18. Level of Investment • Generating a DMP using the DMPTool is a short term investment. • Generating a DCP is a longer term investment, but with a potentially large payoff.
  • 19. Sharable Output • Data management plans are intended to be submitted to a funding agency, not to be shared publicly. • Data curation profiles are intended to be shared with others.
  • 20.
  • 21. • Both tools seek to help researchers identify and address needs in managing and curating data. • In particular, both tools aim to foster the creation of data that are discoverable, accessible, well-described and usable by others.
  • 22. “The Research Lifecycle” model developed by the University of Virginia Library’s Scientific Data Consulting Group.
  • 23. • Both tools can be used to help librarians connect with researchers about their data. • Both organizations recognize and support the roles of librarians in providing services to support the data lifecycle.
  • 24. Case Study: Water Quality Field Station with Marianne Bracke Agricultural Sciences Information Specialist Associate Professor of Library Science Purdue University Libraries
  • 25. The Water Quality Field Station • On a 991-acre farm facility northwest of Purdue, opened in 1992. • Used to identify agricultural practices that minimize movement of AG chemicals into water supplies. • Informs the development of new and more ecologically-balanced technologies for crop production.
  • 26. Graduate Students • Graduate students are on the front lines of data. • Sharing data locally, between graduate students, was challenging to do.
  • 27. Project Steps • Utilize Data Curation Profiles to collect information about current data gathering, workflow and documentation. • Identify common issues and needs as observed in the Data Curation Profiles. • Produce a report with recommendations and possible approaches to addressing issues and needs. • Identify / Assess / Analyze
  • 28. Identify • 6 interviews with Graduate Students conducted in summer of 2011. • Developed Data Curation Profiles from these interviews. • Reviewed DCPs for needs.
  • 29. Analyze • There is a lack of clear and shared expectations on how data should be documented, described and organized. • Locally – variation of practice by individual, by circumstance, previous training / experience, intended use, etc. • Discipline – there is a lack of standards specifically for Agronomy data.
  • 30. Analyze • Data are not being generated or processed in ways that could facilitate sharing externally, or even locally at Purdue or within the lab. • Inheriting data from previous graduate students was common and potentially problematic. • Many graduate students who had received data reported some problems understanding or making use of the data.
  • 31. Analyze • Graduate Students stated that they lack knowledge and skills of how they should document, describe, organize and manage their data. • These activities tend to be done in relative isolation from the lab, or even the advisor. • Physical lab notebooks are still the primary means of documentation / provenance.
  • 32. Assess
  • 33. DMP & DCP Connections • May uncover issues that merit further investigation through a DCP. • Uncovering data management issues could inform data management planning.
  • 34. Another Case Study with DCPs
  • 35. Thanks! Any Questions? • Jake Carlson • Associate Professor of Library Science / Data Services Specialist • Purdue University Libraries
  • 36. webinar-series From Flickr by Jeff Keacher In 2 weeks: Talking Points for Meeting with Stakeholders Presenter: Dan Phipps Tuesday 27 Aug @ 10am PT
  • 37.
  • 38. Email Twitter Blog Facebook @TheDMPTool Questions?