DMPTool Webinar 8: Data Curation Profiles and the DMPTool (presented by Jake Carlson)
1. Logistics for Webinar
You must call in for audio:
866-740-1260 access code 9870179#
Participants muted
Ask questions in chat any time
20 minutes for Q&A
Recording & slides, schedule of webinars:
blog.dmptool.org/webinar-series
DMPTool Webinar Series 8: Data Curation Profiles & the DMPTool
Sponsored by IMLS
13 August 2013
2. 28 May Introduction to the DMPTool
4 June Learning about data management: Resources, tools, materials
18 June Customizing the DMPTool for your institution
25 June Environmental Scan: Who's important at your campus
9 July Promoting institutional services; EZID Outreach Made Simple!
16 July Health Sciences & DMPTool - Lisa Federer, UCLA
30 July Digital humanities and the DMPTool - Miriam Posner, UCLA
13 Aug Data curation profiles and the DMPTool – Jake Carlson, Purdue
27 Aug Talking points for meeting with institutional stakeholders
10 Sep Tools and resources that work with/complement the DMPTool
Beyond funder requirements: more extensive DMPs
Case studies 1 – How librarians have successfully used the tool
Case studies 2 – How librarians have successfully used the tool
Outreach Kit Introduction
Certification program introduction
blog.dmptool.org/webinar-series
3. Data Curation Profiles & the
DMPTool
Jake Carlson
Associate Professor of Library Science / Data Services Specialist
Purdue University Libraries
4. Road Map
• History / Background
of the DCP Toolkit
• Comparing the DMP
and the DCP
• Case Study in using
the DCP
5. “Investigating Data Curation Profiles
across Research Domains”
• Awarded in 2007 to Purdue Libraries and Graduate School
of Library and Information Science at UIUC
• Goals of the project:
– To understand the practices, attitudes and needs of
researchers in managing and sharing their data.
– To identify possible roles for librarians to facilitate data
sharing and curation.
– To develop a tool for librarians to gather information on
researcher needs for their data.
7. What we asked …
• Research Data Lifecycle (story of the data)
• Characteristics of the Data
• Data Management / Storage
• Data Dissemination and Sharing
• Data Preservation and Repositories
• Roles for Libraries and Librarians
8. Prioritize your needs for the following types of services
The ability to cite this dataset in my publications
The ability for researchers within my discipline to easily find this dataset
The ability for researchers outside of my discipline to easily find this dataset
The ability for people to easily discover this dataset using Google
Witt, M. (2009, May 18). Eliciting Faculty Requirements for Research Data Repositories. 4th Int’l Conference on Open Repositories, Georgia Tech, Atlanta, GA.
n=19
9. Prioritize your needs for the following types of services
The ability for me to submit this dataset to a repository myself
The process of submitting this dataset to a repository is automated
The ability to make these data accessible in multiple formats
The ability of the repository to provide version control for the data
Witt, M. (2009, May 18). Eliciting Faculty Requirements for Research Data Repositories. 4th Int’l Conference on Open Repositories, Georgia Tech, Atlanta, GA.
n=19
10. An interview based tool for gathering:
• Information about a particular data set.
• What a researcher is doing to manage /
curate the data set.
• What a researcher would like to do with
the data.
http://datacurationprofiles.org
11. DCP Sections
• Information about the Data and its Context
–Overview of the Research
• Focus
• Intended Audience
• Funding
–Data Kinds and Stages
• Data Narrative (data lifecycle)
• Target Data for Sharing
• Use/re-use Value
• Contextual Narrative
12. Data Stage | Output | Typical File Size | Format | Other / Notes
Primary Data
Raw | Sensor data | 100k in 1 file per day | Proprietary to the sensor | FTP downloads are mostly automated.
Processing Stage 1 | Sensor data – open/accessible format | Roughly 6kb | .csv / .xls | Data are formatted into .csv before being reformatted into a MySQL database.
Processed | Data vectors | 800 records per intersection per day | SQL / .xls | Data are extracted from the MySQL database for analysis purposes.
Analyzed | Charts/graphs | | .xls / .emf | Charts and graphs used for interpretation.
Published | Charts/graphs | | .ppt | Data are presented via PowerPoint.
Ancillary Data
Image | Stills taken from video | | .gif / .jpg / .ppt | Images generated from video.
13. More DCP Sections
Information about Needs
–Intellectual
Property
–Organization
and description
of data
–Ingest
–Access
–Discovery
–Tools
–Interoperability
–Measuring
Impact
–Data
Management
–Preservation
15. Context
• DMPTool: focused on a specific
context: developing
a data management
plan for submission
to a funding agency.
• DCP: focused on a broad
context:
understanding the
researcher’s data
and needs well
enough to respond.
16. Timing
• DMPTool: for use in the
“Planning Stages” of
the Data Lifecycle
• DCP: for use in the “Active
Data Stages” of the
Data Lifecycle
17. “The Research Lifecycle” model developed by the University of Virginia Library’s Scientific Data Consulting
Group.
18. Structure
• The DMPTool’s
structure is based on
the specific elements
of the agency’s data
management plan.
• The DCP Toolkit is
modular in nature.
Questions and
sections can be
changed.
19. Level of Investment
• Generating a DMP
using the DMPTool
is a short-term
investment.
• Generating a DCP is a
longer-term
investment, but with
a potentially large
payoff.
20. Sharable Output
• Data management
plans are intended to
be submitted to a
funding agency, not
to be shared publicly.
• Data curation
profiles are intended
to be shared with
others.
23. • Both tools seek to help researchers identify
and address needs in managing and curating
data.
• In particular, both tools aim to foster the
creation of data that are
discoverable, accessible, well-described and
usable by others.
24. “The Research Lifecycle” model developed by the University of Virginia Library’s Scientific Data Consulting
Group.
25. • Both tools can be used to help librarians
connect with researchers about their data.
• Both organizations recognize and support the
roles of librarians in providing services to
support the data lifecycle.
26. Case Study:
Water Quality Field Station
with
Marianne Bracke
Agricultural Sciences Information Specialist
Associate Professor of Library Science
Purdue University Libraries
27. The Water Quality Field Station
Located on a 991-acre farm facility northwest of
Purdue; opened in 1992.
Used to identify agricultural practices that
minimize movement of AG chemicals into
water supplies.
Informs the development of new and more
ecologically-balanced technologies for
crop production.
28. Graduate Students
Graduate students are on the front lines of data.
Sharing data locally, between graduate students,
was challenging to do.
29. Project Steps
Utilize Data Curation Profiles to collect
information about current data
gathering, workflow and documentation.
Identify common issues and needs as
observed in the Data Curation Profiles.
Produce a report with recommendations
and possible approaches to addressing
issues and needs.
Identify
Assess
Analyze
30. Identify
6 interviews with Graduate
Students conducted in
summer of 2011.
Developed Data Curation
Profiles from these
interviews.
Reviewed DCPs for needs.
31. Analyze
There is a lack of clear and shared expectations
on how data should be documented, described
and organized.
Locally – practice varies by individual,
circumstance, previous training / experience,
intended use, etc.
Discipline – there is a lack of standards
specifically for Agronomy data.
32. Analyze
Data are not being generated or processed in
ways that could facilitate sharing externally, or
even locally at Purdue or within the lab.
Inheriting data from previous graduate
students was common and potentially
problematic.
Many graduate students who had received
data reported some problems understanding
or making use of the data.
33. Analyze
Graduate Students stated that they lack
knowledge and skills of how they should
document, describe, organize and manage their
data.
These activities tend to be done in relative
isolation from the lab, or even the advisor.
Physical lab notebooks are still the primary
means of documentation / provenance.
35. DMP & DCP Connections
Developing a DMP may uncover issues that merit
further investigation through a DCP.
Uncovering data management issues through a DCP could
inform data management planning.
36. Another Case Study with DCPs
http://www.dlib.org/dlib/july13/wright/07wright.html
37. Thanks! Any Questions?
Jake Carlson
Associate Professor of Library Science / Data Services Specialist
Purdue University Libraries
jakecarlson@purdue.edu
This led to a collaboration between Purdue Libraries and the Graduate School of Library and Information Science at UIUC, launched in the fall of 2007.
“who will share what (kinds of data) with whom, and when?”
“[Beyond] technical solutions for curating… data, there are numerous social and organizational challenges to address with supporting faculty in depositing their data into an institutional repository (IR). This project will [engage with] a cross disciplinary segment of researchers [interested in] sharing, archiving or disseminating various levels of research data.”
Point out the overlap between Purdue & UIUC in some subjects – the goal was to investigate commonalities between disciplines at different institutions.
Also, each institution interviewed several faculty from a particular department to investigate commonalities between researchers. Purdue – 4 in Agronomy; UIUC – 3 in Geology.
This was mostly a convenience sample for Purdue – people we already had contacted in the past. In some cases (most cases for UIUC), we had the library liaisons associated with this project help us recruit faculty.
Again, the first interviews were very broad in nature (deliberately). These are the general categories of questions for the first interview.
These are examples of the types of needs that we were trying to get additional information about from the 2nd interviews.
Few people, convenience sample, NOT STATISTICALLY ACCURATE… However, it makes us wonder, and want to learn more; thus the Profile.
(Note: our n=19, but our graph goes up to 20. Need to fix this?)
More examples of needs from the 2nd interview
There are 2 overarching sections of the Data Curation Profile. The 1st section is “Data”.
Overview of the Research – designed to provide context for the data.
Data Kinds and Stages – a description and information about the data set itself.
- Data lifecycle: the stages that the data pass through during their life (more on this later).
- Target Data: at what stage in the data’s lifecycle would the researcher be willing / able to share.
Value: what real or potential value would the data set have for others (in the same field, in a different field, to the general public).
Contextual Narrative: additional information that may be relevant for the reader of the profile to better understand the data, as well as the disciplinary norms and practices with data sets in the researcher’s field.
The data table (this is a simplified data table from the Traffic Flow Profile).
It’s meant to be an easy-to-read summary of the data lifecycle, highlighting the key attributes of the data (more on this later).
The 2nd section of the DCP: Needs
These are the areas that were identified by faculty in the interviews.
Information about data sharing is the heart of the profile and is captured throughout, rather than in one specific section. We do not have time today to go through every individual section. We will be spending some time later this afternoon discussing the “Intellectual Property” and the “Organization and Description of data” sections.