Agile Curation: 2015 AGU Presentation

Agile Data Curation:
A Conceptual Framework and
Approach for Practitioner
Data Management
Presenting Author: Josh Young (1)
Co-Authors: Karl Benedict (2) and Christopher Lenhardt (3)
1. University Corporation for Atmospheric Research (UCAR) Unidata Program Center, Boulder, USA
2. University of New Mexico, Albuquerque, USA
3. Renaissance Computing Institute (RENCI), University of North Carolina at Chapel Hill, Chapel Hill, USA

Scope
Imagine a project:
• that includes a well-thought-out and documented
data management plan,
• and robust implementation of that plan throughout
the project and beyond.
• This talk is not for that project; it is for the rest of
us.

So why do we care about data
management?
• Internal reasons: do good research, write
papers, get tenure, win more grants.
• External reasons: public access &
reproducibility
  • Risk of becoming dark data (Heidorn, 2008)

Why care about external access?
• Intangibles for an Investigator
  • Maybe someday I’ll benefit from someone else’s data
  • Maybe I’ll learn something through informal dialogue
  • Most science funding is from public resources and should/could be
    considered a public trust resource
  • Peer pressure
• Tangibles for an Investigator
  • Increased efficiency
  • My funders require it

So why do we care about data
management?
• Internal reasons: do good research, write
papers, get tenure, win more grants.
• External reasons: greater impact
Agile Curation

Agile Curation:
• Means taking implementable steps to
improve data management for external
access.
• Philosophically, it attempts to apply
lessons from agile software development
to data management.

Agile Curation Principles,
2nd Generation
1) Delivery, access, use and citation of research
data are the primary measures of success.
2) Maximize the impact of research data through the
continuous integration of curation activities.
3) Support unanticipated needs for and uses of
research data (and documentation) and develop
flexible systems to capture new uses.

2nd Generation
4) Make data open and accessible as early in the process as
possible.
5) Encourage crowd-sourced / community feedback to improve
and enhance the data. Provide basic metadata for data
available early in the process even if the data are not
finalized.
6) Identify key individuals in a research project who have the
requisite motivation, knowledge, or ability to learn, and get
out of their way.

2nd Generation continued
7) Data creators and data curators should work closely
throughout the data life story to ensure the most efficient and
streamlined process.
8) Identify the most effective method(s) for maintaining close
communication between the data creators and curators
involved and use them.
9) Target the steady delivery of incremental improvements to
research data discovery, access and use that is consistent
with a sustainable level of effort and available funding.

2nd Generation continued
10) Start with the basics and only make systems more
complex as needed, while maintaining a low bar to
entry.
11) Continuous attention to technical excellence and
good design enhances agility.
12) Continuously develop a community of data providers,
curators and users that participate in the evolution of
the research data systems.

What happens next?
• Case Studies documentation:
  • To clarify and/or verify these principles
  • To provide workflow examples that can
    be adopted or revised for reuse
• Nascent community of interest within the
Research Data Alliance

Scope
Imagine a project:
• that includes a well-thought-out data management
plan,
• and robust implementation of that plan throughout
the project.
• This talk is not for that project; it is for the rest of
us.

Unidata is one of the University Corporation for
Atmospheric Research (UCAR)'s Community
Programs (UCP), and is funded primarily by
the National Science Foundation
(Grant NSF-1344155).

Questions?
Contact me at: jwyoung@ucar.edu @unidata_josh 303-497-8646

1st Generation
1) Access to data is the first goal;
2) Generative value is supported (Zittrain, 2006);
3) Researcher involvement through a participatory framework that
aligns data management with scientific research processes
(Yarmey and Baker, 2013);
4) Projects will utilize free open-source resources to the greatest
extent practical;
5) Community participation increases project capacity;

1st Generation part 2
6) Data management requirements and practices evolve as the
research project proceeds;
7) Bright and dedicated individuals can learn appropriate skills and
respond to the demands of their particular project, as they
proceed;
8) Approaches apply across scales;
9) Consider technical debt;
10) Data evaluation can be conducted through use and feedback.

How we got here
• Idea formulated during discussion of Data
Management Lifecycles at GeoData 2014
• Principles drafted for AGU 2014
• Two Research Data Alliance (RDA) Birds of a
Feather sessions to explore community
experiences
