2016 Ocean Sciences Meeting tutorial

Moving Beyond Planning to Implementation: Open-Source Tools…
Josh Young
Ocean Sciences Meeting
February 24, 2016

Scope
Imagine a project:
• that includes a well-thought-out and
documented data management plan,
• and robust implementation of that
plan throughout the project and
beyond.
• This talk is not for that project; it is for
the rest of us.

So why do we care about data
management?
• Internal reasons: do good research,
write papers, get tenure, win more
grants.
• External reasons: public access &
reproducibility
• Risk of becoming dark data (Heidorn, 2008)

Why care about external access?
• Intangibles for an Investigator
• Maybe someday I’ll benefit from someone else’s
data
• Maybe I’ll learn something through informal dialogue
• Most science funding is from public resources and
should/could be considered a public trust resource
• Peer pressure
• Tangibles for an Investigator
• Increased efficiency
• My funders require it.

So why do we care about data
management?
• Internal reasons: do good research,
write papers, get tenure, win more
grants.
• External reasons: greater impact

What is the DMRC (Data Management Resource
Center) & do we really need another Data Plan Project?
• Probably not
• The DMRC is not a Data Plan tool
• Unidata community requested help
with implementation
• Therefore, the DMRC is primarily a
curated list of tools for implementation

What the DMRC Offers
• Highlights requirements from funding
agencies;
• Points to Best Practices developed by
others in the Data Management
space;
• Sorts available tools by best practice;
• Details available tools.

Requirements
• Highlight data management funding
requirements from NASA, NOAA,
NSF
• These are the agencies that fund our
community, so we try to stay up to
date, but remember that the agency-posted
information is always the
authority.

Activity Best Practices
& Possible Tools
Activity column based on DataONE Best Practices

What We Are Exploring
• Dataverse by Harvard
• Designed for sharing, archiving, and
citing data
• Allows you to create a DOI
• Allows you to store and make data
accessible in perpetuity

Known Dataverse Characteristics:
• Largest single file limited to 10 GB
• No limit to number of files
• Users create their own Dataverse
• Designate private or public
• Open to data from all science disciplines
• Does not corrupt at least some software-specific
files (e.g., IDV bundles)
• FREE
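The 10 GB per-file limit listed above is easy to exceed with radar, lidar, or model output, so it can help to check a dataset before depositing it. The Python below is a minimal sketch under stated assumptions: the function names are hypothetical, "10 GB" is read as 10 × 1024³ bytes, and the actual limit is configured by each Dataverse installation, so confirm it with the repository you deposit to.

```python
import os
import sys

# Assumption: the slide's "10GB" is read as 10 * 1024**3 bytes.
# The real limit is set by each Dataverse installation; confirm it before relying on this.
MAX_FILE_BYTES = 10 * 1024**3


def find_oversized(file_sizes, limit=MAX_FILE_BYTES):
    """Return the sorted names of files whose size in bytes exceeds `limit`."""
    return sorted(name for name, size in file_sizes.items() if size > limit)


def scan_directory(root):
    """Map every file path under `root` to its size in bytes."""
    sizes = {}
    for dirpath, _dirnames, filenames in os.walk(root):
        for fname in filenames:
            path = os.path.join(dirpath, fname)
            sizes[path] = os.path.getsize(path)
    return sizes


if __name__ == "__main__":
    # Usage: python check_sizes.py /path/to/dataset  (defaults to the current directory)
    root = sys.argv[1] if len(sys.argv) > 1 else "."
    too_big = find_oversized(scan_directory(root))
    for path in too_big:
        print(f"Too large for a single upload: {path}")
    print(f"{len(too_big)} file(s) over the limit")
```

Files flagged this way would need to be split or compressed before upload; since the number of files is not limited, splitting is usually the simpler option.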

Possible Dataverse Contributions:
• Description (providing DOIs)
• Sharing (access in perpetuity)
• Preservation (static copy in perpetuity)
• Cost (free): well suited to projects that
might otherwise become long-tail data

We Welcome Your Resource
Suggestions!
• Please visit:
http://goo.gl/forms/Ngp4Xu9nGr

Example Workflow Implementation
• Radar and Lidar data from the
University of Wyoming King Air
• Millersville University Plains Elevated
Convection at Night (PECAN) data
• North Carolina State University WRF
North Atlantic Model Outputs

Part of a larger effort: Agile Data
Curation
• Means taking implementable steps to
improve data management for
external access.
• Philosophically, it attempts to apply
lessons from agile software
development to data management.

Agile Curation Principles,
2nd Generation
(J. Young, K. Benedict, & C. Lenhardt, AGU 2015 Fall Meeting)
1) Delivery, access, use and citation of
research data are the primary measures of
success.
2) Maximize the impact of research data
through the continuous integration of
curation activities.
3) Support unanticipated needs for and uses
of research data (and documentation) and
develop flexible systems to capture new
uses.

4) Make data open and accessible as early in the
process as possible.
5) Encourage crowd-sourced / community
feedback to improve and enhance the data.
Provide basic metadata for data available early
in the process even if the data are not finalized.
6) Identify key individuals in a research project
who have the requisite motivation, knowledge,
or ability to learn, and then get out of their way.

7) Data creators and data curators should work
closely throughout the data life cycle to ensure
the most efficient and streamlined process.
8) Identify the most effective method(s) for
maintaining close communication between the
data creators and curators involved and use
them.
9) Target the steady delivery of incremental
improvements to research data discovery,
access and use that is consistent with a
sustainable level of effort and available funding.

10) Start with the basics and only make systems
more complex as needed, while maintaining a
low bar to entry.
11) Continuous attention to technical excellence
and good design enhances agility.
12) Continuously develop a community of data
providers, curators and users that participate in
the evolution of the research data systems.

We Welcome Your Stories
• Please email: jwyoung@ucar.edu

Balancing infrastructure development & scientific
advancement to create sustainable, multidisciplinary
solutions
M. Chan
NSF's EarthCube goals:
• Advance science
• Meet grand challenges
• Leverage shared cyberinfrastructure technology
[Figure: NSF's EarthCube, linking cyberinfrastructure and science through RCNs, Building Blocks, Interactive Activities, End User Workshops, and EC Committees]

Get Involved!
[Figure: EarthCube governance structure: Leadership Council, Office, Science Committee, Technology & Architecture Committee, Council of Data Facilities, Liaison Team, Engagement Team]
• Talk to EarthCube
Participants!
• Attend EarthCube
Workshops!
• Join the mailing list at
earthcube.org
• Apply for funding (EC Travel
Grants, Distinguished
Lecturers)
• Follow on Twitter @earthcube

Unidata is one of the University Corporation
for Atmospheric Research (UCAR)'s
Community Programs (UCP), and is
funded primarily by the National Science
Foundation (Grant NSF-1344155).
