“Our data are going to be valuable for science for the next 50 years, so please make sure you preserve them and keep them accessible for active research for at least that period.” These were approximately the words used by the principal investigator of the Kepler Asteroseismic Science Consortium (KASC), when the task was presented to us.
The data in question consist of data products produced by KASC researchers and working groups as part of their research, as well as underlying data imported from the NASA archives.
The overall requirement of 50 years of preservation, combined with enabling reuse of the data for active research throughout that period, presented a number of specific challenges that closely intertwine data handling and data infrastructure with scientific issues. This paper reports our work to deliver the best possible solution, performed in close cooperation between the research team and library personnel.
Reuse for research, presentation, idcc17
1. Reuse for Research
Curating Astrophysical Datasets
for Future Researchers
Practice Paper, IDCC17
Anders Conrad, Royal Danish Library
Michael Svendsen, Royal Danish Library
Rasmus Handberg, Aarhus University
2. The NASA Kepler/K2 Mission
Read about the mission at https://kepler.nasa.gov/Mission/QuickGuide/
4. From Space to Aarhus…
Spacecraft
Deep Space Network
NASA MAST archive
KASOC archive, Aarhus
KASC scientists/working groups
KASOC website (kasoc.phys.au.dk)
5. The challenge - Where Next?
• Data will remain valuable for active research
for at least 50 years!
• Who will take care of the data when the current
research organisation (Kepler Asteroseismic Science
Consortium, KASC) no longer exists?
• How can data be kept accessible for continued
active research?
6. KASC requirements for a Living Archive
• Available for 50 years
• Always freely available on-line
• Continue to be used for active research
• Extendable: New information can be added
• Formats must be readable by both humans and
computers
• Understandable and useful for future
researchers – no matter the science case
7. Future workshops - Reuse for Research
• For which research questions might future
researchers find this data useful?
• How would they most likely want to see data
packaged?
• What documentation is needed to understand
data outside the current context?
• What search criteria would most likely be used
to discover data?
8. The 50 Years Issue
• Institutionally:
Who can offer more than 5-10 years of storage
and preservation?
• Financially:
Who will pay?
• Technically:
How will data remain readable and
understandable?
• Scientifically:
How will data remain useful and trustworthy?
9. From “Who” and “How” to…
• How to best
• Structure datasets in a way that is most useful for
research
• Use formats that are suitable for long-term
preservation
• Secure sufficient contextual and specific
documentation for scientific reuse
• Facilitate cross-institutional collaboration, to
provide a sustainable service
• Secure access and discoverability according to
scientific needs
• Secure possibility for continued deposit
10. Dataset Structure
• One self-contained dataset for each star
• 5 different types of data products
• Dataset-specific documentation
• TOC file (machine and human readable)
• References to publications (bibcodes)
• One generic documentation package
• E.g. NASA and KASC release notes
11. One BagIt Archive for Each Star
Kepler_10.zip
│   bag-info.txt
│   bagit.txt
│   fetch.txt
│   manifest-sha1.txt
└───data
    │   bundle.xml
    │   readme.txt
    ├───datafiles
    │   └───...
    ├───additional_files
    │   └───...
    ├───documentation
    │   └───...
    └───stellar_models
        └───...
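
Because each star's archive carries its own manifest-sha1.txt, its integrity can in principle be checked without any repository software. The following is a minimal sketch of such a check, not the tooling used by the project. It assumes the bag has been unpacked to a directory, that any files listed in fetch.txt have already been retrieved, and that manifest lines have the usual BagIt form of a checksum followed by a path relative to the bag root. The function names `sha1_of` and `verify_bag` are hypothetical.

```python
import hashlib
from pathlib import Path


def sha1_of(path: Path, chunk_size: int = 1 << 20) -> str:
    """Compute the SHA-1 hex digest of a file, reading it in chunks."""
    digest = hashlib.sha1()
    with path.open("rb") as handle:
        for chunk in iter(lambda: handle.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def verify_bag(bag_root: Path, manifest_name: str = "manifest-sha1.txt") -> list[str]:
    """Return the manifest paths that are missing or fail their checksum.

    Assumption: each non-empty manifest line is "<sha1> <relative/path>",
    with plain (not percent-encoded) file names.
    """
    problems = []
    manifest = bag_root / manifest_name
    for line in manifest.read_text(encoding="utf-8").splitlines():
        if not line.strip():
            continue  # skip blank lines
        expected, rel_path = line.split(maxsplit=1)
        rel_path = rel_path.strip()
        target = bag_root / rel_path
        if not target.is_file() or sha1_of(target) != expected.lower():
            problems.append(rel_path)
    return problems
```

Calling `verify_bag(Path("Kepler_10"))` on an unpacked archive would return an empty list when every file named in the manifest is present and unchanged. The sketch uses only the Python standard library, in keeping with the aim that the packages remain checkable independently of any particular repository system.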
15. Conclusions – as of February 2017
• Data packages designed in a way that can
outlive repository software
• Caveat: may imply limitations in the use of
repository features
• Preservation actions will potentially be
possible, even if we don’t plan them
• We are still working to establish funding and a
sustainable business model
• We need to establish a production
environment for the repository