Reuse for Research
Curating Astrophysical Datasets
for Future Researchers
Practice Paper, IDCC17
Anders Conrad, Royal Danish Library
Michael Svendsen, Royal Danish Library
Rasmus Handberg, Aarhus University
The NASA Kepler/K2 Mission
Read about the mission at https://kepler.nasa.gov/Mission/QuickGuide/
The Kepler Photometer
From Space to Aarhus…
Spacecraft
Deep Space Network
NASA MAST archive
KASOC archive, Aarhus
KASC scientists / working groups
KASOC website (kasoc.phys.au.dk)
The challenge - Where Next?
• Data will remain valuable for active research
for at least 50 years!
• Who will take care of the data when the current
research organisation (Kepler Asteroseismic Science
Consortium, KASC) no longer exists?
• How can data be kept accessible for continued
active research?
KASC requirements for a Living Archive
• Available for 50 years
• Always freely available on-line
• Continue to be used for active research
• Extendable: New information can be added
• Formats must be readable by both humans and
computers
• Understandable and useful for future
researchers – no matter the science case
Future workshops - Reuse for Research
• For which research questions might future
researchers find this data useful?
• How would they most likely want to see data
packaged?
• What documentation is needed to understand
data outside the current context?
• What search criteria would most likely be used
to discover data?
The 50 Years Issue
• Institutionally:
Who can offer more than 5-10 years of storage
and preservation?
• Financially:
Who will pay?
• Technically:
How will data remain readable and
understandable?
• Scientifically:
How will data remain useful and trustworthy?
From “Who” and “How” to…
• How to best:
  • Structure datasets in a way that is most useful for
    research
  • Use formats that are suitable for long-term
    preservation
  • Secure sufficient contextual and specific
    documentation for scientific reuse
  • Facilitate cross-institutional collaboration, to
    provide a sustainable service
  • Secure access and discoverability according to
    scientific needs
  • Secure the possibility of continued deposit
Dataset Structure
• One self-contained dataset for each star
• 5 different types of data products
• Dataset-specific documentation
• TOC file (machine and human readable)
• References to publications (bibcodes)
• One generic documentation package
• E.g. NASA and KASC release notes
One BagIt Archive for Each Star
Kepler_10.zip
│   bag-info.txt
│   bagit.txt
│   fetch.txt
│   manifest-sha1.txt
└───data
    │   bundle.xml
    │   readme.txt
    ├───datafiles
    │   └───...
    ├───additional_files
    │   └───...
    ├───documentation
    │   └───...
    └───stellar_models
        └───...
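
A BagIt manifest lists one checksum per payload file, so a future researcher can confirm that an unpacked bag is still intact. The following is a minimal sketch of such a check, assuming the bag has been unzipped to a local directory and that manifest-sha1.txt uses the usual "checksum, whitespace, path" line layout. The function name verify_bag is hypothetical and not part of the KASOC tooling; files referenced only through fetch.txt would be reported as missing here.

```python
import hashlib
from pathlib import Path


def verify_bag(bag_dir):
    """Return the payload paths that are missing or whose SHA-1 differs from the manifest."""
    bag = Path(bag_dir)
    problems = []
    manifest = (bag / "manifest-sha1.txt").read_text(encoding="utf-8")
    for line in manifest.splitlines():
        if not line.strip():
            continue
        # Each manifest line is "<checksum> <path relative to the bag root>".
        expected, rel_path = line.split(maxsplit=1)
        payload = bag / rel_path
        if not payload.is_file():
            problems.append(rel_path)
            continue
        # Reads the whole file at once; large FITS files may call for chunked reading.
        actual = hashlib.sha1(payload.read_bytes()).hexdigest()
        if actual != expected.lower():
            problems.append(rel_path)
    return problems
```

Calling verify_bag("Kepler_10") on an unpacked archive would return an empty list when every listed file matches its checksum.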
Documentation for Each Dataset
<star kic="12345678">
  <numax value="3100" error="20" unit="uHz" />
  <mass value="1.0" error="0.01" unit="solar" />
  <radius value="1.0" error="0.01" unit="solar" />
  <datafiles>
    <datafile uid="1" path="datafiles/original/kplr12345678_llc.fits" />
    <datafile uid="2" path="datafiles/kasoc.ts/kplr12345678_kasoc.ts.fits">
      <dependency datafile="1" />
    </datafile>
    …
  </datafiles>
  <model path="stellar_models/kic12345678/" />
</star>
● The bundle.xml file
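
Because bundle.xml is meant to be machine readable, it can be read with a standard XML parser. The sketch below is illustrative only: it assumes the element and attribute names shown on the slide (the full schema is not given here), embeds a copy of the slide's example with the elided entries left out, and uses a hypothetical helper name, read_bundle.

```python
import xml.etree.ElementTree as ET

# Copy of the slide's example; the entries elided on the slide are omitted.
BUNDLE_XML = """\
<star kic="12345678">
  <numax value="3100" error="20" unit="uHz" />
  <mass value="1.0" error="0.01" unit="solar" />
  <radius value="1.0" error="0.01" unit="solar" />
  <datafiles>
    <datafile uid="1" path="datafiles/original/kplr12345678_llc.fits" />
    <datafile uid="2" path="datafiles/kasoc.ts/kplr12345678_kasoc.ts.fits">
      <dependency datafile="1" />
    </datafile>
  </datafiles>
  <model path="stellar_models/kic12345678/" />
</star>
"""


def read_bundle(xml_text):
    """Return (kic, stellar parameters, data files) from a bundle.xml string."""
    star = ET.fromstring(xml_text)
    params = {}
    for name in ("numax", "mass", "radius"):
        elem = star.find(name)
        if elem is not None:
            # Store each parameter as (value, error, unit).
            params[name] = (
                float(elem.get("value")),
                float(elem.get("error")),
                elem.get("unit"),
            )
    files = {}
    for datafile in star.iter("datafile"):
        files[datafile.get("uid")] = {
            "path": datafile.get("path"),
            # uids of the data files this product was derived from.
            "depends_on": [d.get("datafile") for d in datafile.findall("dependency")],
        }
    return star.get("kic"), params, files


kic, params, files = read_bundle(BUNDLE_XML)
for uid, info in files.items():
    print(uid, info["path"], "derived from", info["depends_on"])
```

The dependency elements let a reader trace each derived product back to the original file it was computed from, which is the kind of provenance a future researcher would need.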
Proof-of-concept - Repository Setup
• Using Dataverse repository software
• Support for astrophysics metadata
• Discoverability and citability (DataCite DOIs)
• APIs for automatic ingest workflow
• Versioning – allowing redeposit of extended
versions of datasets
• Issues:
  • Missing numeric fields for celestial coordinates (for
    discovery)
  • Limited options for mapping to external storage (we
    use erda.dk)
Institutional Collaboration
Conclusions – as of February 2017
• Data packages designed in a way that can
outlive repository software
• Caveat: may imply limitations in the use of
repository features
• Preservation actions will potentially be
possible, even if we don’t plan them
• We are still working on establishing funding and a
sustainable business model
• We need to establish a production
environment for the repository
Reuse for Research
Contact: Michael Svendsen, @tullemich, Royal Danish Library
