Reuse for Research
Curating Astrophysical Datasets
for Future Researchers
Practice Paper, IDCC17
Anders Conrad, Royal Danish Library
Michael Svendsen, Royal Danish Library
Rasmus Handberg, Aarhus University
The NASA Kepler/K2 Mission
Read about the mission at https://kepler.nasa.gov/Mission/QuickGuide/
The Kepler Photometer
From Space to Aarhus…
Spacecraft
Deep Space Network
NASA MAST archive
KASOC archive, Aarhus
KASC scientists / working groups
KASOC website (kasoc.phys.au.dk)
The challenge - Where Next?
• Data will remain valuable for active research
for at least 50 years!
• Who will take care of the data when the current research
organisation (Kepler Asteroseismic Science
Consortium, KASC) no longer exists?
• How can data be kept accessible for continued
active research?
KASC requirements for a Living Archive
• Available for 50 years
• Always freely available online
• Continue to be used for active research
• Extendable: New information can be added
• Formats must be readable by both humans and
computers
• Understandable and useful for future
researchers – no matter the science case
Future workshops - Reuse for Research
• For which research questions might future
researchers find this data useful?
• How would they most likely want to see data
packaged?
• What documentation is needed to understand
data outside the current context?
• What search criteria would most likely be used
to discover data?
The 50 Years Issue
• Institutionally:
Who can offer more than 5-10 years of storage
and preservation?
• Financially:
Who will pay?
• Technically:
How will data remain readable and
understandable?
• Scientifically:
How will data remain useful and trustworthy?
From "Who" and "How" to…
• How to best
• Structure datasets in a way that is most useful for
research
• Use formats that are suitable for long-term
preservation
• Secure sufficient contextual and specific
documentation for scientific reuse
• Facilitate cross-institutional collaboration, to
provide a sustainable service
• Secure access and discoverability according to
scientific needs
• Secure possibility for continued deposit
Dataset Structure
• One self-contained dataset for each star
• 5 different types of data products
• Dataset-specific documentation
• TOC file (machine and human readable)
• References to publications (bibcodes)
• One generic documentation package
• E.g. NASA and KASC release notes
One BagIt Archive for Each Star
Kepler_10.zip
│ bag-info.txt
│ bagit.txt
│ fetch.txt
│ manifest-sha1.txt
└───data
│ bundle.xml
│ readme.txt
├───datafiles
│ └───...
├───additional_files
│ └───...
├───documentation
│ └───...
└───stellar_models
└───...
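
The layout above follows the BagIt convention, in which manifest-sha1.txt lists one SHA-1 checksum per payload file. As a minimal sketch of how a future user might check an unpacked bag, assuming each manifest line has the form "<checksum> <relative path>" (the function names here are hypothetical and not part of the project's tooling):

```python
import hashlib
from pathlib import Path


def sha1_of(path, chunk_size=1 << 20):
    """Compute the SHA-1 hex digest of a file, reading it in chunks."""
    digest = hashlib.sha1()
    with open(path, "rb") as handle:
        for chunk in iter(lambda: handle.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def verify_bag(bag_root):
    """Return the payload paths whose file is missing or whose checksum differs.

    Assumes an unpacked bag with manifest-sha1.txt at its root.
    """
    bag_root = Path(bag_root)
    failures = []
    manifest = bag_root / "manifest-sha1.txt"
    for line in manifest.read_text(encoding="utf-8").splitlines():
        if not line.strip():
            continue
        # Split on the first run of whitespace: checksum, then path.
        expected, rel_path = line.split(None, 1)
        rel_path = rel_path.strip()
        target = bag_root / rel_path
        if not target.is_file() or sha1_of(target) != expected.lower():
            failures.append(rel_path)
    return failures
```

An empty list means every listed file is present and matches its checksum. Files that are only referenced in fetch.txt and have not yet been downloaded would show up as failures in this sketch.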
Documentation for Each Dataset
<star kic="12345678">
<numax value="3100" error="20" unit="uHz" />
<mass value="1.0" error="0.01" unit="solar" />
<radius value="1.0" error="0.01" unit="solar" />
<datafiles>
<datafile uid="1" path="datafiles/original/kplr12345678_llc.fits" />
<datafile uid="2" path="datafiles/kasoc.ts/kplr12345678_kasoc.ts.fits">
<dependency datafile="1" />
</datafile>
…
</datafiles>
<model path="stellar_models/kic12345678/" />
</star>
● The bundle.xml file
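
To illustrate the "machine readable" side of bundle.xml, here is a minimal sketch that reads the stellar parameters and the data file dependencies with Python's standard library. The sample string repeats the slide's example with plain ASCII quotes and without the elided entries; the function name and the returned structure are assumptions made for illustration only.

```python
import xml.etree.ElementTree as ET

# Sample based on the slide's example (elided entries left out).
SAMPLE_BUNDLE = """<star kic="12345678">
  <numax value="3100" error="20" unit="uHz" />
  <mass value="1.0" error="0.01" unit="solar" />
  <radius value="1.0" error="0.01" unit="solar" />
  <datafiles>
    <datafile uid="1" path="datafiles/original/kplr12345678_llc.fits" />
    <datafile uid="2" path="datafiles/kasoc.ts/kplr12345678_kasoc.ts.fits">
      <dependency datafile="1" />
    </datafile>
  </datafiles>
  <model path="stellar_models/kic12345678/" />
</star>"""


def read_bundle(xml_text):
    """Return (kic, parameters, datafiles) parsed from a bundle.xml string."""
    star = ET.fromstring(xml_text)

    # Any direct child carrying a "value" attribute is treated as a parameter.
    parameters = {}
    for child in star:
        if "value" in child.attrib:
            parameters[child.tag] = (
                float(child.get("value")),
                float(child.get("error")),
                child.get("unit"),
            )

    # Map each data file uid to its path and the uids it depends on.
    datafiles = {}
    for entry in star.findall("./datafiles/datafile"):
        datafiles[entry.get("uid")] = {
            "path": entry.get("path"),
            "depends_on": [d.get("datafile") for d in entry.findall("dependency")],
        }
    return star.get("kic"), parameters, datafiles


kic, parameters, datafiles = read_bundle(SAMPLE_BUNDLE)
```

The dependency entries make provenance explicit: in the sample, the KASOC time series (uid 2) is recorded as derived from the original light curve file (uid 1).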
Proof-of-concept - Repository Setup
• Using Dataverse repository software
• Support for astrophysics metadata
• Discoverability and citability (DataCite DOIs)
• APIs for automatic ingest workflow
• Versioning – allowing redeposit of extended
versions of datasets
• Issues:
• Missing numeric fields for celestial coordinates (for
discovery)
• Limited options for mapping to external storage (we
use erda.dk)
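
As a rough sketch of what one step of an automatic ingest workflow could look like, the following builds (but does not send) a dataset-creation request for the Dataverse native API. The server URL, collection alias, token, and metadata content are placeholders supplied by the caller; nothing here reflects the project's actual ingest scripts.

```python
import json
import urllib.request


def build_create_dataset_request(server_url, collection_alias, api_token, dataset_json):
    """Prepare a POST request that would create a dataset in a Dataverse collection.

    All four arguments are caller-supplied placeholders; dataset_json must
    follow the dataset JSON structure documented for the Dataverse version in use.
    """
    url = f"{server_url.rstrip('/')}/api/dataverses/{collection_alias}/datasets"
    body = json.dumps(dataset_json).encode("utf-8")
    return urllib.request.Request(
        url,
        data=body,
        method="POST",
        headers={
            "X-Dataverse-key": api_token,  # API token header used by Dataverse
            "Content-Type": "application/json",
        },
    )
```

Sending the request would be a separate step, for example with urllib.request.urlopen, once the metadata has been validated against the target installation.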
Institutional Collaboration
Conclusions – as of February 2017
• Data packages designed in a way that can
outlive repository software
• Caveat: may imply limitations in the use of
repository features
• Preservation actions will potentially be
possible, even if we don’t plan them
• We are still working on establishing funding and a
sustainable business model
• We need to establish a production
environment for the repository
Reuse for Research
Contact: Michael Svendsen, @tullemich, Royal Danish Library
