Rescue of Long-Tail Data from the Ocean Bottom to the Moon
Leslie Hsu, Kerstin Lehnert, Suzanne Carbotte, Vicki Ferrini, John Delano1, James B. Gill2, Maurice Tivey3
Lamont-Doherty Earth Observatory, Columbia University,
1University of Albany, 2University of California, Santa Cruz, 3Woods Hole Oceanographic Institution
IN12A. Data Curation, Credibility, Preservation Implementation, and Data Rescue to Enable Multi-source Science
Fall AGU 2013
Data at Risk
¤ "Data at Risk" is scientific data that are not in formats that permit full electronic access to the information they contain.
¤ Data at Risk may be
¤ non-digital (e.g., handwritten or photographic),
¤ on near-obsolete digital media (such as floppy disks),
¤ or insufficiently described (lacking metadata).
¤ Some born-digital data are considered "at risk" if they cannot be ingested into managed databases because they lack adequate formatting or metadata.
Definition from the ICSU CODATA Data at Risk Task Group (DARTG)
¤ A “Data Rescue Mission” is any effort to preserve data at risk. Rescue missions can take the form of digitization, format migration, treatment of damaged materials (e.g., damage from water or mold), addition of metadata, or any other action taken to make data accessible in the long term.
Definition from ICSU CODATA Data at Risk Task Group (DARTG)
Long Tail Data are often Data at Risk
Long Tail Characteristics
On C drives
Hard to find
Collected by many
¤ Citizen science
IEDA Data Rescue Mini-Awards
¤ Established to preserve valuable legacy data sets that are endangered by impending retirement or degradation
¤ Evaluated for highest impact on future research, based on quality, size, rarity, unique location, or data type
¤ Made accessible to the community for re-use by inclusion in the IEDA data collections (EarthChem, MGDS, SESAR)
¤ $7000 award to support proper compilation, documentation, and transfer
¤ 3 awardees chosen from 11 entries spanning a wide range of geochemical and geophysical data
1: Geologic samples and geochemistry
¤ WHAT: Compilation of sample metadata and geochemical analyses from three areas – Fiji, Izu Arc, and Endeavour segment. (James B. Gill)
Maps made with GeoMapApp
¤ WHY: study of intra-ocean arcs and spreading centers
¤ HOW: Check and add incomplete data, digitize data, add persistent identifiers. Link between related
¤ Major challenge: Physical sample
The importance of Sample identification
¤ Individual samples can play a large role in scientific conclusions, so accurate documentation of sample metadata is critical.
¤ The key measurement was the one backarc basalt called "PPTUW"... Subsequent efforts to confirm the observation ran into problems. The apparently-same sample was variously called PPTU, PPTUW/5, PPTUW-1, and TVZ19 in four other papers. None of those papers gave its latitude and longitude… (J. Gill and E. Todd)
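The naming problem in the quote above can be illustrated with a minimal sketch. It assumes, as the quote suggests but does not confirm, that all five names refer to one physical sample, and it maps each published name to a single persistent identifier. The identifier value and the function names are hypothetical placeholders, not a real registered sample ID or an IEDA tool.

```python
# Minimal sketch: resolve published sample names to one persistent identifier.
# SAMPLE_ID is a hypothetical placeholder, not a real registered identifier.
SAMPLE_ID = "EXAMPLE-0001"

# Names taken from the quote; the assumption that they all denote the same
# physical sample is unverified.
ALIASES = {
    "PPTUW": SAMPLE_ID,
    "PPTU": SAMPLE_ID,
    "PPTUW/5": SAMPLE_ID,
    "PPTUW-1": SAMPLE_ID,
    "TVZ19": SAMPLE_ID,
}


def normalize(name):
    """Trim whitespace and upper-case a sample name before lookup."""
    return name.strip().upper()


def resolve(name):
    """Return the persistent identifier for a published name, or None."""
    return ALIASES.get(normalize(name))


if __name__ == "__main__":
    for published_name in ["PPTUW", " pptuw/5 ", "TVZ19", "UNKNOWN-7"]:
        print(repr(published_name), "->", resolve(published_name))
```

A lookup table like this only records decisions a person has already made; deciding that two names denote the same sample still requires the investigator's notes and sample metadata such as latitude and longitude.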
2: Near-bottom magnetics
¤ WHAT: Compilation of near-bottom magnetometer data, including raw, merged, processed, and navigation metadata (Maurice Tivey)
¤ WHY: study of magnetic reversals, effect of tectonics on magnetic field
¤ HOW: gather data from different formats, add complete metadata
¤ Challenge: over three decades of technology and file formats
Lessons learned: investigator
¤ Take ownership of your own legacy
¤ Data curation by others may not be complete or correct
¤ Data rescue of an entire career does not need to be
¤ Start with small steps
¤ Disciplinary repositories will help and guide you to what is needed
¤ Despite the time investment, data rescue is worth it
¤ Others will now be able to re-use the data
¤ Notes taken years ago actually explain anomalies
Lessons learned: repository
¤ For Long Tail Data, every project is different
¤ There is not an established workflow – just past experience
¤ Time commitment from staff is nontrivial
¤ Disciplinary training helps a great deal
¤ Investigators need help determining the best products
¤ A small incentive will motivate investigators
¤ Data Rescue missions help the repository determine next steps for development of tools and services
Summary of Long-tail Data Rescue
¤ Three Data Rescue efforts this past year by IEDA have made data that were at risk:
¤ digitized from analog data and near-obsolete media
¤ sufficiently described for reuse
¤ in formats that permit full electronic access
¤ Citable, with persistent identifiers, and ready for reuse
¤ The projects also helped IEDA identify improvements in data rescue workflow, and future tools and services
More Data Rescue Activities
¤ Elsevier-IEDA Data Rescue Process Study
¤ A data entry tool for lunar geochemistry: MoonDB
¤ Elsevier-IEDA International Data Rescue Award
¤ Winner announced at reception tonight, Monday Dec 9th, 2013
¤ Intercontinental Hotel, Twin Peaks Room, 7:00-8:30pm