
Rescue of Long-Tail Data from the Ocean Bottom to the Moon



AGU 2013 presentation on IEDA Data Rescue Mini Awards.

Published in: Technology


  1. Rescue of Long-Tail Data from the Ocean Bottom to the Moon
     Leslie Hsu, Kerstin Lehnert, Suzanne Carbotte, Vicki Ferrini, John Delano (1), James B. Gill (2), Maurice Tivey (3)
     Lamont-Doherty Earth Observatory, Columbia University; (1) University of Albany; (2) University of California, Santa Cruz; (3) Woods Hole Oceanographic Institution
     IN12A. Data Curation, Credibility, Preservation Implementation, and Data Rescue to Enable Multi-source Science
     Fall AGU 2013
     IEDA
  2. Data at Risk
     ¤ "Data at Risk" is scientific data that are not in formats that permit full electronic access to the information they contain.
     ¤ Data at Risk may be
        ¤ non-digital (e.g., handwritten or photographic),
        ¤ on near-obsolete digital media (such as floppy disks),
        ¤ or insufficiently described (lacking metadata).
     ¤ Some born-digital data are considered "at risk" if they cannot be ingested into managed databases because they lack adequate formatting or metadata.
     Definition from the ICSU CODATA Data at Risk Task Group (DARTG)
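The three criteria in the definition above can be sketched as a simple screening check. This is only an illustration, not a DARTG or IEDA tool: the `DatasetRecord` fields, the `NEAR_OBSOLETE_MEDIA` set, and the `REQUIRED_METADATA` list are all hypothetical choices that a repository would replace with its own.

```python
from dataclasses import dataclass, field

# Hypothetical list of media treated as near-obsolete (assumption, not from the slides
# apart from floppy disks).
NEAR_OBSOLETE_MEDIA = {"floppy disk", "magnetic tape", "zip disk"}

# Hypothetical minimum metadata needed to call a data set "sufficiently described".
REQUIRED_METADATA = ("creator", "date", "location", "method")


@dataclass
class DatasetRecord:
    name: str
    is_digital: bool
    medium: str = ""
    metadata: dict = field(default_factory=dict)


def risk_reasons(record):
    """Return the reasons a record counts as 'at risk'; an empty list means none found."""
    reasons = []
    if not record.is_digital:
        reasons.append("non-digital")
    elif record.medium.lower() in NEAR_OBSOLETE_MEDIA:
        reasons.append("near-obsolete media")
    # Metadata is checked for every record, digital or not.
    missing = [key for key in REQUIRED_METADATA if not record.metadata.get(key)]
    if missing:
        reasons.append("insufficiently described: missing " + ", ".join(missing))
    return reasons
```

A record on a floppy disk with only a creator recorded would be flagged twice, once for the medium and once for the missing fields, which mirrors how a single data set can be at risk for more than one reason.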
  3. Data Rescue
     ¤ A "Data Rescue Mission" is any effort to preserve data at risk. Rescue missions can come in the form of digitization, format migration, treating damaged materials (e.g., water or mold), adding metadata, or any action taken to make data accessible in the long term.
     Definition from the ICSU CODATA Data at Risk Task Group (DARTG)
     Credit: M. Tivey
  4. Long Tail Data are often Data at Risk
     The Head: Astronomy, Climate, High Energy Physics, Genomics
     Long Tail: Environmental and Earth sciences
     Long Tail Characteristics:
        ¤ More specialised
        ¤ Low volume
        ¤ On C drives
        ¤ Hard to find
        ¤ Heterogeneous
        ¤ Collected by many people
        ¤ Citizen science
        ¤ Etc.
     Credit: L. Wyborn
  5. IEDA Data Rescue Mini-Awards
     ¤ Established to preserve valuable legacy data sets that are endangered by impending retirement or degradation
     ¤ Evaluated for highest impact on future research, based on quality, size, rarity, unique location, or data type
     ¤ Made accessible to the community for re-use by inclusion in the IEDA data collections (EarthChem, MGDS, SESAR)
     ¤ $7,000 award to support proper compilation, documentation, and transfer
     ¤ 3 awardees chosen from 11 entries covering a wide range of geochemical and geophysical data
  6. 1: Geologic samples and geochemistry
     ¤ WHAT: Compilation of sample metadata and geochemical analyses from three areas: Fiji, Izu Arc, and Endeavour segment (James B. Gill)
     ¤ WHY: Study of intra-ocean arcs and spreading centers
     ¤ HOW: Check and complete incomplete data, digitize data, add persistent identifiers, and link related resources
     ¤ Major challenge: Physical sample management
     Maps made with GeoMapApp
  7. The importance of sample identification
     ¤ Individual samples can play a large role in scientific conclusions, so accurate documentation of sample metadata is critical.
     ¤ The key measurement was the one backarc basalt called "PPTUW"... Subsequent efforts to confirm the observation ran into problems. The apparently-same sample was variously called PPTU, PPTUW/5, PPTUW-1, and TVZ19 in four other papers. None of those papers gave its latitude and longitude… (J. Gill and E. Todd)
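The naming problem described above is the kind of ambiguity a persistent identifier is meant to remove. A minimal sketch of the idea follows: every published name resolves to one identifier. The `CANONICAL_ID` value is a made-up placeholder, not a registered IGSN, and treating the five names as one sample is an assumption a curator would have to confirm (the slide itself only says "apparently-same").

```python
import re

# Names under which the apparently-same sample appeared, as listed on the slide.
ALIASES = ["PPTUW", "PPTU", "PPTUW/5", "PPTUW-1", "TVZ19"]

# Hypothetical placeholder identifier; NOT a real registered IGSN.
CANONICAL_ID = "EXAMPLE-SAMPLE-0001"


def normalize(name):
    """Upper-case and strip whitespace so trivial spelling variants compare equal."""
    return re.sub(r"\s+", "", name).upper()


# Lookup table from every known alias to the single canonical identifier.
ALIAS_TO_ID = {normalize(alias): CANONICAL_ID for alias in ALIASES}


def resolve(name):
    """Return the canonical identifier for a published sample name, or None if unknown."""
    return ALIAS_TO_ID.get(normalize(name))
```

Unknown names return `None` rather than a best guess, so a new spelling variant is surfaced for review instead of being silently merged.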
  8. 2: Near-bottom magnetics
     ¤ WHAT: Compilation of near-bottom magnetometer data, including raw, merged, processed, and navigation metadata (Maurice Tivey)
     ¤ WHY: Study of magnetic reversals, effect of tectonics on magnetic field
     ¤ HOW: Gather data from different formats, add complete metadata and workflow
     ¤ Challenge: Over three decades of technology and file formats
  9. Evolution of equipment: 1985, 1992, 2004, 2011
  10. Evolution of storage media
      Credit: M. Tivey
  11. Addition of "sufficient" metadata
  12. 3: Lunar sample geochemistry
      ¤ WHAT: Compilation of lunar sample geochemistry (John W. Delano et al.)
      ¤ WHY: Composition of the Moon
      ¤ HOW: Digitize photos, label specific grains, compile geochemistry in data templates
      ¤ Challenge: Nothing was digital!
      Credit: LPI
  13. Use of IEDA EarthChem templates
  14. Common needs addressed
      ¤ Accessibility: web access, links between systems
      ¤ Documentation: README files, additional descriptions
      ¤ Standardization: IEDA EarthChem geochemical templates
      ¤ Persistent links: DOIs and IGSNs
      ¤ Citability: DOIs, example citations
      ¤ Guidance/Training: calls and emails with disciplinary repository staff
  15. IEDA
  16. Lessons learned: investigator
      ¤ Take ownership of your own legacy
         ¤ Data curation by others may not be complete or correct
      ¤ Data rescue of an entire career does not need to be overwhelming
         ¤ Start with small steps
         ¤ Disciplinary repositories will help and guide you to what is needed
      ¤ Despite the time investment, data rescue is worth it
         ¤ Others will now be able to re-use the data
         ¤ Notes taken years ago actually explain anomalies
  17. Lessons learned: repository
      ¤ For Long Tail Data, every project is different
         ¤ There is no established workflow, only past experience
         ¤ Time commitment from staff is nontrivial
      ¤ Disciplinary training helps a great deal
         ¤ Investigators need help determining the best products
      ¤ A small incentive will motivate investigators
      ¤ Data Rescue missions help the repository determine next steps for development of tools and services
  18. Summary of Long-tail Data Rescue
      ¤ Three Data Rescue efforts by IEDA this past year have taken data that were at risk and made them
         ¤ digitized from analog data and near-obsolete media
         ¤ sufficiently described for reuse
         ¤ available in formats that permit full electronic access
         ¤ citable, with persistent identifiers, and ready for reuse
      ¤ The projects also helped IEDA identify improvements in its data rescue workflow and in future tools and services
  19. More Data Rescue Activities
      ¤ Elsevier-IEDA Data Rescue Process Study
         ¤ A data entry tool for lunar geochemistry: MoonDB
      ¤ Elsevier-IEDA International Data Rescue Award
         ¤ Winner announced at reception tonight, Monday Dec 9th, 2013
         ¤ Intercontinental Hotel, Twin Peaks Room, 7:00-8:30pm