Metadata for Data Rescue and Data at Risk


A presentation I gave at PV 2011 in Toulouse, France on behalf of CODATA's Data-at-Risk Task Group.

Published in: Education, Technology

  1. Metadata for Data Rescue and Data at Risk
     William L. Anderson, John L. Faundeen, Jane Greenberg, Fraser Taylor
     PV2011, Toulouse, 17 November 2011
     Presented by Nico Carver, in collaboration with the DARi SILS Student Learning Circle
  2. Outline
     • Major Questions
     • Metadata Scheme Design
     • Case Study
     • Next Steps
     • Acknowledgements
     • Questions/Comments
  3. Major Questions informing Research
     • Where is at-risk data?
     • How are scientists using historic data?
     • How do we define at-risk?
     • How do others define at-risk?
     • What must be done to rescue data-at-risk?
     Image: “8 inch floppy”. Retrieved from:
  4. Major Question informing Scheme Design
     What is essential metadata for describing data-at-risk and aiding in data rescue?
  5. Metadata requirements
     • Be applicable across a range of disciplines and scientific research areas.
     • Sufficiently support the data rescue mission.
  6. Functions of the Inventory
     Function: Describe data of scientific value that is at risk of being lost, unused, or destroyed.
     Initial metadata properties:
       1. Science area
       2. Nature of data
       3. Date or date-span
       4. Location of original
       5. Present location
     Function: Act as a starting point for the data rescue mission.
     Initial metadata properties:
       6. Expected future
       7. Risk level
  7. Metadata Frameworks Useful for Data-at-Risk
     DARTG Chair Elizabeth Griffin’s initial proposed DARTG metadata properties
     Metadata Property:
       1. Science area
       2. Nature of data
       3. Date or date-span
       4. Location of original
       5. Present location
       6. Expected future
       7. Risk level
  8. Metadata Frameworks Useful for Data-at-Risk
     U.S. Geological Survey: “Create a Rescue Request”, URL:
  9. Metadata Frameworks Useful for Data-at-Risk
     “Growing the Vocabulary”
  10. Metadata Frameworks Useful for Data-at-Risk
     “The PREMIS Data Dictionary”
  11. Data-at-Risk Inventory (DARI) Metadata Scheme: guiding principles
     • Simple
     • Broadly applicable
     • Extensible
  12. DARI Metadata Scheme (current)
     DARTG DARI Metadata, Version 1.0
     Metadata Element Name: Element Description
     • Research Area(s): The domains represented by DARTG experts and the more general category of “Other”.
     • Title: The name associated with the collection.
     • Physical form of the data: Paper, photograph, specimen, record book, magnetic tape, etc.
     • Content and context of the data: History, topic, etc., if known.
     • Name of current holder: Institution, organization or individual.
     • Dates associated with data: Time period when data were collected.
     • Size: Extent, volume, size.
     • Data condition: Stable, deteriorating, etc.
     • Risk level: Poor storage conditions, limited storage time, etc.
     • Known access and restrictions: Public domain, private collection, etc.
     • Notes: Any additional information.
     • Contact information: Address or other contact information for the institution, organization or individual.
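
     The twelve Version 1.0 elements on this slide could be captured as a simple record type. The following is a minimal sketch, not the inventory's actual implementation: the class name `DariRecord`, the Python field names, the choice of which elements have defaults, and the example values are all assumptions made for illustration.

```python
from dataclasses import dataclass, asdict, fields
from typing import List


@dataclass
class DariRecord:
    """One inventory entry; each field maps to one DARI v1.0 element.

    Field names and the required/optional split are illustrative
    assumptions, not part of the published scheme.
    """
    research_areas: List[str]          # Research Area(s)
    title: str                         # Title
    physical_form: str                 # Physical form of the data
    current_holder: str                # Name of current holder
    contact_information: str           # Contact information
    content_and_context: str = ""      # Content and context of the data
    dates: str = ""                    # Dates associated with data
    size: str = ""                     # Size
    data_condition: str = ""           # Data condition
    risk_level: str = ""               # Risk level
    access_and_restrictions: str = ""  # Known access and restrictions
    notes: str = ""                    # Notes

    def empty_elements(self) -> List[str]:
        """Return the names of elements that were left blank."""
        return [f.name for f in fields(self) if not getattr(self, f.name)]


# Hypothetical example values, not taken from the case study.
example = DariRecord(
    research_areas=["Other"],
    title="Field station record books",
    physical_form="Record book",
    current_holder="Example University Archive",
    contact_information="archive@example.org",
    data_condition="Deteriorating",
    risk_level="Poor storage conditions",
)

print(len(asdict(example)))      # number of elements in the record
print(example.empty_elements())  # elements still to be filled in
```

     Keeping every element as free text mirrors the "simple" and "broadly applicable" principles from the previous slide; a controlled vocabulary for fields such as risk level would be a separate design decision.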
  13. Case Study: introduction
  14. Case Study: implementation
  15. Case Study: results
     • 7 dataset descriptions in total; 5 of the 7 were completed unassisted using the metadata template
     • On average, 13.5 of 16 metadata elements were considered useful (about 85%)
     • 4 out of 5 scientists said they would use the inventory again
  16. Case Study: conclusions
     • The purpose of the inventory had to be more clearly stated on the website
     • Instructions for filling out the web form had to be simple, but clear
     • 3 metadata properties were determined unnecessary, 4 properties were altered for clarity
     • The remaining metadata properties were successful in their ability to cut across scientific disciplines while fully describing data-at-risk
  17. Next Steps
     • Complete focus groups and surveys at UNC-Chapel Hill and elsewhere to determine possible use cases
     • Disseminate information and generate interest for the inventory and the Data-at-Risk project
     • Finalize the inventory design and start populating it
  18. Submit a description:
  19. Questions/Comments?
     Acknowledgements:
     • The University of North Carolina Center for Global Initiatives’ support of the Data At Risk Inventory SILS Student Learning Circle
     • The Council for Scientific and Technical Data
     • And the following people for their leadership, guidance, and assistance: Bill Anderson, School of Information, University of Texas at Austin; Jane Greenberg, School of Information and Library Science; Elizabeth Griffin, Herzberg Institute of Astrophysics; Dav Robertson, National Institute of Environmental Health Sciences, NIH; and Paul Jones & John Reuning, ibiblio, University of North Carolina at Chapel Hill.