
Introduction to Metadata


The presentation gives an overview of what metadata is and why it is important. It covers the benefits metadata can bring, offers advice and tips on producing good quality metadata, and closes with how EUDAT uses metadata in the B2FIND service.

Published on: November 2016

Published in: Science

  1. Introduction to metadata. Version 2, August 2016. EUDAT receives funding from the European Union's Horizon 2020 programme - DG CONNECT e-Infrastructures. Contract No. 654065. This work is licensed under the Creative Commons CC-BY 4.0 licence.
  2. Overview: What is metadata and why do we need it? How to produce good quality metadata? EUDAT and metadata.
  3. WHAT IS METADATA? Image CC-BY ‘Metadata is a love note to the future’ by Cea+ centralasian/8071729256
  4. What is metadata? Commonly defined as ‘data about data’, metadata helps to make data findable and understandable. Metadata can be: Descriptive (information about the content and context of the data); Structural (information about the structure of the data); Administrative (information about the file type, rights management and preservation processes).
  5. Why use metadata? Comprehensive metadata will: facilitate data discovery; help users determine the applicability of the data; enable interpretation and reuse; allow any limitations to be understood; clarify ownership and restrictions on reuse; offer permanence as it transcends people and time; provide interoperability.
  6. Metadata and documentation: Think about what will be needed in order to find, evaluate, understand, and reuse the data. Have you documented what you did and how? Did you develop code to run analyses? If so, this should be kept and shared too. Is it clear what each bit of your dataset means? Make sure the units are labelled and abbreviations explained. Record all the information needed for you and others to understand the data in the future.
  7. Information entropy: The Loss of Information about Data (Metadata) Over Time, Michener et al., 1997.
  8. Time matters! Create metadata at the time of data creation. Information will be forgotten and there won’t be time or effort left to capture it later. Metadata benefits from quality control at an early stage too. Image CC-BY-SA ‘egg timer – hour glass running out’ by OpenDemocracy
  9. GOOD QUALITY METADATA. Image CC-BY ‘Quality’ by Elizabeth Hahn
  10. What makes metadata good? Use of standards; controlled vocabularies for unambiguous keywords; simple, complete and consistent information; appropriate description; explanation of limitations to support reuse; avoiding special characters, e.g. !@<~; providing persistent identifiers such as DOIs.
  11. The good and the bad. More precise and standardised (first) versus ambiguous (second): ‘Metres / seconds’ vs ‘Furlongs and fortnight’; ‘2015-09-10T15:00:01+01:00’ vs ‘10th Sept. 2015 15:00:01’; ‘Longitudinal wind speed’ vs ‘U’; ‘PDF 1.7’ vs ‘PDF’; ‘2008 US Population statistics’ vs ‘Population statistics’; ‘Barcelona, Venezuela’ vs ‘Barcelona’.
  12. Metadata standards: Metadata standards provide a structured way to describe the data. Information is presented in a reliable and predictable format which allows for computer interpretation. Use of standards enables data interoperability.
  13. Metadata Standards Directory: Catalogue initiated by the Digital Curation Centre (DCC), now maintained as a community initiative via the Research Data Alliance.
  14. How to choose a metadata standard? There are a number of factors to consider. Data type: look for standards to suit your data. Community norms: what is accepted and common practice in your field? Organisational policies: is one recommended? Instruments being used: is any metadata generated automatically? Available resources: some standards have tools to create metadata, and more instructional materials and support.
  15. How to write quality metadata: Organise your information and reuse where possible, e.g. project abstracts, lab notebooks, citations. Write your metadata using a metadata tool. Review for accuracy and completeness. Have someone else read your record. Revise based on comments from your reviewer. Review once more before you publish. Cycle: Draft, Review, Revise, Review.
  16. Tips to follow when creating metadata: Do not use jargon. Define technical terms and acronyms (CA, LA, GPS, GIS: what do these mean?). Clearly state data limitations, e.g. data set omissions and completeness of data, and express considerations for appropriate re-use. Use “none” or “unknown” meaningfully: “none” usually means that you knew about the data and nothing existed (e.g., a “0” cubic feet per second discharge value); “unknown” means that you don’t know whether that data existed or not (e.g., a null value).
  17. Dataset titles: Titles are critical in helping readers find your data. People searching for the most appropriate data sets will most likely use the title as the first criterion to determine whether a dataset meets their needs. Treat the title as the opportunity to sell your dataset. A complete title includes: What, Where, When, Who, and Scale. An informative title includes: topic, timeliness of the data, and specific information about place and geography.
  18. Which is the better title? ‘Rivers’ OR ‘Greater Yellowstone Rivers from 1:126,700 U.S. Forest Service Visitor Maps (1961-1983)’. The second title breaks down as: Greater Yellowstone (where), Rivers (what), from 1:126,700 (scale), U.S. Forest Service (who), Visitor Maps (1961-1983) (when).
  19. Write for machines, not just humans: Remember that a computer will read your metadata. Do not use symbols that could be misinterpreted, for example: ! @ # % { } | / < > ~. Don’t use tabs, indents, or line feeds/carriage returns. When copying and pasting from other sources, use a text editor (e.g., Notepad) to eliminate hidden characters.
  20. Remember to review your metadata! Could someone use an automatic search to locate the data? Can others assess the usefulness of the data? Could a novice understand it? Is the metadata specific enough? Is there enough information to re-use the data? Is the information unambiguous: are all codes, abbreviations and variables explained?
  21. EUDAT AND METADATA. Image CC-BY ‘University of Michigan Library Card Catalog’ by David Fulmer
  22. The B2FIND service: B2FIND is based on a comprehensive joint metadata catalogue of research data collections stored in EUDAT data centres and other repositories. It allows researchers or data users to find relevant data, and helps communities and data providers increase the visibility of their data. B2FIND provides a simple and user-friendly discovery service on metadata steadily harvested from a wide range of research communities.
  23. The interdisciplinary problem: The same term can be used by different disciplines, e.g. ‘species’ for chemists and zoologists, or ‘Andromeda’ for astronomers and historians. Some domain knowledge is therefore necessary. The EUDAT B2FIND service needs to suit a wide range of different communities.
  24. How the B2FIND service works: Metadata is harvested from different communities, usually using the OAI-PMH protocol. The metadata (in a wide variety of standards) is processed to map and transform it to the B2FIND schema. INPUT: metadata in community standards, e.g. DDI, Dublin Core, CMDI, ISO 19115. OUTPUT: homogenised metadata in the B2FIND schema.
  25. Metadata records in B2FIND
  26. For more info: User documentation: integration
  27. Thank you. Authors and contributors: Sarah Jones, Digital Curation Centre; Shaun de Witt, STFC; Sara Garavelli, Trust-IT. This work is licensed under the Creative Commons CC-BY 4.0 licence. EUDAT receives funding from the European Union's Horizon 2020 programme - DG CONNECT e-Infrastructures. Contract No. 654065. Content has also been repurposed from the DataONE Educational modules, ‘Metadata’ and ‘How to Write Good Quality Metadata’. Retrieved from
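
Slide 11 contrasts the ambiguous timestamp ‘10th Sept. 2015 15:00:01’ with the standardised ‘2015-09-10T15:00:01+01:00’. As a minimal sketch of how such a value might be produced, the Python below builds a timezone-aware timestamp and serialises it in ISO 8601 form. The function name `to_iso8601` and its parameters are illustrative assumptions, not part of the presentation or of any EUDAT tool.

```python
from datetime import datetime, timedelta, timezone


def to_iso8601(year, month, day, hour, minute, second, utc_offset_hours):
    """Return an ISO 8601 timestamp that states its UTC offset explicitly.

    Hypothetical helper for illustration only.
    """
    tz = timezone(timedelta(hours=utc_offset_hours))
    return datetime(year, month, day, hour, minute, second, tzinfo=tz).isoformat()


# The slide's example: 10 September 2015, 15:00:01, one hour ahead of UTC.
stamp = to_iso8601(2015, 9, 10, 15, 0, 1, utc_offset_hours=1)
print(stamp)

# A standardised value can be parsed back by a machine without guessing.
parsed = datetime.fromisoformat(stamp)
print(parsed.utcoffset())
```

Stating the offset explicitly is what removes the ambiguity: a reader in another time zone, or a harvesting program, no longer has to guess which local time was meant.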
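
Slide 19 advises against tabs, line feeds, hidden characters and symbols that could be misinterpreted. One possible way to check a metadata field before publishing is sketched below. The helper names `clean_field` and `risky_symbols` are assumptions made for this example, and the symbol list is simply the one shown on the slide, not an official rule set.

```python
import re
import unicodedata

# Symbols listed on slide 19 as open to misinterpretation.
RISKY_SYMBOLS = set("!@#%{}|/<>~")


def clean_field(text):
    """Remove hidden characters and collapse tabs/line breaks into single spaces."""
    # Unicode categories starting with "C" cover control and format characters
    # (e.g. zero-width spaces); whitespace is kept here and collapsed below.
    kept = "".join(
        ch for ch in text
        if ch.isspace() or not unicodedata.category(ch).startswith("C")
    )
    return re.sub(r"\s+", " ", kept).strip()


def risky_symbols(text):
    """Return the sorted list of risky symbols found in the text."""
    return sorted(set(text) & RISKY_SYMBOLS)


raw = "Wind\tspeed\r\n\u200b(m/s)"
print(clean_field(raw))
print(risky_symbols("Wind speed (m/s) <draft>"))
```

The sketch only reports risky symbols rather than deleting them, since a character such as `/` may be meaningful (as in a unit) and the decision to reword is better left to the metadata author.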
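
Slide 24 describes harvesting metadata over OAI-PMH and mapping it from community standards to a common schema. The sketch below illustrates that idea for a single Dublin Core record, under stated assumptions: the repository URL is a placeholder, the sample record is invented, and the target field names in `FIELD_MAP` are hypothetical and do not reproduce the actual B2FIND schema. Only the OAI-PMH request parameters (`verb=ListRecords`, `metadataPrefix=oai_dc`) and the two XML namespaces come from the respective standards. No network request is made.

```python
import xml.etree.ElementTree as ET
from urllib.parse import urlencode

# XML namespaces used by Dublin Core records delivered over OAI-PMH.
NS = {
    "oai_dc": "http://www.openarchives.org/OAI/2.0/oai_dc/",
    "dc": "http://purl.org/dc/elements/1.1/",
}

# Hypothetical target field names, NOT the real B2FIND schema.
FIELD_MAP = {
    "dc:title": "title",
    "dc:creator": "creators",
    "dc:date": "date",
}


def list_records_url(base_url, metadata_prefix="oai_dc"):
    """Build an OAI-PMH ListRecords request URL (the request is not sent here)."""
    query = urlencode({"verb": "ListRecords", "metadataPrefix": metadata_prefix})
    return f"{base_url}?{query}"


def map_record(xml_text):
    """Map one Dublin Core record onto the illustrative target fields."""
    root = ET.fromstring(xml_text)
    mapped = {}
    for source, target in FIELD_MAP.items():
        values = [el.text.strip() for el in root.findall(source, NS) if el.text]
        if values:
            mapped[target] = values
    return mapped


# Invented sample record, for illustration only.
SAMPLE = """<oai_dc:dc
    xmlns:oai_dc="http://www.openarchives.org/OAI/2.0/oai_dc/"
    xmlns:dc="http://purl.org/dc/elements/1.1/">
  <dc:title>Example dataset</dc:title>
  <dc:creator>Example Author</dc:creator>
  <dc:date>2016</dc:date>
</oai_dc:dc>"""

print(list_records_url("https://repository.example.org/oai"))
print(map_record(SAMPLE))
```

A real harvester would need one such mapping per community standard (DDI, CMDI, ISO 19115 and so on), which is where the domain knowledge mentioned on slide 23 comes in.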