The presentation gives an overview of what metadata is and why it is important. It also addresses the benefits that metadata can bring, offers advice on how to produce good-quality metadata, and closes with how EUDAT uses metadata in the B2FIND service.
November 2016
Introduction to Metadata
1. EUDAT receives funding from the European Union's Horizon 2020 programme - DG CONNECT e-Infrastructures. Contract No. 654065 www.eudat.eu
Introduction to metadata
Version 2
August 2016
This work is licensed under the Creative Commons CC-BY 4.0 licence
2. What is metadata and why do we need it?
How to produce good quality metadata?
EUDAT and metadata
Overview
3. WHAT IS METADATA?
Image CC-BY ‘Metadata is a love note to the future’ by Cea+ www.flickr.com/photos/centralasian/8071729256
4. Commonly defined as ‘data about data’, metadata helps to make data findable and understandable.
Metadata can be:
Descriptive: information about the content and context of the data
Structural: information about the structure of the data
Administrative: information about the file type, rights management and preservation processes
What is metadata?
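The three flavours usually live together in one record. As a sketch in Python (field names follow Dublin Core conventions; all values are hypothetical examples), a minimal record mixing descriptive, structural and administrative information might look like:

```python
import json

# Minimal metadata record sketch. Field names follow Dublin Core
# conventions; all values are hypothetical examples.
record = {
    "title": "Greater Yellowstone Rivers from 1:126,700 U.S. Forest Service Visitor Maps (1961-1983)",  # descriptive
    "creator": "U.S. Forest Service",  # descriptive: context
    "format": "application/pdf",       # administrative: file type
    "rights": "CC-BY 4.0",             # administrative: rights management
    "relation": "sheet 1 of 3",        # structural: how parts fit together
}

print(json.dumps(record, indent=2))
```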
5. Comprehensive metadata will:
Facilitate data discovery
Help users determine the applicability of the data
Enable interpretation and reuse
Allow any limitations to be understood
Clarify ownership and restrictions on reuse
Offer permanence as it transcends people and time
Provide interoperability
Why use metadata?
6. Metadata and documentation
Think about what will be needed in order to find, evaluate, understand, and reuse the data.
Have you documented what you did and how?
Did you develop code to run analyses? If so, this should be kept and shared too.
Is it clear what each bit of your dataset means? Make sure the units are labelled and abbreviations explained.
Record all the information needed for you and others to understand the data in the future.
8. Create metadata at the time of data creation
Information will be forgotten and there won’t be time or effort left to capture it later.
Metadata benefits from quality control at an early stage too.
Time matters!
Image CC-BY-SA ‘egg timer – hour glass running out’ by OpenDemocracy www.flickr.com/photos/opendemocracy/523438942
9. GOOD QUALITY METADATA
Image CC-BY ‘Quality’ by Elizabeth Hahn www.flickr.com/photos/128185330@N03/17517769750
10. Use of standards
Controlled vocabularies for unambiguous keywords
Simple, complete and consistent information
Appropriate description
Explanation of limitations to support reuse
Avoid special characters e.g. !@<~
Provide persistent identifiers such as DOIs
What makes metadata good?
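Persistent identifiers such as DOIs stay resolvable even when landing pages move, because they are dereferenced through the doi.org proxy. A minimal sketch (the DOI below is a hypothetical placeholder, not a registered identifier):

```python
# Build the resolvable URL for a DOI via the doi.org proxy.
# The DOI itself is a hypothetical placeholder.
doi = "10.1234/example-dataset.v1"
landing_url = f"https://doi.org/{doi}"
print(landing_url)
```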
11. The good and the bad
More precise and standardised | Ambiguous
Metres / seconds | Furlongs and fortnight
2015-09-10T15:00:01+01:00 | 10th Sept. 2015 15:00:01
Longitudinal wind speed | U
PDF 1.7 | PDF
2008 US Population statistics | Population statistics
Barcelona, Venezuela | Barcelona
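The timestamp in the precise column is ISO 8601, which is exactly what makes it machine-readable: standard tooling can parse it without guessing, while the free-text form cannot be parsed reliably. A quick check in Python:

```python
from datetime import datetime, timedelta

# The unambiguous timestamp from the slide, in ISO 8601 form.
iso_value = "2015-09-10T15:00:01+01:00"

# Standard-library parsing works directly on ISO 8601 strings;
# the free-text form "10th Sept. 2015 15:00:01" would not parse.
dt = datetime.fromisoformat(iso_value)
print(dt.year, dt.utcoffset())
```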
12. Metadata standards
Metadata standards provide a structured way to describe the data.
Information is presented in a reliable and predictable format which allows for computer interpretation.
Use of standards enables data interoperability.
13. Metadata Standards Directory
Catalogue initiated by the Digital Curation Centre (DCC), now maintained as a community initiative via the Research Data Alliance.
www.dcc.ac.uk/resources/metadata-standards
14. There are a number of factors to consider:
Data type – look for standards to suit your data
Community norms – what is accepted and common practice in your field?
Organisational policies – is one recommended?
Instruments being used – any automated metadata?
What resources are available? – there are tools to create metadata in certain standards, plus instructional materials and support
How to choose a metadata standard?
15. How to write quality metadata
Organise your information and reuse where possible e.g. project abstracts, lab notebooks, citations
Write your metadata using a metadata tool
Review for accuracy and completeness
Have someone else read your record
Revise based on comments from your reviewer
Review once more before you publish
(Cycle: Draft → Review → Revise → Review)
16. Tips to follow when creating metadata
Do not use jargon
Define technical terms and acronyms:
– CA, LA, GPS, GIS: what do these mean?
Clearly state data limitations
– E.g. data set omissions, completeness of data
– Express considerations for appropriate re-use
Use “none” or “unknown” meaningfully
– None usually means that you knew about the data and nothing existed (e.g., a “0” cubic feet per second discharge value)
– Unknown means that you don’t know whether that data existed or not (e.g., a null value)
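In a data file the same distinction can be kept by storing a real zero for “none” and a null for “unknown”, rather than collapsing both to one value. A small sketch (site names and values are hypothetical):

```python
# "none": we measured and nothing was there -> store a real 0.
# "unknown": we did not measure -> store a null (None in Python).
# Site names and values are hypothetical.
discharge_cfs = {
    "site_upstream": 0.0,     # zero flow, actually observed
    "site_downstream": None,  # no observation made
}

# Re-users can then separate observed values from gaps.
observed = {k: v for k, v in discharge_cfs.items() if v is not None}
print(observed)
```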
17. Dataset titles
Titles are critical in helping readers find your data
– While individuals are searching for the most appropriate data sets, they are most likely to use the title as the first criterion to determine whether a dataset meets their needs.
– Treat the title as an opportunity to sell your dataset.
A complete title includes: What, Where, When, Who, and Scale
An informative title includes: topic, timeliness of the data, specific information about place and geography
18. Which is the better title?
Rivers
OR
Greater Yellowstone Rivers from 1:126,700 U.S. Forest Service Visitor Maps (1961-1983)
Greater Yellowstone (where) Rivers (what) from 1:126,700 (scale) U.S. Forest Service (who) Visitor Maps (1961-1983) (when)
19. Write for machines, not just humans
Remember: a computer will read your metadata
Do not use symbols that could be misinterpreted:
Examples: ! @ # % { } | / < > ~
Don’t use tabs, indents, or line feeds/carriage returns
When copying and pasting from other sources, use a text editor (e.g., Notepad) to eliminate hidden characters
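A simple sanity check can catch these characters before a record is published. A minimal sketch in Python; the blocklist is illustrative (it follows the symbols listed on this slide, minus “/”, which legitimately appears in values such as “Metres / seconds”):

```python
import re

# Characters the slide warns about, plus tabs and line breaks.
# The exact set is a policy choice; this list is illustrative.
BAD_CHARS = re.compile(r'[!@#%{}|<>~\t\r\n]')

def is_machine_safe(text: str) -> bool:
    """Return True if the metadata value avoids risky characters."""
    return BAD_CHARS.search(text) is None

print(is_machine_safe("Longitudinal wind speed"))
print(is_machine_safe("wind speed @ 10m"))
```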
20. Could someone use an automatic search to locate the data?
Can others assess the usefulness of the data?
Could a novice understand it?
Is the metadata specific enough?
Is there enough information to re-use the data?
Is the information unambiguous – are all codes, abbreviations and variables explained?
Remember to review your metadata!
21. EUDAT AND METADATA
Image CC-BY ‘University of Michigan Library Card Catalog’ by David Fulmer
www.flickr.com/photos/annarbor/4350629792
22. B2FIND is based on a comprehensive joint metadata
catalogue of research data collections stored in EUDAT
data centres and other repositories
It allows researchers or data users to find relevant data,
and supports communities and data providers to increase
visibility of their data
B2FIND provides a simple and user-friendly discovery
service on metadata steadily harvested from a wide
range of research communities
The B2FIND service
b2find.eudat.eu
23. The same term can be used by different disciplines
Species for chemists and zoologists
Andromeda for astronomers and historians
Some domain knowledge is therefore necessary
The EUDAT B2FIND service needs to suit a wide range of
different communities
The interdisciplinary problem
24. Metadata is harvested from different communities,
usually using the OAI-PMH protocol
The metadata (in a wide variety of standards) are
processed to map and transform them to the B2FIND
schema
How the B2FIND service works
INPUT
Metadata in community
standards e.g. DDI,
Dublin Core, CMDI, ISO
19115
OUTPUT
Homogenised metadata
in the B2FIND schema
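The harvesting step described above can be sketched with a standard OAI-PMH request. A minimal illustration (the endpoint URL is hypothetical, and a real harvester would also page through resumptionTokens):

```python
from urllib.parse import urlencode
import xml.etree.ElementTree as ET

BASE = "https://repository.example.org/oai"  # hypothetical endpoint
request_url = BASE + "?" + urlencode(
    {"verb": "ListRecords", "metadataPrefix": "oai_dc"}
)

# A hand-written fragment of the kind of oai_dc response a repository returns.
SAMPLE = """<OAI-PMH xmlns="http://www.openarchives.org/OAI/2.0/">
  <ListRecords><record><metadata>
    <oai_dc:dc xmlns:oai_dc="http://www.openarchives.org/OAI/2.0/oai_dc/"
               xmlns:dc="http://purl.org/dc/elements/1.1/">
      <dc:title>Greater Yellowstone Rivers</dc:title>
    </oai_dc:dc>
  </metadata></record></ListRecords>
</OAI-PMH>"""

# Extract Dublin Core titles from the harvested XML.
DC = "{http://purl.org/dc/elements/1.1/}"
titles = [el.text for el in ET.fromstring(SAMPLE).iter(DC + "title")]
```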
25. Metadata records in B2FIND
http://b2find.eudat.eu/dataset/3a063891-6952-5bcf-a5ed-46f8a681c1c9
26. For more info: https://eudat.eu/services/b2find
User documentation: https://www.eudat.eu/services/userdoc/b2find-integration
b2find.eudat.eu
27. www.eudat.eu
Authors Contributors
This work is licensed under the Creative Commons CC-BY 4.0 licence
EUDAT receives funding from the European Union's Horizon 2020 programme - DG CONNECT e-Infrastructures.
Contract No. 654065
Sarah Jones, Digital Curation Centre
Shaun de Witt, STFC
Sara Garavelli, Trust-IT
Thank you
Content has also been repurposed from the DataONE Educational
modules, ‘Metadata’ and ‘How to Write Good Quality Metadata’ Retrieved
from https://www.dataone.org/education-modules
Editor's Notes
This presentation will give an introduction to the concept of metadata, why it is important and how to address this in research projects.
There are three main topics that we will discuss:
What is metadata and why is it important. Here we will think about the benefits that metadata can bring to you and others.
Secondly, we will think about how to produce good quality metadata and offer some advice and tips
To close, we’ll explain how EUDAT uses metadata in the B2FIND service
So let’s begin by thinking about what metadata is.
The quote in the image here that ‘metadata is a love note to the future’ really gets at the meaning.
Metadata is critical to ensure data can be found, understood and reused. If you don’t create metadata, it’s unlikely you will still understand your data in a few years time. The act of creating metadata opens up possibilities for future use.
Metadata is commonly defined as ‘data about data’. By creating metadata, you will ensure that others can find and understand your data.
There are different types of metadata:
Descriptive metadata includes things like the title, author, date, location, coverage and subjects. It’s all the basic information people would need to find the data and understand the main content and context.
Structural metadata explains how data interrelate. For example, if a book has been digitised, you want to know which set of images forms each chapter.
Administrative metadata may include information added by others e.g. preservation metadata added by a repository to note what processes have been performed on the data
There are lots of reasons to create metadata. It:
Facilitates data discovery so others can find your data and your research gains more recognition and impact
Good metadata helps potential users determine whether the data meet their needs, and enables them to interpret and reuse the data
Metadata should outline any limitations and clarify data ownership and restrictions on reuse. This ensures others use the data appropriately
Without metadata, your data will become meaningless over time as others can’t understand and reuse them. Providing associated metadata will give your data permanence and ensure they live on.
By creating high quality metadata and using standards, you can also make your data interoperable
When you create metadata, it is useful to think broadly. You may come across the concept of ‘documentation’ to explain all the details you need to capture and share too.
You should aim to provide all the information a third-party would need to understand and reuse your data. This may include a description of what you did, your workflows, any code created and data dictionaries or clarification of all terms and abbreviations.
This diagram from Bill Michener from DataOne in the United States shows how much information is lost over time.
Metadata is a way to formalise this knowledge so your data retain meaning.
Time really matters when creating metadata – you should create metadata at the time of data creation as information is forgotten quickly. This also gives you an opportunity to do quality control early on.
We have explained why metadata is so important. Let’s now think about how to create good quality metadata.
There are lots of things you can do to improve the quality of your metadata:
Primary among these is to use standards. There are lots out there so look for something relevant to your data type and discipline.
Metadata standards don’t always prescribe how the information should be completed. For this you want to use controlled vocabularies or thesauri for keywords, and recognised ISO standards for common elements like languages or dates
Be consistent in the information provided and ensure the description is appropriate – enough information to avoid being ambiguous but also simple and concise
Any limitations with the dataset should be explained to ensure others reuse it appropriately and don’t make false assumptions
You should avoid using special characters, particularly in file names or column headers in spreadsheets as some software may interpret these symbols as an operator
Also provide persistent identifiers so others can reliably link to and locate your data. This helps with citation and tracking impact too.
Let’s look at some good and bad examples. You can see what we’re looking for in terms of metadata is more precise and standardised entries rather than information that could be ambiguous.
Metres and seconds are universally accepted units of measurement as opposed to furlongs or fortnights
For clarity, provide dates and times in the ISO standard, specifying the timezone
In the third example we can see a properly described variable as opposed to an abbreviation which others may not understand
When stating file formats, it is always useful to specify the version too
The final two examples show the need to be specific so others can understand the coverage properly
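The date guidance above can be sketched with the ISO 8601 format. A minimal example (the timestamp is invented):

```python
# Format a timestamp as ISO 8601 with an explicit timezone offset,
# as recommended for unambiguous metadata.
from datetime import datetime, timezone, timedelta

cet = timezone(timedelta(hours=1))  # Central European Time (UTC+1)
stamp = datetime(2016, 11, 15, 9, 30, tzinfo=cet).isoformat()
# "2016-11-15T09:30:00+01:00" is unambiguous; "15/11/16 9.30" is not.
```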
It is highly recommended to use metadata standards. It enables interoperability and ensures the information is presented in a predictable way to allow it to be processed by computers.
There are lots of standards that can be used and you can search for them by discipline.
The DCC started a catalogue of disciplinary metadata standards which is now being taken forward as an international initiative via an RDA working group
When choosing a metadata standard, you should consider:
Your data type
What is accepted practice for your field
Whether your organisation or the tools and instruments being used suggest using one format over another (for example if one is recommended or if some metadata is created automatically in a given standard)
Also think about what resources you have available. Some standards have associated tools and more comprehensive instructional materials, so they may be preferred.
When writing metadata, reuse information where possible rather than starting from scratch. For example, you may be able to use a project abstract written for your proposal, information from lab notebooks or citations for data you are reusing.
Where possible, write your metadata using a tool to make the processes easier and more consistent.
Think about metadata creation as an iterative process. It’s best to ask somebody else to read your record to make sure it makes sense to others and then revise and review it again before publishing.
Some general tips to follow include:
Avoid using jargon
Define any abbreviations, acronyms or technical terms
Clearly state limitations and express considerations for appropriate reuse
Use terms like ‘none’ or ‘unknown’ properly
Be comprehensive when writing your dataset title as this is how others will determine whether to look into your data further.
A complete title should explain what the data relates to, a location, time period, subject and scale.
This example illustrates the importance of descriptive titles in metadata records.
The second title gives enough detail for a reader to discern whether they might like more information about your data.
When you are writing your metadata, remember that it will be read by machines as well as people.
You should avoid using symbols that could be misinterpreted, and tabs/indents/breaks that may be stripped out. Using a text editor for copying data will ensure hidden characters and formatting are removed.
The final point to reiterate is the need to review your metadata.
It’s always useful to get a second opinion to make sure others can understand it and feel it’s clear and specific enough.
To close we want to explain how EUDAT is approaching metadata and how services like B2FIND can help you
B2FIND provides a simple and user-friendly service that allows users to discover a wide range of metadata from a variety of research communities. It is based upon a comprehensive metadata catalogue of data collections stored in the EUDAT data centres and harvested from other data repositories.
The B2FIND service helps researchers to find relevant data to reuse, and helps data providers to increase the visibility of their data.
Since EUDAT is a pan-European infrastructure supporting a wide range of disciplines, we have to think about how terms are used differently by different communities.
Chemical species, for example, are atoms, molecules, ions and so on, whereas for zoologists, ‘species’ denotes distinct kinds of animals
The B2FIND service works by harvesting metadata from different communities. This is done on a regular and incremental basis, usually using the OAI-PMH protocol.
The metadata is provided in a range of community standards. It is then processed to transform it to the generic B2FIND schema to allow cross-search.
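The mapping step described above can be sketched as a simple field translation. An illustrative example only (the field names and function are invented; the real B2FIND schema and mapping rules are more involved):

```python
# Map a Dublin Core-style record into a flat, homogenised discovery schema.
def to_discovery_record(dc):
    mapping = {
        "title": "Title",
        "creator": "Creator",
        "subject": "Discipline",
        "language": "Language",
    }
    # Fields the source record lacks are marked "unknown" rather than dropped.
    return {out: dc.get(src, "unknown") for src, out in mapping.items()}

record = to_discovery_record(
    {"title": "Greater Yellowstone Rivers", "creator": "U.S. Forest Service"}
)
```

Homogenising every community standard into one schema is what makes cross-disciplinary search possible.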
This is what a record looks like in the B2FIND catalogue. There’s a basic description, a number of keyword tags and some additional information to note the source, creator, language etc
To find out more about B2FIND or use the service, please follow the links provided.