This presentation was delivered by National Open Research Coordinator Dr Daniel Bangert as part of a Digital Repository of Ireland (DRI) Introductory Training seminar aimed at the University College Cork (UCC) research community on 14 June 2021. The presentation covers an introduction to the basics of metadata.
DRI Introductory Training: Introduction to Metadata
1. Introduction to Metadata
DRI introductory training
14 June 2021
Dr Daniel Bangert, National Open Research Coordinator
@enigmaticocean d.bangert@ria.ie
2. Outline
What is metadata?
Why use standardised and structured metadata?
Metadata, FAIR and Open Research
Metadata in a project context
3. What is metadata?
Data about data
A subset of documentation
Standardised
Structured
Human and machine readable
4. What is metadata?
Technical metadata: hardware, software, file formats, resolution, size
Preservation metadata: provenance, authenticity, preservation actions, responsibility (e.g. PREMIS)
Structural metadata: physical/logical structure of digital resources (e.g. METS)
Descriptive metadata: describes the digital resource; catalogue records/finding aids
5. Human readable
A handwritten or typewritten listing or finding aid
Can be easily read and understood
Can be accessible in physical or digital medium
Can be free-text searched
6. Human readable
Study title
Persistent identifier
Authors
Abstract
Keywords
Access
Data collector
Collection dates
Geographical coverage
Analysis/Observation unit type
Sampling procedure
Data processing
...
7. Machine readable
In a format that can be understood by computers
Structured representation of information
Described using particular standards (e.g. XML, RDF)
Allows processing, exchange and analysis
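As a minimal sketch of what "machine readable" means in practice, the Python example below parses a small XML record and extracts its fields for processing. The element names (record, title, author, keyword) and all values are hypothetical and are not taken from any particular metadata standard.

```python
import xml.etree.ElementTree as ET

# Hypothetical record: element names and values are illustrative only.
RECORD_XML = """
<record>
  <title>Example study title</title>
  <author>Example Author</author>
  <keyword>metadata</keyword>
  <keyword>open research</keyword>
</record>
"""


def parse_record(xml_text: str) -> dict:
    """Turn the structured XML record into a Python dictionary."""
    root = ET.fromstring(xml_text.strip())
    return {
        "title": root.findtext("title"),
        "authors": [element.text for element in root.findall("author")],
        "keywords": [element.text for element in root.findall("keyword")],
    }


record = parse_record(RECORD_XML)
print(record["title"])
print(record["keywords"])
```

Because each field sits in a named element, a program can find it without guessing, which is what makes exchange and analysis possible.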
8. Why use standardised and structured metadata?
Using standardised descriptive metadata means adhering to best practices in your domain
Standardised metadata allows you to control how records are described within your organisation
Enforcing standards allows greater searchability of your records
Metadata sharing and interoperability are only possible when a standard is used
Quality metadata enables analysis, manipulation and ‘value added services’
10. Simple Dublin Core Metadata Element Set
1. Title
2. Creator
3. Subject
4. Description
5. Publisher
6. Contributor
7. Date
8. Type
9. Format
10. Identifier
11. Source
12. Language
13. Relation
14. Coverage
15. Rights
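The fifteen elements above can be serialised as XML. The Python sketch below is one possible approach, not a prescribed DRI workflow: it assumes the standard Dublin Core element namespace (http://purl.org/dc/elements/1.1/), while the wrapper element name "metadata" and the function name to_dublin_core_xml are hypothetical choices made for illustration. The sample values come from the photograph record cited later in these slides.

```python
import xml.etree.ElementTree as ET

DC_NS = "http://purl.org/dc/elements/1.1/"

# The fifteen Simple Dublin Core elements, in the order listed on the slide.
DC_ELEMENTS = [
    "title", "creator", "subject", "description", "publisher",
    "contributor", "date", "type", "format", "identifier",
    "source", "language", "relation", "coverage", "rights",
]


def to_dublin_core_xml(record: dict) -> str:
    """Serialise a dict of Dublin Core fields; values may be a string or a list."""
    unknown = set(record) - set(DC_ELEMENTS)
    if unknown:
        raise ValueError(f"Not Simple Dublin Core elements: {sorted(unknown)}")

    ET.register_namespace("dc", DC_NS)
    root = ET.Element("metadata")  # hypothetical wrapper element
    for name in DC_ELEMENTS:
        values = record.get(name, [])
        if isinstance(values, str):
            values = [values]
        for value in values:  # every element is optional and repeatable
            child = ET.SubElement(root, f"{{{DC_NS}}}{name}")
            child.text = value
    return ET.tostring(root, encoding="unicode")


photo = {
    "title": "Portrait of Billie Holiday and Mister, Downbeat, New York, N.Y., ca. Feb. 1947",
    "creator": "Gottlieb, William P.",
    "subject": ["Holiday, Billie, 1915-1959", "Jazz singers"],
    "identifier": "http://hdl.loc.gov/loc.music/gottlieb.04241",
}
xml_out = to_dublin_core_xml(photo)
print(xml_out)
```

Rejecting unknown field names is a design choice for the sketch: it keeps a record within the standard rather than silently dropping or inventing elements.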
12. Controlled vocabularies
A controlled vocabulary is an organised arrangement of words and phrases used to index content and/or to retrieve content through browsing or searching. It typically includes preferred and variant terms and has a defined scope or describes a specific domain (Harpring, 2010).
Accepted, managed and defined by a community
Concepts/terms are related to each other via explicit relationships
Enables consistency in metadata for accurate search and retrieval
Tend to be domain/discipline specific
14. https://id.loc.gov/authorities/names/n50033023.html
Variants
Holliday, Billie, 1915-1959
Fagan, Eleanora, 1915-1959
Holiday, Eleanora, 1915-1959
McKay, Eleanora, 1915-1959
Holiday, Billy, 1915-1959
Lady Day, 1915-1959
Library of Congress Name Authority File
Title: Portrait of Billie Holiday and Mister, Downbeat, New York, N.Y., ca. Feb. 1947
Creator: Gottlieb, William P.
Subject: Holiday, Billie, 1915-1959
16. Title: Portrait of Billie Holiday and Mister, Downbeat, New York, N.Y., ca. Feb. 1947
Creator: Gottlieb, William P.
Subject: Holiday, Billie, 1915-1959
Mister
Women jazz musicians
Jazz singers
Boxers (Dogs)
Fifty-second Street (New York, N.Y.)
Downbeat
http://hdl.loc.gov/loc.music/gottlieb.04241
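The name authority entry above illustrates how a controlled vocabulary maps variant terms to one preferred term. The Python sketch below shows that idea as a simple lookup, using the preferred heading and variants listed on the slide. The function names and the matching rule (ignoring case and extra spaces) are assumptions for illustration and do not describe how the Library of Congress service itself works.

```python
from typing import Dict, List, Optional

# Preferred heading and variants as listed in the name authority entry above.
AUTHORITY: Dict[str, List[str]] = {
    "Holiday, Billie, 1915-1959": [
        "Holliday, Billie, 1915-1959",
        "Fagan, Eleanora, 1915-1959",
        "Holiday, Eleanora, 1915-1959",
        "McKay, Eleanora, 1915-1959",
        "Holiday, Billy, 1915-1959",
        "Lady Day, 1915-1959",
    ],
}


def _key(term: str) -> str:
    """Normalise case and whitespace so trivially different spellings match."""
    return " ".join(term.casefold().split())


def build_index(authority: Dict[str, List[str]]) -> Dict[str, str]:
    """Map every preferred and variant term to its preferred term."""
    index: Dict[str, str] = {}
    for preferred, variants in authority.items():
        for term in [preferred, *variants]:
            index[_key(term)] = preferred
    return index


def preferred_term(term: str, index: Dict[str, str]) -> Optional[str]:
    """Return the preferred term, or None if the term is not in the vocabulary."""
    return index.get(_key(term))


INDEX = build_index(AUTHORITY)
print(preferred_term("Lady Day, 1915-1959", INDEX))
print(preferred_term("Fagan, Eleanora, 1915-1959", INDEX))
```

Indexing every record under the single preferred term is what lets a search for any variant retrieve the same set of records.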
18. Licences
Signifies what the user is allowed to do with the data
Creative Commons provides standardised licences to allow reuse
Licensing information should be included in the metadata
“As open as possible, as closed as necessary”
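One way to include licensing information in the metadata is to store the licence as a URL in a rights field, so that both people and software can identify it. The Python sketch below assumes a record held as a dictionary with a "rights" key; the record, the function name and the small lookup table are hypothetical. The three URLs are the published Creative Commons addresses for those licences.

```python
from typing import Optional

# A small, non-exhaustive table of Creative Commons licence URLs.
KNOWN_LICENCES = {
    "https://creativecommons.org/licenses/by/4.0/": "CC BY 4.0",
    "https://creativecommons.org/licenses/by-nc/4.0/": "CC BY-NC 4.0",
    "https://creativecommons.org/publicdomain/zero/1.0/": "CC0 1.0",
}


def licence_label(record: dict) -> Optional[str]:
    """Return a readable label for the record's licence.

    Returns None when no rights statement is present, and the raw
    statement when it is not one of the licences in the table.
    """
    rights = (record.get("rights") or "").strip()
    if not rights:
        return None
    return KNOWN_LICENCES.get(rights, rights)


# Hypothetical dataset record used only to exercise the function.
dataset = {
    "title": "Example dataset",
    "rights": "https://creativecommons.org/licenses/by/4.0/",
}
print(licence_label(dataset))
print(licence_label({"title": "No licence given"}))
```

A record that returns None here tells the user nothing about permitted reuse, which is the gap the slide's third point warns against.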
19. Metadata, FAIR and Open Research
https://open-science-training-handbook.gitbook.io/book/open-science-basics/open-research-data-and-materials
20. Metadata, FAIR and Open Research
Findable
Data are described with rich metadata
Reusable
(Meta)data are richly described with a plurality of accurate and relevant attributes
(Meta)data are released with a clear and accessible data usage license
(Meta)data meet domain-relevant community standards
https://www.go-fair.org/fair-principles/
21. Metadata in a project context
Consider and collect metadata over the lifecycle of a project - not just at the point of deposit:
What data will be created?
What standards and methodologies will be used?
What are the plans for data sharing, access and long-term preservation?
22. Metadata in a project context
Multidisciplinary drifting Observatory for the Study of the Arctic Climate (MOSAiC)
Metadata shall make data findable and provide additional contextual information about measurement details, methods, relevance, lineage, quality, usage and access restrictions of the data. It shall allow coupling users, software, and computing resources to the data. Hence, metadata must be machine-readable and interpretable as well as human-understandable. Furthermore, metadata for each data set should follow the FAIR data principles in terms of fitness for purpose and fitness for re-use. The metadata should be agreed on, listed, and explained within the MSOPs [MOSAiC Standard Operating Procedures].
Includes recommendations for metadata and vocabularies, listing examples for oceanography, climatology, and modelling; biology; provenance.
https://mosaic-expedition.org/science/mosaic-data/; http://doi.org/10.5281/zenodo.4537178
23. Summary
Metadata is a subset of documentation that uses standardised terms/concepts and is presented in a structured manner
Using standardised descriptive metadata means adhering to best practices in your domain
Data described with rich metadata and released with a clear usage license align with the FAIR data principles
Consider metadata over the research lifecycle