This is module 6 in the EDI Data Publishing training course. In this module, you will learn how to create quality metadata and be introduced to the landscape of data repositories and their functions.
2. Background
Data are not inherently self-describing. An understanding of what the data are and how they can be used requires quality metadata (data about data). The level of metadata quality varies considerably and is a distinguishing feature among data repositories.
3. Objectives
Define metadata and discuss why they are important
Offer tips for writing quality metadata
Describe the functions of a data repository
4. What are metadata?
[Image: a bare data table] Table 1: Average temperature of observation for each species. Courtesy: Viv Hutchison
5. What are metadata?
[Image: the same table, annotated with a reuser's questions: What do the temps represent? How? Where? Units?] Table 1: Average temperature of observation for each species. Courtesy: Viv Hutchison
6. What are metadata?
Metadata are data about data:
WHO created the data?
WHAT is the content of the data?
WHEN were the data created?
WHERE were they collected?
WHY were the data collected?
7. Value of Metadata
Essential for making data FAIR:
● Findable: Keywords, a good title, a DOI
● Accessible: Tell the user how to access the data, or provide a direct link to it
● Interoperable: Accurate and well-described methods and attributes
● Reusable: Understandable without contacting the data creator
8. Metadata for EDI (1)
Title and Abstract
Investigators: Synonymous with the "authors" of a paper; the investigators are the persons (or in some cases institutions) that have made an intellectual contribution to the design of the data collection or creation effort.
License: Tells future data users how they can reuse the data
9. Metadata for EDI (2)
Keywords:
● Important for data discovery.
● Select from an existing controlled vocabulary or thesaurus.
Funding:
● Include the award number.
Timeframe & Location
Taxonomic species
Methods
10. Metadata for EDI (3)
Describe each data table:
Column Name
Description
Unit / Code Explanation / Date format
Empty Value Code
● Standard units: EML metadata has a set of predefined variable units (the EML unit dictionary), written in camelCase, e.g. kg/m2 = kilogramPerMeterSquared.
● Custom units: Any unit not defined in the dictionary can be included as a custom unit.
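As an illustration, here is a minimal Python sketch of the kind of column-by-column description the template asks for. The table, column names, units, and codes are hypothetical examples, not EDI requirements; in EML this information is expressed in attribute elements.

# Minimal sketch of a column-by-column data table description.
# The column names, units, and codes below are hypothetical examples.
attributes = [
    {"name": "site_id", "description": "Sampling site code", "unit": None,
     "codes": {"NE1": "Northeast Shark River Slough site 1"}},
    {"name": "sample_date", "description": "Date of collection", "unit": None,
     "date_format": "YYYY-MM-DD"},
    {"name": "biomass", "description": "Periphyton biomass",
     "unit": "kilogramPerMeterSquared"},  # camelCase standard unit name
]
EMPTY_VALUE_CODE = "NA"  # code used where a value is missing

for att in attributes:
    print(att["name"], "|", att["description"], "| unit:", att["unit"] or "n/a")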
12. Metadata for EDI (4)
Scripts/code (software): Data processing and analysis scripts can be included in a data package.
Data provenance: A record trail that accounts for the origin of a dataset.
13. Titles, titles, titles
Titles are critical in helping readers find your data.
○ While individuals are searching for the most appropriate datasets, they are most likely going to use the title as the first criterion to determine if a dataset meets their needs.
A complete title includes: What, Where, and When (and Who, if relevant)
14. Titles, titles, titles
Which title is better?
● Periphyton
● Periphyton Abundance data collected by FCE LTER from Northeast Shark River Slough, Florida Everglades National Park, from September 2006 to September 2008
18. Ecological Metadata Language (EML)
Metadata standard used widely in the US ecological community
Implemented in the Extensible Markup Language (XML)

<title>Water Quality Data from Shark River Slough, Everglades National Park</title>
<originator>
<firstName>Evelyn</firstName>
<lastName>Gaiser</lastName>
</originator>
<method>Grab samples of water were collected monthly</method>
<date>
<begin>2000-06-01</begin>
<end>2017-03-30</end>
</date>
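Because the tags give the metadata structure, a short script can pull fields out of an EML document automatically. Here is a minimal sketch using only the Python standard library; the dataset wrapper element is an assumption added solely so the fragment above is well-formed XML.

# Minimal sketch: extract fields from the EML fragment shown above.
# The <dataset> wrapper is added only to make the fragment well-formed.
import xml.etree.ElementTree as ET

fragment = """
<dataset>
  <title>Water Quality Data from Shark River Slough, Everglades National Park</title>
  <originator><firstName>Evelyn</firstName><lastName>Gaiser</lastName></originator>
  <method>Grab samples of water were collected monthly</method>
  <date><begin>2000-06-01</begin><end>2017-03-30</end></date>
</dataset>
"""

root = ET.fromstring(fragment)
print("Title:   ", root.findtext("title"))
print("Creator: ", root.findtext("originator/firstName"), root.findtext("originator/lastName"))
print("Coverage:", root.findtext("date/begin"), "to", root.findtext("date/end"))

The same approach scales: hand a computer a thousand EML documents and it can return all of their titles.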
19. What does one do with an EML document?
Deposit metadata and data in a data repository!
A data repository is a service operated by research organizations, where research materials are stored, managed, and made accessible.
20. Data Repositories ensure
● Long-term security of the data
● Long-term accessibility of the data
● Data integrity
● Data discovery
● Datasets are citable
● Most repositories provide a DOI
21. Where to deposit ecological data?
Domain-specific repositories
● Environmental Data Initiative Repository
● Knowledge Network for Biocomplexity
● Arctic Data Center
Generalist repositories
● Dryad
● Figshare
● Zenodo
Institutional repositories
22. Lots of repositories to choose from…
Repositories differ in:
● Amount of metadata required
● Support of provenance
● Immutability
● Domains supported
26. Summary
A metadata record captures critical information about the content of a dataset.
Metadata allow data to be discovered, accessed, integrated, and reused.
Data repositories support the Findability, Accessibility, Interoperability, and Reusability (FAIR) of research data.
Editor's Notes
Describe the functions of a data repository, which is the final destination of the metadata.
What are metadata? Let’s take a look at this question from the perspective of a researcher. Suppose you are a scientist who wants to study the effects of temperature on frogs. You reach out to all your frog scientist friends and ask for datasets on this topic because you want to do a meta-analysis, an analysis across multiple studies. You are sent this data file by one colleague, with no supporting info. What additional information would you need in order to use these data?
Units?
What do these temperatures represent? Temperature of the skin of the frog or water it was found in?
How were the data collected? Where? In the wild, or in a zoo?
When were the data collected? Was it 30 years ago before amphibians were in decline?
Furthermore, was the minimum temperature for one of these poor Wood Frogs really zero?
Metadata are just data about data. They help the original creator of the data remember what they did, and they help a secondary data user to understand the data well enough to reuse them. So metadata include information about who created the dataset. A secondary data user may want to contact this creator for more information. What is the content of the dataset? The abstract in the metadata should briefly describe this. When were the data collected? Are the data from a long-term study, or just a short experiment? Where were the measurements collected? How were they collected? Why were the data collected? This Why question may indicate that there was some bias in how measurements were made that makes the data unsuitable for a new purpose. So metadata are the who, what, when, where and why of a dataset.
Relative to the value of metadata, you will recall the FAIR data principles that Susanne described on Tuesday. The FAIR principles are guidelines for making data findable, accessible, interoperable and reusable. Metadata are essential to all four of the FAIR principles.
With respect to data findability, metadata contain keywords, a good title, and a persistent identifier or DOI. All of these facilitate data discovery. Metadata tell a user how to access the data or provide a direct link to it. They indicate how the data are licensed and what a reuser may do with them. Very detailed metadata include accurate and well-described methods and attributes, which are essential for interoperability and integration of datasets. Finally, complete metadata should make the data understandable to a secondary user, without that user needing to contact the data creator.
Speaking of complete and detailed metadata, let’s talk a bit about what metadata EDI requires. This is the Word template for EDI metadata that you may already have seen. I will step you through what is needed to complete this document. Remember, if you are filling out this template, you need to provide answers to the questions that a typical data reuser would need answered in order to interpret these data correctly.
The License you choose will tell future data users how they can reuse the data.
Creative Commons is an American non-profit organization devoted to expanding the range of open-access creative works available for others to build upon legally and to share. The organization has released several copyright licenses, known as Creative Commons licenses, free of charge to the public. CC0 = no rights reserved. CC-BY is a license that requires that the data authors get attribution, but the data can otherwise be used however someone likes.
If you don’t choose either one of these licenses, then by default your dataset will be given the CC0 license.
On the next page is the section to provide keywords. We suggest that metadata creators select several keywords that are highly relevant to the data being documented. Keywords help a would-be secondary user of the data find the data. Keywords should be precise. Sometimes people get carried away and include 40 keywords. That’s too many. My rule of thumb is 10 or fewer from the LTER Controlled Vocabulary, and a couple of additional ones that describe the project.
Link to a tool into which you can input the abstract of a dataset, for example, and the tool will suggest keywords from the LTER Controlled Vocabulary.
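To make that idea concrete, here is a toy Python sketch of keyword suggestion. The vocabulary below is a tiny illustrative sample, not the real LTER Controlled Vocabulary, and the matching is deliberately simplistic.

# Toy sketch: suggest controlled-vocabulary keywords that appear in an abstract.
# The vocabulary here is a tiny illustrative sample, not the real LTER CV.
controlled_vocabulary = {"carbon dioxide", "periphyton", "water quality", "nutrients"}

abstract = ("Monthly grab samples of water and periphyton were used to track "
            "water quality in Shark River Slough.")

suggested = sorted(term for term in controlled_vocabulary if term in abstract.lower())
print("Suggested keywords:", suggested)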
Providing a reference to the funding source for the study is important. Funders like to be able to search a data repository and see what their funding dollars bought. If you provide a grant number and funder ID, then NSF, for example, can quickly find datasets related to projects they funded.
Timeframe, Geographic Location
In Methods, you should describe what you did so that someone else could reproduce your study. You should describe the experimental design, instruments used, and how samples were processed. You can point to published protocols, too, if they are relevant. Methods are really important when a data reuser is trying to determine if the data are suitable for their analysis or not.
Here is where you describe all the attributes in a data table. In the first column, you would put the variable or attribute names from the header of your dataset.
Units have to be written in a particular way: they are written out in camelCase so that they are unambiguous.
Example of data from a long-term stream chemistry study.
Data packages don’t always contain just data and metadata. They may contain scripts that were used to process the data in some way. If you generated code while manipulating the dataset and quality controlling it, you can include the code in the data package.
Finally, data provenance can be described. Data provenance refers to a record trail that accounts for the origin of the dataset. If the frog researcher integrated 15 frog datasets from other researchers into a single dataset for her study, then this is where the identity of those original datasets can be recorded. Important for supporting reproducible science.
I will now offer you a few tips on how to create quality metadata, starting with what a good title should contain.
Select keywords wisely. Keywords aren’t something you should just pull out of the air. It’s better to choose terms from a thesaurus or controlled vocabulary. A controlled vocabulary is a standardized list of words that provides a consistent way to describe and index data. In the case of the LTER Controlled Vocabulary, the list consists of about 700 terms that ecologists use frequently to keyword data. So, how would you use the Controlled Vocabulary? If you are considering using CO2 as your keyword, for instance, you would look into this controlled vocabulary and see whether CO2 is there, and it is, but it is not the preferred term. The words carbon dioxide should be written out, rather than entering CO2 as the keyword. By using these standard terms, it’s possible to index data holdings based on these terms. This improves the potential for data discovery considerably.
Also, it can be helpful to have a reference for standardized place names. Sometimes you may get data that contain specific place names that are likely to be expressed in a variety of different ways. For instance, in the Everglades there are these “Conservation Areas” that have received different treatments. Metadata for these areas may say the research site is “Conservation area 3” or WMACA 3 or other permutations. To get the standardized name, I consult this gazetteer. It’s a lot easier to find data for these locations if all datasets use the same version of the place name.
So you’ve written some brilliant metadata. Then what happens? Well, The Word template isn’t machine readable. Computers like more structure than a Word document can offer. You will learn later today how to generate structured metadata from the EDI template. The structured metadata standard we use at EDI is called Ecological Metadata Language. EML was developed for documenting ecological and environmental datasets, and is implemented in XML. This blue box shows a fragment of EML. You can see that elements of the metadata are enclosed in tags that describe their content. These tags are the XML, in the simplest possible sense. Having the data in EML makes it machine-readable. You can throw 1000 EML documents at a computer and request all the titles be output, and the computer can do that easily.
Once you have your clean dataset and your EML, what do you do with it? You are ready to share data through the EDI Repository. A data repository is a service operated by research organizations where research materials are stored, managed, and made accessible.
What is special about a data repository, as opposed to sharing your data and metadata on a lab web page or a field station’s website? Data repositories have some important functions that a lab website does not.
For instance, repositories provide for the long-term security of the data, meaning that a dataset will not ever be lost from a repository. It will be available 20 or more years after it is deposited.
Repositories ensure long-term accessibility of data: a dataset will always be retrievable from the repository.
Data integrity is preserved in a repository, meaning the dataset will never be changed while in the repository. The data are said to be immutable.
For data discovery, the repository will offer a mechanism by which to find data.
Datasets in a repository are citable: datasets in most repositories receive a DOI, a digital object identifier, which provides a persistent link to a dataset’s location on the Internet.
You won’t get a DOI by posting your data on your lab website, and DOIs are what make it possible for researchers to get credit from citations of their data.
Is EDI the only place to store ecological data? No, there are many repositories that will accept ecological data. There are three kinds of repositories: domain-specific, generalist, and institutional. Domain-specific repositories each store data from a particular domain, for example ecological data, physics data, or sociological data. Repositories specifically for ecological data in the US include KNB and the Arctic Data Center; many other ecological repositories exist in other countries. Generalist repositories are designed to accept any kind of data. Institutional repositories are found at large institutions, which now run their own repositories to store data, reports, articles, photos, and all kinds of products from researchers at the institution. Some researchers prefer to store their data in their institutional repository.
RE3data.org: 2,540 repositories are indexed by this service. Examples include Neotoma (paleoecological data), the Gulf Coast Repository, VertNet, the Fish Database of Taiwan, and Australian Waterbird Surveys.
Let’s take a look at a data record in the EDI Repository so you can see how the structured metadata is turned into a nice HTML display.
Data are cited alongside journal citations in the references section of a paper.
These columns represent the columns in the dataset. Look at the detail here! Because the data are described so carefully, it’s possible to write on-the-fly R or Python code that will directly extract this data table from the repository and import it into R or Python.
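As a closing illustration, here is a hypothetical Python sketch of that on-the-fly extraction. The URL is a placeholder, not a real data entity address, and pandas is assumed to be installed; the point is simply that a well-described table can be pulled straight from a repository over HTTPS.

# Hypothetical sketch: because every column is described in the metadata,
# a script can pull a data table straight from the repository.
# The URL below is a placeholder, not a real entity address.
import pandas as pd

entity_url = "https://example.org/placeholder/data_table.csv"
df = pd.read_csv(entity_url)  # downloads and parses the CSV data entity
print(df.head())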