The traditional process of achieving metadata standards has failed, and I know what I’m talking about from experience with Dublin Core, BagIt, Z39.50, URLs, and ARKs.
We must think outside the box or we will keep failing. YAMZ (Yet Another Metadata Zoo) is not a standard. Instead it is a dictionary of terms, some fixed and others still evolving, that are meant to be selectively referenced by future standards. Terms are otherwise decoupled from standards that reference them. Each term is a kind of nano-specification with a unique persistent identifier that tracks the term from evolving to mature to deprecated.
YAMZ Metadata Vocabulary Builder
1. YAMZ: better, faster, cheaper vocabulary standardization
John Kunze
California Digital Library
2. The metadata mess
• Why does metadata interoperability (MI) seem to
fail each time a new initiative addresses it?
• Why does each attempt not even come close to
delivering MI?
• Why are these failures so expensive?
3. Traditional metadata standards have failed
• Thinking outside the box to stop failing.
• YAMZ is Yet Another Metadata Zoo; it is not Yet
Another Metadata Standard.
• Instead it is a dictionary of terms, some fixed and
others still evolving.
• Terms meant to be selectively referenced by future
standards, but are otherwise decoupled from them.
• Each term has a unique persistent identifier that tracks
it from evolving to mature stability to deprecated.
4. Session structure
• Introduction to YAMZ
• Four disciplinary uses
• Trying it out
• Invitation for feedback, discussion, and
participation.
5. Metadata (un)happiness
• Are you happy with your metadata?
• Are you happy with others’ metadata?
• Are you and your meta/cataloguer happy with day-to-day experience complying with your chosen standard? Have you asked them what they think?
• Which standard(s) do you use? Are you compliant?
6. Metadata in theory vs practice
The façade
• We use standard X, so anyone using X can work with us.
Behind the façade (see Roy Tennant’s “Bitter Harvest”)
• We use standard X with local modifications.
• Our mods evolve and depend on the specific
collection, so no one using X can work with us.
• Few people know what our local mods are.
7. Option 1: lobby for changes in X
• Use formal commenting mechanism
• Wait 2-5 years for revision to appear, during which
• a small number of busy experts evaluate
• in largely closed discussions
• no testing (since there’s no implementation yet)
• What happens to legacy metadata every time a 30-year-old standard does have a new release?
8. Option 2: Semantic Web with RDF
• Spend a few years modeling all your present and
future assets in RDF
• Reference, for better or worse, existing terms from
existing vocabularies
• Pro: get unique, unambiguous concept identifiers
• Con: expensive, and no one uses RDF except libraries
9. Option 3: think locally, act locally
If your organization doesn't have much time or staff
• This is the common case
• Document your local mods to Standard X
• Effectively, this is a secret metadata standard
10. Option 4: think globally, act globally
if your organization does have the time and staff
• Create your own standard or profile
• Create your own committees and work with your
own partner organizations
• Import snapshots of other vocabularies
• Add missing terms to your liking
• Publish your terms and definitions
11. (New) Option 5: think locally, act globally
At a minimum, use YAMZ to
• get a persistent identifier for your term
• use it so everyone knows what you mean by it
Everything else is gravy
• track comments, upvotes and downvotes
• notice what related terms others are using
Otherwise, we’re stuck with the current blizzard of
cross-walking between hundreds of vocabularies...
17. Let’s do something different
• Instead of yet another ontology, how about a
dictionary?
• … a dictionary that tracks terms over time?
• … a dictionary whose terms standards will reference?
18. Summarizing key desiderata
• Natural language strings with persistent concept
identifiers
• Avoiding largely closed discussion
• Support for testing and rapid prototyping
• Support for unambiguous term referencing, where
• some terms may change
• other terms may not change
• Ability to add missing terms
• Publishing your own terms
• Dealing with historical terminology
19. An alternate metadata universe
• Vision: one dictionary, one namespace
• All research domains, any part of “metadata speech”
• Names, values, units, relationships, ...
SimonRobertson@flickr
21. Crowdsourced, but with voting
3 classes of term:
• vernacular – all terms are born here
• canonical – these don’t evolve
• deprecated – so terms never go away
Each term gets a unique persistent id. Example:
identifier: http://n2t.net/ark:/99152/h1193
term: oba
definition: other (origin: from Tagalog)
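The three-class lifecycle above (vernacular at birth, promotion to canonical, retirement as deprecated but never deleted) can be sketched as a small state machine. This is an illustrative model only; the class and method names are my own, not the actual YAMZ schema.

```python
from dataclasses import dataclass
from enum import Enum

class TermClass(Enum):
    VERNACULAR = "vernacular"   # all terms are born here
    CANONICAL = "canonical"     # stable; these don't evolve
    DEPRECATED = "deprecated"   # retired, but terms never go away

@dataclass
class Term:
    identifier: str             # persistent ARK, e.g. http://n2t.net/ark:/99152/h1193
    label: str
    definition: str
    term_class: TermClass = TermClass.VERNACULAR

    def promote(self) -> None:
        # a vernacular term that gains community support becomes canonical
        if self.term_class is TermClass.VERNACULAR:
            self.term_class = TermClass.CANONICAL

    def deprecate(self) -> None:
        # canonical terms are never removed, only marked deprecated
        if self.term_class is TermClass.CANONICAL:
            self.term_class = TermClass.DEPRECATED

oba = Term("http://n2t.net/ark:/99152/h1193", "oba",
           "other (origin: from Tagalog)")
oba.promote()   # vernacular -> canonical
```

The one-way transitions are the point: because the persistent identifier survives every transition, a standard that cites the ARK keeps a stable reference even after the term is deprecated.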
22. Reputation-based voting resists “gaming”
• Meritocracy: strong terms rise, weak terms decline
• Lessons from StackOverflow, Internet standards, and
Wikipedia processes
Karunakar Rayker @flickr
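One way reputation-based voting resists gaming can be sketched as a reputation-weighted score: each vote counts in proportion to the voter’s standing, so a handful of throwaway accounts cannot outvote an established contributor. This is a minimal illustration of the idea, not the scoring formula YAMZ actually uses.

```python
def weighted_score(votes, reputation):
    """Sum votes (+1 or -1), each weighted by the voter's share of
    the total reputation among those who voted."""
    total_rep = sum(reputation.get(u, 0) for u in votes) or 1
    return sum(v * reputation.get(u, 0) / total_rep
               for u, v in votes.items())

# Hypothetical data: two low-reputation "sock puppet" downvotes
# cannot outweigh one established contributor's upvote.
votes = {"alice": +1, "sock1": -1, "sock2": -1}
rep = {"alice": 90, "sock1": 5, "sock2": 5}
score = weighted_score(votes, rep)   # (90 - 5 - 5) / 100 = 0.8
```

An unweighted tally of the same votes would be -1; weighting by reputation flips the outcome, which is what lets strong terms rise despite coordinated downvoting.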
24. YAMZ usage patterns
Search for terms (words and definitions):
• find a term you love → great – use it
• find a term you kind of love → try it out, comment, engage with author
• no workable term found → instantly enter own term and watch for comments
• find a word you love but a definition you hate → “I want that word!”, so enter a competing term
Here’s a one-slide summary of YAMZ.
In this session we will introduce YAMZ, offer a limited demo, and invite feedback, discussion, and participation. We will also describe four separate disciplines that have used YAMZ to help find the balance between local descriptive needs and the desire for MI.
Taking the temperature of metadata contentment.
Are you happy with your metadata and how it plays with the metadata of others?
Are you happy with your ability to find useful, relevant stuff based on search of other people’s metadata?
Are you and your cataloguer happy with the day-to-day experience of complying with your chosen standard? Have you asked your metadata cataloguers what they think?
Which metadata standard do you use? Do you know if you're in compliance?
// The façade
- We use standard X, so everyone using X works with us. We don’t pretend to interoperate with other standards.
Behind the façade
- We use standard X with local mods tailored to our needs. Our mods change over time and depend on the collection, so no one using X works with us, even between our own collections. We don't interoperate with anyone or anything external.
Few people outside or even inside our organization know what our local mods are.
Don't believe me? Take the simplest and oldest of all metadata ontologies, Dublin Core, and see Roy Tennant’s “Bitter Harvest” article on the interoperability problems with DC across multiple institutions. It gets worse the more complex the ontology.
Option 1:
try to change a traditional std X that you’re using
- use commenting mechanism, along with many other orgs
- wait 2-5 years for revision to appear, during which time
** a small number of busy experts evaluate and try to resolve all comments based on
** largely closed discussions and
** no testing
** what about managing changes in terminology over time? (eg, half of the LEADS project we heard about this morning)
Option 2:
spend a few years modeling all your present and future assets in RDF, and plan to reference terms from existing vocabularies made unique with namespace designations in front of them
- ** benefit: unique, unambiguous concept identifiers
- problem: expensive, and no one uses RDF except libraries
Option 3: think locally, act locally
if your org doesn't have much time and staff,
** create local doc describing your metadata, eg, Std X with the following mods
- very common: effectively, this is a secret metadata std
Option 4: think globally, act globally
if your org does have the time and staff, try the more expensive route:
- create your own std or profile
- create your own committees
- work with your own partner orgs
Usually this means ** importing a snapshot of other vocabularies (which will now evolve separately), then ** adding missing terms to your liking and ** publishing a document listing your terms and definitions
** and this takes us to the present situation…
// Whew. As we've seen from all the work being done with historical terms in existing archives, this complicated snapshot approach hides an even larger legacy problem, which is that traditional metadata standards don't have unambiguous ways of referencing terms, either in the present or over time as terms evolve. **
- Even the experts sitting on the stds committees are frustrated with the traditional approach.
That's a lot of crummy news, and Jane and I know it well, having both served as experts in the development of Dublin Core, PREMIS, and other ontologies.
Finally we had a chance to do something different, and we did.
Instead of an ontology, we built a dictionary.
- One thing remains the same: ontologies can be created as before, but we encourage terms to reference definitions from a dictionary. This decouples definitions from all the rules about mandatory vs optional vs conditional, and allows disciplines to more easily share terms.
// Looking back at key points just made, let's summarize them:
- natural language strings with persistent concept identifiers
- avoid largely closed discussion (exception: public comment periods) (except: mention Ted Haberman's thing)
- support testing and rapid prototyping
- support unambiguous referencing of terms from elsewhere, where
-- some terms may change
-- other terms may not change
- add missing terms
- publish your own terms
- deal with historical terminology
An alternative vision
One dictionary, one namespace
All research domains, any part of “metadata speech”
Names, values, units, relationships, ...
Instead of the typical practice of banishing all words that don’t fit in your ontology, leave them in.
So we created this metadictionary called yamz (yet another metadata zoo).
We needed modifications, from proper paging through search, to logging in with your own ORCID id, to term import and export.
Depending on your risk tolerance, you may cite a balance of stable (canonical) terms and evolving (vernacular) terms, such as those controlled by you or your community. A term is a combination of a label and a definition.
Learn from Wikipedia, Internet-Drafts/RFCs, StackOverflow, and the American Heritage Dictionary.
Variation: find a word you love, a definition you’re fine with, but you want to add a new, non-competing definition, eg, foo.1, foo.2, foo.3
Very simple (maybe too simple).
Example of a term and definition
[] you can add a comment
[] at different times
[] supporting a conversation
[] and you can tag things if you want to restrict your view to just, say, your working group’s tags, and not see the rest of the dictionary
Note that you get automatic emails for any terms that you’re “watching”, to notify you when someone comments on your terms. By default, you’re watching any terms you own.
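The watching behavior described above (owners watch their own terms by default; every watcher except the commenter gets notified) can be sketched as follows. All names here are hypothetical; this is a model of the described behavior, not the YAMZ codebase.

```python
from collections import defaultdict

class WatchList:
    """Sketch of YAMZ-style term watching and comment notification."""

    def __init__(self):
        self.watchers = defaultdict(set)   # term id -> set of watching users
        self.outbox = []                   # (recipient, message) notifications

    def create_term(self, term_id, owner):
        # by default, you're watching any terms you own
        self.watchers[term_id].add(owner)

    def watch(self, term_id, user):
        self.watchers[term_id].add(user)

    def comment(self, term_id, author, text):
        # notify every watcher except the person who just commented
        for user in self.watchers[term_id]:
            if user != author:
                self.outbox.append((user, f"New comment on {term_id}: {text}"))

wl = WatchList()
wl.create_term("h1193", "jak")                  # owner auto-watches
wl.comment("h1193", "jane", "Cite the Tagalog source?")
# wl.outbox now holds one notification addressed to "jak"
```

The design choice worth noting is the owner auto-watch: it guarantees the conversation on a term always reaches at least its author, which is what makes the comment threads useful for evolving vernacular terms.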