The traditional process of achieving metadata standards has failed, and I know what I’m talking about from experience with Dublin Core, BagIt, Z39.50, URLs, and ARKs.
We must think outside the box or we will keep failing. YAMZ (Yet Another Metadata Zoo) is not a standard. Instead it is a dictionary of terms, some fixed and others still evolving, that are meant to be selectively referenced by future standards. Terms are otherwise decoupled from standards that reference them. Each term is a kind of nano-specification with a unique persistent identifier that tracks the term from evolving to mature to deprecated.
YAMZ Metadata Vocabulary Builder
1. YAMZ: better, faster, cheaper vocabulary standardization
John Kunze
California Digital Library
2. The metadata mess
• Why does metadata interoperability (MI) seem to
fail each time a new initiative addresses it?
• Why does each attempt not even come close to
delivering MI?
• Why are these failures so expensive?
3. Traditional metadata standards have failed
• Thinking outside the box to stop failing.
• YAMZ is Yet Another Metadata Zoo; it is not Yet
Another Metadata Standard.
• Instead it is a dictionary of terms, some fixed and
others still evolving.
• Terms meant to be selectively referenced by future
standards, but are otherwise decoupled from them.
• Each term has a unique persistent identifier that tracks
it from evolving to mature stability to deprecated.
4. Session structure
• Introduction to YAMZ
• Four disciplinary uses
• Trying it out
• Invitation for feedback, discussion, and
participation.
5. Metadata (un)happiness
• Are you happy with your metadata?
• Are you happy with others’ metadata?
• Are you and your meta/cataloguer happy with day-to-day experience complying with your chosen standard? Have you asked them what they think?
• Which standard(s) do you use? Are you compliant?
6. Metadata in theory vs practice
The façade
• We use standard X, so anyone using X can work with us.
Behind the façade (see Roy Tennant’s “Bitter Harvest”)
• We use standard X with local modifications.
• Our mods evolve and depend on the specific
collection, so no one using X can work with us.
• Few people know what our local mods are.
7. Option 1: lobby for changes in X
• Use formal commenting mechanism
• Wait 2-5 years for revision to appear, during which
• a small number of busy experts evaluate
• in largely closed discussions
• no testing (since there’s no implementation yet)
• What happens to legacy metadata every time a 30-year-old standard does have a new release?
8. Option 2: Semantic Web with RDF
• Spend a few years modeling all your present and
future assets in RDF
• Reference, for better or worse, existing terms from
existing vocabularies
• Pro: get unique, unambiguous concept identifiers
• Con: expensive, and no one uses RDF except libraries
9. Option 3: think locally, act locally
If your organization doesn't have much time or staff
• This is the common case
• Document your local mods to Standard X
• Effectively, this is a secret metadata standard
10. Option 4: think globally, act globally
if your organization does have the time and staff
• Create your own standard or profile
• Create your own committees and work with your
own partner organizations
• Import snapshots of other vocabularies
• Add missing terms to your liking
• Publish your terms and definitions
11. (New) Option 5: think locally, act globally
At a minimum, use YAMZ to
• get a persistent identifier for your term
• use it so everyone knows what you mean by it
Everything else is gravy
• track comments, upvotes and downvotes
• notice what related terms others are using
Otherwise, we’re stuck with the current blizzard of
cross-walking between hundreds of vocabularies...
17. Let’s do something different
• Instead of yet another ontology, how about a
dictionary?
• … a dictionary that tracks terms over time?
• … a dictionary whose terms standards will reference?
18. Summarizing key desiderata
• Natural language strings with persistent concept
identifiers
• Avoiding largely closed discussion
• Support for testing and rapid prototyping
• Support for unambiguous term referencing, where
• some terms may change
• other terms may not change
• Ability to add missing terms
• Publishing your own terms
• Dealing with historical terminology
19. An alternate metadata universe
• Vision: one dictionary, one namespace
• All research domains, any part of “metadata speech”
• Names, values, units, relationships, ...
SimonRobertson@flickr
21. Crowdsourced, but with voting
3 classes of term:
• vernacular – all terms are born here
• canonical – these don’t evolve
• deprecated – so terms never go away
Each term gets a unique persistent id. Example:
identifier: http://n2t.net/ark:/99152/h1193
term: oba
definition: other (origin: from Tagalog)
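The three-class lifecycle above (vernacular at birth, promotion to canonical, retirement as deprecated but never deleted) can be sketched as a small state machine. This is an illustrative model only; the class and method names are my own, not the actual YAMZ schema.

```python
from dataclasses import dataclass
from enum import Enum

class TermClass(Enum):
    VERNACULAR = "vernacular"   # all terms are born here
    CANONICAL = "canonical"     # stable; these don't evolve
    DEPRECATED = "deprecated"   # retired, but terms never go away

@dataclass
class Term:
    identifier: str             # persistent ARK, e.g. http://n2t.net/ark:/99152/h1193
    label: str
    definition: str
    term_class: TermClass = TermClass.VERNACULAR

    def promote(self) -> None:
        # a vernacular term that gains community support becomes canonical
        if self.term_class is TermClass.VERNACULAR:
            self.term_class = TermClass.CANONICAL

    def deprecate(self) -> None:
        # canonical terms are never removed, only marked deprecated
        if self.term_class is TermClass.CANONICAL:
            self.term_class = TermClass.DEPRECATED

oba = Term("http://n2t.net/ark:/99152/h1193", "oba",
           "other (origin: from Tagalog)")
oba.promote()   # vernacular -> canonical
```

The one-way transitions are the point: because the persistent identifier survives every transition, a standard that cites the ARK keeps a stable reference even after the term is deprecated.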
22. Reputation-based voting resists “gaming”
• Meritocracy: strong terms rise, weak terms decline
• Lessons from StackOverflow, Internet standards, and
Wikipedia processes
Karunakar Rayker @flickr
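One way reputation-based voting resists gaming can be sketched as a reputation-weighted score: each vote counts in proportion to the voter’s standing, so a handful of throwaway accounts cannot outvote an established contributor. This is a minimal illustration of the idea, not the scoring formula YAMZ actually uses.

```python
def weighted_score(votes, reputation):
    """Sum votes (+1 or -1), each weighted by the voter's share of
    the total reputation among those who voted."""
    total_rep = sum(reputation.get(u, 0) for u in votes) or 1
    return sum(v * reputation.get(u, 0) / total_rep
               for u, v in votes.items())

# Hypothetical data: two low-reputation "sock puppet" downvotes
# cannot outweigh one established contributor's upvote.
votes = {"alice": +1, "sock1": -1, "sock2": -1}
rep = {"alice": 90, "sock1": 5, "sock2": 5}
score = weighted_score(votes, rep)   # (90 - 5 - 5) / 100 = 0.8
```

An unweighted tally of the same votes would be -1; weighting by reputation flips the outcome, which is what lets strong terms rise despite coordinated downvoting.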
24. YAMZ usage patterns
Search for terms (words and definitions):
• find a term you love → great – use it
• find a term you kind of love → try it out, comment, engage with author
• no workable term found → instantly enter own term and watch for comments
• find a word you love but a definition you hate → “I want that word!”, so enter a competing term
Here’s a one-slide summary of YAMZ.
In this session we will introduce YAMZ, offer a limited demo, and invite feedback, discussion, and participation. We will also describe four separate disciplines that have used YAMZ to help find the balance between local descriptive needs and the desire for MI.
Taking the temperature of metadata contentment.
Are you happy with your metadata and how it plays with the metadata of others?
Are you happy with your ability to find useful, relevant stuff based on search of other people’s metadata?
Are you and your cataloguer happy with the day-to-day experience of complying with your chosen standard? Have you asked your metadata cataloguers what they think?
Which metadata standard do you use? Do you know if you're in compliance?
// The façade
- We use standard X, so everyone using X works with us. We don’t pretend to interoperate with other standards.
Behind the façade
- We use standard X with local mods tailored to our needs. Our mods change over time and depend on the collection, so no one using X works with us, even between our own collections. We don't interoperate with anyone or anything external.
Few people outside or even inside our organization know what our local mods are.
Don't believe me? Take the simplest and oldest of all metadata ontologies, Dublin Core, and see Roy Tennant’s “Bitter Harvest” article on the interoperability problems with DC across multiple institutions. It gets worse the more complex the ontology.
Option 1:
try to change a traditional std X that you’re using
- use commenting mechanism, along with many other orgs
- wait 2-5 years for revision to appear, during which time
** a small number of busy experts evaluate and try to resolve all comments based on
** largely closed discussions and
** no testing
** what about managing changes in terminology over time? (eg, half of the LEADS project we heard about this morning)
Option 2:
spend a few years modeling all your present and future assets in RDF, and plan to reference terms from existing vocabularies made unique with namespace designations in front of them
- ** benefit: unique, unambiguous concept identifiers
- problem: expensive, and no one uses RDF except libraries
Option 3: think locally, act locally
if your org doesn't have much time and staff,
** create local doc describing your metadata, eg, Std X with the following mods
- very common: effectively, this is a secret metadata std
Option 4: think globally, act globally
if your org does have the time and staff, try the more expensive route:
- create your own std or profile
- create your own committees
- work with your own partner orgs
Usually this means ** importing a snapshot of other vocabularies (which will now evolve separately), then ** adding missing terms to your liking and ** publishing a document listing your terms and definitions
** and this takes us to the present situation…
// Whew. As we've seen from all the work being done with historical terms in existing archives, this complicated snapshot approach hides an even larger legacy problem, which is that traditional metadata standards don't have unambiguous ways of referencing terms, either in the present or over time as terms evolve. **
- Even the experts sitting on the stds committees are frustrated with the traditional approach.
That's a lot of crummy news, and Jane and I know it well, having both served as experts in the development of Dublin Core, PREMIS, and other ontologies.
Finally we had a chance to do something different, and we did.
Instead of an ontology, we built a dictionary.
- One thing remains the same: ontologies can be created as before, but we encourage terms to reference definitions from a dictionary. This decouples definitions from all the rules about mandatory vs optional vs conditional, and allows disciplines to more easily share terms.
// Looking back at key points just made, let's summarize them:
- natural language strings with persistent concept identifiers
- avoid largely closed discussion (exception: public comment periods) (except: mention Ted Haberman's thing)
- support testing and rapid prototyping
- support unambiguous referencing of terms from elsewhere, where
-- some terms may change
-- other terms may not change
- add missing terms
- publish your own terms
- deal with historical terminology
An alternative vision
One dictionary, one namespace
All research domains, any part of “metadata speech”
Names, values, units, relationships, ...
Instead of the typical practice of banishing all words that don’t fit in your ontology, leave them in.
So we created this metadictionary called yamz (yet another metadata zoo).
We needed modifications, from proper paging through search, to logging in with your own ORCID id, to term import and export.
Depending on your risk tolerance, you may cite a balance of stable (canonical) terms and evolving (vernacular) terms, such as those controlled by you or your community. A term is a combination of a label and a definition.
Learn from Wikipedia, Internet-Drafts/RFCs, StackOverflow, and the American Heritage Dictionary.
Variation: find a word you love, a definition you’re fine with, but you want to add a new, non-competing definition, eg, foo.1, foo.2, foo.3
Very simple (maybe too simple).
Example of a term and definition
[] you can add a comment
[] at different times
[] supporting a conversation
[] and you can tag things if you want to restrict your view to just, say, your working group’s tags, and not see the rest of the dictionary
Note that you get automatic emails for any terms that you’re “watching”, to notify you when someone comments on your terms. By default, you’re watching any terms you own.
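The watching behavior described above (owners watch their own terms by default; every watcher except the commenter gets notified) can be sketched as follows. All names here are hypothetical; this is a model of the described behavior, not the YAMZ codebase.

```python
from collections import defaultdict

class WatchList:
    """Sketch of YAMZ-style term watching and comment notification."""

    def __init__(self):
        self.watchers = defaultdict(set)   # term id -> set of watching users
        self.outbox = []                   # (recipient, message) notifications

    def create_term(self, term_id, owner):
        # by default, you're watching any terms you own
        self.watchers[term_id].add(owner)

    def watch(self, term_id, user):
        self.watchers[term_id].add(user)

    def comment(self, term_id, author, text):
        # notify every watcher except the person who just commented
        for user in self.watchers[term_id]:
            if user != author:
                self.outbox.append((user, f"New comment on {term_id}: {text}"))

wl = WatchList()
wl.create_term("h1193", "jak")                  # owner auto-watches
wl.comment("h1193", "jane", "Cite the Tagalog source?")
# wl.outbox now holds one notification addressed to "jak"
```

The design choice worth noting is the owner auto-watch: it guarantees the conversation on a term always reaches at least its author, which is what makes the comment threads useful for evolving vernacular terms.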