DPLA • What is it? • Where are its materials coming from? • Where is its metadata coming from? • What does that tell us about the metadata? • How do you think they’ll collect the metadata? • What will they need to do with the metadata, once collected? • What problems will they run into, do you think?
Some eternal verities • What’s in our catalogs isn’t all the metadata (broad sense) we have. • BLASPHEMY: a lot of that catalog metadata probably isn’t even the most important metadata academic libraries have! Why might that be? • Possibly not the most prolific source of metadata either. This will be truer as time passes. Why? • What about public libraries? Archives? • The rest of our metadata exists in many forms and formats. • The major, often only, form of interaction with our metadata is computer-mediated. • Other people have metadata too!
Practical implications • We need to design standards and practices around what computers do well, and what they need in order to do what they do. • We need to design for being PART of the data universe, not all of it. • “Open world assumption”: no one body has all the data, or all the answers! • And nobody can impose their view of the world on everybody else. (Fortunately, nobody necessarily has to.) • Designing for consistency, flexibility and extensibility without sacrificing comprehensibility • (This is a tall order; we’re not there yet. Is anyone?)
Things computers like • Unique identifiers • for anything you plan to discuss or refer to • that NEVER CHANGE OR DISAPPEAR. (Sorry, name-authority strings.) • How do we do this given the open-world assumption? • Consistent, predictable, human-language-independent data • Free text (including punctuation) makes computers sad. They aren’t human. They don’t understand it. They can be cued to PRODUCE it, but only based on rules they’re given about the underlying data. • Computers produce typography and layout, but don’t understand those, either. • Controlled vocabularies • (If they’re well-provisioned with identifiers; see above.)
We have silos, and we both love and hate them. Photo: Doc Searls, “silos,” http://www.flickr.com/photos/docsearls/5500714140/ CC-BY
Possibility 1: One standard to rule them all • Issues with this? • Technical issues • Quality issues • Language issues • Sociological issues • Who’s trying this? On what level?
Possibility 2: Metasearch • Issues with this? • Technical issues • Quality issues • Sociological issues • Who’s trying this? On what level? Diagram: Angela Pratesi and Kalsang (by permission)
Possibility 2: Metasearch • Issues with this? • Technical issues • Quality issues • Language issues • Sociological issues • Who’s trying this? On what level?
Possibility 3: Big metadata bucket • Issues with this? • Technical issues • Quality issues • Sociological issues • Who’s trying this? On what level? Diagram: Angela Pratesi and Kalsang (by permission)
Possibility 3: Big metadata bucket • Issues with this? • Technical issues • Quality issues • Language issues • Sociological issues • Who’s trying this? On what level?
How do you make a big metadata bucket? • Given... • Different file formats (XML, relational-database, Excel, plain-text, etc.) • Different structures with different granularity • Different standards... or no standard at all • Different controlled vocabularies... or none • One option: the Google route • But what do we lose there?
Crosswalking: the n×n problem • As you build your bucket, you find that people are using n metadata standards. • You decide you want to be able to translate any of them into any of the others. • Guess what? You need to write n×n − n (nearly n²) crosswalks. • This gets impossibly unwieldy very quickly. How many metadata standards do you know about, just from this class? • And how compatible will the standards be, anyway?
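To make the arithmetic concrete, here is a minimal sketch that enumerates every crosswalk you would have to write. The five standard names are only illustrative picks (the slide does not list any), and `pairwise_crosswalks` is a hypothetical helper.

```python
from itertools import permutations

def pairwise_crosswalks(standards):
    """Return every ordered (source, target) pair of distinct standards.

    Each pair is one crosswalk somebody has to write and maintain.
    """
    return list(permutations(standards, 2))

# Illustrative list only; swap in whatever standards you actually hold.
standards = ["MARC", "Dublin Core", "MODS", "EAD", "VRA Core"]
n = len(standards)
crosswalks = pairwise_crosswalks(standards)

# n*n pairs minus the n pointless "translate X into X" cases.
assert len(crosswalks) == n * n - n
print(n, "standards need", len(crosswalks), "crosswalks")  # 5 standards need 20 crosswalks
```

Add a sixth standard and the count jumps from 20 to 30; the growth, not the starting number, is what makes this unwieldy.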
Okay, okay, master standard, then! • Crosswalk everything you take in to one standard. Then you only need to write n crosswalks! • Issues with this? • Technical issues • Quality issues • Language issues • Sociological issues
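A rough sketch of the master-standard approach, assuming flat records and a made-up two-field master schema. The field maps, source names, and sample values below are hypothetical; the point is that there is one map per incoming standard, and that anything the master schema has no slot for quietly disappears.

```python
# One field map per incoming standard: source field -> master field.
# Both maps and the master field names are hypothetical.
TO_MASTER = {
    "marc_like": {"245a": "title", "100a": "creator"},
    "dc_like": {"dc:title": "title", "dc:creator": "creator"},
}

def crosswalk(record, source):
    """Translate a flat record into the master schema.

    Fields with no entry in the source's map are dropped.
    """
    field_map = TO_MASTER[source]
    return {field_map[k]: v for k, v in record.items() if k in field_map}

# Sample record with one field ("300a") the master schema cannot hold.
marc_record = {
    "245a": "Innkeeper at the Roach Motel",
    "100a": "Salo, Dorothea",
    "300a": "extent statement",
}
master = crosswalk(marc_record, "marc_like")
print(master)
```

The dropped `300a` field is one small instance of the quality issues the slide asks about.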
Five stars of linked data (the first three, at least) Sir Tim Berners-Lee:
Review: URLs as identifiers • Where have we seen this already? • Why URLs? • What library-type stuff has already been identified with URLs? • What would need to be, do you think?
So, seriously... • Every term in every controlled vocabulary, every element in every metadata standard, every “document” we might ever talk about (in all its FRBRish permutations) needs its own URL? • SERIOUSLY? • ... basically, yep. • Not every time. (Dates are dates. Human names are strings.) • It gets worse, though: XML-based languages use element nesting to carry meaning, and relational databases use table membership and data typing. How do you translate THOSE to URLs?
The fundamental strategy • Break down everything we can say about the world into the smallest units of meaning we can manage. • That’s smaller than you’d think, as we’ll see! • Build up search indexes, user displays, and machine interactions from there. • I’m being vague about “machine interactions.” Don’t take that to mean they aren’t important! They’re just a bit more than I can explain here and now. • Try not to reinvent wheels. • But if you must, make sure to link new and old.
Okay, so we have a bunch of URIs. What do we actually DO with them? We plug them into RDF.
... vocabulary note • “Semantic Web”: Tim Berners-Lee disappearing into his own navel. • Term is a bit out-of-favor these days. • “Linked data”: a real-world effort to make large datastores more interoperable • RDF: invented by the SemWebbers, now a cornerstone for linked data • Does this mean that all data will be stored as RDF? NO, IT DOES NOT (and you have my permission to slap anybody who says it will). • Totally possible to provide an RDF view onto non-RDF data, IF AND ONLY IF the data structures and meanings are thought through in an RDFfy way.
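What might “an RDF view onto non-RDF data” look like? One possible sketch, assuming a relational row arrives as a plain dict: each column value becomes a (subject, property, value) statement. The base URI and the column-to-property map are hypothetical; only the Dublin Core terms namespace is real.

```python
DCTERMS = "http://purl.org/dc/terms/"

# Hypothetical: where this database's items would live on the web.
ITEM_BASE = "http://example.org/item/"

# Hypothetical decision about what each column MEANS. This is the
# "thought through in an RDFfy way" part; no code can do it for you.
COLUMN_TO_PROPERTY = {
    "title": DCTERMS + "title",
    "issued": DCTERMS + "issued",
}

def row_to_triples(row):
    """Turn one row (a dict with an 'id' key) into a list of statements."""
    subject = ITEM_BASE + str(row["id"])
    return [
        (subject, prop, row[col])
        for col, prop in COLUMN_TO_PROPERTY.items()
        if row.get(col) is not None  # empty cells say nothing
    ]

row = {"id": 7, "title": "Innkeeper at the Roach Motel", "issued": None}
statements = row_to_triples(row)
print(statements)
```

The data stays in its table; the statements are generated on request.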
What to do with URIs • RDF’s answer: “We say things about stuff.” • At base, RDF really is that simple! • Base unit of RDF: “triple” • Subject, property, value/object. Much like subject-verb-object in an English sentence. • Example: “Dorothea Salo is the author of ‘Innkeeper at the Roach Motel.’” • Diagram: Dorothea Salo → isAuthorOf → “Innkeeper at the Roach Motel” • ... wait. Where’d all the URLs go?
URL-izing a triple • Dorothea Salo → http://viaf.org/viaf/21599115/ • “Innkeeper at the Roach Motel” → http://digital.library.wisc.edu/1793/22088 • isAuthorOf → vocabularies! with URIs!
URL-izing a triple • isAuthorOf → dcterms:creator • http://digital.library.wisc.edu/1793/22088 • http://viaf.org/viaf/21599115/
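Written out as data, the fully URL-ized triple might look like the sketch below. It assumes, as the slides imply, that the digital.library.wisc.edu URL identifies the article and the VIAF URL identifies the person. The `Triple` type and `to_ntriples` helper are hypothetical names; the expansion of `dcterms:` to http://purl.org/dc/terms/ is the real Dublin Core namespace. Note that dcterms:creator points from the work to its creator, so subject and object swap places compared with “isAuthorOf.”

```python
from typing import NamedTuple

class Triple(NamedTuple):
    subject: str
    predicate: str
    obj: str

DCTERMS = "http://purl.org/dc/terms/"

triple = Triple(
    subject="http://digital.library.wisc.edu/1793/22088",  # the article (assumed)
    predicate=DCTERMS + "creator",                         # dcterms:creator
    obj="http://viaf.org/viaf/21599115/",                  # the person (assumed)
)

def to_ntriples(t):
    """Serialize a triple whose three parts are all URIs as one N-Triples line."""
    return f"<{t.subject}> <{t.predicate}> <{t.obj}> ."

print(to_ntriples(triple))
```

No strings left: every part of the statement is now something a computer can look up, compare, and link.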
Building up from triples • Diagram: Stephen J. Miller, “Teaching RDA after the National Implementation Decisions”
... which can get tangled • Diagram: Stephen J. Miller, “Teaching RDA after the National Implementation Decisions”
But... but... • What if the same thing has two URIs? • Foreseen problem! There are ways for linked data to express URI equivalences... though there are huge arguments about when two URIs are really-truly equivalent. • My sense is that this decision is contextual. (AKA: “will Amazon.com use FRBR?”) What’s equivalent for your purposes may not be for mine. And that’s okay! • Where do we get URIs from? • This will be part of the new cataloging infrastructure a-borning, but the answer works out to “a lot of the same places we already get authority information and catalog records from,” e.g. VIAF. • But we’re no longer LIMITED to just those! Key point. Think about ORCID!
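One common way to express “these two URIs name the same thing” is an owl:sameAs statement. The sketch below groups URIs that such statements connect. Which equivalence properties to honor is left as a parameter, since that decision is contextual. The example.org URIs are hypothetical, and `equivalence_classes` is an illustrative helper, not part of any library.

```python
OWL_SAME_AS = "http://www.w3.org/2002/07/owl#sameAs"

def equivalence_classes(triples, equivalence_predicates=(OWL_SAME_AS,)):
    """Group URIs linked by the equivalence properties YOU choose to trust."""
    parent = {}

    def find(uri):
        # Union-find lookup with path halving.
        parent.setdefault(uri, uri)
        while parent[uri] != uri:
            parent[uri] = parent[parent[uri]]
            uri = parent[uri]
        return uri

    for s, p, o in triples:
        if p in equivalence_predicates:
            parent[find(s)] = find(o)

    groups = {}
    for uri in list(parent):
        groups.setdefault(find(uri), set()).add(uri)
    return sorted(sorted(group) for group in groups.values())

# Hypothetical URIs from three different sources for one person.
triples = [
    ("http://example.org/a/person/1", OWL_SAME_AS, "http://example.org/b/agent/9"),
    ("http://example.org/b/agent/9", OWL_SAME_AS, "http://example.org/c/id/42"),
    ("http://example.org/a/person/1", "http://example.org/vocab/name", "Some Name"),
]
print(equivalence_classes(triples))
```

Pass an empty tuple for `equivalence_predicates` and nothing is merged: same data, different purposes, different answer.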
But... but... • Where’s the record? And standards for the record? • The record is what we make it! What’s useful to us, we use. What isn’t, we ignore. That’s how the open world assumption works. • If we need to impose rules on the data we’ll be putting out there (and we probably do!), there are ways to do that. • We just can’t expect to impose those ways on anybody else. (Though we can put our rules out there for others to follow, and we probably should!)
Trust: an unsolved problem • Review: what happened with <meta> tags on the web? • Right. What’s to stop the same thing happening in a linked-data environment? • What’s to stop me from writing a triple that says I’m Tchaikovsky? • For our purposes? We’ll pick and choose the vocabularies and domains we trust, I expect, just as we already do.
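“Pick and choose” could be as blunt as an allow-list. This is only a sketch of the idea, not a solution to the trust problem. It assumes each statement arrives with the URL it was fetched from, and the trusted lists, the statement shape, and `is_trusted` are all hypothetical choices.

```python
from urllib.parse import urlparse

# Hypothetical local policy: vocabularies and source domains we accept.
TRUSTED_VOCABULARIES = ("http://purl.org/dc/terms/",)
TRUSTED_SOURCES = {"viaf.org", "digital.library.wisc.edu"}

def is_trusted(statement):
    """Accept a (subject, property, value, source_url) statement only if it
    uses a vocabulary we trust AND was published by a domain we trust."""
    subject, prop, value, source_url = statement
    host = urlparse(source_url).hostname
    return host in TRUSTED_SOURCES and prop.startswith(TRUSTED_VOCABULARIES)

good = (
    "http://digital.library.wisc.edu/1793/22088",
    "http://purl.org/dc/terms/creator",
    "http://viaf.org/viaf/21599115/",
    "http://digital.library.wisc.edu/1793/22088",
)
# Anybody can publish this; nobody has to believe it.
dubious = (
    "http://example.org/me",
    "http://example.org/vocab/isReally",
    "Tchaikovsky",
    "http://example.org/me.rdf",
)
print(is_trusted(good), is_trusted(dubious))
```

An allow-list does not tell you whether a trusted source is RIGHT, only whose statements you are willing to read.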
Fine. Whatever. So is anybody actually DOING this linked-data stuff?
Thanks! • Copyright 2013 by Dorothea Salo. • This lecture and slide deck are licensed under a Creative Commons Attribution 3.0 United States License. • Please respect ownership and licensing of included materials. Thanks!