Library Linked Data
Lecture for LIS 551, "Organization of Information."

Usage Rights

CC Attribution License

    Library Linked Data: Presentation Transcript

    • Library linked data, LIS 551, Dorothea Salo
    • DPLA
        • What is it?
        • Where are its materials coming from?
        • Where is its metadata coming from?
        • What does that tell us about the metadata?
        • How do you think they’ll collect the metadata?
        • What will they need to do with the metadata, once collected?
        • What problems will they run into, do you think?
    • Some eternal verities
        • What’s in our catalogs isn’t all the metadata (broad sense) we have.
        • BLASPHEMY: a lot of that catalog metadata probably isn’t even the most important metadata academic libraries have! Why might that be?
        • Possibly not the most prolific source of metadata either. This will be truer as time passes. Why?
        • What about public libraries? Archives?
        • The rest of our metadata exists in many forms and formats.
        • The major, often only, form of interaction with our metadata is computer-mediated.
        • Other people have metadata too!
    • Practical implications
        • We need to design standards and practices around what computers do well, and what they need in order to do what they do.
        • We need to design for being PART of the data universe, not all of it.
        • “Open world assumption”: no one body has all the data! Or all the answers!
        • And nobody can impose their view of the world on everybody else. (Fortunately, nobody necessarily has to.)
        • Designing for consistency, flexibility, and extensibility without sacrificing comprehensibility
        • (This is a tall order; we’re not there yet. Is anyone?)
    • Things computers like
        • Unique identifiers
        • for anything you plan to discuss or refer to
        • that NEVER CHANGE OR DISAPPEAR. (Sorry, name-authority strings.)
        • How do we do this given the open-world assumption?
        • Consistent, predictable, human-language-independent data
        • Free text (including punctuation) makes computers sad. They aren’t human. They don’t understand it. They can be cued to PRODUCE it, but only based on rules they’re given about the underlying data.
        • Computers produce typography and layout, but don’t understand those, either.
        • Controlled vocabularies
        • (If they’re well-provisioned with identifiers; see above.)
    • We have [photo: silos] and we both love and hate them.
        • Photo: Doc Searls, “silos,” http://www.flickr.com/photos/docsearls/5500714140/ CC-BY
    • So how can we de-silo-ize library data?
    • Possibility 1: One standard to rule them all
        • Issues with this?
        • Technical issues
        • Quality issues
        • Language issues
        • Sociological issues
        • Who’s trying this? On what level?
    • Possibility 2: Metasearch
        • Issues with this?
        • Technical issues
        • Quality issues
        • Sociological issues
        • Who’s trying this? On what level?
        • Diagram: Angela Pratesi and Kalsang (by permission)
    • Possibility 2: Metasearch
        • Issues with this?
        • Technical issues
        • Quality issues
        • Language issues
        • Sociological issues
        • Who’s trying this? On what level?
    • Possibility 3: Big metadata bucket
        • Issues with this?
        • Technical issues
        • Quality issues
        • Sociological issues
        • Who’s trying this? On what level?
        • Diagram: Angela Pratesi and Kalsang (by permission)
    • Possibility 3: Big metadata bucket
        • Issues with this?
        • Technical issues
        • Quality issues
        • Language issues
        • Sociological issues
        • Who’s trying this? On what level?
    • How do you make a big metadata bucket?
        • Given...
        • Different file formats (XML, relational-database, Excel, plain-text, etc.)
        • Different structures with different granularity
        • Different standards... or no standard at all
        • Different controlled vocabularies... or none
        • One option: the Google route
        • But what do we lose there?
    • Crosswalking: the n×n problem
        • As you build your bucket, you find that people are using n metadata standards.
        • You decide you want to be able to translate any of them into any of the others.
        • Guess what? You need to write n×n − n (nearly n²) crosswalks.
        • This gets impossibly unwieldy very quickly. How many metadata standards do you know about, just from this class?
        • And how compatible will the standards be, anyway?
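
The n×n − n count on this slide can be checked with a few lines of Python. This is a minimal sketch: the list of standards is an illustrative pick (only Dublin Core and MODS appear elsewhere in this deck), and the function names are made up for the example.

```python
from itertools import permutations


def crosswalk_count(n: int) -> int:
    """Number of one-way crosswalks needed so any of n standards maps to any other."""
    return n * n - n


def crosswalk_pairs(standards):
    """Every ordered (source, target) pair; each pair is one crosswalk to write."""
    return list(permutations(standards, 2))


# Illustrative list only; substitute the standards you actually hold.
standards = ["MARC", "Dublin Core", "MODS", "EAD", "VRA Core"]
pairs = crosswalk_pairs(standards)

print(len(pairs), crosswalk_count(len(standards)))  # prints: 20 20
```

Five standards already mean twenty crosswalks, and MARC to MODS is a different piece of work from MODS to MARC, which is why both directions are counted.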
    • Okay, okay, master standard, then!
        • Crosswalk everything you take in to one standard. Then you only need to write n crosswalks!
        • Issues with this?
        • Technical issues
        • Quality issues
        • Language issues
        • Sociological issues
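
A master-standard crosswalk can be sketched as one mapping table per incoming standard. Everything named here is hypothetical: `standard_a`, `standard_b`, and all field names are invented for illustration, and real crosswalks are far messier than a field rename.

```python
# One mapping per source standard: n standards, n crosswalks.
# All standard and field names below are hypothetical.
TO_MASTER = {
    "standard_a": {"title": "title", "author": "creator"},
    "standard_b": {"Title": "title", "MainEntry": "creator", "Note": "description"},
}


def crosswalk(record: dict, source: str) -> dict:
    """Translate one record from a source standard into the master standard."""
    mapping = TO_MASTER[source]
    master = {}
    for field, value in record.items():
        target = mapping.get(field)
        if target is None:
            continue  # no equivalent in the master standard, so the value is dropped
        master[target] = value
    return master


incoming = {"Title": "Innkeeper at the Roach Motel", "MainEntry": "Salo, Dorothea", "Shelf": "Z"}
print(crosswalk(incoming, "standard_b"))
```

The dropped `Shelf` field is a small instance of the quality issues the slide asks about: whatever the master standard cannot express is lost on the way in.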
    • Is there a better way?... Maybe?
    • Five stars of linked data (the first three, at least), from Sir Tim Berners-Lee
    • Review: URLs as identifiers
        • Where have we seen this already?
        • Why URLs?
        • What library-type stuff has already been identified with URLs?
        • What would need to be, do you think?
    • So, seriously...
        • Every term in every controlled vocabulary, every element in every metadata standard, every “document” we might ever talk about (in all its FRBRish permutations) needs its own URL?
        • SERIOUSLY?
        • ... basically, yep.
        • Not every time. (Dates are dates. Human names are strings.)
        • It gets worse, though: XML-based languages use element nesting to carry meaning, and relational databases use table membership and data typing. How do you translate THOSE to URLs?
    • Example 1: Authority URIs
    • Example 2: Dublin Core concepts
    • Use URIs in MODS!
    • The fundamental strategy
        • Break down everything we can say about the world into the smallest units of meaning we can manage.
        • That’s smaller than you’d think, as we’ll see!
        • Build up search indexes, user displays, and machine interactions from there.
        • I’m being vague about “machine interactions.” Don’t take that to mean they aren’t important! They’re just a bit more than I can explain here and now.
        • Try not to reinvent wheels.
        • But if you must, make sure to link new and old.
    • Smallest units of meaning: are these them?
    • Okay, so we have a bunch of URIs. What do we actually DO with them? We plug them into RDF.
    • ... vocabulary note
        • “Semantic Web”: Tim Berners-Lee disappearing into his own navel.
        • Term is a bit out-of-favor these days.
        • “Linked data”: a real-world effort to make large datastores more interoperable
        • RDF: invented by the SemWebbers, now a cornerstone for linked data
        • Does this mean that all data will be stored as RDF? NO, IT DOES NOT (and you have my permission to slap anybody who says it will).
        • Totally possible to provide an RDF view onto non-RDF data, IF AND ONLY IF the data structures and meanings are thought through in an RDFfy way.
    • What to do with URIs
        • RDF’s answer: “We say things about stuff.”
        • At base, RDF really is that simple!
        • Base unit of RDF: “triple”
        • Subject, property, value/object. Much like subject-verb-object in English sentence.
        • Example: “Dorothea Salo is the author of ‘Innkeeper at the Roach Motel.’”
        • Diagram: Dorothea Salo → isAuthorOf → “Innkeeper at the Roach Motel”
        • ... wait. Where’d all the URLs go?
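
The subject, property, value shape of a triple can be sketched as a three-field tuple. This is plain Python rather than an RDF library, and the names `Triple`, `prop`, and `as_sentence` are made up for the example.

```python
from typing import NamedTuple


class Triple(NamedTuple):
    """One statement: we say something (prop, value) about some stuff (subject)."""
    subject: str
    prop: str
    value: str


def as_sentence(t: Triple) -> str:
    """Read the triple back the way the slide does, subject-verb-object."""
    return f"{t.subject} {t.prop} {t.value}"


# Still plain strings at this point, which is exactly the problem the slide raises.
statement = Triple("Dorothea Salo", "isAuthorOf", "Innkeeper at the Roach Motel")
print(as_sentence(statement))
```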
    • URL-izing a triple
        • Dorothea Salo → http://viaf.org/viaf/21599115/
        • “Innkeeper at the Roach Motel” → http://digital.library.wisc.edu/1793/22088
        • isAuthorOf → vocabularies! with URIs!
    • URL-izing a triple
        • isAuthorOf → dcterms:creator
        • http://digital.library.wisc.edu/1793/22088
        • http://viaf.org/viaf/21599115/
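
Written out as a line of N-Triples, the URL-ized statement might look like the sketch below. The two URLs and dcterms:creator come from the slide; the direction is my reading, based on the Dublin Core definition of creator as a property of the resource. That reading flips the sentence: instead of "person isAuthorOf work", it says "work has creator person". The helper name `ntriple` is made up.

```python
# Namespace for DCMI Metadata Terms; "dcterms:creator" expands to DCTERMS + "creator".
DCTERMS = "http://purl.org/dc/terms/"

WORK = "http://digital.library.wisc.edu/1793/22088"  # from the slide
PERSON = "http://viaf.org/viaf/21599115/"            # from the slide


def ntriple(subject: str, predicate: str, obj: str) -> str:
    """Serialize three URIs as one N-Triples line (URI objects only, no literals)."""
    return f"<{subject}> <{predicate}> <{obj}> ."


line = ntriple(WORK, DCTERMS + "creator", PERSON)
print(line)
```

Every part of the statement is now an identifier a computer can follow, with no free-text name or title strings left to interpret.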
    • Building up from triples
        • Diagram: Stephen J. Miller, “Teaching RDA after the National Implementation Decisions”
    • ... which can get tangled
        • Diagram: Stephen J. Miller, “Teaching RDA after the National Implementation Decisions”
    • But... but...
        • What if the same thing has two URIs?
        • Foreseen problem! There are ways for linked data to express URI equivalences... though there are huge arguments about when two URIs are really-truly equivalent.
        • My sense is that this decision is contextual. (AKA: “will Amazon.com use FRBR?”) What’s equivalent for your purposes may not be for mine. And that’s okay!
        • Where do we get URIs from?
        • This will be part of the new cataloging infrastructure a-borning, but the answer works out to “a lot of the same places we already get authority information and catalog records from,” e.g. VIAF.
        • But we’re no longer LIMITED to just those! Key point. Think about ORCID!
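
One way URI equivalence is commonly expressed is an owl:sameAs triple. The slide does not name a mechanism, so treat this as one possible illustration. The sketch below groups URIs that have been declared equivalent; the `example.org` URIs are invented, and choosing the alphabetically first URI as the canonical one is an arbitrary choice.

```python
OWL_SAME_AS = "http://www.w3.org/2002/07/owl#sameAs"


def build_canonicalizer(triples):
    """Return a function mapping any URI to one representative of its sameAs group."""
    parent = {}

    def find(uri):
        parent.setdefault(uri, uri)
        while parent[uri] != uri:
            parent[uri] = parent[parent[uri]]  # shorten the chain as we walk it
            uri = parent[uri]
        return uri

    for subject, predicate, obj in triples:
        if predicate == OWL_SAME_AS:
            a, b = find(subject), find(obj)
            if a != b:
                parent[max(a, b)] = min(a, b)  # arbitrary but repeatable choice
    return find


VIAF = "http://viaf.org/viaf/21599115/"
LOCAL = "http://example.org/people/dorothea-salo"  # hypothetical second URI
OTHER = "http://example.org/people/someone-else"   # hypothetical, unrelated

canonical = build_canonicalizer([(LOCAL, OWL_SAME_AS, VIAF)])
print(canonical(VIAF) == canonical(LOCAL), canonical(VIAF) == canonical(OTHER))
```

Because the equivalences are just more triples, the contextual decision the slide describes comes down to which sameAs statements you choose to feed in.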
    • But... but...
        • Where’s the record? And standards for the record?
        • The record is what we make it! What’s useful to us, we use. What isn’t, we ignore. That’s how the open world assumption works.
        • If we need to impose rules on the data we’ll be putting out there (and we probably do!), there are ways to do that.
        • We just can’t expect to impose those ways on anybody else. (Though we can put our rules out there for others to follow, and we probably should!)
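
"The record is what we make it" can be sketched as a filter over a pile of triples: keep the properties you care about for one subject and ignore the rest. The `example.org` property and its value are invented, and `build_record` is a made-up helper, not part of any RDF library.

```python
from collections import defaultdict

DCTERMS = "http://purl.org/dc/terms/"
WORK = "http://digital.library.wisc.edu/1793/22088"

# A mixed bag of statements, as might arrive from several sources.
triples = [
    (WORK, DCTERMS + "creator", "http://viaf.org/viaf/21599115/"),
    (WORK, DCTERMS + "title", "Innkeeper at the Roach Motel"),
    (WORK, "http://example.org/vocab/shelfColor", "blue"),  # hypothetical property
]


def build_record(triples, subject, wanted):
    """Assemble a 'record' for one subject from only the properties we choose to use."""
    record = defaultdict(list)
    for s, p, o in triples:
        if s == subject and p in wanted:
            record[p].append(o)
    return dict(record)


our_properties = {DCTERMS + "creator", DCTERMS + "title"}
record = build_record(triples, WORK, our_properties)
print(record)
```

Someone else can pass a different `wanted` set over the same triples and get a different record, without either party having to agree on a record standard.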
    • Trust: an unsolved problem
        • Review: what happened with <meta> tags on the web?
        • Right. What’s to stop the same thing happening in a linked-data environment?
        • What’s to stop me from writing a triple that says I’m Tchaikovsky?
        • For our purposes? We’ll pick and choose the vocabularies and domains we trust, I expect, just as we already do.
    • Fine. Whatever. So is anybody actually DOING this linked-data stuff?
    • Yes. And we’ll talk about that next week!
    • Thanks!
        • Copyright 2013 by Dorothea Salo.
        • This lecture and slide deck are licensed under a Creative Commons Attribution 3.0 United States License.
        • Please respect ownership and licensing of included materials. Thanks!