RDF, RDA, and other TLAs
Dorothea Salo
Monday, January 2, 2012
Comments
  • Thanks, everybody! I really appreciate the thoughtful engagement with this; I'm still learning myself.

    I'm also wondering how to take this show on the road. If anybody has ideas, I'd love to hear.
  • Nice one!
  • Great stuff! Hadn't seen this before. Thanks for the ref to http://managemetadata.org/blog/2011/09/08/fine-wine-and-old-fish/ ... I'd missed that as well!
  • Presentation slides have been replaced, partly to expand them and partly to correct a MAJOR misattribution. My apologies to Diane Hillmann for mistakenly attributing her Metadata Matters blog post 'Fine Wine and Old Fish' http://managemetadata.org/blog/2011/09/08/fine-wine-and-old-fish/ to Karen Coyle!
  • The same question could be asked of MARC, really. Catalogers don't interact directly with MARC, and I seriously doubt they'll interact directly with linked data. I expect that the cataloger of the future will do a lot less typing, a good deal more verifying and looking-up, and a LOT more relationship-drawing. Time will tell whether I'm correct...
RDF, RDA, and other TLAs

  1. RDF, RDA, and other TLAs
     Dorothea Salo
     Monday, January 2, 2012
  2. Captatio benevolentiae
     • I am not a cataloger.
     • Not even working as a librarian these days!
     • I am not a developer, either.
     • (I am doing a bit of standards work. Not in this area, though.)
     • What I am? An educator and sometime tech translator. I hope that’s enough.
  3. We built MARC when [photo: catalogue cards] stood between us and patron.
     Photo: Deborah Fitchett, “Catalogue cards” http://www.flickr.com/photos/deborahfitchett/2970373235/ CC-BY
  4. We built MARC when the world was clearly bounded.
     Photo: NASA Goddard Photo and Video, “NASA Blue Marble” http://www.flickr.com/photos/gsfc/4392965590/ CC-BY
  5. These days, [photo: “My Desk”] stands between us and patron.
     Photo: Declan Jewell, “My Desk” http://www.flickr.com/photos/declanjewell/2743737312 CC-BY
  6. These days, the world’s looking a bit fractal!
     Photo: NASA Goddard Photo and Video, “Still centered over the Atlantic” http://www.flickr.com/photos/gsfc/4409800816/ CC-BY
  7. Review:
     • Where are the less-than-perfect fits between library practice and the current information landscape?
     • What does this mean for library systems of information organization?
  8. Problems with MARC/AACR2/ISBD (if you’re a networked computer)
     • Globally-unique identifiers for what’s in our bibliographic universe?
       • And what IS in our bibliographic universe, anyway?
     • Interoperability? Who speaks MARC outside libraries?
       • This is a problem on both ends of the pipeline, these days!
     • FREE TEXT (for anything not transcribed) MUST DIE.
       • It is the LEAST consistent, internationalizable, interoperable way to record information on a computer.
     • Put another way: we haven’t controlled all the cataloging practices we usefully could.
     http://robotlibrarian.billdueber.com/isbn-parenthetical-notes-bad-marc-data-1/
  9. Practical implications
     • Designing standards and practices around what computers do well, and what they need in order to do what they do.
     • Designing for being PART of the data universe, not all of it.
       • “Open world assumption”: no one body has all the data! or all the answers!
       • And nobody can impose their view of the world on everybody else. (Fortunately, nobody necessarily has to.)
     • Designing for consistency, flexibility and extensibility without sacrificing comprehensibility
       • (this is a tall order; we’re not there yet. is anyone?)
  10. ... vocabulary note
      • “Semantic Web”: Tim Berners-Lee disappearing into his own navel.
        • Term is a bit out-of-favor these days.
      • “Linked data”: a real-world effort to make large datastores more interoperable
      • RDF: invented by the SemWebbers, now a cornerstone for linked data
        • Does this mean that all data will be stored as RDF? NO, IT DOES NOT (and you have my permission to slap anybody who says it will).
        • Totally possible to provide an RDF view onto non-RDF data, IF AND ONLY IF the data structure and meaning are thought through in an RDFfy way.
  11. Pragmatically: the five stars of linked data (Tim Berners-Lee)
  12. Linked Data principles
      http://www.w3.org/DesignIssues/LinkedData.html
      • use URIs as names for things
      • use HTTP URIs so that people can look up those things
        • (this is one of Linked Data’s concessions to pragmatism, compared to the original SemWebbers)
      • when someone looks up a URI, provide useful information, using the standards
      • include links to other URIs so that they can discover more things
  13. Things computers like
      • Unique identifiers
        • for anything you plan to discuss or refer to
        • that NEVER CHANGE OR DISAPPEAR. (Sorry, name-authority strings.)
        • How do we do this given the open-world assumption?
      • Consistent, predictable, human-language-independent data
        • Free text (including punctuation) makes computers sad. They aren’t human. They don’t understand it. They can be cued to PRODUCE it, but only based on rules they’re given about the underlying data.
        • Computers produce typography and layout, but don’t understand those, either.
      • Controlled vocabularies
        • (If they’re well-provisioned with identifiers; see above.)
  14. Globally unique identifiers
      • Astonishingly, we already have a relatively easy way to do this. The Web is an infinitely extensible information space: all the globally-unique identifiers we can dream up!
      • Term of art: “URI.”
        • In practice, 99 times out of 100 this will be a plain old ordinary URL.
        • The 100th time, it’ll mostly look like a URL, just with a different prefix.
      • EVERYTHING in linked-data-land revolves around URIs. They’re plumbing.
        • And like plumbing, we usually don’t have to look at them. Just know that they’re there.
  15. URI wins
      • Internationalization
        • We can present http://viaf.org/viaf/99258155/ as “Tchaikovsky, Peter Ilich, 1840-1893.” A Russian library can present the same URI as “Чайковский, Петр Ильич, 1840-1893.”
        • Both libraries can exchange information about Tchaikovsky and his works (e.g. holdings) without language barriers, thanks to the URI intermediary.
      • Interoperability
        • Websites with Tchaikovsky information? Finding aids? Metadata for digitized images? All can use this URI to refer to Tchaikovsky. This makes it painless for computers to aggregate Tchaikovsky-related information, with minimal if any human intervention!
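The internationalization point on this slide can be sketched in a few lines of Python. This is only an illustration, not how VIAF or any library system stores labels: the `LABELS` table and the `display_name` helper are made up for the example, and the two name strings are the ones quoted on the slide.

```python
# The URI is the shared identifier; each language gets its own display label.
TCHAIKOVSKY = "http://viaf.org/viaf/99258155/"

# Hypothetical label table, keyed by URI and then by language code.
LABELS = {
    TCHAIKOVSKY: {
        "en": "Tchaikovsky, Peter Ilich, 1840-1893",
        "ru": "Чайковский, Петр Ильич, 1840-1893",
    }
}


def display_name(uri, language, fallback="en"):
    """Pick a label for the reader's language, falling back to the bare URI."""
    names = LABELS.get(uri, {})
    return names.get(language) or names.get(fallback) or uri


print(display_name(TCHAIKOVSKY, "ru"))
print(display_name(TCHAIKOVSKY, "en"))
```

Both libraries store and exchange the same URI; only the label shown to the patron differs.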
  16. What to do with URIs
      • RDF’s answer: “We say things about stuff.”
        • At base, RDF really is that simple!
      • Base unit of RDF: “triple”
        • Subject, property, value/object. Much like subject-verb-object in English sentence.
        • Example: “Dorothea Salo is the author of ‘Innkeeper at the Roach Motel.’”
  17. What to do with URIs
      • RDF’s answer: “We say things about stuff.”
        • At base, RDF really is that simple!
      • Base unit of RDF: “triple”
        • Subject, property, value/object. Much like subject-verb-object in English sentence.
        • Example: “Dorothea Salo is the author of ‘Innkeeper at the Roach Motel.’”
      Triple diagram: Dorothea Salo [isAuthorOf] “Innkeeper at the Roach Motel”
  18. What to do with URIs
      • RDF’s answer: “We say things about stuff.”
        • At base, RDF really is that simple!
      • Base unit of RDF: “triple”
        • Subject, property, value/object. Much like subject-verb-object in English sentence.
        • Example: “Dorothea Salo is the author of ‘Innkeeper at the Roach Motel.’”
      Triple diagram: Dorothea Salo [isAuthorOf] “Innkeeper at the Roach Motel”
      ... wait. Where’d all the URIs go?
  19. A pause: just URIs?
      • Not strictly, according to RDF.
        • “Literals,” that is, text strings, are also OK as objects. (Don’t tell catalogers this!) But they’re STRONGLY discouraged.
        • “Blank nodes” can also happen -- usually when a triple wants to use an entire RDF statement as object. In lieu of giving the entire statement its own URI, you get a “blank node” in the graph. Which is ugly, but so it goes.
  20. URI-izing a triple
      Triple diagram: Dorothea Salo [isAuthorOf] “Innkeeper at the Roach Motel”
  21. URI-izing a triple
      Triple diagram: http://viaf.org/viaf/21599115/ [isAuthorOf] “Innkeeper at the Roach Motel”
  22. URI-izing a triple
      Triple diagram: http://viaf.org/viaf/21599115/ [isAuthorOf] http://digital.library.wisc.edu/1793/22088
  23. URI-izing a triple
      Triple diagram: http://viaf.org/viaf/21599115/ [isAuthorOf] http://digital.library.wisc.edu/1793/22088
      vocabularies! with URIs!
  24. URI-izing a triple
      Triple diagram: http://viaf.org/viaf/21599115/ [isAuthorOf] http://digital.library.wisc.edu/1793/22088
  25. URI-izing a triple
      Triple diagram: http://viaf.org/viaf/21599115/ [dcterms:creator] http://digital.library.wisc.edu/1793/22088
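As a rough sketch, the triple built up over these slides can be held in plain Python. The two resource URIs come from the slides. `http://purl.org/dc/terms/creator` is the full URI behind the `dcterms:creator` shorthand. The `isAuthorOf` URI is hypothetical, since the slides never give it one, and the `Triple` type and `to_ntriples` helper are made up for the example. One thing to watch: `dcterms:creator` is stated about the work and points at the creator, so subject and object swap compared with `isAuthorOf`.

```python
from typing import NamedTuple


class Triple(NamedTuple):
    """One RDF-style statement: subject, predicate, object."""
    subject: str
    predicate: str
    object: str


# URIs taken from the slides.
SALO = "http://viaf.org/viaf/21599115/"
INNKEEPER = "http://digital.library.wisc.edu/1793/22088"

# Full URI behind the dcterms:creator shorthand.
DCTERMS_CREATOR = "http://purl.org/dc/terms/creator"

# Hypothetical URI: the slides never give isAuthorOf a real home.
IS_AUTHOR_OF = "http://example.org/vocab/isAuthorOf"

# "Dorothea Salo is the author of 'Innkeeper at the Roach Motel.'"
person_first = Triple(SALO, IS_AUTHOR_OF, INNKEEPER)

# dcterms:creator reads the other way round: the work has a creator.
work_first = Triple(INNKEEPER, DCTERMS_CREATOR, SALO)


def to_ntriples(triple: Triple) -> str:
    """Render a URI-only triple as one N-Triples line."""
    return f"<{triple.subject}> <{triple.predicate}> <{triple.object}> ."


print(to_ntriples(work_first))
```

N-Triples is about the plainest RDF syntax there is: one statement per line, each URI in angle brackets, a full stop at the end.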
  26. ... wait, Dublin Core has URIs? Yep.
  27. MODS, too. Hey, look, URIs! (this is new in MODS version 3.4)
  28. MODS, too. Hey, look, URIs! (this is new in MODS version 3.4)
  29. (you should be able to read these diagrams now)
      Diagram: Stephen J. Miller, “Teaching RDA after the National Implementation Decisions”
  30. (even these)
      Diagram: Stephen J. Miller, “Teaching RDA after the National Implementation Decisions”
  31. But... but...
      • What if the same thing has two URIs?
        • Foreseen problem! There are ways for linked data to express URI equivalences... though there are huge arguments about when two URIs are really-truly equivalent.
        • My sense is that this decision is contextual. (AKA: “will Amazon.com use FRBR?”) What’s equivalent for your purposes may not be for mine. And that’s okay!
      • Where do we get URIs from?
        • This will be part of the new cataloging infrastructure a-borning, but the answer works out to “a lot of the same places we already get authority information and catalog records from,” e.g. VIAF.
        • But we’re no longer LIMITED to just those! Key point. Think about ORCID!
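One common way to state that two URIs name the same thing is an `owl:sameAs` triple (`http://www.w3.org/2002/07/owl#sameAs`). What a consumer does with such statements is up to the consumer. The sketch below is one possible approach, assuming you have already chosen which equivalence claims you accept: it groups the URIs and picks one representative per group. The VIAF URI is from the slides; the other two URIs and the `build_aliases` helper are hypothetical.

```python
def build_aliases(equivalences):
    """Map every URI to one representative of its equivalence group."""
    parent = {}

    def find(uri):
        parent.setdefault(uri, uri)
        while parent[uri] != uri:
            parent[uri] = parent[parent[uri]]  # shorten the chain as we walk it
            uri = parent[uri]
        return uri

    for left, right in equivalences:
        root_left, root_right = find(left), find(right)
        if root_left != root_right:
            # Arbitrary but repeatable choice: the alphabetically first URI wins.
            parent[max(root_left, root_right)] = min(root_left, root_right)

    return {uri: find(uri) for uri in list(parent)}


VIAF = "http://viaf.org/viaf/99258155/"
LOCAL = "http://example.org/authority/tchaikovsky"  # hypothetical local authority URI
OTHER = "http://example.com/people/42"              # hypothetical third-party URI

# Only the equivalences *we* have decided to accept go in here.
aliases = build_aliases([(VIAF, LOCAL), (LOCAL, OTHER)])
print(aliases[VIAF] == aliases[OTHER])
```

Because the caller supplies the list, two institutions can feed in different equivalences and end up with different groupings, which fits the "this decision is contextual" point.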
  32. (no text on slide)
  33. (no text on slide)
  34. But... but...
      • Where’s the record? And standards for the record?
        • The record is what you make it! There’ll be metric tons of data about Tchaikovsky linking to (and thus reachable through) his URL. (Somebody’ll make a list of his pet dogs’ names. Guaranteed. People are funny about dogs.) What’s useful to you, you use. What isn’t, you ignore. That’s how the open world works.
        • If we need to impose rules on the data we’ll be putting out there (and we probably do!), there are ways to do that. We just can’t expect to impose those ways on anybody else. (Though we can put our rules out there for others to follow, and we probably should!)
  35. Trust: an unsolved problem
      • Review: what happened with <meta> tags on the web?
        • Right. What’s to stop the same thing happening in a linked-data environment?
        • What’s to stop me from saying I’m Tchaikovsky?
      • The SemWeb people handwaved this for a long time.
      • For our purposes? We’ll pick and choose the vocabularies and domains we trust, I expect, just as we already do.
  36. RDF in XML
      • RDF has its own namespace, but no schema (it’s an open-ended universe!).
        • Root element: <rdf:RDF>
      • Vocabulary in any other XML namespace can be shoehorned into RDF triples.
        • But don’t fool yourself: RDF triples and graphs and standard XML vocabulary hierarchies do NOT map cleanly or automatically to each other.
        • So MARC/AACR2 is FAR from the only metadata expression that’s looking at a retooling!
      • Typical triple expression in XML:
        • <rdf:Description rdf:about="{subject}"> <predicate rdf:resource="{object}" /> </rdf:Description>
      • XML is NOT the only syntax for RDF.
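To make the RDF-in-XML pattern concrete, here is a small sketch that reads triples out of a flat `rdf:Description` block using only Python's standard library. It is not an RDF/XML parser: it handles just this one simple shape, and the `simple_triples` helper and the sample document are made up for the example. The URIs and the title are the ones used earlier in the deck.

```python
import xml.etree.ElementTree as ET

RDF_NS = "http://www.w3.org/1999/02/22-rdf-syntax-ns#"

# Sample document written for this sketch.
SAMPLE = """\
<rdf:RDF xmlns:rdf="http://www.w3.org/1999/02/22-rdf-syntax-ns#"
         xmlns:dcterms="http://purl.org/dc/terms/">
  <rdf:Description rdf:about="http://digital.library.wisc.edu/1793/22088">
    <dcterms:creator rdf:resource="http://viaf.org/viaf/21599115/"/>
    <dcterms:title>Innkeeper at the Roach Motel</dcterms:title>
  </rdf:Description>
</rdf:RDF>
"""


def simple_triples(rdf_xml):
    """Yield (subject, predicate, object) for flat rdf:Description blocks only."""
    root = ET.fromstring(rdf_xml)
    for description in root.findall(f"{{{RDF_NS}}}Description"):
        subject = description.get(f"{{{RDF_NS}}}about")
        for prop in description:
            # ElementTree spells qualified names as "{namespace}local".
            namespace, _, local = prop.tag[1:].partition("}")
            predicate = namespace + local
            # A URI object sits in rdf:resource; a literal is the element text.
            obj = prop.get(f"{{{RDF_NS}}}resource") or (prop.text or "").strip()
            yield subject, predicate, obj


for triple in simple_triples(SAMPLE):
    print(triple)
```

The subject comes from `rdf:about`, each child element names a predicate, and the object is either a URI or a literal string.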
  37. Retooling tools: GRDDL, SKOS, and OWL
      • Gleaning Resource Descriptions from Dialects of Languages
        • W3C standard for providing a transformation of an existing XML vocabulary into an RDF expression.
        • Once there’s a GRDDL transform, users of the vocabulary need change (almost) nothing! Vocabulary instance + GRDDL transform = RDF!
      • Simple Knowledge Organization System
        • RDF data model (plus URIs, of course) for commonly-used controlled-vocabulary structures such as thesauri and subject-heading lists.
      • Web Ontology Language (yes, I know)
        • SEMWEB NERDS ONLY. Ontologies are serious business.
  38. So what’s this “RDA Vocabularies” work that Diane and Karen et al. are doing?
      Assigning URIs to stuff in RDA, so that systems expecting URI-linked data get it.
      Seriously. That’s what all the fuss is about.
  39. RDFizing RDA
      • What does RDA actually talk about?
        • FRBR model: Group 1, 2, and 3 entities
        • (though Group 1 is still kind of squidgy, really, and some application developers are questioning its usefulness)
        • DCMI model (because life can NEVER be simple)
        • Relationships among entities
      • What do we want to say about them?
        • Are there existing ways to say these things that are good enough for our purposes? Can we reuse them, or at least map to them?
        • When there aren’t, how do we say what we need to in ways that are most useful for the rest of the world?
      • Assigning URIs to it all
  40. Model friction
      • FRBR: entity-relationship model
        • ... like relational databases, which is nice
        • not entirely RDFish, which is not quite so nice and has caused head-scratching
        • But head-scratching is normal in this space! Modeling is hard!
      • FRBR does give us some abstractions to model and assign URIs to.
        • And IFLA was supposed to do that... but they haven’t.
        • So the RDA folks have provisionally done it: FRBRoo.
        • When IFLA gets back in the game, formal equivalences will be defined and published between FRBRoo and whatever IFLA comes up with.
      • FRBR isn’t perfect. (Gasp. I know, right?)
        • So sticking strictly to FRBR as we model (relationships particularly) causes problems for music and multimedia catalogers, among others.
  41. RDA properties
      • Expressed (URLized) without reference to FRBR.
        • This is also the variant the linked-data web will generally see and use.
        • Which makes a certain amount of sense, because it’s quite possible to understand a lot of bibliographic data intuitively without reference to FRBR.
        • And we’ll never get the whole world to agree on FRBR; we can’t even agree ourselves!
      • Given “subproperties” which are the same thing, only FRBRized (and with their own URLs).
        • So the linked-data web sees a URL for “Book format.”
        • But we, because we are librarians and our systems understand us, understand that “Book format” is intrinsically tied up with a Manifestation.
        • This also covers us when an RDA property may apply to more than one FRBR entity, e.g. Extent: it’s the same property, but two subproperties!
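The property/subproperty arrangement can be sketched as a lookup table plus one inference step: any statement made with a FRBR-specific subproperty also implies the same statement with the general property. Everything named here is an assumption for illustration. The `example.org` URIs are not the registered RDA identifiers, the choice of Manifestation and Item as the two entities for Extent is a guess, and the sketch only follows one level of subproperty.

```python
# Hypothetical URIs; the real RDA Vocabularies identifiers differ.
EXTENT = "http://example.org/rda/extent"
EXTENT_MANIFESTATION = "http://example.org/rda/extentManifestation"
EXTENT_ITEM = "http://example.org/rda/extentItem"

# subproperty -> general property (one level only in this sketch)
SUBPROPERTY_OF = {
    EXTENT_MANIFESTATION: EXTENT,
    EXTENT_ITEM: EXTENT,
}


def generalize(triples):
    """Add the general statement implied by each FRBR-specific one."""
    result = set(triples)
    for subject, predicate, obj in triples:
        parent = SUBPROPERTY_OF.get(predicate)
        if parent is not None:
            result.add((subject, parent, obj))
    return result


# A library system records the FRBR-aware statement...
ours = [("http://example.org/book/1", EXTENT_MANIFESTATION, "xii, 300 pages")]

# ...and the wider web can still find it under the plain property.
for triple in sorted(generalize(ours)):
    print(triple)
```

Library systems keep the FRBR-aware detail, while everyone else can query the general property and still find the data.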
  42. Diagram: Hillmann et al., “RDA Vocabularies: Process, Outcome, Use” D-Lib Magazine. http://www.dlib.org/dlib/january10/hillmann/01hillmann.html
  43. The ugliest case:
      Diagram: Hillmann et al., “RDA Vocabularies: Process, Outcome, Use” D-Lib Magazine. http://www.dlib.org/dlib/january10/hillmann/01hillmann.html
  44. ... wait, where did Dublin Core go?
      • Dublin Core, as we all know, is annoyingly vague.
      • Sadly, there’s an awful lot of DC data that we’ll have to map into this model.
        • Ironic but true: librarians invented DC for the larger web, and then became nearly the only people to use it extensively.
      • “Superproperties:” DC terms that map to several RDA properties. (E.g. “creator”)
        • Probably the worst way to solve the problem... except for all the other ways.
  45. Disentangling aggregated statements
      • The last refuge of the text string!
        • E.g. publication statements, which aggregate place, publisher name, and date of publication.
        • What if you only WANT one of those three bits of information? ARGH.
      • RDA doesn’t fix this. So RDA Vocabs is trying to.
        • First, URLize each piece separately. Cool. Done. No problem.
        • Then define a “Syntax Encoding Scheme” for the aggregate. Yuck.
      • I have to tell you, this is a heinously ugly “fix.” Given legacy data, though, hard to imagine better.
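To see why aggregated text strings are a problem, here is a sketch that tries to pull place, publisher, and date back out of a publication statement. It assumes the common "Place : Publisher, Date" punctuation pattern and nothing more. The sample statement is invented, and the `split_publication_statement` helper is hypothetical rather than part of RDA or the RDA Vocabularies work.

```python
import re

# Assumes "Place : Publisher, Date" punctuation; real legacy data is messier.
PUBLICATION = re.compile(
    r"^\s*(?P<place>.+?)\s+:\s+(?P<publisher>.+?),\s*(?P<date>[^,]+?)\.?\s*$"
)


def split_publication_statement(statement):
    """Return place, publisher, and date as a dict, or None if the pattern fails."""
    match = PUBLICATION.match(statement)
    return match.groupdict() if match else None


# Invented example statement.
parts = split_publication_statement("Springfield : Example Press, 2011.")
print(parts)

# Anything that strays from the expected punctuation simply cannot be split.
print(split_publication_statement("no punctuation here"))
```

Recording each piece as its own statement from the start avoids guessing at punctuation afterwards.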
  46. There’s more modeling pilpul. A lot of it. I’ll spare you. Like I said, it’s plumbing.
  47. What is actually happening?
      • We’re figuring out what we’re talking about.
      • We’re figuring out what we want to say about it.
      • We’re assigning URIs to all those things (abstractions included!) so that we can exchange information with the rest of the web.
  48. Summary by Diane Hillmann
      http://managemetadata.org/blog/2011/09/08/fine-wine-and-old-fish/
      • Data should be able to be encoded in a variety of ways, to suit a variety of functions, uses, and systems.
      • Data should be managed at a granular, statement level, but also be available in a variety of record ‘formats.’ (with records being understood as primarily an on-the-fly method of aggregating data for a variety of downstream users)
      • Although current data is expressed mostly as text strings, data improvement strategies will be designed to change most of them to URIs as soon as practicable.
      • Data definitions and specifications will be easily available on the web, allowing mapping to be simpler and easier to tweak.
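The idea of a record as an on-the-fly aggregation of statements might look something like the sketch below. The work and creator URIs and the title are from earlier slides. The `mentionsDog` property and the `build_record` helper are hypothetical, standing in for "data out there that you do not need."

```python
from collections import defaultdict

WORK = "http://digital.library.wisc.edu/1793/22088"
DCTERMS = "http://purl.org/dc/terms/"

STATEMENTS = [
    (WORK, DCTERMS + "creator", "http://viaf.org/viaf/21599115/"),
    (WORK, DCTERMS + "title", "Innkeeper at the Roach Motel"),
    # Hypothetical property, standing in for data we do not care about.
    (WORK, "http://example.org/vocab/mentionsDog", "yes"),
]


def build_record(statements, subject, wanted):
    """Collect only the wanted properties of one subject into a record."""
    record = defaultdict(list)
    for s, p, o in statements:
        if s == subject and p in wanted:
            record[p].append(o)
    return dict(record)


# Two different "record formats" cut from the same pile of statements.
brief = build_record(STATEMENTS, WORK, {DCTERMS + "title"})
full = build_record(STATEMENTS, WORK, {DCTERMS + "title", DCTERMS + "creator"})
print(brief)
print(full)
```

The statements are managed one by one; the record is just whatever selection a given downstream user asks for.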
  49. Library workflows
      • Given what you now know about RDF and linked data, and your experiences with cataloging, how do you think the practice of cataloging will change in an RDF-based environment?
  50. SPARQL
      • With XML data, you generally just dump it on the web and let people figure out what (if anything) to do with it.
        • This means a lot of translator-writing and bandwidth cost.
        • (There’s an XML query language called XQuery, but nobody uses it.)
        • You can do this with RDF too (and some do), but it’s not really ideal.
      • SPARQL: query language for RDF.
        • Looks a LOT like SQL, intentionally so. The hardest thing to get to grips with is namespace declarations, and that’s not really all that hard.
      • “SPARQL endpoint:” URL for a given set of RDF data that you can send queries to and get answers from.
      • How does this change your answer about library workflows?
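A real SPARQL endpoint runs a full query engine, but the core idea, matching a pattern with variables against triples, fits in a few lines. The sketch below imitates a single triple pattern in plain Python; the `match` helper is made up for this example and is nowhere near a SPARQL implementation. The query it mimics is shown in the comments, and the second data row is invented.

```python
# The SPARQL query this sketch imitates:
#
#   PREFIX dcterms: <http://purl.org/dc/terms/>
#   SELECT ?work WHERE { ?work dcterms:creator <http://viaf.org/viaf/21599115/> . }

DCTERMS_CREATOR = "http://purl.org/dc/terms/creator"
SALO = "http://viaf.org/viaf/21599115/"
INNKEEPER = "http://digital.library.wisc.edu/1793/22088"

DATA = [
    (INNKEEPER, DCTERMS_CREATOR, SALO),
    # Invented second row, so the query has something to filter out.
    ("http://example.org/work/2", DCTERMS_CREATOR, "http://example.org/person/9"),
]


def match(pattern, triples):
    """Yield variable bindings for one triple pattern; '?name' marks a variable."""
    for triple in triples:
        bindings = {}
        for term, value in zip(pattern, triple):
            if term.startswith("?"):
                # A variable used twice must bind to the same value both times.
                if bindings.setdefault(term, value) != value:
                    break
            elif term != value:
                break
        else:
            yield bindings


for row in match(("?work", DCTERMS_CREATOR, SALO), DATA):
    print(row["?work"])
```

The `PREFIX` line is the namespace declaration the slide mentions: it lets the query say `dcterms:creator` instead of spelling out the full URI.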
  51. Why it’s worth doing
  52. Your data ages like [photo: a bottle of red wine]
      Photo: Matthew, “red wine bottle 1” http://www.flickr.com/photos/falcon1961/3408961521/ CC-BY
  53. Your software applications age like [photo: fish]
      Photo: amanda mandy, “peixe pelo todo.” http://www.flickr.com/photos/polaina/3128038858/ CC-BY
  54. ... did that help?
  55. I worked from...
      • Hillmann, Diane et al. “RDA Vocabularies: Process, Outcome, Use.” D-Lib Magazine 16:1/2 (Jan/Feb 2010). http://www.dlib.org/dlib/january10/hillmann/01hillmann.html