MARC and BIBFRAME: Linking libraries and archives
LIS 551
Dorothea Salo
We built MARC when [catalogue cards] stood between us and patron.
Photo: Deborah Fitchett, “Catalogue cards” http://www.flickr.com/photos/deborahfitchett/2970373235/ CC-BY
We built MARC when the world was clearly bounded.
Photo: NASA Goddard Photo and Video, “NASA Blue Marble” http://www.flickr.com/photos/gsfc/4392965590/ CC-BY
These days, [a computer] stands between us and patron.
Photo: Declan Jewell, “My Desk” http://www.flickr.com/photos/declanjewell/2743737312 CC-BY
These days, the world’s looking a bit fractal!
Photo: NASA Goddard Photo and Video, “Still centered over the Atlantic” http://www.flickr.com/photos/gsfc/4409800816/ CC-BY
From what you read...
• What difficulties are programmers finding in the MARC/AACR2/ISBD(G) way of doing things?
  • Especially keep in mind what the programmers are trying to accomplish!
• What solutions are they recommending?
  • (yes, I know a lot of what I had you read is pure kvetching)
• How do humans differ from computers? What does that mean for cataloging?
Problems with MARC/AACR2/ISBD (if you’re a networked computer)
• Globally unique identifiers for what’s in our bibliographic universe?
  • And what IS in our bibliographic universe, anyway?
• Interoperability? Who speaks MARC outside libraries?
  • This is a problem on both ends of the pipeline, these days!
• FREE TEXT (for anything not transcribed) MUST DIE.
  • It is the LEAST consistent, internationalizable, interoperable way to record information on a computer.
  • Put another way: we haven’t controlled all the cataloging practices we usefully could.
http://robotlibrarian.billdueber.com/isbn-parenthetical-notes-bad-marc-data-1/
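To make the free-text complaint concrete, here is a minimal Python sketch of what a programmer ends up writing when an ISBN and a hand-typed parenthetical note share one string. The sample values and the `split_isbn` helper are made up for illustration; they are not taken from the linked post, and real catalog data is messier than this pattern allows for.

```python
import re

# Hypothetical ISBN strings in the style catalogers have typed by hand.
# They are illustrative only, not drawn from any real catalog.
SAMPLES = [
    "0394502884 (pbk.)",
    "0-19-852663-6 (hardback : alk. paper)",
    "9780306406157",
    "0394502884(pbk)",
]

# An ISBN-looking token, optionally followed by one parenthetical note.
ISBN_WITH_NOTE = re.compile(r"^\s*([0-9Xx-]{10,17})\s*(?:\((.*)\))?\s*$")


def split_isbn(value):
    """Return (isbn, note); isbn is None when the string does not fit."""
    match = ISBN_WITH_NOTE.match(value)
    if match is None:
        # Free text that does not fit the pattern is left for a human.
        return None, value
    isbn = match.group(1).replace("-", "").upper()
    return isbn, match.group(2)


for sample in SAMPLES:
    print(split_isbn(sample))
```

Every variation in spacing, punctuation, or wording inside the parentheses is another case the pattern has to anticipate, which is the argument for recording the number and the note as separate, controlled pieces of data.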
Speaking of free text...
• What kinds of problems did you run into as you were working with MODS?
  • Case
  • Enumerations
  • Remembering end-tags!
• What does that suggest to you about humans, text, and accuracy?
  • Seriously, people, PARSE AND VALIDATE YOUR XML. Just do it.
  • I have to, too, and I’ve been XMLing for a decade and a half, almost.
• Linked-data rallying cry: “Things not strings!”
  • And now you begin to understand why.
  • (Known psychological phenomenon: humans RECOGNIZE text far more accurately than they can PRODUCE or REPRODUCE it. Wouldn’t it be nice if our cataloging and metadata tools respected that?)
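As a small illustration of the “parse your XML” advice, the sketch below uses Python’s standard library to catch a forgotten end-tag. It checks well-formedness only; validating against the MODS schema needs a schema-aware tool, which the standard library does not include. The two sample records and the `check_well_formed` helper are made up for this example.

```python
import xml.etree.ElementTree as ET


def check_well_formed(xml_text):
    """Return (True, "") if the text parses, else (False, error message)."""
    try:
        ET.fromstring(xml_text)
    except ET.ParseError as err:
        return False, str(err)
    return True, ""


# Hypothetical MODS fragments; the second one is missing </title>.
GOOD = (
    '<mods xmlns="http://www.loc.gov/mods/v3">'
    "<titleInfo><title>Example</title></titleInfo></mods>"
)
BAD = (
    '<mods xmlns="http://www.loc.gov/mods/v3">'
    "<titleInfo><title>Example</titleInfo></mods>"
)

for label, text in (("good", GOOD), ("bad", BAD)):
    ok, message = check_well_formed(text)
    print(label, ok, message)
```

The parser reports where the mismatch is, which is usually faster than spotting a missing end-tag by eye.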
Problems
• The MARC format is old, obscure, and difficult to parse.
  • There are programming libraries in many popular languages that read MARC. If you ever have to work with MARC records programmatically, use them!
  • Despite those, MARC’s obscurity isolates library data from other information communities, especially on the Web.
• Free-text fields/parts in MARC are uncontrolled, and often inconsistent in practice. Fixed fields don’t contain all the info they perhaps should.
• Use of strings (text) rather than identifiers.
  • Review: What makes a good identifier for a computer?
• Errors. Errors! ERRORS!!!!!
More problems
• Some MARC tags are ambiguous.
  • What kind of URL is in an 856?
• Sometimes the same information (from a computer’s point of view) is scattered across a bunch of MARC fields.
• Some MARC tags relate to other MARC tags, but are not explicitly correlated in any way.
• Very hard to figure out a work’s carrier (content type).
  • Want to give your patrons a list of DVDs? You are OUT OF LUCK.
Fundamental problems
• “Machine readable” vs. “machine actionable.”
  • This problem breaks down into, um, where data get broken down. With MARC, that happens after record creation, and isn’t reliable. With linked data, it happens up-front.
• Every minute a programmer spends coping with non-computer-friendly practices is a minute NOT spent enhancing library experiences for patrons.
• Every minute a cataloger spends creating computer-unfriendly data is a minute wasted.
• We are cataloging for computers now, not just catalog-card-reading humans.
  • Put another way, computers are today’s intermediary between catalogers and patrons, as catalog cards once were. We were card-friendly. We must become computer-friendly.
Have you noticed?
• Programmer: “Trying to solve Patron Problem X. MARC causes Complication Y.”
• Cataloger: “LALALALA MARC LALALA TRADITION LALALALALA I CAN’T HEEEEEEEEEEAR YOOOOOOOOOU!”
  • wait, where did the patron and her problem go?
• Seriously, librarianship? Seriously? We have to stop this.
  • If programmers demonstrate it’s a problem, IT IS A PROBLEM. I don’t care about the tradition or the standard that caused the problem. Solve the problem!
The “open” in LOD
• Who owns metadata?
  • Who thinks they own it?
• If metadata are both ownable and owned, what does that mean for linked data?
  • Conversely, can linked data provide a lever against owned metadata?
• How is OCLC treating this issue?
• How is DPLA treating it?
SPARQL
• With XML data, you generally just dump it on the web and let people figure out what (if anything) to do with it.
  • This means a lot of translator-writing and bandwidth cost.
  • (There’s an XML query language called XQuery, but nobody uses it.)
  • You can do this with RDF too (and some do), but it’s not really ideal.
• SPARQL: query language for RDF.
  • Looks a LOT like SQL, intentionally so. The hardest thing to get to grips with is namespace declarations, and that’s not really all that hard.
  • “SPARQL endpoint:” URL for a given set of RDF data that you can send queries to and get answers from.
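For a feel of what a query and an endpoint look like together, here is a minimal Python sketch. The endpoint URL is hypothetical (example.org), and the query assumes the data uses Dublin Core terms for titles. The SPARQL protocol allows a query to be sent as a `query` parameter on a GET request; `run_query` is defined but never called here, since there is no real endpoint behind the made-up URL.

```python
import json
from urllib.parse import urlencode
from urllib.request import Request, urlopen

# Hypothetical endpoint; substitute one you actually have access to.
ENDPOINT = "https://example.org/sparql"

# The PREFIX line is the namespace declaration; the rest reads much like SQL.
QUERY = """
PREFIX dcterms: <http://purl.org/dc/terms/>
SELECT ?work ?title
WHERE { ?work dcterms:title ?title . }
LIMIT 10
"""


def build_request_url(endpoint, query):
    """Encode the query as the `query` parameter of a GET request."""
    return endpoint + "?" + urlencode({"query": query})


def run_query(endpoint, query):
    """Send the query and decode a JSON result set (not called below)."""
    request = Request(
        build_request_url(endpoint, query),
        headers={"Accept": "application/sparql-results+json"},
    )
    with urlopen(request) as response:
        return json.load(response)


url = build_request_url(ENDPOINT, QUERY)
print(url)
```

Pointing `run_query` at a real endpoint returns only the rows asked for, rather than a whole dataset to download and translate.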
How?
• If linked data is where the world is moving...
• If data need to be open to be linked...
• If libraries and archives are sitting on a mass of unlinked and possibly unlinkable data...
• How do we get there from here?
  • And where’s “there” anyway?
Linked Data principles
http://www.w3.org/DesignIssues/LinkedData.html
• use URIs as names for things
• use HTTP URIs (aka URLs) so that people can look up those things
  • (this is one of Linked Data’s concessions to pragmatism, compared to the original SemWebbers)
• when someone looks up a URI, provide useful information, using the standards
• include links to other URIs so that they can discover more things
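The four principles can be sketched with nothing more than a list of three-part statements. This is a toy in-memory illustration in Python, not a real RDF library: the example.org URIs are made up, while the Dublin Core and FOAF property URIs are real vocabulary terms. `describe` stands in for “looking up” a URI, and `linked_uris` for following links to discover more things.

```python
# Made-up identifiers for a work and a person.
BOOK = "http://example.org/work/moby-dick"
AUTHOR = "http://example.org/person/melville"

# Each statement is (subject, predicate, object).
TRIPLES = [
    (BOOK, "http://purl.org/dc/terms/title", "Moby-Dick"),
    (BOOK, "http://purl.org/dc/terms/creator", AUTHOR),
    (AUTHOR, "http://xmlns.com/foaf/0.1/name", "Herman Melville"),
]


def describe(uri, triples):
    """Everything said about one URI: the 'useful information' on lookup."""
    return [(p, o) for s, p, o in triples if s == uri]


def linked_uris(uri, triples):
    """Objects that are themselves URIs, i.e. links worth following.

    Crude assumption: anything starting with http:// or https:// is a URI.
    """
    return [
        o
        for _, o in describe(uri, triples)
        if o.startswith(("http://", "https://"))
    ]


print(describe(BOOK, TRIPLES))
print(linked_uris(BOOK, TRIPLES))
```

Because the creator is a URI rather than the string “Melville, Herman”, a program can follow it and find the name (and anything else) attached to that person.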
Review: the five stars of linked data (Tim Berners-Lee)
The road ahead
1. Model our universe of things in a linked-data-friendly fashion.
  • This is what Coyle and Hillmann, BIBFRAME, SKOS, various national-library efforts, VIAF, Dublin Core, EAC-CPF, and to some extent RDA are working on.
  • (“Things” != “just things.” For linked data, people, places, and subjects are also things.)
2. Atomize our existing data as best we can.
3. Assign URL identifiers to everything in sight.
4. Publish, link out, and link up!
5. (Squelch OCLC’s ownership claims. We can’t have that if we want LOD or even just LD.)
See this process in action
• http://www.slideshare.net/philjohn/linked-library-data-in-the-wild-8593328
Modeling
“what are the thingies in my neighborhood?”
BIBFRAME
• LoC got tired of all the waffling about how to replace MARC.
  • 10/31/2011: “We’re going to just DO THIS. Join in or don’t.”
  • NISO (which owns MARC), ILS vendors, catalogers: *have kittens*
  • LoC: “Cope.” (You can tell where my sympathies lie, yes?)
• Not the first or the only; the British Library has been working on its own LD infrastructure for a couple years now.
  • Spain’s national library has an interesting RDFized FRBRish implementation.
SKOS
• Simple Knowledge Organization System
  • from our friends at the W3C; builds on prior work
  • http://www.w3.org/2004/02/skos/
  • Representation of common controlled-vocabulary structures in RDF syntax
• Review: What does a thesaurus entry look like?
Things in SKOS
• Concepts (we know them as “terms”)
  • And “Concept Schemes,” which represent CVs as we’re used to thinking of them
• Labels
  • Review: why are these distinct from Concepts?
• Relationships among concepts
  • Broader/narrower
  • Associative (“see also”)
  • Equivalencies and near-equivalencies (here there be dragons)
• Notes (of various kinds)
• Really pretty straightforward, for RDF!
  • And has made inroads outside libraries for that very reason
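A rough sketch of how these pieces fit together, written as plain Python tuples rather than with an RDF library. The three concept URIs are a made-up vocabulary on example.org; the SKOS and RDF property URIs are the real ones. Note that the labels are just strings hanging off a concept, while the relationships point from one concept URI to another.

```python
SKOS = "http://www.w3.org/2004/02/skos/core#"
RDF_TYPE = "http://www.w3.org/1999/02/22-rdf-syntax-ns#type"

# Hypothetical concepts in a made-up controlled vocabulary.
CATS = "http://example.org/vocab/cats"
MAMMALS = "http://example.org/vocab/mammals"
ANIMALS = "http://example.org/vocab/animals"

TRIPLES = [
    (CATS, RDF_TYPE, SKOS + "Concept"),
    (CATS, SKOS + "prefLabel", "Cats"),
    (CATS, SKOS + "altLabel", "Felines"),
    (CATS, SKOS + "broader", MAMMALS),
    (MAMMALS, SKOS + "prefLabel", "Mammals"),
    (MAMMALS, SKOS + "broader", ANIMALS),
    (ANIMALS, SKOS + "prefLabel", "Animals"),
]


def objects(subject, predicate):
    """All objects of statements with this subject and predicate."""
    return [o for s, p, o in TRIPLES if s == subject and p == predicate]


def broader_chain(concept):
    """Walk skos:broader upward, stopping at the top or on a cycle."""
    chain, seen = [], {concept}
    parents = objects(concept, SKOS + "broader")
    while parents and parents[0] not in seen:
        chain.append(parents[0])
        seen.add(parents[0])
        parents = objects(parents[0], SKOS + "broader")
    return chain


def find_by_label(text):
    """Concepts whose preferred or alternate label matches, ignoring case."""
    wanted = text.lower()
    label_props = (SKOS + "prefLabel", SKOS + "altLabel")
    return [s for s, p, o in TRIPLES if p in label_props and o.lower() == wanted]


print(broader_chain(CATS))
print(find_by_label("felines"))
```

A search on “felines” lands on the same concept as “cats”, and the hierarchy is traversed by identifier rather than by matching strings.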
EAD
• Attempt to model EAD “things”
  • http://archiveshub.ac.uk/locah/2010/09/28/model-a-first-cut/
• Things
  • Unit of Description
  • Archival Finding Aid
  • Repository (an Agent)
  • Origination (an Agent)
  • “Things” (access points, index terms). Review: What are archival access points?
• Best I can tell, this work hasn’t been taken up yet.
LOCAH Project, “The ‘things’ in EAD” http://archiveshub.ac.uk/locah/2010/09/28/model-a-first-cut/
RDFizing RDA
• What does RDA actually talk about?
  • FRBR model: Group 1, 2, and 3 entities
  • (though Group 1 is still kind of squidgy, really, and some application developers are questioning its usefulness)
  • DCMI model (because life can NEVER be simple)
  • Relationships among entities
• What do we want to say about them?
  • Are there existing ways to say these things that are good enough for our purposes? Can we reuse them, or at least map to them?
  • When there aren’t, how do we say what we need to in ways that are most useful for the rest of the world?
• Assigning URIs to it all
Model friction
• FRBR: entity-relationship model
  • ... like relational databases, which is nice
  • not entirely RDFish, which is not quite so nice and is causing head-scratching
  • But head-scratching is normal in this space! Modeling is hard!
• FRBR does give us some abstractions to model and assign URIs to.
  • And IFLA was supposed to do that... but they haven’t.
  • So the RDA folks have provisionally done it: FRBRoo.
  • Should IFLA get back in the game, formal equivalences will be defined and published between FRBRoo and whatever IFLA comes up with.
• FRBR isn’t perfect. (Gasp. I know, right?)
  • So sticking strictly to FRBR as we model (relationships particularly) causes problems for music and multimedia catalogers, among others.
RDA Vocabularies
• Hillmann, Dunsire, et al. try to work through RDFizing RDA.
• It’s crazy complicated. And weird. So I’m not even trying to explain it here.
  • If you REALLY WANT TO KNOW: http://www.dlib.org/dlib/january10/hillmann/01hillmann.html
  • OR Karen Coyle’s Library Technology Report “RDA Vocabularies for a Twenty-First-Century Data Environment”
• Would a cataloger need to know this?
  • Probably not. It’s plumbing, really.
We’ve seen some already...
• URLized Dublin Core concepts
• MODS accepting URLs
• VIAF
• id.loc.gov
• One more I won’t talk about today: Dewey Decimal (dewey.info)
Archivists!
• Why isn’t VIAF enough for you?
Science librarians!
• Why isn’t VIAF enough for you?
EAC-CPF and SNAC
• EAC-CPF: Encoded Archival Context for Corporate Bodies, Persons, and Families
  • Archival authority standard, from the kindly folks who brought us EAD!
  • Not linked-data-ized. Yet.
• SNAC: The Social Networks and Archival Context Project
  • Using EAC-CPF to link people across archival collections
  • Aha! Starting to sound more linked-data! (It’s not RDFfy, yet, but should be relatively easy to RDFize later.)
  • http://socialarchive.iath.virginia.edu/
• Moral: You don’t have to use or even know RDF to start getting ready for linked data!
ORCID and ISNI
• Open Researcher and Contributor ID
  • Authority control for scholars who don’t write books
  • http://orcid.org/
• International Standard Name Identifier
  • Thinks of itself as a superset of VIAF, ORCID, etc.
  • Remains to be seen whether they can pull this off, of course...
  • http://isni.org/
• Review: what does linked data do about two URLs identifying the same thing?
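One common answer to that review question is to publish a statement that the two identifiers name the same thing (OWL’s `sameAs` property is widely used for this) and let consumers merge them. The Python sketch below groups identifiers connected by such statements using a small union-find. All four identifiers are made up on example.org, with paths that only hint at which service might have minted them; the `same_as_clusters` helper is hypothetical.

```python
OWL_SAME_AS = "http://www.w3.org/2002/07/owl#sameAs"

# Made-up identifiers; A, B and C are asserted to be one person, D is not.
A = "http://example.org/viaf/1"
B = "http://example.org/orcid/2"
C = "http://example.org/isni/3"
D = "http://example.org/viaf/4"

TRIPLES = [
    (A, OWL_SAME_AS, B),
    (B, OWL_SAME_AS, C),
]


def same_as_clusters(triples, uris):
    """Group URIs that sameAs statements connect, directly or indirectly."""
    parent = {u: u for u in uris}

    def find(u):
        # Follow parent pointers to the cluster's representative,
        # shortening the path along the way.
        while parent[u] != u:
            parent[u] = parent[parent[u]]
            u = parent[u]
        return u

    for s, p, o in triples:
        if p == OWL_SAME_AS:
            parent.setdefault(s, s)
            parent.setdefault(o, o)
            parent[find(s)] = find(o)

    clusters = {}
    for u in parent:
        clusters.setdefault(find(u), set()).add(u)
    return sorted(sorted(members) for members in clusters.values())


for cluster in same_as_clusters(TRIPLES, [A, B, C, D]):
    print(cluster)
```

Nobody has to agree on a single winning identifier: each service keeps its own URL, and the equivalence statements do the reconciling.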
Putting it together
Europeana
• Review: what is Europeana?
• Built a “Europeana Semantic Elements” set
  • Dublin Core Application Profile
  • Kludgy, as anything with Dublin Core inevitably is
  • Moving to “Europeana Data Model”
• Publishing linked open data at data.europeana.eu
  • For more: http://www.niso.org/publications/isq/2012/v24no2-3/isaac/
Missouri History Museum
• Needed to create a search portal to disparate metadata silos
  • Gee, where have we heard THAT before?
• Decided to crosswalk to RDF.
• Work in progress, but initial results encouraging.
• http://www.museumsandtheweb.com/mw2012/papers/using_an_rdf_data_pipeline_to_implement_cross_
Had enough? Okay.
• Copyright 2013 by Dorothea Salo.
• This lecture and slide deck are licensed under a Creative Commons Attribution 3.0 United States License.
• Several diagrams reproduced under fair use.