MARC and BIBFRAME; Linking libraries and archives

MARC and BIBFRAME
Linking libraries and archives
LIS 551
Dorothea Salo

We built MARC when catalog cards
stood between us and patron.
Photo: Deborah Fitchett, “Catalogue cards” http://www.flickr.com/photos/deborahfitchett/2970373235/ CC-BY

We built MARC when
the world was clearly bounded.
Photo: NASA Goddard Photo and Video, “NASA Blue Marble” http://www.flickr.com/photos/gsfc/4392965590/ CC-BY

These days, a computer
stands between us and patron.
Photo: Declan Jewell, “My Desk” http://www.flickr.com/photos/declanjewell/2743737312 CC-BY

These days, the
world’s looking a bit fractal!
Photo: NASA Goddard Photo and Video, “Still centered over the Atlantic” http://www.flickr.com/photos/gsfc/4409800816/ CC-BY

From what you read...
•What difficulties are programmers
finding in the MARC/AACR2/ISBD(G)
way of doing things?
•Especially keep in mind what the programmers are trying to accomplish!
•What solutions are they recommending?
•(yes, I know a lot of what I had you read is pure kvetching)
•How do humans differ from computers?
What does that mean for cataloging?

Problems with MARC/AACR2/ISBD
(if you’re a networked computer)
•Globally-unique identifiers for what’s in our
bibliographic universe?
•And what IS in our bibliographic universe, anyway?
•Interoperability? Who speaks MARC outside
libraries?
•This is a problem on both ends of the pipeline, these days!
•FREE TEXT (for anything not transcribed) MUST DIE.
•It is the LEAST consistent, internationalizable, interoperable way to record
information on a computer.
•Put another way: we haven’t controlled all the cataloging practices we usefully could.
http://robotlibrarian.billdueber.com/isbn-parenthetical-notes-bad-marc-data-1/
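
To make the free-text complaint concrete, here is a minimal sketch of what a programmer has to do before an ISBN field is usable. The sample values are hypothetical stand-ins for the kind of ISBN-plus-parenthetical-note strings the linked post describes, and the pattern is one reasonable guess at the cleanup, not a complete one.

```python
import re

# Hypothetical 020 $a values: an ISBN followed by free-text qualifiers
# in no fixed order, spelling, or punctuation.
RAW_020A = [
    "0123456789 (pbk.)",
    "9780123456786 (hardcover : alk. paper)",
    "012345678X (v. 1)",
    "(pbk.)",  # no ISBN at all, just a note
]

# 13 digits, or 9 digits plus a final digit or X, then an optional (note).
ISBN_RE = re.compile(r"^\s*(\d{13}|\d{9}[\dXx])\b\s*(?:\((.*)\))?\s*$")


def split_isbn(value):
    """Return (isbn, note); isbn is None when no ISBN can be recovered."""
    match = ISBN_RE.match(value)
    if match is None:
        return None, value.strip()
    return match.group(1).upper(), match.group(2)


for raw in RAW_020A:
    print(split_isbn(raw))
```

Every new variation in how a cataloger typed the note means another patch to the pattern. Recording the identifier and the qualifier separately up front would make this code unnecessary.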

Speaking of free text...
•What kinds of problems did you run into as you
were working with MODS?
•Case
•Enumerations
•Remembering end-tags!
•What does that suggest to you about humans,
text, and accuracy?
•Seriously, people, PARSE AND VALIDATE YOUR XML. Just do it.
•I have to, too, and I’ve been XMLing for a decade and a half, almost.
•Linked-data rallying cry: “Things not strings!”
•And now you begin to understand why.
•(Known psychological phenomenon: humans RECOGNIZE text far more accurately than
they can PRODUCE or REPRODUCE it. Wouldn’t it be nice if our cataloging and
metadata tools respected that?)
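
As a small illustration of "parse your XML," the sketch below runs three fragments through Python's standard-library parser. The fragments are simplified, MODS-like examples (the real MODS namespace is left out for brevity), and this checks well-formedness only; validating against the MODS schema takes a schema-aware tool on top of this.

```python
import xml.etree.ElementTree as ET

GOOD = "<mods><titleInfo><title>Moby Dick</title></titleInfo></mods>"
NO_END_TAG = "<mods><titleInfo><title>Moby Dick</titleInfo></mods>"
WRONG_CASE = "<mods><titleInfo><Title>Moby Dick</title></titleInfo></mods>"


def is_well_formed(xml_text):
    """Return (ok, message); message is the parser's complaint, if any."""
    try:
        ET.fromstring(xml_text)
    except ET.ParseError as err:
        return False, str(err)
    return True, ""


for sample in (GOOD, NO_END_TAG, WRONG_CASE):
    print(is_well_formed(sample))
```

The parser catches the forgotten end-tag and the case slip immediately, which is the point: the machine is better at reproducing text exactly than we are.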

Problems
•The MARC format is old, obscure, and difficult
to parse.
•There are programming libraries in many popular languages that read MARC. If you
ever have to work with MARC records programmatically, use them!
•Despite those, MARC’s obscurity isolates library data from other information
communities, especially on the Web.
•Free-text fields/parts in MARC are uncontrolled,
and often inconsistent in practice. Fixed fields
don’t contain all the info they perhaps should.
•Use of strings (text) rather than identifiers.
•Review: What makes a good identifier for a computer?
•Errors. Errors! ERRORS!!!!!
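
To show why MARC counts as "obscure," here is a rough sketch of the binary record structure (leader, directory, delimited fields) written with nothing but the standard library. It builds a tiny made-up record and reads it back. The record content and helper names are hypothetical, and the sketch ignores character encodings and error handling; for real work, use an existing MARC library (pymarc is one for Python), as the slide says.

```python
FT, RT, SF = b"\x1e", b"\x1d", b"\x1f"  # field, record, subfield delimiters


def build_record(fields):
    """Serialize [(tag, body_bytes)] into a minimal MARC-style record."""
    directory, data = b"", b""
    for tag, body in fields:
        body += FT
        # Directory entry: 3-char tag, 4-digit length, 5-digit start offset.
        directory += b"%s%04d%05d" % (tag.encode(), len(body), len(data))
        data += body
    directory += FT
    base = 24 + len(directory)      # where field data begins
    total = base + len(data) + 1    # +1 for the record terminator
    leader = b"%05dnam a22%05d a 4500" % (total, base)  # always 24 bytes
    return leader + directory + data + RT


def parse_record(raw):
    """Return [(tag, body_bytes)] by walking the directory."""
    base = int(raw[12:17])
    directory = raw[24:base - 1]    # drop the directory's terminator
    fields = []
    for i in range(0, len(directory), 12):
        entry = directory[i:i + 12]
        length, start = int(entry[3:7]), int(entry[7:12])
        body = raw[base + start:base + start + length - 1]
        fields.append((entry[0:3].decode(), body))
    return fields


def subfields(body):
    """Split a data field (not a 00X control field) into indicators and subfields."""
    indicators, *parts = body.split(SF)
    return indicators.decode(), [(p[:1].decode(), p[1:].decode()) for p in parts]


record = build_record([
    ("001", b"demo0001"),
    ("245", b"10" + SF + b"aMoby Dick /" + SF + b"cHerman Melville."),
])
for tag, body in parse_record(record):
    print(tag, body)
```

Fixed-width offsets and invisible control characters are reasonable design choices for tape drives. They are not what anyone outside libraries expects to find on the Web.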

More problems
•Some MARC tags are ambiguous.
•What kind of URL is in an 856?
•Sometimes the same information (from a
computer’s point of view) is scattered across
a bunch of MARC fields.
•Some MARC tags relate to other MARC tags,
but are not explicitly correlated in any way.
•Very hard to figure out a work’s carrier
(physical format).
•Want to give your patrons a list of DVDs? You are OUT OF LUCK.
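
The DVD problem can be sketched in a few lines. The record below is a hypothetical, hand-simplified dictionary rather than real MARC, and the function is only a guess at the sort of heuristic a programmer ends up writing: it collects clues from three different places because no single field reliably answers the question.

```python
def dvd_clues(record):
    """Return the places in a simplified record that hint at 'this is a DVD'."""
    clues = []
    # Leader/06 "g" means projected medium: film, video, slides, and more.
    if record.get("leader", "")[6:7] == "g":
        clues.append("leader/06")
    # 007/00 "v" means videorecording; 007/04 "v" narrows that to DVD.
    for f007 in record.get("007", []):
        if f007[0:1] == "v" and f007[4:5] == "v":
            clues.append("007")
    # 538 $a is a free-text note; catalogers word it however they like.
    if "dvd" in record.get("538a", "").lower():
        clues.append("538$a")
    return clues


# Hypothetical sample values for illustration only.
sample = {
    "leader": "00000ngm a2200000 a 4500",
    "007": ["vd cvaizq"],
    "538a": "DVD, region 1, widescreen.",
}
print(dvd_clues(sample))
```

Records in the wild may have any one of these clues, all of them, or none, so a "list of DVDs" ends up being a list of probable DVDs.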

Fundamental problems
•“Machine readable” vs. “machine actionable.”
•This problem breaks down into, um, where data get broken down. With MARC, that
happens after record creation, and isn’t reliable. With linked data, it happens up-front.
•Every minute a programmer spends coping with
non-computer-friendly practices is a minute NOT
spent enhancing library experiences for patrons.
•Every minute a cataloger spends creating
computer-unfriendly data is a minute wasted.
•We are cataloging for computers now, not just
catalog-card-reading humans.
•Put another way, computers are today’s intermediary between catalogers and patrons,
as catalog cards once were. We were card-friendly. We must become computer-friendly.
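
One way to see the "readable" versus "actionable" gap is to ask a computer to group books by author. In this sketch the records, heading strings, and example.org URI are all made up; the only point is what happens when the grouping key is a typed string versus an identifier.

```python
from collections import defaultdict

# Hypothetical records: one author, three slightly different heading strings.
PERSON = "http://example.org/person/1"
RECORDS = [
    {"title": "Adventures of Huckleberry Finn",
     "creator": "Twain, Mark, 1835-1910", "creator_uri": PERSON},
    {"title": "The Adventures of Tom Sawyer",
     "creator": "Twain, Mark, 1835-1910.", "creator_uri": PERSON},
    {"title": "Life on the Mississippi",
     "creator": "Twain, Mark", "creator_uri": PERSON},
]


def group_titles(records, key):
    """Group titles by whatever value each record holds under `key`."""
    groups = defaultdict(list)
    for record in records:
        groups[record[key]].append(record["title"])
    return dict(groups)


print(len(group_titles(RECORDS, "creator")))      # one group per spelling
print(len(group_titles(RECORDS, "creator_uri")))  # one group per person
```

A human reading the three headings sees one author. The computer sees three, unless the data were broken down and identified up front.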

Have you noticed?
•Programmer: “Trying to solve Patron
Problem X. MARC causes Complication Y.”
•Cataloger: “LALALALA MARC LALALA
TRADITION LALALALALA I CAN’T
HEEEEEEEEEEAR YOOOOOOOOOU!”
•wait, where did the patron and her problem go?
•Seriously, librarianship? Seriously? We have
to stop this.
•If programmers demonstrate it’s a problem, IT IS A PROBLEM. I don’t care about
the tradition or the standard that caused the problem. Solve the problem!

The “open” in LOD
•Who owns metadata?
•Who thinks they own it?
•If metadata are both ownable and owned,
what does that mean for linked data?
•Conversely, can linked data provide a lever against owned metadata?
•How is OCLC treating this issue?
•How is DPLA treating it?

SPARQL
•With XML data, you generally just dump
it on the web and let people figure out
what (if anything) to do with it.
•This means a lot of translator-writing and bandwidth cost.
•(There’s an XML query language called XQuery, but nobody uses it.)
•You can do this with RDF too (and some do), but it’s not really ideal.
•SPARQL: query language for RDF.
•Looks a LOT like SQL, intentionally so. The hardest thing to get to grips
with is namespace declarations, and that’s not really all that hard.
•“SPARQL endpoint”: URL for a given set of RDF data that you can send
queries to and get answers from.
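
For a feel of what a query and an endpoint look like together, here is a small sketch. The endpoint URL and the person URI are hypothetical (example.org), and the code only builds the request URL without sending anything; the `query` parameter name comes from the SPARQL protocol, and the Dublin Core terms namespace is the real one.

```python
from urllib.parse import urlencode

ENDPOINT = "http://example.org/sparql"  # hypothetical endpoint URL

# The PREFIX line is the namespace declaration; the rest reads much like SQL.
QUERY = """
PREFIX dcterms: <http://purl.org/dc/terms/>
SELECT ?book ?title
WHERE {
  ?book dcterms:creator <http://example.org/person/1> ;
        dcterms:title ?title .
}
LIMIT 10
"""


def query_url(endpoint, query):
    """Build a GET URL that carries the query text in the `query` parameter."""
    return endpoint + "?" + urlencode({"query": query.strip()})


print(query_url(ENDPOINT, QUERY))
```

Instead of downloading a whole dataset and writing a translator, the client asks for exactly the titles it wants and lets the endpoint do the work.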

How?
•If linked data is where the world is moving...
•If data need to be open to be linked...
•If libraries and archives are sitting on a mass
of unlinked and possibly unlinkable data...
•How do we get there from here?
•And where’s “there” anyway?

Linked Data principles
http://www.w3.org/DesignIssues/LinkedData.html
•use URIs as names for things
•use HTTP URIs (aka URLs) so that people can
look up those things
•(this is one of Linked Data’s concessions to pragmatism, compared to the
original SemWebbers)
•when someone looks up a URI, provide
useful information, using the standards
•include links to other URIs so that they
can discover more things
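
"Look up a URI and get useful information" usually means ordinary HTTP plus a header saying which format you want. The sketch below prepares such a request with the standard library and stops short of sending it. The URI is a hypothetical example.org address, and Turtle is just one of several RDF formats a server might offer.

```python
from urllib.request import Request


def rdf_request(uri, media_type="text/turtle"):
    """Prepare (but do not send) a GET that asks for RDF rather than HTML."""
    return Request(uri, headers={"Accept": media_type})


req = rdf_request("http://example.org/person/1")
print(req.get_method(), req.full_url, req.get_header("Accept"))
```

Passing `req` to `urllib.request.urlopen` would actually fetch the description. A browser visiting the same URI without that header would typically get a web page for humans instead.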

Review: the five stars of
linked data
(Tim Berners-Lee)

The road ahead
1.Model our universe of things in a linked-data-
friendly fashion.
•This is what Coyle and Hillmann, BIBFRAME, SKOS, various national-library
efforts, VIAF, Dublin Core, EAC-CPF, and to some extent RDA are working on.
•(“Things” != “just things.” For linked data, people, places, and subjects are also
things.)
2.Atomize our existing data as best we can.
3.Assign URL identifiers to everything in sight.
4.Publish, link out, and link up!
5.(Squelch OCLC’s ownership claims. We can’t
have that if we want LOD or even just LD.)
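
Steps 2 through 4 can be sketched in miniature. Everything under example.org here is a hypothetical namespace, the flat record is invented, and the `Literal` helper is just a local convenience for telling strings from links; only the Dublin Core terms namespace is real. The output format is N-Triples, one statement per line.

```python
from typing import NamedTuple

DCTERMS = "http://purl.org/dc/terms/"
BASE = "http://example.org/"  # hypothetical namespace for our own identifiers


class Literal(NamedTuple):
    """A plain string value, as opposed to a link to another thing."""
    text: str


flat = {"id": "book/42", "title": "Moby Dick", "creator_id": "person/7"}


def atomize(record):
    """Break one flat record into (subject, predicate, object) statements."""
    subject = BASE + record["id"]
    return [
        (subject, DCTERMS + "title", Literal(record["title"])),
        (subject, DCTERMS + "creator", BASE + record["creator_id"]),
    ]


def to_ntriples(triples):
    """Serialize statements as N-Triples text."""
    lines = []
    for s, p, o in triples:
        if isinstance(o, Literal):
            escaped = o.text.replace("\\", "\\\\").replace('"', '\\"')
            obj = f'"{escaped}"'
        else:
            obj = f"<{o}>"
        lines.append(f"<{s}> <{p}> {obj} .")
    return "\n".join(lines)


print(to_ntriples(atomize(flat)))
```

The title stays a string because it is transcribed. The creator becomes a link, which is where "link out and link up" starts.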

See this process in action
•http://www.slideshare.net/philjohn/linked-library-data-in-the-wild-8593328

Modeling
“what are the thingies in my neighborhood?”

BIBFRAME
•LoC got tired of all the waffling about how
to replace MARC.
•10/31/2011: “We’re going to just DO THIS. Join in or don’t.”
•NISO (which maintains Z39.2, the record-structure standard underneath MARC), ILS vendors, catalogers: *have kittens*
•LoC: “Cope.” (You can tell where my sympathies lie, yes?)
•Not the first or the only; the British Library
has been working on its own LD
infrastructure for a couple years now.
•Spain’s national library has an interesting RDFized FRBRish
implementation.

First-cut
data model
•Look familiar?
What’s changed?
Eric Miller, “BIBFRAME Transition Update,” http://www.slideshare.net/zepheiraorg/bibliographic-14207718

SKOS
•Simple Knowledge Organization System
•from our friends at the W3C; builds on prior work
•http://www.w3.org/2004/02/skos/
•Representation of common controlled-
vocabulary structures in RDF syntax
•Review: What does a thesaurus entry
look like?

Things in SKOS
•Concepts (we know them as “terms”)
•And “Concept Schemes,” which represent CVs as we’re used to thinking of them
•Labels
•Review: why are these distinct from Concepts?
•Relationships among concepts
•Broader/narrower
•Associative (“see also”)
•Equivalencies and near-equivalencies (here there be dragons)
•Notes (of various kinds)
•Really pretty straightforward, for RDF!
•And has made inroads outside libraries for that very reason
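
Here is a minimal sketch of those SKOS "things" as plain Python tuples. The vocabulary under example.org and its terms are invented, language tags on labels are omitted for brevity, and the helper functions are hypothetical; the SKOS namespace and property names (`prefLabel`, `altLabel`, `broader`) are the real ones.

```python
SKOS = "http://www.w3.org/2004/02/skos/core#"
EX = "http://example.org/vocab/"  # hypothetical concept scheme

TRIPLES = [
    (EX + "cats", SKOS + "prefLabel", "Cats"),
    (EX + "cats", SKOS + "altLabel", "Felines"),
    (EX + "cats", SKOS + "altLabel", "Domestic cats"),
    (EX + "cats", SKOS + "broader", EX + "mammals"),
    (EX + "mammals", SKOS + "prefLabel", "Mammals"),
]

LABEL_PROPERTIES = {SKOS + "prefLabel", SKOS + "altLabel"}


def find_concept(label):
    """Return the concept URI carrying this label (preferred or not), if any."""
    wanted = label.casefold()
    for s, p, o in TRIPLES:
        if p in LABEL_PROPERTIES and o.casefold() == wanted:
            return s
    return None


def broader(concept):
    """Return the URIs of the concept's broader concepts."""
    return [o for s, p, o in TRIPLES if s == concept and p == SKOS + "broader"]


print(find_concept("felines"), broader(EX + "cats"))
```

The concept is the URI; the labels are just strings hung off it. That is why a "see" reference and its preferred term resolve to the same thing here without any string matching between them.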

EAD
•Attempt to model EAD “things”
•http://archiveshub.ac.uk/locah/2010/09/28/model-a-first-cut/
•Things
•Unit of Description
•Archival Finding Aid
•Repository (an Agent)
•Origination (an Agent)
•“Things” (access points, index terms). Review: What are archival access points?
•Best I can tell, this work hasn’t been taken up yet.

LOCAH Project, “The ‘things’ in EAD” http://archiveshub.ac.uk/locah/2010/09/28/model-a-first-cut/

RDFizing RDA
•What does RDA actually talk about?
•FRBR model: Group 1, 2, and 3 entities
•(though Group 1 is still kind of squidgy, really, and some application
developers are questioning its usefulness)
•DCMI model (because life can NEVER be simple)
•Relationships among entities
•What do we want to say about them?
•Are there existing ways to say these things that are good enough for our
purposes? Can we reuse them, or at least map to them?
•When there aren’t, how do we say what we need to in ways that are most
useful for the rest of the world?
•Assigning URIs to it all

Model friction
•FRBR: entity-relationship model
•... like relational databases, which is nice
•not entirely RDFish, which is not quite so nice and is causing head-scratching
•But head-scratching is normal in this space! Modeling is hard!
•FRBR does give us some abstractions to
model and assign URIs to.
•And IFLA was supposed to do that... but they haven’t.
•So the RDA folks have provisionally done it: FRBRoo.
•Should IFLA get back in the game, formal equivalences will be defined and
published between FRBRoo and whatever IFLA comes up with.
•FRBR isn’t perfect. (Gasp. I know, right?)
•So sticking strictly to FRBR as we model (relationships particularly) causes
problems for music and multimedia catalogers, among others.

RDA Vocabularies
•Hillmann, Dunsire, et al. try to work through
RDFizing RDA.
•It’s crazy complicated. And weird. So I’m not
even trying to explain it here.
•If you REALLY WANT TO KNOW: http://www.dlib.org/dlib/january10/hillmann/01hillmann.html
•OR Karen Coyle’s Library Technology Report “RDA Vocabularies for a
Twenty-First-Century Data Environment”
•Would a cataloger need to know this?
•Probably not. It’s plumbing, really.

We’ve seen some already...
•URLized Dublin Core concepts
•MODS accepting URLs
•VIAF
•id.loc.gov
•One more I won’t talk about today:
Dewey Decimal (dewey.info)

Archivists!
•Why isn’t VIAF enough for you?

Science librarians!
•Why isn’t VIAF enough for you?

EAC-CPF and SNAC
•EAC-CPF: Encoded Archival Context—
Corporations, Persons, and Families
•Archival authority standard, from the kindly folks who brought us EAD!
•Not linked-data-ized. Yet.
•SNAC: The Social Networks and Archival
Context Project
•Using EAC-CPF to link people across archival collections
•Aha! Starting to sound more linked-data! (It’s not RDFfy, yet, but should be
relatively easy to RDFize later.)
•http://socialarchive.iath.virginia.edu/
•Moral: You don’t have to use or even know
RDF to start getting ready for linked data!

ORCID and ISNI
•Open Researcher and Contributor ID
•Authority control for scholars who don’t write books
•http://orcid.org/
•International Standard Name Identifier
•Thinks of itself as a superset of VIAF, ORCID, etc.
•Remains to be seen whether they can pull this off, of course...
•http://isni.org/
•Review: what does linked data do about
two URLs identifying the same thing?
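
One common answer to that review question is an `owl:sameAs` statement linking the two URIs. The sketch below, with invented example.org identifiers standing in for entries from different authority files, groups URIs that such statements connect. It treats `owl:sameAs` as symmetric and transitive, which is how the property is defined, and uses a small union-find as one possible way to do the grouping.

```python
OWL_SAME_AS = "http://www.w3.org/2002/07/owl#sameAs"

# Hypothetical identifiers for one person in three different systems.
LINKS = [
    ("http://example.org/viaf/123", OWL_SAME_AS, "http://example.org/orcid/abc"),
    ("http://example.org/orcid/abc", OWL_SAME_AS, "http://example.org/isni/xyz"),
]


def same_as_clusters(triples):
    """Group URIs that owl:sameAs statements connect, directly or indirectly."""
    parent = {}

    def find(uri):
        parent.setdefault(uri, uri)
        while parent[uri] != uri:
            parent[uri] = parent[parent[uri]]  # shorten the chain as we go
            uri = parent[uri]
        return uri

    for s, p, o in triples:
        if p == OWL_SAME_AS:
            parent[find(s)] = find(o)

    clusters = {}
    for uri in list(parent):
        clusters.setdefault(find(uri), set()).add(uri)
    return list(clusters.values())


print(same_as_clusters(LINKS))
```

Nobody has to agree on a single winning identifier. Each system keeps its own URI, and the links say they mean the same person.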

Europeana
•Review: what is Europeana?
•Built a “Europeana Semantic Elements” set
•Dublin Core Application Profile
•Kludgy, as anything with Dublin Core inevitably is
•Moving to “Europeana Data Model”
•Publishing linked open data at data.europeana.eu
•For more: http://www.niso.org/publications/isq/2012/v24no2-3/isaac/

Missouri History Museum
•Needed to create a search portal to disparate
metadata silos
•Gee, where have we heard THAT before?
•Decided to crosswalk to RDF.
•Work in progress, but initial results encouraging.
•http://www.museumsandtheweb.com/mw2012/papers/using_an_rdf_data_pipeline_to_implement_cross_

Had enough? Okay.
•Copyright 2013 by Dorothea Salo.
•This lecture and slide deck are licensed
under a Creative Commons Attribution
3.0 United States License.
•Several diagrams reproduced under fair
use.
